Oracle Context Option Administrator's Guide Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index



Go to previous file in sequence Go to next file in sequence

CHAPTER 5. Setting Up and Managing Text


This chapter provides details on how to use the command-line to set up and maintain text in ConText Option.

The process of administering text in a ConText Option system comprises the following tasks:

Note: Most of the text administration tasks can be performed from the administration tool. These tasks are noted throughout the chapter.

Loading Text

This section provides instructions for performing the following text management tasks from the command-line:

Using ctxload

Use ctxload to load text from a load file or from separate text files into the database. An example of the ctxload syntax from the command-line is:

	ctxload -user jsmith/welcome -name MY_DOCS \
-file docs.txt -log docload.log

In this example, the user's userid and password are 'jsmith/welcome'. Because the -thes option wasn't specified, by default, ctxload is used to load documents into a database table. The table to which the documents will be loaded is MY_DOCS and the load file being used is docs.txt.

In addition, this example generates a log file named docload.log.

For a complete description of ctxload requirements and options, as well as the structure and syntax of the load file, see "ctxload" in "Executables and Utilities (Chapter 8)".

Using ConText Servers

ConText servers can be used to automatically load text from operating system files as the files are created/copied into a directory. To use ConText servers to automatically load text from directory files into a column, perform the following tasks:

		begin
		ctx_ddl.set_attribute('DIRECTORIES', '/private/docs');
		ctx_ddl.create_preference('READER_PREF',
			'Directory scan for /private/docs',
			'DIRECTORY READER');
		end;

Note: If the files are already in the required format, this step can be skipped and the default Translator preference can be used.

		begin
		ctx_ddl.create_source('MY_SOURCE',
			'TEXT'
			'simple source for /private/docs directory',
			'reader_pref =>'READER_PREF');
		end;

		$ ctxsrv -user ctxsys/passwd -personality R &

Generating Document Textkeys

Each document loaded into a table must be assigned a value in the primary key column of the table. This value serves as the textkey for the document.

Textkeys can be assigned using the following methods:

Embedding Textkeys in Load File

To manually embed textkey values for a document in the load file, in each document header, create an entry which specifies the name of the primary key column in the table and the textkey value to be assigned to the document.

For example

	. . .
	<TEXTSTART: PK=1000, TITLE='DOC 1000'>
	DOC1000.TXT
	<TEXTEND>
	<TEXTSTART: PK=1001, TITLE='DOC 10001'>
	DOC1001.TXT
	<TEXTEND>
	. . .

In this example, the load file contains pointers to separate text files (DOC1000.TXT and DOC1001.TXT), rather than the text for each document. The primary key column for the table is PK and the values specified for PK are the values loaded into the primary key column when ctxload is run.

For a complete description of the structure and syntax of the load file, see "ctxload" in "Executables and Utilities (Chapter 8)".

Generating Textkeys Using Sequences and Triggers

To automatically generate textkey values for each document loaded into a table, create a trigger and sequence for the table.

The sequence generates unique values. The trigger calls the sequence each time a row (document) is loaded into the table and stores the textkey value in the primary key column for the table.

For example:

	create sequence doc_seq;
	create trigger doc_trigger
	 before insert on DOCS
	 for each row 
	 BEGIN 
	  select docs_seq.nextval into :new.pk from dual; 
	 END;

In this example, a sequence named DOC_SEQ and a trigger named DOC_TRIGGER are created for a table in which the primary key column is PK.

DOC_TRIGGER specifies that the next available value generated by DOC_SEQ is inserted into the DOCS table before each new row is inserted into the table.

Managing Preferences

This section provides details for using the CTX_DDL PL/SQL package to create and delete the preferences that you use for indexing in ConText Option.

In addition, this section describes how to create the following types of preferences:

Note: Do not use PL/SQL and SQL reserved words as the names of preferences.

In addition, certain words, such as ASCII, HTML, BLASTER, and FILTER, are used internally by ConText Option and, consequently, should not be used by themselves as preference names; however, they can be combined with other words to create descriptive names, such as 'HTML_FILTER_PREF'.

Creating a Preference

To create a preference in the ConText data dictionary, use the CTX_DDL.SET_ATTRIBUTE and CTX_DDL.CREATE_PREFERENCE PL/SQL procedures.

For example:

	begin
	ctx_ddl.set_attribute ('PATH',
	                       '/public/doc1:/public/doc2');
	ctx_ddl.create_preference ('PUB_DOCS',
	                           'Docs stored in files',
	                           'OSFILE');
	end;

Note: CTX_DDL.CREATE_PREFERENCE must be called immediately after CTX_DDL.SET_ATTRIBUTE to assign the specified attribute(s) to the preference that you are creating.

Also, errors that result from incorrect attribute values are not reported until CREATE_PREFERENCE is called.

In this example, PATH is the name of the Tile attribute (OSFILE Tile) to which you are assigning a value and /public/doc1:/public/doc2 is the value assigned to the PATH attribute. The multiple paths, separated by a colon ':', indicates the directories in which the external files are located.

PUB_DOCS is the name of the preference to be created and OSFILE specifies the Tile for the preference.

Specifying Multiple Values for Attributes

If you want to assign more than one value to a Tile attribute, you must call CTX_DDL.SET_ATTRIBUTE separately for each value that you want to set.

For example, using the GENERIC STOPWORD Tile, you can specify a list of words that you do not want ConText Option to index during text indexing. For each word that you want to include on the list, you must call CTX_DDL.SET_ATTRIBUTE with the stop word assigned to the STOP_WORD attribute.

Creating an Engine Preference

One of the most important preferences you will create is an Engine preference. In an Engine preference for a policy, you can specify the amount of indexing memory allocated for the column in the policy, as well as the storage clauses used for the automatically-generated tables that store the components of a index.

Because index strings for indexed words are built in memory before they are saved to the index tables, it is vital that you allocate as much indexing memory as possible to avoid excessive index fragmentation.

When you create an Engine preference, you use the INDEX_MEMORY attribute for the GENERIC ENGINE Tile to allocate indexing memory.

Suggestion: To ensure the best results for indexing, calculate the total amount of real memory (not virtual memory) available on the machine which will be used to create the index, then specify this amount when you create an Engine preference.

If you plan to use parallel indexing, the memory specified for the Engine preference should be the amount of real memory available divided evenly among the number of ConText servers that will perform the indexing in parallel.

For example, if you are going to use three ConText servers in parallel to create a index for a column and you have 100 Mb of memory available on the machine on which the servers will be running, you should create an Engine preference with an INDEX_MEMORY of 33 Mb, then specify the preference in the policy for the column.

For more information about creating policies, see "Managing Policies" in this chapter.

Creating a Stoplist Preference

A Stoplist preference is created by calling CTX_DDL.SET_ATTRIBUTE separately for each stop word in the list.

To define a Stoplist:

Note: The maximum number of terms (stop words) that a Stoplist preference can contain is 4095.

For example:

	begin
	ctx_ddl.set_attribute('STOP_WORD', 'OF', 1);
	ctx_ddl.set_attribute('STOP_WORD', 'TO', 2);
	ctx_ddl.set_attribute('STOP_WORD', 'A', 3);
	. . .
	. . .
	. . .
	ctx_ddl.set_attribute('STOP_WORD', 'NO', 90);
	ctx_ddl.set_attribute('STOP_WORD', 'ONLY', 91);
	ctx_ddl.set_attribute('STOP_WORD', 'SO', 92);
	ctx_ddl.set_attribute('STOP_WORD', 'MOST', 93);
	ctx_ddl.set_attribute('STOP_WORD', 'BANK', 94);
	ctx_ddl.set_attribute('STOP_WORD', 'MAY', 95);
	ctx_ddl.set_attribute('STOP_WORD', 'INTO', 96);
	ctx_ddl.set_attribute('STOP_WORD', 'ANY', 97);
	ctx_ddl.set_attribute('STOP_WORD', 'GOVERNMENT', 98);
	ctx_ddl.create_preference('MY_STOPLIST',
	                          'My list of stop words',
	                          'GENERIC STOP LIST');
	end;

Creating a Theme Lexer Preference

If you have English-language documents in a column and you want to create a theme index for the column to enable theme queries, you need to first create a Lexer preference that calls the theme lexer, then include the preference in the policy definition for the column.

Note: In general, you do not need to create a Theme Lexer preference; you can use the THEME_LEXER predefined preference provided by ConText Option.

To create your own theme lexer preference, call CTX_DDL.CREATE_PREFERENCE and specify the THEME LEXER Tile.

For example:

	begin
	ctx_ddl.create_preference ('MY_THEME_PREF',
	                           'Pref for theme indexes',
	                           'THEME LEXER');
	end;

Note: The THEME LEXER Tile does not have any attributes, so you do not have to set any attributes before calling CREATE_PREFERENCE.

For more information about creating a policy that uses the theme lexer preference, see "Creating a Theme Indexing Policy" in this chapter.

Creating Filter Preferences

When creating Filter preferences, the following issues determine which Tiles and attributes you use, as well as the values that you specify for each attribute:

For a complete list of the Filter Tiles and attributes, see "Tiles, Tile Attributes, and Attribute Values" in "ConText Data Dictionary (Chapter 9)."

Creating a Filter Preference for Internal Filters

Single-Format Columns:

For a single-format column using one of the internal filters, create a Filter preference that sets the FORMAT attribute (BLASTER FILTER Tile) to the format used in your column.

The following example illustrates creating a Filter preference for a column that contains documents only in MS Word for Windows format:

    begin
    ctx_ddl.set_attribute('FORMAT','11')
    ctx_ddl.create_preference('WP6_FILT',
                              'WP6 filter',
                              'BLASTER FILTER');
    end;

Multiple-Format Columns:

For multiple-format columns using internal filters, create a Filter preference that sets the FORMAT attribute (BLASTER FILTER Tile) for the Autorecognize filter:

    begin
    ctx_ddl.set_attribute('FORMAT','997')
    ctx_ddl.create_preference('MULTI_FILT',
                              'multiple internal filters',
                              'BLASTER FILTER');
    end;

Creating a Filter Preference for External Filters

Note: Before a Filter preference that uses external filters can be created, one or more filter executables must be created and stored in the bin directory of the ConText Option directory in your Oracle home directory.

You can choose to create your own external filter executables or use the executables provided with ConText Option.

Single-Format Columns:

For a single-format column that uses external filters, create a Filter preference that uses the COMMAND attribute (USER FILTER Tile) to specify the executable for the format used in your column.

The following example illustrates creating a Filter preference for a column that contains documents only in AMIPRO format and uses a filter executable named amipro.exe:

    begin
    ctx_ddl.set_attribute('COMMAND','amipro.exe')
    ctx_ddl.create_preference('AMIPRO_FILT',
                              'amipro external filter',
                              'USER FILTER');
    end;

Multiple-Format Columns:

For a multiple-format column that uses external filters only or external and internal filters, create a Filter preference that sets the EXECUTABLE attribute (BLASTER FILTER Tile) once for each of the external filters you want to use in your column.

Note: The EXECUTABLE attribute requires that you specify a format code which identifies the document format supported by the filter executable.

For a complete list of format codes for document formats, see "Supported Formats for Multiple-Format Columns" in "ConText Data Dictionary (Chapter 9)."

The following example illustrates creating a Filter preference for a column that contains documents in AMIPRO, Adobe Acrobat, and WordPerfect 6.0 formats:

    begin
    ctx_ddl.set_attribute('EXECUTABLE', 19,'amipro.sh')
    ctx_ddl.set_attribute('EXECUTABLE', 57,'acrobat.sh')
    ctx_ddl.create_preference('MULT_FILT',
                              'multiple ext/int filters',
                              'BLASTER FILTER');
    end;

Note: It is not necessary to explicitly specify the filter executable for WordPerfect 6.0, because ConText Option provides an internal filter for WordPerfect 6.0.

When the Filter preference for a column uses the EXECUTABLE attribute (BLASTER FILTER Tile), ConText Option uses internal filters for all supported formats, unless an external filter is explicitly specified in the preference.

Deleting a Preference

To delete a preference from the ConText data dictionary, use the PL/SQL procedure CTX_DDL.DROP_PREFERENCE.

For example:

	execute ctx_ddl.drop_preference('PUB_DOCS')

To use DROP_PREFERENCE, you only need to specify the name (in this example, PUB_DOCS) of the preference that you want to drop.

Note: If a preference is used in a policy, the policy must be deleted from the ConText data dictionary before the preference can be deleted.

Managing Policies

This section provides details for using the CTX_DDL PL/SQL package to create, update, and drop column policies for database columns.

In addition, this section describes how to create the following types of policies:

Note: Do not use PL/SQL and SQL reserved words as the names of policies.

In addition, certain words, such as ASCII, HTML, BLASTER, and FILTER, are used internally by ConText Option and, consequently, should not be used by themselves as policy names; however, they can be combined with other words to create descriptive names, such as 'ASCII_INDEXING_POLICY'.

Creating a Column Policy

To create a column policy, use the PL/SQL procedure CTX_DDL.CREATE_POLICY.

For example:

	begin
	ctx_ddl.create_policy (policy_name  => 'DOC_POL',
	                    colspec   => 'DOCS.ARTICLE',
	                    textkey   => 'PK',
	                    dstore_pref   => 'PUB_DOCS',
	                    engine_pref   => 'DOC_ENGINE',
	                    filter_pref   => 'WORD6',
	                    lexer_pref    => 'DOC_LINK',
	                    wordlist_pref => 'CTXSYS.SOUNDEX',
	                    stoplist_pref => 'MINI_STOP_LIST');
 	end;

In this example, the name of the policy is DOC_POL. The policy does not have a description, nor does it use a source policy or specify a master-detail relationship for storing text.

Note: In a policy, you can specify your preferences as well as preferences owned by others (including CTXSYS). For example. to specify a preference owned by CTXSYS, use the following syntax: CTXSYS.pref_name.

In addition, if you specify a source policy in a policy, you can specify either your template policies or the CTXSYS-owned template policies using the syntax above.

The text column is ARTICLE in the DOCS table, owned by CTXSYS. The textkey for the text column is PK.

Note: The column name in COLSPEC must include the table name, using the following syntax: table_name.column_name.

The predefined preference SOUNDEX is specified as the Wordlist preference for the policy. Because the predefined preference is owned by CTXSYS, the fully-qualified name of the preference is included.

Creating a Theme Indexing Policy

To create a theme indexing policy, use the PL/SQL procedure CTX_DDL.CREATE_POLICY and specify either your own theme lexer preference or the predefined preference (THEME_LEXER) provided by ConText Option.

The following example illustrates how to create a policy identical to the previous policy example, except that MY_THEME_PREF, the theme lexer preference created in the theme lexer example, is used in place of DOC_LINK, which is a preference that calls the basic (indexing) lexer:

	begin
	ctx_ddl.create_policy (policy_name  => 'DOC_POL',
	                    colspec   => 'DOCS.ARTICLE',
	                    textkey   => 'PK',
	                    dstore_pref   => 'PUB_DOCS',
	                    engine_pref   => 'DOC_ENGINE',
	                    filter_pref   => 'WORD6',
	                    lexer_pref    => 'MY_THEME_PREF',
	                    wordlist_pref => 'CTXSYS.SOUNDEX',
	                    stoplist_pref => 'MINI_STOP_LIST');
 	end;

When index creation is requested for this theme indexing policy, the theme lexer generates a theme index that can be used to perform theme queries.

Using Composite Textkeys in a Policy

To create a policy that uses a composite textkey, use the PL/SQL procedure CTX_DDL.CREATE_POLICY; however, when you specify the textkey for the column, reference each of the primary or unique key columns (up to 16) that constitute the composite textkey for the column.

For example:

	begin
	ctx_ddl.create_policy (policy_name  => 'DOC_POL',
	                    colspec   => 'DOCS.ARTICLE',
	                    textkey   => 'AUTH,TITLE',
	                    dstore_pref   => 'PUB_DOCS',
	                    engine_pref   => 'DOC_ENGINE',
	                    filter_pref   => 'WORD6',
	                    lexer_pref    => 'DOC_LINK',
	                    wordlist_pref => 'CTXSYS.SOUNDEX',
	                    stoplist_pref => 'MINI_STOP_LIST');
 	end;

In this example, the textkey for the ARTICLES column is a composite textkey consisting of the columns AUTH and TITLE in the DOCS tables. The names of the textkey columns are separated by commas and are registered in the ConText data dictionary in the order in which they are specified.

Note: There is a 256 character limit, including the comma separators, on the combined length of the column names in a composite textkey.

Also, there is a 256 character limit on the combined length of the columns in a composite textkey.

For more information about these limits, see "Textkeys" in "Text Concepts (Chapter 3)."

Creating a Template Policy

To create a template policy, use the PL/SQL procedure CTX_DDL.CREATE_TEMPLATE_POLICY.

For example:

	begin
	ctx_ddl.create_template_policy (
	                    policy_name  => 'TEMPLATE_POL',
	                    dstore_pref   => 'PUB_DOCS',
	                    engine_pref   => 'DOC_ENGINE',
	                    filter_pref   => 'WORD6',
	                    lexer_pref    => 'DOC_LINK',
	                    wordlist_pref => 'SOUNDEX_YES',
	                    stoplist_pref => 'MINI_STOP_LIST');
 	end;

In this example, the name of the policy is TEMPLATE_POL. The preferences for the policy are as specified above. If TEMPLATE_POL is specified as a source policy when creating a new policy, the preferences for TEMPLATE_POL are copied to the new policy.

Note: You can also use CTX_DDL.CREATE_POLICY to create a template policy; however, when you call CREATE_POLICY, do not specify a value for colspec.

Modifying a Policy

To modify a policy in the ConText data dictionary, use the PL/SQL procedure CTX_DDL.UPDATE_POLICY.

Note: If a policy has been used to create a index for the text column in the policy, the index must be dropped before the policy can be updated.

In addition, you cannot modify the attributes for a policy; you can only modify the description and preferences for a policy.

For example:

	begin
	ctx_ddl.update_policy (policy_name  => 'DOC_POL',
                     filter_pref   => 'HTML_DOC',
                     wordlist_pref => 'CTXSYS.NO_SOUNDEX');
	end;

In this example, the preference HTML_DOC for the Filter category replaces the existing preference for the category, while the preference SOUNDEX_NO for the Wordlist category replaces the existing preference for Wordlist category.

Deleting a Policy

To delete a policy from the ConText data dictionary, use the PL/SQL procedure CTX_DDL.DROP_POLICY.

For example:

	execute ctx_ddl.drop_policy ('DOC_POL')

To use DROP_POLICY, you only need to specify the name (DOC_POL) of the policy that you want to drop.

Note: If a policy has been used to create a index for the text column in the policy, the index must be deleted before the policy can be deleted.

Managing Indexes

This section provides details for using the CTX_DDL PL/SQL package to perform the following indexing tasks:

Creating an Index

To create an index in the ConText data dictionary, use the CTX_DDL.CREATE_INDEX procedure.

The only argument required for CREATE_INDEX is the name of the policy for the text column to be indexed.

For example:

	execute ctx_ddl.create_index('DOC_POL')

In this example, CREATE_INDEX creates an index for the text column defined in the DOC_POL policy.

Note: During indexing, ConText Option creates Oracle indexes for the index tables using the temporary tablespace for CTXSYS. To ensure successful creation of the Oracle indexes, the temporary tablespace for CTXSYS must have enough space to store the temporary segments used in creating the indexes.

The temporary tablespace for CTXSYS is defined during installation of ConText Option. For more information about defining the temporary tablespace for CTXSYS, see the ConText Option installation documentation specific to your operating system.

Using Parallel Indexing

You can optionally include a numeric value in the argument string for CTX_DDL.CREATE_INDEX to specify the number of ConText servers to use for parallel indexing.

Note: The value you specify cannot exceed the number of ConText servers currently running with the DDL personality. If you specify more ConText servers than the number of servers running, CREATE_INDEX will not execute.

For example:

	execute ctx_ddl.create_index('DOC_POL', 2)

In this example, CREATE_INDEX uses any two available ConText servers with the DDL personality to create an index in parallel for the text column defined in the DOC_POL policy.

Note: Indexing in parallel does not automatically cause the the Oracle indexes on the index tables to be created in parallel.

To have the Oracle7 Server create Oracle indexes in parallel, the parallel option for Oracle7 must be installed. In addition, the I1I_OTHER_PARAMS attribute must be set for the Engine preference in the policy for the column.

For example:

	begin
	ctx_ddl.set_attribute('I1I_OTHER_PARMS', ' PARALLEL 4');
	ctx_ddl.create_preference('PAR_INDEX',
                              'Parallel indexing x 4',
                              'GENERIC ENGINE');
	end;

In this example, an Engine preference named PAR_INDEX is created with a parallel parameter of 4 for I1I_OTHER_PARAMS. If PAR_INDEX is used in a policy, when an index is created for the policy, four Oracle server processes create the Oracle indexes for the index in parallel.

Indexing Existing Columns (Hot Upgrade)

ConText Option does not require you to create new tables or modify existing tables to create indexes for text already stored in a database. If you already have text stored in a column in an existing database, you can use ConText Option to index the text in the column without changing the structure of the table itself. Once the column has an index, queries can be submitted against the column.

The only requirements are:

The procedure for indexing an existing text column is identical to the procedure for indexing a new text column:

Updating an Index

Once an index is created for a text column, ConText Option automatically updates the index each time a document (row) is added, deleted, or modified in the table.

In addition, the index can be manually updated for a single document using CTX_DML.REINDEX.

Dropping an Index

To drop an existing index from the data dictionary, use the CTX_DDL.DROP_INDEX PL/SQL procedure .

For example:

	execute ctx_ddl.drop_index ('DOC_POL')

In this example, the index and associated tables for DOC_POL are deleted from the database. If you wanted to perform subsequent text queries against the text column for DOC_POL, the index for the column in DOC_POL must be recreated using CTX_DDL.CREATE_INDEX.

Optimizing an Index

Index optimization can be used to help reduce the size of indexes, as well as update indexes to reflect deleted/modified documents.

To optimize an index in the data dictionary, use the PL/SQL procedure, CTX_DDL.OPTIMIZE_INDEX.

For example:

	execute ctx_ddl.optimize_index('DOC_POL', \
ctx_ddl.defragment_to_new_table);

In this example, the optimization method used for the index for DOC_POL is DEFRAGMENT_TO_NEW_TABLE, which uses a second, mirror index table to compact the index fragments for all indexed terms with multiple fragments and remove references from the index strings for all deleted/modified documents.

Resuming Index Creation/Optimization

If an index operation (creation or optimization) fails, you can use the the PL/SQL procedure CTX_DDL.RESUME_FAILED_INDEX to resume the operation once the reason for the failure has been determined and corrected/removed.

For example:

	execute ctx_ddl.resume_failed_index('DOC_POL')

In this example, the text DDL operation is the default (index creation) and is resumed for the text column for the DOC_POL policy.

You can also choose to start the index operation from the beginning using CTX_DDL.CREATE_INDEX or CTX_DDL.OPTIMIZE_INDEX.

You can view the index log in the administration tool to determine when and where the index operation failed.

The log also can be used to determine whether to resume the operation or simply start the operation over, based on the stage at which the operation failed and/or the percentage of the operation completed before failure.




Go to previous file in sequence Go to next file in sequence
Prev Next
Oracle
Copyright © 1996 Oracle Corporation.
All Rights Reserved.
Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index