Oracle ConText Option Administrator's Guide

CHAPTER 9. ConText Data Dictionary


This chapter provides reference information for the ConText data dictionary objects provided with ConText Option.

The topics discussed in this chapter are:

Tiles, Tile Attributes, and Attribute Values

The following section lists all of the Tiles for which you can create preferences, as well as the attributes and values that you can assign to each Tile in a preference. In addition, a brief description of the Tile attributes and examples are provided.

The Tiles are grouped by preference category:

Data Store

Filter

Lexer

Engine

Wordlist

Stoplist

Reader (Text Loading)

Translator (Text Loading)

Engine (Text Loading)

Data Store

The Data Store category contains the following Tiles:

Tiles            Attributes     Attribute Values
DIRECT           ** none **     N/A
MASTER DETAIL    BINARY         0 (plain text)
                                1 (binary text)
OSFILE           PATH           path1:path2:...:pathn
URL              TIMEOUT        seconds (0 to 3600, default 30)
                 MAXTHREADS     thread_num (0 to 1024, default 8)
                 MAXURLS        buffer_length (1 to 2^31-1, default 256)
                 URLSIZE        URL_length (32 to 65535, default 256)
                 MAXDOCSIZE     doc_size (256 to 2^31-1, default 2000000)
                 HTTP_PROXY     host_name
                 NO_PROXY       string (up to 16 strings, separated by commas)
Table 9-1. Data Store Tiles



MASTER DETAIL Tile Attribute(s)

The BINARY attribute specifies whether the text in a master detail table is in binary format (1) or plain text format (0).

Text in binary format does not use newline characters to indicate the end of the line. Plain text uses newline characters at the end of each line to indicate the end of the line.

OSFILE Tile Attribute(s)

The PATH attribute specifies the location of text files that are stored externally in a file system.

Multiple paths can be specified for the PATH attribute, with the paths separated by a colon (:). File names are stored in the text column in the text table. If you do not use the PATH attribute to specify a path for external files, ConText Option requires the path to be included in the file names stored in the text column.

Note: If text is stored in external files rather than in a database, the files must be accessible from the host machine on which the ConText server is running. This can be accomplished by storing the files in the file system for the host machine or by mounting the file system where the files are stored to the host machine.

URL Tile Attribute(s)

The TIMEOUT attribute specifies the length of time, in seconds, that a network operation such as 'connect' or 'read' waits before timing out and returning a timeout error to the application. The valid range for TIMEOUT is 0 to 3600 and the default is 30.

Note: Because the timeout applies to each individual network operation, the total time spent retrieving a document may be longer than the value specified for TIMEOUT.

The MAXTHREADS attribute specifies the maximum number of threads that can be running at the same time. The valid range for MAXTHREADS is 1 to 1024 and the default is 8.

Note: The upper range of MAXTHREADS corresponds to the number of file descriptors that the operating system can process at one time. If the number of files your operating system can process at one time is less than the value you set, you may receive an invalid socket error.

The MAXURLS attribute specifies the maximum number of rows (HTML documents) retrieved from the text table that the internal buffer can hold. The valid range for MAXURLS is 1 to 2^31-1 and the default is 256.

The URLSIZE attribute specifies the maximum length, in bytes, that the URL data store supports for URLs stored in the database. If a URL is over the maximum set, an error is returned. The valid range for URLSIZE is 32 to 65535 and the default is 256.

The MAXDOCSIZE attribute specifies the maximum size, in bytes, of the HTML documents that the URL data store can access through URLs stored in the database. The valid range for MAXDOCSIZE is 1 to 2^31-1 and the default is 2000000 (2 MB).

The HTTP_PROXY attribute specifies the fully-qualified name of the host machine that serves as the proxy (gateway) for the machine on which ConText Option is installed.

The NO_PROXY attribute specifies up to sixteen strings, separated by commas, which, when encountered in a host name, cause the URL data store to ignore the machine as a proxy machine.

For example, if the string 'us.oracle.com, uk.oracle.com' is entered for NO_PROXY, any machines that contain either of these domains in their host names are ignored as proxy machines.
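A URL data store preference that overrides several of these defaults might look like the following sketch. It follows the CTX_DDL calling pattern used in the other examples in this chapter; the preference name WEB_PREF and the proxy host proxy.example.com are hypothetical placeholders, and the values are illustrations chosen within the documented ranges.

```sql
begin
  -- Wait up to 60 seconds for each network operation (valid range 0 to 3600)
  ctx_ddl.set_attribute     ('TIMEOUT', '60');
  -- Allow up to 16 retrieval threads to run at the same time
  ctx_ddl.set_attribute     ('MAXTHREADS', '16');
  -- Return an error for URLs longer than 512 bytes (valid range 32 to 65535)
  ctx_ddl.set_attribute     ('URLSIZE', '512');
  -- Hypothetical proxy host; replace with your site's gateway machine
  ctx_ddl.set_attribute     ('HTTP_PROXY', 'proxy.example.com');
  -- Machines whose host names contain these strings are ignored as proxies
  ctx_ddl.set_attribute     ('NO_PROXY', 'us.oracle.com,uk.oracle.com');
  ctx_ddl.create_preference ('WEB_PREF', 'URL data store with proxy', 'URL');
end;
```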

Data Store Example

The following example creates a preference named DOC_PREF for the OSFILE Tile:

begin
  -- Set the PATH attribute, then create the preference for the OSFILE Tile
  ctx_ddl.set_attribute     ('PATH', '/private/mydocs');
  ctx_ddl.create_preference ('DOC_PREF', 'Path for my documents', 'OSFILE');
end;

Note: This example illustrates usage of OSFILE for documents stored on a machine running a UNIX-based operating system.

Filter

The Filter category contains the following Tiles:

Tiles            Attributes        Attribute Values
FILTER NOP       ** none **        N/A
HTML FILTER      CODE_CONVERSION   0 (conversion disabled)
                                   1 (conversion enabled)
USER FILTER      COMMAND           filter_executable
BLASTER FILTER   EXECUTABLE        format_id, filter_executable, sequence
                 FORMAT            0 or 999   No filter (ASCII)
                                   1 or 4     WordPerfect for Windows 5.x; WordPerfect for DOS 5.0, 5.1
                                   5          WordPerfect for Windows 6.x; WordPerfect for DOS 6.0
                                   2          MS Word for DOS 5.0, 5.5
                                   6          MS Word for Mac 3, 4, 5.x
                                   7          MS Word for Windows 2
                                   11         MS Word for Windows 6.x, 7.0
                                   8          AMIPRO for Windows 1, 2, 3
                                   9          Lotus 1-2-3 for Windows 2, 3, 4, 5; Lotus 1-2-3 for DOS 4, 5
                                   13         Xerox XIF for UNIX 5, 6
                                   997        Autorecognize
Table 9-2. Filter Tiles



Note: If you use the USER FILTER Tile or the EXECUTABLE attribute (BLASTER Tile) to specify external filters for indexing and viewing text, the specified filter executable must be stored in the bin subdirectory in the ctx directory in your Oracle home directory.

For example, in a UNIX-based operating system, all filter executables must be stored in $ORACLE_HOME/ctx/bin.

HTML FILTER Tile Attribute(s)

The CODE_CONVERSION attribute specifies whether code conversion is enabled for documents which contain Japanese ASCII text with HTML formatting.

Code conversion is required for Japanese HTML documents if the documents use more than one of the three character sets supported for HTML text in Japanese. If code conversion is enabled, all Japanese HTML documents are converted to a single, common character set before indexing.

The default for CODE_CONVERSION is 0 (not enabled).
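For example, a preference that enables code conversion for a column of Japanese HTML documents could be created as in the following sketch. It reuses the CTX_DDL pattern from this chapter's examples; the preference name JA_HTML_PREF is hypothetical.

```sql
begin
  -- 1 = convert all Japanese HTML documents to a single, common character set
  ctx_ddl.set_attribute     ('CODE_CONVERSION', '1');
  ctx_ddl.create_preference ('JA_HTML_PREF', 'Japanese HTML, code conversion on',
                             'HTML FILTER');
end;
```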

Note: For multiple-format columns that use Autorecognize (BLASTER Tile, FORMAT attribute = 997) or use external filters (BLASTER Tile, EXECUTABLE attribute) for all formats except HTML, code conversion is always enabled.

USER FILTER Tile Attribute(s)

The COMMAND attribute specifies the executable for the external filter used to filter all text stored in a column. If more than one document format is stored in the column, the external filter must recognize and handle all such formats.
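A USER FILTER preference names the external filter executable in the COMMAND attribute. In the following sketch, the executable name myfilter and the preference name MY_FILTER_PREF are hypothetical; on UNIX-based systems the executable itself must be stored in $ORACLE_HOME/ctx/bin.

```sql
begin
  -- Hypothetical filter executable stored in $ORACLE_HOME/ctx/bin; it must
  -- recognize and handle every document format stored in the column
  ctx_ddl.set_attribute     ('COMMAND', 'myfilter');
  ctx_ddl.create_preference ('MY_FILTER_PREF', 'External filter for all docs',
                             'USER FILTER');
end;
```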

BLASTER FILTER Tile Attribute(s)

The FORMAT attribute specifies the internal filter used for filtering text stored in a text column.

The EXECUTABLE attribute specifies the external filters that are used to filter text stored in a multiple-format text column. It has three values that must be specified: format_id, filter_executable, and sequence.

Note: You cannot set both the FORMAT and EXECUTABLE attributes in a preference.

For a list of the format codes supported by the EXECUTABLE attribute, see "Supported Formats for Multiple-Format Columns" in this chapter.

Filter Example

The following example creates a preference named WORD6 for the BLASTER FILTER Tile:

begin
  -- FORMAT 11 = MS Word for Windows 6.x, 7.0 (see Table 9-2)
  ctx_ddl.set_attribute     ('FORMAT', '11');
  ctx_ddl.create_preference ('WORD6', 'Microsoft Word docs', 'BLASTER FILTER');
end;

Lexer

The Lexer category contains the following Tiles:

Tiles                   Attributes       Attribute Values
BASIC LEXER             PUNCTUATIONS     character_string
                        PRINTJOINS       character_string
                        SKIPJOINS        character_string
                        NUMJOIN          character_string
                        NUMGROUP         character_string
                        CONTINUATION     character_string
                        BASE_LETTER      0 (Disabled)
                                         1 (Enabled)
THEME LEXER             ** none **       N/A
JAPANESE V-GRAM LEXER   KANJI_INDEXING   1
                                         2
CHINESE V-GRAM LEXER    HANZI_INDEXING   1
                                         2
KOREAN LEXER            ** none **       N/A
Table 9-3. Lexer Tiles



Note: The character_string for each BASIC LEXER Tile attribute can contain multiple characters. Each character in the string serves as a punctuation, join, or continuation character.

For example, if the string '.?!' is specified for the PUNCTUATIONS attribute, each individual character ('.', '?', '!') in the string is treated by ConText Option as a sentence delimiter.

BASIC LEXER Tile Attribute(s)

PUNCTUATIONS specifies the characters that indicate the end of a sentence.

PRINTJOINS specifies the characters that join words together when they appear between the words with no blank spaces. Words that contain PRINTJOINS characters are stored in the text index exactly as they appear in the text. For example, if you define '-' as a PRINTJOINS character, the word pseudo-intellectual is stored in the text index as pseudo-intellectual.

SKIPJOINS specifies the characters that join words together, but the characters are not stored in the text index. For example, if you define '-' as a SKIPJOINS character, the word pseudo-intellectual is stored in the text index as pseudointellectual.

Note: PRINTJOINS and SKIPJOINS are mutually exclusive. You cannot specify the same characters for both attributes.

NUMJOIN specifies the characters that, when they appear in a string of digits, cause ConText Option to index the string of digits as a single unit or word. For example, a period '.' may be defined as a NUMJOIN character because it often serves as a decimal point when it appears in a string of digits.

NUMGROUP specifies the characters that, when they appear in a string of digits, indicate that the digits are groupings within a larger single unit. For example, a comma ',' may be defined as a NUMGROUP character because it often indicates a grouping of thousands when it appears in a string of digits.

Note: The default values for NUMJOIN and NUMGROUP are determined by the NLS initialization parameters that are specified for the database. In general, you do not need to specify a value for either NUMJOIN or NUMGROUP when creating a Lexer preference for the BASIC LEXER Tile.

CONTINUATION specifies the characters that indicate a word continues on the next line. The most common CONTINUATION characters are a hyphen '-' and a backslash '\'.

BASE_LETTER specifies whether characters that have diacritical marks (umlauts, cedillas, acute accents, etc.) are converted to their base form for text indexing and text queries.
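The following sketch combines several BASIC LEXER attributes in one preference: the hyphen is a SKIPJOINS character (so pseudo-intellectual is indexed as pseudointellectual), only the backslash marks a continued line, and base-letter conversion is enabled. The preference name SKIP_DASH is hypothetical. Because PRINTJOINS and SKIPJOINS are mutually exclusive, '-' must not also be specified for PRINTJOINS; it is also left out of CONTINUATION here as a precaution.

```sql
begin
  -- Join words across '-' without storing the hyphen in the text index
  ctx_ddl.set_attribute     ('SKIPJOINS', '-');
  -- Treat only the backslash as a line-continuation character
  ctx_ddl.set_attribute     ('CONTINUATION', '\');
  -- 1 = convert characters with diacritical marks to their base form
  ctx_ddl.set_attribute     ('BASE_LETTER', '1');
  ctx_ddl.create_preference ('SKIP_DASH', 'Skip dashes, base letters',
                             'BASIC LEXER');
end;
```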

JAPANESE V-GRAM LEXER Tile Attribute(s)

The KANJI_INDEXING attribute specifies the length of the character groups used for pattern matching while indexing.

A value of 1 for KANJI_INDEXING indicates that the Japanese lexer examines each character individually to determine token boundaries.

A value of 2 for KANJI_INDEXING indicates that the lexer examines characters in pairs to determine token boundaries.

The default is 2.
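For example, the following sketch creates a Japanese lexer preference that examines each character individually (method 1) instead of the default pairs. The preference name JA_LEXER is hypothetical; the predefined VGRAM_JAPANESE_1 preference, described later in this chapter, uses the same method. A CHINESE V-GRAM LEXER preference is configured the same way through the HANZI_INDEXING attribute.

```sql
begin
  -- 1 = examine each character individually to determine token boundaries
  ctx_ddl.set_attribute     ('KANJI_INDEXING', '1');
  ctx_ddl.create_preference ('JA_LEXER', 'Japanese, single-character method',
                             'JAPANESE V-GRAM LEXER');
end;
```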

CHINESE V-GRAM LEXER Tile Attribute(s)

The HANZI_INDEXING attribute specifies the length of the character groups used for pattern matching while indexing.

A value of 1 for HANZI_INDEXING indicates that the Chinese lexer examines each character individually to determine token boundaries.

A value of 2 for HANZI_INDEXING indicates that the lexer examines characters in pairs to determine token boundaries.

The default is 2.

Lexer Example

The following example creates a preference named DOC_LINK for the BASIC LEXER Tile:

begin
  -- Treat dash, star, and slash as PRINTJOINS characters
  ctx_ddl.set_attribute     ('PRINTJOINS', '-*/');
  ctx_ddl.create_preference ('DOC_LINK', 'Dash, star, slash', 'BASIC LEXER');
end;

Engine

The Engine category contains the following Tiles:

Tiles            Attributes          Attribute Values
GENERIC ENGINE   INDEX_MEMORY        integer (memory in bytes)
                 OPTIMIZE_DEFAULT    opt_type
                 I1T_TABLESPACE      text_index_tablespace
                 I1T_STORAGE         text_index_storage_parameters
                 I1T_OTHER_PARMS     text_index_other_parameters
                 I1I_TABLESPACE      index_tablespace
                 I1I_STORAGE         index_storage_parameters
                 I1I_OTHER_PARMS     index_other_parameters
                 KTB_TABLESPACE      text_index_tablespace
                 KTB_STORAGE         text_index_storage_parameters
                 KTB_OTHER_PARMS     text_index_other_parameters
                 KID_TABLESPACE      index_tablespace
                 KID_STORAGE         index_storage_parameters
                 KID_OTHER_PARMS     index_other_parameters
                 KIK_TABLESPACE      index_tablespace
                 KIK_STORAGE         index_storage_parameters
                 KIK_OTHER_PARMS     index_other_parameters
                 LST_TABLESPACE      text_index_tablespace_name
                 LST_STORAGE         text_index_storage_parameters
                 LST_OTHER_PARMS     text_index_other_parameters
                 LIX_TABLESPACE      index_tablespace
                 LIX_STORAGE         index_storage_parameters
                 LIX_OTHER_PARMS     index_other_parameters
                 SQR_TABLESPACE      text_index_tablespace
                 SQR_STORAGE         text_index_storage_parameters
                 SQR_OTHER_PARMS     text_index_other_parameters
                 SRI_TABLESPACE      index_tablespace
                 SRI_STORAGE         index_storage_parameters
                 SRI_OTHER_PARMS     index_other_parameters
Table 9-4. Engine Tiles



GENERIC ENGINE Tile Attribute(s)

INDEX_MEMORY specifies the amount of memory, in bytes, allocated for indexing.

Note: When specifying a value for INDEX_MEMORY in a preference, you should specify as much real (not virtual) memory as is available on the machine which is running the ConText server that will be creating indexes.

If you plan to use parallel indexing, the memory specified should be the amount of available memory divided evenly among the number of ConText servers that will perform the indexing in parallel.
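As a worked example, assume (hypothetically) that 64 MB of real memory is available on the host and that four ConText servers will index in parallel. Each server's share is 64 MB / 4 = 16 MB, so the Engine preference would set INDEX_MEMORY to 16*power(2,20) = 16777216 bytes, using the same expression style as the predefined DEFAULT_INDEX preference. The preference name PAR4_ENGINE is hypothetical.

```sql
begin
  -- 64 MB of available real memory shared evenly by 4 parallel ConText servers:
  -- 64 MB / 4 = 16 MB = 16 * 2^20 = 16777216 bytes per server
  ctx_ddl.set_attribute     ('INDEX_MEMORY', 16*power(2,20));
  ctx_ddl.create_preference ('PAR4_ENGINE', 'Memory for 4 parallel servers',
                             'GENERIC ENGINE');
end;
```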

OPTIMIZE_DEFAULT specifies the type of optimization used when CTX_DDL.OPTIMIZE_INDEX is called without an optimization type.

If no value is specified for OPTIMIZE_DEFAULT, the default is DEFRAGMENT_TO_TWO_TABLE.

I1T_TABLESPACE, KTB_TABLESPACE, and LST_TABLESPACE specify the tablespaces to be used for the index tables created during indexing.

SQR_TABLESPACE specifies the tablespace to be used for the stored query expression result (SQR) table, which is created, but not populated, during indexing.

I1I_TABLESPACE, KID_TABLESPACE, KIK_TABLESPACE, and LIX_TABLESPACE specify the tablespaces to be used for storing the Oracle indexes generated for each index table during indexing.

SRI_TABLESPACE specifies the tablespace to be used for storing the Oracle index generated for each SQR table.

Note: For each TABLESPACE attribute that is not specified when creating an Engine preference, the text table owner's default tablespace is used for storing the ConText index objects (tables and indexes).

I1T_STORAGE, KTB_STORAGE, and LST_STORAGE specify the STORAGE clauses used for the index tables created during indexing.

SQR_STORAGE specifies the STORAGE clause used for the stored query expression result (SQR) table created during indexing.

I1I_STORAGE, KID_STORAGE, KIK_STORAGE, and LIX_STORAGE specify the STORAGE clauses used for the Oracle indexes generated for each index table.

SRI_STORAGE specifies the STORAGE clause used for the Oracle index generated for each SQR table.

I1T_OTHER_PARMS, KTB_OTHER_PARMS, and LST_OTHER_PARMS specify any additional parameters for the index tables created during indexing.

SQR_OTHER_PARMS specifies any additional parameters for the stored query expression result (SQR) table created during indexing.

I1I_OTHER_PARMS, KID_OTHER_PARMS, KIK_OTHER_PARMS, and LIX_OTHER_PARMS specify any additional parameters for the Oracle indexes generated for each index table.

SRI_OTHER_PARMS specifies any additional parameters for the Oracle index generated for each SQR table.

Note: The OTHER_PARMS attributes are used, in particular, to specify the PARALLEL parameter, which determines the degree of parallelism Oracle7 uses for operations such as generating Oracle indexes.
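For instance, the following sketch requests a degree of parallelism of 4 when Oracle7 creates the KID and KIK Oracle indexes, and places the SQR table and its Oracle index in separate tablespaces. The tablespace names CTX_SQR_DATA and CTX_SQR_IDX and the preference name PAR_ENGINE are hypothetical; the tablespaces would have to exist already.

```sql
begin
  -- Degree of parallelism used when Oracle7 creates the KID and KIK indexes
  ctx_ddl.set_attribute     ('KID_OTHER_PARMS', ' parallel 4');
  ctx_ddl.set_attribute     ('KIK_OTHER_PARMS', ' parallel 4');
  -- Hypothetical tablespaces for the SQR table and the Oracle index on it
  ctx_ddl.set_attribute     ('SQR_TABLESPACE', 'CTX_SQR_DATA');
  ctx_ddl.set_attribute     ('SRI_TABLESPACE', 'CTX_SQR_IDX');
  ctx_ddl.create_preference ('PAR_ENGINE', 'Parallel KID/KIK index build',
                             'GENERIC ENGINE');
end;
```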

For more information about the storage clauses and other parameters that you can specify for a database table/index, see Oracle7 Server SQL Reference.

For a description of the ConText index tables, see "ConText Index Tables" or "SQR Table" in "ConText Index Tables and Indexes (Appendix C)."

For more information about SQEs, see Oracle ConText Option Application Developer's Guide.

Engine Example

The following example creates a preference named DOC_ENGINE for the GENERIC ENGINE Tile:

begin
  -- Memory, in bytes, allocated for indexing
  ctx_ddl.set_attribute ('INDEX_MEMORY',    30000000 );
  -- Tablespace, STORAGE clause, and other parameters for the I1T index table
  ctx_ddl.set_attribute ('I1T_TABLESPACE',  'DOCUMENTS' );
  ctx_ddl.set_attribute ('I1T_STORAGE',     ' initial 10M next 2M maxextents 10');
  ctx_ddl.set_attribute ('I1T_OTHER_PARMS', ' pctfree 20');
  -- Degree of parallelism for creating the I1I Oracle index
  ctx_ddl.set_attribute ('I1I_OTHER_PARMS', ' parallel 2');
  ctx_ddl.create_preference ('DOC_ENGINE', 'Test case', 'GENERIC ENGINE');
end;

Wordlist

The Wordlist category contains the following Tiles:

Tiles               Attributes         Attribute Values
GENERIC WORD LIST   STCLAUSE           STORAGE_clause for wordlist table
                    INSTCLAUSE         STORAGE_clause for Oracle index on wordlist table
                    SOUNDEX_AT_INDEX   0 (disabled)
                                       1 (enabled)
                    STEMMER            1 (English)
                                       2 (English -- derivational)
                                       3 (Dutch)
                                       4 (French)
                                       5 (German)
                                       6 (Italian)
                                       7 (Spanish)
                    FUZZY_MATCH        1 (English and other Western European languages)
                                       2 (Japanese)
                                       3 (Korean)
                                       4 (Chinese)
Table 9-5. Wordlist Tiles



GENERIC WORD LIST Tile Attribute(s)

The STCLAUSE attribute specifies the STORAGE clause used to create the wordlist table.

The INSTCLAUSE attribute specifies the STORAGE clause used to create the Oracle index for the wordlist table.

The SOUNDEX_AT_INDEX attribute specifies whether ConText Option generates Soundex word mappings and stores them in the wordlist table during text indexing. If Soundex word mappings are not generated and stored in the wordlist table during indexing, queries that use Soundex will not be expanded.

The STEMMER attribute specifies the stemmer used for word stemming in text queries. For all the supported languages, the stemmers return standard inflected forms of a word, such as the plural form (e.g. department --> departments). For English, an additional stemmer is provided which returns standard inflected forms and derived forms (e.g. department --> departments, departmentalize).

The default for STEMMER is 1 (inflectional English).

The FUZZY_MATCH attribute specifies which fuzzy matching routines are used for the column. Fuzzy matching is currently supported only for English, Japanese, and, to a lesser extent, the Western European languages.

Note: The attribute values for Chinese and Korean are dummy attribute values that prevent the English and Japanese fuzzy matching routines from being used on Chinese and Korean text.

The default for FUZZY_MATCH is 1.
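A Wordlist preference for a column of French text might select the French stemmer (4) and the Western European fuzzy matching routines (1), as in the following sketch. The preference name FRENCH_WORDLIST is hypothetical, and the Tile name is written as it appears in Table 9-5.

```sql
begin
  -- 4 = French stemmer
  ctx_ddl.set_attribute     ('STEMMER', '4');
  -- 1 = fuzzy matching for English and other Western European languages
  ctx_ddl.set_attribute     ('FUZZY_MATCH', '1');
  ctx_ddl.create_preference ('FRENCH_WORDLIST', 'French stemming',
                             'GENERIC WORD LIST');
end;
```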

Wordlist Example

The following example creates a preference named SOUNDEX_YES for the GENERIC WORD LIST Tile:

begin
  -- 1 = generate Soundex word mappings during indexing
  ctx_ddl.set_attribute     ('SOUNDEX_AT_INDEX', '1');
  ctx_ddl.create_preference ('SOUNDEX_YES',
                             'Will build the soundex mapping during indexing',
                             'GENERIC WORD LIST');
end;

Stoplist

The Stoplist category contains the following Tiles:

Tiles Attributes Attribute Values
GENERIC STOP LIST STOP_WORD stop_word, sequence
Table 9-6. Stoplist Tiles



GENERIC STOP LIST Tile Attribute(s)

The STOP_WORD attribute has two values that must be specified: stop_word and sequence.

Sequence is a value from 1 to 4095 and is used in a text index to record the stop words that precede and follow an indexed term. ConText Option records up to eight preceding stop words and eight following stop words for each indexed term. This enables text queries for phrases which contain stop words.

For example, consider the sentence "he is at the top of the class" where at, the, top, and of are stop words. The sequence for each of the stop words is recorded as part of the text index entry for the term class, which allows users to query for the phrase "top of the class."

Stoplist Example

The following example creates a preference named MINI_STOP_LIST for the GENERIC STOP LIST Tile:

begin
ctx_ddl.set_attribute    ('STOP_WORD', 'A',   1);    
ctx_ddl.set_attribute    ('STOP_WORD', 'AND', 2);    
ctx_ddl.set_attribute    ('STOP_WORD', 'THE', 3);    
ctx_ddl.create_preference('MINI_STOP_LIST', 'Small', 'GENERIC STOP LIST' );
end;

Reader (Text Loading)

The Reader category contains the following Tiles:

Tiles Attributes Attribute Values
DIRECTORY READER DIRECTORIES directory_name

DIRECTORY READER Tile Attribute(s)

The DIRECTORIES attribute specifies the directory that the ConText server with the Loader personality scans when looking for new files to load into a column in a table or view.
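Because the predefined DEFAULT_READER preference is not intended for creating sources, you would typically create your own Reader preference. A minimal sketch follows; the directory /private/loadfiles and the preference name MY_READER are hypothetical.

```sql
begin
  -- Directory the Loader server scans for new files (hypothetical path)
  ctx_ddl.set_attribute     ('DIRECTORIES', '/private/loadfiles');
  ctx_ddl.create_preference ('MY_READER', 'Files to load from loadfiles',
                             'DIRECTORY READER');
end;
```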

Translator (Text Loading)

The Translator category contains the following Tiles:

Tiles Attributes Attribute Values
NULL TRANSLATOR SEPARATE N/A
USER TRANSLATOR COMMAND translator_executable

NULL TRANSLATOR Tile Attribute(s)

The SEPARATE attribute specifies that the files to be loaded by Loader servers do not contain the actual text of the documents to be loaded, but, rather, contain pointers to separate files where the text of the documents is stored.

For more information about how the separate option works for loading text, see "ctxload Utility" in "Executables and Utilities (Chapter 8)."

USER TRANSLATOR Tile Attribute(s)

The COMMAND attribute specifies the name of the executable used to translate a load file into the format required by ctxload.

Note: The specified translator executable must be stored in the bin subdirectory in the ctx directory in your Oracle home directory.

For example, in a UNIX-based operating system, all translator executables must be stored in $ORACLE_HOME/ctx/bin.
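A USER TRANSLATOR preference follows the same pattern as the other preferences in this chapter. In this sketch, the executable name mytranslator and the preference name MY_TRANSLATOR are hypothetical.

```sql
begin
  -- Hypothetical translator executable stored in $ORACLE_HOME/ctx/bin that
  -- converts load files into the format required by ctxload
  ctx_ddl.set_attribute     ('COMMAND', 'mytranslator');
  ctx_ddl.create_preference ('MY_TRANSLATOR', 'Convert load files for ctxload',
                             'USER TRANSLATOR');
end;
```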

Engine (Text Loading)

The Engine category contains the following Tiles:

Tiles Attributes Attribute Values
GENERIC LOADER ** none ** N/A
The GENERIC LOADER does not have any attributes.

Predefined and Default Preferences

ConText Option provides the following predefined preferences, grouped according to preference category.

The default preferences for the Data Store, Filter, Lexer, Wordlist, and Stoplist categories are identified in notes following their descriptions.

Data Store

Filter

Lexer

Engine

Wordlist

Stoplist

Reader (Text Loading)

Translator (Text Loading)

Engine (Text Loading)

Data Store

The following section provides descriptions of the predefined preferences for the Data Store category.

DEFAULT_DIRECT_DATASTORE

The DEFAULT_DIRECT_DATASTORE preference calls the DIRECT Tile which is used to indicate that text is stored directly in the text column of a text table.

DEFAULT_DIRECT_DATASTORE does not use any Tile attributes because the DIRECT Tile does not have attributes.

Note: DEFAULT_DIRECT_DATASTORE is the default preference for the Data Store preference category.

MD_TEXT

The MD_TEXT preference calls the MASTER DETAIL Tile which is used to indicate text is stored in a master detail table.

MD_TEXT uses the Tile attribute BINARY and a value of NO to indicate that the text in the table is stored as ASCII text.

MD_BINARY

The MD_BINARY preference calls the MASTER DETAIL Tile which is used to indicate text is stored in a master detail table.

MD_BINARY uses the BINARY Tile attribute and a value of YES to indicate that the text in the table is stored in binary format.

DEFAULT_OSFILE

The DEFAULT_OSFILE preference calls the OSFILE Tile which is used to indicate that text is stored as files in a file system.

DEFAULT_OSFILE uses the PATH Tile attribute and a hard-coded set of dummy directory paths to indicate the directories in which the text files are located.

The hard-coded paths, specified as a colon-delimited list, are /oracle/data, /oracle/data2, and /oracle/data3.

Note: The DEFAULT_OSFILE preference requires modification to reflect the actual paths for your text files before the preference can be used in a policy.

DEFAULT_URL

The DEFAULT_URL preference calls the URL Tile which is used to indicate that text is stored as URLs.

DEFAULT_URL uses the default values for all of the URL Tile attributes (see Table 9-1).

Filter

The following section provides descriptions of the predefined preferences for the Filter category.

DEFAULT_NULL_FILTER

The DEFAULT_NULL_FILTER preference calls the FILTER NOP Tile which indicates that the text column in a text table contains plain, unformatted (ASCII) text and does not require filtering for indexing and highlighting.

DEFAULT_NULL_FILTER does not use any Tile attributes because the FILTER NOP Tile does not have attributes.

Note: DEFAULT_NULL_FILTER is the default preference for the Filter preference category.

AUTOB

The AUTOB preference calls the BLASTER FILTER Tile which specifies the MasterSoft filter used to extract text from formatted documents in a text column.

AUTOB uses the FORMAT Tile attribute and a value of 997 to indicate that ConText Option uses the autorecognize filter to extract text. It can be used to filter text in a column that contains documents in multiple formats.

WW6B

The WW6B preference calls the BLASTER FILTER Tile and specifies that the Microsoft Word for Windows 6 filter is used to extract text from Word for Windows 6 documents in a text column.

WW6B uses the FORMAT Tile attribute and a value of 11 to indicate that ConText Option uses the Word for Windows 6 filter to extract text. It can be used in a column that contains only Word for Windows 6-formatted documents.

HTML_FILTER

The HTML_FILTER preference calls the HTML FILTER Tile and can be used to filter documents in a column that contains only HTML-formatted documents.

Lexer

The following section provides descriptions of the predefined preferences for the Lexer category.

DEFAULT_LEXER

The predefined DEFAULT_LEXER preference calls the BASIC LEXER Tile, which indicates the lexer settings used to identify word and sentence boundaries for text indexing and text queries.

DEFAULT_LEXER uses the following Tile attributes and values to indicate the lexer settings:

Attributes Values
PUNCTUATIONS .?!
PRINTJOINS NULL (indicates no characters defined as printjoins for the BASIC LEXER; instead, printjoins determined by NLS initialization parameters)
SKIPJOINS NULL (indicates no characters defined as skipjoins for the BASIC LEXER; instead, skipjoins determined by NLS initialization parameters)
CONTINUATION -\
Table 9-7. Attributes for DEFAULT_LEXER Preference



Note: DEFAULT_LEXER is the default preference for the Lexer preference category.

THEME_LEXER

The predefined THEME_LEXER preference calls the THEME LEXER Tile, which indicates the preference can be used in a column policy to create theme indexes for a column.

The THEME_LEXER preference does not set any attributes because the THEME LEXER Tile does not have any attributes.

VGRAM_JAPANESE_1 and VGRAM_JAPANESE_2

The VGRAM_JAPANESE preferences call the JAPANESE V-GRAM LEXER Tile which indicates the preferences can be used for parsing Japanese text.

The 1 or 2 indicates that the preference uses either method 1 or 2 for identifying tokens in Japanese text.

VGRAM_CHINESE_1 and VGRAM_CHINESE_2

The VGRAM_CHINESE preferences call the CHINESE V-GRAM LEXER Tile which indicates the preferences can be used for parsing Chinese text.

The 1 or 2 indicates that the preference uses either method 1 or 2 for identifying tokens in Chinese text.

KOREAN

The KOREAN preference calls the KOREAN LEXER Tile and can be used for parsing Korean text. It does not set any attributes because the KOREAN LEXER Tile has no attributes.

Engine

The following section provides descriptions of the predefined preferences for the Engine category.

DEFAULT_INDEX

The DEFAULT_INDEX preference contains the GENERIC ENGINE Tile which is used to specify the amount of memory reserved for indexing and the storage clauses for the indexes created by the GENERIC ENGINE Tile.

DEFAULT_INDEX uses the INDEX_MEMORY Tile attribute and the following calculation to specify the amount of memory allocated for indexing:

	12*power(2,20)
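Evaluated, this expression gives the indexing memory listed for DEFAULT_INDEX in the DEFAULT_POLICY template:

```latex
12 \times 2^{20} = 12 \times 1\,048\,576 = 12\,582\,912 \text{ bytes} \quad (12\ \text{MB})
```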

Wordlist

The following section provides descriptions of the predefined preferences for the Wordlist category.

NO_SOUNDEX

The NO_SOUNDEX preference contains the GENERIC WORD LIST Tile which specifies whether Soundex word mappings are generated during text indexing. Soundex can be used in text queries to expand the query to include words that sound similar to the query terms.

NO_SOUNDEX uses the SOUNDEX_AT_INDEX Tile attribute and a value of 0 to indicate that ConText Option does not generate Soundex word mappings during text indexing.

Note: NO_SOUNDEX is the default preference for the Wordlist preference category.

SOUNDEX

The SOUNDEX preference contains the GENERIC WORD LIST Tile which specifies whether Soundex word mappings are generated during text indexing. Soundex can be used in text queries to expand the query to include words that sound similar to the query terms.

SOUNDEX uses the SOUNDEX_AT_INDEX Tile attribute and a value of 1 to indicate that ConText Option generates Soundex word mappings during text indexing.

Stoplist

The following section provides descriptions of the predefined preferences for the Stoplist category.

DEFAULT_STOPLIST

The DEFAULT_STOPLIST preference specifies a list of stop words for the GENERIC STOP LIST Tile.

The preference uses the STOP_WORD Tile attribute to list each of the following stop words:

STOPWORD SEQ STOPWORD SEQ STOPWORD SEQ
A 3 HER 45 S 6
ABOUT 34 HIS 44 SO 73
AFTER 63 IF 58 SAYS 41
ALL 62 IN 4 SHE 25
ALSO 50 INC 48 SOME 55
AN 27 INTO 75 SUCH 69
ANY 76 IS 10 THAN 43
AND 5 IT 11 THAT 9
ARE 28 ITS 22 THE 7
AS 14 LAST 56 THEIR 47
AT 13 MORE 38 THERE 67
BE 23 MOST 74 THEY 37
BECAUSE 66 MR 18 THIS 35
BEEN 49 MRS 20 TO 2
BUT 30 MS 21 WAS 26
BY 16 MZ 19 WE 57
CAN 68 NO 71 WERE 52
CO 60 NOT 61 WHEN 65
CORP 53 ONLY 72 WHICH 36
COULD 70 OF 1 WHO 42
FOR 8 ON 12 WILL 31
FROM 17 ONE 40 WITH 15
HAD 51 OR 33 WOULD 39
HAS 29 OTHER 54 UP 46
HAVE 32 OUT 59
HE 24 OVER 64
Note: DEFAULT_STOPLIST is the default preference for the Stoplist preference category.

NO_STOPLIST

The NO_STOPLIST preference contains the GENERIC STOP LIST Tile and specifies that no list of stop words is used during text indexing. All words that ConText Option encounters are stored in the text index.

NO_STOPLIST contains no STOP_WORD attributes to indicate that there are no stopwords used during indexing.

Reader (Text Loading)

The following section provides descriptions of the predefined preferences for the Reader category.

DEFAULT_READER

The DEFAULT_READER preference uses the DIRECTORY READER Tile, which has a predefined directory set for the Tile.

Note: Because it is unknown which directory contains the files you want to load and pathnames are operating-system specific, this preference is provided only as a default and should not be used when creating a source.

Before creating a source, you should create your own Reader preference that specifies the directory where your files to be loaded are located.

Translator (Text Loading)

The following section provides descriptions of the predefined preferences for the Translator category.

DEFAULT_TRANSLATOR

The DEFAULT_TRANSLATOR preference uses the NULL TRANSLATOR Tile, which indicates that no translator is used for loading text from files and that the files are in the format required by ctxload.

Engine (Text Loading)

The following section provides descriptions of the predefined preferences for the Text Loading Engine category.

DEFAULT_LOADER

The DEFAULT_LOADER preference uses the GENERIC LOADER Tile, which indicates that the preference can be used to load text from files in an operating system directory.

Template Policies

The following section provides a brief description of the template policies provided with ConText Option.

The template policies are owned by CTXSYS. A template policy can be specified as the source policy for a policy during creation.

ConText Option provides the following template policies:

DEFAULT_POLICY

The DEFAULT_POLICY policy can be used to create a policy which uses all of the default preferences. DEFAULT_POLICY is the default for SOURCE_POLICY in CTX_DDL.CREATE_POLICY.

Preference Characteristics
DEFAULT_DIRECT_DATASTORE Text stored in database
DEFAULT_NULL_FILTER No filter (text stored in plain, ASCII format)
DEFAULT_LEXER Basic lexer (standard punctuation and continuation characters, no printjoin or skipjoin characters)
DEFAULT_INDEX Indexing memory = 12582912 bytes (12 MB), default storage/other clauses for index
NO_SOUNDEX No Soundex word mappings stored during text indexing
DEFAULT_STOPLIST Stoplist is active, default list of stop words
Table 9 - 8. Preferences for DEFAULT_POLICY Policy (Page 1 of 1)



TEMPLATE_DIRECT

The TEMPLATE_DIRECT policy can be used to create a policy for indexing basic text stored in a LONG or VARCHAR2 text column.

It uses all the default preferences.

TEMPLATE_MD

The TEMPLATE_MD policy can be used to create a policy for indexing plain text stored in the detail column in a master-detail table.

It uses the MD_TEXT predefined preference and all the remaining default preferences.

TEMPLATE_MD_BIN

The TEMPLATE_MD_BIN policy can be used to create a policy for indexing binary text stored in the detail column in a master-detail table.

It uses the MD_BINARY predefined preference and all the remaining default preferences.

TEMPLATE_AUTOB

The TEMPLATE_AUTOB policy can be used to create a policy for a text column that contains documents in multiple formats. The autorecognize Blaster filter is used to automatically identify the format of each document in a column and, if the format is supported by ConText Option, extract the text of the document for indexing.

TEMPLATE_AUTOB uses the AUTOB predefined preference and all the remaining default preferences.

TEMPLATE_WW6B

The TEMPLATE_WW6B policy can be used to create a policy for indexing text formatted for Microsoft Word for Windows 6.

It uses the WW6B predefined preference and all the remaining default preferences.

TEMPLATE_LONGTEXT_STOPLIST_OFF

The TEMPLATE_LONGTEXT_STOPLIST_OFF policy can be used to create a policy that does not use a stopword list during indexing.

It uses the NO_STOPLIST predefined preference and all the remaining default preferences.

TEMPLATE_LONGTEXT_STOPLIST_ON

The TEMPLATE_LONGTEXT_STOPLIST_ON policy can be used to create a policy that uses a stopword list during indexing.

It uses the DEFAULT_STOPLIST predefined preference and all the remaining default preferences.
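Each template policy above is the set of default preferences with at most one preference substituted. The following Python sketch models that substitution as a dictionary update. It is a conceptual aid, not how CTX_DDL resolves source policies internally; the names DEFAULT_PREFERENCES, TEMPLATE_OVERRIDES, and resolve_template are hypothetical, and the category assigned to MD_TEXT, MD_BINARY, AUTOB, and WW6B is inferred from the descriptions in this chapter.

```python
from typing import Dict

# Default preference per category, as listed for DEFAULT_POLICY.
DEFAULT_PREFERENCES: Dict[str, str] = {
    "datastore": "DEFAULT_DIRECT_DATASTORE",
    "filter": "DEFAULT_NULL_FILTER",
    "lexer": "DEFAULT_LEXER",
    "engine": "DEFAULT_INDEX",
    "wordlist": "NO_SOUNDEX",
    "stoplist": "DEFAULT_STOPLIST",
}

# The preference each template policy substitutes for a default.
# Assumption: the category of each named preference is inferred from its
# description, not read from the ConText data dictionary.
TEMPLATE_OVERRIDES: Dict[str, Dict[str, str]] = {
    "DEFAULT_POLICY": {},
    "TEMPLATE_DIRECT": {},
    "TEMPLATE_MD": {"datastore": "MD_TEXT"},
    "TEMPLATE_MD_BIN": {"datastore": "MD_BINARY"},
    "TEMPLATE_AUTOB": {"filter": "AUTOB"},
    "TEMPLATE_WW6B": {"filter": "WW6B"},
    "TEMPLATE_LONGTEXT_STOPLIST_OFF": {"stoplist": "NO_STOPLIST"},
    "TEMPLATE_LONGTEXT_STOPLIST_ON": {"stoplist": "DEFAULT_STOPLIST"},
}


def resolve_template(template: str) -> Dict[str, str]:
    """Return the full set of preferences a template policy implies."""
    preferences = dict(DEFAULT_PREFERENCES)  # start from the defaults
    preferences.update(TEMPLATE_OVERRIDES[template])  # apply the substitution
    return preferences


print(resolve_template("TEMPLATE_MD")["datastore"])  # MD_TEXT
print(resolve_template("TEMPLATE_MD")["filter"])     # DEFAULT_NULL_FILTER
```

Modeling the templates this way makes it clear that TEMPLATE_DIRECT, TEMPLATE_LONGTEXT_STOPLIST_ON, and DEFAULT_POLICY all resolve to the same preferences.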

Supported Formats for Multiple-Format Columns

The following section lists all of the formats that ConText Option supports for columns that use external filters for processing documents in more than one format.

For each format, the format ID is also listed. This is the value that must be specified when creating a Filter preference using the BLASTER FILTER Tile with the EXECUTABLE attribute.

Note: To index documents in any of these formats using external filters, the external filter must exist and the executable for the filter must be specified in a Filter preference using the EXECUTABLE attribute.

Document Format Format ID
Adobe Acrobat (PDF) 57
Ami Pro 1.x - 3.1 19
Ami Pro Graphics SDW Samna Draw 62
ASCII 90
AT&T Crystal Writer 46
AutoCAD (DXF, DXB) 53
CEOwrite 3.0 78
Computer Graphics Metafile (CGM) 79
CorelDraw 2.x and 3.x 59
CTOS DEF 75
DBase IV 1.0; DBase III, III + 37
DCA/FFT - Final Form Text 27
DCA/RFT - Revisable Form Text 0
Digital DX 15
Digital WPS-PLUS 47
EBCDIC 89
Enable 1.1, 2.0, 2.15 11
Encapsulated PostScript Preview; Encapsulated PostScript Bitmap 66
First Choice 3.0 Data Base 13
FrameMaker (MIF) 3.0; FrameMaker (MIF) 3.0 Win 42
Framework III, 1.0, 1.1 22
FullWrite Professional 1.0x 31
GIF (Graphical Interchange Format) 51
Harvard Graphics 87
HP Graphics Language (HPGL) 83
HTML Level 1, 2, 3 91
IBM Writing Assistant 1.0 16
IGES 52
Interleaf 5.2; Interleaf 5.2 - 6.0 32
JPEG (Joint Photographic Experts Group) 58
Legacy 1.x, 2.0 41
Lotus 123 4.x; Lotus 123 3.0; Lotus 123 1A, 2.0, 2.1 20
Lotus Freelance 85
Lotus Manuscript 2.0, 2.1 26
Lotus PIC 67
Macintosh Paint 88
Microsoft Windows Paint 2.x 70
Macintosh QuickDraw (PICT) 64
MacWrite 4.5 - 5.0 29
MacWrite II 1.0 - 1.1 30
Mass 11, Version 8.0 - 8.33 36
MastSoft Graphics (MSG) 49
Micrografx Designer (DRW) 60
MS Access 2.0 39
MS Excel 5.0 - 6.0; MS Excel 4.0; MS Excel 3.0; MS Excel 2.1 21
MS RTF; MS RTF (ANSI Char Set) 17
MS Word for DOS 6.0; MS Word for DOS 5.0, 5.5; MS Word for DOS 4.0; MS Word for DOS 3.0, 3.1 8
MS Word for Mac 5.0, 5.1; MS Word for Mac 4.0; MS Word for Mac 3.0 28
MS Word for Windows 2.0; MS Word for Windows 1.x 18
MS Word for Windows 6.0; MS Word for Mac 6.0 68
MS Works for Windows 3.0 69
MS Write for Windows 3.x 7
MultiMate 4; MultiMate Advantage II; MultiMate Advantage I; MultiMate 3.3 6
Navy DIF (GSA) 35
OfficePower 7; OfficePower 6 44
OfficeWriter 6.0 - 6.2; OfficeWriter 5.0; OfficeWriter 4.0 9
OS/2 Bitmap; Windows Bitmap (BMP); Windows RLE 63
Paradox 3.5, 4.0 38
PC Paintbrush (PCX) 71
PeachText 5000 2.1.2 82
PowerPoint 2, 3, 4 84
PFS:First Choice 3.0; PFS:First Choice 2.0; PFS:First Choice 1.0; PFS:WRITE Ver C; Professional Write 2.0 - 2.2; Professional Write 1.0 12
Quattro Pro DOS; Quattro Pro Windows 45
Q&A 4.0; Q&A Write 1.x, Q&A 3.0 10
Rapid File 1.0 23
RGIP 61
Samna Word IV & IV + 1.0, 2.0 25
Sun Raster Graphics 65
TIFF (Tagged Image File Format) 50
Uniplex V7 - V8 77
Volkswriter 3, 4 74
Wang PC, Version 3 24
Wang WITA 55
Windows Clipboard 72
Windows ICON 73
Windows Metafile (WMF) 48
WiziDraw 86
WiziWord 56
Word For Word Intermediate Communications format (COM) 34
WordPerfect for Windows 6.1; WordPerfect for Windows 6.0; WordPerfect 6.0 1
WordPerfect 5.1 (Mail Merge) 2
WordPerfect for Windows 5.x; WordPerfect 5.1; WordPerfect 5.0 3
WordPerfect Graphics 1 (WPG) 4
WordPerfect Graphics 2 (WPG) 5
WordPerfect 4.2; WordPerfect 4.1 80
WordPerfect Mac 1.0 81
WordPerfect Mac 3.0; WordPerfect Mac 2.1; WordPerfect Mac 2.0 33
WordStar 5.0, 5.5, 6.0, 7.0 40
WordStar 2000, Rel 3.0 14
WriteNow 3.0 54
Xerox - XIF 5.0, 6.0 43
XyWrite IV; XyWrite III Plus 76
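Because one format ID often covers several products or versions, it can help to keep this table in a lookup structure when scripting the creation of Filter preferences. The following Python sketch holds a small, hand-copied subset of the rows above (not the full list) and shows the lookup in both directions. The names FORMAT_IDS, format_id, and formats_sharing are hypothetical and are not part of ConText Option.

```python
from typing import Dict, List

# A few rows transcribed from the table above; not the complete list.
FORMAT_IDS: Dict[str, int] = {
    "Adobe Acrobat (PDF)": 57,
    "ASCII": 90,
    "HTML Level 1, 2, 3": 91,
    "MS Excel 4.0": 21,
    "MS Excel 5.0 - 6.0": 21,
    "MS Word for Mac 6.0": 68,
    "MS Word for Windows 6.0": 68,
    "WordPerfect 6.0": 1,
}


def format_id(document_format: str) -> int:
    """Return the format ID for a document format (raises KeyError if absent)."""
    return FORMAT_IDS[document_format]


def formats_sharing(fid: int) -> List[str]:
    """List the transcribed formats that share a single format ID."""
    return sorted(name for name, value in FORMAT_IDS.items() if value == fid)


print(format_id("Adobe Acrobat (PDF)"))  # 57
print(formats_sharing(68))  # ['MS Word for Mac 6.0', 'MS Word for Windows 6.0']
```

Raising KeyError for an unlisted format is deliberate: guessing an ID for a format that is not in the table would select the wrong external filter.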




Oracle
Copyright © 1996 Oracle Corporation.
All Rights Reserved.