Oracle ConText Option Administrator's Guide
The topics discussed in this chapter are:
The Tiles are grouped by preference category:
Tiles | Attributes | Attribute Values |
DIRECT | ** none ** | N/A |
MASTER DETAIL | BINARY | 0 (plain text) |
| | 1 (binary text) |
OSFILE | PATH | path1:path2:...:pathn |
URL | TIMEOUT | seconds (0 to 3600, default 30) |
| MAXTHREADS | thread_num (0 to 1024, default 8) |
| MAXURLS | buffer_length (1 to 2^31-1, default 256) |
| URLSIZE | URL_length (32 to 65535, default 256) |
| MAXDOCSIZE | doc_size (256 to 2^31-1, default 2000000) |
| HTTP_PROXY | host_name |
| NO_PROXY | string (up to 16 strings, separated by commas) |
Text in binary format does not use newline characters to indicate the end of the line. Plain text uses newline characters at the end of each line to indicate the end of the line.
Multiple paths can be specified for the PATH attribute, with the paths separated by a colon (:). File names are stored in the text column in the text table. If you do not use the PATH attribute to specify a path for external files, ConText Option requires the path to be included in the file names stored in the text column.
Note: If text is stored in external files rather than in a database, the files must be accessible from the host machine on which the ConText server is running. This can be accomplished by storing the files in the file system for the host machine or by mounting the file system where the files are stored to the host machine.
Note: Since timeout is at the network operation level, the total timeout may be longer than the time specified for TIMEOUT.
The MAXTHREADS attribute specifies the maximum number of threads that can be running at the same time. The valid range for MAXTHREADS is 1 to 1024 and the default is 8.
Note: The upper range of MAXTHREADS corresponds to the number of file descriptors that the operating system can process at one time. If the number of files your operating system can process at one time is less than the value you set, you may receive an invalid socket error.
The MAXURLS attribute specifies the maximum number of rows that the internal buffer can hold for HTML documents (rows) retrieved from the text table. The valid range for MAXURLS is 1 to (2^31-1) and the default is 256.
The URLSIZE attribute specifies the maximum length, in bytes, that the URL data store supports for URLs stored in the database. If a URL is over the maximum set, an error is returned. The valid range for URLSIZE is 32 to 65535 and the default is 256.
The MAXDOCSIZE attribute specifies the maximum size, in bytes, that the URL data store supports for accessing HTML documents whose URLs are stored in the database. The valid range for MAXDOCSIZE is 1 to (2^31-1) and the default is 2000000 (2 MB).
The HTTP_PROXY attribute specifies the fully-qualified name of the host machine that serves as the proxy (gateway) for the machine on which ConText Option is installed.
The NO_PROXY attribute specifies the strings (up to sixteen, separated by commas) that, when encountered in a host name, cause the URL data store to ignore the machine as a proxy machine.
For example, if the string 'us.oracle.com, uk.oracle.com' is entered for NO_PROXY, any machines that contain either of these domains in their host names are ignored as proxy machines.
begin
  ctx_ddl.set_attribute ('PATH', '/private/mydocs');
  ctx_ddl.create_preference ('DOC_PREF', 'Path for my documents', 'OSFILE');
end;
Note: This example illustrates usage of OSFILE for documents stored on a machine running a UNIX-based operating system.
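The URL Tile attributes described above can be set in the same way. The following is a minimal sketch, not a definitive configuration: it assumes that the Tile name passed to create_preference is 'URL' as listed in the table, and that attribute values are passed as strings as in the OSFILE example. The preference name URL_PREF and the proxy host www-proxy.example.com are hypothetical.

```sql
begin
  -- Values are illustrative; each lies within the range documented for the attribute.
  ctx_ddl.set_attribute ('TIMEOUT', '60');      -- seconds per network operation
  ctx_ddl.set_attribute ('MAXTHREADS', '16');   -- must not exceed available file descriptors
  ctx_ddl.set_attribute ('HTTP_PROXY', 'www-proxy.example.com');       -- hypothetical host
  ctx_ddl.set_attribute ('NO_PROXY', 'us.oracle.com, uk.oracle.com');  -- strings from the text above
  ctx_ddl.create_preference ('URL_PREF', 'URLs fetched through a proxy', 'URL');
end;
```

Attributes that are not set (MAXURLS, URLSIZE, MAXDOCSIZE) would presumably keep the defaults listed in the table.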
Tiles | Attributes | Attribute Values | |
FILTER NOP | ** none ** | N/A | |
HTML FILTER | CODE_CONVERSION | 0 (conversion disabled) | |
| | 1 (conversion enabled) | |
USER FILTER | COMMAND | filter_executable | |
BLASTER FILTER | EXECUTABLE | format_id, filter_executable, sequence | |
| FORMAT | 0 or 999 | No filter (ASCII) |
| | 1 or 4 | Word Perfect for Windows 5.x, Word Perfect for DOS 5.0, 5.1 |
| | 5 | Word Perfect for Windows 6.x, Word Perfect for DOS 6.0 |
| | 2 | MS Word for DOS 5.0, 5.5 |
| | 6 | MS Word for Mac 3, 4, 5.x |
| | 7 | MS Word for Windows 2 |
| | 11 | MS Word for Windows 6.x, 7.0 |
| | 8 | AMIPRO for Windows 1, 2, 3 |
| | 9 | Lotus 1-2-3 for Windows 2, 3, 4, 5; Lotus 1-2-3 for DOS 4, 5 |
| | 13 | Xerox XIF for UNIX 5, 6 |
| | 997 | Autorecognize |
Note: If you use the USER FILTER Tile or the EXECUTABLE attribute (BLASTER Tile) to specify external filters for indexing and viewing text, the specified filter executable must be stored in the bin subdirectory in the ctx directory in your Oracle home directory.
For example, in a UNIX-based operating system, all filter executables must be stored in $ORACLE_HOME/ctx/bin.
Code conversion is required for Japanese HTML documents if the documents use more than one of the three character sets supported for HTML text in Japanese. If code conversion is enabled, all Japanese HTML documents are converted to a single, common character set before indexing.
The default for CODE_CONVERSION is 0 (not enabled).
Note: For multiple-format columns that use Autorecognize (BLASTER Tile, FORMAT attribute = 997) or use external filters (BLASTER Tile, EXECUTABLE attribute) for all formats except HTML, code conversion is always enabled.
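As an illustration of the CODE_CONVERSION attribute, the following sketch creates a Filter preference for Japanese HTML documents with code conversion enabled. It assumes the Tile name is passed as 'HTML FILTER', exactly as it appears in the table; the preference name JP_HTML is hypothetical.

```sql
begin
  -- 1 enables conversion of Japanese HTML documents to a single character set before indexing.
  ctx_ddl.set_attribute ('CODE_CONVERSION', '1');
  ctx_ddl.create_preference ('JP_HTML', 'Japanese HTML with code conversion', 'HTML FILTER');
end;
```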
The EXECUTABLE attribute specifies the external filters that are used to filter text stored in a multiple-format text column. It has three values that must be specified: format_id, filter_executable, and sequence.
For a list of the format codes supported by the EXECUTABLE attribute, see "Supported Formats for Multiple-Format Columns" in this chapter.
begin
  ctx_ddl.set_attribute ('FORMAT', '11');
  ctx_ddl.create_preference ('WORD6', 'Microsoft Word docs', 'BLASTER FILTER');
end;
Tiles | Attributes | Attribute Values |
BASIC LEXER | PUNCTUATIONS | character_string |
| PRINTJOINS | character_string |
| SKIPJOINS | character_string |
| NUMJOIN | character_string |
| NUMGROUP | character_string |
| CONTINUATION | character_string |
| BASE_LETTER | 0 (Disabled) |
| | 1 (Enabled) |
THEME LEXER | ** none ** | N/A |
JAPANESE V-GRAM LEXER | KANJI_INDEXING | 1 |
| | 2 |
CHINESE V-GRAM LEXER | HANZI_INDEXING | 1 |
| | 2 |
KOREAN LEXER | ** none ** | N/A |
Note: The character_string for each BASIC LEXER Tile attribute can contain multiple characters. Each character in the string serves as a punctuation, join, or continuation character.
For example, if the string '.?!' is specified for the PUNCTUATIONS attribute, each individual character ('.', '?', '!') in the string is treated by ConText Option as a sentence delimiter.
PRINTJOINS specifies the characters that join words together when they appear between the words with no blank spaces. Words that contain PRINTJOINS characters are stored in the text index exactly as they appear in the text. For example, if you define '-' as a PRINTJOINS character, the word pseudo-intellectual is stored in the text index as pseudo-intellectual.
SKIPJOINS specifies the characters that join words together, but the characters are not stored in the text index. For example, if you define '-' as a SKIPJOINS character, the word pseudo-intellectual is stored in the text index as pseudointellectual.
Note: PRINTJOINS and SKIPJOINS are mutually exclusive. You cannot specify the same characters for both attributes.
Note: The default values for NUMJOIN and NUMGROUP are determined by the NLS initialization parameters that are specified for the database. In general, you do not need to specify a value for either NUMJOIN or NUMGROUP when creating a Lexer preference for the BASIC LEXER Tile.
BASE_LETTER specifies whether characters that have diacritical marks (umlauts, cedillas, acute accents, etc.) are converted to their base form for text indexing and text queries.
A value of 1 for KANJI_INDEXING indicates that the Japanese lexer examines each character individually to determine token boundaries.
A value of 2 for KANJI_INDEXING indicates that the lexer examines characters in pairs to determine token boundaries.
The default is 2.
A value of 1 for HANZI_INDEXING indicates that the Chinese lexer examines each character individually to determine token boundaries.
A value of 2 for HANZI_INDEXING indicates that the lexer examines characters in pairs to determine token boundaries.
The default is 2.
begin
  ctx_ddl.set_attribute ('PRINTJOINS', '-*/');
  ctx_ddl.create_preference ('DOC_LINK', 'Dash, star, slash', 'BASIC LEXER');
end;
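For contrast with the PRINTJOINS example, the following sketch defines a Lexer preference that treats the hyphen as a SKIPJOINS character, so that pseudo-intellectual would be indexed as pseudointellectual. It also sets PUNCTUATIONS and BASE_LETTER using values described in this section. The preference name DOC_SKIP is hypothetical, and passing '1' as a string for BASE_LETTER is an assumption based on the other examples.

```sql
begin
  ctx_ddl.set_attribute ('PUNCTUATIONS', '.?!');  -- each character is a sentence delimiter
  ctx_ddl.set_attribute ('SKIPJOINS', '-');       -- joins words; not stored in the text index
  ctx_ddl.set_attribute ('BASE_LETTER', '1');     -- convert characters with diacritical marks to base form
  ctx_ddl.create_preference ('DOC_SKIP', 'Hyphen as skipjoin', 'BASIC LEXER');
end;
```

Because PRINTJOINS and SKIPJOINS are mutually exclusive, the hyphen must not also appear in a PRINTJOINS value for the same preference.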
Note: When specifying a value for INDEX_MEMORY in a preference, you should specify as much real (not virtual) memory as is available on the machine which is running the ConText server that will be creating indexes.
If you plan to use parallel indexing, the memory specified should be the amount of available memory divided evenly among the number of ConText servers that will perform the indexing in parallel.
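The division described above can be sketched as follows. The figures are assumptions chosen for illustration: 48 MB of available real memory and four ConText servers indexing in parallel, which gives 12 MB (12,582,912 bytes) per server. The preference name PARALLEL_ENGINE is hypothetical.

```sql
declare
  available_bytes constant number := 48 * power(2, 20);  -- assumed available real memory
  server_count    constant number := 4;                  -- assumed number of ConText servers
begin
  -- Give each server an equal share of the available memory.
  ctx_ddl.set_attribute ('INDEX_MEMORY', trunc(available_bytes / server_count));
  ctx_ddl.create_preference ('PARALLEL_ENGINE', 'Index memory split across servers', 'GENERIC ENGINE');
end;
```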
OPTIMIZE_DEFAULT specifies the type of optimization used when CTX_DDL.OPTIMIZE_INDEX is called without an optimization type.
If no value is specified for OPTIMIZE_DEFAULT, the default is DEFRAGMENT_TO_TWO_TABLE.
I1T_TABLESPACE, KTB_TABLESPACE, and LST_TABLESPACE specify the tablespaces to be used for the index tables created during indexing.
SQR_TABLESPACE specifies the tablespace to be used for the stored query expression result (SQR) table that is created, but not populated during indexing.
I1I_TABLESPACE, KID_TABLESPACE, KIK_TABLESPACE, and LIX_TABLESPACE specify the tablespaces to be used for storing the Oracle indexes generated for each index table during indexing.
SRI_TABLESPACE specifies the tablespace to be used for storing the Oracle index generated for each SQR table.
Note: For each TABLESPACE attribute that is not specified when creating an Engine preference, the text table owner's default tablespace is used for storing the ConText index objects (tables and indexes).
I1T_STORAGE, KTB_STORAGE, and LST_STORAGE specify the STORAGE clauses used for the index tables created during indexing.
SQR_STORAGE specifies the STORAGE clause used for the stored query expression result (SQR) table created during indexing.
I1I_STORAGE, KID_STORAGE, KIK_STORAGE, and LIX_STORAGE specify the STORAGE clauses used for the Oracle indexes generated for each index table.
SRI_STORAGE specifies the STORAGE clause used for the Oracle index generated for each SQR table.
I1T_OTHER_PARMS, KTB_OTHER_PARMS, and LST_OTHER_PARMS specify any additional parameters for the index tables created during indexing.
SQR_OTHER_PARMS specifies any additional parameters for the stored query expression result (SQR) table created during indexing.
I1I_OTHER_PARMS, KID_OTHER_PARMS, KIK_OTHER_PARMS, and LIX_OTHER_PARMS specify any additional parameters for the Oracle indexes generated for each index table.
SRI_OTHER_PARMS specifies any additional parameters for the Oracle index generated for each SQR table.
Note: In particular, the OTHER_PARMS attributes are used to specify the PARALLEL parameter, which determines the degree of parallelism used by Oracle7 for operations such as generating Oracle indexes.
For more information about the storage clauses and other parameters that you can specify for a database table/index, see Oracle7 Server SQL Reference.
For a description of the ConText index tables, see "ConText Index Tables" or "SQR Table" in "ConText Index Tables and Indexes (Appendix C)."
For more information about SQEs, see Oracle ConText Option Application Developer's Guide.
begin
  ctx_ddl.set_attribute ('INDEX_MEMORY', 30000000);
  ctx_ddl.set_attribute ('I1T_TABLESPACE', 'DOCUMENTS');
  ctx_ddl.set_attribute ('I1T_STORAGE', ' initial 10M next 2M maxextents 10');
  ctx_ddl.set_attribute ('I1T_OTHER_PARMS', ' pctfree 20');
  ctx_ddl.set_attribute ('I1I_OTHER_PARMS', ' parallel 2');
  ctx_ddl.create_preference ('DOC_ENGINE', 'Test case', 'GENERIC ENGINE');
end;
The SOUNDEX_AT_INDEX attribute specifies whether ConText Option generates Soundex word mappings and stores them in the wordlist table during text indexing. If Soundex word mappings are not generated and stored in the wordlist table during indexing, queries that use Soundex will not be expanded.
The STEMMER attribute specifies the stemmer used for word stemming in text queries. For all the supported languages, the stemmers return standard inflected forms of a word, such as the plural form (e.g. department --> departments). For English, an additional stemmer is provided which returns standard inflected forms and derived forms (e.g. department --> departments, departmentalize).
The default is 1 (inflectional English).
Note: The attribute values for Chinese and Korean are dummy attribute values that prevent the English and Japanese fuzzy matching routines from being used on Chinese and Korean text.
The default for FUZZY_MATCH is 1.
begin
  ctx_ddl.set_attribute ('SOUNDEX_AT_INDEX', '1');
  ctx_ddl.create_preference ('SOUNDEX_YES', 'Will build the soundex mapping during indexing', 'GENERIC WORDLIST');
end;
Tiles | Attributes | Attribute Values |
GENERIC STOP LIST | STOP_WORD | stop_word, sequence |
For example, consider the sentence "he is at the top of the class" where at, the, top, and of are stop words. The sequence for each of the stop words is recorded as part of the text index entry for the term class, which allows users to query for the phrase "top of the class."
begin
  ctx_ddl.set_attribute ('STOP_WORD', 'A', 1);
  ctx_ddl.set_attribute ('STOP_WORD', 'AND', 2);
  ctx_ddl.set_attribute ('STOP_WORD', 'THE', 3);
  ctx_ddl.create_preference ('MINI_STOP_LIST', 'Small', 'GENERIC STOP LIST');
end;
Tiles | Attributes | Attribute Values |
DIRECTORY READER | DIRECTORIES | directory_name |
Tiles | Attributes | Attribute Values |
NULL TRANSLATOR | SEPARATE | N/A |
USER TRANSLATOR | COMMAND | translator_executable |
For more information about how the separate option works for loading text, see "ctxload Utility" in "Executables and Utilities (Chapter 8)."
Note: The specified translator executable must be stored in the bin subdirectory in the ctx directory in your Oracle home directory.
For example, in a UNIX-based operating system, all translator executables must be stored in $ORACLE_HOME/ctx/bin.
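A Translator preference that names a user-supplied executable might be created as in the following sketch. The executable name my_translator and the preference name DOC_TRANSLATOR are hypothetical; the sketch assumes the executable has already been placed in $ORACLE_HOME/ctx/bin and that the COMMAND value is the file name alone.

```sql
begin
  -- my_translator is a hypothetical executable stored in $ORACLE_HOME/ctx/bin.
  ctx_ddl.set_attribute ('COMMAND', 'my_translator');
  ctx_ddl.create_preference ('DOC_TRANSLATOR', 'Site-specific translator', 'USER TRANSLATOR');
end;
```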
Tiles | Attributes | Attribute Values |
GENERIC LOADER | ** none ** | N/A |
In the following list, the default preferences are highlighted and indicated by a double asterisk.
DEFAULT_DIRECT_DATASTORE does not use any Tile attributes because the DIRECT Tile does not have attributes.
Note: DEFAULT_DIRECT_DATASTORE is the default preference for the Data Store preference category.
MD_TEXT uses the Tile attribute BINARY and a value of NO to indicate that the text in the table is stored as ASCII text.
MD_BINARY uses the BINARY Tile attribute and a value of YES to indicate that the text in the table is stored in binary format:
DEFAULT_OSFILE uses the PATH Tile attribute and a hardcoded set of dummy directory paths to indicate the directories in which the text files are located.
The hardcoded paths, delimited by colons in the attribute value, are /oracle/data, /oracle/data2, and /oracle/data3.
Note: The DEFAULT_OSFILE preference requires modification to reflect the actual paths for your text files before the preference can be used in a policy.
DEFAULT_URL uses all of the attribute defaults for the URL Tile:
DEFAULT_NULL_FILTER does not use any Tile attributes because the FILTER NOP Tile does not have attributes.
Note: DEFAULT_NULL_FILTER is the default preference for the Filter preference category.
AUTOB uses the FORMAT Tile attribute and a value of 997 to indicate that ConText Option uses the autorecognize filter to extract text. It can be used to filter text in a column that contains the following document formats:
DEFAULT_LEXER uses the following Tile attributes and values to indicate the lexer settings:
Note: DEFAULT_LEXER is the default preference for the Lexer preference category.
The THEME_LEXER preference does not set any attributes because the THEME LEXER Tile does not have any attributes.
The 1 or 2 indicates that the preference uses either method 1 or 2 for identifying tokens in Japanese text.
The 1 or 2 indicates that the preference uses either method 1 or 2 for identifying tokens in Chinese text.
DEFAULT_INDEX uses the INDEX_MEMORY Tile attribute and the following calculation to specify the amount of memory allocated for indexing:
12*power(2,20)
NO_SOUNDEX uses the SOUNDEX_AT_INDEX Tile attribute and a value of 0 to indicate that ConText Option does not generate Soundex word mappings during text indexing.
Note: NO_SOUNDEX is the default preference for the Wordlist preference category.
SOUNDEX uses the SOUNDEX_AT_INDEX Tile attribute and a value of 1 to indicate that ConText Option generates Soundex word mappings during text indexing.
The preference uses the STOP_WORD Tile attribute to list each of the following stop words:
STOPWORD | SEQ | STOPWORD | SEQ | STOPWORD | SEQ |
A | 3 | HER | 45 | S | 6 |
ABOUT | 34 | HIS | 44 | SO | 73 |
AFTER | 63 | IF | 58 | SAYS | 41 |
ALL | 62 | IN | 4 | SHE | 25 |
ALSO | 50 | INC | 48 | SOME | 55 |
AN | 27 | INTO | 75 | SUCH | 69 |
ANY | 76 | IS | 10 | THAN | 43 |
AND | 5 | IT | 11 | THAT | 9 |
ARE | 28 | ITS | 22 | THE | 7 |
AS | 14 | LAST | 56 | THEIR | 47 |
AT | 13 | MORE | 38 | THERE | 67 |
BE | 23 | MOST | 74 | THEY | 37 |
BECAUSE | 66 | MR | 18 | THIS | 35 |
BEEN | 49 | MRS | 20 | TO | 2 |
BUT | 30 | MS | 21 | WAS | 26 |
BY | 16 | MZ | 19 | WE | 57 |
CAN | 68 | NO | 71 | WERE | 52 |
CO | 60 | NOT | 61 | WHEN | 65 |
CORP | 53 | ONLY | 72 | WHICH | 36 |
COULD | 70 | OF | 1 | WHO | 42 |
FOR | 8 | ON | 12 | WILL | 31 |
FROM | 17 | ONE | 40 | WITH | 15 |
HAD | 51 | OR | 33 | WOULD | 39 |
HAS | 29 | OTHER | 54 | UP | 46 |
HAVE | 32 | OUT | 59 | | |
HE | 24 | OVER | 64 | | |
NO_STOPLIST contains no STOP_WORD attributes to indicate that there are no stopwords used during indexing.
Note: Because it is unknown which directory contains the files you want to load and pathnames are operating-system specific, this preference is provided only as a default and should not be used when creating a source.
Before creating a source, you should create your own Reader preference that specifies the directory where your files to be loaded are located.
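A Reader preference of this kind might look like the following sketch. The directory /private/load and the preference name MY_READER are hypothetical, and the path format assumes a UNIX-based operating system.

```sql
begin
  -- /private/load is a hypothetical directory containing the files to be loaded.
  ctx_ddl.set_attribute ('DIRECTORIES', '/private/load');
  ctx_ddl.create_preference ('MY_READER', 'Directory for files to load', 'DIRECTORY READER');
end;
```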
ConText Option provides the following template policies:
It uses all the default preferences.
It uses the MD_TEXT predefined preference and all the remaining default preferences.
It uses the MD_BINARY predefined preference and all the remaining default preferences.
TEMPLATE_AUTOB uses the AUTOB predefined preference and all the remaining default preferences.
It uses the WW6B predefined preference and all the remaining default preferences.
It uses the NO_STOPLIST predefined preference and all the remaining default preferences.
It uses the DEFAULT_STOPLIST predefined preference and all the remaining default preferences.
For each format, the format ID is also listed. This is the value that must be specified when creating a Filter preference using the BLASTER FILTER Tile with the EXECUTABLE attribute.
Note: To index documents in any of these formats using external filters, the external filter must exist and the executable for the filter must be specified in a Filter preference using the EXECUTABLE attribute.
Document Format | Format ID |
Adobe Acrobat (PDF) | 57 |
Ami Pro 1.x - 3.1 | 19 |
Ami Pro Graphics SDW Samna Draw | 62 |
ASCII | 90 |
AT&T Crystal Writer | 46 |
AutoCAD (DXF, DXB) | 53 |
CEOwrite 3.0 | 78 |
Computer Graphics Metafile (CGM) | 79 |
CorelDraw 2.x and 3.x | 59 |
CTOS DEF | 75 |
DBase IV 1.0; DBase III, III + | 37 |
DCA/FFT - Final Form Text | 27 |
DCA/RFT - Revisable Form Text | 0 |
Digital DX | 15 |
Digital WPS-PLUS | 47 |
EBCDIC | 89 |
Enable 1.1, 2.0, 2.15 | 11 |
Encapsulated PostScript Preview; Encapsulated PostScript Bitmap | 66 |
First Choice 3.0 Data Base | 13 |
FrameMaker (MIF) 3.0; FrameMaker (MIF) 3.0 Win | 42 |
Framework III, 1.0, 1.1 | 22 |
FullWrite Professional 1.0x | 31 |
GIF (Graphical Interchange Format) | 51 |
Harvard Graphics | 87 |
HP Graphics Language (HPGL) | 83 |
HTML Level 1, 2, 3 | 91 |
IBM Writing Assistant 1.0 | 16 |
IGES | 52 |
Interleaf 5.2; Interleaf 5.2 - 6.0 | 32 |
JPEG (Joint Photographic Experts Group) | 58 |
Legacy 1.x, 2.0 | 41 |
Lotus 123 4.x; Lotus 123 3.0; Lotus 123 1A, 2.0, 2.1 | 20 |
Lotus Freelance | 85 |
Lotus Manuscript 2.0, 2.1 | 26 |
Lotus PIC | 67 |
Macintosh Paint | 88 |
Microsoft Windows Paint 2.x | 70 |
Macintosh QuickDraw (PICT) | 64 |
MacWrite 4.5 - 5.0 | 29 |
MacWrite II 1.0 - 1.1 | 30 |
Mass 11, Version 8.0 - 8.33 | 36 |
MastSoft Graphics (MSG) | 49 |
Micrografx Designer (DRW) | 60 |
MS Access 2.0 | 39 |
MS Excel 5.0 - 6.0; MS Excel 4.0; MS Excel 3.0; MS Excel 2.1 | 21 |
MS RTF; MS RTF (ANSI Char Set) | 17 |
MS Word for DOS 6.0; MS Word for DOS 5.0, 5.5; MS Word for DOS 4.0; MS Word for DOS 3.0, 3.1 | 8 |
MS Word for Mac 5.0, 5.1; MS Word for Mac 4.0; MS Word for Mac 3.0 | 28 |
MS Word for Windows 2.0; MS Word for Windows 1.x | 18 |
MS Word for Windows 6.0; MS Word for Mac 6.0 | 68 |
MS Works for Windows 3.0 | 69 |
MS Write for Windows 3.x | 7 |
MultiMate 4; MultiMate Advantage II; MultiMate Advantage I; MultiMate 3.3 | 6 |
Navy DIF (GSA) | 35 |
OfficePower 7; OfficePower 6 | 44 |
OfficeWriter 6.0 - 6.2; OfficeWriter 5.0; OfficeWriter 4.0 | 9 |
OS/2 Bitmap; Windows Bitmap (BMP); Windows RLE | 63 |
Paradox 3.5, 4.0 | 38 |
PC Paintbrush (PCX) | 71 |
PeachText 5000 2.1.2 | 82 |
POWERPOINT 2, 3, 4 | 84 |
PFS:First Choice 3.0; PFS:First Choice 2.0; PFS:First Choice 1.0; PFS:WRITE Ver C; Professional Write 2.0 - 2.2; Professional Write 1.0 | 12 |
Quattro Pro DOS; Quattro Pro Windows | 45 |
Q&A 4.0; Q&A Write 1.x, Q&A 3.0 | 10 |
Rapid File 1.0 | 23 |
RGIP | 61 |
Samna Word IV & IV + 1.0, 2.0 | 25 |
Sun Raster Graphics | 65 |
TIFF (Tagged Image File Format) | 50 |
Uniplex V7 - V8 | 77 |
Volkswriter 3, 4 | 74 |
Wang PC, Version 3 | 24 |
Wang WITA | 55 |
Windows Clipboard | 72 |
Windows ICON | 73 |
Windows Metafile (WMF) | 48 |
WiziDraw | 86 |
WiziWord | 56 |
Word For Word Intermediate Communications format (COM) | 34 |
WordPerfect for Windows 6.1; WordPerfect for Windows 6.0; WordPerfect 6.0 | 1 |
WordPerfect 5.1 (Mail Merge) | 2 |
WordPerfect for Windows 5.x; WordPerfect 5.1; WordPerfect 5.0 | 3 |
WordPerfect Graphics 1 (WPG) | 4 |
WordPerfect Graphics 2 (WPG) | 5 |
WordPerfect 4.2; WordPerfect 4.1 | 80 |
WordPerfect Mac 1.0 | 81 |
WordPerfect Mac 3.0; WordPerfect Mac 2.1; WordPerfect Mac 2.0 | 33 |
WordStar 5.0, 5.5, 6.0, 7.0 | 40 |
WordStar 2000, Rel 3.0 | 14 |
WriteNow 3.0 | 54 |
Xerox - XIF 5.0, 6.0 | 43 |
XyWrite IV; XyWrite III Plus | 76 |
Copyright © 1996 Oracle Corporation. All Rights Reserved.