| Oracle Context Option Application Developer's Guide | Library |
Product |
Contents |
Index |
The topics covered in this chapter are:
A query expression can also call Stored Query Expressions (SQEs) to return stored query results or call PL/SQL functions to return values used in the query.
When a query is executed using any of the methods supported by ConText Option, one of the arguments included in the query is a query expression. ConText Option then returns a list of all the documents that satisfy the search criteria, as well as scores that measure the relevance of the document to the search criteria.
For example, a query for documents containing the phrase peanut butter and jelly returns different results than a query for documents containing the terms peanut butter and jelly.
When you define a policy for a column, ConText Option lets you identify a list of stopwords. When stopwords are encountered in the documents in the column, they are not included as indexed terms in the text index; however, they are recorded.
As a result, stopwords cannot be searched for explicitly in text queries, but can be included as part of a phrase in a query expression.
Attention: Query expressions made up of only stopwords will result in errors. Ensure that at least one of the terms in your query expression is not a stopword.
Stoplists can be created in any language supported by ConText Option. ConText Option provides a default stoplist in English.
Note: Stopwords do not have an affect on the theme indexes generated by ConText Option for your English-language documents.
| Component | Purpose |
| Operators | Define the relationships between the terms in a query expression and specify the output returned by the query |
| Wildcard Characters | Expand query terms using pattern matching |
| Grouping Characters | Group terms and operators in a query expression |
| Stored Query Expressions (SQEs) | Return the results of a query that has been executed and the results stored in an SQE table |
| PL/SQL Functions | Execute a function and use the results in a query expression |
When a text index that has used base-letter conversion is specified in a query, ConText Option converts the term in the query expression to match the base-letter representation before the query is processed. In addition, all expansion and stopword checking for the query is performed on the base-letter terms.
Note: The terms in a thesaural query are not converted to base-letter representation before look-up in the thesaurus is performed. The base-letter conversion takes place after the thesaurus look-up and is performed on all the terms returned by the thesaurus.
For more information about creating an index that supports base-letter conversion, see Oracle ConText Option Administrator's Guide.
SELECT articles FROM texttab
WHERE CONTAINS(textcol, 'wine') > 0;
The following example of a one-step query returns all articles that contain the phrase wine and roses in the TEXTTAB.TEXT_COLUMN column. The query expression consists of the query phrase wine and roses, surrounded by single quotes.
SELECT articles FROM texttab
WHERE CONTAINS(textcol, '{wine and roses}') > 0;For more information about the CONTAINS function used in one-step queries, see "SQL Functions (Chapter 10)."
Operators perform two functions: they define the relationships between the terms in a query expression and they control the output returned by the query.
There are five types of operators:
When query terms are combined, the number of spaces around the logical operator is not significant.
The order of operands is not significant except for the MINUS operator. For example, 'A and B' is equivalent to 'B and A', but 'A minus B' is not the same as 'B minus A'.
Logical operators link query terms together to produce scores that are based on the relationship of the terms to each other. The logical operators combine the scores of their operands up to a maximum value of 100. Operands can be any query terms, as well as other operators.
| Operator | Syntax | Description |
| AND | term&term term and term | Returns documents that contain one term and another. Returns the minimum score of its operands. All query terms must occur; lower score taken |
| OR | term|term term or term | Returns documents that contain one term or another. Returns the maximum score of its operands. At least one term must exist; higher score taken |
| ACCUMULATE | term,term term accum term | Returns documents that contain one term or another. Calculates score by adding its operands. Similar to OR, but if terms co-exist, score is combined |
| EQUIVALENCE | term=term term equiv term | Used to specify acceptable substitutions of words in a search. |
| MINUS | term-term term minus term | Returns the score of the right query term subtracted from the score of the left query term. |
| NOT | term~term term not term | Returns documents that contain one term and not another. |
Attention: If any symbols or word equivalents for the logical operators are to be included as search terms in the query expression rather than used as operators, the symbols/words must be escaped by enclosing them in braces { }. The following examples illustrate how a phrase containing a operator symbol or equivalent can be queried:
'sense {and} sensibility'
'{peter, paul, and mary}'
'{pride & prejudice}'
'batman & robin & penguin'
'jupiter and ganymede'
'cats | dogs'
'california or oregon'
'cat, dog, horse'
'california accum election accum convention'
ACCUMULATE is similar to OR, in the sense that a document satisfies the query expression if any of the terms occur in the document; however, the scoring is different.
OR returns a score based only on the query term that occurs most frequently in a document. ACCUMULATE combines the scores for all the query terms that occur in a document.
'labradors=alsatians are big dogs'
ConText processes the above query faster and more efficiently than the same query written with the accumulate operator. For example, you could write the above query less efficiently and less concisely as follows:
'labradors are big dogs, alsatians are big dogs'
The savings you gain in using the equivalence operator over the accumulate operator is most significant when you have more than one equivalence operator in the query expression. For example, the following query
'labradors=alsatians are big canines=dogs'
is a more efficient, more concise form of:
'labradors are big dogs, alsatians are big dogs' alsatians are big canines' labradors are big canines'
Precedence of Equivalence Operator The equivalence operator has higher precedence that all other operators except the unary operators (FUZZY, SOUNDEX, STEM, and PL/SQL function calls).
'poisons - arsenic'
'swimming minus sharks'
MINUS is similar to AND in the sense that a document satisfies the query expression only if both terms occur in the document; however, the scoring is different.
AND returns a score based on the intersection of the two query terms. MINUS returns a score which is calculated by subtracting the number of occurrences of the second query term from the number of occurrences of the first term.
For example, to obtain the documents that contain the term animals but not dogs, use the following expression:
'animals ~ dogs'
Similarly, to obtain the documents that contain the term transportation but not automobiles or trains, use the following expression:
'transportation not (automobiles or trains)'
Note: The NOT operator does not affect the scoring produced by the other logical operators.
The following table shows how the scores are calculated for six different documents based on the logical operators. The NOT operator is not included in this example because it has no affect on score.
Note: Each occurrence of a query term or phrase scores 10.
The results are:
| Operator | Symbol | Description |
| NEAR | term;term term near term | Returns a higher score if word operands are near each other. |
The score for a document is the highest score generated for all the query terms that occur within the document in proximity to each other. A score of 100 is returned when the query terms are adjacent.
For example, if the query expression is 'ice;cream', the phrase I love ice cream would score 100, while the phrase ice is colder than cream would score lower. If both phrases occurred in a document, the document would be retrieved by ConText Option and the score for the document would be 100.
| Operator | Syntax | Description |
| WEIGHT | term*n | Assigns a relative importance to an operand; 0.1 to 10 can be designated |
| THRESHOLD | term>n | No documents which score below the threshold are returned |
| MAX | term:n | Returns the specified number of documents; for example, :20 means the top 20 documents on the HITLIST are to be returned. A MAX value must be an integer between 1 and 232 - 1 |
| FIRST/NEXT | term#n-n | Returns the specified number of documents that contain the query term(s), beginning with the first n specified documents |
'cat*3 | dog*1'
'relational databases > 75'
'(oracle*3 & sybase*.1) > 50'
'dance:20'
Note: Because the MAX operator acts on a query result set, you must place it at the outermost level of the query; you cannot embed it within the query expression.
In addition, the MAX operator cannot be used with COUNT_QUERY or in-memory queries.
'((dog*2 > 95) & cat) > 75 : 40'
The first component of the expression, (dog*2 > 95), specifies that only documents that have more than 5 references to dogs will be returned (because each reference scores 10 and the weight of each reference is doubled).
The second component, & cat, specifies that the returned documents must also have references to cats.
The third component, ) > 75, specifies that the lower score of dogs and cats must be at least 75 (because the AND operator returns the lower score of the two operands). Therefore, since all documents referring to dogs must have scored at least 95, the documents meeting the AND condition must have at least 8 references to cats (each reference scores 10 and the total must be greater than 75).
Note: The FIRST/NEXT operator cannot be used with COUNT_QUERY or in-memory queries.
The FIRST/NEXT operator requires a range as a qualifier and returns all documents that fall within the specified range:
query_term#start_range-end_range
Note: The order of the returned documents is not based on score or textkey. ConText Option returns the documents based on the order in which it encounters the documents in the queried text column.
For example, to return the first 10 documents encountered by ConText Option that contain the term dog, use the following expression:
'dog#1-10'
You could then return the next 10 documents using the following expression:
'dog#11-20'
Note: Because the FIRST/NEXT operator acts on a query result set, you must place it at the outermost level of the query; you cannot embed it within the query expression.
'cat:50#1-10'
Note: Placing the MAX operator inside the FIRST/NEXT operator as such is the only instance in which you can embed the MAX operator in a query expression.
| Operator | Syntax | Description |
| STEM | $term | Expands a query to include all terms having the same stem or root word as the specified term. |
| SOUNDEX | !term | Expands a query to include all terms that sound the same as the specified term (English-langugage text only). |
| FUZZY | ?term | Expands a query to include all terms with similar spellings as the specified term (English-langugage text only). |
The expansion operators are unary operators. They may be used in combination with each other and with any other operators described in this chapter. In addition, searches can be broadened by performing an expansion on an expansion.
The methods used by the expansion operators to perform stemming, fuzzy matching, and soundex matching for a text column are determined by the Wordlist preference in the policy for the column.
For more information about setting up preferences and policies, see Oracle ConText Option Administrator's Guide.
| Input: | Expands to: |
| $scream | scream screaming screamed |
| $distinguish | distinguish distinguished distinguishes |
| $guitars | guitars guitar |
| $commit | commit committed |
| $cat | cat cats |
| $sing | sang sung sing |
The ConText Option stemmer, licensed from Xerox Corporation's XSoft Division, supports the following languages: English, French, Spanish, Italian, German, and Dutch.
Note: If STEM returns a stopword, the stopword is not inlcuded in the query or highlighted by CTX_QUERY.HIGHLIGHT.
Soundex in ConText Option uses the same logic as the SOUNDEX function in SQL to search for words that have a similar sound. It returns all words in a text column that have the same SOUNDEX value.
The following example illustrates the results that could be returned for a one-step query that uses SOUNDEX:
SELECT ID, COMMENT FROM EMP_RESUME
WHERE CONTAINS (COMMENT, '!SMYTHE') > 0
ID COMMENT -- ------------
23 Smith is a hard worker who....
Note: SOUNDEX works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectiveness, for languages that use an 8-bit character set, such as many Western European languages.
For more information about the SOUNDEX function in SQL, see Oracle7 Server SQL Reference.
Unlike the STEM expansion, the number of words generated by a FUZZY search depends on what is in the text index; results can vary significantly according to the contents of the database index.
For example:
| Input: | Expands to: |
| ?cat | cat cats calc case |
| ?feline | feline defined filtering |
| ?apply | apply apple applied April |
| ?read | lead real |
Note: FUZZY works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectivemess, for languages that use an 8-bit character set, such as many Western European languages. Also, the Japanese lexer provides limited fuzzy matching.
In addition, if FUZZY returns a stopword, the stopword is not inlcuded in the query or highlighted by CTX_QUERY.HIGHLIGHT.
Penetration applies the expansion operators to each term within an explicit expression (i.e., an expression delimited by parentheses or braces). Any expansion operators outside an expression delimited by parentheses ( ) or braces { } is applied to each word or phrase inside the expression.
For example:
| Query Before Penetration | Query After Penetration |
| ?(dog, cat, mouse) | ?dog , ?cat , ?mouse |
| ?(dog,!(cat & mouse)) | ?dog , (!?cat & !?mouse) |
| ?((cat=feline) meows) | (?cat = ?feline) ?meows |
In the first example, a FUZZY expansion is performed on each term.
In the second example, a FUZZY expansion is performed on each term and a SOUNDEX expansion is performed only on the terms cat and mouse because cat and mouse are enclosed in a separate set of parentheses.
Note: Expansion operators do not penetrate expressions delimited by brackets [ ].
There are eight kinds of thesaurus operators, corresponding to the eight types of relationships that can be defined in an ISO2788 standard thesaurus.
| Operator | Syntax | Description |
| SYNONYM | SYN(term[,thes]) | Expands a query to include all the terms defined in the thesaurus as synonyms for the specified word |
| PREFERRED TERM | PT(term[,thes]) | Replaces the specified word in a query with the term defined in the thesaurus as the preferred term for the specified word. |
| RELATED TERM | RT(term[,thes]) | Expands a query to include all the terms defined in the thesaurus as a related term for the specified word. |
| TOP TERM | TT(term[,thes]) | Replaces the specified word in a query with the term defined in the thesaurus as the top term in the standard hierarchy (BT, NT) for the specified word. |
| BROADER TERM | BT(term[,level[,thes]]) | Expands a query to include the term defined in the thesaurus as a broader term for the specified word |
| NARROWER TERM | NT(term[,level[,thes]]) | Expands a query to include all the lower level terms defined in the thesaurus as narrower terms for the specified word |
| BROADER GENERIC TERM | BTG(term[,level[,thes]]) | Expands a query to include all terms defined in the thesaurus as a broader generic terms for the specified word. |
| NARROWER GENERIC TERM | NTG(term[,level[,thes]]) | Expands a query to include all the lower level terms defined in the thesaurus as narrower generic term for the specified word |
| BROADER PARTITIVE TERM | BTP(term[,level[,thes]]) | Expands a query to include all the terms defined in the thesaurus as broader partitive terms for the specified word. |
| NARROWER PARTITIVE TERM | NTP(term[,level[,thes]]) | Expands a query to include all the lower level terms defined in the thesaurus as narrower partitive term for the specified word |
Internally, ConText Option processes the expansion by bracketing each individual term returned by the expansion, then the terms are accumulated together using the ACCUMULATE operator.
For example, if bird, birdy, and avian are all synonyms:
SYN(bird) is expanded to {bird},{avian},{birdy}
If a term in a thesaural query does not have corresponding entries in the specified thesaurus, no expansion is produced and the term itself is used in the query.
For more information about thesaural relationships and creating thesauri, see Oracle ConText Option Administrator's Guide.
The maximum length of the expanded query is 32000 characters.
Thesaural operations cannot be nested. For example, the following query is not allowed.
SYN(BT(bird))
The thesaurus operators have the following arguments:
term
is the operand for the thesaurus operator. A term must be specified when using the NT operator.
For PREFERRED TERM (PT) and TOP TERM (TT) queries, term is replaced by the preferred term/top term defined for the term in the specified thesaurus; however, if no PT or TT entries are defined for the term, the term is not replaced and is used in the query.
For all other thesaural queries, term is expanded to include the synonymous, related, broader, or narrower terms defined for the term in the specified thesaurus.
level
For example, a level of 1 in a BT query returns only the broader term, if one exists, for the specified term. A level of 2 returns the broader term for the specified term, as well as the broader term, if one exists, for the broader term.
The level argument is optional and has a default value of one (1). Zero or negative values for the level argument return only the original query term.
Note: The level argument is used only in broader and narrower term queries.
thes
is the name of the thesaurus used to return the expansions for the specified term.
The thes argument is optional and has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before using any of the thesaurus operators.
The following excerpt illustrates a one-step query which returns all documents that contain the term tutorial or any of the synonyms defined for tutorial in the DEFAULT thesaurus:
...CONTAINS(textcol, 'SYN(tutorial)')...
Compound Phrases in Synonym Queries
Expansion of compound phrases for a term in a synonym query are returned as AND conjunctives.
For example, the compound phrase temperature + measurement + instruments is defined in a thesaurus as a synonym for the term thermometer. In a synonym query for thermometer, the query is expanded to:
{thermometer},({temperature}&{measurement}&{instruments})
Note: In a thesaurus, compound phrases can only be defined in synonym relationships for a term.
For example, the term building has a preferred term of construction in a thesaurus. A PT query for building returns all documents that contain the word construction. Documents that contain the word building are not returned.
For example, the term building has a preferred term of construction in a thesaurus. A PT query for building returns all documents that contain the word construction. Documents that contain the word building are not returned.
Note: The hierarchy can contain three separate branches, represented by the three broader term operators. During a broader term query, the specified operator only searches up the designated branch of the hierarchy.
The following excerpt illustrates a one-step query which returns all documents that contain the term tutorial or the BT term defined for tutorial in the DEFAULT thesaurus:
...CONTAINS(textcol, 'BT(tutorial)')...
Note: The hierarchy can contain three separate branches, represented by the three narrower term operators. During a narrower term query, the specified operator only searches down the designated branch of the hierarchy.
The following on-step query excerpt illustrates a query which returns all documents that contain either the term tutorial or any of the NT terms defined for tutorial in the DEFAULT thesaurus:
...CONTAINS(textcol, 'NT(tutorial)')...
If the qualifier is not specified for a homograph in a broader or narrower term query, the query expands to include all of the broader/narrower terms for the homograph.
For example, if machine is a broader term for crane (building equipment) and bird is a broader term for crane (waterfoul):
BT(crane) expands to {crane},{machine},{bird}
Using the previous example:
BT(crane{(waterfoul)}) expands to {crane},{bird}
Note: When specifying a qualifier in a broader or narrower term query, the qualifier and its notation (parentheses) must be escaped, as is shown in this example.
For example, the term tutorial has a top term of learning systems in the standard hierarchy of a thesaurus. A TT query for tutorial returns all documents that contain the phrase learning systems. Documents that contain the word tutorial are not returned.
This sequence of look-up enables base-letter queries to work independent of whether the thesaurus is in base-letter form. However, if the keys in the thesaurus are in base letter form, these keys will not match the corresponding non-base letter form query terms. When you have a base-letter thesaurus, you must specify the base-letter form in the query.
Within query expressions, operators have the following order of evaluation from highest precedence to lowest:
| NEAR | ; |
| STEM | $ |
| FUZZY | ? |
| SOUNDEX | ! |
| EQUIVALENCE | = |
| Control operators | > * : |
| MINUS | - |
| AND | & |
| OR | | |
| ACCUMULATE | , |
Precedence Examples
In the first example, because AND has a higher precedence than OR, the query returns all documents that contain w1 and all documents that contain both w2 and w3.
In the second example, the query returns all documents that contain both w1 and w2 and all documents that contain w3.
In the third example, the FUZZY operator is first applied to w1, then the AND operator is applied to arguments w3 and w4, then the OR operator is applied to term w2 and the results of the AND operation, and finally, the score from the FUZZY operation on w1 is added to the score from the OR operation.
In the fourth example, term w1 is weighted by a factor of 5; results are only returned if the score is greater than 35; and only the 8 highest scoring documents that meet the search criteria will be returned.
For example, the following abbreviated one-step query finds all terms beginning with the pattern scal in a column named TEXT:
...contains(TEXTCOL, 'scal%') > 0
Note: If a wildcard character in a query results in a stopword being returned, the stopword is not included in the query or highlighted by CTX_QUERY.HIGHLIGHT..
For example, the open parenthesis indicates the beginning of a group. The first close parenthesis encountered is the end of the group. Any open parentheses encountered before the close parenthesis indicate nested groups.
Brackets perform the same function as the parentheses, but prevent penetration for the expansion operators.
Braces perform the same function as the parentheses, but also act as escape characters for including reserved words and special characters in query expressions.
For more information about how the braces are used in a query expression, see "Reserved Words and Special Characters" in this chapter.
| Operator | Symbol | Equivalent | Description |
| Stored Query Expression | SQE | * NONE * | Returns the stored results of an SQE |
The advantage of calling an SQE in a query expression, rather than specifying query terms, is that the results are typically returned more quickly because ConText Option does not have to query the text table directly.
In addition, SQEs can be used to perform iterative queries, in which an initial query is refined using one or more additional queries.
SQE(SQE_name)
where the SQE operator is followed by the name of the SQE, enclosed in parentheses.
exec ctx_query.store_sqe('emp_resumes', 'prog_lang', \
'cobol', 'session');This SQE queries the text column for the EMP_RESUMES policy (in this case, EMP.RESUMES) and returns all documents that contain the term cobol. It stores the results in the SQE table for the policy.
PROG_LANG can then be called within a query expression in a query:
select score, docid from emp where contains(resume, 'sqe(prog_lang)')>0 order by score;
| Operator | Symbol | Equivalent | Description |
| PL/SQL indicator | @ | execute | Executes the PL/SQL function following the notation |
@ owner_name.function_name(arg1, arg2,...argN)
The PL/SQL symbol (@) is followed immediately by the name of the PL/SQL function and the arguments, if any, for the function. In keeping with the notational conventions for PL/SQL, the arguments for the function must be enclosed in parentheses.
Note: The PL/SQL function must be fully qualified with the username of the function owner.
All characters (including grouping characters) included within the parentheses of the PL/SQL function are not considered as query operands or operators by ConText Option. Any characters outside the PL/SQL parentheses are processed according to the grouping rules used by ConText Option.
This function could be used in a query expression to return documents that contain the acronym or its fully-expanded text equivalent.
The following example of a one-step query returns all documents in TEXTCOL that contain either IBM or International Business Machines:
select score(10), title from articles where contains (textcol, 'execute ctxuser.convert(IBM), IBM', 10) > 0;
To make this sample work, a basic two column dictionary table named TRANSLATOR must be built. In this table, the first column of each row contains a single, unique English word. The second column contains the French equivalent of the English term.
The sample is simply a table look-up of an input word to get its corresponding French equivalent. To get French to English translations, another dictionary could hold the French words in column 1 and the English equivalents in column 2. The function could be designed to select the correct look-up table from the user's input.
CREATE OR REPLACE FUNCTION french
(English_word IN VARCHAR2)
RETURN VARCHAR2 AS
search_expr VARCHAR2(200) ;
french_word VARCHAR2(80) ;
begin
select FRENCH into french_word FROM translator
WHERE ENGLISH = English_word ;
search_expr := '(' || English_word || '|' || French_word || ')' ;
return (search_expr) ;
EXCEPTION
WHEN OTHERS THEN
search_expr := English_word ;
end ; This FRENCH function can be plugged into any query to do a transparent translation of an English word to French.
For example:
select POLICY_NO from insurance where contains (TEXTCOL, '@ctxsys.french(right)')>0;
In this example of a one-step query, the user searches a text column named TEXTCOL for automobile accidents in France involving right hand turns. Since the database being searched is French, the user wants to search for the French word for right (droite), which is returned by the function.
A slightly more sophisticated query could look for either the English or French word since the exact contents of the database might not be known.
Any symbols used to represent operators and any other query expression components are considered to be special characters that must be escaped if used as literals in query expressions.
In addition, the English word equivalents (and, or, minus, accum, and execute) for the logical operators and the PL/SQL indicator are considered to be reserved words and must be escaped.
In the following examples, an escape sequence is necessary because each expression contains a query expression operator or reserved symbol:
{AT&T}
& is an operator symbol
{big-tree}
- is an operator symbol
{big_tree}
_ is a reserved symbol
It is not necessary to have more than one escape sequence in a query (i.e., nesting) because everything within a set of braces is considered part of the escape sequence. For example, AT{&}T is the same as {AT&T} or Peter{,} Paul{,} {and} Mary is the same as {Peter, Paul, and Mary}.
The open brace { signals the beginning of the escape sequence, and the closed brace } indicates the end. Everything between the opening brace and the closing brace is part of the query expression (including any open brace characters). To include the close brace character in a query expression, use }}.
Note: Characters and reserved words within the braces are treated as literals and have no other meaning or effect when searching text; special characters (e.g., a comma or an ampersand) are ignored by ConText Option during queries.
In the expression 'peter, paul and mary' without braces, the comma and the word and are ConText Option operators. ConText Option evaluates the score as the score for paul and mary accumulated with the score for the peter.
If the expression is written as '{peter, paul and mary}', then ConText Option evaluates the score based on the single phrase peter paul and mary. Note that the comma has been treated as a literal special character and does not appear in the final search phrase.
|
Prev Next |
Copyright © 1996 Oracle Corporation. All Rights Reserved. |
Library |
Product |
Contents |
Index |