Oracle Context Option Application Developer's Guide Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index



Go to previous file in sequence Go to next file in sequence

CHAPTER 3. Understanding Query Expressions


This chapter explains how to use ConText Option to create query expressions to find relevant text in documents.

The topics covered in this chapter are:

About Query Expressions

A query expression defines the search criteria for retrieving documents using ConText Option. A query expression consists of query terms (words and phrases) and other components such as operators and special characters which allow users to specify exactly which documents are retrieved by ConText Option.

A query expression can also call Stored Query Expressions (SQEs) to return stored query results or call PL/SQL functions to return values used in the query.

When a query is executed using any of the methods supported by ConText Option, one of the arguments included in the query is a query expression. ConText Option then returns a list of all the documents that satisfy the search criteria, as well as scores that measure the relevance of the document to the search criteria.

Query Terms

Query terms can consist of words and phrases. Query terms can also contain stopwords.

Words and Phrases

The words in a query expression are the individual tokens on which the query expression operators perform an action. If multiple words are contained in a query expression, separated only by blank spaces (no operators), the string of words is considered a phrase and the entire string is searched for during a query.

Stopwords

Stopwords are common words, such as and, the, of, and to, that are not considered significant query terms by themselves because they occur so often in text. However, stopwords can provide useful search information when combined with more significant terms.

For example, a query for documents containing the phrase peanut butter and jelly returns different results than a query for documents containing the terms peanut butter and jelly.

When you define a policy for a column, ConText Option lets you identify a list of stopwords. When stopwords are encountered in the documents in the column, they are not included as indexed terms in the text index; however, they are recorded.

As a result, stopwords cannot be searched for explicitly in text queries, but can be included as part of a phrase in a query expression.

Attention: Query expressions made up of only stopwords will result in errors. Ensure that at least one of the terms in your query expression is not a stopword.

Stoplists can be created in any language supported by ConText Option. ConText Option provides a default stoplist in English.

Note: Stopwords do not have an affect on the theme indexes generated by ConText Option for your English-language documents.

Query Expression Components

In addition to query terms, a query expression may contain any or all of the following components:

Component Purpose
Operators Define the relationships between the terms in a query expression and specify the output returned by the query
Wildcard Characters Expand query terms using pattern matching
Grouping Characters Group terms and operators in a query expression
Stored Query Expressions (SQEs) Return the results of a query that has been executed and the results stored in an SQE table
PL/SQL Functions Execute a function and use the results in a query expression
Table 3 - 1. Query Expression Components



Base-Letter Queries

For languages that use an 8-bit character set, such as French and Spanish, Context gives you the option of converting characters to their base-letter representation before text indexing. This means that words with accents, umlauts, and so on are converted to their base-letter representation before their tokens are placed in the text index.

When a text index that has used base-letter conversion is specified in a query, ConText Option converts the term in the query expression to match the base-letter representation before the query is processed. In addition, all expansion and stopword checking for the query is performed on the base-letter terms.

Note: The terms in a thesaural query are not converted to base-letter representation before look-up in the thesaurus is performed. The base-letter conversion takes place after the thesaurus look-up and is performed on all the terms returned by the thesaurus.

For more information about creating an index that supports base-letter conversion, see Oracle ConText Option Administrator's Guide.

Query Expression Examples

The following example of a one-step query returns all articles that contain the word wine in the TEXTTAB.TEXT_COLUMN column. The query expression consists only of the query term wine, surrounded by single quotes.

	SELECT articles FROM texttab
	WHERE CONTAINS(textcol, 'wine') > 0;

The following example of a one-step query returns all articles that contain the phrase wine and roses in the TEXTTAB.TEXT_COLUMN column. The query expression consists of the query phrase wine and roses, surrounded by single quotes.

	SELECT articles FROM texttab
	WHERE CONTAINS(textcol, '{wine and roses}') > 0;

For more information about the CONTAINS function used in one-step queries, see "SQL Functions (Chapter 10)."

Operators

Your application can let the user specify these parameters at run-time.

Operators perform two functions: they define the relationships between the terms in a query expression and they control the output returned by the query.

There are five types of operators:

Logical Operators

Logical operators combine the terms in a query expression. All single words and phrases may be combined with logical operators.

When query terms are combined, the number of spaces around the logical operator is not significant.

The order of operands is not significant except for the MINUS operator. For example, 'A and B' is equivalent to 'B and A', but 'A minus B' is not the same as 'B minus A'.

Logical operators link query terms together to produce scores that are based on the relationship of the terms to each other. The logical operators combine the scores of their operands up to a maximum value of 100. Operands can be any query terms, as well as other operators.

Operator Syntax Description
AND term&term
term and term
Returns documents that contain one term and another. Returns the minimum score of its operands. All query terms must occur; lower score taken
OR term|term
term or term
Returns documents that contain one term or another. Returns the maximum score of its operands. At least one term must exist; higher score taken
ACCUMULATE term,term
term accum term
Returns documents that contain one term or another. Calculates score by adding its operands. Similar to OR, but if terms co-exist, score is combined
EQUIVALENCE term=term
term equiv term
Used to specify acceptable substitutions of words in a search.
MINUS term-term
term minus term
Returns the score of the right query term subtracted from the score of the left query term.
NOT term~term
term not term
Returns documents that contain one term and not another.
Table 3 - 2. Logical Operators



Attention: If any symbols or word equivalents for the logical operators are to be included as search terms in the query expression rather than used as operators, the symbols/words must be escaped by enclosing them in braces { }. The following examples illustrate how a phrase containing a operator symbol or equivalent can be queried:

'sense {and} sensibility'
'{peter, paul, and mary}'
'{pride & prejudice}'

AND Queries

Use the AND operator to search for documents that contain at least one occurrence of each of the query terms. For example:

	'batman & robin & penguin'
	'jupiter and ganymede'

OR Queries

Use the OR operator to search for documents that contain at least one occurrence of any of the query terms. For example:

	'cats | dogs'
	'california or oregon'

ACCUMULATE Queries

Use the ACCUMULATE operator to search for documents that contain at least one occurrence of any of the query terms. For example:

	'cat, dog, horse'
	'california accum election accum convention'

ACCUMULATE is similar to OR, in the sense that a document satisfies the query expression if any of the terms occur in the document; however, the scoring is different.

OR returns a score based only on the query term that occurs most frequently in a document. ACCUMULATE combines the scores for all the query terms that occur in a document.

EQUIVALENCE Queries

Use the equivalence operator to specify an acceptable substitution for a word in a search. For example, if you want all the documents that contain the phrase alsatians are big dogs or labradors are big dogs, you can write:

'labradors=alsatians are big dogs'

ConText processes the above query faster and more efficiently than the same query written with the accumulate operator. For example, you could write the above query less efficiently and less concisely as follows:

'labradors are big dogs, alsatians are big dogs'

The savings you gain in using the equivalence operator over the accumulate operator is most significant when you have more than one equivalence operator in the query expression. For example, the following query

'labradors=alsatians are big canines=dogs'

is a more efficient, more concise form of:

'labradors are big dogs, 
alsatians are big dogs'
alsatians are big canines'
labradors are big canines'

Precedence of Equivalence Operator The equivalence operator has higher precedence that all other operators except the unary operators (FUZZY, SOUNDEX, STEM, and PL/SQL function calls).

MINUS Queries

Use the MINUS operator to search for documents that contain at least one occurrence of each of the query terms. For example:

	'poisons - arsenic'
	'swimming minus sharks'

MINUS is similar to AND in the sense that a document satisfies the query expression only if both terms occur in the document; however, the scoring is different.

AND returns a score based on the intersection of the two query terms. MINUS returns a score which is calculated by subtracting the number of occurrences of the second query term from the number of occurrences of the first term.

NOT Queries

Use the NOT operator to search for documents that contain one query term and not another.

For example, to obtain the documents that contain the term animals but not dogs, use the following expression:

	'animals ~ dogs'

Similarly, to obtain the documents that contain the term transportation but not automobiles or trains, use the following expression:

	'transportation not (automobiles or trains)'

Note: The NOT operator does not affect the scoring produced by the other logical operators.

Scoring for Logical Operators

ConText Option calculates the relevance score for each document that meets the selection criteria specified in the query expression.

The following table shows how the scores are calculated for six different documents based on the logical operators. The NOT operator is not included in this example because it has no affect on score.

Note: Each occurrence of a query term or phrase scores 10.

Text Query Expression
bat & cave bat | cave bat , cave cave - bat bat - cave
bat cave
bat cave
cave
cave
20 40 60 20 0
bat
cave
cave
10 20 30 10 0
bat
bat
cave
10 20 30 0 10
bat
cave
10 10 20 0 0
bat 0 10 10 0 10
cave 0 10 10 10 0
Table 3 - 3. Scoring Example for Logical Operators



The results are:

Proximity Operators

Words or phrases that occur close together are considered to be more closely associated than those that are farther apart. The proximity operator calculates a score based on how close words are to each other rather than on how often the word or phrase appears in the document.

Operator Symbol Description
NEAR term;term
term near term
Returns a higher score if word operands are near each other.
Table 3 - 4. Proximity Operator



The score for a document is the highest score generated for all the query terms that occur within the document in proximity to each other. A score of 100 is returned when the query terms are adjacent.

For example, if the query expression is 'ice;cream', the phrase I love ice cream would score 100, while the phrase ice is colder than cream would score lower. If both phrases occurred in a document, the document would be retrieved by ConText Option and the score for the document would be 100.

Control Operators

The control operators regulate the results returned by the query expression operand on the left based on the value of a qualifier operand on the right.

Operator Syntax Description
WEIGHT term*n Assigns a relative importance to an operand; 0.1 to 10 can be designated
THRESHOLD term>n No documents which score below the threshold are returned
MAX term:n Returns the specified number of documents; for example, :20 means the top 20 documents on the HITLIST are to be returned. A MAX value must be an integer between 1 and 232 - 1
FIRST/NEXT term#n-n Returns the specified number of documents that contain the query term(s), beginning with the first n specified documents
Table 3 - 5. Control Operators



WEIGHT Queries

To search for documents that refer to cats or dogs with more emphasis given to references to cats, use the WEIGHT operator as in the following expression:

	'cat*3 | dog*1'

THRESHOLD Queries

To search for documents that refer to relational databases and only return documents if the score is greater than 75, use the following expression:

	'relational databases > 75'

Combined THRESHOLD and WEIGHT Queries

To search for documents that refer to Oracle and Sybase (weighting Oracle at 3 units and Sybase at .1 units) and display documents only if the score value is greater than 50, enter the following query expression:

	'(oracle*3 & sybase*.1) > 50'

MAX Queries

In the following example, the twenty highest scoring documents that contain the word dance will be placed in the results table.

	'dance:20'

Note: Because the MAX operator acts on a query result set, you must place it at the outermost level of the query; you cannot embed it within the query expression.

In addition, the MAX operator cannot be used with COUNT_QUERY or in-memory queries.

Combined THRESHOLD and MAX Queries

To search for documents that refer to dogs and cats where the following conditions are met:

use the following query expression:

	'((dog*2 > 95) & cat) > 75 : 40'

The first component of the expression, (dog*2 > 95), specifies that only documents that have more than 5 references to dogs will be returned (because each reference scores 10 and the weight of each reference is doubled).

The second component, & cat, specifies that the returned documents must also have references to cats.

The third component, ) > 75, specifies that the lower score of dogs and cats must be at least 75 (because the AND operator returns the lower score of the two operands). Therefore, since all documents referring to dogs must have scored at least 95, the documents meeting the AND condition must have at least 8 references to cats (each reference scores 10 and the total must be greater than 75).

FIRST/NEXT Queries

The FIRST/NEXT operator returns a specified range of documents that contain the query terms in a query expression. It can be used to create an application interface in which query results (hitlists) are returned incrementally. Because the query results are returned incrementally, query response is generally faster, the application can display the hitlists in a more manageable size, and control can be returned to the user more quickly.

Note: The FIRST/NEXT operator cannot be used with COUNT_QUERY or in-memory queries.

The FIRST/NEXT operator requires a range as a qualifier and returns all documents that fall within the specified range:

	query_term#start_range-end_range

Note: The order of the returned documents is not based on score or textkey. ConText Option returns the documents based on the order in which it encounters the documents in the queried text column.

For example, to return the first 10 documents encountered by ConText Option that contain the term dog, use the following expression:

	'dog#1-10'

You could then return the next 10 documents using the following expression:

	'dog#11-20'

Note: Because the FIRST/NEXT operator acts on a query result set, you must place it at the outermost level of the query; you cannot embed it within the query expression.

Combined FIRST/NEXT and MAX Queries

You can use the FIRST/NEXT operator extract chunks of a sorted hitlist returned by the MAX operator. For example, if you use the MAX operator to return to you only the first 50 documents that contain the term cat, you can extract the first 10 documents from the 50 as follows:

	'cat:50#1-10'

Note: Placing the MAX operator inside the FIRST/NEXT operator as such is the only instance in which you can embed the MAX operator in a query expression.

Expansion Operators

The expansion operators expand a query expression to include variants of the query term supplied by the user. There are three kinds of expansion operators:

Operator Syntax Description
STEM $term Expands a query to include all terms having the same stem or root word as the specified term.
SOUNDEX !term Expands a query to include all terms that sound the same as the specified term (English-langugage text only).
FUZZY ?term Expands a query to include all terms with similar spellings as the specified term (English-langugage text only).
Table 3 - 6. Expansion Operators



The expansion operators are unary operators. They may be used in combination with each other and with any other operators described in this chapter. In addition, searches can be broadened by performing an expansion on an expansion.

The methods used by the expansion operators to perform stemming, fuzzy matching, and soundex matching for a text column are determined by the Wordlist preference in the policy for the column.

For more information about setting up preferences and policies, see Oracle ConText Option Administrator's Guide.

STEM Expansions

Use the STEM ($) operator to search for terms that have the same linguistic root as the query term.For example:

Input: Expands to:
$scream scream screaming screamed
$distinguish distinguish distinguished distinguishes
$guitars guitars guitar
$commit commit committed
$cat cat cats
$sing sang sung sing
Table 3 - 7. Examples of STEM Expansions



The ConText Option stemmer, licensed from Xerox Corporation's XSoft Division, supports the following languages: English, French, Spanish, Italian, German, and Dutch.

Note: If STEM returns a stopword, the stopword is not inlcuded in the query or highlighted by CTX_QUERY.HIGHLIGHT.

SOUNDEX Expansions

The SOUNDEX (!) operator enables searches on words that have similar sounds; that is, words that sound like other words. This function allows comparison of words that are spelled differently, but sound alike in English.

Soundex in ConText Option uses the same logic as the SOUNDEX function in SQL to search for words that have a similar sound. It returns all words in a text column that have the same SOUNDEX value.

The following example illustrates the results that could be returned for a one-step query that uses SOUNDEX:

	SELECT ID, COMMENT FROM EMP_RESUME
	WHERE CONTAINS (COMMENT, '!SMYTHE') > 0
	ID COMMENT
	-- ------------
	23 Smith is a hard worker who....

Note: SOUNDEX works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectiveness, for languages that use an 8-bit character set, such as many Western European languages.

For more information about the SOUNDEX function in SQL, see Oracle7 Server SQL Reference.

FUZZY Expansions

FUZZY (?) expansions generate words that are spelled similarly. This type of expansion is helpful for finding more accurate results when there are frequent misspellings in the documents in the database.

Unlike the STEM expansion, the number of words generated by a FUZZY search depends on what is in the text index; results can vary significantly according to the contents of the database index.

For example:

Input: Expands to:
?cat cat cats calc case
?feline feline defined filtering
?apply apply apple applied April
?read lead real
Table 3 - 8. Examples of FUZZY Expansions



Note: FUZZY works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectivemess, for languages that use an 8-bit character set, such as many Western European languages. Also, the Japanese lexer provides limited fuzzy matching.

In addition, if FUZZY returns a stopword, the stopword is not inlcuded in the query or highlighted by CTX_QUERY.HIGHLIGHT.

Penetration in Expansion Operators

Penetration allows complex query expansions to be expressed in short concise notation. Penetration is a system of notation for query expressions and does not affect the meaning of the expansion operators or the order in which operations are performed; it is a tool to help developers generate non-ambiguous queries using the expansion operators.

Penetration applies the expansion operators to each term within an explicit expression (i.e., an expression delimited by parentheses or braces). Any expansion operators outside an expression delimited by parentheses ( ) or braces { } is applied to each word or phrase inside the expression.

For example:

Query Before Penetration Query After Penetration
?(dog, cat, mouse) ?dog , ?cat , ?mouse
?(dog,!(cat & mouse)) ?dog , (!?cat & !?mouse)
?((cat=feline) meows) (?cat = ?feline) ?meows
Table 3 - 9. Examples of Penetration



In the first example, a FUZZY expansion is performed on each term.

In the second example, a FUZZY expansion is performed on each term and a SOUNDEX expansion is performed only on the terms cat and mouse because cat and mouse are enclosed in a separate set of parentheses.

Note: Expansion operators do not penetrate expressions delimited by brackets [ ].

Base-letter Support

If you have base-letter conversion specified for a text column and the query expression contains a SOUNDEX or FUZZY operator, ConText Option operates on the base-letter form of the query; however, the STEM operator does not support base-letter conversion.

Thesaurus Operators

The thesaurus operators expand a query for a single term (word or phrase) using a thesaurus that defines relationships between the user-specified term and other semantically related terms.

There are eight kinds of thesaurus operators, corresponding to the eight types of relationships that can be defined in an ISO2788 standard thesaurus.

Operator Syntax Description
SYNONYM SYN(term[,thes]) Expands a query to include all the terms defined in the thesaurus as synonyms for the specified word
PREFERRED TERM PT(term[,thes]) Replaces the specified word in a query with the term defined in the thesaurus as the preferred term for the specified word.
RELATED TERM RT(term[,thes]) Expands a query to include all the terms defined in the thesaurus as a related term for the specified word.
TOP TERM TT(term[,thes]) Replaces the specified word in a query with the term defined in the thesaurus as the top term in the standard hierarchy (BT, NT) for the specified word.
BROADER TERM BT(term[,level[,thes]]) Expands a query to include the term defined in the thesaurus as a broader term for the specified word
NARROWER TERM NT(term[,level[,thes]]) Expands a query to include all the lower level terms defined in the thesaurus as narrower terms for the specified word
BROADER GENERIC TERM BTG(term[,level[,thes]]) Expands a query to include all terms defined in the thesaurus as a broader generic terms for the specified word.
NARROWER GENERIC TERM NTG(term[,level[,thes]]) Expands a query to include all the lower level terms defined in the thesaurus as narrower generic term for the specified word
BROADER PARTITIVE TERM BTP(term[,level[,thes]]) Expands a query to include all the terms defined in the thesaurus as broader partitive terms for the specified word.
NARROWER PARTITIVE TERM NTP(term[,level[,thes]]) Expands a query to include all the lower level terms defined in the thesaurus as narrower partitive term for the specified word
Table 3 - 10. Thesaurus Operators



Internally, ConText Option processes the expansion by bracketing each individual term returned by the expansion, then the terms are accumulated together using the ACCUMULATE operator.

For example, if bird, birdy, and avian are all synonyms:

SYN(bird) is expanded to {bird},{avian},{birdy}

If a term in a thesaural query does not have corresponding entries in the specified thesaurus, no expansion is produced and the term itself is used in the query.

For more information about thesaural relationships and creating thesauri, see Oracle ConText Option Administrator's Guide.

Limitations

The thesaurus operators can be used in conjunction with all the other query expression operators and special characters supported by ConText Option, with the exception of the NEAR operator.

The maximum length of the expanded query is 32000 characters.

Thesaural operations cannot be nested. For example, the following query is not allowed.

SYN(BT(bird))

Thesaurus Arguments

The thesaurus operators are implemented in ConText Option as PL/SQL functions, and, as such, have arguments that must be specified with the operator. All of the notational conventions and usage rules for PL/SQL apply to the thesaurus operators.

The thesaurus operators have the following arguments:

term

is the operand for the thesaurus operator. A term must be specified when using the NT operator.

For PREFERRED TERM (PT) and TOP TERM (TT) queries, term is replaced by the preferred term/top term defined for the term in the specified thesaurus; however, if no PT or TT entries are defined for the term, the term is not replaced and is used in the query.

For all other thesaural queries, term is expanded to include the synonymous, related, broader, or narrower terms defined for the term in the specified thesaurus.

level

is the number of levels traversed in the thesaurus hierarchy to return the broader (BT, BTG, BTP) or narrower (NT, NTG, NTP) term for the specified term.

For example, a level of 1 in a BT query returns only the broader term, if one exists, for the specified term. A level of 2 returns the broader term for the specified term, as well as the broader term, if one exists, for the broader term.

The level argument is optional and has a default value of one (1). Zero or negative values for the level argument return only the original query term.

Note: The level argument is used only in broader and narrower term queries.

thes

is the name of the thesaurus used to return the expansions for the specified term.

The thes argument is optional and has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before using any of the thesaurus operators.

SYNONYM Queries

Use the SYNONYM operator (SYN) to expand a query to include all the terms that have been defined in a thesaurus as synonyms for a specified term.

The following excerpt illustrates a one-step query which returns all documents that contain the term tutorial or any of the synonyms defined for tutorial in the DEFAULT thesaurus:

	...CONTAINS(textcol, 'SYN(tutorial)')...

Compound Phrases in Synonym Queries

Expansion of compound phrases for a term in a synonym query are returned as AND conjunctives.

For example, the compound phrase temperature + measurement + instruments is defined in a thesaurus as a synonym for the term thermometer. In a synonym query for thermometer, the query is expanded to:

{thermometer},({temperature}&{measurement}&{instruments})

Note: In a thesaurus, compound phrases can only be defined in synonym relationships for a term.

PREFFERED TERM Queries

Use the PREFERRED TERM operator (PT) to replace a term in a query with the preferred term that has been defined in a thesaurus for the term.

For example, the term building has a preferred term of construction in a thesaurus. A PT query for building returns all documents that contain the word construction. Documents that contain the word building are not returned.

RELATED TERM Queries

Use the RELATED TERM operator (RT) to expand a query to include all terms with the preferred term that has been defined in a thesaurus for the term.

For example, the term building has a preferred term of construction in a thesaurus. A PT query for building returns all documents that contain the word construction. Documents that contain the word building are not returned.

Broader Term Queries

Use the broader term operators (BT, BTG, BTP) to expand a query to include the term that has been defined in a thesaurus as the broader or higher level term for a specified term. They can also expand the query to include the broader term for the broader term and the broader term for that broader term, and so on up through the thesaurus hierarchy.

Note: The hierarchy can contain three separate branches, represented by the three broader term operators. During a broader term query, the specified operator only searches up the designated branch of the hierarchy.

The following excerpt illustrates a one-step query which returns all documents that contain the term tutorial or the BT term defined for tutorial in the DEFAULT thesaurus:

	...CONTAINS(textcol, 'BT(tutorial)')...

Narrower Term Queries

Use the narrower term operators (NT, NTG, NTP) to expand a query to include all the terms that have been defined in a thesaurus as the narrower or lower level terms for a specified term. They can also expand the query to include all of the narrower terms for each narrower term, and so on down through the thesaurus hierarchy.

Note: The hierarchy can contain three separate branches, represented by the three narrower term operators. During a narrower term query, the specified operator only searches down the designated branch of the hierarchy.

The following on-step query excerpt illustrates a query which returns all documents that contain either the term tutorial or any of the NT terms defined for tutorial in the DEFAULT thesaurus:

	...CONTAINS(textcol, 'NT(tutorial)')...

Broader and Narrower Term Queries on Homographs

If a homograph (a word or phrase with multiple meanings, but the same spelling) appears in two or more nodes in the same hierarchy branch of a thesaurus, a qualifier is required for each occurrence of the term in the branch.

If the qualifier is not specified for a homograph in a broader or narrower term query, the query expands to include all of the broader/narrower terms for the homograph.

For example, if machine is a broader term for crane (building equipment) and bird is a broader term for crane (waterfoul):

BT(crane) expands to {crane},{machine},{bird}

If the qualifier for a homograph is specified in a broader or narrower term query, only the broader/narrower terms for the qualified homograph are returned.

Using the previous example:

BT(crane{(waterfoul)}) expands to {crane},{bird}

Note: When specifying a qualifier in a broader or narrower term query, the qualifier and its notation (parentheses) must be escaped, as is shown in this example.

TOP TERM Queries

Use the TOP TERM operator (TT) to replace a term in a query with the top term that has been defined for the term in the standard hierarchy (BT, NT) in a thesaurus. Top terms in the generic (BTG, NTG) and partitive (BTP, NTP) hierarchies are not returned.

For example, the term tutorial has a top term of learning systems in the standard hierarchy of a thesaurus. A TT query for tutorial returns all documents that contain the phrase learning systems. Documents that contain the word tutorial are not returned.

Base-letter Support for Thesaural Queries

When ConText processes a query on a base-letter index and the expression contains a thesaurus operator, ConText looks up the query term in the thesaurus without converting the query to base-letter. The expansions obtained from the thesaurus are converted to base-letter and looked up subsequently within the index according to query rules.

This sequence of look-up enables base-letter queries to work independent of whether the thesaurus is in base-letter form. However, if the keys in the thesaurus are in base letter form, these keys will not match the corresponding non-base letter form query terms. When you have a base-letter thesaurus, you must specify the base-letter form in the query.

Operator Precedence

Query expressions are evaluated in order from left to right according to the precedence to their operators. Operators with higher precedence are applied first. Operators of equal precedence are applied in order of their appearance in the expression from left to right.

Within query expressions, operators have the following order of evaluation from highest precedence to lowest:

NEAR ;
STEM $
FUZZY ?
SOUNDEX !
EQUIVALENCE =
Control operators > * :
MINUS -
AND &
OR |
ACCUMULATE ,

Precedence Examples

Query Expression Order of Evaluation
w1 | w2 & w3 (w1) | (w2 & w3)
w1 & w2 | w3 (w1 & w2) | w3
?w1 , w2 | w3 & w4 (?w1) , (w2 | (w3 & w4))
w1 * 5 > 35 : 8 (w1 * 5) > 35 : 8
?abc = def ghi & jkl = mno ((?abc = def) ghi) & (jkl=mno)
Table 3 - 11. Examples of Operator Precedence



In the first example, because AND has a higher precedence than OR, the query returns all documents that contain w1 and all documents that contain both w2 and w3.

In the second example, the query returns all documents that contain both w1 and w2 and all documents that contain w3.

In the third example, the FUZZY operator is first applied to w1, then the AND operator is applied to arguments w3 and w4, then the OR operator is applied to term w2 and the results of the AND operation, and finally, the score from the FUZZY operation on w1 is added to the score from the OR operation.

In the fourth example, term w1 is weighted by a factor of 5; results are only returned if the score is greater than 35; and only the 8 highest scoring documents that meet the search criteria will be returned.

Altering Precedence

Precedence is altered by grouping characters as follows:

Wildcard Characters

Wildcard characters can be used in query expressions to expand word searches into pattern searches. The wildcard characters are:

The percent wildcard specifies that any characters may appear in multiple positions represented by the wildcard. The underscore wildcard specifies a single position in which any character may occur.

For example, the following abbreviated one-step query finds all terms beginning with the pattern scal in a column named TEXT:

	...contains(TEXTCOL, 'scal%') > 0

Note: If a wildcard character in a query results in a stopword being returned, the stopword is not included in the query or highlighted by CTX_QUERY.HIGHLIGHT..

Grouping Characters

The grouping characters control operator precedence by grouping query terms and operators in a query expression. The grouping characters are:

The beginning of a group of terms and operators is indicated by an open character from one of the sets of grouping characters. The ending of a group is indicated by the occurrence of the appropriate close character for the open character that started the group. Between the two characters, other groups may occur.

For example, the open parenthesis indicates the beginning of a group. The first close parenthesis encountered is the end of the group. Any open parentheses encountered before the close parenthesis indicate nested groups.

Brackets perform the same function as the parentheses, but prevent penetration for the expansion operators.

Braces perform the same function as the parentheses, but also act as escape characters for including reserved words and special characters in query expressions.

For more information about how the braces are used in a query expression, see "Reserved Words and Special Characters" in this chapter.

Stored Query Expressions

Stored query expressions (SQEs) can be called in a query expression to return the stored results of the SQE.

Operator Symbol Equivalent Description
Stored Query Expression SQE * NONE * Returns the stored results of an SQE
Table 3 - 12. SQEs in Query Expressions



The advantage of calling an SQE in a query expression, rather than specifying query terms, is that the results are typically returned more quickly because ConText Option does not have to query the text table directly.

In addition, SQEs can be used to perform iterative queries, in which an initial query is refined using one or more additional queries.

Syntax

The syntax for calling an SQE is:

	SQE(SQE_name)

where the SQE operator is followed by the name of the SQE, enclosed in parentheses.

Example

Suppose you create a session SQE named PROG_LANG:

	exec ctx_query.store_sqe('emp_resumes', 'prog_lang', \
'cobol', 'session');

This SQE queries the text column for the EMP_RESUMES policy (in this case, EMP.RESUMES) and returns all documents that contain the term cobol. It stores the results in the SQE table for the policy.

PROG_LANG can then be called within a query expression in a query:

	select score, docid from emp
	where contains(resume, 'sqe(prog_lang)')>0
	order by score;

PL/SQL in Query Expressions

Stored PL/SQL functions can be called in a query expression to produce results which are used in the query.

Operator Symbol Equivalent Description
PL/SQL indicator @ execute Executes the PL/SQL function following the notation
Table 3 - 13. PL/SQL Notation in Query Expressions



Syntax

The syntax for calling a PL/SQL function is:

	@ owner_name.function_name(arg1, arg2,...argN)

The PL/SQL symbol (@) is followed immediately by the name of the PL/SQL function and the arguments, if any, for the function. In keeping with the notational conventions for PL/SQL, the arguments for the function must be enclosed in parentheses.

Note: The PL/SQL function must be fully qualified with the username of the function owner.

All characters (including grouping characters) included within the parentheses of the PL/SQL function are not considered as query operands or operators by ConText Option. Any characters outside the PL/SQL parentheses are processed according to the grouping rules used by ConText Option.

Example

Suppose you create a function named CONVERT that takes an acronym as input and returns the fully-expanded version of the acronym. For example, CONVERT would return International Business Machines for the acronym IBM.

This function could be used in a query expression to return documents that contain the acronym or its fully-expanded text equivalent.

The following example of a one-step query returns all documents in TEXTCOL that contain either IBM or International Business Machines:

	select score(10), title from articles
	where contains
	(textcol, 'execute ctxuser.convert(IBM), IBM', 10) > 0;

PL/SQL Sample Code

The following sample code creates a PL/SQL function named FRENCH that takes a word in English and converts it to French.

To make this sample work, a basic two column dictionary table named TRANSLATOR must be built. In this table, the first column of each row contains a single, unique English word. The second column contains the French equivalent of the English term.

The sample is simply a table look-up of an input word to get its corresponding French equivalent. To get French to English translations, another dictionary could hold the French words in column 1 and the English equivalents in column 2. The function could be designed to select the correct look-up table from the user's input.

 CREATE OR REPLACE FUNCTION french  
		(English_word IN  VARCHAR2) 
      RETURN VARCHAR2  AS 
  search_expr  VARCHAR2(200) ; 
  french_word     VARCHAR2(80) ; 
 
begin 
  select FRENCH into french_word FROM translator 
		WHERE ENGLISH = English_word ; 
  search_expr := '(' || English_word || '|' || French_word || ')' ;
  return (search_expr) ; 
 
EXCEPTION 
  WHEN OTHERS THEN 
    search_expr := English_word ; 
end ; 

This FRENCH function can be plugged into any query to do a transparent translation of an English word to French.

For example:

	select POLICY_NO from insurance
	where contains (TEXTCOL, '@ctxsys.french(right)')>0;

In this example of a one-step query, the user searches a text column named TEXTCOL for automobile accidents in France involving right hand turns. Since the database being searched is French, the user wants to search for the French word for right (droite), which is returned by the function.

A slightly more sophisticated query could look for either the English or French word since the exact contents of the database might not be known.

Reserved Words and Special Characters

Reserve words and characters have special meaning to query expressions and cannot be included in query expressions as query terms without explicitly identifying them with braces { }. Words and characters enclosed in braces are referred to as escape sequences.

Any symbols used to represent operators and any other query expression components are considered to be special characters that must be escaped if used as literals in query expressions.

In addition, the English word equivalents (and, or, minus, accum, and execute) for the logical operators and the PL/SQL indicator are considered to be reserved words and must be escaped.

In the following examples, an escape sequence is necessary because each expression contains a query expression operator or reserved symbol:

{AT&T}

& is an operator symbol

{big-tree}

- is an operator symbol

{big_tree}

_ is a reserved symbol

It is not necessary to have more than one escape sequence in a query (i.e., nesting) because everything within a set of braces is considered part of the escape sequence. For example, AT{&}T is the same as {AT&T} or Peter{,} Paul{,} {and} Mary is the same as {Peter, Paul, and Mary}.

The open brace { signals the beginning of the escape sequence, and the closed brace } indicates the end. Everything between the opening brace and the closing brace is part of the query expression (including any open brace characters). To include the close brace character in a query expression, use }}.

Note: Characters and reserved words within the braces are treated as literals and have no other meaning or effect when searching text; special characters (e.g., a comma or an ampersand) are ignored by ConText Option during queries.

In the expression 'peter, paul and mary' without braces, the comma and the word and are ConText Option operators. ConText Option evaluates the score as the score for paul and mary accumulated with the score for the peter.

If the expression is written as '{peter, paul and mary}', then ConText Option evaluates the score based on the single phrase peter paul and mary. Note that the comma has been treated as a literal special character and does not appear in the final search phrase.




Go to previous file in sequence Go to next file in sequence
Prev Next
Oracle
Copyright © 1996 Oracle Corporation.
All Rights Reserved.
Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index