Oracle ConText Option Application Developer's Guide

CHAPTER 2. Query Methods

This chapter describes the different query methods you can use in your ConText Option application. You can use these methods with text queries and theme queries. The following topics are covered:

Using Two-Step Queries

Using One-Step Queries

Using In-Memory Queries

Query Issues

Using Two-Step Queries

A two-step query uses the following methodology:

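1. Execute the CTX_QUERY.CONTAINS procedure. The procedure stores the textkey and score of each document that satisfies the query expression in a result table.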

Note: The result table must be created before the CONTAINS procedure is executed.

2. Use a SELECT statement on the result table (and the base text table, if desired) to return the specified columns as a hitlist for the rows (documents) that satisfy the query expression.

Figure 2 - 1. Two-Step Queries

Two-Step Query Example

The following example (diagrammed in Figure 2 - 2) shows a simple two-step query. The query uses a policy named ARTICLE_POLICY to search the text column in a table named TEXTTAB for any articles that contain the word petroleum. The result table, CTX_TEMP, is created before the two-step query is executed:

	create table CTX_TEMP(
		textkey varchar2(64),
		score number,
		conid number);

	execute ctx_query.contains('ARTICLE_POLICY', 'petroleum', 'CTX_TEMP')

	SELECT SCORE, title
	FROM CTX_TEMP, TEXTTAB
	WHERE texttab.PK=ctx_temp.textkey
	ORDER BY SCORE DESC;

In this example, the articles with the highest scores appear first in the hitlist because the results are sorted by score in descending order.

Figure 2 - 2. Diagram of a Two-Step Query

SELECT from a Pre-defined View

There is an alternative to step 2 of a two-step query. Rather than joining the result table and text table in a SELECT statement, create a view to perform the join. Then use a SELECT statement to select the appropriate rows from that view. Use this approach when the development tool does not allow tables to be joined in a SELECT statement (e.g. Oracle Forms).

For example:

	CREATE VIEW SURVEY AS SELECT * FROM TEXTTAB, CTX_TEMP
	WHERE PK = TEXTKEY;

	SELECT SCORE, AUTHOR FROM SURVEY
	ORDER BY SCORE DESC;

In this example:

the CREATE VIEW statement joins the table of articles (TEXTTAB) and the result table (CTX_TEMP)

the PK column holds the primary key of the documents, and the tables are joined using PK and TEXTKEY

the SELECT statement retrieves the scores and authors from the view, with the highest scores first

Composite Textkey Queries

To execute a two-step query on a table with a composite textkey, you specify the multiple textkey columns when you create the policy for the text column.

For more information about creating policies for composite textkey tables, refer to the Oracle ConText Option Administrator's Guide.

In addition, before the two-step query, create a result table in which the number of TEXTKEY columns matches the number of columns in the composite textkey in the document table. You can create the result table manually or by using the CTX_QUERY.GETTAB procedure.

For example, to manually create a result table with a composite textkey consisting of two columns:

	create table CTX_TEMP2(
		textkey varchar2(64),
		textkey2 varchar2(64),
		score number,
		conid number);

In the two-step query, use the AND operator in the WHERE condition when you join the result and text tables. For example:

	exec ctx_query.contains('ARTICLE2_POLICY', 'petroleum', 'CTX_TEMP2')

	SELECT SCORE, title
	FROM CTX_TEMP2, TEXTTAB2
	WHERE texttab2.PK=ctx_temp2.textkey AND
	      texttab2.PK2=ctx_temp2.textkey2
	ORDER BY SCORE DESC;

Structured Queries

A structured query in a two-step query is a query based on one or more structured data columns in the same table as the text column being queried. For example, you might use a structured query to retrieve documents on a certain subject that were written after a certain date, where the document content is in a text column and date information is in a structured data column.

The CTX_QUERY.CONTAINS procedure provides an additional parameter, STRUCT_QUERY, for specifying the WHERE condition in a structured query. For example, to select all news articles that contain the word Oracle that were written on or after October 1st, 1996, you might use:

	exec ctx_query.contains('news_text', 'Oracle', 'res_tab', struct_query => 'issue_date >= (''1-OCT-1996'')')

Executing a structured query in this way improves performance over processing a query on a text column and then refining the hitlist by applying a where condition against a structured column. This is especially so when the selectivity of the where condition is high, because when you use the structured query parameter, the ConText server executes the entire query without first writing out a potentially large hitlist to be refined later by the Oracle server.
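For comparison, the slower alternative described above might look like the following sketch. The text query runs without STRUCT_QUERY, and the date condition is applied only when the hitlist is joined back to the text table. The table name NEWS and the columns PK and TITLE are assumptions made for illustration; NEWS_TEXT, RES_TAB, and ISSUE_DATE come from the example above.

```sql
-- Step 1: text query only; every matching document is written to RES_TAB
exec ctx_query.contains('news_text', 'Oracle', 'res_tab')

-- Step 2: the Oracle server refines the (possibly large) hitlist.
-- NEWS, PK, and TITLE are hypothetical names for the base table and its columns.
SELECT r.score, n.title
FROM res_tab r, news n
WHERE n.pk = r.textkey
  AND n.issue_date >= TO_DATE('01-OCT-1996', 'DD-MON-YYYY')
ORDER BY r.score DESC;
```

The STRUCT_QUERY form avoids writing hits to RES_TAB that the date condition would later discard.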

Note: If the user who includes a structured query in a two-step query is not the owner of the table containing the structured and text columns, the user must have SELECT privilege with GRANT OPTION on the table. In addition, if the object being queried is a view, the user must have SELECT privilege with GRANT OPTION on the base table for the view.

SELECT privilege with GRANT OPTION can be granted to a user using the GRANT command in SQL.

For more information, see Oracle7 Server SQL Reference.
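As a minimal sketch, the table owner could grant the privilege as follows. TEXTTAB is the table used in the earlier examples; the user name CTXUSER is hypothetical.

```sql
-- Run as the owner of TEXTTAB; CTXUSER is a hypothetical user name
GRANT SELECT ON texttab TO ctxuser WITH GRANT OPTION;
```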

Querying Columns in Remote Databases

If a database link has been created for a remote database, two-step queries support querying text columns in the remote database.

Note: Database links are created using the CREATE DATABASE LINK command in SQL.

For more information about creating database links, see Oracle7 Server SQL Reference.
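A database link might be created along the following lines. The link name DB1 is the one used in this section; the account, password, and connect string are placeholders, not values from this guide.

```sql
-- REMOTE_USER, REMOTE_PASSWORD, and the connect string are placeholders
-- that must be replaced with values valid for the remote database
CREATE DATABASE LINK db1
  CONNECT TO remote_user IDENTIFIED BY remote_password
  USING 'remote_connect_string';
```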

To perform a two-step query for a text column in a remote database, the database link for the remote database is specified in the CONTAINS procedure as part of the policy for the column in the remote database.

In addition, the result table specified in CONTAINS must exist in the remote database and the user performing the query must have the appropriate privileges on the result table.

For example:

	exec ctx_query.contains('MY_POL@DB1', 'petroleum', 'CTX_TEMP')

In this example, MY_POL exists in a remote database identified by the database link DB1. The CTX_TEMP result table exists in the same remote database.

For more information about remote queries and distributed databases, see Oracle7 Server Concepts.

Two-Step Queries in Parallel

The CONTAINS procedure provides an argument for processing two-step queries in parallel. Processing queries in parallel helps balance the load between ConText servers and may improve query performance.

When you call the CONTAINS procedure in a two-step query, use the PARALLEL argument to specify the number of ConText servers that process the query and write the results to the result table. The number cannot exceed the total number of ConText servers running with the Query personality.

For example:

	execute ctx_query.contains('ARTICLE_POLICY', 'petroleum', 'CTX_TEMP', parallel=>2)

In this example, the text column in the ARTICLE_POLICY policy is queried for documents that contain the term petroleum. The query is processed in parallel by any two available ConText servers with the Query personality and the results are written to CTX_TEMP.

Scoring

In a two-step query, the score results generated by the CONTAINS procedure are physically stored in a result table that has been allocated (either by the application developer or dynamically within the application).

If you want to include scores in the hitlist returned by a two-step query, the scores must be selected from the result table in the second step of the query.

Hitlist Result Tables

In two-step queries, ConText Option uses result tables called hitlist tables to store intermediate results. Intermediate results can be merged into the standard SQL query through a join operation or a sub-query operation. The result tables must be created before the query is performed. A hitlist table can be created manually or allocated through the CTX_QUERY.GETTAB procedure.

Hitlist tables can be named anything; however they must have the following structure:

Column Name	Column Datatype	Purpose
TEXTKEY	VARCHAR2(64)	Stores textkeys of the rows satisfying the query
SCORE	NUMBER	Stores the score for each row (document)
CONID	NUMBER	Stores the CONTAINS ID when multiple CONTAINS procedures utilize the same result table

For more information about the structure of hitlist result tables, see "Result Tables (Chapter 12)".

Sharing a Hitlist Result Table

For applications that may be used by multiple concurrent users, ConText Option allows for sharing a single result table among all the users rather than allocating a separate table for each user.

Usage of a shared results table is controlled by the application through the SHARE LEVEL and the QUERY ID parameters of the CONTAINS procedure. If the result table is shared, the CONTAINS procedure must specify that SHARE LEVEL = 1 and include a unique QUERY ID so that each result can be distinguished from others in the result table.

If SHARE LEVEL = 0 then:

the hitlist result table is intended for exclusive use

ConText Option truncates the hitlist result table at the start of each query

after the query is completed, CONID values are NULL

If SHARE LEVEL = 1 then:

the hitlist result table is intended for shared use

to identify which entries belong to each specific user in the hitlist result table, specify a unique number for QUERY ID in the CONTAINS procedure

This number will be assigned to the CONID for each row in the result table generated by the query.

before the query is run, you must delete existing rows in the result table with the same QUERY ID as that specified in the CONTAINS procedure

after the query is complete, the CONID column for all rows returned by the query contains the QUERY ID specified in the CONTAINS procedure

select the rows owned by the user by specifying the appropriate CONID in the WHERE clause of the SELECT statement

Attention: ConText Option does not verify that these rules are observed. You must control multiple concurrent usage by passing a different QUERY ID to the requestor if the result table is shared.
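The rules for a shared result table can be sketched as follows, reusing the ARTICLE_POLICY, CTX_TEMP, and TEXTTAB names from the earlier examples. The parameter names SHARELEVEL and QUERY_ID are assumptions standing in for the SHARE LEVEL and QUERY ID parameters described above, and the value 42 is an arbitrary identifier chosen by the application.

```sql
-- Remove rows left over from an earlier query that used the same QUERY ID
DELETE FROM ctx_temp WHERE conid = 42;
COMMIT;

-- SHARELEVEL and QUERY_ID are assumed parameter names (see the text above)
exec ctx_query.contains('ARTICLE_POLICY', 'petroleum', 'CTX_TEMP', sharelevel => 1, query_id => 42)

-- Select only the rows that belong to this query
SELECT c.score, t.title
FROM ctx_temp c, texttab t
WHERE t.pk = c.textkey
  AND c.conid = 42
ORDER BY c.score DESC;
```

Each concurrent user would need a different QUERY ID value, because ConText Option does not check for collisions.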

Composite Textkey Result Tables

When you execute a two-step query on a table with a composite textkey, the number of textkey columns in the result table must match the number of columns in the composite textkey of the document table. For example, if you want to execute a query on a document table that has a two-column textkey, create a result table with the following schema: TEXTKEY, TEXTKEY2, SCORE, CONID.

The following examples show two different ways to create a result table with a two-column composite textkey within SQL*Plus.

/* create composite textkey result table manually */

	create table ctx_temp(
		textkey varchar2(64),
		textkey2 varchar2(64),
		score number,
		conid number);

/* allocate composite textkey result table with CTX_QUERY.GETTAB() */

exec ctx_query.gettab(CTX_QUERY.HITTAB, :hit_tab, 2)
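In SQL*Plus, the bind variable that receives the table name has to be declared before GETTAB is called. The following sketch shows one way to do this; the width of 64 is an arbitrary choice, and ARTICLE2_POLICY is the policy name from the earlier composite textkey example.

```sql
-- Declare the bind variable that receives the name of the allocated table
variable hit_tab varchar2(64)

exec ctx_query.gettab(CTX_QUERY.HITTAB, :hit_tab, 2)

-- Display the name of the table that GETTAB allocated
print hit_tab

-- Pass the allocated table to CONTAINS as the result table
exec ctx_query.contains('ARTICLE2_POLICY', 'petroleum', :hit_tab)
```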

Using One-Step Queries

The one-step query uses the CONTAINS and SCORE functions in a SQL statement to execute a user's request for documents. Rows and columns containing the text and structured data for relevant documents are returned to the application program as a record set like any other query in SQL.

Note: Before one-step queries can be executed, the database in which the text resides must be text enabled by setting the ConText Option initialization parameter TEXT_ENABLE = TRUE. This can be done in two ways:

setting it in the initsid.ora system initialization file

using the ALTER SESSION command

For more information about initialization parameters and the initsid.ora file, see Oracle7 Server Administrator's Guide.

For more information about using the ALTER SESSION command, see Oracle7 Server SQL Reference.
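For the ALTER SESSION approach, the statement would look roughly like this; the setting then applies only to the current session.

```sql
-- Enable one-step queries for the current session only.
-- The equivalent line in the initsid.ora file would be: text_enable = true
ALTER SESSION SET TEXT_ENABLE = TRUE;
```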

Figure 2 - 3. One-Step Queries

One-Step Query Processing

After a user has submitted a one-step query, ConText Option performs the following tasks to return the results to the user:

1. The query is placed on the text queue (query pipe). The Oracle server intercepts the query and passes the text portion (CONTAINS) to ConText Option.

2. A ConText server processes the CONTAINS function(s) and stores the results in an internal result table.

3. The ConText server rewrites the query as a standard SQL statement and passes it back to Oracle.

4. The rewritten query is executed by an Oracle server and the results are returned to the user.

5. The internal result table is truncated.

One-Step Query Example

The following SELECT statement (diagrammed in Figure 2 - 4) shows a simple one-step query. This query searches a text table called TEXTTAB for any articles that contain the word petroleum.

	SELECT *
	FROM texttab
	WHERE CONTAINS (text, 'petroleum') > 0;

Because ConText Option functions execute within normal SQL statements, all of the capabilities for selecting and querying normal structured data fields, as well as text, are available. For instance, in the example, if the text table had a column listing the date the article was published, the user could select articles based on that date as well as the content of the text column.
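A query that combines both conditions might look like the following sketch. PUB_DATE is a hypothetical DATE column, and TITLE is assumed to exist in TEXTTAB as in the earlier examples.

```sql
-- PUB_DATE is a hypothetical DATE column holding the publication date
SELECT title, pub_date
FROM texttab
WHERE CONTAINS (text, 'petroleum') > 0
  AND pub_date >= TO_DATE('01-OCT-1996', 'DD-MON-YYYY');
```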

Figure 2 - 4. Diagram of a One-Step Query

Note: The asterisk wildcard character ( * ) in Figure 2 - 4 specifies that the record set returned by the query includes all the columns of the text table for the selected documents, as well as the scores generated for each document. If a query has more than one CONTAINS function, the asterisk wildcard does not return scores for the multiple CONTAINS and the SCORE function must be called explicitly. See "Scoring" in this chapter for an example.

Restrictions

The CONTAINS function can only appear in the WHERE clause of a SELECT statement.

You cannot issue the CONTAINS function in the WHERE clause of an UPDATE, INSERT or DELETE statement.
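One possible way to work within this restriction, offered here as an assumption rather than a documented procedure, is to identify the rows with a two-step query and then reference the result table from the DML statement. The sketch reuses the ARTICLE_POLICY, CTX_TEMP, and TEXTTAB names from the two-step examples.

```sql
-- Step 1: record the textkeys of the matching documents in CTX_TEMP
exec ctx_query.contains('ARTICLE_POLICY', 'petroleum', 'CTX_TEMP')

-- Step 2: plain SQL; no CONTAINS function appears in the DELETE itself
DELETE FROM texttab
WHERE pk IN (SELECT textkey FROM ctx_temp);
```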

Composite Textkey Queries

You can perform one-step queries on text tables with composite textkeys. The syntax for the query is the same as the syntax for a query on a table with a single-column textkey.

Querying Columns in Remote Databases

If a database link has been created for a remote database, one-step queries support querying text columns in the remote database.

To perform a one-step query for a text column in a remote database, the database link for the remote database is specified as part of the table name in the FROM clause of the SELECT statement.

For example:

	SELECT *
	FROM texttab@db1
	WHERE CONTAINS (text, 'petroleum') > 0;

In this example, texttab exists in a remote database identified by the database link DB1.

Note: One-step queries do not support querying LONG and LONG RAW columns in remote database tables.

For more information about creating database links, see Oracle7 Server SQL Reference.

For more information about remote queries and distributed databases, see Oracle7 Server Concepts.

Multiple CONTAINS

One-step queries support calling more than one CONTAINS function in the WHERE clause of a SELECT statement. Multiple CONTAINS can be used in a one-step query to perform queries on multiple text columns located either in the same table or in separate tables.

If multiple ConText servers with the Query personality are running and a one-step query with multiple CONTAINS is executed, the query is processed in parallel. Each CONTAINS function is evaluated by one of the available ConText servers and the results from the servers are combined before they are returned to the user.

Suggestion: If your application makes use of multiple CONTAINS in one-step queries, ensure that multiple ConText servers with the Query personality are running to optimize query performance. The number of ConText servers should be at least equal to the number of CONTAINS you support in one-step queries for the application.

Scoring

In a one-step query, the document scores are generated by the CONTAINS function and returned by the SCORE function.

Each CONTAINS function in a query produces a separate score. When there are multiple CONTAINS functions, each CONTAINS function must have a label (a number) so the SCORE value can be identified in other clauses of the SELECT statement.

The SCORE function may be used in a SELECT list, an ORDER BY clause or a GROUP BY clause.

For example:

	SELECT SCORE(10), SCORE(20), title
	FROM DOCUMENTS
	WHERE CONTAINS (TEXT, 'holmes', 10) > 0
	OR CONTAINS (TEXT, 'moriarty', 20) > 0
	OR CONTAINS (TEXT, 'baker street', 30) > 0
	ORDER BY SCORE(10);

Using In-Memory Queries

In-memory queries use a buffer and a CONTAINS cursor to the buffer to return query results. Returning query results to a buffer in memory improves performance over writing and reading query results to and from database result tables, which is typical of one- and two-step queries.

The process for performing in-memory queries is the following:

1. Call the CTX_QUERY.OPEN_CON function to open a CONTAINS cursor and execute the query.

OPEN_CON performs the following operations:

opens a cursor to the query buffer

queries a text column using the specified policy and query expression

stores in the query buffer the document textkeys and scores for all the documents that meet the search criteria

Hits are stored either in the order in which they are returned or ranked by score, depending on the argument specified for OPEN_CON.

In addition, OPEN_CON can be specified to return additional columns (up to five) for the selected documents from the text table.

2. Call the CTX_QUERY.FETCH_HIT function for each textkey in the buffer to fetch the desired query results, one hit at a time, until the desired number of hits has been returned or no hits remain in the buffer.

3. Call the CTX_QUERY.CLOSE_CON procedure to release the cursor opened by OPEN_CON.

Figure 2 - 5. In-Memory Queries

Limitations

In-memory queries have the following limitations:

Structured Queries

Because the OPEN_CON function does not support an additional struct_query parameter, you cannot query for structured data in an in-memory query.

MAX and FIRST/NEXT Operators

You cannot use the MAX and FIRST/NEXT operators with in-memory queries.

In-Memory Query Example

The following example shows a simple in-memory query. This query uses a policy named ARTICLES_POL to search the text column in a table named TEXTTAB for any articles that contain the word petroleum.

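declare 
  score  char(5);    -- score of the current hit
  pk     char(5);    -- textkey of the current hit
  curid  number;     -- CONTAINS cursor returned by OPEN_CON
  title  char(256);  -- additional column requested through other_cols

begin 
  dbms_output.enable(100000); 
  -- open the cursor and run the query; hits are buffered in score order
  curid := ctx_query.open_con(
                        policy_name  =>  'ARTICLES_POL',
                        text_query   =>  'petroleum',
                        score_sorted =>  true, 
                        other_cols   =>  'title'); 
  -- fetch one hit at a time until no hits remain in the buffer
  while (ctx_query.fetch_hit(curid, pk, score, title)>0)
   loop 
    dbms_output.put_line(score||pk||substr(title,1,50));
   end loop; 
  -- release the cursor
  ctx_query.close_con(curid);
end;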

In this example, the TITLE column from the table is also returned by OPEN_CON, so a variable must be declared for TITLE.

DBMS_OUTPUT.ENABLE sets the size of the output buffer to 100000 bytes so that the buffer is large enough to hold the results of the query.

The SCORE_SORTED argument in OPEN_CON is set to TRUE, which causes OPEN_CON to store the hits in the query buffer in descending order by score.

FETCH_HIT is called in a loop to fetch SCORE, PK, and TITLE for each hit until it returns a value that is not greater than zero, indicating that the buffer is empty.

DBMS_OUTPUT.PUT_LINE prints the results to the standard output.

For more information about the DBMS_OUTPUT PL/SQL package, see Oracle7 Server Application Developer's Guide.
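Because hits are fetched one at a time, an application can stop before the buffer is empty. The following sketch is a variation on the example above that returns at most ten hits; the variables HITS and MAX_HITS and the limit of ten are assumptions introduced for illustration, while the CTX_QUERY calls are used exactly as in the example.

```sql
declare
  score     char(5);
  pk        char(5);
  curid     number;
  title     char(256);
  hits      number := 0;            -- hits fetched so far (illustrative)
  max_hits  constant number := 10;  -- illustrative limit
begin
  dbms_output.enable(100000);
  curid := ctx_query.open_con(
             policy_name  => 'ARTICLES_POL',
             text_query   => 'petroleum',
             score_sorted => true,
             other_cols   => 'title');
  -- AND short-circuits in PL/SQL, so FETCH_HIT is not called
  -- again once the limit has been reached
  while hits < max_hits
        and ctx_query.fetch_hit(curid, pk, score, title) > 0
  loop
    hits := hits + 1;
    dbms_output.put_line(score || pk || substr(title, 1, 50));
  end loop;
  -- release the cursor even though hits may remain in the buffer
  ctx_query.close_con(curid);
end;
/
```

With SCORE_SORTED set to TRUE, the hits returned this way are the highest-scoring ones.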

Querying Columns in Remote Databases

If a database link has been created for a remote database, in-memory queries support querying text columns in the remote database.

Note: Database links are created using the CREATE DATABASE LINK command in SQL.

For more information about creating database links, see Oracle7 Server SQL Reference.

To perform an in-memory query for a text column in a remote database, the database link for the remote database is specified in the OPEN_CON procedure as part of the policy for the column in the remote database.

In addition, the result table specified in CONTAINS must exist in the remote database and the user performing the query must have the appropriate privileges on the result table.

For more information about remote queries and distributed databases, see Oracle7 Server Concepts.

Query Issues

This section discusses some of the issues that need to be addressed when developing a query application, specifically:

limiting the size of hitlists

selecting a query method

Limiting the Size of Hitlists

The MAX operator allows you to specify the maximum number of documents that will be retrieved by a query. The documents are returned in order of score, so the most relevant documents will be returned first.

This operator is particularly useful to prevent writing a large number of records to the hitlist table, which could result in performance degradation. However, there are some limitations to using this operator.

The disadvantage of this scheme is that mixed queries might not return the desired results.

For example, suppose you have a query that would normally return 1000 hits. If you use the maximum documents operator to limit the number of hits to 100, for a simple query, you will get the top 100 hits ordered by score. However, if you add a structured condition, and the 900 hits that you cut out of your query happen to match the structured condition, you will not have those results returned.

Selecting a Query Method

Each of the query methods (two-step, one-step, and in-memory) has advantages and disadvantages that you must consider when developing an application.

The following table illustrates the various advantages and disadvantages to the different methods:

Query Method	Best Suited For	Advantages	Disadvantages
One-step	Applications that provide interactive and ad-hoc queries using SQL*Plus	1. No pre-allocation of result tables 2. Uses standard SQL statements 3. Uses table and column names 4. Queries returned in a single step 5. Can retrieve all hits at once	1. Generally slower than two-step or in-memory queries 2. No access to result tables
Two-step	PL/SQL-based applications that return very large hitlists and in which query response time is not critical	1. Result tables can be manipulated 2. Generally faster than one-step queries, especially for mixed queries 3. Can retrieve all hits at once 4. Structured data can be queried as part of the CONTAINS (first step)	1. Requires pre-allocation of result tables 2. Uses policy names 3. Requires two steps to complete 4. Requires join to base text table to return document details
In-memory	PL/SQL-based applications that return large hitlists or in which query response time is more critical (e.g. World Wide Web applications)	1. No result tables 2. Faster response time 3. Large hitlists generally faster than one-step and two-step queries 4. Can specify the number of hits returned	1. Uses policy names 2. Cannot retrieve all hits at once 3. With small hitlists, performance improvement over two-step is negligible 4. Requires three steps, including a loop, to complete 5. Queries for structured data must be performed separately and joined with in-memory results 6. MAX and FIRST/NEXT operators are not supported

Table 2 - 1. Comparison of Query Methods
