Oracle8 ConText Cartridge Application Developer's Guide
Release 2.4
A63821-01


3
Understanding Query Expressions

This chapter explains how to use ConText to create query expressions that find relevant text in documents.

About Query Expressions

A query expression defines the search criteria for retrieving documents using ConText. A query expression consists of query terms (words and phrases) and other components, such as operators and special characters, that let users specify exactly which documents ConText retrieves.

A query expression can also call stored query expressions (SQEs) to return stored query results or call PL/SQL functions to return values used in the query.

When a query is executed using any of the methods supported by ConText, one of the query's arguments is a query expression. ConText then returns a list of all the documents that satisfy the search criteria, along with scores that measure the relevance of each document to the search criteria.

Query Terms

Query terms can consist of words and phrases. Query terms can also contain stopwords.

Words and Phrases

The words in a query expression are the individual tokens on which the query expression operators perform an action. If multiple words are contained in a query expression, separated only by blank spaces (no operators), the string of words is considered a phrase and the entire string is searched for during a query.

Stopwords

Stopwords are common words, such as "and", "the", "of", and "to", that are not considered significant query terms by themselves because they occur so often in text. However, stopwords can provide useful search information when combined with more significant terms.

For example, a query for documents containing the phrase "peanut butter and jelly" returns different results than a query for documents containing the terms "peanut butter" and "jelly".

When you define a policy for a column, ConText lets you identify a list of stopwords. When stopwords are encountered in the documents in the column, they are not included as indexed terms in the text index; however, they are recorded.

As a result, stopwords cannot be searched for explicitly in text queries, but can be included as part of a phrase in a query expression.
 

See Also: 

For more information about querying with stopwords, see "Querying with Stopwords" in this chapter. 

 
 

Stoplists can be created in any language supported by ConText. ConText provides a default stoplist in English.
 


Note: 

Stopwords do not have an effect on the theme indexes generated by ConText for your English-language documents. 


 
 

Query Expression Components

In addition to query terms, a query expression may contain any or all of the following components:

Component                        Purpose
-------------------------------  ---------------------------------------------
Operators                        Define the relationships between the terms in
                                 a query expression and specify the output
                                 returned by the query. The different types of
                                 operators are: logical, ranking, result set,
                                 proximity, expansion, and thesaurus.
Wildcard Characters              Expand query terms using pattern matching.
Grouping Characters              Group terms and operators in a query
                                 expression.
Stored Query Expressions (SQEs)  Return the results of a query that has been
                                 executed and the results stored in an SQE
                                 table.
PL/SQL Functions                 Execute a function and use the results in a
                                 query expression.

 

Case-Sensitive Queries

ConText supports case-insensitivity for text queries and case-sensitivity for both text and theme queries.

Text Queries

With text queries, you can issue case-sensitive and case-insensitive queries. The ability to query in a case-sensitive way depends on the lexer preference used to index the document set.

By default, ConText uses a lexer preference that is not case-sensitive when indexing documents. Therefore, with a policy containing the default lexer preference, queries are not case-sensitive. When queries are not case-sensitive, a query on United returns the same hits as a query on united.

To issue case-sensitive text queries, you or your ConText administrator must first index your document set using a policy with a case-sensitive lexer preference. Using the same policy, you can issue case-sensitive queries. With case-sensitive queries, a query on United is different from a query on united.
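The effect of the lexer setting can be modeled in a few lines of Python. This is a hypothetical sketch, not ConText code: the helpers index_terms and matches are illustrative names, and a real lexer also handles stopwords, punctuation, and language-specific rules. It assumes that the query term is folded in the same way as the indexed tokens.

```python
import re

def index_terms(text, case_sensitive=False):
    """Return the set of tokens a hypothetical lexer would place in the index."""
    tokens = re.findall(r"\w+", text)
    if case_sensitive:
        return set(tokens)                # keep tokens exactly as written
    return {t.lower() for t in tokens}    # fold every token to lower case

def matches(term, text, case_sensitive=False):
    """Fold the query term the same way as the index, then look it up."""
    key = term if case_sensitive else term.lower()
    return key in index_terms(text, case_sensitive)

doc = "The United Nations met in New York."
print(matches("united", doc))                       # True
print(matches("united", doc, case_sensitive=True))  # False
print(matches("United", doc, case_sensitive=True))  # True
```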

Case-sensitive querying helps to identify words that have different meaning when capitalized. For example, to query on the proper noun Church (as someone's name) without getting the hits for the common noun church, you issue Church as your query. ConText returns all appearances of Church.
 


Note: 

Because a case-sensitive query on a term such as Church returns all appearances of Church, the hitlist includes occurrences of Church at the beginning of a sentence, whether it is the common or proper noun. 


 
 
Stopwords and Case-Sensitivity

When you have case-sensitivity enabled, searches on stopwords are also case-sensitive. Thus when you issue a case-sensitive query on a phrase containing stopwords and non-stopwords, ConText searches for the phrase containing the stopwords with the specified case.

For example, assuming the word on is a stopword and case-sensitivity is enabled, a search on the phrase "on the waterfront" does not return hits for documents containing the phrase "On the waterfront".

Theme Queries

Theme queries are case-sensitive. For example, a query on Turkey produces hits on Turkey the country and not turkey the bird.
 

See Also: 

For more information about case-sensitive theme queries, see Chapter 4, "Theme Queries"

 
 

Composite Word Queries (German and Dutch only)

German and Dutch text contains composite words. With ConText, you can create a composite index and then issue queries that find composite words using a sub-composite word as your query term.

To query against a composite index with two-step or in-memory queries, you must specify the policy associated with the composite index. For one-step queries, you must specify the policy if the text column has more than one index attached to it.
 

See Also: 

For more information about creating a composite index for German, see Oracle8 ConText Cartridge Administrator's Guide. 

 
 

German Example

When using a German composite index, a query on the term Bahnhof (train station) returns documents that contain Bahnhof or any word containing Bahnhof as a sub-composite, such as Hauptbahnhof, Nordbahnhof, or Ostbahnhof.

However, a query on Bahnhof does not return documents that contain the single words Bahn or Hof.

Dutch Example

When using a Dutch composite index, a query on the term kapitein returns documents that contain kapitein or any word containing kapitein as a sub-composite, such as scheepskapitein.

Highlighting Composite Terms

You can use text highlighting with composite word queries. When you do so, ConText highlights the entire composite word, not just the sub-composite you entered as your query.

For example, when you issue Bahnhof as your query, ConText highlights the words Hauptbahnhof, Nordbahnhof, and Ostbahnhof entirely.
 

See Also: 

For more information on highlighting text queries, see Chapter 6, "Document Presentation: Highlighting"

 
 

Base-Letter Queries

For languages that use an 8-bit character set, such as French and Spanish, ConText gives you the option of converting characters to their base-letter representation before text indexing. This means that words with tildes, accents, umlauts, and so on are converted to their base-letter representation before their tokens are placed in the text index.

When a query uses a text index that was created with base-letter conversion, ConText converts the term in the query expression to its base-letter representation before the query is processed.

The result is that, with base-letter conversion on for a Spanish text index, a query on mañana returns documents that contain mañana and manana.

However, with base-letter conversion off for a Spanish text index, a query on mañana returns only documents that contain mañana.

In addition, all expansion and stopword checking for the query is performed on the base-letter terms.
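The following Python sketch illustrates base-letter conversion on both sides of the lookup: indexed tokens and the query term are converted the same way. It assumes that Unicode decomposition (NFD) followed by removal of combining marks approximates ConText's conversion for 8-bit character sets; the helpers base_letter and search are hypothetical.

```python
import unicodedata

def base_letter(term):
    """Decompose characters (NFD) and drop combining marks such as tildes and accents."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def search(term, docs, convert):
    """Return ids of docs containing term, optionally using base-letter conversion."""
    key = base_letter(term) if convert else term
    hits = []
    for doc_id, text in docs.items():
        tokens = text.split()
        if convert:
            # conversion happens before tokens are "indexed"
            tokens = [base_letter(t) for t in tokens]
        if key in tokens:
            hits.append(doc_id)
    return hits

docs = {1: "hasta mañana", 2: "hasta manana"}
print(search("mañana", docs, convert=True))   # [1, 2]
print(search("mañana", docs, convert=False))  # [1]
```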
 

See Also: 

For more information about creating an index that supports base-letter conversion, see Oracle8 ConText Cartridge Administrator's Guide. 

 
 

Thesaural Queries

The terms in a thesaural query are not converted to base-letter representation before look-up in the thesaurus. The base-letter conversion takes place after the thesaurus look-up and is performed on all the terms returned by the thesaurus.

Query Expression Examples

The following example of a one-step query returns all articles that contain the word wine in the TEXTTAB.TEXTCOL column. The query expression consists only of the query term wine, surrounded by single quotes.

SELECT articles FROM texttab
WHERE CONTAINS(textcol, 'wine') > 0;

The following example of a one-step query returns all articles that contain the phrase wine and roses in the TEXTTAB.TEXTCOL column. The query expression consists of the query phrase wine and roses, enclosed in braces and surrounded by single quotes. The braces cause and to be treated as part of the phrase rather than as the AND operator.

SELECT articles FROM texttab 
WHERE CONTAINS(textcol, '{wine and roses}') > 0;
 
See Also: 

For more information about the CONTAINS function used in one-step queries, see CONTAINS in Chapter 9

 
 

Logical Operators

Logical operators combine the terms in a query expression. All single words and phrases may be combined with logical operators. When query terms are combined, the number of spaces around the logical operator is not significant.

Logical operators link query terms together to produce scores that are based on the relationship of the terms to each other. The logical operators combine the scores of their operands up to a maximum value of 100. Operands can be any query terms, as well as other operators.

Operator     Syntax             Description
-----------  -----------------  ---------------------------------------------
AND          term1&term2        Returns documents that contain term1 and
             term1 and term2    term2. Returns the minimum score of its
                                operands: all query terms must occur, and the
                                lower score is taken.
OR           term1|term2        Returns documents that contain term1 or term2.
             term1 or term2     Returns the maximum score of its operands: at
                                least one term must occur, and the higher
                                score is taken.
NOT          term1~term2        Returns documents that contain term1 and not
             term1 not term2    term2.
EQUIVALENCE  term1=term2        Specifies that term2 is an acceptable
             term1 equiv term2  substitution for term1.

 

AND Operator

Use the AND operator to search for documents that contain at least one occurrence of each of the query terms. For example, to obtain all the documents that contain the terms batman and robin and penguin, issue the following query:

'batman & robin & penguin'

In an AND query, the score returned is the score of the lowest query term. In the example above, if the three individual scores for the terms batman, robin, and penguin are 10, 20, and 30 within a document, the document scores 10.

OR Operator

Use the OR operator to search for documents that contain at least one occurrence of any of the query terms. For example, to obtain the documents that contain the term cats or the term dogs, use one of the following:

'cats | dogs'
'cats OR dogs'

In an OR query, the score returned is the score for the highest query term. In the example above, if the scores for cats and dogs are 30 and 40 within a document, the document scores 40.
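The AND and OR scoring rules reduce to taking the minimum or maximum of the operand scores. The Python sketch below reproduces the two examples above; it assumes that a score of 0 means a term does not occur in the document, and the helper names are hypothetical.

```python
def and_score(*scores):
    """AND: the lowest operand score; 0 (term absent) means no match."""
    return min(scores)

def or_score(*scores):
    """OR: the highest operand score; matches if any operand scores above 0."""
    return max(scores)

print(and_score(10, 20, 30))  # 10, as for 'batman & robin & penguin'
print(or_score(30, 40))       # 40, as for 'cats | dogs'
print(and_score(10, 0, 30))   # 0: one term is missing, so the document is not returned
```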

NOT Operator

Use the NOT operator to search for documents that contain one query term and not another.

For example, to obtain the documents that contain the term animals but not dogs, use the following expression:

'animals ~ dogs'

Similarly, to obtain the documents that contain the term transportation but not automobiles or trains, use the following expression:

'transportation not (automobiles or trains)'
 

Note: 

The NOT operator does not affect the scoring produced by the other logical operators. 


 
 

Equivalence Operator

Use the equivalence operator to specify an acceptable substitution for a word in a search. For example, if you want all the documents that contain the phrase "alsatians are big dogs" or "labradors are big dogs", you can write:

'labradors=alsatians are big dogs'

ConText processes the above query faster and more efficiently than the same query written with the accumulate operator. For example, you could write the above query less efficiently and less concisely as follows:

'labradors are big dogs, alsatians are big dogs'

The savings you gain by using the equivalence operator instead of the accumulate operator are most significant when the query expression contains more than one equivalence operator. For example, the following query

'labradors=alsatians are big canines=dogs'

is a more efficient, more concise form of:

'labradors are big dogs, 
alsatians are big dogs, 
alsatians are big canines, 
labradors are big canines'
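Logically, each equivalence operator multiplies the number of alternative phrases. The hypothetical Python helper below rewrites an expression that contains only plain words and = into the accumulate form it abbreviates. It illustrates the expansion only; it does not describe how ConText evaluates equivalence internally, which the text above notes is more efficient.

```python
from itertools import product

def expand_equivalences(expression):
    """Rewrite 'a=b c d=e' style input as comma-separated (accumulate) phrases.

    Hypothetical helper: it splits only on whitespace and '=' and ignores
    every other ConText operator.
    """
    alternatives = [word.split("=") for word in expression.split()]
    phrases = [" ".join(choice) for choice in product(*alternatives)]
    return ", ".join(phrases)

print(expand_equivalences("labradors=alsatians are big canines=dogs"))
# labradors are big canines, labradors are big dogs, alsatians are big canines, alsatians are big dogs
```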
Precedence of Equivalence Operator

The equivalence operator has higher precedence than all other operators except the unary operators (fuzzy, soundex, stem, and PL/SQL function calls).

WITHIN Operator

You can use the WITHIN operator to narrow a query down to document sections. Document sections can be sentences, paragraphs, or user-defined sections.

WITHIN Syntax

The syntax for the WITHIN operator is as follows:

Syntax                        Description
----------------------------  ---------------------------------------------
expression WITHIN SENTENCE    Searches for documents that contain expression
                              within a sentence. Specify an AND or NOT query
                              for expression.
expression WITHIN PARAGRAPH   Searches for documents that contain expression
                              within a paragraph. Specify an AND or NOT query
                              for expression.
term WITHIN section           Searches for term within the pre-defined
                              section. The WITHIN operator has no effect on
                              score.

 

Querying Within Sentences or Paragraphs

Querying within sentence or paragraph boundaries is useful to find combinations of words that occur in the same sentence or paragraph.

Examples

To find documents that contain dog and cat within the same sentence:

'(dog and cat) WITHIN SENTENCE'

To find documents that contain dog and cat within the same paragraph:

'(dog and cat) WITHIN PARAGRAPH'

To find documents that contain sentences with the word dog but not cat:

'(dog not cat) WITHIN SENTENCE'

Querying Within User-defined Sections

Use the WITHIN operator to narrow down a query into user-defined document sections.

For example, in an HTML document set, you or your ConText administrator can define a section for all headings delimited with <HEAD> and </HEAD> and then issue a query for a term in a heading across all documents.
 


Note: 

The WITHIN operator requires you to know the name of the section you wish to search. A list of defined sections can be obtained using the CTX_ALL_SECTIONS or CTX_USER_SECTIONS views. 


 
  
See Also: 

For more information about defining sections, see the Oracle8 ConText Cartridge Administrator's Guide. 

 
 

Examples

To find all the documents that contain the term San Francisco within the user-defined section Headings, write your query as follows:

'San Francisco WITHIN Headings'

To find all the documents that contain the term sailing and contain the term San Francisco within the user-defined section Headings, write your query in one of two ways:

'(San Francisco WITHIN Headings) and sailing'

'sailing and San Francisco WITHIN Headings'

To find all documents that contain the terms dog and cat within the same user-defined section Headings, write your query as follows:

'(dog and cat) WITHIN Headings'

Note that the above query is logically different from:

'dog WITHIN Headings and cat WITHIN Headings'

which finds all documents that contain dog and cat where the terms dog and cat are in Headings sections, regardless of whether they occur in the same Headings section or different sections.
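The difference between the two forms can be sketched in Python. The model below is hypothetical: it represents a document as a list of (section name, set of words) pairs and ignores word order, scoring, and the way ConText actually stores sections.

```python
def within_same_section(doc, terms, section):
    """'(t1 and t2) WITHIN section': every term occurs in one instance of the section."""
    return any(name == section and all(t in words for t in terms)
               for name, words in doc)

def each_within_section(doc, terms, section):
    """'t1 WITHIN section and t2 WITHIN section': each term occurs in some instance."""
    return all(any(name == section and t in words for name, words in doc)
               for t in terms)

# Hypothetical document: dog and cat appear in different Headings sections.
doc = [("Headings", {"dog", "show"}),
       ("Body", {"dog", "cat", "park"}),
       ("Headings", {"cat", "adoption"})]

print(within_same_section(doc, ["dog", "cat"], "Headings"))  # False
print(each_within_section(doc, ["dog", "cat"], "Headings"))  # True
```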

To find all documents in which dog is near cat within the section Headings, write your query as follows:

'dog near cat WITHIN Headings'

Limitations

The WITHIN operator has the following limitations:

Score-Changing Operators

Score-changing operators behave like logical operators in that they return documents based on the terms you specify. However, these operators affect document scores differently and can therefore be used to change a document's rank in a hitlist with respect to a query term. The following table describes these operators:

Operator     Syntax             Description
-----------  -----------------  ---------------------------------------------
ACCUMULATE   term1,term2        Returns documents that contain term1 or term2.
             term1 accum term2  Calculates score by adding the score of each
                                operand. Similar to OR, except that the
                                returned score is the sum of all scores.
MINUS        term1-term2        Returns documents that contain term1.
             term1 minus term2  Calculates score by subtracting occurrences
                                of term2 from occurrences of term1.
WEIGHT       term*n             Returns documents that contain term.
                                Calculates score by multiplying the raw score
                                of term by n, where n is a number from 0.1 to
                                10.
 

Accumulate Operator

Use the accumulate operator to search for documents that contain at least one occurrence of any of the query terms, where the documents that contain the most frequent occurrences of the query terms are given the highest score.

For example, to search for documents that contain either the term Brazil or the term soccer, and to give the highest scores to the documents that contain the most occurrences of these words, you can issue:

'soccer,Brazil'

Accumulate is similar to OR, in the sense that a document satisfies the query expression if any of the terms occur in the document; however, the scoring is different. OR returns a score based only on the query term that occurs most frequently in a document. Accumulate combines the scores for all the query terms that occur in a document, topping out at 100 when the sum exceeds 100. Thus documents that contain the most query terms are ranked the highest.
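The following Python sketch contrasts OR and accumulate scoring for the query 'soccer,Brazil'. The per-term scores are invented for illustration, and the helpers assume the capping rule described above (a sum above 100 is reported as 100).

```python
def or_score(*scores):
    """OR: keep only the highest operand score."""
    return max(scores)

def accum_score(*scores):
    """ACCUMULATE: add the operand scores, topping out at 100."""
    return min(sum(scores), 100)

# Hypothetical (soccer, Brazil) scores in three documents.
docs = {"X": (40, 0), "Y": (30, 30), "Z": (70, 60)}
for name, (soccer, brazil) in docs.items():
    print(name, or_score(soccer, brazil), accum_score(soccer, brazil))
# X 40 40
# Y 30 60
# Z 70 100
```

Document Y ranks below X under OR but above X under accumulate, because it contains both query terms.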

MINUS Operator

Use the MINUS operator to search for documents that contain a query term when you want the presence of a second query term to lower a document's rank.

The minus operator is useful for lowering the score of documents that contain "noise". For example, suppose a query on the term cars always returned high scoring documents about Ford cars. You can lower the scoring of the Ford documents by using the expression:

'cars - Ford'

In essence, this expression returns the documents that contain the term cars. However, the score returned for a document is the number of occurrences of cars minus the number of occurrences of Ford. When a returned document does not contain Ford, the occurrence of the term Ford is counted as zero.

Weight Operator

The weight operator multiplies the score by the given factor, topping out at 100 when the product exceeds 100. For example, the query 'cat, dog*2' sums the score of cat with twice the score of dog, topping out at 100 when the sum is greater than 100.

In expressions that contain more than one query term, use the weight operator to adjust the relative scoring of the query terms. You can reduce the score of a query term by using the weight operator with a number less than 1; you can increase the score of a query term by using the weight operator with a number greater than 1 (up to 10).

The weight operator is useful in accumulate, OR, or AND queries when the expression has more than one query term. With no weighting on individual terms, the score cannot tell you which of the query terms occurs the most. If you are interested in documents that contain a particular query term more than another term, the overall ranking tells you nothing about which documents pertain to the term that you are most interested in.

Example

You have a collection of sports articles. You are interested in the articles about soccer, in particular Brazilian soccer. It turns out that a regular query on soccer, Brazil returns many high ranking articles on US soccer. To raise the ranking of the articles on Brazilian soccer, you can issue the following query:

'soccer, Brazil*3'

Table 3-1 illustrates how the weight operator can change the ranking of three hypothetical documents A, B, and C, which all contain information about soccer. The columns in the table show the total score of four different query expressions on the three documents.

Table 3-1

Document  soccer  Brazil  soccer,Brazil  soccer,Brazil*3
--------  ------  ------  -------------  ---------------
A             20      10             30               50
B             10      30             40              100
C             50      10             60               80

The score in the soccer,Brazil column is the sum of the scores in the soccer and Brazil columns. The score in the soccer,Brazil*3 column is the score in the soccer column plus three times the score in the Brazil column.

With the initial query of soccer,Brazil, the documents are ranked in the order C B A. With the query of soccer,Brazil*3, the documents are ranked B C A, which is the preferred ranking.
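The arithmetic in Table 3-1 can be checked with a short Python sketch that applies the weight and accumulate rules to the per-term scores from the table. The helper names are hypothetical, and the sketch assumes that both a weighted score and an accumulated score top out at 100, as described above.

```python
def accum_score(*scores):
    """ACCUMULATE: sum of operand scores, topping out at 100."""
    return min(sum(scores), 100)

def weight(score, n):
    """WEIGHT: multiply a term score by n (0.1 to 10), topping out at 100."""
    return min(score * n, 100)

# (soccer, Brazil) scores for documents A, B, and C from Table 3-1.
scores = {"A": (20, 10), "B": (10, 30), "C": (50, 10)}

plain = {d: accum_score(s, b) for d, (s, b) in scores.items()}
weighted = {d: accum_score(s, weight(b, 3)) for d, (s, b) in scores.items()}

print(plain)     # {'A': 30, 'B': 40, 'C': 60}
print(weighted)  # {'A': 50, 'B': 100, 'C': 80}
print(sorted(scores, key=plain.get, reverse=True))     # ['C', 'B', 'A']
print(sorted(scores, key=weighted.get, reverse=True))  # ['B', 'C', 'A']
```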

NEAR Operator

Use the near operator to have ConText return a score based on the proximity of two or more query terms. ConText returns higher scores for terms closer together and lower scores for terms farther apart in a document. 


Note: 

The NEAR operator works with only text queries. You cannot use NEAR with theme queries. 


 
 

The syntax for the near operator is as follows:

Operator  Syntax
--------  -------------------------------------------------------
NEAR      NEAR((word1, word2, ..., wordn) [, MAX_SPAN [, ORDER]])

 
 

wordn

Specify the terms in the query separated by commas. The query terms can be single words or phrases.

MAX_SPAN

Optionally specify the size of the biggest clump. The default is 100. ConText returns an error if you specify a number greater than 100.

A clump is the smallest group of words in which all query terms occur. All clumps begin and end with a query term.

For near queries with two terms, max_span is the maximum distance allowed between the two terms. For example, to query on dog and cat where dog is within 6 words of cat, issue the following query:

'near((dog, cat), 6)'
ORDER

Specify TRUE for ConText to search for terms in the order you specify. The default is FALSE.

For example, to search for the words monday, tuesday, and wednesday in that order with a maximum clump size of 20, issue the following query:

'near((monday, tuesday, wednesday), 20, TRUE)'
 

Note: 

To specify ORDER, you must always specify a number for the MAX_SPAN parameter. 


 
 

ConText might return different scores for the same document when you use identical query expressions that have the ORDER flag set differently. For example, ConText might return different scores for the same document when you issue the following queries:

'near((dog, cat), 50, FALSE)'
'near((dog, cat), 50, TRUE)'

Near Scoring

The scoring for the near operator combines frequency of the terms with proximity of terms. For each document that satisfies the query, ConText returns a score between 1 and 100 that is proportional to the number of clumps in the document and inversely proportional to the average size of the clumps. This means many small clumps in a document result in higher scores, since small clumps imply closeness of terms.

The number of terms in a query also affects score. Queries with many terms, such as seven, generally need fewer clumps in a document to score 100 than do queries with few terms, such as two.

A clump is the smallest group of words in which all query terms occur. All clumps begin and end with a query term. You can define clump size with the max_span parameter as described in this section.
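The clump definition can be made concrete with a Python sketch that finds the clumps for an unordered near query in the chocolate and vanilla text used in the highlighting example in this section. It does not compute ConText scores, because the exact scoring formula is not given here. It assumes that words are runs of letters, that matching is case-insensitive, and that clump size is the number of words from the first query term to the last, inclusive; ConText's own tokenization and span arithmetic may differ. The helper find_clumps is hypothetical.

```python
import re

def find_clumps(text, terms, max_span=100):
    """Return (start, end) word positions of the clumps for an unordered near query."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    wanted = {t.lower() for t in terms}
    candidates = []
    for start, word in enumerate(words):
        if word not in wanted:
            continue                      # every clump begins with a query term
        seen = set()
        for end in range(start, len(words)):
            if words[end] in wanted:
                seen.add(words[end])
            if seen == wanted:
                # shortest window starting here that contains every term
                candidates.append((start, end))
                break
    # a clump is the smallest group: drop windows that contain another candidate
    clumps = [c for c in candidates
              if not any(o != c and c[0] <= o[0] and o[1] <= c[1] for o in candidates)]
    return [(s, e) for s, e in clumps if e - s + 1 <= max_span]

text = ("Chocolate and vanilla are my favorite ice cream flavors.  I like chocolate "
        "served in a waffle cone, and vanilla served in a cup with caramel syrup.")
print(find_clumps(text, ["chocolate", "vanilla"], 100))  # [(0, 2), (2, 11), (11, 18)]
print(find_clumps(text, ["chocolate", "vanilla"], 4))    # [(0, 2)]
```

With a maximum span of 100, all three clumps qualify and cover every occurrence of both terms; with a maximum span of 4, only the first clump remains, which agrees with the two highlighting results shown in this section.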

Near with Other Operators

You can use the near operator with other operators such as AND and OR. Scores are calculated in the regular way.

For example, to find all documents that contain the terms tiger, lion, and cheetah where the terms lion and tiger are within 10 words of each other, issue the following query.

'near((lion, tiger), 10) AND cheetah'

The score returned for each document is the lower score of the near operator and the term cheetah.

You can also use the equivalence operator to substitute a single term in a near query:

'near((stock crash, Japan=Korea), 20)'

This query asks for all documents that contain the phrase stock crash within twenty words of Japan or Korea.

Backward Compatibility Near Syntax

You can write near queries using the syntax of ConText release 2.3.6 and before. For example, to find all documents where lion occurs near tiger, you can write:

'lion near tiger'

or with the semi-colon as follows:

'lion;tiger'

This query is equivalent to the following query:

'near((lion, tiger), 100, FALSE)'
 

Note: 

Only the syntax of the near operator is backward compatible. In the example above, the score returned is calculated using the clump method as described in this section. 


 
 

Highlighting with the Near Operator

When you use highlighting and your query contains the near operator, all occurrences of all terms in the query that satisfy the proximity requirements are highlighted. Highlighted terms can be single words or phrases.

For example, assume a document contains the following text:

Chocolate and vanilla are my favorite ice cream flavors.  I like chocolate 
served in a waffle cone, and vanilla served in a cup with caramel syrup.

If the query is near((chocolate, vanilla), 100, FALSE), the following is highlighted:

<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors.  I like 
<<chocolate>> served in a waffle cone, and <<vanilla>> served in a cup 
with caramel syrup.

However, if the query is near((chocolate, vanilla), 4, FALSE), only the following is highlighted:

<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors.  I like 
chocolate served in a waffle cone, and vanilla served in a cup with caramel 
syrup.
 
See Also: 

For more information about highlighting, see Chapter 6, "Document Presentation: Highlighting"

 
 

Section Searching and Near

You can use the NEAR operator with the WITHIN operator for section searching as follows:

'near((dog, cat), 10) WITHIN Headings'

When evaluating expressions such as these, ConText looks for clumps that lie entirely within the given section.

In the example above, only those clumps that contain dog and cat that lie entirely within the section Headings are counted. That is, if the term dog lies within Headings and the term cat lies five words from dog, but outside of Headings, this pair of words does not satisfy the expression and is not counted.

Result-Set Operators

Use the result-set operators to control what documents are returned from a query result set. The operands for these operators are expressions, which can be an individual query term or a logical combination of query terms that use other operators. 


Note: 

Because these operators manipulate a result set, they cannot be embedded within each other; they must be placed at the outermost level of the query expression. 

These operators also have no effect on highlighting with CTX_QUERY.HIGHLIGHT. 


 
 

Result-set operators are typically used to exclude noise (irrelevant documents) from the hitlist and to retrieve documents from a hitlist more efficiently. There are three result-set operators:

Operator    Syntax          Description
----------  --------------  -----------------------------------------------
THRESHOLD   expression>n    Returns only those documents in the result set
                            that score above the threshold n.
            term>n          Within an expression, selects documents that
                            contain the query term with a score of at
                            least n.
MAX         expression:n    Returns the first n highest-scoring documents.
                            For example, :20 means to return the top 20
                            documents in the hitlist. The value n must be
                            an integer between 1 and 65535.
FIRST/NEXT  expression#m-n  Returns the specified number of documents as
                            ordered in the hitlist, range m to n.

 

Threshold Operator

You can use the threshold operator in two ways:

Expression level

Use the expression level threshold operator to eliminate documents in the result set that score below a threshold number. For example, to search for documents that contain relational databases and to return only documents that score greater than 75, use the following expression:

'relational databases > 75'

Query Term Level

Use the query term threshold operator in a query expression to select a document based on how a term scores in the document. For example, to select documents that have at least a score of 30 for lion and contain tiger, use:

'(lion > 30) and tiger'

Max Operator

Use the max operator to retrieve a given number of the highest scoring documents. For example, to obtain the twenty highest scoring documents that contain the word dance, you can write:

'dance:20'

The max operator is particularly useful to prevent writing a large number of records to the hitlist table, which could result in performance degradation.
 


Note: 

The max operator cannot be used with the CTX_QUERY.COUNT_HITS function or with in-memory queries. 


 
 

First/Next Operator

Use the first/next operator to return a specified range of documents from the hitlist.
 


Note: 

In a first/next query, the order of the returned documents is not based on score or textkey. ConText returns the documents in the order in which it encounters them in the queried text column.


 
 

For example, to return the first 10 documents encountered by ConText that contain the term dog, use the following expression:

'dog#1-10'

You could then return the next 10 documents using the following expression:

'dog#11-20'

The first/next operator can be used to create an application interface in which query results (rows in the hitlist) are returned incrementally. Because the query results are returned incrementally, query response is generally faster. The application can display the hitlists in a more manageable size, and control can be returned to the user faster.
 


Note: 

The first/next operator cannot be used with the CTX_QUERY.COUNT_HITS function or with in-memory queries. 


 
 

Combined First/Next and Max Queries

You can use the first/next operator to extract chunks of a sorted hitlist returned by the max operator. For example, if you use the max operator to return only the highest scoring 50 documents that contain the term cat, you can extract the first 10 documents from the 50 as follows:

'cat:50#1-10'
 

Note: 

Placing the max operator inside the first/next operator as such is the only instance in which you can embed the max operator in a query expression. 
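The three result-set operators can be modeled as simple list operations on a hitlist. In this hypothetical Python sketch, a hitlist is a list of (document id, score) pairs in the order ConText encountered the documents; the helper names and data are illustrative, and the threshold follows the expression-level rule that a document must score above n.

```python
def threshold(hitlist, n):
    """expression>n: keep documents scoring above n, in encounter order."""
    return [(doc, s) for doc, s in hitlist if s > n]

def max_hits(hitlist, n):
    """expression:n: the n highest-scoring documents, sorted by score."""
    return sorted(hitlist, key=lambda hit: hit[1], reverse=True)[:n]

def first_next(hitlist, m, n):
    """expression#m-n: hits m through n (1-based, inclusive) in the order given."""
    return hitlist[m - 1:n]

# Hypothetical hitlist in encounter order.
hits = [(101, 40), (102, 90), (103, 15), (104, 75), (105, 60)]

print(threshold(hits, 70))                  # [(102, 90), (104, 75)]
print(max_hits(hits, 3))                    # [(102, 90), (104, 75), (105, 60)]
print(first_next(hits, 1, 2))               # [(101, 40), (102, 90)]
print(first_next(max_hits(hits, 3), 1, 2))  # [(102, 90), (104, 75)]
```

The last call mirrors 'cat:50#1-10' on a smaller scale: max sorts by score first, so first/next then pages through the sorted hits rather than the encounter order.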


 
 

Expansion Operators

The expansion operators expand a query expression to include variants of the query term supplied by the user. There are three kinds of expansion operators:

Operator  Syntax  Description
--------  ------  ---------------------------------------------------------
STEM      $term   Expands a query to include all terms having the same stem
                  or root word as the specified term.
SOUNDEX   !term   Expands a query to include all terms that sound the same
                  as the specified term (English-language text only).
FUZZY     ?term   Expands a query to include all terms with similar
                  spellings as the specified term (English-language text
                  only).

 

The expansion operators are unary operators. They may be used in combination with each other and with any other operators described in this chapter. In addition, searches can be broadened by performing an expansion on an expansion.

The methods used by the expansion operators to perform stemming, fuzzy matching, and soundex matching for a text column are determined by the Wordlist preference in the policy for the column.
 

See Also: 

For more information about setting up preferences and policies, see Oracle8 ConText Cartridge Administrator's Guide. 

 
 

Stem Expansions

Use the STEM ($) operator to search for terms that have the same linguistic root as the query term. For example:

Input          Expands To
-------------  ---------------------------------------
$scream        scream screaming screamed
$distinguish   distinguish distinguished distinguishes
$guitars       guitars guitar
$commit        commit committed
$cat           cat cats
$sing          sang sung sing

 

The ConText stemmer, licensed from Xerox Corporation's XSoft Division, supports the following languages: English, French, Spanish, Italian, German, and Dutch.
 


Note: 

If STEM returns a stopword, the stopword is not included in the query or highlighted by CTX_QUERY.HIGHLIGHT.


 
 

Soundex Expansions

The soundex (!) operator enables searches on words that are spelled differently but sound alike in English.

Soundex in ConText uses the same logic as the soundex function in SQL to search for words that have a similar sound. It returns all words in a text column that have the same soundex value.

The following example illustrates the results that could be returned for a one-step query that uses SOUNDEX:

SELECT ID, COMMENT FROM EMP_RESUME
WHERE CONTAINS (COMMENT, '!SMYTHE') > 0;

ID COMMENT 
-- ------------
23 Smith is a hard worker who..
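The match between SMYTHE and Smith follows from the soundex codes of the two words. The Python sketch below follows the steps documented for the Oracle SQL SOUNDEX function: keep the first letter, code the remaining consonants, collapse adjacent identical codes, and pad to four characters. It then returns every indexed word with the same code. The function names are hypothetical, and ConText's internal implementation may differ in detail.

```python
# Sketch of the soundex algorithm as documented for Oracle SQL SOUNDEX,
# plus an expansion step over a word list.
from typing import Dict, Iterable, List

CODES: Dict[str, str] = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    letters = [c for c in word.upper() if "A" <= c <= "Z"]  # 7-bit letters only
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(c, "")  # vowels and Y have no code...
        if code and code != prev:
            result += code
        prev = code  # ...but they do separate letters that share a code
    return (result + "000")[:4]


def soundex_expand(term: str, index_words: Iterable[str]) -> List[str]:
    # !term: every indexed word with the same soundex value as term
    target = soundex(term)
    return [w for w in index_words if soundex(w) == target]
```

Both SMYTHE and Smith reduce to S530, which is why the one-step query above returns document 23.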
 

Note: 

SOUNDEX works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectiveness, for languages that use an 8-bit character set, such as many Western European languages. 

For more information about the SOUNDEX function in SQL, see Oracle8 SQL Reference. 


 
 

Fuzzy Expansions

Fuzzy (?) expansions generate words that are spelled similarly to the query term. This type of expansion helps you find relevant documents when words in the documents are frequently misspelled.

Unlike a stem expansion, the set of words generated by a fuzzy expansion depends on what is in the text index, so results can vary significantly from one index to another.

For example:

Input    Expands To

?cat     cat cats calc case
?feline  feline defined filtering
?apply   apply apple applied April
?read    lead real
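This guide does not describe the algorithm ConText uses for fuzzy matching. As a stand-in, the following Python sketch uses Levenshtein edit distance, a different and well-known similarity measure, to show why a fuzzy expansion depends on the words in the index. With a maximum distance of 1, ?read matches lead and real from a hypothetical word list but not reader. The function names and the max_distance parameter are assumptions.

```python
# Illustrative only: Levenshtein distance substituted for ConText's
# (undocumented here) fuzzy-matching algorithm.
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    # classic dynamic-programming Levenshtein distance, one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_expand(term: str, index_words: Iterable[str], max_distance: int = 1) -> List[str]:
    # ?term: every indexed word within max_distance edits of term
    term = term.lower()
    return [w for w in index_words if edit_distance(term, w.lower()) <= max_distance]
```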

 
 


Note: 

Fuzzy works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectiveness, for languages that use an 8-bit character set, such as many Western European languages. Also, the Japanese lexer provides limited fuzzy matching. 

In addition, if fuzzy returns a stopword, the stopword is not included in the query or highlighted by CTX_QUERY.HIGHLIGHT.


 
 

Penetration in Expansion Operators

Penetration allows complex query expansions to be expressed in concise notation. Penetration is a notational convention for query expressions. It does not change the meaning of the expansion operators or the order in which operations are performed; it helps you write unambiguous queries that use the expansion operators.

Penetration applies the expansion operators to each term within an explicit expression, that is, an expression delimited by parentheses ( ) or braces { }. Any expansion operator placed outside such an expression is applied to each word or phrase inside the expression.

For example:

Query Before Penetration  Query After Penetration

?(dog, cat, mouse)        ?dog, ?cat, ?mouse
?(dog,!(cat & mouse))     ?dog, (!?cat & !?mouse)
?((cat=feline) meows)     (?cat = ?feline) ?meows

 

In the first example, a fuzzy expansion is performed on each term.

In the second example, a fuzzy expansion is performed on each term, and a soundex expansion is also performed on cat and mouse, because the soundex operator is applied to the separate set of parentheses that encloses those two terms.

In the third example, a fuzzy expansion is performed on each term, including both equivalence terms.
 


Note: 

Expansion operators do not penetrate expressions delimited by brackets [ ]. 
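The penetration rules can be sketched as a small recursive rewriter. This Python sketch is a hypothetical illustration, not ConText's query parser. It understands only single-word terms, the operators , & | =, parentheses, and brackets. Unlike the table above, it keeps the parentheses of the group that the operator was applied to, which does not change the meaning. Inner operators are written first, as in !?cat.

```python
# Toy rewriter: expansion operators written before ( ... ) are pushed down onto
# every term inside; a [ ... ] group is left untouched (no penetration).
import re

TOKEN = re.compile(r"[?!$()\[\],&|=]|[^\s?!$()\[\],&|=]+")
EXPANSION = "?!$"  # fuzzy, soundex, stem


def penetrate(query: str) -> str:
    tokens = TOKEN.findall(query)
    pos = 0

    def item(prefix: str) -> str:
        nonlocal pos
        ops = ""
        while tokens[pos] in EXPANSION:  # collect operators such as !?
            ops += tokens[pos]
            pos += 1
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            # parentheses: push the operators inward, innermost first
            return "(" + sequence(ops + prefix, ")") + ")"
        if tok == "[":
            # brackets stop penetration: operators stay outside the group
            return ops + prefix + "[" + sequence("", "]") + "]"
        return ops + prefix + tok

    def sequence(prefix: str, closer: str) -> str:
        nonlocal pos
        out, prev_was_item = "", False
        while pos < len(tokens) and tokens[pos] != closer:
            tok = tokens[pos]
            if tok == ",":
                out, prev_was_item = out + ", ", False
                pos += 1
            elif tok in "&|=":
                out, prev_was_item = out + f" {tok} ", False
                pos += 1
            else:
                # adjacent items with no operator between them form a phrase
                out += (" " if prev_was_item else "") + item(prefix)
                prev_was_item = True
        pos += 1  # step past the closing character
        return out

    return sequence("", "")
```

For example, penetrate("?(dog,!(cat & mouse))") returns "(?dog, (!?cat & !?mouse))", and penetrate("?[dog, cat]") is returned unchanged.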


 
 

Examining Query Expansions

You can use query expression feedback to examine how ConText expands query expressions containing fuzzy, stem and soundex operators. 

See Also: 

Chapter 5, "Query Expression Feedback"

 
 

Base-letter Support

If you have base-letter conversion specified for a text column and the query expression contains a SOUNDEX or FUZZY operator, ConText operates on the base-letter form of the query.

The STEM operator does not support base-letter conversion.

Thesaurus Operators

The thesaurus operators expand a query for a single term (word or phrase) using a thesaurus that defines relationships between the user-specified term and other semantically related terms.

There are twelve thesaurus operators, listed in the following table. They correspond to the types of relationships that can be defined in an ISO 2788 standard thesaurus.

Operator            Syntax                     Description

SYNONYM             SYN(term[,thes])           Expands a query to include all the terms defined in the thesaurus as synonyms for term.
PREFERRED           PT(term[,thes])            Replaces the specified word in a query with the preferred term for term.
RELATED             RT(term[,thes])            Expands a query to include all the terms defined in the thesaurus as related terms for term.
TOP                 TT(term[,thes])            Replaces the specified word in a query with the top term in the standard hierarchy (BT, NT) for term.
NARROWER            NT(term[,level[,thes]])    Expands a query to include all the lower-level terms defined in the thesaurus as narrower terms for term.
NARROWER GENERIC    NTG(term[,level[,thes]])   Expands a query to include all the lower-level terms defined in the thesaurus as narrower generic terms for term.
NARROWER PARTITIVE  NTP(term[,level[,thes]])   Expands a query to include all the lower-level terms defined in the thesaurus as narrower partitive terms for term.
NARROWER INSTANCE   NTI(term[,level[,thes]])   Expands a query to include all the lower-level terms defined in the thesaurus as narrower instance terms for term.
BROADER             BT(term[,level[,thes]])    Expands a query to include the term defined in the thesaurus as a broader term for term.
BROADER GENERIC     BTG(term[,level[,thes]])   Expands a query to include all the terms defined in the thesaurus as broader generic terms for term.
BROADER PARTITIVE   BTP(term[,level[,thes]])   Expands a query to include all the terms defined in the thesaurus as broader partitive terms for term.
BROADER INSTANCE    BTI(term[,level[,thes]])   Expands a query to include all the terms defined in the thesaurus as broader instance terms for term.

 

Internally, ConText processes the expansion by enclosing each term returned by the expansion in braces and then combining the terms with the ACCUMULATE operator.

For example, if bird, birdy, and avian are all synonyms:

SYN(bird) is expanded to {bird},{avian},{birdy}.

If a term in a thesaural query does not have corresponding entries in the specified thesaurus, no expansion is produced and the term itself is used in the query.
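The expansion rule can be modeled with a dictionary standing in for the thesaurus. In this Python sketch, SYNONYMS and syn() are hypothetical names, and only the SYN relationship is modeled.

```python
# Hypothetical in-memory model of SYN expansion: each returned term is escaped
# with braces and the terms are combined with the accumulate operator (,).
from typing import Dict, List

SYNONYMS: Dict[str, List[str]] = {"bird": ["avian", "birdy"]}  # stand-in thesaurus


def syn(term: str, thesaurus: Dict[str, List[str]] = SYNONYMS) -> str:
    related = thesaurus.get(term, [])
    if not related:
        return term  # no entries: no expansion, the term itself is used
    return ",".join("{" + t + "}" for t in [term] + related)
```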
 

See Also: 

For more information about viewing thesaural expansions, see Chapter 5, "Query Expression Feedback"

For more information about thesaural relationships and creating thesauri, see Oracle8 ConText Cartridge Administrator's Guide. 

 
 

Limitations

The thesaurus operators can be used in conjunction with all the other query expression operators and special characters supported by ConText, with the exception of the near operator.

The maximum length of the expanded query is 32000 characters.

Thesaural operations cannot be nested. For example, the following query is not allowed.

'SYN(BT(bird))'

Thesaurus Arguments

The thesaurus operators are implemented in ConText as PL/SQL functions, and, as such, have arguments that must be specified with the operator. All of the notational conventions and usage rules for PL/SQL apply to the thesaurus operators.

The thesaurus operators have the following arguments:

term

Specify the operand for the thesaurus operator. You must specify a term when using the NT operator. For preferred term (PT) and top term (TT) queries, term is replaced by the preferred term/top term defined for the term in the specified thesaurus; however, if no PT or TT entries are defined for the term, the term is not replaced and is used in the query.

For all other thesaural queries, term is expanded to include the synonymous, related, broader, or narrower terms defined for the term in the specified thesaurus.

level

Specify the number of levels traversed in the thesaurus hierarchy to return the broader (BT, BTG, BTP, BTI) or narrower (NT, NTG, NTP, NTI) terms for the specified term. For example, a level of 1 in a BT query returns only the broader term, if one exists, for the specified term. A level of 2 returns the broader term for the specified term, as well as the broader term, if one exists, for that broader term.

The level argument is optional and has a default value of one (1). Zero or negative values for the level argument return only the original query term.
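The following Python sketch shows how the level argument limits a broader-term walk. It assumes a hypothetical in-memory hierarchy in which each term has at most one broader term; the entry teaching aid is invented for illustration. A real thesaurus can hold several broader terms for a homograph and keeps the BT, BTG, BTP, and BTI branches separate.

```python
# Sketch of the level argument for a BT-style query: walk up at most `level`
# broader-term links. BROADER is a hypothetical single-branch hierarchy.
from typing import Dict, List

BROADER: Dict[str, str] = {"tutorial": "teaching aid", "teaching aid": "learning systems"}


def bt(term: str, level: int = 1, broader: Dict[str, str] = BROADER) -> List[str]:
    terms = [term]  # zero or negative levels return only the original term
    current = term
    for _ in range(max(level, 0)):
        if current not in broader:
            break  # top of the hierarchy reached
        current = broader[current]
        terms.append(current)
    return terms
```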

thes

Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before using any of the thesaurus operators.

Synonym Operator

Use the synonym operator (SYN) to expand a query to include all the terms that have been defined in a thesaurus as synonyms for a specified term.

The following query returns all documents that contain the term tutorial or any of the synonyms defined for tutorial in the DEFAULT thesaurus:

'SYN(tutorial)'

Compound Phrases in Synonym Operator

In a synonym query, a compound phrase defined as a synonym for the term is expanded into its component words, joined by the AND operator.

For example, the compound phrase temperature + measurement + instruments is defined in a thesaurus as a synonym for the term thermometer. In a synonym query for thermometer, the query is expanded to:

{thermometer},({temperature}&{measurement}&{instruments})
 

Note: 

In a thesaurus, compound phrases can only be defined in synonym relationships for a term. 


 
 

Preferred Term Operator

Use the preferred term operator (PT) to replace a term in a query with the preferred term that has been defined in a thesaurus for the term.

For example, the term building has a preferred term of construction in a thesaurus. A PT query for building returns all documents that contain the word construction. Documents that contain the word building are not returned.

Related Term Operator

Use the related term operator (RT) to expand a query to include all the terms that have been defined in a thesaurus as related terms for the term.

For example, the term dinosaur has a related term of paleontology. A RT query for dinosaur returns all documents that contain the word paleontology. Documents that contain the word dinosaur are not returned.

Narrower Term Operators

Use the narrower term operators (NT, NTG, NTP, NTI) to expand a query to include all the terms that have been defined in a thesaurus as the narrower or lower level terms for a specified term. They can also expand the query to include all of the narrower terms for each narrower term, and so on down through the thesaurus hierarchy.
 


Note: 

The hierarchy can contain four separate branches, represented by the four narrower term operators. During a narrower term query, the specified operator only searches down the designated branch of the hierarchy. 


 
 

The following query returns all documents that contain either the term tutorial or any of the NT terms defined for tutorial in the DEFAULT thesaurus:

'NT(tutorial)'

The following query returns all documents that contain either fairy tale or any of the narrower instance terms for fairy tale as defined in the DEFAULT thesaurus:

'NTI(fairy tale)'

That is, if the terms cinderella and snow white are defined as narrower instance terms for fairy tale, ConText returns documents that contain fairy tale, cinderella, or snow white.

Broader Term Operators

Use the broader term operators (BT, BTG, BTP, BTI) to expand a query to include the term that has been defined in a thesaurus as the broader or higher level term for a specified term. They can also expand the query to include the broader term for the broader term and the broader term for that broader term, and so on up through the thesaurus hierarchy.
 


Note: 

The hierarchy can contain four separate branches, represented by the four broader term operators. In a broader term query, the specified operator only searches up the designated branch of the hierarchy. 


 
 

The following query returns all documents that contain the term tutorial or the BT term defined for tutorial in the DEFAULT thesaurus:

'BT(tutorial)'

Broader and Narrower Term Operator on Homographs

If a homograph (a word or phrase with multiple meanings, but the same spelling) appears in two or more nodes in the same hierarchy branch of a thesaurus, a qualifier is required for each occurrence of the term in the branch.

If the qualifier is not specified for a homograph in a broader or narrower term query, the query expands to include all of the broader/narrower terms for the homograph.

For example, if machine is a broader term for crane (building equipment) and bird is a broader term for crane (waterfowl):

BT(crane) expands to {crane},{machine},{bird}

If the qualifier for a homograph is specified in a broader or narrower term query, only the broader/narrower terms for the qualified homograph are returned.

Using the previous example:

BT(crane{(waterfowl)}) expands to {crane},{bird}
 


Note: 

When specifying a qualifier in a broader or narrower term query, the qualifier and its notation (parentheses) must be escaped, as is shown in this example. 


 
 

Top Term Operator

Use the TOP TERM operator (TT) to replace a term in a query with the top term that has been defined for the term in the standard hierarchy (BT, NT) in a thesaurus. Top terms in the generic (BTG, NTG), partitive (BTP, NTP), and instance (BTI, NTI) hierarchies are not returned.

For example, the term tutorial has a top term of learning systems in the standard hierarchy of a thesaurus. A TT query for tutorial returns all documents that contain the phrase learning systems. Documents that contain the word tutorial are not returned.

Thesaural Expansions and Case-Sensitivity

Thesaural expansions in text queries can differentiate between terms based on case.

For example, suppose a case-sensitive thesaurus named thes1 is created in which Mercury is defined as a narrower term for planets and mercury is defined as a narrower term for metals.

During a query, the following expansions occur:

BT(mercury,1,thes1) expands to {MERCURY}, {METALS}

BT(Mercury,1,thes1) expands to {MERCURY}, {PLANETS}
 


Note: 

Case-sensitivity of thesaural look-ups cannot be enabled or disabled at query time. If a thesaurus was specified as case-sensitive when it was created, ConText preserves the case of all entries in it. Text queries then use the case of each term to perform the thesaural look-up in the thesaurus specified for the term(s). 


 
 

Limitations

Case-sensitive thesauri only affect the expansion of a term and not the terms actually used in the query. The case of the expanded terms depends on whether the text index being queried is case-sensitive or case-insensitive.

For example, when the case-sensitive thesaurus, thes1, is used with a case-insensitive index, the following expansion is returned:

BT(Mercury,1,thes1) expands to {MERCURY}, {PLANETS}

The query then returns all documents in which the two terms occur, regardless of case. In other words, documents that contain mercury, Mercury, planets, Planets, or any other combinations of case for the two terms are all returned by the query.

With a case-sensitive text index, the same query expands to:

BT(Mercury,1,thes1) expands to {Mercury}, {planets}

The query returns only those documents in which Mercury and planets occur.

Base-letter Support for Thesaural Queries

When ConText processes a query on a base-letter index and the expression contains a thesaurus operator, ConText looks up the query term in the thesaurus without converting the query to base-letter. The expansions obtained from the thesaurus are converted to base-letter and looked up subsequently within the index according to query rules.

This look-up sequence lets base-letter queries work whether or not the thesaurus is in base-letter form. However, if the keys in the thesaurus are in base-letter form, they do not match query terms that are not in base-letter form. When you have a base-letter thesaurus, you must specify the base-letter form in the query.

Wildcard Characters

Wildcard characters can be used in query expressions to expand word searches into pattern searches. The wildcard characters are:

Wildcard Character  Description

%                   The percent wildcard specifies that any number of characters can appear in the position represented by the wildcard.
_                   The underscore wildcard specifies a single position in which any character can occur.

 

For example, the following abbreviated one-step query finds all terms beginning with the pattern scal in a column named text:

...contains(TEXT, 'scal%') > 0
 

Note: 

To expand the wildcard query, ConText uses the word list for the text column and rewrites the query with these terms. When your wildcard query expands to a number of terms greater than the maximum allowed in a query, ConText returns an error. 

In addition, if a wildcard expression translates to a stopword, the stopword is not included in the query or highlighted by CTX_QUERY.HIGHLIGHT.
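The following Python sketch shows how a wildcard pattern can be expanded against a word list and rejected when it expands to too many terms. MAX_TERMS and wildcard_expand() are hypothetical, and this guide does not give ConText's actual limit. The sketch also lets % match an empty run of characters, as SQL LIKE does; that is an assumption about ConText's behavior.

```python
# Sketch of wildcard expansion against an index word list:
# % matches any run of characters (here, possibly empty), _ exactly one.
import re
from typing import Iterable, List

MAX_TERMS = 100  # hypothetical stand-in for ConText's maximum


def wildcard_expand(pattern: str, index_words: Iterable[str], max_terms: int = MAX_TERMS) -> List[str]:
    # escape ordinary characters, translate the two wildcards to regex syntax
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern)
    matches = [w for w in index_words if re.fullmatch(regex, w)]
    if len(matches) > max_terms:
        raise ValueError(f"{pattern!r} expands to {len(matches)} terms; limit is {max_terms}")
    return matches
```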


 
 

Grouping Characters

The grouping characters control operator precedence by grouping query terms and operators in a query expression. The grouping characters are:

( )  parentheses
[ ]  brackets

The beginning of a group of terms and operators is indicated by an open character from one of the sets of grouping characters. The ending of a group is indicated by the occurrence of the appropriate close character for the open character that started the group. Between the two characters, other groups may occur.

For example, an open parenthesis indicates the beginning of a group, and the matching close parenthesis indicates its end. Any open parenthesis encountered before that close parenthesis begins a nested group.

Brackets perform the same function as the parentheses, but prevent penetration for the expansion operators.

Stored Query Expressions

You can store the results of a query expression and then call the SQE later in a query expression to return the stored results. To call a stored query expression, use the SQE operator.

Operator                 Syntax         Description

Stored Query Expression  SQE(SQE_name)  Returns the stored result of SQE_name.

 

The advantage of calling an SQE in a query expression, rather than specifying query terms, is that the results are typically returned faster, since ConText does not have to query the text table directly.

In addition, SQEs can be used to perform iterative queries, in which an initial query is refined using one or more additional queries.

Using Stored Query Expressions

The process for using stored query expressions is:

  1. Call CTX_QUERY.STORE_SQE to store the results for the text column or policy. With STORE_SQE, you specify a name for the SQE, a policy (which identifies the text column for the SQE), a query expression, and whether the SQE is a session or system SQE.
  2. Call the stored query expression in the query expression of a text (or theme) query. ConText returns the results of the SQE in the same way it returns the results of a regular query. If the results of the SQE are out-of-date, ConText automatically re-evaluates the SQE before returning the results.

     Note: 

     Because ConText must first determine whether the results are out-of-date with respect to the document index, frequent changes to the index through inserting, deleting, and updating documents slow down retrieval of the stored query expression results. 

Administration of stored query expressions can be performed using the REFRESH_SQE, REMOVE_SQE, and PURGE_SQE procedures in the CTX_QUERY PL/SQL package.

Example

To create a session SQE named PROG_LANG, use CTX_QUERY.STORE_SQE as follows:

exec ctx_query.store_sqe('emp_resumes', 'prog_lang', 'cobol', 'session');

This SQE queries the text column for the EMP_RESUMES policy (in this case, EMP.RESUME) and returns all documents that contain the term cobol. It stores the results in the SQE table for the policy.

PROG_LANG can then be called within a query expression as follows:

select score, docid from emp 
where contains(resume, 'sqe(prog_lang)')>0 
order by score;

Session and System SQEs

When you initially create an SQE using CTX_QUERY.STORE_SQE, you can specify whether the SQE is for the current session or for all sessions (system SQE).

You can use session SQEs only in the current session. These SQEs are stored only for the duration of the session. When a session is terminated, all session SQEs created during the session are deleted from the SQE tables. If you want to use a session SQE in another session, you must recreate the SQE.

System SQEs can be used in all sessions, including concurrent sessions. When a session is terminated, system SQEs created during the session are not deleted from the SQE tables and can be used in future sessions.

Re-evaluation of Stored Query Expressions

If the text column referenced by a stored query expression has been modified since the stored query expression was created, the stored query expression results may be out-of-date. Before returning the results of a stored query expression in a query expression, ConText verifies that the results are current. If they are not current, ConText automatically evaluates the differences and updates the results.

ConText also verifies that any stored query expressions nested within a stored query expression have up-to-date results.
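The freshness check can be pictured as comparing a version stamp. In this hypothetical Python model, each change to the index increments a counter, and a stored query re-evaluates itself when its results were computed against an older version. For simplicity it re-runs the whole query, whereas ConText evaluates only the differences. All class and field names are illustrative.

```python
# Hypothetical model of SQE freshness checking with an index version counter.
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class TextIndex:
    docs: Dict[int, Set[str]] = field(default_factory=dict)  # doc id -> words
    version: int = 0

    def put(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = set(text.lower().split())
        self.version += 1  # any insert or update makes stored results stale


@dataclass
class StoredQuery:
    name: str
    evaluate: Callable[[TextIndex], Set[int]]
    results: Set[int] = field(default_factory=set)
    as_of: int = -1  # index version the results were computed against

    def hits(self, index: TextIndex) -> Set[int]:
        if self.as_of != index.version:  # out of date: re-evaluate first
            self.results = self.evaluate(index)
            self.as_of = index.version
        return self.results
```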
 


Note: 

ConText does not verify whether PL/SQL functions in stored query expressions have been updated. If a PL/SQL function in a stored query expression has been updated, the stored query expression must be manually re-evaluated. 


 
 

Result lists in stored query expression tables may get fragmented by consecutive re-evaluations. You can resolve fragmentation by calling CTX_QUERY.REFRESH_SQE.

Iterative Queries

Iterative queries are queries built on other queries to refine or add to the result set of the original query. Once you define a stored query expression, you can add search criteria in two ways: by extending the expression in the CONTAINS procedure, or by nesting stored query expressions.

Extending the Expression in the CONTAINS Procedure

Sometimes you might want to add a condition to a stored query expression to re-define your search criteria. You can do so by extending the query with additional operators when you call CTX_QUERY.CONTAINS. When you extend stored queries in this way, the response time is usually faster than an equivalent query without the SQE operator.

For example, you find that wildcard queries take a long time to process. You therefore define a wildcard query as a stored query expression, Q1, to return all documents indexed under policy pol that have words beginning with the letter z:

ctx_query.store_sqe('pol', 'Q1', 'z%', 'session');

You then extend the query by adding an OR condition, asking for all documents indexed under policy pol that contain words beginning with the letter z or that contain the word cat:

ctx_query.contains('pol', 'SQE(Q1) | cat', 'ctx_temp');

Internally, ConText must still use the text index to find those documents that might have the word cat but not z%; however, the response time is generally much faster than the following equivalent query:

ctx_query.contains('pol', 'z% | cat', 'ctx_temp');

Nesting Stored Query Expressions

You can use stored query expressions to define other stored query expressions. This is useful when you want to refine the result set returned from a stored query expression.

For example, you define the stored query expression, Q1 as follows:

ctx_query.store_sqe('pol', 'Q1', 'lions | tigers', 'session');

You then want to reduce this hitlist by adding another condition, so you define Q2 as follows:

ctx_query.store_sqe('pol', 'Q2', 'SQE(Q1) and zoos', 'session');

You then execute Q2 as follows:

ctx_query.contains('pol', 'SQE(Q2)', 'ctx_temp');

This query searches for all documents that contain lions or tigers and that also contain zoos. It is generally faster than the following equivalent query:

ctx_query.contains('pol', '(lions | tigers) and zoos', 'ctx_temp');

SQE Tables

Each stored query expression is stored in two tables: a central or system table owned by CTXSYS and a text index table attached to the policy for which the stored query expression was created.

The table owned by CTXSYS is an internal table which stores the stored query expression definitions for all the stored query expressions that have been created for all existing policies. It cannot be accessed directly, but can be viewed through two views, CTX_SQES (users with CTXADMIN role) and CTX_USER_SQES (users with CTXAPP and CTXADMIN roles).

The table used to store the results of a stored query expression for a text column, called the SQR table, is one of the tables created automatically when the column is indexed. However, the SQR table is populated only when a stored query expression is created, and it is updated when a stored query expression is re-evaluated.

The tablespace, storage clause, and other parameters used to create the SQR table are specified by the Engine preference in the policy for the text column of the stored query expression.
 


Note: 

Similar to the other ConText index tables, the SQR table is an internal table that is accessed only by ConText when a stored query expression is processed in a query. 

For more information about policies, preferences, text indexing, and the structure of the stored query expression tables and views, see Oracle8 ConText Cartridge Administrator's Guide. 


 
 

Using Operators in Stored Query Expressions

You can use all query expression operators in stored query expressions, with the following exceptions:

Stored query expressions also support all of the special characters and other components that can be used in a query expression, including PL/SQL functions and other stored query expressions.

PL/SQL in Query Expressions

In a query expression, you can call a PL/SQL function that returns a value. The syntax for the PL/SQL operator is as follows:

Syntax (any of the following forms):

@owner_name.fname(arg1, arg2,...,argn)
execute owner_name.fname()
exec owner_name.fname()

Description: Executes fname() where fname() returns a value. Return values that are not of type VARCHAR2 are cast into strings when possible. If fname() does not return a value, an exception is raised. 

 

Example

Calling a PL/SQL function within a query is useful for converting words to alternate forms. For example, you can call a function that takes acronyms and returns the expanded string.

Suppose you, as user ctxuser, create a function named CONVERT that takes an acronym as input and returns the fully-expanded version of the acronym. Then, to obtain all documents that contain either IBM or International Business Machines, you issue the following query:

'execute ctxuser.convert(IBM), IBM'

Likewise, you can call a PL/SQL function that translates words. For example, you can call a function french that converts an English word to its French equivalent. You can then search on the French word for cat by issuing the following query:

'@ctxuser.french(cat)'

Operator Precedence

Operator precedence is the order in which the components of a query expression are evaluated. ConText query operators can be divided into two sets of operators that have their own order of evaluation. These two groups are described below as Group 1 and Group 2.

In all cases, query expressions are evaluated in order from left to right according to the precedence of their operators. Operators with higher precedence are applied first. Operators of equal precedence are applied in order of their appearance in the expression from left to right.

Group 1

Within query expressions, the Group 1 operators have the following order of evaluation from highest precedence to lowest:

Operator           Equivalent

EQUIV              =
NEAR               ;
Weight, Threshold  * >
MINUS              -
NOT                ~
WITHIN             (none)
AND                &
OR                 |
ACCUM              ,
Max                :
First/Next         #

 

Group 2

Within query expressions, the Group 2 operators have the following order of evaluation from highest to lowest:

Operator  Equivalent

Wildcard  % _
Stem      $
Fuzzy     ?
Soundex   !

 

Procedural Operators

Other operators not listed under Group 1 or Group 2 are procedural. These operators have no sense of precedence attached to them. They include the SQE, PL/SQL, and thesaurus operators.

Precedence Examples


Query Expression           Order of Evaluation

w1 | w2 & w3               (w1) | (w2 & w3)
w1 & w2 | w3               (w1 & w2) | w3
?w1, w2 | w3 & w4          (?w1), (w2 | (w3 & w4))
abc = def ghi & jkl = mno  ((abc = def) ghi) & (jkl = mno)
dog and cat WITHIN body    dog and (cat WITHIN body)

 

In the first example, because AND has a higher precedence than OR, the query returns all documents that contain w1 and all documents that contain both w2 and w3.

In the second example, the query returns all documents that contain both w1 and w2 and all documents that contain w3.

In the third example, the fuzzy operator is first applied to w1, then the AND operator is applied to arguments w3 and w4, then the OR operator is applied to term w2 and the results of the AND operation, and finally, the score from the fuzzy operation on w1 is added to the score from the OR operation.

The fourth example shows that the equivalence operator has higher precedence than the AND operator.

The fifth example shows that the AND operator has lower precedence than the WITHIN operator.
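The grouping shown in the first three examples can be reproduced with precedence climbing. The Python sketch below handles only four Group 1 operators, ranked as in the Group 1 table (= above &, & above |, | above ,). It applies operators of equal precedence from left to right. Operands must be single words; phrases, expansion operators, and the remaining operators are out of scope, and the function name is hypothetical.

```python
# Precedence-climbing sketch for =, &, |, and , (accumulate); not ConText's parser.
import re

PRECEDENCE = {"=": 4, "&": 3, "|": 2, ",": 1}


def parenthesize(query: str) -> str:
    tokens = re.findall(r"[=&|,]|[^\s=&|,]+", query)
    pos = 0

    def parse(min_prec: int) -> str:
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # left-associative: the right operand may only hold tighter operators
            right = parse(PRECEDENCE[op] + 1)
            sep = ", " if op == "," else f" {op} "
            left = f"({left}{sep}{right})"
        return left

    result = parse(1)
    if pos != len(tokens):
        raise ValueError("unsupported input, such as a multi-word phrase")
    return result
```

For example, parenthesize("w1, w2 | w3 & w4") returns "(w1, (w2 | (w3 & w4)))", matching the third example without the fuzzy operator.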

Altering Precedence

Precedence is altered by grouping characters as follows:

Escaping Reserved Words and Characters

To query on words or symbols that have special meaning in query expressions, such as and, &, or, |, accum, and execute, you must escape them. There are two ways to escape characters in a query expression:

Escape Symbol  Meaning

{ }            Use braces to escape a string of characters or symbols. Everything within a set of braces is considered part of the escape sequence.
\              Use the backslash character to escape an individual character or symbol. Only the character immediately following the backslash is escaped.

 

Example

In the following examples, an escape sequence is necessary because each expression contains a ConText operator or reserved symbol:

'AT\&T'
'{AT&T}'

'high\-voltage'
'{high-voltage}'
 

Note: 

If you use braces to escape an individual character within a word, the character is escaped, but the word is broken into three tokens. 

For example, a query written as high{-}voltage searches for high - voltage, with the space on either side of the hyphen. 


 
 

Reserved Words

The following is a list of ConText reserved words and characters that must be escaped to be searched on:

Operator                      Reserved Word                 Reserved Character

And                           AND                           &
Or                            OR                            |
Accumulate                    ACCUM                         ,
Minus                         MINUS                         -
Not                           NOT                           ~
Near                          (none)                        ;
Stem                          (none)                        $
Soundex                       (none)                        !
Fuzzy                         (none)                        ?
Threshold                     (none)                        >
Weight                        (none)                        *
First/Next                    (none)                        #
Max                           (none)                        :
Wildcard (multiple)           (none)                        %
Wildcard (single)             (none)                        _
Within                        WITHIN                        (none)
Grouping (parentheses)        (none)                        ( )
Grouping (brackets)           (none)                        [ ]
Escape (multiple characters)  (none)                        { }
Escape (single character)     (none)                        \
Paragraph Searching           PARAGRAPH (used with WITHIN)  (none)
PL/SQL call                   EXECUTE, EXEC                 @
Sentence Searching            SENTENCE (used with WITHIN)   (none)
Stored Query Expression       SQE                           (none)
Synonym                       SYN                           (none)
Preferred                     PT                            (none)
Related                       RT                            (none)
Top                           TT                            (none)
Broader                       BT                            (none)
Narrower                      NT                            (none)
Broader Generic               BTG                           (none)
Narrower Generic              NTG                           (none)
Broader Partitive             BTP                           (none)
Narrower Partitive            NTP                           (none)

 

Querying Escape Characters

The open brace { signals the beginning of the escape sequence, and the close brace } indicates the end. Everything between the opening brace and the closing brace is part of the query expression (including any open brace characters). To include the close brace character in a query expression, use }}.

To escape the backslash escape character, use \\.
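Applications that build query expressions from user input can apply these escaping rules mechanically. The two Python helpers below are hypothetical. brace_escape() wraps a string in braces and doubles each close brace; backslash_escape() escapes single reserved characters. RESERVED_CHARS is assembled from the characters described in this chapter and should be checked against your release. Reserved words such as AND must still be escaped with braces.

```python
# Hypothetical helpers for escaping text placed in a query expression.
RESERVED_CHARS = set("&|,-~;$!?>*#:%_()[]{}\\@=")  # assembled from this chapter


def brace_escape(text: str) -> str:
    # inside braces only the close brace is special; it is written as }}
    return "{" + text.replace("}", "}}") + "}"


def backslash_escape(text: str) -> str:
    # prefix each reserved character with a backslash; other characters pass through
    return "".join("\\" + c if c in RESERVED_CHARS else c for c in text)
```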

Querying with Stopwords

Stopwords are words for which ConText does not create an index entry. They are usually common words that are unlikely to be searched on by themselves.

ConText is shipped with a default list of English stopwords containing common words such as this and that. However, you or your ConText administrator can define your own stopwords.
 

See Also: 

For more information about defining stopwords, see Oracle8 ConText Cartridge Administrator's Guide. 

 
 

Stopwords by Themselves

You cannot query on a stopword by itself or on a phrase made up only of stopwords; such a query always returns no hits.

For example, you cannot issue a query to retrieve all documents that contain this if this is defined as a stopword, nor can you issue a query on a phrase of stopwords such as the who, if the words the and who are defined as stopwords.

Stopwords with Non-stopwords

You can query on phrases that contain stopwords as well as non-stopwords, such as this boy talks to that girl, where this and that are the only stopwords. This is possible because ConText records the position of stopwords even though it does not create an index entry for them.
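The following Python sketch illustrates why this works: stopwords get no index entry, but they still occupy a word position, so the offsets between the remaining words of a phrase are preserved. The stoplist, the whitespace tokenizer, the lowercasing, and the function names are assumptions for illustration only. The sketch checks only that the non-stopwords appear at the right relative offsets; it does not model which stopword fills each gap.

```python
# Assumed stoplist for this illustration only.
STOPWORDS = {"this", "that", "the", "who", "to"}


def build_index(text: str) -> dict[str, set[int]]:
    """Map each non-stopword to the word positions where it occurs."""
    index: dict[str, set[int]] = {}
    for pos, token in enumerate(text.lower().split()):
        if token in STOPWORDS:
            continue  # no index entry, but the position is still counted
        index.setdefault(token, set()).add(pos)
    return index


def phrase_match(index: dict[str, set[int]], phrase: str) -> bool:
    """Return True if the phrase's non-stopwords occur at matching offsets."""
    words = phrase.lower().split()
    anchors = [(off, w) for off, w in enumerate(words) if w not in STOPWORDS]
    if not anchors:
        return False  # a stopword-only phrase never returns hits
    first_off, first_word = anchors[0]
    for start in index.get(first_word, set()):
        base = start - first_off
        if all(base + off in index.get(w, set()) for off, w in anchors):
            return True
    return False


if __name__ == "__main__":
    idx = build_index("This boy talks to that girl every day")
    print(phrase_match(idx, "this boy talks to that girl"))  # True
    print(phrase_match(idx, "the who"))                       # False
```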

Case-Sensitivity

If you have case-sensitivity enabled for text queries and you issue a query on a phrase containing stopwords and non-stopwords, you must specify the correct case for the stopwords. For example, a query on this boy talks to that girl does not return documents that contain the phrase This boy talks to that girl, assuming this is a stopword. 

See Also: 

For more information about issuing case-sensitive text queries, see "Case-Sensitive Queries" in this chapter. 

 
 

Stopwords with Operators

When you use a stopword or a stopword-only phrase as an operand of a query operator, ConText rewrites the expression to eliminate the stopword or stopword-only phrase and then executes the query.

The following table describes some common stopword transformations. The Stopword Expression column shows the query expression, or component of a query expression, that you enter; the Rewritten Expression column shows how ConText rewrites it.

In these examples, a value of no_token for the rewritten expression means no hits are returned for the query.

Stopword Expression            Rewritten Expression

non_stopword AND stopword      non_stopword
stopword AND non_stopword      non_stopword
stopword AND stopword          no_token
non_stopword NOT stopword      non_stopword
stopword NOT non_stopword      no_token
stopword NOT stopword          no_token

 

For example, assuming that the word this is a stopword and that the word dog is a non-stopword, the query dog and this is rewritten to dog by applying the first transformation in the list.
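The six transformations in the table above can be written as a small rewrite function. This Python sketch covers only binary AND and NOT expressions with single-word operands. The function name, the string representation of expressions, and the stoplist passed in are assumptions for illustration; ConText's full set of transformations covers more cases.

```python
NO_TOKEN = "no_token"  # stands for "no hits are returned"


def rewrite(left: str, op: str, right: str, stopwords: set[str]) -> str:
    """Apply the AND/NOT stopword transformations from the table above."""
    left_stop = left.lower() in stopwords
    right_stop = right.lower() in stopwords
    op = op.upper()
    if op == "AND":
        if left_stop and right_stop:
            return NO_TOKEN
        if left_stop:
            return right
        if right_stop:
            return left
    elif op == "NOT":
        if left_stop:
            return NO_TOKEN  # stopword NOT anything
        if right_stop:
            return left      # non_stopword NOT stopword
    return f"{left} {op} {right}"  # no stopword operand: left unchanged


if __name__ == "__main__":
    print(rewrite("dog", "and", "this", {"this", "that"}))  # dog
```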
 

See Also: 

For a complete list of stopword transformations, see Appendix D, "Stopword Transformations"

To learn about how to examine stopword transformations, see Chapter 5, "Query Expression Feedback"

 
 

Querying with Special Characters

ConText indexes text by identifying tokens (words). For English and most European languages, it assumes that blank spaces delimit tokens. At index time, ConText must also know how to interpret punctuation characters and characters that occur within words and numbers. Such special characters are defined in the BASIC LEXER preference and are described as follows:

Type of Character   Description

Punctuation         Characters that delimit the end of sentences, such as the period '.' and question mark '?', and characters that occur next to words and numbers, such as the comma ',' and the dollar sign '$'. These characters are not indexed.
Continuation        Characters that indicate a word continues on the next line, such as the hyphen '-'. These characters are not indexed.
Printjoins          Characters that join words together, such as the hyphen '-'. These characters are indexed.
Skipjoins           Characters that join words together, such as the hyphen '-'. These characters are not indexed.
Numjoin             Characters that occur in numbers, such as the decimal point '.'. These characters are indexed.
Numgroup            Characters that group digits within a number, such as the comma ','. These characters are indexed.
Startjoin           Non-alphanumeric characters that occur at the beginning of a token. For example, you can define < as a startjoin character for HTML tagged text. These characters are indexed.
Endjoin             Non-alphanumeric characters that occur at the end of a token. For example, you can define > as an endjoin character for HTML tagged text. These characters are indexed.

 

In the BASIC LEXER preference, ConText defines a default set of characters for each group.

The way you query on tokens that contain these characters depends on how ConText indexes the tokens containing these characters. This is because ConText tokenizes words at query time the same way it tokenizes words at index time. To query on words or numbers that contain special characters, you must know how these words are represented in the index.
 

See Also: 

For more information about defining special characters for the BASIC LEXER preference, see Oracle8 ConText Cartridge Administrator's Guide. 

 
 

Querying with Punctuation and Continuation Characters

Punctuation and continuation characters are not indexed with the words they occur next to or with, and thus are ignored by ConText at query time. The following table shows how ConText strips punctuation characters at query time:

Query                              Equivalent Query

'John swims fast. Sharks eat.'     'John swims fast sharks eat'
'John swims. Fast sharks eat.'     'John swims fast sharks eat'
'{John swims, fast sharks eat}'    'John swims fast sharks eat'
'{SHAZAM!}'                        'SHAZAM'
'{$250}'                           '250'
'{#101}'                           '101'
'{phone#}'                         'phone'

 
  


Suggestion: 

Because ConText strips punctuation characters at query time, leaving them out of the query expression and using the equivalent query might be a better approach, especially when the characters are reserved as in the last five examples. 
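The following Python sketch mimics the stripping shown in the table: it removes a surrounding brace escape and strips punctuation from the ends of each word, so each query reduces to the same words as its equivalent. The punctuation set is an assumption that mirrors the characters in the table, the words are lowercased on the assumption that the queries are not case-sensitive, and the sketch does not model numjoin or other join characters.

```python
PUNCTUATION = ".?!,$#"  # assumed set, mirroring the characters in the table above


def query_words(query: str) -> list[str]:
    """Reduce a single query string to the words that would be looked up."""
    body = query
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1].replace("}}", "}")  # undo a whole-query brace escape
    words = []
    for raw in body.split():
        word = raw.strip(PUNCTUATION)  # punctuation next to words is not indexed
        if word:
            words.append(word.lower())
    return words


if __name__ == "__main__":
    print(query_words("{$250}"))  # ['250']
```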


 
 

Querying with Printjoins and Skipjoins

Printjoins and skipjoins are characters such as hyphens that join words together.

When you define a character as a printjoin, such as a hyphen, you specify that the words on either side of the hyphen are to be indexed with the hyphen. For example, sister-in-law is indexed as the token sister-in-law.

When you define a character as a skipjoin, such as a hyphen, you specify that the words on either side of the hyphen are to be indexed as one token without the hyphen. For example, sister-in-law is indexed as sisterinlaw.

To query on words that contain a join character, you must know if the character is defined as a skipjoin or printjoin in the BASIC LEXER preference.

Printjoin Example

If the hyphen character is defined as a printjoin, you must write your query with the hyphen, since the indexed token contains the hyphen. Thus, to query on all the documents that contain the term sister-in-law, you must write your query as follows with the hyphen:

'{sister-in-law}'
 

Note: 

The '-' character must be escaped, or else ConText interprets it as the MINUS operator. 


 
 

Skipjoin Example

When a character is defined as a skipjoin, it is not indexed with the word; therefore, you can write queries with or without the skipjoin character.

If the hyphen character is defined as a skipjoin, you can write your query with or without the hyphen. Thus, to query on all documents that contain sister-in-law, you can write your query as one of the following expressions:

'sisterinlaw'
'{sister-in-law}'

You can write your query in two ways, because both queries are lexed to sisterinlaw before index look-up. This also means that the documents retrieved can contain either sisterinlaw or sister-in-law.
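To see why the two settings call for different queries, here is a minimal Python lexer sketch: printjoin characters are kept inside a token, skipjoin characters are dropped without ending the token, and any other non-alphanumeric character ends the token. The function name and parameters are hypothetical, tokens are lowercased on the assumption of case-insensitive matching, and the real BASIC LEXER does more than this. Because the same lexing is applied to documents and to query text, a query matches when both produce the same tokens; pass the query text without its escape braces.

```python
def lex(text: str, printjoins: str = "", skipjoins: str = "") -> list[str]:
    """Split text into lowercase tokens using printjoin/skipjoin rules."""
    tokens: list[str] = []
    current: list[str] = []
    for ch in text.lower():
        if ch.isalnum() or ch in printjoins:
            current.append(ch)  # printjoin: kept as part of the token
        elif ch in skipjoins:
            continue            # skipjoin: joins words but is not kept
        elif current:
            tokens.append("".join(current))  # any other character ends the token
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(lex("sister-in-law", printjoins="-"))  # ['sister-in-law']
    print(lex("sister-in-law", skipjoins="-"))   # ['sisterinlaw']
```

With the hyphen as a skipjoin, lex("sisterinlaw", skipjoins="-") and lex("sister-in-law", skipjoins="-") return the same token, which is why either query form works.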

Querying with Numjoins and Numgroups

Numjoin and numgroup characters are characters that can appear in numbers, such as the decimal point and the comma.

Numjoin

A numjoin is a character that occurs once in a string of digits, such as a decimal point, and gets indexed with the number. (ConText defines the decimal point as the default numjoin character for the BASIC LEXER preference.) For example, the number 3.14 is indexed as 3.14. Thus, to query on 3.14 with the decimal point defined as a numjoin character, you write:

'3.14'

When you define the numjoin character to be NULL, ConText indexes 3.14 as the two separate numbers 3 and 14.
 


Note: 

When a period follows a number such as at the end of a sentence, ConText knows to index the number without the decimal point. For example, the number fourteen in the following sentence gets indexed as 14 without the period: 

The score was San Francisco 21, Dallas 14.  


 
 

Numgroup

A numgroup is a character such as a comma that groups digits together in a number. Numgroup characters get indexed with the number. (ConText defines the comma as a default numgroup character for the BASIC LEXER preference.) For example, the number 6,344,555 gets indexed as 6,344,555.

To query on a number that contains numgroup characters, you must write the query with the numgroup character. For example, to query on 6,344,555, you write:

'{6,344,555}'

Note that the comma must be escaped.
 


Note: 

When you have the comma defined as a numgroup character, you must query on numbers using the comma. That is, a query on {1,000} does not return documents that contain 1000 without the comma. A better query uses the equivalence operator (=) to match both forms: 

'{1,000}=1000' 


 
 

When you define the numgroup character as NULL, numbers such as 1,000 get indexed as 1 and 000.
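The numjoin and numgroup rules can be approximated with a regular expression, as in this Python sketch: a number is a run of digits, optionally followed by numgroup-separated digit groups and at most one numjoin followed by more digits. A join character is kept only when a digit follows it, which is why a sentence-ending period is not indexed with the number. The function name and the regular-expression approach are assumptions for illustration, not how ConText is implemented; passing an empty string stands in for defining the character as NULL.

```python
import re


def number_tokens(text: str, numjoin: str = ".", numgroup: str = ",") -> list[str]:
    """Return the number tokens in text under the given numjoin/numgroup settings."""
    pattern = r"\d+"
    if numgroup:
        pattern += "(?:" + re.escape(numgroup) + r"\d+)*"  # e.g. 6,344,555
    if numjoin:
        pattern += "(?:" + re.escape(numjoin) + r"\d+)?"   # at most one, e.g. 3.14
    return re.findall(pattern, text)


if __name__ == "__main__":
    print(number_tokens("Pi is about 3.14"))              # ['3.14']
    print(number_tokens("Pi is about 3.14", numjoin=""))  # ['3', '14']
    print(number_tokens("San Francisco 21, Dallas 14."))  # ['21', '14']
    print(number_tokens("1,000 and 1000"))                # ['1,000', '1000']
```

The last line shows why a query on {1,000} alone misses documents that contain 1000: the two forms produce different tokens.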

Querying with Startjoin and Endjoin Characters

Startjoin and endjoin characters are non-alphanumeric characters that start and end tokens. These characters are indexed with the token they occur with.

You or your ConText administrator typically define startjoin and endjoin characters when you index tagged text such as HTML. This makes it easy to define sections for section searching as well as to query on the tags themselves.

For example, to query on the tag <HEAD> with < defined as a startjoin and > defined as an endjoin, write your query as follows:

'{<HEAD>}'

In the query above, an escape sequence is necessary, since > is an operator.
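A minimal Python sketch of startjoin and endjoin handling follows: a startjoin character begins a new token, an endjoin character is appended to and closes the current token, and any other non-alphanumeric character ends a token. The function name and defaults are hypothetical, and the sketch ignores other lexer settings, such as how the slash in a closing tag like </HEAD> is treated.

```python
def lex_tagged(text: str, startjoins: str = "<", endjoins: str = ">") -> list[str]:
    """Tokenize text, keeping startjoin/endjoin characters with their tokens."""
    tokens: list[str] = []
    current = ""
    for ch in text:
        if ch.isalnum():
            current += ch
        elif ch in startjoins:
            if current:
                tokens.append(current)
            current = ch  # a startjoin character opens a new token
        elif ch in endjoins:
            if current:
                tokens.append(current + ch)  # an endjoin character closes the token
            current = ""
        else:
            if current:
                tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    print(lex_tagged("<HEAD>Quarterly results"))  # ['<HEAD>', 'Quarterly', 'results']
```

Lexing the text inside the braces, <HEAD>, with the same settings yields the single token <HEAD>, which is the token the escaped query '{<HEAD>}' looks up.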
 

See Also: 

For more information about section searching, see "WITHIN Operator" in this chapter. 

 
 



Oracle
Copyright © 1998 Oracle Corporation. 
All Rights Reserved. 
