Oracle8 ConText Cartridge Application Developer's Guide
Release 2.4
A63821-01

6
Document Presentation: Highlighting

This chapter describes how ConText query applications can present documents with highlighted information.

The following topics are covered in this chapter:

  Overview of Document Presentation
  Using CTX_QUERY.HIGHLIGHT
  Creating Highlighted Text

Overview of Document Presentation

In a typical query application, users issue text or theme queries. The application executes the query and returns a hitlist from which the user can select one or more documents.

When the user chooses a document, ConText enables you to present the selected document with the query terms highlighted for text queries, or with the relevant paragraphs highlighted for theme queries.

Your application can also present linguistic summaries of the selected documents.
 

See Also: 

For more information about linguistic output, see Chapter 7, "ConText Linguistics"

 
 

When developing applications in PL/SQL, you use the CTX_QUERY.HIGHLIGHT procedure to create various forms of highlighted documents that can be presented to users. The source documents can be stored as plain text or in any of the formats ConText supports for text indexing.

For World Wide Web applications, you can use the ConText viewers to present highlighted documents.
 

See Also: 

For more information about highlighting with ConText viewers, see the Oracle8 ConText Cartridge Workbench User's Guide. 

 
 

Using CTX_QUERY.HIGHLIGHT

Figure 6-1

 

CTX_QUERY.HIGHLIGHT generates highlighting information for text or theme queries. You typically call CTX_QUERY.HIGHLIGHT after executing a text or theme query. With text queries, HIGHLIGHT marks the relevant words or phrases in the document. With theme queries, HIGHLIGHT marks the relevant paragraphs in the document.
 


Note: 

ConText does not do sentence-level theme highlighting. 


 
 

Output

As illustrated in Figure 6-1, CTX_QUERY.HIGHLIGHT can be used to generate the following output for a document:

Output                Description                                            Table
--------------------  -----------------------------------------------------  --------
Original Document     Document in native format without highlights.          NOFILTAB

Plain Text Document   Plain text of document without highlights.             PLAINTAB

Highlighted Document  Plain text document with occurrences of the specified  MUTAB
                      word (text query) or paragraph (theme query)
                      highlighted.

Offset Information    Highlight information that identifies the position     HIGHTAB
                      and length of the query terms or paragraphs found in
                      the source document. The positions and lengths of the
                      query terms are specified as offsets from the
                      beginning of the plain text version of the document.
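
To make the offset information more concrete, the following Python sketch shows how an application might turn HIGHTAB-style (offset, length) pairs into short context snippets, for example to preview hits next to a hitlist entry. It is an illustration only and not part of ConText: the function name snippets, the context width, and in particular the base parameter (whether the first character of the plain text is at offset 0 or 1) are assumptions that should be checked against the actual contents of your HIGHTAB table.

```python
def snippets(plain_text, rows, context=20, base=0):
    """Return one keyword-in-context string per (offset, length) row.

    plain_text: plain text version of the document (as stored in a PLAINTAB table).
    rows:       iterable of (offset, length) pairs, as stored in a HIGHTAB table.
    base:       offset of the first character (0 or 1); an assumption to verify.
    """
    result = []
    for offset, length in sorted(rows):
        start = offset - base
        end = start + length
        if start < 0 or end > len(plain_text):
            continue  # ignore rows that fall outside the text
        left = plain_text[max(0, start - context):start]
        right = plain_text[end:end + context]
        # Mark the hit with square brackets inside its surrounding context.
        result.append(left + "[" + plain_text[start:end] + "]" + right)
    return result


text = "The quick brown dog jumped over the fox."
for line in snippets(text, [(16, 3)], context=6):
    print(line)
```

Because the offsets refer to the plain text version, the same pairs should not be applied to the document in its native format.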

 
 


Note: 

The filter ConText uses to create the plain text in the PLAINTAB and MUTAB tables is the same filter ConText uses to index the document. 

For more information about supported formats, see Oracle8 ConText Cartridge Administrator's Guide. 


 
  

Note: 

If the document is an HTML document filtered through the internal HTML filter, the marked-up ASCII text version generated by HIGHLIGHT and stored in a MUTAB table retains the original HTML tags from the document. 


 
  
See Also: 

For more information about the structure of the highlight output tables, see "Highlight Table Structures" in Appendix A, "Result Tables"

 
 

Highlighting Mark-up

When you call CTX_QUERY.HIGHLIGHT, you can specify the markup used to indicate the start and end of a highlighted word or phrase for text queries, or the start and end of a highlighted paragraph for theme queries.

If you specify no markup, HIGHLIGHT uses default markup, which differs depending on the format of the source document.

If the source document is an ASCII document or a formatted document, the default highlighting markup is three angle brackets immediately to the left (<<<) and right (>>>) of each term.

If the source document is an HTML document filtered through an external filter, the default highlighting markup is the same as the highlighting markup for plain text or formatted documents (<<< and >>>).

If the source document is an HTML document filtered through the internal HTML filter, the default highlighting markup is the HTML tags used to indicate the start and end of a font change:

Creating Highlighted Text

To present highlighted documents in an application, do the following:

  1. Allocate one or more highlight result tables to store the results.
  2. Issue a query to obtain a list of documents.
  3. Call the CTX_QUERY.HIGHLIGHT procedure for a document from the hitlist.
  4. Display (or otherwise use) the output generated by HIGHLIGHT.
  5. Release the result table(s).

Allocating Result Tables

The result tables required by the HIGHLIGHT procedure can be allocated manually using the CREATE TABLE command in SQL or using the CTX_QUERY.GETTAB procedure.

For example, to create a MUTAB table to store highlighted ASCII markup, issue the following statement:

create table mu_ascii
(
id number, 
document long 
);

To create a HIGHTAB table to store highlight offset information, issue the following statement:

create table highlight_ascii
(
id number, 
offset number, 
length number, 
strength number
);
 
See Also: 

For more information about the structure of the highlight output tables, see "Highlight Table Structures" in Appendix A, "Result Tables"

 
 

Issuing a Query

Issue a one-step, two-step, or in-memory query to return a hitlist of documents. You can issue either a text or theme query. For text queries, you call CONTAINS with a text policy; for theme queries, you call CONTAINS with a theme policy. The hitlist provides the textkeys that are used to generate highlight and display output for specified documents in the hitlist.

Calling CTX_QUERY.HIGHLIGHT

Call CTX_QUERY.HIGHLIGHT with a pointer to a document (generally the textkey obtained from the hitlist) and a text or theme query expression.

CTX_QUERY.HIGHLIGHT returns various forms of the specified document that can be further processed or displayed by the application.

ConText uses the query expression specified in the HIGHLIGHT procedure to generate the highlight offset information and marked-up ASCII text. In addition, the offset information is based on the ASCII text version of the document.
 


Note: 

While the query expression is usually the same as the expression used to return documents in the text query, it is not required that the query expressions match. For example, you might allow a user to search for all articles by a particular author and then allow the user to view highlighted references to a specified subject in the returned documents. 


 
 

If the query expression contains a result set operator (first/next, max, threshold), the result set operator is ignored. ConText returns highlight information for the entire result set.
 

See Also: 

For more information about the query expression in HIGHLIGHT, see the CTX_QUERY.HIGHLIGHT specification in Chapter 10

 
 

Text Query Highlighting

To create highlight mark-up for text queries, you must specify a text policy, which is usually the policy you specify with the CONTAINS procedure for the same query. With text queries, the HIGHLIGHT procedure highlights the terms you specify in the query parameter.

For example, to highlight all the occurrences of the term dog in the document identified by textkey 14, call HIGHLIGHT as follows:

begin
  -- Highlight the term dog in the document with textkey 14
  ctx_query.highlight
  (
    cspec   => 'text_policy',
    textkey => '14',
    query   => 'dog',
    id      => 14,
    hightab => 'highlight_ascii',
    mutab   => 'mu_ascii'
  );
end;
/

Theme Query Highlighting

To create highlight mark-up for a theme query, you must specify a theme policy, which is usually the policy you specify with the CONTAINS procedure for the same query. With theme queries, the HIGHLIGHT procedure highlights the relevant paragraphs in the document.

For example, to highlight all the paragraphs that are relevant to the theme query computers in the document with textkey 12, call HIGHLIGHT as follows:

begin
  -- Highlight paragraphs about computers in the document with textkey 12
  ctx_query.highlight
  (
    cspec   => 'theme_policy',
    textkey => '12',
    query   => 'computers',
    id      => 12,
    hightab => 'highlight_ascii',
    mutab   => 'mu_ascii'
  );
end;
/

Presenting HIGHLIGHT Output

You can use the MUTAB table to view highlighted ASCII text. For example, in SQL*Plus, you can issue the following statement to view a MUTAB table called mu_ascii:

select * from mu_ascii order by id; 

You can also use the offset information in the HIGHTAB table to highlight the document in ways that suit your application.
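
As one possible way to use the offset information, the Python sketch below wraps each highlighted span of the plain text in application-chosen tags. It is a minimal illustration under stated assumptions, not ConText functionality: the rows are supplied as plain (offset, length) tuples rather than fetched from the database, the function name apply_highlights and the default tags are hypothetical, and the base parameter reflects an assumption about whether offsets start at 0 or 1.

```python
def apply_highlights(plain_text, rows, start_tag="<b>", end_tag="</b>", base=0):
    """Wrap each (offset, length) span of plain_text in start_tag/end_tag.

    rows: iterable of (offset, length) pairs, e.g. read from a HIGHTAB table.
    base: offset of the first character (0 or 1); an assumption to verify.
    """
    pieces = []
    cursor = 0  # position in plain_text up to which output has been emitted
    for offset, length in sorted(rows):
        start = offset - base
        end = start + length
        if start < cursor or end > len(plain_text):
            continue  # skip overlapping or out-of-range spans
        pieces.append(plain_text[cursor:start])
        pieces.append(start_tag + plain_text[start:end] + end_tag)
        cursor = end
    pieces.append(plain_text[cursor:])
    return "".join(pieces)


text = "The quick brown dog jumped over the fox."
print(apply_highlights(text, [(16, 3)]))
```

Building the result from left to right, rather than inserting tags into the string in place, keeps the original offsets valid while the markup is added.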

Text Query Highlight Output

With text queries, the word or phrase is highlighted. For example, a text query on dog might produce the following type of highlighted ASCII output for a document:

...
The quick brown <<<dog>>> jumped over the fox.
...

Theme Query Highlight Output

With theme queries, the relevant paragraphs in the document are highlighted. For example, a theme query of computers produces the following type of highlighted ASCII output for a document:


<<< LAS VEGAS -- International Business Machines Corp. is using the huge 
computer trade show here this week to try to prove a much disputed marketing 
claim of the past year and a half: that its PS/2 line of personal computers 
really does offer unique benefits.>>>                                   
   In the battle for the hearts and minds of the 100,000 dealers, corporate 
customers and other spectators gathered here, IBM has set up a series of 
demonstrations of the Micro Channel, which is the PS/2's internal data pathway. 
The demonstrations seek to show that this pathway has extra flexibility 
that can translate into more speed. One demonstration uses an add-in circuit 
board that IBM claims allows data to be sent over a network about 60% faster. 
Another illustrates a quicker way to store the huge amounts of data handled by a 
so-called file server, the machine that controls a network of personal 
computers.                                                         
  <<< While most personal computers contain just one "master" processor -- the 
chip that tells the various parts of the computer what to do -- the Micro 
Channel allows for more than one. That means that in Micro Channel machines, the 
workhorse central processor can dump lots of work onto another processor, 
freeing itself to go about other tasks.>>> 

...

In this three-paragraph excerpt of a news article that satisfies the theme query computers, ConText highlights (with angle brackets) only the first and third paragraphs, which are about computers.
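
If an application needs only the highlighted paragraphs, for instance to show them as a short preview, it can pull them out of the marked-up text. The Python sketch below assumes the default <<< and >>> markup described earlier and that the markup does not nest; the function name extract_highlighted and the abbreviated sample text are hypothetical.

```python
import re


def extract_highlighted(marked_up, start_tag="<<<", end_tag=">>>"):
    """Return the text of each highlighted span, with whitespace collapsed."""
    # Non-greedy match so that each start tag pairs with the nearest end tag.
    pattern = re.escape(start_tag) + r"(.*?)" + re.escape(end_tag)
    spans = re.findall(pattern, marked_up, flags=re.DOTALL)
    return [" ".join(span.split()) for span in spans]


# Abbreviated stand-in for the contents of a MUTAB document column.
sample = (
    "<<< First paragraph,\n"
    "about computers.>>>\n"
    "   Second paragraph, not highlighted.\n"
    "  <<< Third paragraph,\n"
    "also about computers.>>>\n"
)
for paragraph in extract_highlighted(sample):
    print(paragraph)
```

If you pass your own markup to HIGHLIGHT, supply the same strings as start_tag and end_tag.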

Release Highlight Result Tables

After documents have been processed by the HIGHLIGHT procedure and displayed to the user, drop the highlight result tables.

If the tables were allocated using CTX_QUERY.GETTAB, you use CTX_QUERY.RELTAB to release the tables.

If the tables were created manually, drop the tables using the SQL command DROP TABLE.




Copyright © 1998 Oracle Corporation. 
All Rights Reserved. 
