Oracle8 ConText Cartridge Application Developer's Guide
Release 2.4
A63821-01


8
Using CTX_LING

This chapter explains how to use the CTX_LING PL/SQL package in ConText to generate the different types of theme output for English text. It also provides some tips and suggestions for using the output to enhance query applications.

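The topics covered in this chapter are:

Overview
Generating CTX_LING Output
Combining Queries with CTX_LING Output
Enabling Linguistic Settings
Monitoring the Services Queue
Logging Parse Information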

Overview

Figure 8-1

 

As shown in Figure 8-1, CTX_LING output consists of lists of themes, theme summaries, and Gists. ConText stores the output in either the theme or Gist table. The following table describes the different output as well as how to generate each type:

Table 8-1

Output Type: List of Themes
Description: The main concepts of a document. You can generate a list of themes in which each theme is a single word or phrase, or in which each theme is a hierarchical list of parent themes.
How to Generate: Call CTX_LING.REQUEST_THEMES with the document textkey and a policy. Use CTX_LING.SET_FULL_THEMES to enable hierarchical lists of themes.

Output Type: Gist
Description: Text in a document that best represents what the document is about as a whole. You can generate either paragraph-level or sentence-level Gists.
How to Generate: Call CTX_LING.REQUEST_GIST with the document textkey and a policy. Specify GENERIC for the pov parameter and either PARAGRAPH or SENTENCE for the glevel parameter.

Output Type: Theme Summary
Description: Text in a document that best represents a given theme in the document. You can generate either paragraph-level or sentence-level theme summaries.
How to Generate: Call CTX_LING.REQUEST_GIST with the document textkey and a policy. Specify the required document theme with the pov parameter and either PARAGRAPH or SENTENCE for the glevel parameter.

 

In a query application, you can use CTX_LING output as an alternative to presenting the entire text of a document. For example, you can present some form of CTX_LING output next to each title when you present the hitlist to the user.

Likewise, after the user selects a document from the hitlist, you can also give the user the option of viewing the Gist of a document in addition to or as an alternative to viewing the entire text of a document.

You can use linguistic settings to enable case-conversion for all-uppercase or all-lowercase text, or to change the default size of Gists and theme summaries. 

See Also: 

For more information about linguistic settings, see "Enabling Linguistic Settings" in this chapter. 

 
 

You obtain CTX_LING output (lists of themes, theme summaries, and Gists) by submitting a request using procedures in the CTX_LING PL/SQL package. Table 8-1 describes which procedures to use.

To generate CTX_LING output, the documents must be stored in a column (either directly or indirectly through a pathname to files), and a policy must be attached to the column. 


Note: 

The setup requirements of having text in a column and having a policy for the column apply to ConText indexes (text/theme) as well as ConText linguistics. The procedures for storing text and creating policies are not discussed in this manual. 

For more information about storing text in columns and creating policies for the columns, see Oracle8 ConText Cartridge Administrator's Guide. 


 
 

Linguistic Personality

Requests for CTX_LING output can only be processed by ConText servers running with the Linguistic personality. A ConText server with the Linguistic personality can also have other personalities in its personality mask. Starting up ConText servers is the task of the ConText administrator, through the CTXSYS Oracle user.
 

See Also: 

For more information about the Linguistic personality and starting ConText servers, see Oracle8 ConText Cartridge Administrator's Guide. 

 
 

Services Queue

The Services Queue is used for managing requests for CTX_LING output. Such a request is cached in memory until the requestor uses the CTX_LING.SUBMIT procedure to add the request to the Services Queue. If more than one request for a single document is cached in memory when the user submits the requests, ConText stores all of the requests as a single batch request in the queue.

ConText servers with the Linguistic personality monitor the Services Queue for requests and process the next request in the queue.
 

See Also: 

For more information about the Services Queue, see Oracle8 ConText Cartridge Administrator's Guide. 

 
 

List of Themes

A list of themes is a list of the main ideas of a document. With each theme, ConText returns a weight that measures the strength of the theme relative to the other themes in the document.

You can use a list of themes in a query application as an alternative to presenting the entire text of a document after a query. When used with theme queries, a presentation of a list of themes for a returned document can also help the user select other documents with the same theme.

You generate a list of themes on a per document basis. To generate a list of themes, use CTX_LING.REQUEST_THEMES. You can generate a list of themes in two ways:

Single Themes

You can generate up to fifty themes for each document, using the CTX_LING.REQUEST_THEMES procedure. This procedure writes a single word or phrase that represents the theme to a row in the theme table. The words or phrases that represent the themes are normalized themes derived from the knowledge catalog.

Theme Hierarchies

You can also generate each document theme (up to 50) accompanied by the hierarchical list of parent themes as defined in the knowledge catalog. A theme is related to its parent theme usually by an "is-associated-with" or "is-a-part-of" relationship. For example, a theme of insects belongs to the hierarchical list of parent themes defined as zoology, biology, hard sciences and science and technology.

To enable hierarchical list of themes output, you must use CTX_LING.SET_FULL_THEMES before you call CTX_LING.REQUEST_THEMES.

Generating theme hierarchical information in the theme table helps to match themes with theme summaries generated with CTX_LING.REQUEST_GIST.
 

See Also: 

For more information about generating themes, see "Generating Lists of Themes, Theme Summaries, and Gists" in this chapter. 

 
 

Theme Summaries

A theme summary for a document provides a short summary of the document from a specific point-of-view. You can use theme summaries to present the relevant text (paragraph or sentence) of documents selected by a theme query.

Because a theme summary provides a concise, focused summary for a particular theme in a document, users of a query application can use a theme summary to compare documents with similar themes.

You can generate two types of theme summaries:

A paragraph-level theme summary consists of the paragraph or paragraphs that best represent a single document theme. A sentence-level theme summary consists of the sentence or sentences that best match a single document theme.

To create either paragraph-level or sentence-level theme summaries, use CTX_LING.REQUEST_GIST.

You can control the size of theme summaries with linguistic settings.
 


Note: 

The size settings for theme summaries can only be modified by creating custom setting labels in the administration tool. 


 
  
See Also: 

For more information about how to generate theme summaries, see "Generating Lists of Themes, Theme Summaries, and Gists" in this chapter. 

For more information on specifying linguistic settings, see "Enabling Linguistic Settings" in this chapter. 

 
 

Gists

A Gist for a document provides a summary that reflects all of the themes in the document. In a query application, you can use a Gist to give the user an overall summary of a document returned in a hitlist.

You can generate two types of Gists:

A paragraph-level Gist consists of the document paragraphs that best represent the themes in a document as a whole. A sentence-level Gist is the sentence or sentences that best represent the themes in a document as a whole.

To generate either a paragraph-level or sentence-level Gist, use CTX_LING.REQUEST_GIST.
 


Note: 

The settings for Gist can only be modified by creating custom setting configurations in the GUI administration tool. 


 
  
See Also: 

For more information about how to generate Gists, see "Generating Lists of Themes, Theme Summaries, and Gists" in this chapter. 

For more information on specifying linguistic settings, see "Enabling Linguistic Settings" in this chapter. 

 
 

Generating CTX_LING Output

You can present CTX_LING output (lists of themes, theme summaries, and Gists) as an alternative to presenting entire documents to users after a query. To generate theme and Gist information, follow these steps:

Creating Output Tables

To create a theme table called CTX_THEMES to store the list of themes from REQUEST_THEMES, issue the following SQL statement:

    create table ctx_themes (
        cid        number,
        pk         varchar2(64),
        theme      varchar2(2000),
        weight     number);

To create a Gist table called CTX_GIST to store the Gist or theme summaries from REQUEST_GIST, issue the following SQL statement:

    create table ctx_gist (
        cid        number,
        pk         varchar2(64),
        pov        varchar2(80),
        gist       long);
 

Note: 

Because the combination of the CID (column ID) and PK (primary key) columns in the output tables uniquely identifies each document in a text column, you can use the output tables to store theme and Gist information for multiple text columns. You can also choose to create multiple output tables to store the theme and Gist information separately for each text column. 
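
As an illustration of sharing one output table across text columns, the following query is a minimal sketch that retrieves the themes for a single document by filtering on both CID and PK. The column ID of 1001 and the textkey 7039 are hypothetical values chosen for the example, and the ordering assumes that a larger weight indicates a stronger theme:

    select theme, weight
      from ctx_themes
     where cid = 1001        -- hypothetical column ID of the text column
       and pk  = '7039'      -- textkey of the document
     order by weight desc;   -- strongest themes first (assumed)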


 
  
See Also: 

For more information about the structure of CTX_LING output tables, see "CTX_LING Output Table Structures" in Appendix A, "Result Tables".

 
 

Creating Composite Textkey Output Tables

To create a theme table whose textkey has two columns, issue the following SQL statement:

    create table ctx_themes (
        cid        number,
        pk1        varchar2(64),
        pk2        varchar2(64),
        theme      varchar2(2000),
        weight     number);

To create a Gist table whose textkey has two columns, issue the following SQL statement:

    create table ctx_gist (
        cid        number,
        pk1        varchar2(64),
        pk2        varchar2(64),
        pov        varchar2(80),
        gist       long);
 
See Also: 

For more information about the structure of CTX_LING output tables, see "CTX_LING Output Table Structures" in Appendix A, "Result Tables".

 
 

Generating Lists of Themes, Theme Summaries, and Gists

Table 8-2 describes the different types of CTX_LING output and how to generate each type.

Table 8-2

Output Type: List of Themes
Description: The main concepts of a document. You can generate a list of themes in which each theme is a single word or phrase, or in which each theme is a hierarchical list of parent themes.
How to Generate: Call CTX_LING.REQUEST_THEMES with the document id. Use CTX_LING.SET_FULL_THEMES to enable hierarchical lists of themes.

Output Type: Gist
Description: Text in a document that best represents what the document is about as a whole. You can generate either paragraph-level or sentence-level Gists.
How to Generate: Call CTX_LING.REQUEST_GIST. Specify GENERIC for the pov parameter and either PARAGRAPH or SENTENCE for the glevel parameter.

Output Type: Theme Summary
Description: Text in a document that best represents a given theme in the document. You can generate either paragraph-level or sentence-level theme summaries.
How to Generate: Call CTX_LING.REQUEST_GIST. Specify the required document theme with the pov parameter and either PARAGRAPH or SENTENCE for the glevel parameter.

 

To generate CTX_LING output for a document in a text column, you first call CTX_LING.REQUEST_GIST or CTX_LING.REQUEST_THEMES as described in Table 8-2, then call CTX_LING.SUBMIT to enter these requests in the Services Queue as a single transaction for that particular document.
 


Note: 

A policy must be defined for a column before you can generate CTX_LING output for the documents in the column. 


 
 

The following example shows how to generate a list of themes and a paragraph-level theme summary. It assumes the tables ctx_themes and ctx_gist have already been created:

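declare
    handle number;
begin
    -- request a list of themes for document 7039
    ctx_ling.request_themes('CTXSYS.DOC_POLICY','7039','CTXSYS.CTX_THEMES');
    -- request a paragraph-level theme summary for the theme Oracle Corporation
    ctx_ling.request_gist('CTXSYS.DOC_POLICY','7039','CTXSYS.CTX_GIST',
                          'PARAGRAPH', 'Oracle Corporation');

    -- submit both cached requests as one batch and keep the handle
    handle := ctx_ling.submit;
end;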

The first call requests a list of themes from document 7039, stored in a column identified by the DOC_POLICY policy. The second call requests a paragraph-level theme summary for Oracle Corporation from the same document. The list of themes and the theme summary that ConText generates are stored in the CTX_LING output tables (ctx_themes and ctx_gist), which were created beforehand.

The call to CTX_LING.SUBMIT submits the requests as one batch request to the services queue and returns a handle which can be used to monitor the status of the request. Because the two requests are submitted as one batch request, ConText generates the theme and Gist output in only one linguistic processing cycle. 
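
Table 8-2 also covers generic Gists. As a minimal sketch, assuming the same policy, document, and output table as in the example above, the following block requests a sentence-level Gist by passing SENTENCE for glevel and GENERIC for pov, in the same parameter order used in that example:

    declare
        handle number;
    begin
        -- sentence-level Gist for the document as a whole (pov = GENERIC)
        ctx_ling.request_gist('CTXSYS.DOC_POLICY', '7039', 'CTXSYS.CTX_GIST',
                              'SENTENCE', 'GENERIC');
        handle := ctx_ling.submit;
    end;

Once the request has been processed, the Gist could be read from the output table with a query such as the following. Filtering on a pov value of GENERIC is an assumption about how ConText labels Gist rows:

    select gist
      from ctx_gist
     where pk  = '7039'
       and pov = 'GENERIC';   -- assumed label for Gist rows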

See Also: 

For more examples of generating Gists and theme summaries, refer to CTX_LING.REQUEST_GIST in Chapter 10.

For more examples of generating lists of themes, refer to CTX_LING.REQUEST_THEMES in Chapter 10.

 
 

Generating Theme Hierarchical Information

By default, ConText generates single themes when you request a list of themes with CTX_LING.REQUEST_THEMES. To generate hierarchical theme information, you must set the full themes flag to TRUE with CTX_LING.SET_FULL_THEMES. A hierarchical list of themes contains single themes, each accompanied by its parent themes as defined in the knowledge catalog. A theme is usually related to its parent theme by an "is-a-part-of" relationship.

Generating theme hierarchical information helps to match themes with the theme summaries generated with CTX_LING.REQUEST_GIST.

The following examples illustrate the difference between single theme output and hierarchical theme output.

Examples

The following SQL statements generate and output single theme information for a document identified by pk:

SQL> exec ctx_ling.request_themes('ctx_thidx', pk, 'ctx_themes')  
SQL> exec ctx_ling.submit(200)  
SQL> select theme from ctx_themes;  
 
THEME 
-------------------------------------------------------------------------------
NASDAQ - National Association of Securities Dealers Automated Quotation System 
stocks 
indexes 
weakness 
composites 
prices 
franchises 
shares 
cellularity 
declining issues 
measures 
analysts 
OTC 
purchases 
Wall Street 
lows 
 
16 rows selected. 

However, when you set the full themes flag to TRUE, ConText generates theme hierarchical information:

SQL> exec ctx_ling.set_full_themes(TRUE)  
SQL> exec ctx_ling.request_themes('ctx_thidx', pk, 'ctx_themes')  
SQL> exec ctx_ling.submit(200)  
SQL> select theme from ctx_themes;

THEME 
-------------------------------------------------------------------------------
:stock market:NASDAQ - National Association of Securities Dealers Automated 
Quotation System: 
:stock market:stocks: 
:catalogs, itemization:indexes: 
:weakness, fatigue:weakness: 
:combination, mixture:composites: 
:retail trade industry:prices: 
:business fundamentals:franchises: 
:possession, ownership:shares: 
:cellularity: 
:stock market:declining issues: 
:analysis, evaluation:measures: 
:analysis, evaluation:analysts: 
:OTC: 
:general commerce:purchases: 
:general investment:Wall Street: 
:bottoms, undersides:lows:
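
If an application needs only the most specific theme from a hierarchical entry, it can be extracted with standard SQL string functions. The following query is a sketch that assumes every row in the table is a hierarchical theme in the colon-delimited form shown above, with a leading and a trailing colon and no trailing blanks. The column alias leaf_theme is introduced for illustration:

    -- take the text between the second-to-last and the last colon
    select substr(theme,
                  instr(theme, ':', -1, 2) + 1,
                  length(theme) - instr(theme, ':', -1, 2) - 1) leaf_theme
      from ctx_themes;

For the entry :stock market:stocks: this expression yields stocks, and for :cellularity: it yields cellularity. Rows that hold single themes without colons would not be handled correctly by this expression.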

Combining Queries with CTX_LING Output

Generating a list of themes is a good way of extending theme or text queries. For a document in a query hitlist, the user can learn more about the document by reading a list of themes or a Gist.

For example, suppose a theme query on music returns a hitlist containing 20 documents. If these documents are lengthy, the user might not want to read every single document to find out what each is about. Rather than return to the user the document text, you can return a list of themes or a Gist for each document for the user to skim.

Implementation

Generally, you can generate CTX_LING output for a document set at two different times:

Generating CTX_LING output at Indexing Time

You can generate CTX_LING output at indexing time; that is, generate output before queries are issued against the document set. When you do so, the CTX_LING output is returned to the user immediately, since the output was already created.

However, while the retrieval time for the CTX_LING output is good, the drawback to this method is that you have to maintain a permanent theme or Gist output table, using your own triggers to keep it updated. A permanent output table for an entire document set also takes up system disk space.
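
One part of that maintenance is removing output for documents that no longer exist. The following trigger is a minimal sketch of this idea and is not part of ConText. The text table docs, its key column doc_id, and the trigger name are hypothetical, and the output tables are assumed to hold output for this one text column only:

    create or replace trigger docs_ling_cleanup
    after delete on docs
    for each row
    begin
        -- drop stale theme and Gist rows for the deleted document
        delete from ctx_themes where pk = to_char(:old.doc_id);
        delete from ctx_gist   where pk = to_char(:old.doc_id);
    end;
    /

If the output tables are shared by several text columns, each delete would also need to filter on CID.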

Generating CTX_LING output at Query-Time

You could also generate CTX_LING output after executing a query. The advantage of generating themes as needed is that the output table lasts only for the user session; you need not maintain a permanent CTX_LING output table for all your documents.

However, generating CTX_LING output takes time depending on the number of documents, the length of the documents, and how your linguistic servers are configured. A user might not want to wait a few minutes for a ConText query application to process a large number of documents.

The example below shows how to generate CTX_LING output after a theme query.

Example

The following PL/SQL code illustrates how to generate a list of themes for every document in a hitlist table returned from a theme query on birds. (You can use the same method to loop through any text table, once the text column table has a policy attached to it.)

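create or replace procedure get_theme IS
    handle number;

    -- hitlist rows written by the theme query below
    cursor ctx_cur is
        select textkey from ctx_temp;

BEGIN

    -- run the theme query on birds; hits are stored in ctx_temp
    ctx_query.contains('DOWTHEME', 'birds', 'ctx_temp');

    -- request and submit a list of themes for each document in the hitlist
    for ctx_cur_rec in ctx_cur loop
        ctx_ling.request_themes('DOWPOLICY', ctx_cur_rec.textkey,
                                'ctx_themes');
        handle := ctx_ling.submit;
    end loop;

END;
/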

This routine first declares a cursor that selects the rows from the ctx_temp result table, to be populated with a theme query on birds.

The cursor FOR loop opens the cursor, executing the select statement that copies all textkeys in the ctx_temp table to the cursor. The loop index ctx_cur_rec is implicitly declared as a record of type ctx_cur%ROWTYPE.

Every iteration of the loop calls the CTX_LING.REQUEST_THEMES procedure with the document textkey derived from ctx_cur_rec. Each request is submitted to the services queue with CTX_LING.SUBMIT, which returns a handle.

The theme output is written to the ctx_themes table.
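
After the requests have been processed, the themes can be presented next to each hit. As a sketch, assuming the hitlist table ctx_temp and the output table ctx_themes from the example above, and assuming that ctx_themes holds output for this text column only, the following query lists the themes for each document in the hitlist:

    select h.textkey, t.theme, t.weight
      from ctx_temp h, ctx_themes t
     where t.pk = h.textkey              -- match output rows to hits by textkey
     order by h.textkey, t.weight desc;

Because the requests are processed by ConText servers through the Services Queue, rows appear in ctx_themes only after each request has completed.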

Enabling Linguistic Settings

The default linguistic setting, GENERIC, is active whenever you initialize linguistics to create theme indexes, perform theme highlighting, or generate CTX_LING output.

You can enable a linguistic setting other than the default (GENERIC) when you want to process all lower-case or all upper-case text, or when you want to change the sizes of Gists and theme summaries. When you enable a linguistic setting for a session, the setting applies only to that session.

The settings for case-conversion (GENERIC or SA) are pre-defined. However, to change the size of Gists and theme summaries, you must create a custom setting with the administration tool.

To enable either a case-conversion setting or a custom setting created with the administration tool, use the CTX_LING.SET_SETTINGS_LABEL procedure with a setting label. For example, to process all-uppercase or all-lowercase text for your current session:

execute ctx_ling.set_settings_label('SA')

The specified setting configuration is active for your session until SET_SETTINGS_LABEL is called with a new setting configuration label.

You can use the CTX_LING.GET_SETTINGS_LABEL function to return the label for the active setting configuration for the current session.
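
As a minimal sketch of using the two calls together, the following block enables the SA setting, prints the active label, and then returns to the default. It assumes that GET_SETTINGS_LABEL takes no arguments and returns a character value; the variable name and its size are chosen for illustration:

    declare
        l_label varchar2(80);   -- size is an assumption
    begin
        ctx_ling.set_settings_label('SA');
        l_label := ctx_ling.get_settings_label;
        dbms_output.put_line('Active setting: ' || l_label);

        -- return to the default setting when finished
        ctx_ling.set_settings_label('GENERIC');
    end;
    /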
 

See Also: 

For more information about creating custom settings, refer to the online help system for the administration tool. 

 
 

Monitoring the Services Queue

When you submit a request to the services queue with CTX_LING.SUBMIT, a handle is returned. With this handle, you can use procedures in the CTX_SVC package to perform the following tasks:

Monitoring the Status of Requests

To monitor the status of requests in the Services Queue, use the CTX_SVC.REQUEST_STATUS function. This function returns one of the following statuses:

Status     Meaning

PENDING    The request has not yet been picked up by a ConText server.
RUNNING    The request is being processed by a ConText server.
ERROR      The request errored.
SUCCESS    The request completed successfully.

 

For example, the following PL/SQL procedure submits a request to generate themes and a Gist for a document with an id of 49. It then checks the status of the request.

 CREATE OR REPLACE PROCEDURE GENERATE_THEMES AS 
  
   v_Handle number; 
   v_Status varchar2(10); 
   v_Time   date; 
   v_Errors varchar2(60);  
  
BEGIN 
  DBMS_OUTPUT.PUT_LINE('Begin generate_themes procedure' ); 
  
  ctx_ling.request_themes('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_themes' ); 
  ctx_ling.request_gist('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_gist' ); 
  v_Handle := ctx_ling.submit; 
  
  DBMS_OUTPUT.PUT_LINE( v_Handle ); 
  
  v_Status := ctx_svc.request_status( v_Handle, v_Time, v_ErrorS );  
  DBMS_OUTPUT.PUT_LINE( v_Status );  
  DBMS_OUTPUT.PUT_LINE( v_Time ); 
  DBMS_OUTPUT.PUT_LINE( substr( v_Errors, 1, 20 ) ); 
  
  EXCEPTION 
    WHEN OTHERS THEN 
       DBMS_OUTPUT.PUT_LINE(' Exception handling' ); 
  
END GENERATE_THEMES; 

This procedure binds the return value of REQUEST_STATUS to v_Status for the linguistic request identified by v_Handle. The value for v_Handle is returned by the call to CTX_LING.SUBMIT which placed the requests for the themes and gists in the Services Queue.
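
A single status check made immediately after submission will often report PENDING. The following block is a sketch of one way to wait for a request to finish by polling REQUEST_STATUS with the same three arguments used above. The retry count, the two-second pause, and the variable sizes are assumptions, and DBMS_LOCK.SLEEP requires execute privilege on the DBMS_LOCK package:

    declare
        v_handle number;
        v_status varchar2(10);
        v_time   date;
        v_errors varchar2(2000);   -- sized generously; an assumption
    begin
        ctx_ling.request_themes('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_themes');
        v_handle := ctx_ling.submit;

        -- poll up to 30 times, pausing 2 seconds after each unfinished check
        for i in 1 .. 30 loop
            v_status := ctx_svc.request_status(v_handle, v_time, v_errors);
            exit when v_status not in ('PENDING', 'RUNNING');
            dbms_lock.sleep(2);
        end loop;

        -- still PENDING or RUNNING here means the retry limit was reached
        dbms_output.put_line('Last status: ' || v_status);
    end;
    /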

Removing Pending Requests

To remove requests with a status of PENDING from the Services Queue, use the CTX_SVC.CANCEL procedure.

For example:

execute ctx_svc.cancel(3321)

In this example, a pending request with handle 3321 is removed from the Services Queue.

If a request has a status of RUNNING, ERROR, or SUCCESS, it cannot be removed from the Services Queue.
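
Because only pending requests can be removed, an application may want to check the status first. The following block is a sketch that reuses handle 3321 from the example above; the variable names and sizes are assumptions:

    declare
        v_status varchar2(10);
        v_time   date;
        v_errors varchar2(2000);
    begin
        v_status := ctx_svc.request_status(3321, v_time, v_errors);

        if v_status = 'PENDING' then
            ctx_svc.cancel(3321);
        else
            dbms_output.put_line('Request not cancelled; status is ' || v_status);
        end if;
    end;
    /

A ConText server may pick up the request between the status check and the call to CANCEL, so the check reduces but does not eliminate the chance that the cancel is attempted on a request that is no longer pending.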

Clearing Requests with Errors

To remove requests with a status of ERROR from the Services Queue, use the CTX_SVC.CLEAR_ERROR procedure.

For example:

execute ctx_svc.clear_error(3321)

In this example, a request with handle 3321 is removed from the Services Queue.

If a value of 0 (zero) is specified for the handle, all requests with a status of ERROR are removed from the queue. If a request has a status of PENDING, RUNNING, or SUCCESS, it cannot be removed from the queue using CLEAR_ERROR.
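
For example, to remove every request with a status of ERROR from the queue in one call, pass a handle of zero as described above:

    execute ctx_svc.clear_error(0)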

Specifying Completion and Error Procedures

To specify a procedure to be called when a linguistic request completes or errors, use the SET_COMPLETION_CALLBACK and SET_ERROR_CALLBACK procedures in CTX_LING. ConText invokes the procedure defined by SET_COMPLETION_CALLBACK after it processes a linguistic request; ConText invokes the procedure defined by SET_ERROR_CALLBACK when it encounters an error.

The following is an example of how to define and use a completion callback procedure. This example is taken from genling.sql in the ctxling demonstration provided with the ConText installation.

For every linguistic request processed, ling_comp_callback keeps track of the number of articles processed by decrementing num_docs, previously defined as the number of articles in the table. The procedure also keeps track of any errors by incrementing num_errors.

create or replace procedure LING_COMP_CALLBACK (
     p_handle in number, 
     p_status in varchar2,
     p_errors in varchar2
   ) IS
     l_total number;
     l_pk    varchar2(64);
BEGIN

 -- decrement the count in the tracking table

 update ling_tracking set num_docs = num_docs - 1;

 -- if the request errored, mark the errors in the pending table

 IF (p_status = 'ERROR') then
     update ling_tracking set num_errors = num_errors + 1;
 end IF;
 commit;

END;
/
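
The callback above relies on a tracking table that is not shown in this chapter. The following statements are a sketch of how such a table might be created and initialized, assuming only the column names used in the callback (num_docs and num_errors) and a document table named articles. The actual definition in genling.sql may differ:

    create table ling_tracking (
        num_docs   number,
        num_errors number);

    -- start with the number of articles to process and no errors
    insert into ling_tracking (num_docs, num_errors)
        select count(*), 0 from articles;
    commit;

With this layout, selecting num_docs and num_errors from ling_tracking shows how many requests remain to be processed and how many have errored.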

The following code is an anonymous PL/SQL block that sets the linguistic completion callback procedure to ling_comp_callback and then generates CTX_LING output for every document in the articles table:

declare
  cursor c1 is select article_id
                    from articles;
  l_handle number;

begin

-- set the completion callback procedure to keep the pending table
-- in sync with the number of documents processed (completed requests) 
-- and the number of errored requests.

   ctx_ling.set_completion_callback('LING_COMP_CALLBACK');

-- loop through all articles in the article table, requesting themes
-- and gists
--

   for crec in c1 loop

      ctx_ling.request_themes('DEMO_POLICY', crec.article_id, 'ARTICLE_THEMES');
      ctx_ling.request_gist('DEMO_POLICY', crec.article_id, 'ARTICLE_GISTS');
      l_handle := ctx_ling.submit;

   end loop;

end;

Logging Parse Information

At start-up of a ConText server, the logging of linguistic parse information is disabled by default.

To enable logging of the parse information generated by ConText linguistics during a session, use the CTX_LING.SET_LOG_PARSE procedure.

For example:

        execute ctx_ling.set_log_parse('TRUE')

Once you enable parse logging for a session, it is active until you explicitly disable it during the session. You can use the CTX_LING.GET_LOG_PARSE function to determine whether parse logging is enabled or disabled for the session.
 


Attention: 

Parse logging is a useful feature if you are having difficulty generating CTX_LING output and you want to monitor how ConText is parsing your documents; however, parse logging may affect performance considerably. As such, you should only enable parse logging if you encounter problems with generating CTX_LING output. 
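
To limit the performance impact, parse logging can be enabled only around the request being investigated. The following SQL*Plus sketch assumes that passing 'FALSE' disables logging, by analogy with the 'TRUE' value shown above. The policy name, textkey, and output table are taken from the earlier examples in this chapter:

    execute ctx_ling.set_log_parse('TRUE')

    declare
        handle number;
    begin
        -- the request whose parsing is being investigated
        ctx_ling.request_themes('CTXSYS.DOC_POLICY', '7039', 'CTXSYS.CTX_THEMES');
        handle := ctx_ling.submit;
    end;
    /

    -- after the request has been processed, turn logging off again (assumed value)
    execute ctx_ling.set_log_parse('FALSE')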


 
 



Oracle
Copyright © 1998 Oracle Corporation. 
All Rights Reserved. 
