Hit highlighting with full-text searches

Post by mdav.. » Thu, 24 Aug 2000 04:00:00



I'm creating a 100 million word corpus of historical Spanish texts that
uses full-text queries with SQL Server 7.0, and have a question re. hit
highlighting.

What I need is a way to access the "character offset" info in the
full-text index, which contains the location WITHIN THE RECORD of all
matching hits (e.g. record 314, starting at character 437; record 476,
starting at character 1245; etc.).

Using just the INSTR and CHARINDEX functions to find the string within
the record is only adequate for an exact string, e.g.
        "likes cultures"
Here you'd just use INSTR or CHARINDEX to look for this exact string
and then add the highlighting codes.
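
For reference, the exact-string case can be sketched in a few lines of
Python. This is only an illustration of the "find it, then wrap it"
step, not the INSTR/CHARINDEX code itself; the function name and the
<b> tags used as highlighting codes are assumptions.

```python
import re


def highlight_exact(text, phrase, open_tag="<b>", close_tag="</b>"):
    """Wrap every case-insensitive occurrence of an exact phrase in tags.

    The tag strings are placeholders for whatever highlighting codes
    the front end actually uses.
    """
    if not phrase:
        return text  # an empty phrase would otherwise match everywhere
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    # m.group(0) keeps the original casing of the matched text
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)


print(highlight_exact("He likes cultures a lot", "likes cultures"))
```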

The problem comes with wildcard and proximity searches.  For example,
imagine that you're searching for
        like culture
        likes cultures
        liked cultures
        liking culture, etc.
The query would be:
        LIK* CULTURE*
The CONTAINS query _will_ find the records matching any of these
variants, but the problem is finding and highlighting the hits within
the record itself.  Suppose that the record contains the following
strings:
        likes Pepsi
        liking summer vacation
        liked cultures
        like all of the others
CHARINDEX and INSTR want an exact string to search for, and there isn't
one.  You can try all sorts of algorithms to find the string within the
record (e.g. LIK* CULTURE*) and highlight it, but I haven't found
anything that works well.  The only solution that I can think of is to
use info from the index itself containing the location of the string in
each record.
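
To make the workaround concrete, here is a minimal Python sketch of one
such algorithm: it re-scans the record on the client side with a regular
expression built from the query. It assumes the query is meant as two
prefix terms (written here as LIK* CULTURE*) that must appear as
adjacent words. The function names are made up for illustration, and
the regex only approximates the full-text engine's matching; it does
not read anything from the index and does not cover proximity searches.

```python
import re


def find_prefix_hits(text, query):
    """Return (start, end) character offsets of hits for a simple query.

    Assumed query syntax: whitespace-separated terms; a trailing '*'
    marks a prefix term. Terms must occur as adjacent words.
    """
    parts = []
    for term in query.split():
        if term.endswith("*"):
            # prefix term: the stem followed by any further word characters
            parts.append(r"\b" + re.escape(term[:-1]) + r"\w*")
        else:
            # plain term: whole word only
            parts.append(r"\b" + re.escape(term) + r"\b")
    if not parts:
        return []
    # adjacent words are separated by one or more non-word characters
    pattern = re.compile(r"\W+".join(parts), re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def highlight_spans(text, spans, open_tag="<b>", close_tag="</b>"):
    """Insert tags around each span, working backwards so offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + open_tag + text[start:end] + close_tag + text[end:]
    return text


record = ("likes Pepsi; liking summer vacation; "
          "liked cultures; like all of the others")
hits = find_prefix_hits(record, "LIK* CULTURE*")
print(hits)
print(highlight_spans(record, hits))
```

The drawback is the one described above: the record is searched a second
time outside the index, so any difference between the regex and the
engine's own word breaking can produce hits that do not line up.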

Any comments?  Thanks in advance,

Mark Davies
Illinois State University


1. Full-text Search, Index Service and hit-highlighting

Hello everyone!

I'm new to Full-Text Search in SQL Server and to Index Service, and am a
bit at a loss as to which does what.

Well, my question is this:
I have many (large) documents in an image column of a SQL Server 2000
table. Now I want to search them and highlight the hits, outputting not
the whole document but a short context around each hit (like Google
does).
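
The "short context" part can be sketched on its own in Python. This
assumes the document has already been converted to a plain-text string
and that the hit's character offsets are known from some other step;
the function name, the width parameter and the <b> tags are
illustrative only.

```python
def snippet(text, start, end, width=30, open_tag="<b>", close_tag="</b>"):
    """Return the hit text[start:end] with up to `width` characters of
    context on each side, marking truncated ends with '...'."""
    left = max(0, start - width)
    right = min(len(text), end + width)
    prefix = "..." if left > 0 else ""
    suffix = "..." if right < len(text) else ""
    return (prefix + text[left:start]
            + open_tag + text[start:end] + close_tag
            + text[end:right] + suffix)


doc = "a" * 100 + " needle " + "b" * 100
pos = doc.index("needle")
print(snippet(doc, pos, pos + len("needle"), width=10))
```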

One way I found is to use Webhits.dll. The problems are:
1) The file containing the hit has to be extracted to disk, which is
bad, as the file may be large.
2) I haven't managed to get the highlight feature to work. The .htw file
outputs the document itself (it's stored in the virtual directory of
IIS), but no hits are highlighted: in fact, the .htw says there are NO
hits...

Any help would be greatly appreciated.

Thanks,
  Koly
