Hit highlighting with full-text searches

Post by davies_pr.. » Thu, 24 Aug 2000 04:00:00



I'm creating a 100-million-word corpus of historical Spanish texts that
uses full-text queries with SQL Server 7.0, and I have a question about
hit highlighting.

What I need is a way to access the "character offset" info in the
full-text index, which contains the location WITHIN THE RECORD of all
matching hits (e.g. record 314, starting at character 437; record 476,
starting at character 1245; etc.).

Using just the INSTR and CHARINDEX functions to find the string within
the record is only adequate for an exact string, e.g.
        "likes cultures"
Here you'd just use INSTR or CHARINDEX to look for this exact string
and then add the highlighting codes.
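
To make the exact-string case concrete, here is a minimal sketch of the same idea done in application code rather than in SQL. Python is used purely for illustration; the function name `highlight_exact` and the `<b>`/`</b>` highlighting codes are assumptions, not anything SQL Server provides.

```python
import re


def highlight_exact(text, phrase, open_tag="<b>", close_tag="</b>"):
    """Wrap every case-insensitive occurrence of an exact phrase in tags."""
    if not phrase:
        # Nothing to search for, so return the record unchanged.
        return text
    # re.escape makes sure the phrase is matched literally.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)


print(highlight_exact("He likes cultures a lot", "likes cultures"))
```

This only works because the search string appears verbatim in the record, which is exactly the limitation described next.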

The problem comes with wildcard and proximity searches.  For example,
imagine that you're searching for
        like culture
        likes cultures
        liked cultures
        liking culture, etc.
The query would be:
        LIK* CULTURE*
The CONTAINS query _will_ find the records matching any of these
variants, but the problem is finding and highlighting the hits within
the record itself.  Suppose that the record contains the following
strings:
        likes Pepsi
        liking summer vacation
        liked cultures
        like all of the others
CHARINDEX and INSTR want an exact string to search for, and there isn't
one.  You can try all sorts of algorithms to find the string within the
record (e.g. LIK* CULTURE*) and highlight it, but I haven't found
anything that works well.  The only solution that I can think of is to
use info from the index itself containing the location of the string in
each record.
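
For reference, one of the "algorithms" in question can be sketched as follows: translate the query into a regular expression and run it over each record that CONTAINS returns. This is a rough sketch that does not touch the full-text index at all. It assumes a trailing `*` means a prefix wildcard (so `LIK* CULTURE*` covers the variants listed above) and that terms must appear in order. The names `term_to_regex`, `build_pattern`, `highlight` and the `max_gap` parameter are made up for illustration, and the regex notion of a "word" may not agree with the word breaker used by the full-text engine.

```python
import re


def term_to_regex(term):
    """Turn one query term into a regex; a trailing * matches any word ending."""
    if term.endswith("*"):
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    return r"\b" + re.escape(term) + r"\b"


def build_pattern(query, max_gap=0):
    """Match the query terms in order, with up to max_gap words in between."""
    parts = [term_to_regex(t) for t in query.split()]
    if max_gap == 0:
        gap = r"\W+"  # terms must be adjacent words
    else:
        # Lazily allow up to max_gap intervening words (crude proximity).
        gap = r"\W+(?:\w+\W+){0,%d}?" % max_gap
    return re.compile(gap.join(parts), re.IGNORECASE)


def highlight(text, query, max_gap=0, open_tag="<b>", close_tag="</b>"):
    """Wrap each match of the query in the highlighting codes."""
    pattern = build_pattern(query, max_gap)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)


record = ("likes Pepsi, liking summer vacation, "
          "liked cultures, like all of the others")
print(highlight(record, "LIK* CULTURE*"))
print(highlight("she liked many old cultures", "LIK* CULTURE*", max_gap=2))
```

On the sample record this marks only "liked cultures", and with `max_gap=2` it also handles a simple proximity case. It still rescans every returned record, so it gets slow on long records and cannot reproduce everything the index does, which is why the offsets stored in the index would be preferable.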

Any comments?  Thanks in advance,

Mark Davies
Illinois State University


1. Hit highlighting with full-text searches

2. sql server7 maintenance plan not working

3. Full-text Search, Index Service and hit-highlighting

4. OpenRoad source code

5. Hit highlighting with full-text searches

6. Error con Delete en un adodc

7. SQL 7.0 Full Text Search and Highlighting

8. HP USERS with Dynamic Server, ALERT!!

9. Highlighting text after text search

10. Highlighting fulltext search results

11. FullText Search and Highlighting