I'm creating a 100-million-word corpus of historical Spanish texts that
uses full-text queries with SQL Server 7.0, and I have a question about hit
highlighting.
What I need is a way to access the "character offset" info in the
full-text index, which contains the location WITHIN THE RECORD of all
matching hits (e.g. record 314, starting at character 437; record 476,
starting at character 1245; etc.).
Using just the INSTR and CHARINDEX functions to find the string within
the record is only adequate for an exact string, e.g.
Here you'd just use INSTR or CHARINDEX to look for this exact string
and then add the highlighting codes.
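To make the exact-string case concrete, here is a minimal sketch in Python rather than in SQL or VB. The function name `highlight_exact` and the `<b>`/`</b>` highlighting codes are assumptions made for illustration only; the point is just that a single literal search is enough when the phrase is known in advance.

```python
import re


def highlight_exact(text, phrase, open_tag="<b>", close_tag="</b>"):
    """Wrap every case-insensitive occurrence of an exact phrase in tags.

    The tag strings are placeholder highlighting codes (an assumption).
    """
    if not phrase:
        return text
    # re.escape makes sure the phrase is treated as a literal string.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)
```

Calling `highlight_exact(record_text, "some exact phrase")` returns the record with each occurrence wrapped in the tags, preserving the original capitalization of the hit.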
The problem comes with wildcard and proximity searches. For example,
imagine that you're searching for
liking culture, etc.
The query would be:
The CONTAINS query _will_ find the records matching any of these
variants, but the problem is finding and highlighting the hits within
the record itself. Suppose that the record contains the following
liking summer vacation
like all of the others
CHARINDEX and INSTR want an exact string to search for, and there isn't
one. You can try all sorts of algorithms to find the string
within the record (e.g. LIK CULTURE*) and highlight it, but I haven't
found anything that works well. The only solution that I can think of
is to use info from the index itself containing the location of the
string in each record.
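For reference, the kind of client-side workaround I mean looks roughly like the sketch below (Python, for illustration only). It does NOT read anything from the full-text index: it re-tokenizes the record and looks for a word starting with one prefix followed, within a few words, by a word starting with another. The function name `find_prefix_hits`, the prefixes, and the `max_distance` window are all assumptions, and the word splitting will not necessarily agree with what the full-text engine does, which is exactly why this approach is unsatisfying.

```python
import re


def find_prefix_hits(text, first_prefix, second_prefix, max_distance=5):
    """Return character offsets of word pairs matching two prefixes.

    Each hit is ((start1, end1), (start2, end2)), where the second word
    appears within max_distance words AFTER the first. This is only an
    approximation of a prefix + proximity search, not the engine's logic.
    """
    first = first_prefix.lower()
    second = second_prefix.lower()
    # Record every word together with its character offsets in the text.
    tokens = [(m.group(0).lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    hits = []
    for i, (word, start, end) in enumerate(tokens):
        if not word.startswith(first):
            continue
        # Look only at the next max_distance words (forward direction).
        for other, o_start, o_end in tokens[i + 1:i + 1 + max_distance]:
            if other.startswith(second):
                hits.append(((start, end), (o_start, o_end)))
                break
    return hits
```

The returned offsets are the "record N, starting at character M" information I would rather get straight from the index. Note that this sketch only checks one word order and ignores stemming, so it can both miss hits that the CONTAINS query found and flag ones it did not.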
Any comments? Thanks in advance,
Illinois State University