Full-text Search, Index Service and hit-highlighting

Post by Denetho » Sun, 09 Dec 2001 05:11:54



Hello everyone!

I'm new to Full-Text Search in SQL Server and to Indexing Service, and
am a bit at a loss as to which does what.

Well, my question is this:
I have many (large) documents in the image column of the SQL Server (2000)
table. Now I want to make a search on them and highlight the hits,
outputting not the whole document, but a short context of the hit ( like
Google does ).
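
A short sketch of the "short context of the hit" idea, in Python. It assumes the document has already been extracted to plain text and that the character offset and length of the hit are already known; the function name `snippet` and its parameters are hypothetical, not part of SQL Server or Indexing Service.

```python
def snippet(text, start, length, context=30):
    """Return a short excerpt around text[start:start+length], with the hit marked."""
    lo = max(0, start - context)                    # left edge of the excerpt
    hi = min(len(text), start + length + context)   # right edge of the excerpt
    prefix = "..." if lo > 0 else ""                # mark text cut off on the left
    suffix = "..." if hi < len(text) else ""        # mark text cut off on the right
    hit = text[start:start + length]
    return (prefix + text[lo:start] + "<b>" + hit + "</b>"
            + text[start + length:hi] + suffix)


if __name__ == "__main__":
    body = "Add the item to the Cart and continue shopping."
    pos = body.index("Cart")
    print(snippet(body, pos, len("Cart"), context=10))
```

Only the excerpt is sent to the browser, so the size of the stored document matters less; the hard part, as the rest of the thread shows, is obtaining the hit offsets in the first place.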

One way I found is to use Webhits.dll. The problems are:
1) The file containing the hit has to be extracted to disk, which is bad,
as the file may be large.
2) I haven't managed to get the highlight feature to work. The .htw file outputs
the document itself ( it's stored in the virtual directory of the IIS ), but
no hits are highlighted: in fact, the .htw says there are NO hits...

Any help would be greatly appreciated.

Thanks,
  Koly

 
 
 

Post by Hilary Cotte » Sun, 09 Dec 2001 12:27:36


The problem is that you want to highlight hits for your search phrase.  At
first glance this seems trivial: all you have to do is go through your
content and wrap each occurrence of the search phrase in some formatting
to highlight it.  The catch is that freetext queries also match plural
and other inflectional forms.  A simple search and replace will therefore
not capture all the hits, especially when you consider irregular plurals
like goose and geese or mouse and mice, or totally irregular verb forms
like am and is.
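
To make the limitation concrete, here is a minimal Python sketch of the naive search-and-replace approach. The function name `naive_highlight` and the `<b>` markers are illustrative assumptions; this is not how webhits.dll works internally.

```python
import re


def naive_highlight(text, phrase, before="<b>", after="</b>"):
    """Wrap every exact, case-insensitive occurrence of phrase in markers."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return pattern.sub(lambda m: before + m.group(0) + after, text)


doc = "One goose crossed the road; later three geese followed."
highlighted = naive_highlight(doc, "goose")

if __name__ == "__main__":
    print(highlighted)
```

The literal word "goose" is wrapped, but "geese" is left untouched, even though a freetext query for "goose" may well have returned the row because of it.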

If you want to run your page/row through Index Server hit highlighting, you
will have to extract your row into a page and then run it through
webhits.dll, which provides the hit-highlighting function.  This is
triggered by the user manually clicking a hyperlink.

--
Hilary
www.iisfaq.com



 
 
 

Post by Denetho » Sun, 09 Dec 2001 18:18:58


Quote:> [...] So a simple search and replace will not capture all the hits,
> especially when you consider plurals like goose and geese, or mouse and
> mice, or verb forms that are totally irregular like am and is.

Yes, I know that this is a problem. Actually, my question was: what to do
then. Isn't there some way to do the highlighting without extracting the
file from the SQL Server table?

Quote:> If you want to run your page/row through index server hit highlighting you
> will have to extract your row into a page, and then run it through
> webhits.dll - which provides the hit highlighting function.  You have to do
> this through manually clicking a hyperlink.

That's OK, but I don't seem to get how to use it. For example, there is a
file Cart.inc in my virtual directory, which contains the word 'Cart'.
To highlight the word 'Cart' in it I use the following code:

'Code start

<%
    ' Build the Webhits query string: the virtual path of the file to
    ' highlight, plus the URL-encoded search restriction.
    WebHitsQuery = "CiWebHitsFile=" & "/L2kFX/Cart.inc"
    WebHitsQuery = WebHitsQuery & "&CiRestriction=" & Server.URLEncode("Cart")

    Response.Write(WebHitsQuery) 'Debug output
%>

<a href="http://localhost/L2kFX/Noname2.htw?<%= WebHitsQuery %>&CiHiliteType=Full">Highlight Full</a>

<BR>

<%
    ' The same query string, hard-coded for comparison
    Response.Write("CiWebHitsFile=/L2KFX/Cart.inc&CiRestriction=Cart") 'Debug info
%>
<a href="http://localhost/L2kFX/qfullhit-r.htw?CiWebHitsFile=/L2KFX/Cart.inc&CiRestriction=Cart">123</a>

'Code End

Anything wrong here? The 'Cart' doesn't seem to be found. Or do I have to
start the Indexing Service first? If so, why doesn't it tell me so?
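
For comparison, the same query string can be assembled outside ASP. The following Python sketch only illustrates the URL construction; the parameter names CiWebHitsFile, CiRestriction and CiHiliteType are the ones used in the post above, while the helper name `webhits_url` is hypothetical. It does not check whether Indexing Service is running or whether the file is in a catalog.

```python
from urllib.parse import urlencode


def webhits_url(htw_url, file_vpath, restriction, hilite_type="Full"):
    """Build a hit-highlighting URL with a properly encoded restriction."""
    params = [
        ("CiWebHitsFile", file_vpath),    # virtual path of the file to highlight
        ("CiRestriction", restriction),   # the search phrase, without extra quotes
        ("CiHiliteType", hilite_type),
    ]
    # safe="/" keeps the virtual path readable; spaces become "+"
    return htw_url + "?" + urlencode(params, safe="/")


url = webhits_url("http://localhost/L2kFX/qfullhit-r.htw", "/L2kFX/Cart.inc", "Cart")

if __name__ == "__main__":
    print(url)
```

Letting a library encode each value avoids stray quote characters ending up inside the href attribute.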

Thanks,
  Koly

 
 
 

Post by Hilary Cotte » Wed, 12 Dec 2001 01:41:56


See answers in line

Quote:>-----Original Message-----
>> [...] So a simple search and replace will not capture all the hits,
>> especially when you consider plurals like goose and geese, or mouse and
>> mice, or verb forms that are totally irregular like am and is.

>Yes, I know that this is a problem. Actually, my question was: what to do
>then. Isn't there some way to do the highlighting without extracting the
>file from the SQL Server table?

No, maybe there will be in the SQL.NET file system.

Quote:>> If you want to run your page/row through index server hit highlighting you
>> will have to extract your row into a page, and then run it through
>> webhits.dll - which provides the hit highlighting function.  You have to do
>> this through manually clicking a hyperlink.

>That's OK, but I don't seem to get how to use it. For example, there is a
>file Cart.inc in my virtual directory, which contains the word 'Cart'.
>To highlight the word 'Cart' in it ( there is this word inside ) I use the
>following code:

>[...]

>Anything wrong here? The 'Cart' doesn't seem to be found. Or do I have to
>start the Index Servie first? If so, why doesn't it tell me so?

You need Index Server running in order to do this.

Quote:>Thanks,
>  Koly

 
 
 

Post by Denetho » Wed, 12 Dec 2001 04:09:17


Quote:> See answers in line

Eeerrrm... Pardon my ignorance, but what does that mean?

Thanks,
  Koly

 
 
 

Post by Dinesh T » Wed, 12 Dec 2001 04:52:51


Koly,

Hilary meant that she has answered in tandem with your statements. Let me
reproduce those:

Quote:>Isn't there some way to do the highlighting without extracting the
>file from the SQL Server table?

no, maybe there will be in the SQL.NET file system  ---> Hilary's answer.

Quote:>Anything wrong here? The 'Cart' doesn't seem to be found. Or do I have to
>start the Index Servie first? If so, why doesn't it tell me so?

you need index server running in order to do this.  ---> Hilary's answer.

Please double-check the previous reply; I might have missed some.
Dinesh.



 
 
 

Post by Dinesh T » Wed, 12 Dec 2001 06:46:29


Oops! I meant '.....he has answered....'

Dinesh.




 
 
 

Post by Denetho » Thu, 13 Dec 2001 00:08:50


Quote:> Hilary meant that she has answered in tandem with your statements.Lemme
> reproduce those..

Thanks!

Quote:> >Anything wrong here? The 'Cart' doesn't seem to be found. Or do I have to
> >start the Index Servie first? If so, why doesn't it tell me so?
> you need index server running in order to do this.---> Hilary's answer.

Nope, running Indexing Service doesn't help the situation. Any ideas??

Thanks,
  Koly

 
 
 


1. Hit highlighting with full-text searches

I'm creating a 100 million word corpus of historical Spanish texts that
uses full-text queries with SQL Server 7.0, and have a question re. hit
highlighting.

What I need is a way to access the "character offset" info in the full-
text index, which contains the location WITHIN THE RECORD of all
matching hits (e.g. record 314, starting at character 437, record 476,
starting at character 1245, etc).

The INSTR and CHARINDEX functions are only adequate for finding an exact
string within the record, e.g.
        "likes cultures"
Here you'd just use INSTR or CHARINDEX to look for this exact string
and then add the highlighting codes.

The problem comes with wildcard and proximity searches.  For example,
imagine that you're searching for
        like culture
        likes cultures
        liked cultures
        liking culture, etc.
The query would be:
        LIK* CULTURE*
The CONTAINS query _will_ find the records matching any of these
variants, but the problem is finding and highlighting the hits within
the record itself.  Suppose that the record contains the following
strings:
        likes Pepsi
        liking summer vacation
        liked cultures
        like all of the others
CHARINDEX and INSTR want an exact string to search for, and there isn't
one.  You can try all sorts of algorithms to find the matching string
within the record (e.g. for LIK* CULTURE*) and highlight it, but I haven't
found anything that works well.  The only solution that I can think of
is to use info from the index itself containing the location of the
string in each record.
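
One of the "algorithms" mentioned above can be sketched in Python: translate a prefix-wildcard query such as LIK* CULTURE* into a regular expression and scan the record text for character offsets. This is only an approximation made outside the full-text engine, under the assumption that the record text is available as a string; it handles trailing-asterisk prefixes on adjacent words and nothing else (no inflectional forms, no proximity operators), and the function names are hypothetical.

```python
import re


def prefix_phrase_pattern(query):
    """Turn 'LIK* CULTURE*' into a regex matching adjacent words with those prefixes."""
    parts = []
    for term in query.split():
        if term.endswith("*"):
            # prefix term: the prefix followed by any further word characters
            parts.append(r"\b" + re.escape(term[:-1]) + r"\w*")
        else:
            # exact term: the whole word only
            parts.append(r"\b" + re.escape(term) + r"\b")
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


def find_hits(text, query):
    """Return (character offset, matched text) for each hit in text."""
    return [(m.start(), m.group(0))
            for m in prefix_phrase_pattern(query).finditer(text)]


record = ("likes Pepsi, liking summer vacation, liked cultures, "
          "like all of the others")
hits = find_hits(record, "LIK* CULTURE*")

if __name__ == "__main__":
    print(hits)
```

Of the four strings in the sample record, only "liked cultures" is reported, together with its offset, which is the information needed to insert highlighting codes.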

Any comments?  Thanks in advance,

Mark Davies
Illinois State University


2. Numbering Records

3. JAVA and permissions

4. pg_dump dying (and VACUUM ANALYZE woes)...

5. Full-text Search and Indexing service

6. *****US-CHI. UNISYS A-SERIES Programming (with COBOL with DMS II)

7. Full-text search and Indexing Service

8. Help!! Full Text Indexing and Microsoft Search Service

9. SQL 7.0 Full Text Search and Highlighting

10. Full-text searching (Microsoft Search service) goes nuts