Understanding CONTAINSTABLE rankings in SQL 2000

Understanding CONTAINSTABLE rankings in SQL 2000

Post by Edward Fretwel » Wed, 09 Oct 2002 19:37:49



Hi All,

Any help on this would be greatly appreciated:

Okay... My exciting data set consists of metadata about two images
available to our users. Each image has a title field and a caption
field as follows:

---Image One---

Title:
Michealangelo's Statue of David ATIMAGEI SR1

Caption:
Michealangelo's Statue of David   Color photography  ATIMAGEI SR1

---Image Two ---

Title:
Beckham, David ATIMAGEI SR1

Caption:
Beckham, David   English football player David Beckham.  ATIMAGEI SR1

--- ---

Both of these fields are indexed using SQL 2000 Full-Text searching
and are then available for users to search via the web site.

What I'm trying to understand is some odd behaviour from SQL Server
when it generates the CONTAINSTABLE rankings for these pictures. When
a user types "david beckham" into our search form, it is parsed into
the following full-text search condition:

ISABOUT(FORMSOF(INFLECTIONAL,david) Weight(1),
FORMSOF(INFLECTIONAL,beckham) Weight(1))
AND ISABOUT(ATDIAGRA weight(.1), ATILLUST weight(.1), ATIMAGEI
weight(.1), ATMAPMAP weight(.1))

Now, when we apply this criteria against the title index, Image Two
(Beckham, David) comes out with the higher ranking of the two. This is
fab and exactly what we want - Image One gets a rank of 50 and Image
Two gets a rank of 55.

However, when we apply the criteria to both fields (by joining the
results of two separate CONTAINSTABLE statements) and combine the
results to get a total rank, something odd happens and Image One
(Michealangelo's David) now comes top despite only containing the word
David twice and not containing the word Beckham at all. This is bad
for our users and I can't understand why it is happening. In this case
the rank for Image One is 105 and Image Two is 82.

Is anyone able to explain this behaviour to me? - If so, is there
anything we can do about it?

Many Thanks,

Edward

 
 
 

Understanding CONTAINSTABLE rankings in SQL 2000

Post by Peter Ston » Wed, 09 Oct 2002 20:16:00


Did you try using one index including both columns ?


Quote:> Hi All,

> Any help on this would be greatly appreciated:

> Okay... My exciting data set consists of metadata about two images
> available to our users. Each image has a title field and a caption
> field as follows:

> ---Image One---

> Title:
> Michealangelo's Statue of David ATIMAGEI SR1

> Caption:
> Michealangelo's Statue of David   Color photography  ATIMAGEI SR1

> ---Image Two ---

> Title:
> Beckham, David ATIMAGEI SR1

> Caption:
> Beckham, David   English football player David Beckham.  ATIMAGEI SR1

> --- ---

> Both of these fields are indexed using SQL 2000 Full-Text searching
> and are then available for users to search via the web site.

> What I'm trying to understand is some odd behaviour from SQL Server
> when it generates the CONTAINSTABLE rankings for these pictures. When
> a user types "david beckham" into our search form, it is parsed into
> the following full-text search condition:

> ISABOUT(FORMSOF(INFLECTIONAL,david) Weight(1),
> FORMSOF(INFLECTIONAL,beckham) Weight(1))
> AND ISABOUT(ATDIAGRA weight(.1), ATILLUST weight(.1), ATIMAGEI
> weight(.1), ATMAPMAP weight(.1))

> Now, when we apply this criteria against the title index, Image Two
> (Beckham, David) comes out with the higher ranking of the two. This is
> fab and exactly what we want - Image One gets a rank of 50 and Image
> Two gets a rank of 55.

> However, when we apply the criteria to both fields (by joining the
> results of two separate CONTAINSTABLE statements) and combine the
> results to get a total rank, something odd happens and Image One
> (Michealangelo's David) now comes top despite only containing the word
> David twice and not containing the word Beckham at all. This is bad
> for our users and I can't understand why it is happening. In this case
> the rank for Image One is 105 and Image Two is 82.

> Is anyone able to explain this behaviour to me? - If so, is there
> anything we can do about it?

> Many Thanks,

> Edward


 
 
 

Understanding CONTAINSTABLE rankings in SQL 2000

Post by John Kan » Thu, 10 Oct 2002 00:53:28


Edward,
Could you confirm that you only have two rows (i.e. only two images) in your
FT enabled table?
If you do, then RANK will never return a meaningful value as part of the
RANKing formula is based upon statistically significant data in order to
generate meaningful values. For more info on RANK, see the SQL 2000 BOL
title "Full-Text Search Recommendations"

Further info on RANK from a SQL Magazine reader & SQL Server PM reply:
a.. Also, this from SQL Magazine and Richard Waymire (SQL Dev. PM) I work
with a database that requires the user to enter keywords based on
credentials such as MCP, MCSE, etc. I store these values in a
full-text-enabled field. I want to rank search results on this data by the
quality of the match. After reading about this topic in SQL Server Books
Online (BOL), I realized that I could use the Containstable facility to
weight the values and return the rankings. How does the Containstable
facility calculate the ranking values? Does 0 indicate a poor match and 1000
the perfect match?

The Containstable facility computes the ranking value on each query, and the
value depends on the current size of the full-text catalog, occurrences of
the words, proximity of the words, and the size of each indexed item. The
weight is an input value that skews how the facility calculates the rank in
favor of one or more of the query terms. A value of 0 indicates a less
relevant match. You will never achieve a value of 1000 in a content-based
query because 1000 means an exact match and only Boolean matches can be
exact. All other matches aren't statistically exact. The Containstable
facility is statistically based. And when you're working with statistics,
the more items you consider, the more meaningful the results. So the
rankings that you receive from a full-text catalog that holds data for only
three rows are less meaningful in relation to one another than the rankings
you receive on these three rows if the full-text catalog holds data for 1
million rows.

Regards,
John


> Did you try using one index including both columns ?



> > Hi All,

> > Any help on this would be greatly appreciated:

> > Okay... My exciting data set consists of metadata about two images
> > available to our users. Each image has a title field and a caption
> > field as follows:

> > ---Image One---

> > Title:
> > Michealangelo's Statue of David ATIMAGEI SR1

> > Caption:
> > Michealangelo's Statue of David   Color photography  ATIMAGEI SR1

> > ---Image Two ---

> > Title:
> > Beckham, David ATIMAGEI SR1

> > Caption:
> > Beckham, David   English football player David Beckham.  ATIMAGEI SR1

> > --- ---

> > Both of these fields are indexed using SQL 2000 Full-Text searching
> > and are then available for users to search via the web site.

> > What I'm trying to understand is some odd behaviour from SQL Server
> > when it generates the CONTAINSTABLE rankings for these pictures. When
> > a user types "david beckham" into our search form, it is parsed into
> > the following full-text search condition:

> > ISABOUT(FORMSOF(INFLECTIONAL,david) Weight(1),
> > FORMSOF(INFLECTIONAL,beckham) Weight(1))
> > AND ISABOUT(ATDIAGRA weight(.1), ATILLUST weight(.1), ATIMAGEI
> > weight(.1), ATMAPMAP weight(.1))

> > Now, when we apply this criteria against the title index, Image Two
> > (Beckham, David) comes out with the higher ranking of the two. This is
> > fab and exactly what we want - Image One gets a rank of 50 and Image
> > Two gets a rank of 55.

> > However, when we apply the criteria to both fields (by joining the
> > results of two separate CONTAINSTABLE statements) and combine the
> > results to get a total rank, something odd happens and Image One
> > (Michealangelo's David) now comes top despite only containing the word
> > David twice and not containing the word Beckham at all. This is bad
> > for our users and I can't understand why it is happening. In this case
> > the rank for Image One is 105 and Image Two is 82.

> > Is anyone able to explain this behaviour to me? - If so, is there
> > anything we can do about it?

> > Many Thanks,

> > Edward

 
 
 

1. Ranking for multiple CONTAINSTABLE in one statement ?

Hi guys

Very quick question to which I expect to receive a very short answer, but
hell, I can always hope, right ?

OK, there are times when I would like to build a query using more than one
CONTAINS statement in the WHERE clause. This is fine, but I'd like to be
able to give the user the option to receive the results ordered by rank.

Problem. I can't think of any way to combine multiple CONTAINSTABLE clauses
and come up with any useful ranking based on the entire set of criteria
(especially since the RANK number is basically useless other than to use as
a sort order within each CONTAINSTABLE. Right now I think my only option is
to only allow ranking if only one CONTAINS clause is used.

Any ideas ?

Thanks

Steve

2. Need to read Access DB from Universe on AIX

3. CONTAINSTABLE RANK different between SQL 7 and 2000

4. Help! Jet.OLEDB is not installed

5. Containstable, variables and SQL Server 2000

6. A Problem about Server Register

7. Help: Understanding MDB vs ADP theology - Access 2002/SQL 2000

8. Possible? Manually set EXPTIME in USER$ to set EXPIRY_DATE in DBA_USERS

9. I don't understand why this worked in 7.0 and now not in 2000

10. VB6 does not understand access 2000/XP database

11. Trying to understand SQL Server, SQL*NET, ODBC interface

12. Trying to understand ODBC SQL*NET to SQL SERVER CONNECTION

13. Ranking (NOT ordering) in an SQL query.