Search engine language parser


Post by Rodney Bro » Wed, 19 Jul 2000 04:00:00



Hi all,

Like all newbies to a given group, I hope that this is on topic here.

My boss has me building a search engine. No problem, I've already got
most of the basic constructs for the various levels of the operation
figured out.  What I'm stuck on is how to go about defining a set
of logical rules to apply to a search. That is, how to define a query
structure. Once I have a set of rules, I can parse whatever I get just
fine; writing parsers is most of what I do for a living.

Here's an example of what I mean:

- Somebody searches for: "hello world or dolly".
- I look at this as a human and say this would match:
        'hello', 'hello world', and 'hello dolly'
  No problem. Simple English tells me that this shouldn't match:
        'hello', 'world', and 'dolly'
  That would negate the need for the "OR". But how do I ~know~ to define a
  rule that would account for this? And how do I choose other rules to apply?

Thank you,
Rodney
[This is a pretty simple precedence question, whether to treat the
request as "(hello world) or dolly" or "hello (world or dolly)" -John]
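
To make the moderator's point concrete, here is a minimal sketch in Python of the two groupings evaluated against a few sample texts. The tuple representation and the names READING_A, READING_B and matches are assumptions made for illustration only; nothing in the thread prescribes them.

```python
# Two possible readings of the query "hello world or dolly", written as
# nested tuples: ("and", a, b), ("or", a, b), or a bare word.
READING_A = ("or", ("and", "hello", "world"), "dolly")   # (hello world) or dolly
READING_B = ("and", "hello", ("or", "world", "dolly"))   # hello (world or dolly)


def matches(query, text):
    """Return True if the set of words in text satisfies the query tree."""
    words = set(text.lower().split())

    def ev(node):
        if isinstance(node, str):
            return node in words          # a bare word must appear in the text
        op, left, right = node
        if op == "and":
            return ev(left) and ev(right)
        return ev(left) or ev(right)      # anything else is treated as "or"

    return ev(query)


if __name__ == "__main__":
    # Print how each reading judges each sample text.
    for text in ["hello world", "hello dolly", "dolly", "hello"]:
        print(text, matches(READING_A, text), matches(READING_B, text))
```

The two readings agree on 'hello world' and 'hello dolly' and disagree on a text containing only 'dolly', which is exactly the case a precedence rule has to settle.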

 
 
 


Post by Roman Shaposhni » Mon, 24 Jul 2000 04:00:00



>My boss has me building a search engine. No problem, I've already got
>most of the basic constructs for the various levels of the operation
>figured out.  What I'm sticking on is how to go about defining a set
>of logical rules to apply to a search. That is, how to define a query
>structure. Once I have a set of rules, I can parse whatever I get just
>fine, writing parsers is most of what I do for a living.

  Well, maybe I'm a bit old-fashioned, but whenever I meet a search
engine I dream about good old regexps or even eregexps. They are
really powerful and give a lot of flexibility to their users. So I
would suggest starting from the regexp concept, keeping in mind that
the terminals can be whatever you like, not just letters.

Thanks,
Roman.
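
One way to read the suggestion above, regexps whose terminals are whole words rather than letters, is sketched below in Python. The operator set ( ) | * ? + and the function names compile_word_regex and search are assumptions of this sketch, not something the post specifies. It works by translating the word-level pattern into an ordinary character-level regular expression over a normalized copy of the text.

```python
import re


def compile_word_regex(pattern):
    """Translate a pattern whose atoms are whole words into a Python regex.

    Operators supported by this sketch (an assumption): ( ) | * ? +
    """
    parts = []
    for tok in re.findall(r"[()|*?+]|[^\s()|*?+]+", pattern):
        if tok == "(":
            parts.append("(?:")           # grouping only, no capture
        elif tok in {")", "|", "*", "?", "+"}:
            parts.append(tok)             # operators pass through unchanged
        else:
            # One terminal = one whole word followed by a single space.
            parts.append("(?:" + re.escape(tok.lower()) + " )")
    # The lookbehind forces a match to begin at the start of a word.
    return re.compile(r"(?<![^ ])" + "".join(parts))


def search(pattern, text):
    """Return True if the word-level pattern occurs somewhere in text."""
    # Normalize to lowercase words, each followed by exactly one space.
    normalized = "".join(word + " " for word in text.lower().split())
    return compile_word_regex(pattern).search(normalized) is not None
```

For example, search("hello (world|dolly)", "Well, hello dolly") succeeds, while the same pattern against "othello dolly" does not, because terminals only match whole words. Note that this reading is sequence-based: words must appear adjacent and in order, and punctuation stays attached to its word.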

 
 
 


Post by Marc Tardi » Fri, 28 Jul 2000 04:00:00


> My boss has me building a search engine. No problem, I've already got
> most of the basic constructs for the various levels of the operation
> figured out.  What I'm sticking on is how to go about defining a set
> of logical rules to apply to a search. That is, how to define a query
> structure. Once I have a set of rules, I can parse whatever I get just
> fine, writing parsers is most of what I do for a living.

Other than not using parentheses, how is your question different from
using the yacc equivalent of:

expression: expression 'or' expression {}
      |     expression 'and' expression {}
      |     '(' expression ')' {}
      ;     /* well, more or less, but you get the idea */

Or is the fact that you're allowing users to omit parentheses the root of your
problem: trying to write precedence rules to interpret whatever the user
is trying to convey?
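
As a rough illustration of one precedence choice, the sketch below is a small recursive-descent parser in Python. It assumes, purely for the sake of example, that adjacent words mean an implicit "and", that "and" binds tighter than "or", and that parentheses override both. The function names and the tuple output format are hypothetical.

```python
import re


def tokenize(query):
    # Parentheses are separate tokens; everything else is split on whitespace.
    return re.findall(r"[()]|[^\s()]+", query)


def parse(query):
    """Parse a query into nested tuples such as ("or", ("and", "a", "b"), "c")."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():                       # or_expr: and_expr ('or' and_expr)*
        node = parse_and()
        while peek() == "or":
            advance()
            node = ("or", node, parse_and())
        return node

    def parse_and():                      # and_expr: primary (['and'] primary)*
        node = parse_primary()
        while peek() not in (None, "or", ")"):
            if peek() == "and":
                advance()                 # explicit "and" is optional
            node = ("and", node, parse_primary())
        return node

    def parse_primary():                  # primary: WORD | '(' or_expr ')'
        if peek() is None:
            raise ValueError("unexpected end of query")
        tok = advance()
        if tok == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            advance()
            return node
        if tok == ")" or tok.lower() in ("and", "or"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok.lower()

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree
```

Under these assumed rules, parse("hello world or dolly") yields ("or", ("and", "hello", "world"), "dolly"), and a user who wants the other reading writes parse("hello (world or dolly)"). Swapping which level handles "or" would give the opposite default.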

 
 
 


Post by RKRayha » Fri, 28 Jul 2000 04:00:00


In order to impose upon the pattern
 "hello world or dolly"
the semantic
 "hello (world or dolly)"
and be economically useful, you would need an
agreement with the user of your tool that such meaning is to be applied to the
surface structure depicted with the 'or' particle.

That agreement would then need to be suspended when your engine encounters
 "Wayne's World or SNL"
which, naturally must be harnessed to a semantic of
 "(Wayne's World) or SNL".

And either will be an error when you do indeed encounter a true
 "apples oranges or pears"
intention so expressed.

The only thing you have going for you is a set of assumptions. As long
as the assumptions match the tool user's expectations, then you have
semantics that look meaningful (as in 'correct').  The proliferation of
web interfaces that do not know of the 'English or' but only the
'logical or' is tending to get folks used to that as an
assumption. But a tool can be designed any way you want; it is just
that you may not really want just one thing. Which is how those
infernal parentheses creep into even the most user-friendly interface.

The first investors approached by Alexander Graham Bell turned him
down, saying that the American public could never become accustomed to
something so complicated. They missed the chance of a lifetime.  If at
all possible, don't be afraid to have a little faith in your users. They
can usually understand combinations of connectors, and even
parentheses, and some of them do understand precedence of operators.

At any rate, it might be a mistake to assume that only one kind of
meaning will occur in the input of the forms you mention.

Hope that helps,

Robert Rayhawk

 
 
 

1. Search engine query language parser

Greetings,

I'm looking for a lex/yacc/bison/flex etc. definition for a parser which
will parse Altavista-like advanced query syntax and can then be used to
generate SQL.

The tool really doesn't matter, but I need to stay away from interpreted
stuff like Perl.

Thanks!

Warren

wvw at fred dot net
[You might find one in aspseek or mnogo, two popular free web archivers
which use mysql underneath. -John]
