>I've had a fair amount of experience with numerically oriented databases,
>but now I have an application that is entirely text based, and I'm stumped
>as to which package to use. (UNIX platform)
>I'd like to create a database of items and their paragraph-sized
>descriptions. The most important feature of the database will be search
>capability. I'd like users to be able to enter queries in the most
>"natural" language possible (and I'm not sure exactly how natural that can
>be). I'd like to be able to search by category or by key word, and a
>"find similar" feature would be wonderful, as would relevance ranking and
>a thesaurus feature that would return related entries as well as exact
>matches.
>I realize that this is an awful lot to ask for, but am hoping to get as
>close as possible. The type of search engines used in many web search
>services seem to function much like I'd like our database to work. In
>fact, our interface will be on the web, so compatibility there is
>important.
>Does anyone know anything about BRS/Search or have a better suggestion?
Web-oriented text database systems include:
- Glimpse - http://glimpse.cs.arizona.edu/
- GAIS - http://gais.cs.ccu.edu.tw/
This is certainly only a partial listing; several of the "web search
engine vendors" would likely be interested in offering products as well.
I would note that Glimpse and a related package called Harvest are
in *wide* use across the Internet.
Glimpse may not be suited to "very large text databases," although
what counts as "very large" this year and what will count next year
may be quite different things; as hardware and software improve, what
was unmanageable last year may be simple next year.
To that point, I would note that the text of the King James Bible is about
5.5MB long. A few years ago, that would have been considered an *enormous*
document. These days, with "industry leading" word processor software,
you don't need anything like 20 pages of text to produce a file that big.
Just for laughs, I ran the text through TeX to see what would happen.
The result was 2000-odd pages, processed and displayable in about 3
minutes. Printing would take rather a while, and would be largely
pointless. That is a far bigger document than Windows-based word
processors can handle, yet it was easy to process. If I ran a few
searches using "agrep," they would go *very* quickly, because the whole
file would ultimately be cached in memory.
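agrep is built around the bit-parallel approximate-matching technique of Sun Wu and Udi Manber, which scans the text once while tolerating a fixed number of typos. The function below is a simplified sketch of that idea only: one literal pattern, no regular expressions, and the name `approx_find` is hypothetical, not part of agrep.

```python
def approx_find(pattern, text, k):
    """Return the indices in `text` where some substring ending there is within
    edit distance `k` of `pattern` (insertions, deletions, substitutions)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    full = (1 << m) - 1      # keep only the m bits that mean something
    accept = 1 << (m - 1)    # this bit set => the whole pattern has matched
    # masks[c] has bit i set wherever pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # R[d] bit i set => pattern[:i+1] matches text ending here with <= d errors.
    # Before any text is read, a prefix of length <= d matches by deleting it (cost <= d).
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    hits = []
    for j, ch in enumerate(text):
        b = masks.get(ch, 0)
        old = R
        R = [((old[0] << 1) | 1) & b]   # exact extension, no errors allowed
        for d in range(1, k + 1):
            R.append((
                (((old[d] << 1) | 1) & b)   # pattern char matches ch
                | (old[d - 1] << 1)         # substitute ch for a pattern char
                | old[d - 1]                # ch is an extra (inserted) character
                | (R[d - 1] << 1)           # a pattern char is missing from the text
                | 1                         # first pattern char handled by one error
            ) & full)
        if R[k] & accept:
            hits.append(j)
    return hits


print(approx_find("color", "the colour of money", 1))  # [7, 8, 9]
print(approx_find("color", "the colour of money", 0))  # []
```

Each text character costs about k+1 shift/AND/OR operations, independent of file size beyond a single pass, which is why a search over a file already cached in memory finishes so quickly.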
But I digress.
Web: http://www.conline.com/~cbbrowne SAP Basis Consultant, UNIX Guy
Windows NT - How to make a 100 MIPS Linux workstation perform like an 8 MHz 286