> I'm looking for a simple (read cheap -- free to several $100's) search
> engine to use on a web site, that will search PDF files; i.e. the user
> will be presented with a search text field where she types in some
> keywords and receives a list of links to some pdf files that matched,
> hopefully with some info about degree of relevancy. The package should
> first index the files (1000 files at most, each varying from 1-10
> pages), and the web-site search engine should then use this index via some
> CGI interface. Our site is hosted on a Unix server (FreeBSD) running
> Apache. The software should not require any root-privileged
> installation or modification of the web server or ... -- we're hosted on
> a hosting provider's machine and have limited access. We do not require
> (though it would be nice) that the search link to the location of the
> phrase in the PDF file; a link to the file itself is enough.
> All the PDF-enabled search engines I find are (1) expensive & (2)
> require a lot of installation. Does anyone have any info on such?
> Dave Van Rooy
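For readers wondering what such a package does internally, here is a minimal Python sketch of the index-first, search-via-CGI design described in the question. It assumes the text of each PDF has already been extracted by some separate tool (not shown). It ranks matches with a simple tf-idf sum as a stand-in for a real engine's relevancy scoring. The names build_index, search and cgi_response, the "q" query parameter and the /pdf/ URL prefix are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from html import escape
from urllib.parse import parse_qs, quote


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {file name: count of term in that file}.

    `docs` maps a PDF file name to text assumed to be already extracted from it.
    """
    index = defaultdict(dict)
    for name, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][name] = count
    return index


def search(index, num_docs, query):
    """Return (file name, score) pairs, best match first, using a tf-idf sum."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        # Rarer terms get a higher weight; +1 keeps common terms above zero.
        idf = math.log(num_docs / len(postings)) + 1.0
        for name, tf in postings.items():
            scores[name] += tf * idf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def cgi_response(query_string, index, num_docs):
    """Build a CGI response (header, blank line, HTML) linking to matching PDFs."""
    query = parse_qs(query_string).get("q", [""])[0]
    lines = ["Content-Type: text/html", "", "<ul>"]
    for name, score in search(index, num_docs, query):
        lines.append('<li><a href="/pdf/%s">%s</a> (score %.2f)</li>'
                     % (quote(name), escape(name), score))
    lines.append("</ul>")
    return "\n".join(lines)
```

Run as a CGI script, this would print cgi_response(os.environ.get("QUERY_STRING", ""), index, len(docs)). For 1000 files the index would sensibly be built once and saved to disk, not rebuilt on every request. Nothing here needs root access or web server changes beyond ordinary CGI permission.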
Not sure of any Unix solutions.
Acrobat Catalog should be installable on some servers. Its drawback is
that it would ONLY search PDF files and not HTML. Details on the internal
structure of the Catalog files are at http://www.tinaja.com/acrob01.html
Microsoft IIS 3.0 with the Adobe plug-in worked reasonably well for a lot of
people, and it searched every reasonable file type.
We had SEVERE installation problems with IIS 4.0 at the very same time
that Adobe mysteriously and suddenly pulled all their IIS plug-in files from
CiTemplate errors are not for the faint of heart or the nonpersistent.
A reasonable guess is that an improved Adobe plug-in is imminent: IIS 4.0
compatible at the least, and possibly solving the
byte-range-deliver-from-a-search-result problem as well.
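Byte-range delivery means the server returns only part of a PDF in answer to an HTTP Range request header. That way a search hit can open at the relevant page without the whole file being downloaded first. The sketch below is a generic illustration of how a server might interpret a single-range header such as "bytes=500-999". It is not Adobe's plug-in code, the function name parse_byte_range is hypothetical, and multi-range requests are deliberately not handled.

```python
import re


def parse_byte_range(header, size):
    """Return inclusive (first, last) byte positions for a single-range header.

    Returns None if the header is malformed or the range cannot be satisfied.
    """
    if size <= 0:
        return None
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if m is None:
        return None
    first, last = m.groups()
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the file.
        if last == "" or int(last) == 0:
            return None
        return (max(size - int(last), 0), size - 1)
    start = int(first)
    # Open-ended "bytes=N-" runs to the end; an oversized end is clamped.
    end = size - 1 if last == "" else min(int(last), size - 1)
    if start > end:
        return None
    return (start, end)
```

A (first, last) result would be sent back as a 206 Partial Content response holding those bytes. A real server would also need to tell apart the two None cases: an unsatisfiable range gets 416, while a malformed header is normally ignored and the whole file sent.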
Synergetics Press 3860 West First Street Box 809 Thatcher, AZ 85552
Visit my GURU's LAIR web site at http://www.tinaja.com
Know your acronyms: url = utterly rancid location
net = not entirely true
www = world wide wait