PDF Search Engines

Post by Dave Van Roo » Thu, 07 May 1998 04:00:00



I'm looking for a simple (read: cheap -- free to several $100's) search
engine to use on a web site that will search PDF files; i.e. the user
will be presented with a search text field where she types in some
keywords and receives a list of links to the PDF files that matched,
hopefully with some indication of relevance. The package should
first index the files (1000 files at most, each 1-10 pages long),
and the web-site search engine should then use this index via some
CGI interface. Our site is hosted on a Unix server (FreeBSD) running
Apache. The software should not require any root-privileged
installation or modification of the web server or ... -- we're hosted on
a hosting provider's machine and have limited access. We do not require
(though it would be nice) that the search result link to the location of
the phrase in the PDF file; a link to the file itself is enough.
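
The workflow described above (index once, then answer keyword queries from
a CGI script with some relevance score) can be sketched roughly as follows.
This is only an illustration, not any particular product: it assumes the
text has already been extracted from each PDF, and the names build_index,
search and render_results, the "q" form field and the TF-IDF style scoring
are all hypothetical choices made for the example.

```python
import html
import math
import re
from collections import Counter, defaultdict
from urllib.parse import parse_qs

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric keywords."""
    return TOKEN.findall(text.lower())


def build_index(docs):
    """docs maps a PDF link to its extracted text (assumed to exist already).

    Returns (index, n_docs), where index maps term -> {link: term frequency}.
    """
    index = defaultdict(dict)
    for link, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][link] = count
    return dict(index), len(docs)


def search(index, n_docs, query):
    """Return (link, score) pairs, best match first.

    The score is a simple TF-IDF sum; it is one possible relevance measure,
    chosen here only for illustration.
    """
    scores = Counter()
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for link, tf in postings.items():
            scores[link] += tf * idf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def render_results(query_string, index, n_docs):
    """Turn a CGI QUERY_STRING such as 'q=acrobat+catalog' into an HTML list."""
    query = parse_qs(query_string).get("q", [""])[0]
    items = [
        '<li><a href="%s">%s</a> (score %.2f)</li>'
        % (html.escape(link, quote=True), html.escape(link), score)
        for link, score in search(index, n_docs, query)
    ]
    return "<ul>\n%s\n</ul>" % "\n".join(items)
```

A CGI script run by Apache from the user's own cgi-bin would read the
QUERY_STRING environment variable, call render_results, and print the
result after a "Content-Type: text/html" header. Nothing here needs root
access, since the index can live in an ordinary file in the account.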

All the PDF-enabled search engines I find are (1) expensive & (2)
require a lot of installation. Does anyone know of a package that fits
the description above?

TIA,
Dave Van Rooy

 
 
 

Post by Don Lancaste » Thu, 07 May 1998 04:00:00



> I'm looking for a simple (read: cheap -- free to several $100's) search
> engine to use on a web site that will search PDF files; i.e. the user
> will be presented with a search text field where she types in some
> keywords and receives a list of links to the PDF files that matched,
> hopefully with some indication of relevance. The package should
> first index the files (1000 files at most, each 1-10 pages long),
> and the web-site search engine should then use this index via some
> CGI interface. Our site is hosted on a Unix server (FreeBSD) running
> Apache. The software should not require any root-privileged
> installation or modification of the web server or ... -- we're hosted on
> a hosting provider's machine and have limited access. We do not require
> (though it would be nice) that the search result link to the location of
> the phrase in the PDF file; a link to the file itself is enough.

> All the PDF-enabled search engines I find are (1) expensive & (2)
> require a lot of installation. Does anyone know of a package that fits
> the description above?

> TIA,
> Dave Van Rooy

Not sure of any Unix solutions.

Acrobat Catalog should be installable on some servers. Its problem would be
that it would ONLY search PDF files and not HTML. Details on the catalog
file internal structures are found in http://www.tinaja.com/acrob01.html

Microsoft IIS 3.0 with the Adobe plug-in worked reasonably for a lot of
people, and searches every reasonable file type.

We had SEVERE installation problems with IIS 4.0 at the very same time
that Adobe mysteriously and suddenly pulled all their IIS plug-in files from
their website.

CiTemplate errors are not for the faint of heart or the nonpersistent.

A reasonable guess is that an improved Adobe plug-in is imminent: IIS 4.0
compatible at the least, and possibly solving the
byte-range-deliver-from-a-search-result problem as well.

--
Many thanks,

  Don Lancaster

  Synergetics Press  3860 West First Street  Box 809  Thatcher, AZ 85552

  Visit my GURU's LAIR web site at http://www.tinaja.com

  Know your acronyms:   url = utterly rancid location
                        net = not entirely true
                        www = world wide wait

 
 
 

1. is there a pdf search engine or pdf to plain text converter?

I have a whole bunch of PDF files that I need to post on my webpage, and
I want to allow the readers to search through the files.  The
easiest way would be if there is already a web search engine out there
that can read PDF files.  If not, I figure another solution might be to
convert the PDF files to text files temporarily, and feed those files into
the search engine that I am currently using.  So if anyone has any info
about PDF search engines or PDF->text converters (for Unix), I would
very much appreciate it.  If possible, can you email me at

thanks very much for your help!

Alan
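
The convert-to-text idea above could look something like the sketch below.
It assumes a command-line converter named pdftotext (as shipped with Xpdf)
is installed and on the PATH, which is an assumption about the server and
not something stated in the thread; the function names and the directory
layout are hypothetical and only there for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, converter="pdftotext"):
    """Return the argv asking the converter to write plain text to stdout.

    "-layout" keeps the page layout roughly intact; "-" means stdout.
    """
    return [converter, "-layout", str(pdf_path), "-"]


def pdf_to_text(pdf_path, converter="pdftotext"):
    """Extract text from one PDF by running the external converter."""
    if shutil.which(converter) is None:
        raise RuntimeError("converter not found on PATH: " + converter)
    result = subprocess.run(
        build_command(pdf_path, converter), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def convert_directory(pdf_dir, text_dir, convert=pdf_to_text):
    """Write one .txt file per .pdf file and return the paths written.

    The convert argument can be swapped for any other extraction function.
    """
    text_dir = Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        target = text_dir / (pdf.stem + ".txt")
        target.write_text(convert(pdf), encoding="utf-8")
        written.append(target)
    return written
```

The resulting .txt files can then be handed to whatever text search engine
is already in use, with each hit mapped back to the PDF of the same name.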
