Databases, Data Structures, B Trees, B+ Trees, Patricia Trees

Post by John M. Selle » Sun, 17 Apr 1994 12:52:15
Quote:
>           The problem is this: I have about 6 million words of varying
>           length (1-8 characters roughly) that have to be accessed as
>           quickly as possible.

Hmmm - lots of words, only up to 8 characters, and a need for lots of
speed to get at everything.  If I were a pessimist, I might be tempted
to assume this was a list of potential passwords ...   Hmmm ...
Databases, Data Structures, B Trees, B+ Trees, Patricia Trees

Post by James Roberts » Sun, 17 Apr 1994 18:22:40
>Subject: Databases, Data Structures, B Trees, B+ Trees, Patricia Trees
>Keywords: Databases, Data Structures, B Trees
>Date: Fri, 15 Apr 1994 23:31:39 GMT
>           The problem is this: I can't even get close to 95% even when
>           I use probabilistic algorithms.  B, B+ and Patricia trees have
>           not been much help.
>           Does anyone have *ANY* suggestions?  I welcome them all.
>Thanx in advance,
>gjb

I don't have any exact answers, but it seems to me that you need a method to
compact the database before you do your search. Use the same algorithm to
compact your search word -- something like Soundex, although that's external.
If I had time and money I would find a compression routine that could
decompress just part of a file, search through the compressed stuff, and
extract only the correct references. Just a thought.
----------------------------------------------------------------------

James Roberts Computer Consulting            708/658-9409
----------------------------------------------------------------------

Databases, Data Structures, B Trees, B+ Trees, Patricia Trees

Post by The Little Big Guy » Sun, 17 Apr 1994 08:31:39


Hello All: I have a data structures problem I would like some help with.
           The problem is this: I have about 6 million words of varying
           length (1-8 characters roughly) that have to be accessed as  
           quickly as possible.  I need a data structure, algorithm,
           whatever that will access 95% of the data within 500 or less
           reads.  At 512 bytes per sector, about one half of the data
           can be accessed with that many reads.

           The problem is this: I can't even get close to 95% even when
           I use probabilistic algorithms.  B, B+ and Patricia trees have
           not been much help.

           Does anyone have *ANY* suggestions?  I welcome them all.

Thanx in advance,
gjb

Databases, Data Structures, B Trees, B+ Trees, Patricia Trees

Post by Paul Beardsell » Sun, 17 Apr 1994 22:53:29

           "Greg "The Little Big Guy" Bertin" writes:

Quote:

>Hello All: I have a data structures problem I would like some help with.
>           The problem is this: I have about 6 million words of varying
>           length (1-8 characters roughly) that have to be accessed as  
>           quickly as possible.  I need a data structure, algorithm,
>           whatever that will access 95% of the data within 500 or less
>           reads.  At 512 bytes per sector, about one half of the data
>           can be accessed with that many reads.

>           The problem is this: I can't even get close to 95% even when
>           I use probabilistic algorithms.  B, B+ and Patricia trees have
>           not been much help.

>           Does anyone have *ANY* suggestions?  I welcome them all.

>Thanx in advance,
>gjb

I cannot think of a method that will do what you want 95% of the time.
Will this 100% method suffice?

Just hold the words in sorted order in memory or on disk, separate them by
something convenient ('\0' or '\r', perhaps), and perform a binary search.
I.e. go to the middle (using fseek(), perhaps), read twice the number of
characters of your largest word (or 512 to suit stdio), find a delimiter,
and compare the word found there with the desired word; if it sorts before
the target, make the current position the left bound, else make it the
right bound, and go to the middle of the new range...

2^23 is about 8.4 million, which is greater than 6,000,000. So if a binary
search takes more than 23 reads you're doing it wrong.

And WTF is a Patricia tree?

--
Paul Beardsell                          SAM Business Systems Ltd
~~~~~~~~~~~~~~                          21 Finn House, Bevenden St, HOXTON,
Databases, Data Structures, B Trees, B+ Trees, Patricia Trees

Post by Robert Patric » Mon, 18 Apr 1994 10:45:33


Excerpts from netnews.comp.databases.theory: 16-Apr-94 Re: Databases,



>           "Greg "The Little Big Guy" Bertin" writes:

>>Hello All: I have a data structures problem I would like some help with.
>>           The problem is this: I have about 6 million words of varying
>>           length (1-8 characters roughly) that have to be accessed as  
>>           quickly as possible.  I need a data structure, algorithm,
>>           whatever that will access 95% of the data within 500 or less
>>           reads.  At 512 bytes per sector, about one half of the data
>>           can be accessed with that many reads.

>>           The problem is this: I can't even get close to 95% even when
>>           I use probabilistic algorithms.  B, B+ and Patricia trees have
>>           not been much help.

>>           Does anyone have *ANY* suggestions?  I welcome them all.

>>Thanx in advance,
>>gjb

>I cannot think of a method that will do what you want 95% of the time.
>Will this 100% method suffice?

>Just hold the words in sorted order in memory or on disk, separate them by
>something convenient ('\0' or '\r', perhaps), and perform a binary search.
>I.e. go to the middle (using fseek(), perhaps), read twice the number of
>characters of your largest word (or 512 to suit stdio), find a delimiter,
>and compare the word found there with the desired word; if it sorts before
>the target, make the current position the left bound, else make it the
>right bound, and go to the middle of the new range...

>2^23 is about 8.4 million, which is greater than 6,000,000. So if a binary
>search takes more than 23 reads you're doing it wrong.

If you want to add even more speed, create a hash table that divides
the space into smaller parts and use the hash table entry as the
starting range for the binary search.