>Hi! I would need a function that receives a string as an argument, and
>returns a UNIQUE number.
>The idea is that, in a database, I have some composed candidate keys of
>string type fields, and I would like to speed the joins adding an
>"artificial" numeric field. It would be perfect if I can obtain that
>number from the concatenation of the components of the composed key.
interested in an answer ...
What you are describing is called "The Truncation Problem" --- how do you map
things in a huge domain (such as char(30) strings) to unique values in a small
domain (say, 16-bit or 32-bit integers). There are plenty of solutions that
give you close-to-unique solutions, and that's what I'd suggest you go after.
These are "hashing" functions, and a good one for you would probably be a
32-bit CRC (cyclical redundancy check) calculation, whose implementation in
"C" you can find in freeware at simtel or other sites. Certainly if you
indexed and joined on such a hash value, but kept the join condition on the
original full text field, you would gain the benefit of a small, dense index.
It would be very unusual, however, if the amount of CPU spent on string
comparisons versus integer comparisons was signifigant. A DBMS usually carries
so much baggage around just working out which type of comparison to use, that
the actual "memcmp" is pretty trivial at the end of it.
You may be disappointed in looking for perfect mappings. There is such a thing
as "perfect hashing", but it has the disadvantage of requiring recalculation
for every row, every time a single row is added. The calculation is not
insignificant, and the "integers" produced tend to creep up into the 64-bit
range.
You may, of course, have a special-case answer laying in wait for you, in your
own database: if the domains of the compound-key components are small, you
might have a way to map each component to small integers; then combine the
parts, by multiplying and adding as you would for mapping a multi-dimensional
array into a flat vector (you know, for array A[M][N][P] stored in V[M*N*P],
A[i][j][k] is at V[(i*N+j)*P+k], where i in 0..M-1, j in 0..N-1, k in 0..P-1).