> In my pattern formation algorithm I have to do several ten thousand nearest
> neighbour queries. Usually I end up with a dimension of about d=50 to
> d=100, the number of codewords (test vectors) is approx. k > 100,000.
Yep, that's a tough one. Some 10^4 queries times 10^5 codewords times
10^2 dimensions makes 10^11 coordinate comparisons, which is a lot.
> AFAIK this means that a kd-tree or any other dimension-splitting algorithm
> is not efficient because I have too few codewords for too many dimensions.
> IIRC the efficiency condition is k > 2^d.
Indeed. A k-d tree cycles through the dimensions, so by the time each of
the d coordinates has been split once, the tree has 2^d cells. At d = 50
that is about 10^15 cells for only 10^5 points, so almost all of them
would be empty, whereas the core trick of a (balanced) k-d tree is to
keep a small, non-zero and roughly equal number of entries in every
leaf node.
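To make the k > 2^d rule of thumb from above concrete, here is a minimal
sketch (the function name is mine, not an established one):

```python
def kd_tree_feasible(k, d):
    """True if k points are enough to populate 2^d cells,
    i.e. the k > 2^d rule of thumb holds."""
    return k > 2 ** d

# 100,000 codewords are plenty at low dimension...
assert kd_tree_feasible(100_000, 16)
# ...but hopeless against the 2^50 cells at d = 50.
assert not kd_tree_feasible(100_000, 50)
```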
Are these 50 to 100 features all equally descriptive? If they aren't,
you may want to find the ones with higher separative power, build a
lower-dimensional tree that subdivides on those alone, and fall back to
brute-force searching over the remaining coordinates.
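One way to pick those dimensions is to rank them by variance across the
codebook; this is a minimal sketch (the helper name and the variance
criterion are my choices, not something fixed by the problem):

```python
def top_variance_dims(points, n_keep):
    """Return the indices of the n_keep dimensions with the highest
    variance, a crude proxy for 'separative power'."""
    n, d = len(points), len(points[0])
    variances = []
    for j in range(d):
        col = [p[j] for p in points]
        mean = sum(col) / n
        variances.append(sum((x - mean) ** 2 for x in col) / n)
    # Highest-variance dimensions first
    return sorted(range(d), key=lambda j: variances[j], reverse=True)[:n_keep]

# Toy example: dimension 1 spreads the points out, dimension 0 doesn't.
pts = [[0.0, 5.0], [0.0, -5.0], [0.0, 0.0]]
assert top_variance_dims(pts, 1) == [1]
```

The tree is then built only on the returned indices; the coordinates
left out still participate in the final exact distance comparison.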
Whatever you do, you can't store that k-d tree as an actual tree, of
course. Use coordinate mapping instead. E.g. if you normalize all
coordinates to 0..1, isolate the most significant bit of each
coordinate and concatenate them all into a 50-100 bit long integer
(simulated, if your language doesn't support arbitrary precision).
XORing two such signatures and counting the set bits then tells you,
quite cheaply, in how many coordinates the leading bits differ, which
you can use to speed up the search. Starting from this you can more
quickly find likely close candidates --- which in turn helps to quickly
reject others in the brute-force search to follow.
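In Python-like pseudocode the signature trick might look as follows
(a sketch assuming coordinates already normalized to 0..1; note that
two points on opposite sides of 0.5 get differing bits even if they are
close, so this is a candidate-ranking heuristic, not a strict bound):

```python
def msb_signature(point):
    """Concatenate the leading bit of each coordinate into one integer;
    coordinates >= 0.5 contribute a 1 bit, the rest a 0 bit."""
    sig = 0
    for x in point:
        sig = (sig << 1) | (1 if x >= 0.5 else 0)
    return sig

def leading_bit_distance(sig_a, sig_b):
    """Number of coordinates whose leading bits differ: a cheap filter
    applied before any exact distance computation."""
    return bin(sig_a ^ sig_b).count("1")

# Three coordinates -> a three-bit signature: 1, 0, 1
assert msb_signature([0.9, 0.1, 0.6]) == 0b101
# Signatures 101 and 001 differ in exactly one leading bit
assert leading_bit_distance(0b101, 0b001) == 1
```

Sort the codewords by `leading_bit_distance` to the query's signature
and run the exact comparison on the nearest candidates first; a good
early best-match then rejects most of the rest quickly.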
> Is there an efficient algorithm for this kind of problem?
I doubt it --- the problem itself is inefficient by definition, I'd say.
Even if all the snow were burnt, ashes would remain.