> In my pattern formation algorithm I have to do several ten thousand nearest

> neighbour queries. Usually I end up with a dimension of about d=50 to

> d=100; the number of codewords (test vectors) is approx. k > 100,000.

Yep, that's a tough one: ~10^4 queries * 10^5 codewords * 10^2

dimensions comes to about 10^11 coordinate comparisons for a plain

linear scan.

> AFAIK this means that a kd-tree or any other dimensional splitting algorithm

> is not efficient because I have too few codewords for too many dimensions.

> IIRC the efficiency condition is k > 2^d.

Indeed. The first "master layer" of a k-d tree in d dimensions ---

one split per dimension --- has 2^d leaf nodes. If you insert far

fewer than 2^d points, almost all of those leaves will be empty,

whereas the core trick of a (balanced) k-d tree is to have a small,

non-zero and roughly equal number of entries in every leaf node.

Are these 50 to 100 features all equally descriptive? If they aren't,

you may want to find those that have higher separative power and use

only those to subdivide on, in a lower-dimensional tree, and revert to

brute force searching for the others.
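A minimal sketch of that dimension-selection idea, using per-dimension
sample variance as a stand-in for "separative power" (the function name
and the variance criterion are my own illustration, not from the post):

```python
def most_separative_dims(points, n_dims):
    """Rank dimensions by sample variance and return the indices of
    the n_dims most 'spread out' ones. points is a list of k
    codewords, each a list of d floats."""
    k = len(points)
    d = len(points[0])
    ranked = []
    for j in range(d):
        col = [p[j] for p in points]
        mean = sum(col) / k
        var = sum((x - mean) ** 2 for x in col) / k
        ranked.append((var, j))
    ranked.sort(reverse=True)          # highest variance first
    return [j for _, j in ranked[:n_dims]]
```

You would then build the low-dimensional k-d tree on just those
indices and brute-force over the remaining coordinates within each
leaf.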

Whatever you do, you can't store that k-d tree as an actual tree, of

course. Use coordinate mapping, instead. E.g. if you normalized all

coordinates to 0..1, isolate the most significant bit from each

coordinate and concatenate them all into a 50-100 bit long integer

number (simulated, if your language doesn't support arbitrary

precision). You can then quite cheaply count the coordinates whose

leading bits differ between the query and a stored point as

count_bits(bitwise_xor(bitstring(query), bitstring(set[i])))

to speed up the search. Starting from this you can more quickly find

likely close candidates --- which in turn helps to quickly reject

others in the brute-force search to follow.
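A minimal Python sketch of that bit-signature filter plus the
early-rejecting brute-force pass (function names and the 0.5
threshold for the leading bit are my own illustration; Python ints
give you the arbitrary-precision bitstrings for free):

```python
def signature(point):
    # Leading bit of each coordinate, assuming coordinates are
    # normalized to 0..1: bit j is 1 iff coordinate j >= 0.5.
    sig = 0
    for x in point:
        sig = (sig << 1) | (1 if x >= 0.5 else 0)
    return sig

def nearest(query, points, sigs):
    # Visit candidates in order of signature Hamming distance
    # (XOR + popcount), then brute-force with early rejection:
    # abandon a candidate as soon as its partial squared distance
    # exceeds the best full distance found so far.
    qsig = signature(query)
    order = sorted(range(len(points)),
                   key=lambda i: bin(qsig ^ sigs[i]).count("1"))
    best_i, best_d2 = -1, float("inf")
    for i in order:
        d2 = 0.0
        for a, b in zip(query, points[i]):
            d2 += (a - b) ** 2
            if d2 >= best_d2:
                break          # partial sum already too large
        else:
            best_i, best_d2 = i, d2
    return best_i
```

The signatures (`sigs = [signature(p) for p in points]`) are computed
once up front; because good candidates tend to come first, `best_d2`
shrinks early and most of the 100,000 points are rejected after a few
coordinates.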

> Is there an efficient algorithm for this kind of problem?

I doubt it --- exact nearest-neighbour search in that many dimensions

runs into the curse of dimensionality, so some amount of brute force

is hard to avoid.

--

Even if all the snow were burnt, ashes would remain.