>> That's the simplest algorithm, but there is a faster one, which I know
>> from Knuth's _Art of Computer Programming, Volume 2: Seminumerical
>> Algorithms_, to multiply two 2n-bit numbers u and v:
>Shouldn't that middle term be (U1-U0)(V0-V1)? i.e.
>u * v = 2^2n U1V1 + 2^n (U1V1 + (U1-U0)(V0-V1) + U0V0) + U0V0
>Otherwise the U1V1 and U0V0 in the middle term don't cancel out.
Yes, you're correct. Thanks for noting that.
>Surely this needs a signed multiply for the (U1-U0)(V0-V1) term, with
>appropriate handling for the subsequent additions.
>I can't see how this would work out more efficiently if the processor
>only has an unsigned multiply instruction.
If the unsigned multiply operation is expensive in terms of
processing time (e.g., the processor has NO multiply instruction at
all), the savings are pretty easy to verify. Basically, the *algorithm* is more
efficient because we're trading one multiplication for a few additions
and subtractions, but the break-even point in the real world, on real
processors, depends on the relative costs of those operations and the
actual size of the multiplicands relative to the "native" size for the
machine. If we're doing multiplies of two 64-bit numbers on a 68HC11,
I think it's clear that it's going to come out ahead of the classic
O(n^2) implementation. If we're just multiplying two 16-bit numbers,
the timing probably depends more heavily on the details of the actual
implementation. As I mentioned, I haven't done the full analysis of
the cycle counts for that processor -- just a "back of the envelope"
calculation that seems to indicate that it's at least worth looking into.
>It would be necessary to
>check for negative operands, convert them to positive and calculate the
>sign of the product, correct the result as appropriate, and deal with
>mixed signed and unsigned addition to complete the 2^n term.
The middle term does need a signed multiply, but this is not too
difficult to effect, just as you've mentioned. As for mixed signed
and unsigned addition, they all look the same if we're dealing with
two's complement numbers. OTOH, if we're dealing with signed
magnitude numbers (like Knuth's pedagogical MIX processor) then it's
trivial to take the absolute value of a number, and we still don't
have any particular problem.
>Even with a signed multiply instruction, the U1-U0 and V0-V1 terms are
>not representable as an n-bit signed integer: at least one more bit is
>needed for the sign, since the difference terms may range between
>-(2^n - 1) and +(2^n - 1).
Actually exactly one "scratch" bit is needed, but more may be used for
convenience. First, do the first subtraction and take the absolute
value and store the original sign in the scratch bit location. Do the
second subtraction and take its absolute value, and XOR this sign with
the scratch bit. Do an unsigned multiply and negate the result if the
scratch bit is set. The scratch bit does not have to be initialized
to any particular value, and it doesn't matter which sign is
represented by a 1 as long as both tests are consistent and use the same convention.
On machines like the 68HC11, which don't have a bit complement
instruction, it might be handier to burn up a whole byte-wide register
-- in that case, you'd just zero the reg and increment it for each
time you got, say, a negative number. After doing the multiply, if
the low bit of the scratch byte is one, negate the result of the
multiplication. Again, while the algorithm is more efficient, the
details of the particular machine have a great deal to do with whether
it's really more practical in a given situation. Thanks for the great questions.