## Multiplying two 16-bit numbers on 68HC11e2

### Multiplying two 16-bit numbers on 68HC11e2

Hi,

I am trying to multiply signed integer numbers (each 16 bits) on a
Motorola 68HC11E2 chip. The problem I am having is that there are no
accumulators that could do this automatically. (There is accumulator A
= 8 bits, accumulator B = 8 bits, and accumulator D = AccA:AccB = 16
bits.) So I will have to write some algorithm that will do this
(Booth's algorithm comes to mind), addressing locations in memory
directly rather than using registers. But that's yet another thing that
could go wrong for a mechanical engineer like myself.

Am I missing something? Is there an easier way of doing this?
Eventually, I will need to divide two 16-bit numbers - am I dead or
dreaming? Has anyone written something that would do this (both
multiplication and division)? Is there a place that I could find this
written, and tested?

Peter

P.S. Reply by e-mail as well as to the groups. Thanks.

### Multiplying two 16-bit numbers on 68HC11e2

> Hi,

> I am trying to multiply signed integer numbers (each 16 bits) on a
> Motorola 68HC11E2 chip. The problem I am having is that there are no
> accumulators that could do this automatically. (There is accumulator A
> = 8 bits, accumulator B = 8 bits, and accumulator D = AccA:AccB = 16
> bits.) So I will have to write some algorithm that will do this
> (Booth's algorithm comes to mind), addressing locations in memory
> directly rather than using registers. But that's yet another thing that
> could go wrong for a mechanical engineer like myself.

> Am I missing something? Is there an easier way of doing this?
> Eventually, I will need to divide two 16-bit numbers - am I dead or
> dreaming? Has anyone written something that would do this (both
> multiplication and division)? Is there a place that I could find this
> written, and tested?

Here's a 16-bit multiply from our SwiftX cross-development system,
using its Forth assembler.  Notation is the same as Motorola's,
except operands (addressing modes, etc.) _precede_ the instruction
mnemonics.

CODE * ( n1 n2 -- n3 )
N0 STD   2 ,X LDD   N2 STD           \ Move params to workspace
N1 LDAA   N3 LDAB   MUL   2 ,X STD
N0 LDAA   N3 LDAB   MUL   2 ,X ADDB   2 ,X STAB
N1 LDAA   N2 LDAB   MUL   2 ,X ADDB   2 ,X STAB
TPOP   RTS   END-CODE

This uses a set of scratch locations in on-board memory, N0 etc.
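
For readers who don't speak Forth assembler, here is a sketch in Python of the arithmetic I read the routine above as performing (the names hi1/lo1 etc. are illustrative, not the actual N0..N3 workspace layout):

```python
def mul16_truncated(n1, n2):
    """Low 16 bits of n1 * n2 built from three 8x8 multiplies.

    The hi1*hi2 partial product only affects bits 16..31, so a
    truncating 16-bit multiply can skip it entirely; for the same
    reason the result is also correct for two's-complement operands.
    """
    hi1, lo1 = (n1 >> 8) & 0xFF, n1 & 0xFF
    hi2, lo2 = (n2 >> 8) & 0xFF, n2 & 0xFF
    result = lo1 * lo2                    # full 16-bit partial product
    result += (hi1 * lo2 & 0xFF) << 8     # only the low byte survives the shift
    result += (lo1 * hi2 & 0xFF) << 8
    return result & 0xFFFF

print(hex(mul16_truncated(300, 200)))     # 0xea60 (= 60000)
```

Note that only three MULs are needed rather than four, precisely because the product is truncated to 16 bits.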

SwiftX includes a full library of 16-bit and mixed 16/32-bit
integer operators, as well as fixed-point fractions and other stuff.

Cheers,
Elizabeth

--
===============================================
Elizabeth D. Rather  (US & Canada) 800-55-FORTH
FORTH Inc.                      +1 310-372-8493
111 N. Sepulveda Blvd.     Fax: +1 310-318-7130
Manhattan Beach, CA 90266
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
===============================================

### Multiplying two 16-bit numbers on 68HC11e2

There have to be math routines on Motorola's site.

Mitch Berkson

>...
>Is there a place that I could find this written, and tested?

### Multiplying two 16-bit numbers on 68HC11e2

>Hi,

>I am trying to multiply signed integer numbers (each 16 bits) on a
>Motorola 68HC11E2 chip. The problem I am having is that there are no
>accumulators that could do this automatically. (There is accumulator A
>= 8 bits, accumulator B = 8 bits, and accumulator D = AccA:AccB = 16
>bits.) So I will have to write some algorithm that will do this
>(Booth's algorithm comes to mind), addressing locations in memory
>directly rather than using registers. But that's yet another thing that
>could go wrong for a mechanical engineer like myself.

>Am I missing something? Is there an easier way of doing this?
>Eventually, I will need to divide two 16-bit numbers - am I dead or
>dreaming? Has anyone written something that would do this (both
>multiplication and division)? Is there a place that I could find this
>written, and tested?

You'll realize how easy it is if you just take a piece of
paper, write down two numbers that are two digits each,
and multiply them together.  Examine how you did it.
You were only multiplying one digit by one digit
at a time, just like the ALU is only multiplying
one byte by one byte.  Then you did a tricky
shift when you multiplied the second time.  Then you added up
the partial products.

In the example above you were using base-ten
numbers, each one represented by two digits.
The one you are going to do with the 16-bit
numbers is mechanically the same thing; just
think of the bytes as if they were digits.
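
Tom's pencil-and-paper analogy can be sketched directly, treating each byte as one base-256 digit (illustrative Python; on the HC11 each inner product would be one 8x8 MUL):

```python
def mul16x16(u, v):
    """Unsigned 16x16 -> 32 multiply from four 8x8 multiplies,
    exactly like two-digit long multiplication in base 256."""
    u_digits = [u & 0xFF, (u >> 8) & 0xFF]    # little-endian "digits"
    v_digits = [v & 0xFF, (v >> 8) & 0xFF]
    result = 0
    for i, ud in enumerate(u_digits):
        for j, vd in enumerate(v_digits):
            # shifting by 8*(i+j) bits is the "move one column left" step
            result += (ud * vd) << (8 * (i + j))
    return result

print(mul16x16(1234, 5678))    # 7006652
```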

It's easier than you imagine.

Tom

### Multiplying two 16-bit numbers on 68HC11e2

>You'll realize how easy it is if you just take a piece of
>paper, write down two numbers that are two digits each,
>and multiply them together.  Examine how you did it.

This is no longer taught at school!! :-(

When I went to primary school, we all had to learn the multiplication-table.
By the time I reached high school this requirement had been removed. In
early high school we learned long division. I moved schools to discover that
the long division requirement was no longer an educational requirement
because "we all use calculators now". The only division taught is short
division (divisor < 10). Now I hear that this is also not a requirement. Of
course, spelling and grammar have been obsoleted by spellcheckers and now
grammar checkers as well.

My schooling was like walking down a bridge and hearing it fall away behind!
:-)

--
-- Lewin A.R.W. Edwards <http://www.zws.com/>
Realtime/Embedded Programmer & Embedded HW Eng
It is computed that more than eleven thousand persons have on
several occasions suffered death rather than submit to Windows NT.

### Multiplying two 16-bit numbers on 68HC11e2

>>I am trying to multiply signed integer numbers (each of 16 bit) on a
>>Motorola 68HC11E2 chip.

>You'll realize how easy it is if you just take a piece of
>paper, write down two numbers that are two digits each,
>and multiply them together.  Examine how you did it.

That's the simplest algorithm, but there is a faster one, which I know
from Knuth's _Art of Computer Programming Volume 2; Seminumerical
Algorithms_:

to multiply 2 2n-bit numbers u and v

u = 2^n U1 + U0
v = 2^n V1 + V0

where U1 and U0 are the high n bits and low n bits of u, and V1 and V0
are the high and low halves of v.

u * v = 2^2n U1 V1  +  2^n V1 U0  +  2^n U1 V0  +  U0 V0

That's the simple "schoolchild" algorithm we've already covered.

u * v = 2^2n U1 V1  +  2^n (U1 V1 + (U1-U0)(V1-V0) + U0V0)  + U0 V0

This looks more complex, but it only requires three multiplications
instead of four -- the only other operations are shifts, adds and
subtracts.

Ed

### Multiplying two 16-bit numbers on 68HC11e2

>>When I went to primary school, we all had to learn the
>>multiplication-table.
>>By the time I reached high school this requirement had been removed.

>This is astonishing. I wonder when they will remove the requirement to
>write. Judging from some of the stuff one reads here, this is already
>the case.

I was talking about Australia, where I was educated, but I agree that the
USA has its problems also :-/ They used to print things like the 2x to 12x
tables on the back of exercise books and lecture notepads. Now they print
lists of public holidays and sports records.

I read a newspaper article the other day saying that ~40% of school leavers
have some kind of literacy or numeracy problem. I believe it too!

And people wonder why there are so few "hi-tech" workers coming from
Australia. It's kind of hard to be a credible "hi-tech" worker if you
can't put three words together sensibly :-( I get by with less
electronics skill than many (most?) of the other people in the
marketplace, but my better bullshit skills give me a competitive
advantage ;-)

--
-- Lewin A.R.W. Edwards <http://www.zws.com/>
Realtime/Embedded Programmer & Embedded HW Eng
STOP RACISM! Equal rights for Danish & Swiss cheese!

### Multiplying two 16-bit numbers on 68HC11e2

>That's the simplest algorithm, but there is a faster one, which I know
>from Knuth's _Art of Computer Programming Volume 2; Seminumerical
>Algorithms_:

>to multiply 2 2n-bit numbers u and v

>u = 2^n U1 + U0
>v = 2^n V1 + V0

>where U1 and U0 are the high n bits and low n bits of u, and V1 and V0
>are the high and low halves of v.

>u * v = 2^2n U1 V1  +  2^n V1 U0  +  2^n U1 V0  +  U0 V0

>That's the simple "schoolchild" algorithm we've already covered.

>u * v = 2^2n U1 V1  +  2^n (U1 V1 + (U1-U0)(V1-V0) + U0V0)  + U0 V0

>This looks more complex, but it only requires three multiplications
>instead of four -- the only other operations are shifts, adds and
>subtracts.

I wanted him to try the simplest one because he already knows the
answer to his own question, he just doesn't realize it.  We learned
this simple method a long time ago and, just like tying our own shoes,
we do it automatically by rote.

The algorithm that you give would be optimal on an ALU that
has no mul or div.  On some architectures which have a single-cycle
mul, it would not be so advantageous.

Tom

### Multiplying two 16-bit numbers on 68HC11e2

>>You'll realize how easy it is if you just take a piece of
>>paper, write down two numbers that are two digits each,
>>and multiply them together.  Examine how you did it.

>This is no longer taught at school!! :-(

>When I went to primary school, we all had to learn the multiplication-table.

Even though I was taught this in about the fourth or fifth
grade, I really had no idea what I was doing and it was hard to
remember.  I've never been a big fan of rote learning.  It wasn't
until I understood _how_ multidigit multiplication worked that I
had an easier time of it.  That was a big drawback for me in math
class: if I didn't understand it, then I couldn't remember the steps.
I have the tendency to want to derive everything
or I get lost.  I have trouble following nonsensical lists
of instructions.

In the middle of my years in primary school the Russians
launched Sputnik, so our government started teaching
"new math" as a retaliatory act =-}.  The teachers were
issued these new textbooks which contained stuff that
they didn't understand and were ordered to teach it to
us.  Needless to say it was like the blind leading the
blind.  By the time I got to later secondary school they dropped
that and taught the classical algebra, geometry,
and trig.  Then the math teachers knew more about what they
were teaching and questions were allowed again =-}.

One of my chief criticisms of the teaching of math at the
primary and secondary level is that it uses too much
rote-learning technique and also gives no examples
of interesting uses of the stuff.  That makes it look
useless.

Tom

### Multiplying two 16-bit numbers on 68HC11e2

On Sun, 07 Mar 1999 15:11:12 -0700, Peter Bubik wrote:

>Hi,

>I am trying to multiply signed integer numbers (each 16 bits) on a
>Motorola 68HC11E2 chip. The problem I am having is that there are no
>accumulators that could do this automatically. (There is accumulator A
>= 8 bits, accumulator B = 8 bits, and accumulator D = AccA:AccB = 16 bits.) So I

Hello,

A complete multiplication of two 16-bit numbers gives a 32-bit result.

The multiplication of signed numbers may be simplified by multiplying
the absolute values of both numbers. The sign of the result is negative
if exactly one of the two factors is negative; it is positive if both
factors are positive or both are negative.

Now I use only unsigned numbers.
If we write 'a' for the high-order byte of the first factor and 'b'
for the low-order byte, the value of the first factor is a * 256 + b.
We use c * 256 + d for the second factor.
The result is:
(a * 256 + b) * (c * 256 + d) =
a * c * 256 * 256 + b * c * 256 + a * d * 256 + b * d

All the multiplications a * c, b * c, a * d and b * d are
multiplications with two 8 bit operands and a 16 bit result.

The 32 bit result is stored in 4 bytes e, f, g and h, e is the highest
order byte and h the lowest.
UB(b * d) means the upper byte of the result of b * d and
LB(b * d) the lower byte.

h = LB(b * d)
g = UB(b * d) + LB(b * c) + LB(a * d)
But the sum for g may not fit in one byte, so we have to propagate
the carry from g into f:
f = carry(g) + UB(b * c) + UB(a * d) + LB(a * c)
e = carry(f) + UB(a * c)
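
A sketch of this byte-by-byte scheme (hypothetical Python, with UB and LB as defined above; a real HC11 version would let ADC pick up the carries):

```python
def mul16x16_bytes(x, y):
    """Compute x * y from the four 8x8 products b*d, b*c, a*d, a*c,
    assembling the 32-bit result bytes e, f, g, h with explicit carries."""
    a, b = (x >> 8) & 0xFF, x & 0xFF          # x = a*256 + b
    c, d = (y >> 8) & 0xFF, y & 0xFF          # y = c*256 + d
    UB = lambda w: (w >> 8) & 0xFF            # upper byte of a 16-bit value
    LB = lambda w: w & 0xFF                   # lower byte

    bd, bc, ad, ac = b * d, b * c, a * d, a * c   # four 8x8 -> 16 products

    h = LB(bd)
    g_sum = UB(bd) + LB(bc) + LB(ad)          # may exceed one byte
    g = g_sum & 0xFF
    f_sum = (g_sum >> 8) + UB(bc) + UB(ad) + LB(ac)   # carry out of g
    f = f_sum & 0xFF
    e = ((f_sum >> 8) + UB(ac)) & 0xFF        # carry out of f

    return (e << 24) | (f << 16) | (g << 8) | h

assert mul16x16_bytes(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF
```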

Not so easy.
What about using a compiler with support for 16-bit signed and unsigned
integers?

Good luck and bye,
Uwe
--
Uwe Hercksen
Elektronikwerkstatt
Universität Erlangen-Nürnberg
Cauerstr. 5
D91058 Erlangen

### Multiplying two 16-bit numbers on 68HC11e2

>>That's the simplest algorithm, but there is a faster one, which I know
>>from Knuth's _Art of Computer Programming Volume 2; Seminumerical
>>Algorithms_:

>>u * v = 2^2n U1 V1  +  2^n (U1 V1 + (U1-U0)(V1-V0) + U0V0)  + U0 V0

>>This looks more complex, but it only requires three multiplications
>>instead of four -- the only other operations are shifts, adds and
>>subtracts.

>I wanted him to try the simpliest one because he already knows the
>answer to his own question, he just doesn't realize it.

That makes sense, of course, but this would be a simple refinement
after he gets the simple version running.

>The algorithm that you give would be optimal on an ALU that
>has no mul or div.

Actually, in my experience, the algorithm is also quite useful on
systems with a MUL instruction.  The benefits are especially
significant if you're doing larger multiplies (e.g. 64 or 128-bit
numbers) and you apply this algorithm recursively.

>On some architectures which have a single-cycle mul, it would
>not be so advantageous.

True, but at least in my corner of the embedded world, one is not so
likely to encounter such processors, and the 68HC11 is, of course, not
among them.  For the 68HC11, I haven't gone through all the cycle
counting, but I think it would be an advantage since the multiplies
cost more than four times as much as the adds & subtracts.

The MUL instruction itself takes 10 cycles, and only operates on the A
and B registers, so you'd have to load 'em.  That burns up at least
another 4 cycles, for a total of around 14 to multiply two 8-bit
numbers, compared to a 16-bit-wide add taking only 5 cycles, or 2.5
cycles per byte.

Ed

### Multiplying two 16-bit numbers on 68HC11e2

> That's the simplest algorithm, but there is a faster one, which I know
> from Knuth's _Art of Computer Programming Volume 2; Seminumerical
> Algorithms_:

> to multiply 2 2n-bit numbers u and v

> u = 2^n U1 + U0
> v = 2^n V1 + V0

> where U1 and U0 are the high n bits and low n bits of u, and V1 and V0
> are the high and low halves of v.

> u * v = 2^2n U1 V1  +  2^n V1 U0  +  2^n U1 V0  +  U0 V0

> That's the simple "schoolchild" algorithm we've already covered.

> u * v = 2^2n U1 V1  +  2^n (U1 V1 + (U1-U0)(V1-V0) + U0V0)  + U0 V0

> This looks more complex, but it only requires three multiplications
> instead of four -- the only other operations are shifts, adds and
> subtracts.

Shouldn't that middle term be (U1-U0)(V0-V1)?  i.e.

u * v = 2^2n U1V1 + 2^n (U1V1 + (U1-U0)(V0-V1) + U0V0) + U0V0

Otherwise the U1V1 and U0V0 in the middle term don't cancel out.

Surely this needs a signed multiply for the (U1-U0)(V0-V1) term, with
appropriate handling for the subsequent additions.

I can't see how this would work out more efficiently if the processor
only has an unsigned multiply instruction.  It would be necessary to
check for negative operands, convert them to positive and calculate the
sign of the product, correct the result as appropriate, and deal with
mixed signed and unsigned addition to complete the 2^n term.

Even with a signed multiply instruction, the U1-U0 and V0-V1 terms are
not representable as an n-bit signed integer: at least one more bit is
needed for the sign, since the difference terms may range between
-(2^n - 1) and +(2^n - 1).

A test case, using 8 * 8 -> 16 for simplicity: multiply 0x1A by 0x1A.

U1 = V1 = 0x1
U0 = V0 = 0xA

U1V1 = 0x01
U0V0 = 0x64

U1-U0 = -0x9
V0-V1 = 0x9

(U1-U0)(V0-V1) = -0x51
U1V1 + (U1-U0)(V0-V1) + U0V0 = 0x14

Shifting and adding: 0x0100 + 0x140 + 0x64 = 0x2A4

Converting back to decimal: 0x1A = 26, and 26 * 26 = 676 = 0x2A4.

If we ignore the sign, and assume a two's complement representation
using n bits for the subtraction result, we get:

U1-U0 = 0x7 (0x10 - 0x9)
V0-V1 = 0x9

(U1-U0)(V0-V1) = 0x3F
U1V1 + (U1-U0)(V0-V1) + U0V0 = 0xA4

Shifting and adding: 0x0100 + 0xA40 + 0x64 = 0xBA4
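
Both computations above can be checked mechanically with a short script (plain Python, nothing HC11-specific; the `signed_middle` flag toggles between the correct signed middle product and the broken 4-bit wraparound):

```python
def karatsuba_nibble(u, v, signed_middle=True):
    """Karatsuba on 8-bit operands split into 4-bit halves, using the
    corrected middle term (U1-U0)(V0-V1)."""
    U1, U0 = u >> 4, u & 0xF
    V1, V0 = v >> 4, v & 0xF
    du, dv = U1 - U0, V0 - V1
    if signed_middle:
        mid = du * dv                     # true signed product
    else:
        mid = (du & 0xF) * (dv & 0xF)     # wrapped to 4-bit two's complement
    middle = U1 * V1 + mid + U0 * V0
    return (U1 * V1 << 8) + (middle << 4) + U0 * V0

print(hex(karatsuba_nibble(0x1A, 0x1A)))                       # 0x2a4 (correct)
print(hex(karatsuba_nibble(0x1A, 0x1A, signed_middle=False)))  # 0xba4 (wrong)
```

An exhaustive check over all pairs of 8-bit operands confirms that the signed version is exact.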

--
David Empson

Snail mail: P.O. Box 27-103, Wellington, New Zealand

### Multiplying two 16-bit numbers on 68HC11e2

> u * v = 2^2n U1V1 + 2^n (U1V1 + (U1-U0)(V0-V1) + U0V0) + U0V0

I've just done a little hunting and my Algorithms book (Robert
Sedgewick) has the same algorithm (polynomial multiplication), but
writes the expression as:

u * v = 2^2n U1V1 + 2^n ((U1+U0)(V1+V0) - U1V1 - U0V0) + U0V0

This avoids the problem of a signed multiplication, but still leaves the
issue of more bits being required: U1+U0 and V1+V0 might not fit in N
bits, so you have to do an (N+1 bit) x (N+1 bit) multiply, with the
product potentially needing 2N+2 bits (e.g. 16 bits plus two carry
bits).
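
A sketch of this form (illustrative Python; in Python the wider middle multiply is free, which is exactly what an 8-bit MUL cannot offer):

```python
def karatsuba_sums(u, v, n=8):
    """Three-multiply scheme with the middle term written as
    (U1+U0)(V1+V0) - U1V1 - U0V0, avoiding signed products.
    Note the sums U1+U0 and V1+V0 may need n+1 bits."""
    mask = (1 << n) - 1
    U1, U0 = u >> n, u & mask
    V1, V0 = v >> n, v & mask
    p1 = U1 * V1                    # n x n multiply
    p0 = U0 * V0                    # n x n multiply
    pm = (U1 + U0) * (V1 + V0)      # (n+1) x (n+1) multiply -- the catch
    return (p1 << (2 * n)) + ((pm - p1 - p0) << n) + p0

print(karatsuba_sums(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF)   # True
```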

Nothing easy has occurred to me yet.  It is still faster to use the
"schoolchild" method on the 'HC11.

--
David Empson

Snail mail: P.O. Box 27-103, Wellington, New Zealand

### Multiplying two 16-bit numbers on 68HC11e2

>> That's the simplest algorithm, but there is a faster one, which I know
>> from Knuth's _Art of Computer Programming Volume 2; Seminumerical
>> Algorithms_:

>> to multiply 2 2n-bit numbers u and v
...
>Shouldn't that middle term be (U1-U0)(V0-V1)?  i.e.

>u * v = 2^2n U1V1 + 2^n (U1V1 + (U1-U0)(V0-V1) + U0V0) + U0V0

>Otherwise the U1V1 and U0V0 in the middle term don't cancel out.

Yes, you're correct.  Thanks for noting that.

>Surely this needs a signed multiply for the (U1-U0)(V0-V1) term, with
>appropriate handling for the subsequent additions.

>I can't see how this would work out more efficiently if the processor
>only has an unsigned multiply instruction.

If the unsigned multiply operation is expensive in terms of
processing time (e.g. the processor has NO multiply instruction at
all), the advantage is pretty easy to verify.  Basically, the
*algorithm* is more efficient, since it trades one multiplication for
some extra additions and subtractions, but the break-even point in the
real world, on real processors, depends on the relative costs of those
operations and the actual size of the multiplicands relative to the
"native" size for the machine.  If we're doing multiplies of two
64-bit numbers on a 68HC11, I think it's clear that it's going to come
out ahead of the classic O(n^2) implementation.  If we're just
multiplying two 16-bit numbers, the timing probably depends more
heavily on the details of the actual implementation.  As I mentioned,
I haven't done the full analysis of the cycle counts for that
processor -- just a "back of the envelope" calculation that seems to
indicate that it's at least worth looking into.

>It would be necessary to
>check for negative operands, convert them to positive and calculate the
>sign of the product, correct the result as appropriate, and deal with
>mixed signed and unsigned addition to complete the 2^n term.

The middle term does need a signed multiply, but this is not too
difficult to effect, just as you've mentioned.  As for mixed signed
and unsigned addition, they all look the same if we're dealing with
two's complement numbers.  OTOH, if we're dealing with signed
magnitude numbers (like Knuth's pedagogical MIX processor) then it's
trivial to take the absolute value of a number, and we still don't
have any particular problem.

>Even with a signed multiply instruction, the U1-U0 and V0-V1 terms are
>not representable as an n-bit signed integer: at least one more bit is
>needed for the sign, since the difference terms may range between
>-(2^n - 1) and +(2^n - 1).

Actually exactly one "scratch" bit is needed, but more may be used for
convenience.  First, do the first subtraction and take the absolute
value and store the original sign in the scratch bit location.  Do the
second subtraction and take its absolute value, and XOR this sign with
the scratch bit.  Do an unsigned multiply and negate the result if the
scratch bit is set.  The scratch bit does not have to be initialized
to any particular value, and it doesn't matter which sign is
represented by a 1 as long as both tests are consistent and use the
same representation.
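
The recipe in the previous paragraph, sketched in Python (the `umul` parameter is a stand-in for the machine's unsigned multiply):

```python
def signed_mul_via_unsigned(x, y, umul=lambda a, b: a * b):
    """Multiply signed operands using only an unsigned multiply:
    take absolute values, XOR the signs into one scratch bit, and
    negate the product if the bit ends up set."""
    sign = 0
    if x < 0:
        x, sign = -x, sign ^ 1
    if y < 0:
        y, sign = -y, sign ^ 1
    p = umul(x, y)
    return -p if sign else p

print(signed_mul_via_unsigned(-9, 9))    # -81
```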

On machines like the 68HC11, which don't have a bit-complement
instruction, it might be handier to burn up a whole byte-wide register
-- in that case, you'd just zero the reg and increment it each time
you got, say, a negative number.  After doing the multiply, if the low
bit of the scratch byte is one, negate the result of the
multiplication.  Again, while the algorithm is more efficient, the
details of the particular machine have a great deal to do with whether
it's really more practical in a given situation.  Thanks for the great
questions!

Ed

### Multiplying two 16-bit numbers on 68HC11e2

>I've just done a little hunting and my Algorithms book (Robert
>Sedgewick) has the same algorithm (polynomial multiplication), but
>writes the expression as:

>u * v = 2^2n U1V1 + 2^n ((U1+U0)(V1+V0) - U1V1 - U0V0) + U0V0

>This avoids the problem of a signed multiplication, but still leaves the
>issue of more bits being required: U1+U0 and V1+V0 might not fit in N
>bits, so you have to do an (N+1 bit) x (N+1 bit) multiply, with the
>product potentially needing 2N+2 bits (e.g. 16 bits plus two carry
>bits).

I've never done it that way, but it might be interesting to try.  It
seems like it would be more difficult to implement efficiently in
assembly language.  Any thoughts on that?

>Nothing easy has occurred to me yet.  It is still faster to use the
>"schoolchild" method on the 'HC11.

You might be right about multiplying 16-bit numbers.  I don't have an
'HC11 assembler handy, but I could probably code up a whole
implementation and count up the cycles manually.

Ed