## RPN as intermediate code?

### RPN as intermediate code?

I am struggling trying to convert my C interpreter from a tree-based
design to an intermediate-code design.   I have, up until now simply
used YACC to produce my parse tree, and then just stepped through the
tree executing the code.  This has worked fine for most things,
including if/else, compound statements, and for/while loops.

I now want to move to storing (and executing) some form of intermediate
code.  I really like the idea of encoding into an 'execution string', like
bc does, and reverse polish works great for this, as long as I stick to
encoding expressions.  But this model falls apart when I get into things
like conditionals and loops, and I am completely baffled how to do
short-circuit logic like this.  For instance, I can compile the following
code into the RPN strings shown (which execute easily):

x = a*(b+c)             ->  "a b c + * x ="
y = sqrt(b*b-4*a*c)     ->  "b b * 4 a * c * - 1 sqrt callfunc y ="
z = foo(a,bar(b),c)     ->  "a b 1 bar callfunc c 3 foo callfunc z ="

(actually, all the identifiers and constants go into a dictionary
and are replaced by a lookup index, but you get the idea...)

However, when I get into conditionals and loops, I have absolutely no idea
how to proceed.  Do I really have to get all the way down to the level of
computing jump offsets and implement these as conditional gotos?  Or is
there a more elegant way that I just haven't found yet?

And why can't I find any literature about this sort of thing?  All the
compiler books I can find say 'Here is reverse polish.  Its a good
intermediate representation because its stack-like', but then they go off
and do something completely different.

Anyways, I'll take any help I can get here.

[I've used conditional jumps with ofsets.  They're ugly, but they're not hard
to do.  The alternative is to have tokens like WHILE  ENDWHILE, IF THEN
ELSE ENDIF and have the interpreter scan over parts you're skipping and keep
a stack of places to retur tom but that's a lot of scanning and skipping.
-John]
--

### RPN as intermediate code?

>I am struggling trying to convert my C interpreter from a tree-based
>design to an intermediate-code design.

(snip)

Quote:>I now want to move to storing (and executing) some form of intermediate
>code.  I really like the idea of encoding into an 'execution string', like
>bc does, and reverse polish works great for this, as long as I stick to
>encoding expressions.  But this model falls apart when I get into things
>like conditionals and loops, and I am completely baffled how to do
>short-circuit logic like this.  For instance, I can compile the following
>code into the RPN strings shown (which execute easily):

(snip)

Quote:>However, when I get into conditionals and loops, I have absolutely no idea
>how to proceed.  Do I really have to get all the way down to the level of
>computing jump offsets and implement these as conditional gotos?

You could either use that approach (effectively both compile _and_ assemble
for your runtime "machine", I suppose), or use conditional branches and
store the target addresses into your dictionary (just compile). I've used
the following approach successfully:

Traditionally, Forth implements conditionals using two branch instructions,
call them BRANCH and ZBRANCH. BRANCH is unconditional, the target address
(or offset) is always "jumped to". ZBRANCH only performs the branch if the
current top-of-stack (TOS) is zero. An "if" statement like this:

if (a > b){
stuff;
stuff;
}

Would generate code something like:

a
b
>
ZBRANCH xx
stuff
stuff
(this is location xx)

A more complex case,

if (a > b){
stuff1;
stuff2;
}else{
stuff3;
stuff4;
}

Would generate

a
b
>
ZBRANCH xx
stuff1
stuff2
BRANCH yy
(this is location xx)
stuff3
stuff4
(this is location yy)

A loop like

while (a > b){
stuff5;
stuff6;
}

looks like

(this is location xx)
a
b
>
ZBRANCH yy
stuff5
stuff6
BRANCH xx
(this is location yy)

And so on. One disadvantage to this approach (or maybe not, you haven't
said whether your interpreter scans and runs line-by-line, or does an
entire tokenizing pass, _then_ runs) - you must always scan far enough
ahead to resolve any necessary addresses. In Forth, this is not a problem,
the system traditionally has two states, compiling and executing. While
compiling, it even uses the expression stack to hold both branch target
locations, and locations where the target addresses are to be stored (the
word following the "BRANCH" instruction, essentially). Also, Forth's
conditional words (IF, THEN, ELSE, and so on) actually _execute_ during
compilation, and generate the above structures into the dictionary. They
are called "compiling words" for that reason.

Quote:>And why can't I find any literature about this sort of thing?  All the
>compiler books I can find say 'Here is reverse polish.  Its a good
>intermediate representation because its stack-like', but then they go off
>and do something completely different.

I've often wondered that myself. Interpreters don't usually get much more
than a brief mention, anyway, in most compiler books. I learned how to set
up the above branch implementation by looking at the source code for
various Forth implementations.

Quote:>[I've used conditional jumps with ofsets.  They're ugly, but they're not
>hard
>to do.  The alternative is to have tokens like WHILE  ENDWHILE, IF THEN
>ELSE ENDIF and have the interpreter scan over parts you're skipping and
>keep
>a stack of places to retur tom but that's a lot of scanning and skipping.
>-John]

That would certainly work, though you would end up with more work at
runtime - unless you resolve things during a prepass, at which point you're
pretty-much back to jumps and offsets anyways!.

-Len
--

### RPN as intermediate code?

Quote:>And why can't I find any literature about this sort of thing?  All the
>compiler books I can find say 'Here is reverse polish.  Its a good
>intermediate representation because its stack-like', but then they go off
>and do something completely different.

Try Threaded Interpreted Language (TIL) published by Byte. It is a very
old book (1970's?) discussing the implementation of a language that
looks like Forth. (May be it was Forth's forgotten predecessor?). The
compiled code (as well as the source code) is in RPN. If I remembered
correctly, almost everything, (+, -, if, while, etc) are compiled into
"procedure calls". But I gather performance is not high in your criteria
list. The good thing about TIL is that if you stripped away its editor and
translator, the run-time part that is left is very small.

Nor Jaidi

--

### RPN as intermediate code?

Noel Gorelick writes (on RPN translations):

However, when I get into conditionals and loops, I have absolutely no idea
how to proceed.  Do I really have to get all the way down to the level of
computing jump offsets and implement these as conditional gotos?  Or is
there a more elegant way that I just haven't found yet?

I believe there is one.  All you need is a construct that pushes the
address of a given chunk of code on the execution stack. That address
can then be used by instructions for conditionals, loops etc. E.g.
you would translate

while x <= y do x := x * x

to:

pushc  x y <=
pushc  x x * x:=
while

Cheers

-- Martin Odersky
--

### RPN as intermediate code?

:     x = a*(b+c)             ->  "a b c + * x ="
:     y = sqrt(b*b-4*a*c)     ->  "b b * 4 a * c * - 1 sqrt callfunc y ="
:     z = foo(a,bar(b),c)     ->  "a b 1 bar callfunc c 3 foo callfunc z ="

:     (actually, all the identifiers and constants go into a dictionary
:     and are replaced by a lookup index, but you get the idea...)

: However, when I get into conditionals and loops, I have absolutely no idea
: how to proceed.  Do I really have to get all the way down to the level of
: computing jump offsets and implement these as conditional gotos?  Or is
: there a more elegant way that I just haven't found yet?

Ok I am not a compiler specialist, but I am quiet used to use
the HP48 wich is fully RPN

so I thought liske that

first replace '=' with STO wich would be much more general
like << 1 + >> 'INC' STO to create the INC function
now you could do

z = if b then a*(b+c) else sqrt(b*b-4*a*c)

in

<< b c + a * >> X STO << b b * a c 4 * * - >> Y STO
X Y b IFTE

it is important to understand that the form << >> will not execute
the stuff inside but only define a function
and the IFTE function would be

taking b from the stack if b is true then drop the first element from the
stack and execute the second, if not swap the elements drop execute

Hope this will help

--
Dave
& have a good night

--

### RPN as intermediate code?

> >I am struggling trying to convert my C interpreter from a tree-based
> >design to an intermediate-code design.

> >I now want to move to storing (and executing) some form of intermediate
> >code.  I really like the idea of encoding into an 'execution string', like
> >bc does, and reverse polish works great for this, as long as I stick to
> >encoding expressions.  But this model falls apart when I get into things
> >like conditionals and loops, and I am completely baffled how to do
> >short-circuit logic like this.  For instance, I can compile the following
> >code into the RPN strings shown (which execute easily):

Check out the following web stuff:

ftp://ftp.netcom.com/pub/hb/hbaker/ForthStack.html (also .ps.Z) and
ftp://ftp.netcom.com/pub/hb/hbaker/Use1Var.html  (also .ps.Z)

--
www/ftp directory:
ftp://ftp.netcom.com/pub/hb/hbaker/home.html

--

### RPN as intermediate code?

Try Threaded Interpreted Language (TIL) published by Byte. It is a very
old book (1970's?) discussing the implementation of a language that
looks like Forth. (May be it was Forth's forgotten predecessor?).

It is Forth.  Here's the relevant section of a comp.lang.forth FAQ on books :-

"Threaded Interpretive Languages"  R. G. Loeliger
Byte Books, 1981, ISBN: 0-07-038360-X

This book is out of print but sometimes it is available from
booksellers or used book stores.

Before I read the book I thought that, based on the title, it was
going to examine languages which are associated with threaded
interpretive implementations and also cover different threading
techniques.  I was disappointed to find that it only covers (FIG)
Forth and indirect threading, in fact a more accurate title for the
book would be "How to write an indirect threaded FIG Forth for the
Z80".  That said, the book is easy to read and detailed enough for
someone to implement a Forth from scratch on a Z80.  The bulk of the
book is taken up with Z80 code for the various words which make up
the Forth.  It is not a just listing though, each section is
carefully explained with all inputs/outputs, registers,
... etc. documented.  There are also introductory sections which
explain the concepts being used along with flowcharts (!) for some
routines.  Towards the end of the book around 30 pages are devoted
to an implementation of a Z80 Forth assembler.  Block-based VM,
editors and cross compilation are briefly discussed.

--

### RPN as intermediate code?

The book "TIL..." was written well after FORTH was written, by the
author when he was trying to do his own version of a TIL.  He was
accused of "not understanding FORTH", but in fact he was doing his
own TIL, not FORTH.  Even so, I think his variations on the FORTH
idea did not prove very promising.  This is why you still see
versions of FORTH (although more and more rarely), but almost never
his TIL.

The latest incarnation of FORTH is IEEE 1275 standard for _hardware
independent_firmware. For more info on 1275, see my Web page on Open
Firmware at: ftp://ftp.netcom.com/pub/me/mejohnsn/OFIntro.html.  I
will be re-writing it soon to presume less FORTH knowledge on the
part of the reader.

Matthew Johnson
Sabaki Engineering

--

### RPN as intermediate code?

>The latest incarnation of FORTH is IEEE 1275 standard for _hardware
>independent_firmware.

For most programmers, the ANSI Forth standard (X3.215-1994) might
prove more useful.

Sincerely,
Bob Corbett
--

I am trying to find recent references to P-code, T-code and Uni-Code, all
of which are intermediate code languages. I am somewhat familiar with the
Pascal P machine and P-code although not with what has been done lately,
but I have not heard much about T-code or Uni-Code. It would be greatly
appreciated if anyone knowing about these codes would send me e-mail.
Thank you very much in advance.

Aiqin (Angelina) Pan
Mitsubishi Electric Research Laboratory
Sunnyvale, California

--