Generating Source Code.

Generating Source Code.

Post by dara_gallag.. » Thu, 07 Oct 1999 04:00:00



Hi folks,

   I'd appreciate any pointers/references on the subject of generating
human readable source from an AST. I've looked at some of the source
code pretty printers which are floating around but all of them seem
are relatively simple and are based on lexical analysis. I realize
that "human readable" is a subjective term and that for any particular
language people will have preferences regarding indentation, line
splitting etc.

One issue which I've already resolved to my satisfaction is the
removal of superfluous parenthesis in generated expression code. I've
done this by carrying around "the binding context" (for lack of a
better term) during generation. This context is a measure of how
strongly binding the surrounding operator is; only if the current
operator is less binding (i.e. has a lower precedence) are parenthesis
written around the generated code.  This scheme also supports handing
binary operator associativity by slightly decreasing the "binding
context" for left or right sub-expressions as appropriate.

Some outstanding issues I'm trying to resolve include:
 1) How to split long lines especially for long expressions.
 2) Adding extra parenthesis even if they are not strictly
     necessary from a grammatical point of view but might
     improve the readability.
 3) Neatly abstracting "layout preferences" from the AST
     walking code.

Any pointers/help would be appreciated.
Cheers,

Sent via Deja.com http://www.deja.com/
Before you buy.

 
 
 

Generating Source Code.

Post by David Chas » Tue, 12 Oct 1999 04:00:00



>    I'd appreciate any pointers/references on the subject of generating
> human readable source from an AST. ...

I don't know exactly what is published, but this problem was fiddled
with for years at Rice.  The original language in question was
Fortran, fed either through a vectorizer, parallelizer, or living in a
"programming environment".  We had, at various times, a "pretty
printer", a "Lisp unparser", an "ugly printer" (card image Fortran),
and "incremental pretty printers" (for screen display).  These all
worked from ASTs.  Work on this continued for years after I left; you
might look for work by Warren (Scott, not Joe) or Baker (Don).  I
think Caplinger wrote his dissertation on some aspect of this problem,
but there was years of work that came after that.

Quote:> Some outstanding issues I'm trying to resolve include:
>  1) How to split long lines especially for long expressions.
>  2) Adding extra parenthesis even if they are not strictly
>      necessary from a grammatical point of view but might
>      improve the readability.
>  3) Neatly abstracting "layout preferences" from the AST
>      walking code.

I was wondering, could some of the dynamic-programming stuff from
Knuth's work apply here?  It would seem that there would be penalties
for breaking within a deep nesting, and penalties for badly unbalanced
lines.  Thus, you might want to say

x = a + b + c +
    d + e + f;

rather than

x = a + b + c + d + e +
    f;

I also recall that keeping track of comments, and placing them
nicely, is a big problem.

--

NaturalBridge LLC              --  http://www.naturalbridge.com

 
 
 

Generating Source Code.

Post by Rodney M. Bate » Tue, 12 Oct 1999 04:00:00



> Hi folks,

>    I'd appreciate any pointers/references on the subject of generating
> human readable source from an AST. ...

> One issue which I've already resolved to my satisfaction is the
> removal of superfluous parenthesis in generated expression code. I've
> done this by carrying around "the binding context" (for lack of a
> better term) during generation. This context is a measure of how
> strongly binding the surrounding operator is; only if the current
> operator is less binding (i.e. has a lower precedence) are parenthesis
> written around the generated code.  This scheme also supports handing
> binary operator associativity by slightly decreasing the "binding
> context" for left or right sub-expressions as appropriate.

> Some outstanding issues I'm trying to resolve include:
>  1) How to split long lines especially for long expressions.
>  2) Adding extra parenthesis even if they are not strictly
>      necessary from a grammatical point of view but might
>      improve the readability.
>  3) Neatly abstracting "layout preferences" from the AST
>      walking code.

I have prettyprinters for several languages going back a long way.
They parse and prettyprint the entire language, including expressions,
designators, etc.  The general idea is like this:

For each syntactic construct, the formatting routine tries to display
it "horizontally", i.e. all on one line, given a starting position on
the current line.  If it won't fit, then it is displayed "vertically",
i.e. on multiple lines.  Each construct has its own rule for vertical
formatting.  Either way, the nested constructs are formatted in their
own way.

The kicker is that the starting position has to come from the parent
construct, and the parent formatting routine doesn't know this yet,
until it has seen how much space the children constructs will require.

In my old prettyprinters, there is no AST built.  It is done on the
fly, using recursive descent parsing and formatting.  So each routine
tentatively formats its construct vertically, passing down start
positions to child constructs as determined by vertical formatting.
The childen routines do the same, but also return the number of
characters their construct whould take if formatted horizontally.

The parent must, of course, compute this same number for its own
construct, to return to its caller.  But it uses it to make a late
decision to change to horizontal formatting.

A consequence of this approach is that if a construct is horizontal,
then so are all its descendents.  This precludes "L-formatting", e.g.:

CallProcedureWithALongName ( ThisIsTheFirstActualParameter
                           , ThisIsTheSecontActualParameter
                           , ThisIsExpressionTermOne
                             + AndHereIsExpressionTermTwo
                           )
Sometimes not using L-formatting takes more lines.

The details of representing the speculatively-formatted stuff get a
bit messy.  If you felt you could just let the entire program
accumulate in in-memory data structure at once, it would be easy, but
I didn't think I could afford that at the time.  With today's
real/virtual memory sizes, it might be different.

There are a number of additional problems.  There is also a "filled"
formatting style for lists of things (e.g. enumeration literals in an
enumeration type).  Each line is filled with list items like filling a
line with words in a word processor, except that an item is not
necessarily a single word, e.g. an actual parameter.  Occasionally, I
have rules to fill certain pieces of a construct which are not a
variable length list, but fixed syntax.

To this day, whether formulating rules for automated prettyprinting or
manually formatting code, I still can't decide whether I like the
filled format.  All vertical seems pretty extravagant, but filled
seems pretty hard to read for things like parameter lists.

I sometimes have a few constructs which are always vertical, e.g. a
module/package/compilation unit, a procedure body, etc.

Comments and blank lines are another complication.  I have some
convoluted rules that distinguish different kinds of comments by the
way they appear in the input.  They do a passible job some of the
time, not what I really want other times.

Recently, I have done a little of this from an AST.  This is a lot
easier to implement, because the node attribute
DisplayedLengthAssumingHorizontal can be first computed bottom up,
then the formatting done in the obvious way, emitting output directly.
There is no need to do speculate-then-change-your-mind formatting,
with its messy data structure.

The method you mentioned for inserting parentheses just where they are
needed to preserve evaluation order works fine.  You can also have a
one-child AST node which means "Enclose in parentheses whether they
are needed or not" to preserve places where input code had them (or,
you decided they would improve readability).

Gnat uses a different method.  Instead of introducing extra AST nodes
for added parentheses, every expression node has a field which is the
number of pairs of parentheses around this subexpression.  However, it
is packed into two bits, so more than three pairs will get lost.  I
agree with the designers that this is not a big loss.

I have experimented with designing a "format syntax" notation.  Basic
format syntax gives, for each AST node, the delimiters which need to
be inserted before/between/after the formattings of the children of
the node.  By itself, it only tells how to construct a token stream.

You can add LineBreak symbols (with relative indentations) to a format
syntax rule to show the vertical formatting.  A propery of the rule
can tell if it is always vertical, horizontal if possible, or filled.
You can also allow bracketing of substrings of the token stream
defined by the rule and let these have different formatting
properties.

I have designed such a metalanguage, but I did not implement it as
such, as these have all been hurry-up jobs.  Instead, I wrote a few
procedures with very short names corresponding to the formatting
symbols. Hard coding of the formatting for a particular AST node, with
calls on these routines, looks something like the rule would look, if
written in the format syntax.

Two political problems:

1) Programmmers often have extremely strong feelings about formatting
(myself included), and

2) It would be rare indeed to find any two whose formatting
preferences agree.

All the better reason to abstract the formatting rules.

Rodney Bates

 
 
 

Generating Source Code.

Post by Craig Smit » Tue, 12 Oct 1999 04:00:00



>  2) Adding extra parenthesis even if they are not strictly
>      necessary from a grammatical point of view but might
>      improve the readability.

I don't know what language you are generating code for, but if it were
C/C++ I'd make a couple of rules based on precedence. Some people,
myself included, aren't entirely familiar with the precedence rules.
I therefore tend to add parenthesis when the precedence isn't clear (I
believe John Ousterhoot was the first person I heard say this).  Some
simple rules for this would be easy to add for a AST for C.  Take for
instance the dereference operator, if the expression is very simple
parenthesis are not necessary:

int * p = ...;
*p = 5

If the expression is more complex then, add parens:

*(p->Pattern_ref()->Slot(2))

Similarly for the address-of operator '&'.

But you probably already thought of this already!

(raig

--
Craig Smith              'Discussion is an exchange of knowledge;
Mitarbeiter, Informatik   argument an exchange of ignorance.'
University of Trier                    - Robert Quillen
Trier, Germany            www.informatik.uni-trier.de/~smith

 
 
 

Generating Source Code.

Post by Christopher W. Milne » Tue, 12 Oct 1999 04:00:00



>    I'd appreciate any pointers/references on the subject of generating
> human readable source from an AST.
[snip]
> One issue which I've already resolved to my satisfaction is the
> removal of superfluous parenthesis in generated expression code.

Try  "Unparsing Expressions with Prefix and Postfix Operators"
by Norman Ramsey
http://www.eecs.harvard.edu/~nr/pubs/unparse-abstract.html

chris milner
UVA Grad Student

 
 
 

Generating Source Code.

Post by Anton Er » Thu, 14 Oct 1999 04:00:00



Quote:>I was wondering, could some of the dynamic-programming stuff from
>Knuth's work apply here?

Yes, he mentions that in the "Breaking Paragraphs into Lines" chapter
in Digital Typography (around page 97).

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed

http://www.complang.tuwien.ac.at/anton/home.html

 
 
 

Generating Source Code.

Post by Scott Stanchfiel » Thu, 14 Oct 1999 04:00:00




Quote:> For each syntactic construct, the formatting routine tries to display
> it "horizontally", i.e. all on one line, given a starting position on
> the current line.  If it won't fit, then it is displayed "vertically",
> i.e. on multiple lines.  Each construct has its own rule for vertical
> formatting.  Either way, the nested constructs are formatted in their
> own way.

> The kicker is that the starting position has to come from the parent
> construct, and the parent formatting routine doesn't know this yet,
> until it has seen how much space the children constructs will require.

Seems to me that a nice way to deal with nesting would be to use the
LayoutManager approach taken in Java's AWT libraries (a strategy
pattern).

If we were to come up with a "layout manager" for each type of
construct, it could perform layout on its children, asking them for
their "preferred size". The children have layouts as well, which
return preferred size based on _their_ children and so on.

This type of strategy leads to very simple layout manager design
(ain't recusrion grand!) _and_ ease of changing layout strategies. You
can simply plug in different layout strategies for the AST nodes to
change layout behavior for different user styles.

This type of thing has been done in the Swing text framework.

Take a look at the Swing text framework (see
http://www.theswingconnection.com for some details). They use an
Element/View representation to display the data. Elements map nicely to
ASTs (well, they really _are_ an AST of the text's layout). Views are
like layout managers, showing how to display the data on the screen.
There's built-in help for tweaking display (splitting and combining
lines as necessary).

The Swing text framework is rather complex, but designed for just this
type of work to display a document.

You could use this as a model for the pretty printer, or perhaps create
a set of Views that write code to a file.

Hope this spawns some interesting discussion...

--
-- Scott

jGuru Training by MageLang Institute      http://www.jGuru.com
VisualAge for Java Tips --> http://www.jGuru.com/thetick/visualage/tips

 
 
 

Generating Source Code.

Post by H. Ellenberge » Fri, 15 Oct 1999 04:00:00



> [...] Two political problems:

> 1) Programmmers often have extremely strong feelings about formatting
> (myself included), and
> 2) It would be rare indeed to find any two whose formatting
> preferences agree.

OK, but some companies require their staff to use a set of rules or
a specific prettyprinter.

Is the situation in your country the same?

Quote:> All the better reason to abstract the formatting rules.

Yes.
HE
 
 
 

Generating Source Code.

Post by Christian Stapfe » Fri, 22 Oct 1999 04:00:00



Quote:>    I'd appreciate any pointers/references on the subject of generating
> human readable source from an AST. I've looked at some of the source
> code pretty printers which are floating around but all of them seem
> are relatively simple and are based on lexical analysis. I realize
> that "human readable" is a subjective term and that for any particular
> language people will have preferences regarding indentation, line
> splitting etc.

Have you seen Philip Wadler's recent paper "Prettier Printing?" (see:
http://www.bell-labs.com/~wadler/recent.html), and Vance Maverick's
dissertation "Presentation by Tree Transformation" (University of
California, Berkeley Report CSD-97-947, see:
http://sunsite.berkeley.edu/NCSTRL)?

True, Maverick's dissertation concerns a more general problem: that of
laying out and editing tree-structured (parsed) documents. But your
problem is a special case and his paper seems to me to contain some
interesting ideas (especially as regards "abstracting out layout
preferences") as well as useful pointers to other papers.

Christian Stapfer

 
 
 

Generating Source Code.

Post by Sara Henderso » Thu, 28 Oct 1999 04:00:00


Quote:>David Chase wrote
>I also recall that keeping track of comments, and placing them
>nicely, is a big problem.

I appologize in advance if these ideas have already been adressed in
the book you refered to (Knuth), but I don't have that book handy and
I wanted to bounce this idea back as quickly as possible to let the
group run with it.

There would of course be no way to systematically format comments for
all source code, because their content, meanings, and associations are
in no way specified by the language (unlike every other contruct in
the language).  BUT, most people DO use their own comment formatting
rules, because it would be very hard to make sense of comments with no
order or clear visual relation to them.  So here is my idea:

Tie the commment construct to either the preceeding or following syntactic
element based on both what the elements are and what the input looked liked.
Example of a potential rules:
Comments that share lines with other constructs -
    1.  Any comment (// or /**/) that is on the same line as another
construct should be tied to the end of the other construct.
    2.  Any (/**/) comment that begins a line that also contains another
construct should be tied to the beginning of that construct.
    3.  Not sure what to do about comments wedged between 2 constructs on
the same line, but it could be evident be obvious in certain situations
based on white space or on the nature of the 2 surrounding constructs.
Comments(or comment blocks) that are on their own line -
    4.  Any comment block that resides on a line or lines by itself would
be analized by the blank lines around it:  a)blank line above - attact to
follwing construct, b)blank line below - attach to above construct,
c)engulfed in white space - make it a free floating comment (preserve the
white space gap in pretty printouts), d)no blank lines - must resort to
rules of preference for each construct(see below).

As you can see, there are some cases where the rules are quite
obvious, and others that are actually only valid because most
programmers follow some standard conventions.  One such example would
be the evaluation of comments that are equadistant from two constructs
(3 and 4d).  One example of standard practice is that if the two
constructs are function bodies or prototypes, the comment probably
belongs to the construct that follows it (because we almost always
comment BEFORE functions).

Well...there's my 2.5 cents on the matter.

Keith McAfee

[I've seen systems that do that.  If you have programmers that format
comments in a reasonably systematic way, it works. -John]

 
 
 

1. why signal sample source code generates exception (VxWorks 5.4 for PPC)

HI,

I use Tornado II and VxWorks 5.4 for PPC. Why the following signal sample source code generates exception, no matter running in VxSim or running on an actual MPC860 board?

/* includes */
#include "vxWorks.h"
#include "signal.h"

/* forward declarations */
static void mysigHandle (int);
void receivSig();
void sendSig();

static int rtaskid;

void mysignal()
{
   rtaskid = taskSpawn("receivSig",100,0,4000,receivSig,0,0,0,0,0,0,0,0,0,0);

void receivSig()
{
  int n;

  logMsg("enter rec task.\n");

  if(signal(SIGUSR1,mysigHandle)!=0)
     {
          logMsg("sigaction failed.\n");
     }

   taskSpawn("sendSig",90,0,4000,sendSig,0,0,0,0,0,0,0,0,0,0);

     /*wait somewhile then exit*/
        n=10;
     while(n-->0)
     {
             logMsg("exitting rec task...\n");
     }

void sendSig()
{
    int id;

    logMsg("enter send task.\n");

    if(kill(rtaskid, SIGUSR1) != 0)
     {
          logMsg("send signal failed.\n");
     }

    logMsg("exit send task.\n");

static void mysigHandle (int signo)
{
        logMsg("got the signal!\n");

2. Novell 3.11 and Y2k issue

3. Generating listing files containg source and assemby code

4. Urgent: New worm threat upgraded - NOT FIZZER

5. Looking for source code for generating intermediate representation

6. trend pc-cillin 6.0

7. Source Code! Source Code!

8. Networking: How to login to remote computer using VB.Net

9. Data Recovery SOURCE CODE ( SOURCE CODES of Professional Data Recovery Software )

10. Strange string generated and included into unidrv generated PCL code

11. embedding java code in auto generated code

12. Code Generation Not generating VB Class Code.

13. Sources Wanted : Audio source coding (perceptive)