Simple lex v. yacc question

Post by s.. » Tue, 19 Jan 1999 04:00:00



I have never used either lex or yacc, but I read the documentation
for both. I have no idea which is better suited for my task though.
Basically, I have a simple file format in which I have multiple items,
each with different attributes. Each item is bounded by { }, and the
attributes are not all necessarily present. They are case-insensitive
and predefined. Comments are permitted. For example:

% foo
{ NAME = foo; COLOR=bar; QUANTITY=3} {name=fubar} {name =barfoo;;; COLOR=blue}

Should I use lex and/or yacc?
Thank you very much

        Joseph

                                                           ---
                                                joseph p turian

"I must Create a System or be enslav'd by another Man's.
I will not Reason & Compare: my business is to Create."
---William Blake

 
 
 

Post by Calum Mitchel » Tue, 19 Jan 1999 04:00:00



> I have never used either lex or yacc, but I read the documentation
> for both. I have no idea which is better suited for my task though.

You might want to re-read it.

> Should I use lex and or yacc?

You should use both. lex is used to split your input stream (file)
into tokens, which yacc then matches against the grammar you define
to represent your file format.
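To make that split concrete, here is a minimal hand-written C sketch (not lex or yacc output) for the format in the question. It assumes `%` starts a comment that runs to end of line, that the predefined attributes are NAME, COLOR and QUANTITY, and that values are plain alphanumeric words. The names `next_token`, `known_attribute` and `count_items` are made up for illustration. `next_token` plays the role lex would play; `count_items` plays the role of the yacc grammar.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds the scanner hands to the parser (the part lex would do). */
enum tok { T_LBRACE, T_RBRACE, T_EQ, T_SEMI, T_WORD, T_EOF, T_BAD };

struct scanner {
    const char *p;      /* next unread character */
    char text[64];      /* text of the most recent T_WORD */
};

/* Scanner: skip white space and %-comments, then return one token. */
static enum tok next_token(struct scanner *s)
{
    size_t n = 0;

    for (;;) {
        while (isspace((unsigned char)*s->p))
            s->p++;
        if (*s->p != '%')
            break;
        while (*s->p != '\0' && *s->p != '\n')
            s->p++;
    }
    switch (*s->p) {
    case '\0': return T_EOF;
    case '{':  s->p++; return T_LBRACE;
    case '}':  s->p++; return T_RBRACE;
    case '=':  s->p++; return T_EQ;
    case ';':  s->p++; return T_SEMI;
    default:   break;
    }
    if (!isalnum((unsigned char)*s->p)) {
        s->p++;
        return T_BAD;
    }
    while (isalnum((unsigned char)*s->p)) {
        if (n < sizeof s->text - 1)
            s->text[n++] = *s->p;   /* overlong words are truncated */
        s->p++;
    }
    s->text[n] = '\0';
    return T_WORD;
}

/* Case-insensitive comparison of two attribute names. */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Assumed attribute set, taken from the example in the question. */
static int known_attribute(const char *name)
{
    static const char *const names[] = { "NAME", "COLOR", "QUANTITY" };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (same_name(name, names[i]))
            return 1;
    return 0;
}

/* Parser (the part yacc would do): item count, or -1 on a syntax error. */
int count_items(const char *src)
{
    struct scanner s;
    enum tok t;
    int items = 0;

    assert(src != NULL);
    s.p = src;
    s.text[0] = '\0';
    while ((t = next_token(&s)) != T_EOF) {
        if (t != T_LBRACE)
            return -1;
        t = next_token(&s);
        while (t != T_RBRACE) {
            if (t == T_SEMI) {          /* empty statement, as in ";;;" */
                t = next_token(&s);
                continue;
            }
            if (t != T_WORD || !known_attribute(s.text))
                return -1;
            if (next_token(&s) != T_EQ || next_token(&s) != T_WORD)
                return -1;
            t = next_token(&s);         /* must be ';' or the closing '}' */
            if (t != T_SEMI && t != T_RBRACE)
                return -1;
        }
        items++;
    }
    return items;
}
```

With lex and yacc, the scanner would shrink to a handful of patterns and the parser to three or four grammar rules; the division of work stays the same.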

I found the lex & yacc documentation a bit difficult to understand,
it isn't very clear. There is a book published by O'Reilly that
covers these programs: lex & yacc, by Levine, Mason & Brown. It's
excellent.

Calum Mitchell.

 
 
 

Post by Erik OShaughness » Tue, 19 Jan 1999 04:00:00


The answer is: both.

Lex is used to parse your file into tokens. Yacc is used to assign
meaning to the tokens, once they have been identified.  I highly
recommend the O'Reilly book "lex & yacc" for a thorough treatment of
how to use these two very useful tools (and their random variants).

ISBN 1-56592-000-7

regards
ejo


> I have never used either lex or yacc, but I read the documentation
> for both. I have no idea which is better suited for my task though.
> Basically, I have a simple file format in which I have multiple items,
> each with different attributes. Each item is bounded by { }, and the
> attributes are not all necessarily present. They are case-insensitive
> and predefined. Comments are permitted. For example:

> % foo
> { NAME = foo; COLOR=bar; QUANTITY=3} {name=fubar} {name =barfoo;;; COLOR=blue}

> Should I use lex and or yacc?
> Thank you very much

>    Joseph

--
Erik O'Shaughnessy:eriko at austin.ibm.com:[838|678]-2622:bldg 905/7F-18
Performance Design Analysis Tools:Division Formerly Known As RS/6000:IBM
 
 
 

Post by Mike Cast » Tue, 19 Jan 1999 04:00:00




>I found the lex & yacc documentation a bit difficult to understand,
>it isn't very clear. There is a book published by O'Reilly that
>covers these programs: lex & yacc, by Levine, Mason & Brown. It's
>excellent.

I also liked the documentation for Flex and Bison, the GNU project's
replacements for Lex and Yacc.  Check out www.gnu.org.

mrc

 
 
 

Post by Aaron Cra » Wed, 20 Jan 1999 04:00:00




> I have never used either lex or yacc, but I read the documentation
> for both. I have no idea which is better suited for my task though.

It is not appropriate to post the same question to a Unix group and to a GNU
group, merely substituting `flex' for `lex' and `bison' for `yacc'.  If the
question is interesting to both groups, crosspost to both; even better, try
to find a single group where it is on-topic.  And if this is a homework
assignment, you'll learn far more if you attempt to complete it yourself.


 
 
 

Post by R. Allen Conwa » Thu, 21 Jan 1999 04:00:00


Well, I haven't read Levine, Mason & Brown, though I have read a presumably
older one by the same authors minus Levine, which I didn't find very good.
Not that I didn't learn anything, quite the contrary, really. But what
really troubles me is ...
1. What's the best approach to follow in distributing the work between Lex
and Yacc? As much lexical analysis with as rich a set of lexical tokens as
possible, and grammar rules specifically based on these tokens (decimal,
octal, hex numbers, telephone number, pathname, filename, boolean, ...)?
Or ... as few tokens as you can get away with and checks in the Yacc stuff
to see that what you have is logically correct?
2. What is the best way to tackle error detection and reporting?
3. Is the use of context via <state> labels a good, a bad, or the only
(pragmatic) way to program your lexical analysis?

It seems to me that the rich set of lexical tokens makes error reporting
more difficult because you have so many more situations you have to
explicitly set out in your Yacc code just so as to be able to say "No, a
telephone number can't be a hexadecimal number".
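One way to picture the "few tokens" side of that trade-off: the lexer returns a single generic NUMBER token, and a small helper called from the yacc action decides whether the text is acceptable in that context. The sketch below is only an illustration; `number_in_range` is a made-up name, and decimal-only input is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper for a yacc action. The lexer has already
 * classified `text` as a generic NUMBER; this decides whether it is
 * a plain decimal value in [min, max].
 * Returns 1 and stores the value on success, 0 otherwise.
 */
int number_in_range(const char *text, long min, long max, long *out)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')    /* empty, or trailing junk such as hex digits */
        return 0;
    if (errno == ERANGE || value < min || value > max)
        return 0;
    *out = value;
    return 1;
}
```

An action for a port number might then call `number_in_range($2, 1, 65535, &port)` and report a context-specific message when it fails, so the grammar itself never has to enumerate every kind of number.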

These remain unanswered questions for me: my point of view seems to change
with every program!

Another point ... I find the default Make rules rather dangerous because of
the file rename operations - having a prog.c, prog.y and a prog.l is a
recipe for disaster which isn't made very obvious, at least in the
documentation I've looked at.



>I found the lex & yacc documentation a bit difficult to understand,
>it isn't very clear. There is a book published by O'Reilly that
>covers these programs: lex & yacc, by Levine, Mason & Brown. It's
>excellent.

>Calum Mitchell.

 
 
 

Post by Calum Mitchel » Thu, 21 Jan 1999 04:00:00


"R. Allen Conway" wrote:

> 2. What is the best way to tackle error detection and reporting?

I found error handling to be the most problematic area of using
Lex & Yacc. I handle syntax errors (bad tokens) in the lexer
and grammatical errors (invalid constructs) in the parser.
I added error handling after I had written the lexer and parser
and I had to completely change the way the lexer worked in
order to output a meaningful error message.
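As a rough illustration of what a "meaningful" message needs from the lexer (the text of the current line, its number, and the column of the offending token), here is a small C sketch. `format_error` and its parameters are made-up names, and the three-line layout is just one possible choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a syntax error as three lines: the numbered source line,
 * a caret under column `col` (0-based) of that line, and the message.
 * Returns what snprintf returns: the length the full text needs.
 */
int format_error(char *out, size_t outsize, int lineno,
                 const char *line, int col, const char *msg)
{
    char prefix[32];
    int indent;

    assert(out != NULL && line != NULL && msg != NULL && col >= 0);
    snprintf(prefix, sizeof prefix, "%d: ", lineno);
    indent = (int)strlen(prefix) + col;     /* spaces before the caret */
    return snprintf(out, outsize, "%s%s\n%*s^\nerror: %s\n",
                    prefix, line, indent, "", msg);
}
```

The lexer only has to save each line before tokenizing it and keep a running column count; the parser's error routine can then pass both to a function like this one.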

> 3. Is the use of context via <state> labels a good, a bad , the only
> (pragmatic) way to program your lexical analysis?

I'd say the only way, but the matter is complicated by the lack
of support for exclusive start states in some implementations
of Lex. You can work around this problem, but it would be better
if you didn't have to.

Here's how I wrote a recent config file parser:

The config file

#
# APF CONFIGURATION FILE
#

APF        A    001    "TEST APF REGION"      # comment
DPROGS     50
DCONNS     50
SIGNAL     15
LOGFILE    100
PORT       6001
PROGRAM    TTAPF001  N   N    "WRIT PROGRAM"
PROGRAM    TTAPF002  N   N    "READ PROGRAM"
CONNECTION 0001 TTAPF001 TTAPF002 0 0 N "001 TO 002"

lexer.l

%{

/********************************************************************/

#include        <string.h>
#include        "common.h"
#include        "y.tab.h"
#include        "config.h"

char    linebuf[500]    = "";
int     lineno          = 0;
int     tokenpos        = 0;
int     showerror       = 1;

static  int first_time  = 1;

%}

%s N
%s P

line            .*
comment         #.*
whitespace      [ \t]
number          [0-9]+/[ \t\n]
word            [A-Za-z0-9]+
string          [\"][^\"\n]*[\"]|['][^'\n]*[']
badstring       ['\"][^'\"\n]*$
eol             \n

%%
%{
        /*
         * Start in normal state, see "lex & yacc", p173.
         */
        if (first_time) {
                BEGIN N;
                first_time = 0;
        }
%}

<N>{line}         {

                /*
                 * Match the entire line so we can
                 * copy it into a buffer and use it
                 * when displaying error messages.
                 * Put the entire line back with yyless
                 * and switch to P (parse) state to parse
                 * the line.
                 */

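                /* bounded copy: a long line must not overflow linebuf */
                strncpy(linebuf, (const char *)yytext, sizeof linebuf - 1);
                linebuf[sizeof linebuf - 1] = '\0';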
                tokenpos        = 0;
                showerror       = 1;
                lineno++;
                yyless(0);
                BEGIN P;

                }

<N>{eol}          { lineno++; /* blank line: still counts as a line */ }

<P>{comment}      ;
<P>{whitespace}   tokenpos += yyleng;

<P>{eol}  {

                /*
                 * Switch out of P (parse) state
                 * into N (normal) state so we can
                 * match the next line.
                 */

                BEGIN N;
                return 0;

                }

<P>APF            { tokenpos += yyleng;   return APF;             }
<P>DPROGS         { tokenpos += yyleng;   return DPROGS;          }
<P>DCONNS         { tokenpos += yyleng;   return DCONNS;          }
<P>SIGNAL         { tokenpos += yyleng;   return SIGNAL;          }
<P>LOGFILE        { tokenpos += yyleng;   return LOGFILE;         }
<P>PORT           { tokenpos += yyleng;   return PORT;            }
<P>SOCKET         { tokenpos += yyleng;   return PORT;            }
<P>PROGRAM        { tokenpos += yyleng;   return PROGRAM;         }
<P>CONNECTION     { tokenpos += yyleng;   return CONNECTION;      }

<P>{number}       { yylval.string = strdup((const char *)yytext);
                  tokenpos += yyleng;   return NUMBER;  }
<P>{word} { yylval.string = strdup((const char *)yytext);
                  tokenpos += yyleng;   return WORD;            }
<P>{string}       {

                yylval.string = strdup((const char *)yytext+1);
                yylval.string[yyleng-2] = '\0';
                tokenpos += yyleng;    
                return STRING;

                }      

<P>{badstring}    {

                yyerror("syntax error");
                tokenpos += yyleng;    
                return BADSTRING;

                }

<P>.              { tokenpos += yyleng;   return yytext[0];       }

%%

int yyerror(char *s)
{
        static  char linenobuf[16];

        if (showerror) {
                sprintf(linenobuf, "%d: ", lineno);
                errorf("%s%s\n", linenobuf, linebuf);
                /* the width argument of "%*s" must be an int, not a size_t */
                errorf("%*s\n", (int)(1+tokenpos+strlen(linenobuf)), "^");
                errorf("error: %s\n\n", s);
                showerror = 0;
        }
        parse_error();
        return 0;

}

void myerror(char *s)
{
        static  char linenobuf[16];

        sprintf(linenobuf, "%d: ", lineno);
        errorf("%s%s\nerror: %s\n\n", linenobuf, linebuf, s);
        showerror = 0;
        parse_error();

}

/********************************************************************/

parser.y

%{

/********************************************************************/

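#include        <stdio.h>
#include        <stdlib.h>      /* atoi, strtol, free */
#include        <string.h>      /* strlen, strcpy */
#include        <ctype.h>       /* isxdigit */
#include        "apfdefs.h"
#include        "config.h"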

#define DPROGLEN                4
#define DCONNLEN                5
#define CONNIDLEN               4
#define SIGNALLEN               2
#define LOGFILELEN              5
#define PORTLEN                 5

extern void myerror(char *s);

static  char err[256];

struct opt_prog_args_s
{
unsigned short  traceTblSize;
unsigned short  maxLogFileRecSize;

} prog_args;

char    identifier[256];

%}

%union {
        char *string;

}

%token APF DPROGS DCONNS SIGNAL LOGFILE PORT PROGRAM CONNECTION
%token <string> STRING WORD NUMBER BADSTRING

%%

entry:          blank
        |       apf
        |       dprogs
        |       dconns
        |       signal
        |       logfile
        |       port
        |       program
        |       connection

blank:          /* empty */
        |       error                           { yyerror("syntax error"); }
        ;

apf:            APF WORD NUMBER STRING
        {
                char            id;
                unsigned short  configGen;
                char            name[NAMELEN+1];
                int             ok = 1;

                if (strlen($2) > 1) {
                        myerror("invalid region id");
                        ok = 0;
                } else
                        id = $2[0];

                configGen = atoi($3);

                if (strlen($4) > NAMELEN) {
                        myerror("invalid region name");
                        ok = 0;
                } else
                        strcpy(name, $4);

                if (ok)
                        if (add_apf(err, id, configGen, name) != 0)
                                myerror(err);

                free($2);
                free($3);
                free($4);
        }

dprogs:         DPROGS NUMBER
        {
                unsigned short  maxDynProg;
                int             ok = 1;

                if (strlen($2) > DPROGLEN) {
                        myerror("invalid number of dynamic programs");
                        ok = 0;
                } else
                        maxDynProg = atoi($2);

                if (ok)
                        if (add_dprogs(err, maxDynProg) != 0)
                                myerror(err);

                free($2);
        }

dconns:         DCONNS NUMBER
        {
                unsigned short  maxDynConn;
                int     ok = 1;

                if (strlen($2) > DCONNLEN) {
                        myerror("invalid number of dynamic connections");
                        ok = 0;
                } else
                        maxDynConn = atoi($2);

                if (ok)
                        if (add_dconns(err, maxDynConn) != 0)
                                myerror(err);

                free($2);
        }

signal:         SIGNAL NUMBER
        {
                char    sig;
                int     ok = 1;

                if (strlen($2) > SIGNALLEN) {
                        myerror("invalid signal number");
                        ok = 0;
                } else
                        sig = atoi($2);

                if (ok)
                        if (add_signal(err, sig) != 0)
                                myerror(err);

                free($2);
        }

logfile:        LOGFILE NUMBER
        {
                unsigned short  maxBlocks;
                int             ok = 1;

                if (strlen($2) > LOGFILELEN) {
                        myerror("invalid logfile value");
                        ok = 0;
                } else
                        maxBlocks = atoi($2);

                if (ok)
                        if (add_logfile(err, maxBlocks) != 0)
                                myerror(err);

                free($2);
        }

port:  
                PORT NUMBER
        {
                unsigned short  port;
                int             ok = 1;

                if (strlen($2) > PORTLEN) {
                        myerror("invalid port number");
                        ok = 0;
                } else
                        port = atoi($2);

                if (ok)
                        if (add_port(err, port) != 0)
                                myerror(err);

                free($2);
        }

program:        PROGRAM error
                { yyerror("syntax error"); }
        |       PROGRAM identifier WORD WORD STRING opt_prog_args
        {
                char    id[PROGIDLEN+1];
                char    notifyOpenFlag;
                char    traceFlag;
                char    name[NAMELEN+1];
                int     ok = 1;

                if (strlen(identifier) > PROGIDLEN) {
                        myerror("invalid program identifier");
                        ok = 0;
                } else
                        strcpy(id, identifier);

                if (strlen($3) > 1 || ($3[0] != 'Y' && $3[0] != 'N')) {
                        myerror("invalid notify flag");
                        ok = 0;
                } else
                        notifyOpenFlag = $3[0];

                if (strlen($4) > 1 || ($4[0] != 'T' && $4[0] != 'N')) {
                        myerror("invalid trace flag");
                        ok = 0;
                } else
                        traceFlag = $4[0];

                if (strlen($5) > NAMELEN) {
                        myerror("invalid program name");
                        ok = 0;
                } else
                        strcpy(name, $5);

                if (ok)
                        if (add_program(err, id, notifyOpenFlag,
                                traceFlag, name,
                                prog_args.traceTblSize,
                                prog_args.maxLogFileRecSize) != 0)
                                myerror(err);

                free($3);
                free($4);
                free($5);
        }

opt_prog_args:  
                /* empty */
        {
                prog_args.traceTblSize          = 0;
                prog_args.maxLogFileRecSize     = 0;
        }
        |       NUMBER
        {
                myerror("Record size must accompany trace table size");
                YYERROR;
        }
        |       NUMBER error
        {
                YYERROR;
        }
        |       NUMBER NUMBER
        {
                prog_args.traceTblSize          = atoi($1);
                prog_args.maxLogFileRecSize     = atoi($2);

                free($1);
                free($2);
        }
        ;

connection:     CONNECTION identifier WORD WORD NUMBER NUMBER WORD STRING
        {
                unsigned short  id;
                char            acceptProgid[PROGIDLEN+1];
                char            initProgid[PROGIDLEN+1];
                char            acceptConnidx;
                char            initConnidx;
                char            traceFlag;
                char            name[NAMELEN+1];
                int             ok = 1;

                if (strlen(identifier) > CONNIDLEN) {
                        myerror("invalid connection identifier");
                        ok = 0;
                } else {
                        int     i, len = strlen(identifier);

                        /* flag the error and fall through so $3..$8 are still freed */
                        for (i=0; i < len; i++)
                                if (!isxdigit((unsigned char)identifier[i])) {
                                        myerror("invalid connection identifier");
                                        ok = 0;
                                        break;
                                }

                        if (ok)
                                id = strtol(identifier, (char **)NULL, 16);
                }

                if (strlen($3) > PROGIDLEN) {
                        myerror("invalid acceptor program identifier");
                        ok = 0;
                } else
                        strcpy(acceptProgid, $3);

                if (strlen($4) > PROGIDLEN) {
                        myerror("invalid initiator program identifier");
                        ok = 0;
                } else
                        strcpy(initProgid, $4);

                acceptConnidx   = atoi($5);
                initConnidx     = atoi($6);

                if (strlen($7) > 1 ||
                    ($7[0] != 'A' && $7[0] != 'B' &&
                     $7[0] != 'C' && $7[0] != 'N')) {
                        myerror("invalid trace flag");
                        ok = 0;
                } else
                        traceFlag = $7[0];

                if (strlen($8) > NAMELEN) {
                        myerror("invalid connection description");
                        ok = 0;
                } else
                        strcpy(name, $8);

                if (ok)
                        if (add_connection(err, id, acceptProgid,
                                initProgid, acceptConnidx, initConnidx,
                                traceFlag, name) != 0)
                                myerror(err);

                free($3);
                free($4);
                free($5);
                free($6);
                free($7);
                free($8);
        }

identifier:     WORD
        {
                strcpy(identifier, $1);
                free($1);
        }
        |       NUMBER
        {
                strcpy(identifier, $1);
                free($1);
        }
        ;

%%

extern  FILE *yyin;

/*
main(int argc, char **argv)
{
        if (argc > 1) {
                FILE    *file;

                file = fopen(argv[1], "r");
                if (!file) {
                        perror(argv[1]);
                        exit(1);
                }
                yyin = file;
        }

        while (!feof(yyin)) {
                yyparse();
        }

}

*/

/********************************************************************/

 
 
 

1. Lex (Flex) and Yacc (or Bison) questions.

I am not quite sure if these are the right newsgroups for these questions, but
I cannot find any Lex or Yacc (or even bison) Newsgroups.

I would like to know if it is possible to redirect the input while you are
parsing a grammar (like if you were parsing an include command for example)

foo bar command etc...
#include "foo.text"
other command etc...

For example in the previous example, I want to parse everything (let say from
stdin) before the #include, and as soon as I find the include, I want to
continue to parse from the file foo.text. Upon EOF, I want to switch back to
the stdin. If this is possible, I would appreciate an example.

Another related question, is it possible to parse more than one grammar in the
same executable?  

Last question... Is it possible to use call yyparse for a subset of a grammar?
or to explicitly call the parser on one specific rule?

Please, answer by e-mail, as I do not read this newsgroup regularly.

Thanks in advance,
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Francois Felix INGRAND                   "Read my Lisp... No new syntax" (nil)
