16 Jun, 2010, JohnnyStarr wrote in the 1st comment:
Votes: 0
I'm at the point in a simple interpreter project to start the parsing process.
After playing with Ruby (to get a feel for things), I'm now using C++ as my implementation language.
I've written a Lexer that scans the source code, looks for comments, lexical errors like unfinished strings etc,
and creates valid tokens with values.

Here's the output for the following code:
** comments in my language
name = "John"
print("hi " + name)


+------------+
Line: ( 1) | NEWLINE |
Line: ( 2) | IDENTIFIER | val: name
Line: ( 2) | ASSIGNMENT |
Line: ( 2) | LITERAL | val: John
Line: ( 2) | NEWLINE |
Line: ( 3) | IDENTIFIER | val: print
Line: ( 3) | OPENPAREN |
Line: ( 3) | LITERAL | val: hi
Line: ( 3) | PLUS |
Line: ( 3) | IDENTIFIER | val: name
Line: ( 3) | CLOSEPAREN |
Line: ( 3) | NEWLINE |
+------------+


The lexer was the easy part; now I need to parse the token list and check the syntax against a grammar.
I've been reading up on Backus-Naur Form and like what I'm seeing. Before creating the AST, you have to know what you're adding to it. Most resources online focus on Bison/Yacc, but I want to use BNF or EBNF as a guideline for the "rules" section of my own parser. I'm hoping some of you have experience with this in C++ (or at least Java) and can give me an idea of how you have defined, or would define, the rules of a language.
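
For what it's worth, the kind of EBNF I have in mind for the snippet above is roughly this (just a sketch; the rule names are made up):

program    := { statement }
statement  := assignment NEWLINE | expression NEWLINE | NEWLINE
assignment := IDENTIFIER "=" expression
expression := term { "+" term }
term       := LITERAL | IDENTIFIER [ "(" [ expression ] ")" ]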
16 Jun, 2010, David Haley wrote in the 2nd comment:
Votes: 0
I'm not sure what the question is. Are you asking how to write context-free rules to parse a C-like language? Try this course website, and in particular this lecture.
16 Jun, 2010, Runter wrote in the 3rd comment:
Votes: 0
David Haley said:
I'm not sure what the question is. Are you asking how to write context-free rules to parse a C-like language? Try this course website, and in particular this lecture.


+1. I found the lecture well designed.
16 Jun, 2010, JohnnyStarr wrote in the 4th comment:
Votes: 0
That was very helpful when it comes to getting the concept down. I suppose what I was trying
to ask though, is what kind of data structure in C++ you would use, or have used? Might one
define a function for each rule? Or would it be less overhead to define the rules in some sort
of table, and then have a generic function that steps through each field in a specified rule?

I even thought of parsing a string full of rules to generate a table with numeric numbers only:
ifRule = "if-statement := <if> <expression> | <list> <block> | <else> <block> <end>" //might convert to:
[12, 2, -1, 5, 9, -1, 13, 9, 14] // -1 representing 'OR'
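
Going the other way, the "function per rule" approach I'm picturing would be something roughly like this (only a sketch; the token names match my lexer output above, the helpers are made up, and it assumes the lexer appends an END marker):

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum TokenType { NEWLINE, IDENTIFIER, ASSIGNMENT, LITERAL, OPENPAREN, CLOSEPAREN, PLUS, END };

struct Token { TokenType type; std::string val; };

class Parser {
public:
    explicit Parser(const std::vector<Token>& toks) : toks_(toks), pos_(0) {}

    // program := { statement }
    void parseProgram() { while (peek().type != END) parseStatement(); }

private:
    // statement := assignment NEWLINE | expression NEWLINE | NEWLINE
    void parseStatement() {
        if (peek().type == NEWLINE) { advance(); return; }
        if (peek().type == IDENTIFIER && peekNext().type == ASSIGNMENT) parseAssignment();
        else parseExpression();
        expect(NEWLINE);
    }

    // assignment := IDENTIFIER "=" expression
    void parseAssignment() { expect(IDENTIFIER); expect(ASSIGNMENT); parseExpression(); }

    // expression := term { "+" term }
    void parseExpression() {
        parseTerm();
        while (peek().type == PLUS) { advance(); parseTerm(); }
    }

    // term := LITERAL | IDENTIFIER [ "(" [ expression ] ")" ]
    void parseTerm() {
        if (peek().type == LITERAL) { advance(); return; }
        expect(IDENTIFIER);
        if (peek().type == OPENPAREN) {
            advance();
            if (peek().type != CLOSEPAREN) parseExpression();
            expect(CLOSEPAREN);
        }
    }

    const Token& peek() const { return toks_[pos_]; }
    const Token& peekNext() const { return toks_[pos_ + 1 < toks_.size() ? pos_ + 1 : pos_]; }
    void advance() { if (pos_ + 1 < toks_.size()) ++pos_; }
    void expect(TokenType t) {
        if (peek().type != t) throw std::runtime_error("syntax error");
        advance();
    }

    std::vector<Token> toks_;
    std::size_t pos_;
};

So: one function per grammar rule, versus the table idea above. Which scales better?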

Hope that explains what I'm looking for this time. :redface:
16 Jun, 2010, David Haley wrote in the 5th comment:
Votes: 0
Whether or not you define a function per rule depends on the parser generator tool you're using. In yacc/bison, you write one action per disjunct of a rule, which is in fact rather common among parser generators. You would never handle raw numbers like that yourself; if you went down that road you'd essentially be writing your own parser generator (hint: you probably don't want to do that).
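
To make that concrete, a bison fragment in that style might look roughly like this (just a sketch; the make_* helpers are hypothetical stand-ins for whatever builds your tree nodes, and the %token declarations are omitted):

statement
    : IDENTIFIER ASSIGNMENT expression NEWLINE   { $$ = make_assignment($1, $3); }
    | expression NEWLINE                         { $$ = make_expr_statement($1); }
    ;

expression
    : expression PLUS term                       { $$ = make_binary('+', $1, $3); }
    | term                                       { $$ = $1; }
    ;

term
    : LITERAL                                    { $$ = make_literal($1); }
    | IDENTIFIER                                 { $$ = make_identifier($1); }
    | IDENTIFIER OPENPAREN expression CLOSEPAREN { $$ = make_call($1, $3); }
    ;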

I use a fairly straightforward set of data structures that mirror the syntax tree rather closely. I'd have an abstract 'Statement', which can be specialized to 'WhileStatement', 'IfStatement', 'ExpressionStatement', etc. Then there's 'Expression', which can be a 'FunctionCall', 'BinaryExpression', 'UnaryExpression', and so on.
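
Sketched in C++, that hierarchy would look something like this (illustrative only; the field names and ownership scheme are one possible choice, not the only one):

#include <memory>
#include <string>
#include <vector>

// Illustrative AST node hierarchy matching the names above.
struct Expression { virtual ~Expression() = default; };

struct BinaryExpression : Expression {
    std::string op;                                   // e.g. "+"
    std::unique_ptr<Expression> left, right;
};

struct UnaryExpression : Expression {
    std::string op;                                   // e.g. "-"
    std::unique_ptr<Expression> operand;
};

struct FunctionCall : Expression {
    std::string callee;                               // e.g. "print"
    std::vector<std::unique_ptr<Expression>> args;
};

struct Statement { virtual ~Statement() = default; };

struct ExpressionStatement : Statement {
    std::unique_ptr<Expression> expr;
};

struct IfStatement : Statement {
    std::unique_ptr<Expression> condition;
    std::vector<std::unique_ptr<Statement>> thenBody, elseBody;
};

struct WhileStatement : Statement {
    std::unique_ptr<Expression> condition;
    std::vector<std::unique_ptr<Statement>> body;
};

The parser's only job is then to build these nodes as it recognizes each rule; everything else (evaluation, type checking, code generation) walks the tree afterwards.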
16 Jun, 2010, JohnnyStarr wrote in the 6th comment:
Votes: 0
So, when using Bison, do you generate a C file that you then compile into your C++ project, or do you hook Bison into your host app's makefile?
16 Jun, 2010, David Haley wrote in the 7th comment:
Votes: 0
Bison can generate C++ files, although there are some subtleties to be aware of. Google for "bison C++"; the first result explains some of the issues.
17 Jun, 2010, Tyche wrote in the 8th comment:
Votes: 0
JohnnyStarr said:
That was very helpful when it comes to getting the concept down. I suppose what I was trying
to ask though, is what kind of data structure in C++ you would use, or have used? Might one
define a function for each rule? Or would it be less overhead to define the rules in some sort
of table, and then have a generic function that steps through each field in a specified rule?

I even thought of parsing a string full of rules to generate a table with numeric numbers only:
ifRule = "if-statement := <if> <expression> | <list> <block> | <else> <block> <end>" //might convert to:
[12, 2, -1, 5, 9, -1, 13, 9, 14] // -1 representing 'OR'

Hope that explains what I'm looking for this time. :redface:


I don't know enough about your language or target to say what sort of data structure would be useful.
The table of numbers above doesn't buy you anything beyond what you've already done in the lexical transformation into tokens.
The grammar above is ambiguous, or even erroneous, regardless of the notation.
Concentrate on refining/defining that. I'd do it from the bottom up.

Do you have functions, types, objects?
If you are doing an interpreter, sometimes the simplest structure is simply to use the calling program's stack as the AST, but that depends on the complexity of the language.
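
To illustrate what I mean by that, here's a toy sketch for integer arithmetic only (nothing from your language, names made up): each parse function returns its evaluated result directly, so the recursion itself stands in for the tree.

#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Toy sketch: evaluate input like "1+2-(3+4)" while parsing it.
// No AST is built; the C++ call stack holds the intermediate results.
class Evaluator {
public:
    explicit Evaluator(const std::string& src) : src_(src), pos_(0) {}

    long run() { return expression(); }

private:
    // expression := term { ("+" | "-") term }
    long expression() {
        long value = term();
        while (peek() == '+' || peek() == '-') {
            char op = get();
            long rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term := NUMBER | "(" expression ")"
    long term() {
        if (peek() == '(') { get(); long v = expression(); expect(')'); return v; }
        return number();
    }

    long number() {
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::runtime_error("expected a number");
        long v = 0;
        while (std::isdigit(static_cast<unsigned char>(peek())))
            v = v * 10 + (get() - '0');
        return v;
    }

    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }
    char get()        { return pos_ < src_.size() ? src_[pos_++] : '\0'; }
    void expect(char c) { if (get() != c) throw std::runtime_error("syntax error"); }

    std::string src_;
    std::size_t pos_;
};

Evaluator("1+2-(3+4)").run() gives -4 here. It obviously doesn't scale to real control flow, but for a small expression language it keeps things very simple.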
18 Jun, 2010, Igabod wrote in the 9th comment:
Votes: 0
JohnnyStarr said:
I even thought of parsing a string full of rules to generate a table with numeric numbers only:


hehe numeric numbers? maybe you meant numeric values.
18 Jun, 2010, David Haley wrote in the 10th comment:
Votes: 0
lol he made a typo rofl let's talk about it for a while.
0.0/10