🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

Compiler design: assignment and lvalues

Started by
6 comments, last by Kylotan 19 years, 9 months ago
Hi, I'm writing a bytecode-interpreted scripting language for fun and education and have trouble implementing the parsing/compiling of assigments cleanly. Currently the language only supports assignment of form
 <identifier> = <expression> 
and this is clearly rather limiting, for example, you can't assign into an array element. The problem in allowing more complex expressions as the left-hand side is that reading from and writing to an lvalue must be compiled differently, and the parser can't know whether an expression is an assignment until it has already parsed the lhs and hits a '=' token, and then it is too late. Or is it? I can think of a couple of ways to solve the problem, but they are more or less hacky and involve dynamic_casts which I'd like to avoid as much as possible. Thanks, -Johannes
Advertisement
Why is it 'too late' once the parser hits the =? Generally, the parser will split your code up into tokens, then you'll make an sytnax tree from that (AST).
One token look-ahead is typical in any parser.

Note that token != character. Thus, most parsers will first scan the text for tokens, which are returned to the semantic parser, that builds an expression tree based on the tokens that come in. You can make it using recursive descent, or using a stack based state machine.

Try using a compiler construction tool set, like bison/flex. They're designed to make these things easier. There are even C++-specific tools that are more modern than b/f.
enum Bool { True, False, FileNotFound };
Yes, my parser reads a stream of tokens produced by the lexer, and builds a syntax tree using recursive descent. The syntax tree consists of polymorphic nodes that the compiler traverses using the visitor pattern. This is quite nice and clean, this far.

Now, when the parser encounters the assignment operator, it has already created a syntax tree representation of the left hand side, and that subtree doesn't "know" (contain the information) that it is the lhs of an assignment, and thus the compiler thinks it is a load, not a store operation (by default, an identifier gets compiled to a load, as does an array element access).

When the parser hits the '=', it could set a flag in the subtree that the compiler could check, and compile the tree differently depending on the flag's value. Implementing this cleanly is what I have problems with.

As the project is largely to gain insight into the world of compilers, I'd rather not use any automatic lexer/parser generators.

The source (admittedly messy at places) can be found here (tarball here). The relevant files are parser.h/cpp and compiler.h/cpp.
I'd say you basically need to unify the mechanisms. The subtree shouldn't just yield a value, but should also yield a storage location if it's possible to use it as an l-value. Maybe it'll help if you think of such an expression as being one that takes a given memory location and, after applying modifiers to it, returns a new memory location.
I could be misunderstanding the problem, but it sounds to me like all you need to do is write the grammar as
<expression> = <expression>
and make sure that the left hand side is an lvalue in the semantic phase. (this is basically a matter of recursively asserting that the expression only contains dots and array subscripts)

From there, you just need to emit a load and a store.
"There is only one everything"
SpeedBump: The problem is lack of context. The function to compile an <expression> doesn't know (and, IMHO, shouldn't need to know) if the expression happens to be an lhs of an assignment or not. The compiler could set a flag when compiling <assignment> and compile <expression> differently depending on its value, but IMHO it is a somewhat kludgy solution, particulary considering that everything else in the compiler is completely context-independent.

Kylotan: Yep, I have been thinking about something similar. What about this: when loading a variable to a register, instead of copying its value, load a pointer to it. AFAICS this would work correctly in all situations. Gotta try it out.



Yeah, basically you need to allow some operations to yield the actual variable rather than the variable's value. This lets you apply offsets (eg. for arrays) or membership operators (eg. for objects/classes). Then you read the value only at the very point that an operation needs it. A pointer is the low-level way of doing this but as long as you're preserving the variable semantics throughout then you'll be ok.

This topic is closed to new replies.

Advertisement