<!-- MHonArc v2.4.4 --> <!--X-Subject: Tutorial: Let's build a Compiler! - Part XIII: Procedures --> <!--X-From-R13: "Xba O. Znzoreg" <wyflfvapNvk.argpbz.pbz> --> <!--X-Date: Mon, 02 Mar 1998 21:39:29 +0000 --> <!--X-Message-Id: 000601bd4623$dd492540$ca3bd8ce@default --> <!--X-Content-Type: text/plain --> <!--X-Head-End--> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> <html> <head> <title>MUD-Dev message, Tutorial: Let's build a Compiler! - Part XIII: Procedures</title> <!-- meta name="robots" content="noindex,nofollow" --> <link rev="made" href="mailto:jlsysinc#ix,netcom.com"> </head> <body background="/backgrounds/paperback.gif" bgcolor="#ffffff" text="#000000" link="#0000FF" alink="#FF0000" vlink="#006000"> <font size="+4" color="#804040"> <strong><em>MUD-Dev<br>mailing list archive</em></strong> </font> <br> [ <a href="../">Other Periods</a> | <a href="../../">Other mailing lists</a> | <a href="/search.php3">Search</a> ] <br clear=all><hr> <!--X-Body-Begin--> <!--X-User-Header--> <!--X-User-Header-End--> <!--X-TopPNI--> Date: [ <a href="msg00671.html">Previous</a> | <a href="msg00673.html">Next</a> ] Thread: [ <a href="msg00679.html">Previous</a> | <a href="msg00671.html">Next</a> ] Index: [ <A HREF="author.html#00672">Author</A> | <A HREF="#00672">Date</A> | <A HREF="thread.html#00672">Thread</A> ] <!--X-TopPNI-End--> <!--X-MsgBody--> <!--X-Subject-Header-Begin--> <H1>Tutorial: Let's build a Compiler! - Part XIII: Procedures</H1> <HR> <!--X-Subject-Header-End--> <!--X-Head-of-Message--> <UL> <LI><em>To</em>: <<A HREF="mailto:mud-dev#null,net">mud-dev#null,net</A>></LI> <LI><em>Subject</em>: Tutorial: Let's build a Compiler! - Part XIII: Procedures</LI> <LI><em>From</em>: "Jon A. Lambert" <<A HREF="mailto:jlsysinc#ix,netcom.com">jlsysinc#ix,netcom.com</A>></LI> <LI><em>Date</em>: Mon, 2 Mar 1998 16:40:19 -0500</LI> </UL> <!--X-Head-of-Message-End--> <!--X-Head-Body-Sep-Begin--> <HR> <!--X-Head-Body-Sep-End--> <!--X-Body-of-Message--> <PRE> LET'S BUILD A COMPILER! By Jack W. Crenshaw, Ph.D. 27 August 1989 Part XIII: PROCEDURES ***************************************************************** * * * COPYRIGHT NOTICE * * * * Copyright (C) 1989 Jack W. Crenshaw. All rights reserved. * * * ***************************************************************** INTRODUCTION At last we get to the good part! At this point we've studied almost all the basic features of compilers and parsing. We have learned how to translate arithmetic expressions, Boolean expressions, control constructs, data declarations, and I/O statements. We have defined a language, TINY 1.3, that embodies all these features, and we have written a rudimentary compiler that can translate them. By adding some file I/O we could indeed have a working compiler that could produce executable object files from programs written in TINY. With such a compiler, we could write simple programs that could read integer data, perform calculations with it, and output the results. That's nice, but what we have is still only a toy language. We can't read or write even a single character of text, and we still don't have procedures. It's the features to be discussed in the next couple of installments that separate the men from the toys, so to speak. "Real" languages have more than one data type, and they support procedure calls. More than any others, it's these two features that give a language much of its character and personality. Once we have provided for them, our languages, TINY and its successors, will cease to become toys and will take on the character of real languages, suitable for serious programming jobs. For several installments now, I've been promising you sessions on these two important subjects. Each time, other issues came up that required me to digress and deal with them. Finally, we've been able to put all those issues to rest and can get on with the mainstream of things. In this installment, I'll cover procedures. Next time, we'll talk about the basic data types. ONE LAST DIGRESSION This has been an extraordinarily difficult installment for me to write. The reason has nothing to do with the subject itself ... I've known what I wanted to say for some time, and in fact I presented most of this at Software Development '89, back in February. It has more to do with the approach. Let me explain. When I first began this series, I told you that we would use several "tricks" to make things easy, and to let us learn the concepts without getting too bogged down in the details. Among these tricks was the idea of looking at individual pieces of a compiler at a time, i.e. performing experiments using the Cradle as a base. When we studied expressions, for example, we dealt with only that part of compiler theory. When we studied control structures, we wrote a different program, still based on the Cradle, to do that part. We only incorporated these concepts into a complete language fairly recently. These techniques have served us very well indeed, and led us to the development of a compiler for TINY version 1.3. When I first began this session, I tried to build upon what we had already done, and just add the new features to the existing compiler. That turned out to be a little awkward and tricky ... much too much to suit me. I finally figured out why. In this series of experiments, I had abandoned the very useful techniques that had allowed us to get here, and without meaning to I had switched over into a new method of working, that involved incremental changes to the full TINY compiler. You need to understand that what we are doing here is a little unique. There have been a number of articles, such as the Small C articles by Cain and Hendrix, that presented finished compilers for one language or another. This is different. In this series of tutorials, you are watching me design and implement both a language and a compiler, in real time. In the experiments that I've been doing in preparation for this article, I was trying to inject the changes into the TINY compiler in such a way that, at every step, we still had a real, working compiler. In other words, I was attempting an incremental enhancement of the language and its compiler, while at the same time explaining to you what I was doing. That's a tough act to pull off! I finally realized that it was dumb to try. Having gotten this far using the idea of small experiments based on single-character tokens and simple, special-purpose programs, I had abandoned them in favor of working with the full compiler. It wasn't working. So we're going to go back to our roots, so to speak. In this installment and the next, I'll be using single-character tokens again as we study the concepts of procedures, unfettered by the other baggage that we have accumulated in the previous sessions. As a matter of fact, I won't even attempt, at the end of this session, to merge the constructs into the TINY compiler. We'll save that for later. After all this time, you don't need more buildup than that, so let's waste no more time and dive right in. THE BASICS All modern CPU's provide direct support for procedure calls, and the 68000 is no exception. For the 68000, the call is a BSR (PC-relative version) or JSR, and the return is RTS. All we have to do is to arrange for the compiler to issue these commands at the proper place. Actually, there are really THREE things we have to address. One of them is the call/return mechanism. The second is the mechanism for DEFINING the procedure in the first place. And, finally, there is the issue of passing parameters to the called procedure. None of these things are really very difficult, and we can of course borrow heavily on what people have done in other languages ... there's no need to reinvent the wheel here. Of the three issues, that of parameter passing will occupy most of our attention, simply because there are so many options available. A BASIS FOR EXPERIMENTS As always, we will need some software to serve as a basis for what we are doing. We don't need the full TINY compiler, but we do need enough of a program so that some of the other constructs are present. Specifically, we need at least to be able to handle statements of some sort, and data declarations. The program shown below is that basis. It's a vestigial form of TINY, with single-character tokens. It has data declarations, but only in their simplest form ... no lists or initializers. It has assignment statements, but only of the kind <ident> = <ident> In other words, the only legal expression is a single variable name. There are no control constructs ... the only legal statement is the assignment. Most of the program is just the standard Cradle routines. I've shown the whole thing here, just to make sure we're all starting from the same point: {--------------------------------------------------------------} program Calls; {--------------------------------------------------------------} { Constant Declarations } const TAB = ^I; CR = ^M; LF = ^J; {--------------------------------------------------------------} { Variable Declarations } var Look: char; { Lookahead Character } var ST: Array['A'..'Z'] of char; {--------------------------------------------------------------} { Read New Character From Input Stream } procedure GetChar; begin Read(Look); end; {--------------------------------------------------------------} { Report an Error } procedure Error(s: string); begin WriteLn; WriteLn(^G, 'Error: ', s, '.'); end; {--------------------------------------------------------------} { Report Error and Halt } procedure Abort(s: string); begin Error(s); Halt; end; {--------------------------------------------------------------} { Report What Was Expected } procedure Expected(s: string); begin Abort(s + ' Expected'); end; {--------------------------------------------------------------} { Report an Undefined Identifier } procedure Undefined(n: string); begin Abort('Undefined Identifier ' + n); end; {--------------------------------------------------------------} { Report an Duplicate Identifier } procedure Duplicate(n: string); begin Abort('Duplicate Identifier ' + n); end; {--------------------------------------------------------------} { Get Type of Symbol } function TypeOf(n: char): char; begin TypeOf := ST[n]; end; {--------------------------------------------------------------} { Look for Symbol in Table } function InTable(n: char): Boolean; begin InTable := ST[n] <> ' '; end; {--------------------------------------------------------------} { Add a New Symbol to Table } procedure AddEntry(Name, T: char); begin if Intable(Name) then Duplicate(Name); ST[Name] := T; end; {--------------------------------------------------------------} { Check an Entry to Make Sure It's a Variable } procedure CheckVar(Name: char); begin if not InTable(Name) then Undefined(Name); if TypeOf(Name) <> 'v' then Abort(Name + ' is not a variable'); end; {--------------------------------------------------------------} { Recognize an Alpha Character } function IsAlpha(c: char): boolean; begin IsAlpha := upcase(c) in ['A'..'Z']; end; {--------------------------------------------------------------} { Recognize a Decimal Digit } function IsDigit(c: char): boolean; begin IsDigit := c in ['0'..'9']; end; {--------------------------------------------------------------} { Recognize an AlphaNumeric Character } function IsAlNum(c: char): boolean; begin IsAlNum := IsAlpha(c) or IsDigit(c); end; {--------------------------------------------------------------} { Recognize an Addop } function IsAddop(c: char): boolean; begin IsAddop := c in ['+', '-']; end; {--------------------------------------------------------------} { Recognize a Mulop } function IsMulop(c: char): boolean; begin IsMulop := c in ['*', '/']; end; {--------------------------------------------------------------} { Recognize a Boolean Orop } function IsOrop(c: char): boolean; begin IsOrop := c in ['|', '~']; end; {--------------------------------------------------------------} { Recognize a Relop } function IsRelop(c: char): boolean; begin IsRelop := c in ['=', '#', '<', '>']; end; {--------------------------------------------------------------} { Recognize White Space } function IsWhite(c: char): boolean; begin IsWhite := c in [' ', TAB]; end; {--------------------------------------------------------------} { Skip Over Leading White Space } procedure SkipWhite; begin while IsWhite(Look) do GetChar; end; {--------------------------------------------------------------} { Skip Over an End-of-Line } procedure Fin; begin if Look = CR then begin GetChar; if Look = LF then GetChar; end; end; {--------------------------------------------------------------} { Match a Specific Input Character } procedure Match(x: char); begin if Look = x then GetChar else Expected('''' + x + ''''); SkipWhite; end; {--------------------------------------------------------------} { Get an Identifier } function GetName: char; begin if not IsAlpha(Look) then Expected('Name'); GetName := UpCase(Look); GetChar; SkipWhite; end; {--------------------------------------------------------------} { Get a Number } function GetNum: char; begin if not IsDigit(Look) then Expected('Integer'); GetNum := Look; GetChar; SkipWhite; end; {--------------------------------------------------------------} { Output a String with Tab } procedure Emit(s: string); begin Write(TAB, s); end; {--------------------------------------------------------------} { Output a String with Tab and CRLF } procedure EmitLn(s: string); begin Emit(s); WriteLn; end; {--------------------------------------------------------------} { Post a Label To Output } procedure PostLabel(L: string); begin WriteLn(L, ':'); end; {--------------------------------------------------------------} { Load a Variable to the Primary Register } procedure LoadVar(Name: char); begin CheckVar(Name); EmitLn('MOVE ' + Name + '(PC),D0'); end; {--------------------------------------------------------------} { Store the Primary Register } procedure StoreVar(Name: char); begin CheckVar(Name); EmitLn('LEA ' + Name + '(PC),A0'); EmitLn('MOVE D0,(A0)') end; {--------------------------------------------------------------} { Initialize } procedure Init; var i: char; begin GetChar; SkipWhite; for i := 'A' to 'Z' do ST[i] := ' '; end; {--------------------------------------------------------------} { Parse and Translate an Expression } { Vestigial Version } procedure Expression; begin LoadVar(GetName); end; {--------------------------------------------------------------} { Parse and Translate an Assignment Statement } procedure Assignment; var Name: char; begin Name := GetName; Match('='); Expression; StoreVar(Name); end; {--------------------------------------------------------------} { Parse and Translate a Block of Statements } procedure DoBlock; begin while not(Look in ['e']) do begin Assignment; Fin; end; end; {--------------------------------------------------------------} { Parse and Translate a Begin-Block } procedure BeginBlock; begin Match('b'); Fin; DoBlock; Match('e'); Fin; end; {--------------------------------------------------------------} { Allocate Storage for a Variable } procedure Alloc(N: char); begin if InTable(N) then Duplicate(N); ST[N] := 'v'; WriteLn(N, ':', TAB, 'DC 0'); end; {--------------------------------------------------------------} { Parse and Translate a Data Declaration } procedure Decl; var Name: char; begin Match('v'); Alloc(GetName); end; {--------------------------------------------------------------} { Parse and Translate Global Declarations } procedure TopDecls; begin while Look <> 'b' do begin case Look of 'v': Decl; else Abort('Unrecognized Keyword ' + Look); end; Fin; end; end; {--------------------------------------------------------------} { Main Program } begin Init; TopDecls; BeginBlock; end. {--------------------------------------------------------------} Note that we DO have a symbol table, and there is logic to check a variable name to make sure it's a legal one. It's also worth noting that I have included the code you've seen before to provide for white space and newlines. Finally, note that the main program is delimited, as usual, by BEGIN-END brackets. Once you've copied the program to Turbo, the first step is to compile it and make sure it works. Give it a few declarations, and then a begin-block. Try something like: va (for VAR A) vb (for VAR B) vc (for VAR C) b (for BEGIN) a=b b=c e. (for END.) As usual, you should also make some deliberate errors, and verify that the program catches them correctly. DECLARING A PROCEDURE If you're satisfied that our little program works, then it's time to deal with the procedures. Since we haven't talked about parameters yet, we'll begin by considering only procedures that have no parameter lists. As a start, let's consider a simple program with a procedure, and think about the code we'd like to see generated for it: PROGRAM FOO; . . PROCEDURE BAR; BAR: BEGIN . . . . . END; RTS BEGIN { MAIN PROGRAM } MAIN: . . . . FOO; BSR BAR . . . . END. END MAIN Here I've shown the high-order language constructs on the left, and the desired assembler code on the right. The first thing to notice is that we certainly don't have much code to generate here! For the great bulk of both the procedure and the main program, our existing constructs take care of the code to be generated. The key to dealing with the body of the procedure is to recognize that although a procedure may be quite long, declaring it is really no different than declaring a variable. It's just one more kind of declaration. We can write the BNF: <declaration> ::= <data decl> | <procedure> This means that it should be easy to modify TopDecl to deal with procedures. What about the syntax of a procedure? Well, here's a suggested syntax, which is essentially that of Pascal: <procedure> ::= PROCEDURE <ident> <begin-block> There is practically no code generation required, other than that generated within the begin-block. We need only emit a label at the beginning of the procedure, and an RTS at the end. Here's the required code: {--------------------------------------------------------------} { Parse and Translate a Procedure Declaration } procedure DoProc; var N: char; begin Match('p'); N := GetName; Fin; if InTable(N) then Duplicate(N); ST[N] := 'p'; PostLabel(N); BeginBlock; Return; end; {--------------------------------------------------------------} Note that I've added a new code generation routine, Return, which merely emits an RTS instruction. The creation of that routine is "left as an exercise for the student." To finish this version, add the following line within the Case statement in DoBlock: 'p': DoProc; I should mention that this structure for declarations, and the BNF that drives it, differs from standard Pascal. In the Jensen & Wirth definition of Pascal, variable declarations, in fact ALL kinds of declarations, must appear in a specific sequence, i.e. labels, constants, types, variables, procedures, and main program. To follow such a scheme, we should separate the two declarations, and have code in the main program something like DoVars; DoProcs; DoMain; However, most implementations of Pascal, including Turbo, don't require that order and let you freely mix up the various declarations, as long as you still don't try to refer to something before it's declared. Although it may be more aesthetically pleasing to declare all the global variables at the top of the program, it certainly doesn't do any HARM to allow them to be sprinkled around. In fact, it may do some GOOD, in the sense that it gives you the opportunity to do a little rudimentary information hiding. Variables that should be accessed only by the main program, for example, can be declared just before it and will thus be inaccessible by the procedures. OK, try this new version out. Note that we can declare as many procedures as we choose (as long as we don't run out of single- character names!), and the labels and RTS's all come out in the right places. It's worth noting here that I do _NOT_ allow for nested procedures. In TINY, all procedures must be declared at the global level, the same as in C. There has been quite a discussion about this point in the Computer Language Forum of CompuServe. It turns out that there is a significant penalty in complexity that must be paid for the luxury of nested procedures. What's more, this penalty gets paid at RUN TIME, because extra code must be added and executed every time a procedure is called. I also happen to believe that nesting is not a good idea, simply on the grounds that I have seen too many abuses of the feature. Before going on to the next step, it's also worth noting that the "main program" as it stands is incomplete, since it doesn't have the label and END statement. Let's fix that little oversight: {--------------------------------------------------------------} { Parse and Translate a Main Program } procedure DoMain; begin Match('b'); Fin; Prolog; DoBlock; Epilog; end; {--------------------------------------------------------------} </PRE> <!--X-Body-of-Message-End--> <!--X-MsgBody-End--> <!--X-Follow-Ups--> <HR> <!--X-Follow-Ups-End--> <!--X-References--> <!--X-References-End--> <!--X-BotPNI--> <UL> <LI>Prev by Date: <STRONG><A HREF="msg00671.html">Tutorial: Let's build a Compiler! - Part XII: Miscellany</A></STRONG> </LI> <LI>Next by Date: <STRONG><A HREF="msg00673.html">Re: [MUD-Dev] Net protocols for MUDing</A></STRONG> </LI> <LI>Prev by thread: <STRONG><A HREF="msg00679.html">Re: [MUD-Dev] Describing the environment</A></STRONG> </LI> <LI>Next by thread: <STRONG><A HREF="msg00671.html">Tutorial: Let's build a Compiler! - Part XII: Miscellany</A></STRONG> </LI> <LI>Index(es): <UL> <LI><A HREF="index.html#00672"><STRONG>Date</STRONG></A></LI> <LI><A HREF="thread.html#00672"><STRONG>Thread</STRONG></A></LI> </UL> </LI> </UL> <!--X-BotPNI-End--> <!--X-User-Footer--> <!--X-User-Footer-End--> <ul><li>Thread context: <BLOCKQUOTE><UL> <LI><strong><A NAME="00682" HREF="msg00682.html">Monthly FAQ Posting</A></strong>, Ling <a href="mailto:K.L.Lo-94#student,lboro.ac.uk">K.L.Lo-94#student,lboro.ac.uk</a>, Wed 04 Mar 1998, 13:28 GMT <LI><strong><A NAME="00681" HREF="msg00681.html">Re: [MUD-Dev] Tutorial: Let's build a Compiler!</A></strong>, Chris Gray <a href="mailto:cg#ami-cg,GraySage.Edmonton.AB.CA">cg#ami-cg,GraySage.Edmonton.AB.CA</a>, Wed 04 Mar 1998, 01:39 GMT <LI><strong><A NAME="00678" HREF="msg00678.html">Describing the environment</A></strong>, Stephen Zepp <a href="mailto:zoran#enid,com">zoran#enid,com</a>, Tue 03 Mar 1998, 22:36 GMT <UL> <LI><strong><A NAME="00679" HREF="msg00679.html">Re: [MUD-Dev] Describing the environment</A></strong>, Richard Woolcock <a href="mailto:KaVir#dial,pipex.com">KaVir#dial,pipex.com</a>, Tue 03 Mar 1998, 23:43 GMT </LI> </UL> </LI> <LI><strong><A NAME="00672" HREF="msg00672.html">Tutorial: Let's build a Compiler! - Part XIII: Procedures</A></strong>, Jon A. Lambert <a href="mailto:jlsysinc#ix,netcom.com">jlsysinc#ix,netcom.com</a>, Mon 02 Mar 1998, 21:39 GMT <LI><strong><A NAME="00671" HREF="msg00671.html">Tutorial: Let's build a Compiler! - Part XII: Miscellany</A></strong>, Jon A. Lambert <a href="mailto:jlsysinc#ix,netcom.com">jlsysinc#ix,netcom.com</a>, Mon 02 Mar 1998, 21:38 GMT <LI><strong><A NAME="00670" HREF="msg00670.html">Tutorial: Let's build a Compiler! - Part XI: Lexical Scan Revisited</A></strong>, Jon A. Lambert <a href="mailto:jlsysinc#ix,netcom.com">jlsysinc#ix,netcom.com</a>, Mon 02 Mar 1998, 21:38 GMT <LI><strong><A NAME="00667" HREF="msg00667.html">Re: [MUD-Dev] Net protocols for MUDing</A></strong>, Chris Gray <a href="mailto:cg#ami-cg,GraySage.Edmonton.AB.CA">cg#ami-cg,GraySage.Edmonton.AB.CA</a>, Mon 02 Mar 1998, 09:35 GMT <UL> <li><Possible follow-up(s)><br> <LI><strong><A NAME="00803" HREF="msg00803.html">Re: [MUD-Dev] Net protocols for MUDing</A></strong>, Chris Gray <a href="mailto:cg#ami-cg,GraySage.Edmonton.AB.CA">cg#ami-cg,GraySage.Edmonton.AB.CA</a>, Sat 21 Mar 1998, 02:29 GMT </LI> </UL> </LI> </UL></BLOCKQUOTE> </ul> <hr> <center> [ <a href="../">Other Periods</a> | <a href="../../">Other mailing lists</a> | <a href="/search.php3">Search</a> ] </center> <hr> </body> </html>