I was wondering if there happened to be any x86 assembly people about (or perhaps individuals who have worked with this ISA some). Someone pointed me to the documents on Intel's website but I noticed that these documents are quite hefty. Are there in general some simple guidelines on emitting x86 instructions? I noticed that unlike RISC(MIPS/Sparc) setups it would seem that mapping intermediate representations to x86 machine is exceedingly more tricky (certain instructions are only associated with certain registers, a smaller number of registers, variable instruction sizes and execution times in cycles). If no one does that is ok too I will try asking on comp.compilers or maybe some open Intel forum.
Welcome to the world of keeping our 4GHz Quad-Core CPU's backwards compatible to the 4044 calculator chip. *sigh*
If you're used to RISC assembly, you might want to try your hand at Motorola 68000 as a stepping stone. It has the same CISC features of a small number of registers (some of which have semi-fixed uses) and variable instruction times, but a bit less complexity and uses "normal" byte ordering instead of the backwards idiocy that intel uses.
I loved 6502 assembly (C64 FTW!), and 68k (Amiga/Mac II)…. when I poked at x86, I decided to learn C. :cry:
There's a pretty good tutorial here: http://www.drpaulcarter.com/pcasm/ It's rather useful because you'll usually be programming against one or more OS ABIs and C ABIs. There is example code for several common environments. Also you might take a look at tcc at http://bellard.org/tcc/ which is a small and fairly understandable C compiler, which emits code in ELF and PE formats.
Actually backwards compatibility isn't really the biggest problem here. x86 truly is a complex language – hence the terminology CISC vs. RISC. It's not just the number of registers, either. In fact, the instruction set has only gotten more complex over the years, as I understand it, to take advantage of features on the newer processors/chipsets. (MMX being an example of a now-somewhat-dated addition)
Thanks guys. Are there any guides/references which tell you how many micro-ops on a Core 2/Core i7 a single instruction uses (and which ones maybe)? I was hoping for emitting code there would be some canonical subset which is typically used to keep the micro-op counts low.
The micro-ops usage varies by processor architecture, and since those get completely redone every two years, I'm not sure there's a good guide on it around. You may just want to go check out the GCC and/or LLVM sources, which have optimizers that include tweaking for different processor architectures. I believe GCC 4.4 has i7-specific optimizations, as well as Core2.
Out of curiosity, why are you trying to generate the assembly yourself, rather than generate IR for gcc/LLVM/etc. and have it do the code generation? Efficiency assembly code generation isn't exactly easy in general, and especially not on x86…
I am actually just exploring the feasibility of the whole thing at this point rather than committing to actually pursuing it. I actually have multiple goals for this project and one of them was to explore code generation issues and optimization to better familiarize myself with these aspects of how a compiler works. Targeting LLVM would probably be easier though I would have to probably convert CPS -> SSA form. As for gcc I am not entirely sure if even an impure FP would map properly into this IR since it would need to do tail recursive optimizations which typically compilers for imperative languages I dont think do(I believe this has been corrected in the latest releases of LLVM).
Efficient code generation on x86 probably is quite a challenge though and perhaps alot of it would not be particularly instructive (admittedly having knowledge of how the x86 family works might come in handy since most of the machines I own are x86 based). However doing code generation for something like a superscalar MIPS/Sparc chip might be quite an interesting exercise. However given that machines are predominantly x86 based might mean the program finds a very narrow/small audience.
I actually have multiple goals for this project and one of them was to explore code generation issues and optimization to better familiarize myself with these aspects of how a compiler works.
If the goal is just to play with optimization, I think there are better ways of achieving that that require less initial overhead. This class for instance is centered around a framework called Joeq that is used to analyze Java bytecode; you can do real optimizations fairly quickly. Look at HW2 and HW4 in particular.
As for gcc I am not entirely sure if even an impure FP would map properly into this IR since it would need to do tail recursive optimizations which typically compilers for imperative languages I dont think do(I believe this has been corrected in the latest releases of LLVM).
Googling for "gcc tail recursion" brings up several results suggesting that gcc has done this since at least 2008. :wink:
(admittedly having knowledge of how the x86 family works might come in handy since most of the machines I own are x86 based)
Well, I'll fess up and say that I have relatively little knowledge of x86 assembly (we did MIPS at school), and I get along pretty well! :smile: I'm not sure why knowing x86 would be useful unless you plan on doing things like disassembly.
Learning x86 assembly is a good exercise, especially if you have interest in language design or compiler design or hardware. I'm going to be spending a semester or three studying it again while at DigiPen, I'm sure… and I'd have definitely been using it quite often if I had managed to land that LLVM job I wanted at Apple. General application development doesn't need it at all, but it's still good to know.
There's also a lot of other interesting bits of hardware to learn that can actually help application development a bit. For example, understanding how buses works, memory works, and so on. If you're coding something that needs high performance (or is running in a resource constrained environment) that kind of knowledge can help you figure out how to get the best performance out of the hardware, especially given how doing so is often counter-intuitive on today's hardware, or just really different from when many of us went to school. I recall being taught performance tricks like building tables of pre-computed data, for example, which on today's machines will actually make things slower – the CPU can calculate quite a bit of data in the time it takes for RAM to fill a cache line, for example.
If you're interested in more practical purposes of code generation, projects like GNU Lightning, LLVM, or libJIT are all worth looking into. GNU Lightning is pretty limited, but it works. The main developer of libJIT is a complete asshat, but the code works and the project does what it claims to. (The man has a huge grudge against LLVM for being more popular, so just take anything he or his site says about LLVM with a giant grain of salt.) LLVM's big advantage is that it's not really just about x86, but can target a number of different architectures using a single interface, plus it includes a lot of functionality beyond just the raw IR->assembly features.
Elanthis- Thanks. I think it probably is quite important but also quite involved. It I guess is unfortunate in some ways that the predominant instruction set in use at the moment is something as complex as the x86 IS —-> A (I guess really it's a diverse set of architectures running similar/reasonably compatible instruction sets as someone pointed out to me). For me writing a compiler and playing with different types of optimizations hopefully will drive home some things about how hardware works nowadays. With multicore processors this landscape is probably going to continue to change rather rapidly. Nehalem is hyperthreaded + cc-NUMA I believe (shared L3 caches). I wonder to what extent technologies you find in more powerful computing systems will make it into desktop systems in order to try to speed things up.
I guess I would go with LLVM if I wasnt so interested in FP languages. I believe more natural mappings than SSA exist for FP (like CPS and some others). Admittedly converting CPS -> SSA probably isnt difficult (I think I ran across a paper describing this) but working with x86 directly seems to be better long term for me so I am thinking that it might be worth the effort.
DH- I looked over those homeworks and they seem to cover basic optimizations I probably would want to have in my compiler but not some of the stuff I have been looking at that people seem to be playing with (I have been strip-mining PLDI proceedings- alot of interesting stuff in there :P). The reason for my preoccupation with tail recursion optimization is that for languages like Scheme they have rather stringent requirements into how this needs to be implemented. I believe the Chicken Scheme compiler had to use some tricks to get around this particular problem (it compiles to C). I have some experience with both SPARC and now MIPS (been reading Patterson & Hennessy in my spare time- both the COA and CA books are very instructive). I have read about the SUIF project- it seems pretty cool. Maybe at some point I will get good enough to understand what Monica Lam & co. are actually doing/have done over there.
I strongly second acquiring a basic understanding of architecture, but I'm less convinced that you need to understand the details of assembly unless you're doing something extremely specific.
I looked over those homeworks and they seem to cover basic optimizations I probably would want to have in my compiler but not some of the stuff I have been looking at that people seem to be playing with (I have been strip-mining PLDI proceedings- alot of interesting stuff in there :P).
Well, you have to learn to walk before you run and all that.
Maybe at some point I will get good enough to understand what Monica Lam & co. are actually doing/have done over there.
You have to start with the basics and accept that your initial results will be suboptimal. :smile: That seems to be the biggest thing holding you back: you aren't content to start with basic x86, you want to dive into multi-core architecture with micro-operations while at the same time implementing optimizations that are only just being described in PLDI research papers. I think this is one of those cases where trying to push for too much just ends up holding the whole thing back.
The neat thing about tail recursion is that it can be implemented in several different ways in an imperative language with no compiler support, either by converting to loops (for self-recursive functions) or gotos. Gotos (jumps) are exactly how you'd have to implement it in assembly anyways, iirc. The same is true for any imperative-oriented IR, like LLVM's or GCC's. .NET/CLR supports functional languages and I believe LLVM has a partial CLR VM in its VMKit, so that may give some examples of using it.
Using another IR form better suited to FP is of course a good idea – I'm just not personally away of any good projects in a usable state with multiple target support using any FP-oriented IR, unfortunately.
I don't know of any programs using FP specific IR's either which is why I probably have to implement my own. I have been working on this actually and learning to read PLDI articles etc and being able to parse some of them can sometimes be a bit difficult. The problem with FP is also the optimizations arent quite the same (and how you would implement them) arent quite the same as the ones you find in a standard imperative compiler as well. So the issue isnt so much about running/walking etc it's about selecting the right set of basic optimizations and also which advance optimizations you want to experiment with which are appropriate for FP. I think reading PLDI stuff is a skill- alot of the optimizations you find in textbooks you can find in old acm journal articles (i.e. ones that are considered basic/canonical and are now taught in classroom situations). I am not aiming for optimal either :P. I suspect I might have as my first try something pretty much without optimization passes at all. Just simple instruction selection and register allocation etc. I am sort of waiting atm for my MIPS machine to get here so I can start experimenting :P.
It could well be. I haven't looked into it. I guess the difficulty with a Scheme compiler -> C would be you would have the guarantee that all C compilers correctly optimize the case (since the mandatory tail recursion thing my understanding is is part of the language spec). The other troublesome case is call-cc which also needs special handling. I heard it is most natural to do this with CPS over other types of IRs- but my knowledge here is limited. Perhaps I will know more as I progress further in this implementation project.