8 years ago · 51c46ba691
--- a/doc/il.txt
+++ b/doc/il.txt
@@ -46,18 +46,18 @@ to focus on language design issues.
 
				 ~ Input Files
			
 
				 ~~~~~~~~~~~~~
			
 
				 
			
 
				-The intermediate language is provided to QBE as text files.
			
 
				-Usually, one file is generated per each compilation unit of
			
 
				+The intermediate language is provided to QBE as text.
			
 
				+Usually, one file is generated per compilation unit from
			
 
				 the frontend input language.  An IL file is a sequence of
			
 
				 <@ Definitions > for data, functions, and types.  Once
			
 
				 processed by QBE, the resulting file can be assembled and
			
 
				 linked using a standard toolchain (e.g., GNU binutils).
			
 
				 
			
 
				-Here is a complete "Hello World" IL file, it defines a
			
 
				+Here is a complete "Hello World" IL file which defines a
			
 
				 function that prints to the screen.  Since the string is
			
 
				 not a first-class object (only the pointer is), it is
			
 
				 defined outside the function's body.  Comments start with
			
 
				-a # character and run until the end of the line.
			
 
				+a # character and finish with the end of the line.
			
 
				 
			
 
				     # Define the string constant.
			
 
				     data $str = { b "hello world", b 0 }
			
@@ -70,7 +70,7 @@ a # character and run until the end of the line.
 
				     }
			
 
				 
			
 
				 If you have read the LLVM language reference, you might
			
 
				-recognize the above example.  In comparison, QBE makes a
			
 
				+recognize the example above.  In comparison, QBE makes a
			
 
				 much lighter use of types and the syntax is terser.
			
 
				 
			
 
				 ~ BNF Notation
			
@@ -86,7 +86,7 @@ are listed below.
 
				   * `( ... ),` designates a comma-separated list of the
			
 
				     enclosed syntax;
			
 
				   * `...*` and `...+` are used for arbitrary and
			
 
				-    at-least-once repetition.
			
 
				+    at-least-once repetition respectively.
			
 
				 
			
 
				 ~ Sigils
			
 
				 ~~~~~~~~
			
@@ -94,14 +94,14 @@ are listed below.
 
				 The intermediate language makes heavy use of sigils: all
			
 
				 user-defined names are prefixed with a sigil.  This is
			
 
				 to avoid keyword conflicts, and also to quickly spot the
			
 
				-scope and kind of an identifier.
			
 
				+scope and nature of identifiers.
			
 
				 
			
 
				  * `:` is for user-defined <@ Aggregate Types>
			
 
				  * `$` is for globals (represented by a pointer)
			
 
				  * `%` is for function-scope temporaries
			
 
				  * `@` is for block labels
			
 
				 
			
 
				-In BNF syntax, we use `?IDENT` to designate an identifier
			
 
				+In this BNF syntax, we use `?IDENT` to designate an identifier
			
 
				 starting with the sigil `?`.
			
 
				 
			
 
				 - 2. Types
			
@@ -114,7 +114,7 @@ starting with the sigil `?`.
 
				     BASETY := 'w' | 'l' | 's' | 'd'  # Base types
			
 
				     EXTTY  := BASETY    | 'b' | 'h'  # Extended types
			
 
				 
			
 
				-The IL makes very minimal use of types.  By design, the types
			
 
				+The IL makes minimal use of types.  By design, the types
			
 
				 used are restricted to what is necessary for unambiguous
			
 
				 compilation to machine code and C interfacing.  Unlike LLVM,
			
 
				 QBE does not use types as a means to safety; they are only
			
@@ -140,16 +140,16 @@ section.
 
				 ~ Subtyping
			
 
				 ~~~~~~~~~~~
			
 
				 
			
 
				-The IL has a minimal subtyping feature for integer types.
			
 
				+The IL has a minimal subtyping feature, for integer types only.
			
 
				 Any value of type `l` can be used in a `w` context.  In that
			
 
				 case, only the 32 least significant bits of the word value
			
 
				 are used.
			
 
				 
			
 
				-Make note that it is the inverse of the usual subtyping on
			
 
				+Make note that it is the opposite of the usual subtyping on
			
 
				 integers (in C, we can safely use an `int` where a `long`
			
 
				 is expected).  A word value cannot be used in long context.
			
 
				 The rationale is that a word can be signed or unsigned, so
			
 
				-extending it to a long can be done in two ways, either
			
 
				+extending it to a long could be done in two ways, either
			
 
				 by zero-extension, or by sign-extension.
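
As a rough illustration in C (this is not QBE code, and the function names are made up for this sketch), the asymmetry can be pictured as follows: narrowing a long to a word has a single meaning, while widening a word has two candidates that give different results.

```c
#include <stdint.h>

/* Hypothetical model: an `l` value used in a `w` context keeps only
 * its 32 least significant bits. */
static uint32_t long_in_word_context(uint64_t l)
{
    return (uint32_t)l;
}

/* Widening is ambiguous: the same 32 bits can yield two different longs. */
static uint64_t zero_extend_word(uint32_t w)
{
    return (uint64_t)w;
}

static uint64_t sign_extend_word(uint32_t w)
{
    /* Assumption: converting to int32_t wraps (two's complement), which
     * holds on mainstream targets but is implementation-defined before C23. */
    return (uint64_t)(int64_t)(int32_t)w;
}
```

With the word `0xFFFFFFFF`, the two widening helpers disagree, which is the ambiguity the rationale above refers to.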
			
 
				 
			
 
				 - 3. Constants
			
@@ -184,9 +184,9 @@ operand of the subtraction is a word (32-bit) context.
 
				 
			
 
				 Because specifying floating-point constants by their bits
			
 
				 makes the code less readable, syntactic sugar is provided
			
 
				-to express them.  Standard scientific notation is used with
			
 
				-a prefix of `s_` for single and `d_` for double-precision
			
 
				-numbers.  Once again, the following example defines twice
			
 
				+to express them.  Standard scientific notation is prefixed
			
 
				+with `s_` and `d_` for single and double precision numbers
			
 
				+respectively.  Once again, the following example defines twice
			
 
				 the same double-precision constant.
			
 
				 
			
 
				     %x =d add d_0, d_-1
			
@@ -200,7 +200,7 @@ constants by the linker.
 
				 ----------------
			
 
				 
			
 
				 Definitions are the essential components of an IL file.
			
 
				-They can define three types of objects: Aggregate types,
			
 
				+They can define three types of objects: aggregate types,
			
 
				 data, and functions.  Aggregate types are never exported
			
 
				 and do not compile to any code.  Data and function
			
 
				 definitions have file scope and are mutually recursive
			
@@ -221,14 +221,14 @@ using the `export` keyword.
 
				         'type' :IDENT '=' 'align' NUMBER '{' NUMBER '}'
			
 
				 
			
 
				 Aggregate type definitions start with the `type` keyword.
			
 
				-They have file scope, but types must be defined before their
			
 
				-first use.  The inner structure of a type is expressed by a
			
 
				+They have file scope, but types must be defined before being
			
 
				+referenced.  The inner structure of a type is expressed by a
			
 
				 comma-separated list of <@ Simple Types> enclosed in curly
			
 
				 braces.
			
 
				 
			
 
				     type :fourfloats = { s, s, d, d }
			
 
				 
			
 
				-For ease of generation, a trailing comma is tolerated by
			
 
				+For ease of IL generation, a trailing comma is tolerated by
			
 
				 the parser.  In case many items of the same type are
			
 
				 sequenced (like in a C array), the shorter array syntax
			
 
				 can be used.
			
@@ -243,7 +243,7 @@ explicitly specified by the programmer.
 
				 
			
 
				 Opaque types are used when the inner structure of an
			
 
				 aggregate cannot be specified; the alignment for opaque
			
 
				-types is mandatory.  They are defined by simply enclosing
			
 
				+types is mandatory.  They are defined simply by enclosing
			
 
				 their size between curly braces.
			
 
				 
			
 
				     type :opaque = align 16 { 32 }
			
@@ -264,7 +264,7 @@ their size between curly braces.
 
				       |  '"' ... '"'         # String
			
 
				       |  CONST               # Constant
			
 
				 
			
 
				-Data definitions define objects that will be emitted in the
			
 
				+Data definitions express objects that will be emitted in the
			
 
				 compiled file.  They can be local to the file or exported
			
 
				 with global visibility to the whole program.
			
 
				 
			
@@ -282,11 +282,11 @@ initialize multiple fields of the same size.
 
				 The members of a struct will be packed.  This means that
			
 
				 padding has to be emitted by the frontend when necessary.
			
 
				 Alignment of the whole data object can be manually specified,
			
 
				-and when no alignment is provided, the maximum alignment of
			
 
				+and when no alignment is provided, the maximum alignment from
			
 
				 the platform is used.
			
 
				 
			
 
				 When the `z` letter is used, the number following indicates
			
 
				-the size of the field, the contents of the field are zero
			
 
				+the size of the field; the contents of the field are zero
			
 
				 initialized.  It can be used to add padding between fields
			
 
				 or zero-initialize big arrays.
			
 
				 
			
@@ -325,19 +325,18 @@ Here are various examples of data definitions.
 
				 Function definitions contain the actual code to emit in
			
 
				 the compiled file.  They define a global symbol that
			
 
				 contains a pointer to the function code.  This pointer
			
 
				-can be used in call instructions or stored in memory.
			
 
				+can be used in `call` instructions or stored in memory.
			
 
				 
			
 
				 The type given right before the function name is the
			
 
				 return type of the function.  All return values of this
			
 
				-function must have the return type.  If the return
			
 
				+function must have this return type.  If the return
			
 
				 type is missing, the function cannot return any value.
			
 
				 
			
 
				 The parameter list is a comma-separated list of
			
 
				 temporary names prefixed by types.  The types are used
			
 
				 to correctly implement C compatibility.  When an argument
			
 
				-has an aggregate type, is is set on entry of the
			
 
				-function to a pointer to the aggregate passed by the
			
 
				-caller.  In the example below, we have to use a load
			
 
				+has an aggregate type, a pointer to the aggregate is passed
			
 
				+by the caller.  In the example below, we have to use a load
			
 
				 instruction to get the value of the first (and only)
			
 
				 member of the struct.
			
 
				 
			
@@ -350,7 +349,7 @@ member of the struct.
 
				     }
			
 
				 
			
 
				 If the parameter list ends with `...`, the function is
			
 
				-a variadic function: It can accept a variable number of
			
 
				+a variadic function: it can accept a variable number of
			
 
				 arguments.  To access the extra arguments provided by
			
 
				 the caller, use the `vastart` and `vaarg` instructions
			
 
				 described in the <@ Variadic > section.
			
@@ -375,10 +374,10 @@ very good compatibility with C.  The <@ Call > section
 
				 explains how to pass an environment parameter.
			
 
				 
			
 
				 Since global symbol definitions are mutually recursive,
			
 
				-there is no need for function declarations: A function
			
 
				+there is no need for function declarations: a function
			
 
				 can be referenced before its definition.
			
 
				 Similarly, functions from other modules can be used
			
 
				-without previous declarations.  All the type information
			
 
				+without previous declaration.  All the type information
			
 
				 is provided in the call instructions.
			
 
				 
			
 
				 The syntax and semantics for the body of functions
			
@@ -389,8 +388,8 @@ are described in the <@ Control > section.
 
				 
			
 
				 The IL represents programs as textual transcriptions of
			
 
				 control flow graphs.  The control flow is serialized as
			
 
				-a sequence of blocks of straight-line code and connected
			
 
				-using jump instructions.
			
 
				+a sequence of blocks of straight-line code which are
			
 
				+connected using jump instructions.
			
 
				 
			
 
				 ~ Blocks
			
 
				 ~~~~~~~~
			
@@ -406,12 +405,12 @@ All blocks have a name that is specified by a label at
 
				 their beginning.  Then follows a sequence of instructions
			
 
				 that have "fall-through" flow.  Finally, one jump terminates
			
 
				 the block.  The jump can either transfer control to another
			
 
				-block of the same function or return, they are described
			
 
				+block of the same function or return; they are described
			
 
				 further below.
			
 
				 
			
 
				 The first block in a function must not be the target of
			
 
				-any jump in the program.  If this need is encountered,
			
 
				-the frontend can always insert an empty prelude block
			
 
				+any jump in the program.  If this is really needed,
			
 
				+the frontend could insert an empty prelude block
			
 
				 at the beginning of the function.
			
 
				 
			
 
				 When one block jumps to the next block in the IL file,
			
@@ -453,7 +452,7 @@ the following list.
 
				 
			
 
				     When its word argument is non-zero, it jumps to its
			
 
				     first label argument; otherwise it jumps to the other
			
 
				-    label.  The argument must be of word type, because of
			
 
				+    label.  The argument must be of word type; because of
			
 
				     subtyping, a long argument can be passed, but only its
			
 
				     least significant 32 bits will be compared to 0.
			
 
				 
			
@@ -461,7 +460,7 @@ the following list.
 
				 
			
 
				     Terminates the execution of the current function,
			
 
				     optionally returning a value to the caller.  The value
			
 
				-    returned must have the type given in the function
			
 
				+    returned must be of the type given in the function
			
 
				     prototype.  If the function prototype does not specify
			
 
				     a return type, no return value can be used.
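
The word-context rule given for the conditional jump has a visible consequence that is easy to miss. The C sketch below models it under the assumption that the argument is simply truncated before the test; the function name is hypothetical and the return value stands in for "first label taken".

```c
#include <stdint.h>

/* Hypothetical model of `jnz`: returns 1 when the first label would be
 * taken and 0 when the other label would be taken.  Only the 32 least
 * significant bits of the argument take part in the test. */
static int jnz_takes_first_label(uint64_t arg)
{
    uint32_t word = (uint32_t)arg;  /* word context: drop the high bits */
    return word != 0;
}
```

A long such as `0x100000000` is non-zero, yet in this model it selects the second label because its low 32 bits are all zero.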
			
 
				 
			
@@ -498,12 +497,12 @@ This is made explicit by the instruction suffix.
 
				 The types of instructions are described below using a short
			
 
				 type string.  A type string specifies all the valid return
			
 
				 types an instruction can have, its arity, and the type of
			
 
				-its arguments in function of its return type.
			
 
				+its arguments depending on its return type.
			
 
				 
			
 
				 Type strings begin with acceptable return types, then
			
 
				 follow, in parentheses, the possible types for the arguments.
			
 
				-If the n-th return type of the type string is used for an
			
 
				-instruction, the arguments must use the n-th type listed for
			
 
				+If the N-th return type of the type string is used for an
			
 
				+instruction, the arguments must use the N-th type listed for
			
 
				 them in the type string.  When an instruction does not have a
			
 
				 return type, the type string only contains the types of the
			
 
				 arguments.
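
The positional rule can be sketched as a small lookup. The C helper below is hypothetical and assumes the type string has already been split into its return-type letters and the letters allowed for one argument, both written out in full (for instance `wl` and `sd`).

```c
#include <string.h>

/* Hypothetical helper: given the return-type letters of a type string,
 * the letters allowed for one argument, and the return type actually
 * used, return the type that argument must have.  Returns 0 when the
 * return type is not allowed or the strings do not line up. */
static char arg_type_for(const char *ret_types, const char *arg_types, char ret)
{
    const char *p;
    size_t n;

    if (ret == '\0')
        return 0;
    p = strchr(ret_types, ret);
    if (p == NULL)
        return 0;
    n = (size_t)(p - ret_types);    /* position N of the return type */
    if (n >= strlen(arg_types))
        return 0;
    return arg_types[n];            /* N-th type listed for the argument */
}
```

For a type string written as `wl(sd)`, choosing the return type `l` selects the second position, so the argument must be `d`.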
			
@@ -513,7 +512,7 @@ The following abbreviations are used.
 
				   * `T` stands for `wlsd`
			
 
				   * `I` stands for `wl`
			
 
				   * `F` stands for `sd`
			
 
				-  * `m` stands for the type of pointers on the target, on
			
 
				+  * `m` stands for the type of pointers on the target; on
			
 
				     64-bit architectures it is the same as `l`
			
 
				 
			
 
				 For example, consider the type string `wl(F)`; it mentions
			
@@ -540,7 +539,7 @@ towards zero.
 
				 The signed and unsigned remainder operations are available
			
 
				 as `rem` and `urem`.  The sign of the remainder is the same
			
 
				 as the one of the dividend.  Its magnitude is smaller than
			
 
				-the divisor's.  These two instructions and `udiv` are only
			
 
				+that of the divisor.  These two instructions and `udiv` are only
			
 
				 available with integer arguments and result.
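
C's `%` operator (since C99) follows the same sign convention, so it can serve as an informal model of the word-sized variants. The function names below are made up for this sketch, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

/* Hypothetical model of `rem` on words: the result takes the sign of
 * the dividend.  Assumes b != 0 and excludes INT32_MIN % -1. */
static int32_t rem_w(int32_t a, int32_t b)
{
    return a % b;
}

/* Hypothetical model of `urem` on words: both operands are read as
 * unsigned numbers.  Assumes b != 0. */
static uint32_t urem_w(uint32_t a, uint32_t b)
{
    return a % b;
}
```

In this model, -7 and 2 give a remainder of -1, while 7 and -2 give 1: the sign follows the dividend in both cases.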
			
 
				 
			
 
				 Bitwise OR, AND, and XOR operations are available for both
			
@@ -548,8 +547,8 @@ integer types.  Logical operations of typical programming
 
				 languages can be implemented using <@ Comparisons > and
			
 
				 <@ Jumps >.
			
 
				 
			
 
				-Shift instructions `sar`, `shr`, and `shl` shift right or
			
 
				-left their first operand by the amount in the second
			
 
				+Shift instructions `sar`, `shr`, and `shl` shift right or
			
 
				+left their first operand by the amount from the second
			
 
				 operand.  The shifting amount is taken modulo the size of
			
 
				 the result type.  Shifting right can either preserve the
			
 
				 sign of the value (using `sar`), or fill the newly freed
			
@@ -591,8 +590,8 @@ towards zero.
 
				       * `loadsb`, `loadub` -- `I(mm)`
			
 
				 
			
 
				     For types smaller than long, two variants of the load
			
 
				-    instruction is available: one will sign extend the value
			
 
				-    loaded, while the other will zero extend it.  Remark that
			
 
				+    instruction are available: one will sign extend the loaded
			
 
				+    value, while the other will zero extend it.  Note that
			
 
				     all loads smaller than long can load to either a long or
			
 
				     a word.
			
 
				 
			
@@ -635,9 +634,9 @@ instructions.  Pointers are stored in long temporaries.
 
				 ~~~~~~~~~~~~~
			
 
				 
			
 
				 Comparison instructions return an integer value (either a word
			
 
				-or a long), and compare values of arbitrary types.  The value
			
 
				-returned is 1 if the two operands satisfy the comparison
			
 
				-relation, and 0 otherwise.  The names of comparisons respect
			
 
				+or a long), and compare values of arbitrary types.  The returned
			
 
				+value is 1 if the two operands satisfy the comparison
			
 
				+relation, or 0 otherwise.  The names of comparisons respect
			
 
				 a standard naming scheme in three parts.
			
 
				 
			
 
				  1. All comparisons start with the letter `c`.
			
@@ -676,7 +675,7 @@ a standard naming scheme in three parts.
 
				 
			
 
				 For example, `cod` (`I(dd,dd)`) compares two double-precision
			
 
				 floating point numbers and returns 1 if the two floating points
			
 
				-are not NaNs, and 0 otherwise.  The `csltw` (`I(ww,ww)`)
			
 
				+are not NaNs, or 0 otherwise.  The `csltw` (`I(ww,ww)`)
			
 
				 instruction compares two words representing signed numbers and
			
 
				 returns 1 when the first argument is smaller than the second one.
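
The two comparisons just described can be modelled in C as follows. This is only a sketch: the C functions borrow the instruction names for readability, and the signed reading of a word assumes two's complement conversion.

```c
#include <math.h>
#include <stdint.h>

/* Model of `cod`: 1 when neither operand is a NaN, 0 otherwise. */
static int cod(double a, double b)
{
    return !isnan(a) && !isnan(b);
}

/* Model of `csltw`: both words are read as signed 32-bit numbers.
 * Assumption: the conversion to int32_t wraps (two's complement). */
static int csltw(uint32_t a, uint32_t b)
{
    return (int32_t)a < (int32_t)b;
}
```

Read as a signed number, the word `0xFFFFFFFF` is -1, so this model reports it as smaller than 0.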
			
 
				 
			
@@ -727,7 +726,7 @@ instruction to lower the precision of an integer temporary.
 
				 ~~~~~~~~~~~~~~~
			
 
				 
			
 
				 The `cast` and `copy` instructions return the bits of their
			
 
				-argument verbatim.  A `cast` will however change an integer
			
 
				+argument verbatim.  However, a `cast` will change an integer
			
 
				 into a floating point of the same width and vice versa.
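
Since the bits are returned verbatim, `cast` behaves like a reinterpretation rather than a numeric conversion. A C sketch of that idea, using made-up function names and assuming IEEE 754 encodings for `float` and `double`:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of casting a single to a word: same 32 bits. */
static uint32_t cast_s_to_w(float f)
{
    uint32_t w;
    memcpy(&w, &f, sizeof w);   /* copy the bits, no numeric conversion */
    return w;
}

/* Hypothetical model of casting a long to a double: same 64 bits. */
static double cast_l_to_d(uint64_t l)
{
    double d;
    memcpy(&d, &l, sizeof d);
    return d;
}
```

Under the IEEE 754 assumption, the single-precision value 1.0 maps to the word `0x3F800000`, which is unrelated to the integer 1.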
			
 
				 
			
 
				   * `cast` -- `wlsd(sdwl)`
			
@@ -755,7 +754,7 @@ single-precision floating point number `%f` into `%rs`.
 
				 
			
 
				     ABITY := BASETY | :IDENT
			
 
				 
			
 
				-The call instruction is special in many ways.  It is not
			
 
				+The call instruction is special in several ways.  It is not
			
 
				 a three-address instruction and requires the type of all
			
 
				 its arguments to be given.  Also, the return type can be
			
 
				 either a base type or an aggregate type.  These specifics
			
@@ -801,7 +800,7 @@ is essentially effectful: calling it twice in a row will
 
				 return two consecutive arguments from the argument list.
			
 
				 
			
 
				 Both instructions take a pointer to a variable argument
			
 
				-list as only argument.  The size and alignment of variable
			
 
				+list as their sole argument.  The size and alignment of variable
			
 
				 argument lists depend on the target used.  However, it
			
 
				 is possible to conservatively use the maximum size and
			
 
				 alignment required by all the targets.
			
@@ -890,7 +889,7 @@ translate it in SSA form is to insert a phi instruction.
 
				 
			
 
				 Phi instructions return one of their arguments depending
			
 
				 on where the control came from.  In the example, `%y` is
			
 
				-set to 1 if the `@ift` branch is taken, and it is set to
			
 
				+set to 1 if the `@ift` branch is taken, or it is set to
			
 
				 2 otherwise.
			
 
				 
			
 
				 An important remark about phi instructions is that QBE