|
|
@@ -46,18 +46,18 @@ to focus on language design issues.
|
|
|
~ Input Files
|
|
|
~~~~~~~~~~~~~
|
|
|
|
|
|
-The intermediate language is provided to QBE as text files.
|
|
|
-Usually, one file is generated per each compilation unit of
|
|
|
+The intermediate language is provided to QBE as text.
|
|
|
+Usually, one file is generated for each compilation unit from
|
|
|
the frontend input language. An IL file is a sequence of
|
|
|
<@ Definitions > for data, functions, and types. Once
|
|
|
processed by QBE, the resulting file can be assembled and
|
|
|
linked using a standard toolchain (e.g., GNU binutils).
|
|
|
|
|
|
-Here is a complete "Hello World" IL file, it defines a
|
|
|
+Here is a complete "Hello World" IL file which defines a
|
|
|
function that prints to the screen. Since the string is
|
|
|
not a first class object (only the pointer is) it is
|
|
|
defined outside the function's body. Comments start with
|
|
|
-a # character and run until the end of the line.
|
|
|
+a # character and extend to the end of the line.
|
|
|
|
|
|
# Define the string constant.
|
|
|
data $str = { b "hello world", b 0 }
|
|
|
@@ -70,7 +70,7 @@ a # character and run until the end of the line.
|
|
|
}
|
|
|
|
|
|
If you have read the LLVM language reference, you might
|
|
|
-recognize the above example. In comparison, QBE makes a
|
|
|
+recognize the example above. In comparison, QBE makes a
|
|
|
much lighter use of types and the syntax is terser.
|
|
|
|
|
|
~ BNF Notation
|
|
|
@@ -86,7 +86,7 @@ are listed below.
|
|
|
* `( ... ),` designates a comma-separated list of the
|
|
|
enclosed syntax;
|
|
|
* `...*` and `...+` are used for arbitrary and
|
|
|
- at-least-once repetition.
|
|
|
+ at-least-once repetition, respectively.
|
|
|
|
|
|
~ Sigils
|
|
|
~~~~~~~~
|
|
|
@@ -94,14 +94,14 @@ are listed below.
|
|
|
The intermediate language makes heavy use of sigils, all
|
|
|
user-defined names are prefixed with a sigil. This is
|
|
|
to avoid keyword conflicts, and also to quickly spot the
|
|
|
-scope and kind of an identifier.
|
|
|
+scope and nature of identifiers.
|
|
|
|
|
|
* `:` is for user-defined <@ Aggregate Types>
|
|
|
* `$` is for globals (represented by a pointer)
|
|
|
* `%` is for function-scope temporaries
|
|
|
* `@` is for block labels
|
|
|
|
|
|
-In BNF syntax, we use `?IDENT` to designate an identifier
|
|
|
+In this BNF syntax, we use `?IDENT` to designate an identifier
|
|
|
starting with the sigil `?`.
|
|
|
|
|
|
- 2. Types
|
|
|
@@ -114,7 +114,7 @@ starting with the sigil `?`.
|
|
|
BASETY := 'w' | 'l' | 's' | 'd' # Base types
|
|
|
EXTTY := BASETY | 'b' | 'h' # Extended types
|
|
|
|
|
|
-The IL makes very minimal use of types. By design, the types
|
|
|
+The IL makes minimal use of types. By design, the types
|
|
|
used are restricted to what is necessary for unambiguous
|
|
|
compilation to machine code and C interfacing. Unlike LLVM,
|
|
|
QBE is not using types as a means to safety; they are only
|
|
|
@@ -140,16 +140,16 @@ section.
|
|
|
~ Subtyping
|
|
|
~~~~~~~~~~~
|
|
|
|
|
|
-The IL has a minimal subtyping feature for integer types.
|
|
|
+The IL has a minimal subtyping feature, for integer types only.
|
|
|
Any value of type `l` can be used in a `w` context. In that
|
|
|
case, only the 32 least significant bits of the word value
|
|
|
are used.
|
|
|
|
|
|
-Make note that it is the inverse of the usual subtyping on
|
|
|
+Note that it is the opposite of the usual subtyping on
|
|
|
integers (in C, we can safely use an `int` where a `long`
|
|
|
is expected). A long value cannot be used in word context.
|
|
|
The rationale is that a word can be signed or unsigned, so
|
|
|
-extending it to a long can be done in two ways, either
|
|
|
+extending it to a long could be done in two ways, either
|
|
|
by zero-extension, or by sign-extension.
|
|
|
|
|
|
- 3. Constants
|
|
|
@@ -184,9 +184,9 @@ operand of the subtraction is a word (32-bit) context.
|
|
|
|
|
|
Because specifying floating-point constants by their bits
|
|
|
makes the code less readable, syntactic sugar is provided
|
|
|
-to express them. Standard scientific notation is used with
|
|
|
-a prefix of `s_` for single and `d_` for double-precision
|
|
|
-numbers. Once again, the following example defines twice
|
|
|
+to express them. Standard scientific notation is prefixed
|
|
|
+with `s_` and `d_` for single- and double-precision numbers,
|
|
|
+respectively. Once again, the following example defines twice
|
|
|
the same double-precision constant.
|
|
|
|
|
|
%x =d add d_0, d_-1
|
|
|
@@ -200,7 +200,7 @@ constants by the linker.
|
|
|
----------------
|
|
|
|
|
|
Definitions are the essential components of an IL file.
|
|
|
-They can define three types of objects: Aggregate types,
|
|
|
+They can define three types of objects: aggregate types,
|
|
|
data, and functions. Aggregate types are never exported
|
|
|
and do not compile to any code. Data and function
|
|
|
definitions have file scope and are mutually recursive
|
|
|
@@ -221,14 +221,14 @@ using the `export` keyword.
|
|
|
'type' :IDENT '=' 'align' NUMBER '{' NUMBER '}'
|
|
|
|
|
|
Aggregate type definitions start with the `type` keyword.
|
|
|
-They have file scope, but types must be defined before their
|
|
|
-first use. The inner structure of a type is expressed by a
|
|
|
+They have file scope, but types must be defined before being
|
|
|
+referenced. The inner structure of a type is expressed by a
|
|
|
comma-separated list of <@ Simple Types> enclosed in curly
|
|
|
braces.
|
|
|
|
|
|
type :fourfloats = { s, s, d, d }
|
|
|
|
|
|
-For ease of generation, a trailing comma is tolerated by
|
|
|
+For ease of IL generation, a trailing comma is tolerated by
|
|
|
the parser. In case many items of the same type are
|
|
|
sequenced (like in a C array), the shorter array syntax
|
|
|
can be used.
|
|
|
@@ -243,7 +243,7 @@ explicitly specified by the programmer.
|
|
|
|
|
|
Opaque types are used when the inner structure of an
|
|
|
aggregate cannot be specified; the alignment for opaque
|
|
|
-types is mandatory. They are defined by simply enclosing
|
|
|
+types is mandatory. They are defined simply by enclosing
|
|
|
their size between curly braces.
|
|
|
|
|
|
type :opaque = align 16 { 32 }
|
|
|
@@ -264,7 +264,7 @@ their size between curly braces.
|
|
|
| '"' ... '"' # String
|
|
|
| CONST # Constant
|
|
|
|
|
|
-Data definitions define objects that will be emitted in the
|
|
|
+Data definitions describe objects that will be emitted in the
|
|
|
compiled file. They can be local to the file or exported
|
|
|
with global visibility to the whole program.
|
|
|
|
|
|
@@ -282,11 +282,11 @@ initialize multiple fields of the same size.
|
|
|
The members of a struct will be packed. This means that
|
|
|
padding has to be emitted by the frontend when necessary.
|
|
|
Alignment of the whole data objects can be manually specified,
|
|
|
-and when no alignment is provided, the maximum alignment of
|
|
|
+and when no alignment is provided, the maximum alignment from
|
|
|
the platform is used.
|
|
|
|
|
|
When the `z` letter is used the number following indicates
|
|
|
-the size of the field, the contents of the field are zero
|
|
|
+the size of the field; the contents of the field are zero
|
|
|
initialized. It can be used to add padding between fields
|
|
|
or zero-initialize big arrays.
|
|
|
|
|
|
@@ -325,19 +325,18 @@ Here are various examples of data definitions.
|
|
|
Function definitions contain the actual code to emit in
|
|
|
the compiled file. They define a global symbol that
|
|
|
contains a pointer to the function code. This pointer
|
|
|
-can be used in call instructions or stored in memory.
|
|
|
+can be used in `call` instructions or stored in memory.
|
|
|
|
|
|
The type given right before the function name is the
|
|
|
return type of the function. All return values of this
|
|
|
-function must have the return type. If the return
|
|
|
+function must have this return type. If the return
|
|
|
type is missing, the function cannot return any value.
|
|
|
|
|
|
The parameter list is a comma separated list of
|
|
|
temporary names prefixed by types. The types are used
|
|
|
to correctly implement C compatibility. When an argument
|
|
|
-has an aggregate type, is is set on entry of the
|
|
|
-function to a pointer to the aggregate passed by the
|
|
|
-caller. In the example below, we have to use a load
|
|
|
+has an aggregate type, a pointer to the aggregate is passed
|
|
|
+by the caller. In the example below, we have to use a load
|
|
|
instruction to get the value of the first (and only)
|
|
|
member of the struct.
|
|
|
|
|
|
@@ -350,7 +349,7 @@ member of the struct.
|
|
|
}
|
|
|
|
|
|
If the parameter list ends with `...`, the function is
|
|
|
-a variadic function: It can accept a variable number of
|
|
|
+a variadic function: it can accept a variable number of
|
|
|
arguments. To access the extra arguments provided by
|
|
|
the caller, use the `vastart` and `vaarg` instructions
|
|
|
described in the <@ Variadic > section.
|
|
|
@@ -375,10 +374,10 @@ very good compatibility with C. The <@ Call > section
|
|
|
explains how to pass an environment parameter.
|
|
|
|
|
|
Since global symbols are defined mutually recursive,
|
|
|
-there is no need for function declarations: A function
|
|
|
+there is no need for function declarations: a function
|
|
|
can be referenced before its definition.
|
|
|
Similarly, functions from other modules can be used
|
|
|
-without previous declarations. All the type information
|
|
|
+without previous declaration. All the type information
|
|
|
is provided in the call instructions.
|
|
|
|
|
|
The syntax and semantics for the body of functions
|
|
|
@@ -389,8 +388,8 @@ are described in the <@ Control > section.
|
|
|
|
|
|
The IL represents programs as textual transcriptions of
|
|
|
control flow graphs. The control flow is serialized as
|
|
|
-a sequence of blocks of straight-line code and connected
|
|
|
-using jump instructions.
|
|
|
+a sequence of blocks of straight-line code which are
|
|
|
+connected using jump instructions.
|
|
|
|
|
|
~ Blocks
|
|
|
~~~~~~~~
|
|
|
@@ -406,12 +405,12 @@ All blocks have a name that is specified by a label at
|
|
|
their beginning. Then follows a sequence of instructions
|
|
|
that have "fall-through" flow. Finally one jump terminates
|
|
|
the block. The jump can either transfer control to another
|
|
|
-block of the same function or return, they are described
|
|
|
+block of the same function or return; they are described
|
|
|
further below.
|
|
|
|
|
|
The first block in a function must not be the target of
|
|
|
-any jump in the program. If this need is encountered,
|
|
|
-the frontend can always insert an empty prelude block
|
|
|
+any jump in the program. If this is really needed,
|
|
|
+the frontend could insert an empty prelude block
|
|
|
at the beginning of the function.
|
|
|
|
|
|
When one block jumps to the next block in the IL file,
|
|
|
@@ -453,7 +452,7 @@ the following list.
|
|
|
|
|
|
When its word argument is non-zero, it jumps to its
|
|
|
first label argument; otherwise it jumps to the other
|
|
|
- label. The argument must be of word type, because of
|
|
|
+ label. The argument must be of word type; because of
|
|
|
subtyping a long argument can be passed, but only its
|
|
|
least significant 32 bits will be compared to 0.
|
|
|
|
|
|
@@ -461,7 +460,7 @@ the following list.
|
|
|
|
|
|
Terminates the execution of the current function,
|
|
|
optionally returning a value to the caller. The value
|
|
|
- returned must have the type given in the function
|
|
|
+ returned must be of the type given in the function
|
|
|
prototype. If the function prototype does not specify
|
|
|
a return type, no return value can be used.
|
|
|
|
|
|
@@ -498,12 +497,12 @@ This is made explicit by the instruction suffix.
|
|
|
The types of instructions are described below using a short
|
|
|
type string. A type string specifies all the valid return
|
|
|
types an instruction can have, its arity, and the type of
|
|
|
-its arguments in function of its return type.
|
|
|
+its arguments depending on its return type.
|
|
|
|
|
|
Type strings begin with acceptable return types, then
|
|
|
follows, in parentheses, the possible types for the arguments.
|
|
|
-If the n-th return type of the type string is used for an
|
|
|
-instruction, the arguments must use the n-th type listed for
|
|
|
+If the N-th return type of the type string is used for an
|
|
|
+instruction, the arguments must use the N-th type listed for
|
|
|
them in the type string. When an instruction does not have a
|
|
|
return type, the type string only contains the types of the
|
|
|
arguments.
|
|
|
@@ -513,7 +512,7 @@ The following abbreviations are used.
|
|
|
* `T` stands for `wlsd`
|
|
|
* `I` stands for `wl`
|
|
|
* `F` stands for `sd`
|
|
|
- * `m` stands for the type of pointers on the target, on
|
|
|
+ * `m` stands for the type of pointers on the target; on
|
|
|
64-bit architectures it is the same as `l`
|
|
|
|
|
|
For example, consider the type string `wl(F)`, it mentions
|
|
|
@@ -540,7 +539,7 @@ towards zero.
|
|
|
The signed and unsigned remainder operations are available
|
|
|
as `rem` and `urem`. The sign of the remainder is the same
|
|
|
as the one of the dividend. Its magnitude is smaller than
|
|
|
-the divisor's. These two instructions and `udiv` are only
|
|
|
+the divisor's. These two instructions and `udiv` are only
|
|
|
available with integer arguments and result.
|
|
|
|
|
|
Bitwise OR, AND, and XOR operations are available for both
|
|
|
@@ -548,8 +547,8 @@ integer types. Logical operations of typical programming
|
|
|
languages can be implemented using <@ Comparisons > and
|
|
|
<@ Jumps >.
|
|
|
|
|
|
-Shift instructions `sar`, `shr`, and `shl` shift right or
|
|
|
-left their first operand by the amount in the second
|
|
|
+Shift instructions `sar`, `shr`, and `shl` shift right or
|
|
|
+left their first operand by the amount from the second
|
|
|
operand. The shifting amount is taken modulo the size of
|
|
|
the result type. Shifting right can either preserve the
|
|
|
sign of the value (using `sar`), or fill the newly freed
|
|
|
@@ -591,8 +590,8 @@ towards zero.
|
|
|
* `loadsb`, `loadub` -- `I(mm)`
|
|
|
|
|
|
For types smaller than long, two variants of the load
|
|
|
- instruction is available: one will sign extend the value
|
|
|
- loaded, while the other will zero extend it. Remark that
|
|
|
+ instruction are available: one will sign extend the loaded
|
|
|
+ value, while the other will zero extend it. Note that
|
|
|
all loads smaller than long can load to either a long or
|
|
|
a word.
|
|
|
|
|
|
@@ -635,9 +634,9 @@ instructions. Pointers are stored in long temporaries.
|
|
|
~~~~~~~~~~~~~
|
|
|
|
|
|
Comparison instructions return an integer value (either a word
|
|
|
-or a long), and compare values of arbitrary types. The value
|
|
|
-returned is 1 if the two operands satisfy the comparison
|
|
|
-relation, and 0 otherwise. The names of comparisons respect
|
|
|
+or a long), and compare values of arbitrary types. The returned
|
|
|
+value is 1 if the two operands satisfy the comparison
|
|
|
+relation, or 0 otherwise. The names of comparisons respect
|
|
|
a standard naming scheme in three parts.
|
|
|
|
|
|
1. All comparisons start with the letter `c`.
|
|
|
@@ -676,7 +675,7 @@ a standard naming scheme in three parts.
|
|
|
|
|
|
For example, `cod` (`I(dd,dd)`) compares two double-precision
|
|
|
floating point numbers and returns 1 if the two floating points
|
|
|
-are not NaNs, and 0 otherwise. The `csltw` (`I(ww,ww)`)
|
|
|
+are not NaNs, or 0 otherwise. The `csltw` (`I(ww,ww)`)
|
|
|
instruction compares two words representing signed numbers and
|
|
|
returns 1 when the first argument is smaller than the second one.
|
|
|
|
|
|
@@ -727,7 +726,7 @@ instruction to lower the precision of an integer temporary.
|
|
|
~~~~~~~~~~~~~~~
|
|
|
|
|
|
The `cast` and `copy` instructions return the bits of their
|
|
|
-argument verbatim. A `cast` will however change an integer
|
|
|
+argument verbatim. However, a `cast` will change an integer
|
|
|
into a floating point of the same width and vice versa.
|
|
|
|
|
|
* `cast` -- `wlsd(sdwl)`
|
|
|
@@ -755,7 +754,7 @@ single-precision floating point number `%f` into `%rs`.
|
|
|
|
|
|
ABITY := BASETY | :IDENT
|
|
|
|
|
|
-The call instruction is special in many ways. It is not
|
|
|
+The call instruction is special in several ways. It is not
|
|
|
a three-address instruction and requires the type of all
|
|
|
its arguments to be given. Also, the return type can be
|
|
|
either a base type or an aggregate type. These specifics
|
|
|
@@ -801,7 +800,7 @@ is essentially effectful: calling it twice in a row will
|
|
|
return two consecutive arguments from the argument list.
|
|
|
|
|
|
Both instructions take a pointer to a variable argument
|
|
|
-list as only argument. The size and alignment of variable
|
|
|
+list as their sole argument. The size and alignment of variable
|
|
|
argument lists depend on the target used. However, it
|
|
|
is possible to conservatively use the maximum size and
|
|
|
alignment required by all the targets.
|
|
|
@@ -890,7 +889,7 @@ translate it in SSA form is to insert a phi instruction.
|
|
|
|
|
|
Phi instructions return one of their arguments depending
|
|
|
on where the control came from. In the example, `%y` is
|
|
|
-set to 1 if the `@ift` branch is taken, and it is set to
|
|
|
+set to 1 if the `@ift` branch is taken, or it is set to
|
|
|
2 otherwise.
|
|
|
|
|
|
An important remark about phi instructions is that QBE
|