22 years ago · ce54b0aec1
--- a/docs/mini-porting.txt
+++ b/docs/mini-porting.txt
@@ -1,349 +1,424 @@
 
				-			Mono JIT porting guide.
			
 
				-		Paolo Molaro ([email protected])
			
 
				+		       Mono JIT porting guide.
			
 
				+		   Paolo Molaro ([email protected])
			
 
				 
			
 
				 * Introduction
			
 
				 
			
 
				-This documents describes the process of porting the mono JIT
			
 
				-to a new CPU architecture. The new mono JIT has been designed 
			
 
				-to make porting easier though at the same time enable the port
			
 
				-to take full advantage from the new architecture features and 
			
 
				-instructions. Knowledge of the mini architecture (described in the
			
 
				-mini-doc.txt file) is a requirement for understanding this guide,
			
 
				-as well as an earlier document about porting the mono interpreter
			
 
				-(available on the web site).
			
 
				-
			
 
				-There are six main areas that a port needs to implement to
			
 
				-have a fully-functional JIT for a given architecture:
			
 
				-
			
 
				-	1) instruction selection
			
 
				-	2) native code emission
			
 
				-	3) call conventions and register allocation
			
 
				-	4) method trampolines
			
 
				-	5) exception handling
			
 
				-	6) minor helper methods
			
 
				-
			
 
				-To take advantage of some not-so-common processor features (for example
			
 
				-conditional execution of instructions as may be found on ARM or ia64), it may
			
 
				-be needed to develop an high-level optimization, but doing so is not a 
			
 
				-requirement for getting the JIT to work.
			
 
				-
			
 
				-We'll see in more details each of the steps required, note, though,
			
 
				-that a new port may just as well start from a cut&paste of an existing
			
 
				-port to a similar architecture (for example from x86 to amd64, or from
			
 
				-powerpc to sparc).
			
 
				-The architecture specific code is split from the rest of the JIT,
			
 
				-for example the x86 specific code and data is all included in the 
			
 
				-following files in the distribution:
			
 
				-
			
 
				-	mini-x86.h mini-x86.c
			
 
				-	inssel-x86.brg
			
 
				-	cpu-pentium.md
			
 
				-	tramp-x86.c 
			
 
				-	exceptions-x86.c 
			
 
				-
			
 
				-I suggest a similar split for other architectures as well.
			
 
				-
			
 
				-Note that this document is still incomplete: some sections are only
			
 
				-sketched and some are missing, but the important info to get a port 
			
 
				-going is already described.
			
 
				+	This document describes the process of porting the mono JIT
			
 
				+	to a new CPU architecture. The new mono JIT has been designed
			
 
				+	to make porting easier though at the same time enable the port
			
 
				+	to take full advantage of the new architecture's features and
			
 
				+	instructions. Knowledge of the mini architecture (described in
			
 
				+	the mini-doc.txt file) is a requirement for understanding this
			
 
				+	guide, as well as an earlier document about porting the mono
			
 
				+	interpreter (available on the web site).
			
 
				+	
			
 
				+	There are six main areas that a port needs to implement to
			
 
				+	have a fully-functional JIT for a given architecture:
			
 
				+	
			
 
				+		1) instruction selection
			
 
				+		2) native code emission
			
 
				+		3) call conventions and register allocation
			
 
				+		4) method trampolines
			
 
				+		5) exception handling
			
 
				+		6) minor helper methods
			
 
				+	
			
 
				+	To take advantage of some not-so-common processor features
			
 
				+	(for example conditional execution of instructions as may be
			
 
				+	found on ARM or ia64), it may be necessary to develop a
			
 
				+	high-level optimization, but doing so is not a requirement for
			
 
				+	getting the JIT to work.
			
 
				+	
			
 
				+	We'll see each of the required steps in more detail; note,
			
 
				+	though, that a new port may just as well start from a
			
 
				+	cut&paste of an existing port to a similar architecture (for
			
 
				+	example from x86 to amd64, or from powerpc to sparc).
			
 
				+	
			
 
				+	The architecture-specific code is split from the rest of the
			
 
				+	JIT; for example, the x86-specific code and data are all
			
 
				+	included in the following files in the distribution:
			
 
				+	
			
 
				+		mini-x86.h mini-x86.c
			
 
				+		inssel-x86.brg
			
 
				+		cpu-pentium.md
			
 
				+		tramp-x86.c 
			
 
				+		exceptions-x86.c 
			
 
				+	
			
 
				+	I suggest a similar split for other architectures as well.
			
 
				+	
			
 
				+	Note that this document is still incomplete: some sections are
			
 
				+	only sketched and some are missing, but the important info to
			
 
				+	get a port going is already described.
			
 
				 
			
 
				 
			
 
				 * Architecture-specific instructions and instruction selection.
			
 
				 
			
 
				-The JIT already provides a set of instructions that can be easily
			
 
				-mapped to a great variety of different processor instructions.
			
 
				-Sometimes it may be necessary or advisable to add a new instruction
			
 
				-that represent more closely an instruction in the architecture.
			
 
				-Note that a mini instruction can be used to represent also a short
			
 
				-sequence of CPU low-level instructions, but note that each
			
 
				-instruction represents the minimum amount of code the instruction 
			
 
				-scheduler will handle (i.e., the scheduler won't schedule the instructions
			
 
				-that compose the low-level sequence as individual instructions, but just
			
 
				-the whole sequence, as an indivisible block).
			
 
				-New instructions are created by adding a line in the mini-ops.h file,
			
 
				-assigning an opcode and a name. To specify the input and output for 
			
 
				-the instruction, there are two different places, depending on the context 
			
 
				-in which the instruction gets used.
			
 
				-If the instruction is used in the tree representation, the input and output
			
 
				-types are defined by the BURG rules in the *.brg files (the usual 
			
 
				-non-terminals are 'reg' to represent a normal register, 'lreg' to 
			
 
				-represent a register or two that hold a 64 bit value, freg for a
			
 
				-floating point register).
			
 
				-If an instruction is used as a low-level CPU instruction, the info
			
 
				-is specified in a machine description file. The description file is
			
 
				-processed by the genmdesc program to provide a data structure that
			
 
				-can be easily used from C code to query the needed info about the 
			
 
				-instruction.
			
 
				-As an example, let's consider the add instruction for both x86 and ppc:
			
 
				-
			
 
				-x86 version:
			
 
				-	add: dest:i src1:i src2:i len:2 clob:1
			
 
				-ppc version:
			
 
				-	add: dest:i src1:i src2:i len:4
			
 
				-
			
 
				-Note that the instruction takes two input integer registers on both CPU,
			
 
				-but on x86 the first source register is clobbered (clob:1) and the length
			
 
				-in bytes of the instruction differs.
			
 
				-Note that integer adds and floating point adds use different opcodes, unlike
			
 
				-the IL language (64 bit add is done with two instructions on 32 bit architectures,
			
 
				-using a add that sets the carry and an add with carry).
			
 
				-A specific CPU port may assign any meaning to the clob field for an instruction
			
 
				-since the value will be processed in an arch-specific file anyway.
			
 
				-See the top of the existing cpu-pentium.md file for more info on other fields:
			
 
				-the info may or may not be applicable to a different CPU, in this latter case
			
 
				-the info can be ignored.
			
 
				-The code in mini.c together with the BURG rules in inssel.brg, inssel-float.brg
			
 
				-and inssel-long32.brg provides general purpose mappings from the tree representation 
			
 
				-to a set of instructions that should be easily implemented in any architecture.
			
 
				-To allow for additional arch-specific functionality, an arch-specific BURG file
			
 
				-can be used: in this file arch-specific instructions can be selected that provide
			
 
				-better performance than the general instructions or that provide functionality
			
 
				-that is needed by the JIT but that cannot be expressed in a general enough way.
			
 
				-As an example, x86 has the special instruction "push" to make it easier to
			
 
				-implement the default call convention (passing arguments on the stack): almost
			
 
				-all the other architectures don't have such an instruction (and don't need it anyway),
			
 
				-so we added a special rule in the inssel-x86.brg file for it.
			
 
				-
			
 
				-So, one of the first things needed in a port is to write a cpu-$(arch).md machine
			
 
				-description file and fill it with the needed info. As a start, only a few
			
 
				-instructions can be specified, like the ones required to do simple integer
			
 
				-operations. The default rules of the instruction selector will emit the common
			
 
				-instructions and so we're ready to go for the next step in porting the JIT.
			
 
				-
			
 
				+	The JIT already provides a set of instructions that can be
			
 
				+	easily mapped to a great variety of different processor
			
 
				+	instructions.  Sometimes it may be necessary or advisable to
			
 
				+	add a new instruction that more closely represents an
			
 
				+	instruction in the architecture.  Note that a mini instruction
			
 
				+	can also be used to represent a short sequence of CPU
			
 
				+	low-level instructions, but note that each instruction
			
 
				+	represents the minimum amount of code the instruction
			
 
				+	scheduler will handle (i.e., the scheduler won't schedule the
			
 
				+	instructions that compose the low-level sequence as individual
			
 
				+	instructions, but just the whole sequence, as an indivisible
			
 
				+	block).
			
 
				+
			
 
				+	New instructions are created by adding a line in the
			
 
				+	mini-ops.h file, assigning an opcode and a name. To specify
			
 
				+	the input and output for the instruction, there are two
			
 
				+	different places, depending on the context in which the
			
 
				+	instruction gets used.
			
 
				+
			
 
				+	If the instruction is used in the tree representation, the
			
 
				+	input and output types are defined by the BURG rules in the
			
 
				+	*.brg files (the usual non-terminals are 'reg' to represent a
			
 
				+	normal register, 'lreg' to represent a register or two that
			
 
				+	hold a 64-bit value, 'freg' for a floating point register).
			
 
				+
			
 
				+	If an instruction is used as a low-level CPU instruction, the
			
 
				+	info is specified in a machine description file. The
			
 
				+	description file is processed by the genmdesc program to
			
 
				+	provide a data structure that can be easily used from C code
			
 
				+	to query the needed info about the instruction.
			
 
				+
			
 
				+	As an example, let's consider the add instruction for both x86
			
 
				+	and ppc:
			
 
				+	
			
 
				+	x86 version:
			
 
				+		add: dest:i src1:i src2:i len:2 clob:1
			
 
				+	ppc version:
			
 
				+		add: dest:i src1:i src2:i len:4
			
 
				+	
			
 
				+	Note that the instruction takes two input integer registers on
			
 
				+	both CPUs, but on x86 the first source register is clobbered
			
 
				+	(clob:1) and the length in bytes of the instruction differs.
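
	To make the field layout of such a line concrete, here is a minimal
	sketch in C that parses one description line into a structure. The
	InsnDesc type and the parse_insn_desc () function are hypothetical
	names invented for this illustration; they are not the data
	structure or code that genmdesc actually produces.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one machine description entry. */
typedef struct {
    char name[32];
    char dest;   /* register class of the result, 0 if absent */
    char src1;   /* register class of the first source, 0 if absent */
    char src2;   /* register class of the second source, 0 if absent */
    int  len;    /* length in bytes of the emitted code */
    char clob;   /* port-defined clobber marker, 0 if absent */
} InsnDesc;

/* Parse "name: key:value key:value ..." into desc.
 * Returns 0 on success, -1 on a malformed line. */
static int parse_insn_desc(const char *line, InsnDesc *desc)
{
    char buf[128];
    char *tok, *colon;

    memset(desc, 0, sizeof *desc);
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);

    /* The first token is the instruction name followed by ':'. */
    tok = strtok(buf, " \t");
    if (!tok)
        return -1;
    colon = strchr(tok, ':');
    if (!colon || colon == tok || colon[1] != '\0')
        return -1;
    *colon = '\0';
    if (strlen(tok) >= sizeof desc->name)
        return -1;
    strcpy(desc->name, tok);

    /* Every remaining token has the form key:value. */
    while ((tok = strtok(NULL, " \t")) != NULL) {
        const char *val;

        colon = strchr(tok, ':');
        if (!colon || colon[1] == '\0')
            return -1;
        *colon = '\0';
        val = colon + 1;
        if (strcmp(tok, "dest") == 0)      desc->dest = val[0];
        else if (strcmp(tok, "src1") == 0) desc->src1 = val[0];
        else if (strcmp(tok, "src2") == 0) desc->src2 = val[0];
        else if (strcmp(tok, "clob") == 0) desc->clob = val[0];
        else if (strcmp(tok, "len") == 0)  desc->len = atoi(val);
        else
            return -1;   /* field not known to this sketch */
    }
    return 0;
}
```

	Parsing the x86 line yields len 2 and clob '1', while the ppc line
	yields len 4 and leaves clob at 0, which is the difference the text
	points out.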
			
 
				+
			
 
				+	Note that integer adds and floating point adds use different
			
 
				+	opcodes, unlike the IL language (64 bit add is done with two
			
 
				+	instructions on 32-bit architectures, using an add that sets
			
 
				+	the carry and an add with carry).
			
 
				+
			
 
				+	A specific CPU port may assign any meaning to the clob field
			
 
				+	for an instruction since the value will be processed in an
			
 
				+	arch-specific file anyway.
			
 
				+
			
 
				+	See the top of the existing cpu-pentium.md file for more info
			
 
				+	on other fields: the info may or may not be applicable to a
			
 
				+	different CPU; in the latter case the info can be ignored.
			
 
				+
			
 
				+	The code in mini.c together with the BURG rules in inssel.brg,
			
 
				+	inssel-float.brg and inssel-long32.brg provides general
			
 
				+	purpose mappings from the tree representation to a set of
			
 
				+	instructions that should be easily implemented in any
			
 
				+	architecture.  To allow for additional arch-specific
			
 
				+	functionality, an arch-specific BURG file can be used: in this
			
 
				+	file arch-specific instructions can be selected that provide
			
 
				+	better performance than the general instructions or that
			
 
				+	provide functionality that is needed by the JIT but that
			
 
				+	cannot be expressed in a general enough way.
			
 
				+	
			
 
				+	As an example, x86 has the special instruction "push" to make
			
 
				+	it easier to implement the default call convention (passing
			
 
				+	arguments on the stack): almost all the other architectures
			
 
				+	don't have such an instruction (and don't need it anyway), so
			
 
				+	we added a special rule in the inssel-x86.brg file for it.
			
 
				+	
			
 
				+	So, one of the first things needed in a port is to write a
			
 
				+	cpu-$(arch).md machine description file and fill it with the
			
 
				+	needed info. As a start, only a few instructions can be
			
 
				+	specified, like the ones required to do simple integer
			
 
				+	operations. The default rules of the instruction selector will
			
 
				+	emit the common instructions and so we're ready to go for the
			
 
				+	next step in porting the JIT.
			
 
				+	
			
 
				 
			
 
				 * Native code emission
			
 
				 
			
 
				-Since the first step in porting mono to a new CPU is to port the interpreter,
			
 
				-there should be already a file that allows the emission of binary native code
			
 
				-in a buffer for the architecture. This file should be placed in the 
			
 
				-	mono/arch/$(arch)/
			
 
				-directory.
			
 
				-
			
 
				-The bulk of the code emission happens in the mini-$(arch).c file, in a function
			
 
				-called mono_arch_output_basic_block (). This function takes a basic block, walks the
			
 
				-list of instructions in the block and emits the binary code for each.
			
 
				-Optionally a peephole optimization pass is done on the basic block, but this can be
			
 
				-left for later, when the port actually works.
			
 
				-This function is very simple, there is just a big switch on the instruction opcode
			
 
				-and in the corresponding case the functions or macros to emit the binary native code
			
 
				-are used. Note that in this function the lengths of the instructions are used to
			
 
				-determine if the buffer for the code needs enlarging.
			
 
				-
			
 
				-To complete the code emission for a method, a few other functions need
			
 
				-implementing as well:
			
 
				-
			
 
				-	mono_arch_emit_prolog ()
			
 
				-	mono_arch_emit_epilog ()
			
 
				-	mono_arch_patch_code ()
			
 
				-
			
 
				-mono_arch_emit_prolog () will emit the code to setup the stack frame for a method,
			
 
				-optionally call the callbacks used in profiling and tracing, and move the
			
 
				-arguments to their home location (in a caller-save register if the variable was 
			
 
				-allocated to one, or in a stack location if the argument was passed in a volatile 
			
 
				-register and wasn't allocated a non-volatile one). caller-save registers used by the
			
 
				-function are saved in the prolog as well.
			
 
				-
			
 
				-mono_arch_emit_epilog () will emit the code needed to return from the function,
			
 
				-optionally calling the profiling or tracing callbacks. At this point the basic blocks
			
 
				-or the code that was moved out of the normal flow for the function can be emitted 
			
 
				-as well (this is usually done to provide better info for the static branch predictor).
			
 
				-In the epilog, caller-save registers are restored if they were used.
			
 
				-Note that, to help exception handling and stack unwinding, when there is a transition
			
 
				-from managed to unmanaged code, some special processing needs to be done (basically,
			
 
				-saving all the registers and setting up the links in the Last Managed Frame
			
 
				-structure).
			
 
				-
			
 
				-When the epilog has been emitted, the upper level code arranges for the buffer of 
			
 
				-memory that contains the native code to be copied in an area of executable memory
			
 
				-and at this point, instructions that use relative addressing need to be patched
			
 
				-to have the right offsets: this work is done by mono_arch_patch_code ().
			
 
				+	Since the first step in porting mono to a new CPU is to port
			
 
				+	the interpreter, there should already be a file that allows
			
 
				+	the emission of binary native code in a buffer for the
			
 
				+	architecture. This file should be placed in the
			
 
				+
			
 
				+		mono/arch/$(arch)/
			
 
				+
			
 
				+	directory.
			
 
				+
			
 
				+	The bulk of the code emission happens in the mini-$(arch).c
			
 
				+	file, in a function called mono_arch_output_basic_block
			
 
				+	(). This function takes a basic block, walks the list of
			
 
				+	instructions in the block and emits the binary code for each.
			
 
				+	Optionally a peephole optimization pass is done on the basic
			
 
				+	block, but this can be left for later, when the port actually
			
 
				+	works.
			
 
				+
			
 
				+	This function is very simple: there is just a big switch on
			
 
				+	the instruction opcode and in the corresponding case the
			
 
				+	functions or macros to emit the binary native code are
			
 
				+	used. Note that in this function the lengths of the
			
 
				+	instructions are used to determine if the buffer for the code
			
 
				+	needs enlarging.
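
	The shape of such a function can be sketched as follows. This is
	an illustration only: the Opcode, Insn and CodeBuf types, the
	max_len table and the output_basic_block () name are assumptions
	made for the example, and the three x86 encodings are written out
	by hand instead of using the emission macros of a real port.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum { OP_NOP, OP_ADD_REG, OP_RET } Opcode;   /* hypothetical opcodes */

typedef struct Insn {
    Opcode opcode;
    int dreg, sreg;          /* hardware register numbers, 0..7 */
    struct Insn *next;
} Insn;

typedef struct {
    uint8_t *code;           /* growable buffer of native code */
    size_t len, size;
} CodeBuf;

/* Length in bytes of each opcode, as the "len" field would record it. */
static const size_t max_len[] = { 1, 2, 1 };

/* Enlarge the buffer if fewer than "need" bytes are left. */
static int ensure_space(CodeBuf *b, size_t need)
{
    size_t nsize;
    uint8_t *p;

    if (b->len + need <= b->size)
        return 0;
    nsize = b->size ? b->size * 2 : 16;
    while (nsize < b->len + need)
        nsize *= 2;
    p = realloc(b->code, nsize);
    if (!p)
        return -1;
    b->code = p;
    b->size = nsize;
    return 0;
}

/* Walk the instruction list and emit the bytes for each opcode. */
static int output_basic_block(CodeBuf *b, const Insn *ins)
{
    for (; ins; ins = ins->next) {
        if (ensure_space(b, max_len[ins->opcode]) != 0)
            return -1;
        switch (ins->opcode) {
        case OP_NOP:
            b->code[b->len++] = 0x90;
            break;
        case OP_ADD_REG:
            /* x86 "add r/m32, r32": opcode 0x01, ModRM with mod=11 */
            b->code[b->len++] = 0x01;
            b->code[b->len++] =
                (uint8_t)(0xC0 | ((ins->sreg & 7) << 3) | (ins->dreg & 7));
            break;
        case OP_RET:
            b->code[b->len++] = 0xC3;
            break;
        }
    }
    return 0;
}
```

	The length table is consulted before each case runs, so the switch
	itself never has to check for room in the buffer.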
			
 
				+	
			
 
				+	To complete the code emission for a method, a few other
			
 
				+	functions need implementing as well:
			
 
				+	
			
 
				+		mono_arch_emit_prolog ()
			
 
				+		mono_arch_emit_epilog ()
			
 
				+		mono_arch_patch_code ()
			
 
				+	
			
 
				+	mono_arch_emit_prolog () will emit the code to set up the stack
			
 
				+	frame for a method, optionally call the callbacks used in
			
 
				+	profiling and tracing, and move the arguments to their home
			
 
				+	location (in a callee-saved register if the variable was
			
 
				+	allocated to one, or in a stack location if the argument was
			
 
				+	passed in a volatile register and wasn't allocated a
			
 
				+	non-volatile one). Callee-saved registers used by the function
			
 
				+	are saved in the prolog as well.
			
 
				+	
			
 
				+	mono_arch_emit_epilog () will emit the code needed to return
			
 
				+	from the function, optionally calling the profiling or tracing
			
 
				+	callbacks. At this point the basic blocks or the code that was
			
 
				+	moved out of the normal flow for the function can be emitted
			
 
				+	as well (this is usually done to provide better info for the
			
 
				+	static branch predictor).  In the epilog, callee-saved
			
 
				+	registers are restored if they were used.
			
 
				+
			
 
				+	Note that, to help exception handling and stack unwinding,
			
 
				+	when there is a transition from managed to unmanaged code,
			
 
				+	some special processing needs to be done (basically, saving
			
 
				+	all the registers and setting up the links in the Last Managed
			
 
				+	Frame structure).
			
 
				+	
			
 
				+	When the epilog has been emitted, the upper level code
			
 
				+	arranges for the buffer of memory that contains the native
			
 
				+	code to be copied in an area of executable memory and at this
			
 
				+	point, instructions that use relative addressing need to be
			
 
				+	patched to have the right offsets: this work is done by
			
 
				+	mono_arch_patch_code ().
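
	As a small illustration of the kind of fix-up involved, the sketch
	below rewrites the 32-bit displacement of an x86-style relative
	call or jump once the final address of the code is known. The
	patch_rel32 () helper is a hypothetical name for this example and
	is not part of the mono sources; a real port handles several kinds
	of patch in its own mono_arch_patch_code ().

```c
#include <stdint.h>

/* Patch a 32-bit relative displacement so that the branch reaches "target".
 * disp_pos points at the 4-byte displacement field inside the instruction.
 * On x86 the displacement is relative to the end of that field.
 * Returns 0 on success, -1 if the target is out of range. */
static int patch_rel32(uint8_t *disp_pos, const uint8_t *target)
{
    intptr_t diff = (intptr_t)target - (intptr_t)(disp_pos + 4);
    uint32_t u;

    if (diff < INT32_MIN || diff > INT32_MAX)
        return -1;
    u = (uint32_t)(int32_t)diff;

    /* Store in little-endian order, whatever the host byte order is. */
    disp_pos[0] = (uint8_t)(u & 0xff);
    disp_pos[1] = (uint8_t)((u >> 8) & 0xff);
    disp_pos[2] = (uint8_t)((u >> 16) & 0xff);
    disp_pos[3] = (uint8_t)((u >> 24) & 0xff);
    return 0;
}
```

	Because the displacement depends on the address of the instruction
	itself, it can only be computed after the code has been copied to
	its final location.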
			
 
				 
			
 
				 
			
 
				 * Call conventions and register allocation
			
 
				 
			
 
				-To account for the differences in the call conventions, a few functions need to
			
 
				-be implemented.
			
 
				-
			
 
				-mono_arch_allocate_vars () assigns to both arguments and local variables
			
 
				-the offset relative to the frame register where they are stored, dead
			
 
				-variables are simply discarded. The total amount of stack needed is calculated.
			
 
				-
			
 
				-mono_arch_call_opcode () is the function that more closely deals with the call
			
 
				-convention on a given system. For each argument to a function call, an instruction
			
 
				-is created that actually puts the argument where needed, be it the stack or a 
			
 
				-specific register. This function can also re-arrange th order of evaluation
			
 
				-when multiple arguments are involved if needed (like, on x86 arguments are pushed
			
 
				-on the stack in reverse order). The function needs to carefully take into accounts
			
 
				-platform specific issues, like how structures are returned as well as the
			
 
				-differences in size and/or alignment of managed and corresponding unmanaged 
			
 
				-structures.
			
 
				-
			
 
				-The other chunk of code that needs to deal with the call convention and other
			
 
				-specifics of a CPU, is the local register allocator, implemented in a function
			
 
				-named mono_arch_local_regalloc (). The local allocator deals with a basic block 
			
 
				-at a time and basically just allocates registers for temporary
			
 
				-values during expression evaluation, spilling and unspilling as necessary.
			
 
				-The local allocator needs to take into account clobbering information, both
			
 
				-during simple instructions and during function calls and it needs to deal
			
 
				-with other architecture-specific weirdnesses, like instructions that take
			
 
				-inputs only in specific registers or output only is some.
			
 
				-Some effort will be put later in moving most of the local register allocator to 
			
 
				-a common file so that the code can be shared more for similar, risc-like CPUs.
			
 
				-The register allocator does a first pass on the instructions in a block, collecting
			
 
				-liveness information and in a backward pass on the same list performs the
			
 
				-actual register allocation, inserting the instructions needed to spill values,
			
 
				-if necessary.
			
 
				-
			
 
				-When this part of code is implemented, some testing can be done with the generated 
			
 
				-code for the new architecture. Most helpful is the use of the --regression
			
 
				-command line switch to run the regression tests (basic.cs, for example).
			
 
				-Note that the JIT will try to initialize the runtime, but it may not be able yet to
			
 
				-compile and execute complex code: commenting most of the code in the mini_init()
			
 
				-function in mini.c is needed to let the JIT just compile the regression tests.
			
 
				-Also, using multiple -v switches on the command line makes the JIT dump an 
			
 
				-increasing amount of information during compilation.
			
 
				-
			
 
				-
			
 
				+	To account for the differences in the call conventions, a few functions need to
			
 
				+	be implemented.
			
 
				+	
			
 
				+	mono_arch_allocate_vars () assigns to both arguments and local
			
 
				+	variables the offset relative to the frame register where they
			
 
				+	are stored; dead variables are simply discarded. The total
			
 
				+	amount of stack needed is calculated.
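
	A minimal sketch of this bookkeeping, assuming a stack that grows
	downward and offsets that are negative relative to the frame
	register. The Var type and the allocate_vars () signature are
	invented for the example and do not match the real
	mono_arch_allocate_vars ().

```c
typedef struct {
    int size;     /* size in bytes */
    int align;    /* required alignment, a power of two */
    int live;     /* 0 if the variable was found to be dead */
    int offset;   /* out: offset from the frame register, 0 if unassigned */
} Var;

/* Assign a frame offset to every live variable and return the total
 * amount of stack needed, rounded up to stack_align (a power of two). */
static int allocate_vars(Var *vars, int nvars, int stack_align)
{
    int used = 0;
    int i;

    for (i = 0; i < nvars; i++) {
        if (!vars[i].live) {
            vars[i].offset = 0;      /* dead variables get no slot */
            continue;
        }
        used += vars[i].size;
        /* round up so that -used is a multiple of the alignment */
        used = (used + vars[i].align - 1) & ~(vars[i].align - 1);
        vars[i].offset = -used;
    }
    return (used + stack_align - 1) & ~(stack_align - 1);
}
```

	For a live 4-byte int, a live 8-byte long, a dead int and a live
	byte, this sketch assigns the offsets -4, -16 and -17 and reports
	32 bytes of stack with a 16-byte stack alignment.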
			
 
				+	
			
 
				+	mono_arch_call_opcode () is the function that most closely
			
 
				+	deals with the call convention on a given system. For each
			
 
				+	argument to a function call, an instruction is created that
			
 
				+	actually puts the argument where needed, be it the stack or a
			
 
				+	specific register. This function can also re-arrange the order
			
 
				+	of evaluation when multiple arguments are involved if needed
			
 
				+	(like, on x86 arguments are pushed on the stack in reverse
			
 
				+	order). The function needs to carefully take into account
			
 
				+	platform specific issues, like how structures are returned as
			
 
				+	well as the differences in size and/or alignment of managed
			
 
				+	and corresponding unmanaged structures.
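
	The ordering decision can be sketched with a toy convention in
	which the first nregs arguments travel in registers and the rest
	are pushed on the stack in reverse order. The ArgMove type and the
	place_args () function are hypothetical and exist only for this
	example; with nregs set to 0 the result is the right-to-left push
	order mentioned above for x86.

```c
typedef enum { ARG_IN_REG, ARG_ON_STACK } ArgKind;

typedef struct {
    int arg;        /* index of the argument in the call */
    ArgKind kind;   /* where the argument is placed */
    int reg;        /* register number, -1 for stack arguments */
} ArgMove;

/* Fill "out" with one move per argument, in the order in which the
 * moves should be emitted. Returns the number of moves written. */
static int place_args(int nargs, int nregs, ArgMove *out)
{
    int k = 0;
    int i;

    /* Stack arguments are pushed last-to-first. */
    for (i = nargs - 1; i >= nregs; i--) {
        out[k].arg = i;
        out[k].kind = ARG_ON_STACK;
        out[k].reg = -1;
        k++;
    }
    /* Register arguments are loaded after the pushes. */
    for (i = 0; i < nargs && i < nregs; i++) {
        out[k].arg = i;
        out[k].kind = ARG_IN_REG;
        out[k].reg = i;
        k++;
    }
    return k;
}
```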
			
 
				+	
			
 
				+	The other chunk of code that needs to deal with the call
			
 
				+	convention and other specifics of a CPU is the local register
			
 
				+	allocator, implemented in a function named
			
 
				+	mono_arch_local_regalloc (). The local allocator deals with a
			
 
				+	basic block at a time and basically just allocates registers
			
 
				+	for temporary values during expression evaluation, spilling
			
 
				+	and unspilling as necessary.
			
 
				+
			
 
				+	The local allocator needs to take into account clobbering
			
 
				+	information, both during simple instructions and during
			
 
				+	function calls and it needs to deal with other
			
 
				+	architecture-specific weirdnesses, like instructions that take
			
 
				+	inputs only in specific registers or produce output only in some.
			
 
				+
			
 
				+	Some effort will be put later in moving most of the local
			
 
				+	register allocator to a common file so that the code can be
			
 
				+	shared more for similar, risc-like CPUs.  The register
			
 
				+	allocator does a first pass on the instructions in a block,
			
 
				+	collecting liveness information and in a backward pass on the
			
 
				+	same list performs the actual register allocation, inserting
			
 
				+	the instructions needed to spill values, if necessary.
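
	The two-pass structure can be sketched as below. This is a
	deliberately reduced illustration and not the allocator used by
	any port: the Ins type, the limits and the function names are
	assumptions, every virtual register is assumed to be defined
	before it is used, and where a real allocator would insert spill
	code the sketch simply reports failure.

```c
#include <assert.h>

#define MAX_VREGS 16   /* hypothetical limits for the sketch */
#define NUM_HREGS 3

typedef struct {
    int dreg, sreg1, sreg2;   /* virtual register numbers, -1 if unused */
} Ins;

/* Take the lowest-numbered free hardware register, or -1 if none is left. */
static int take_reg(unsigned *free_mask)
{
    int r;

    for (r = 0; r < NUM_HREGS; r++) {
        if (*free_mask & (1u << r)) {
            *free_mask &= ~(1u << r);
            return r;
        }
    }
    return -1;
}

/* Assign a hardware register to every virtual register in the block.
 * Returns 0 on success, -1 where a spill would be required. */
static int local_regalloc(const Ins *ins, int n, int assign[MAX_VREGS])
{
    int first_def[MAX_VREGS];
    unsigned free_mask = (1u << NUM_HREGS) - 1;
    int i, k, v;

    for (v = 0; v < MAX_VREGS; v++) {
        first_def[v] = -1;
        assign[v] = -1;
    }

    /* Forward pass: record where each value starts to live. */
    for (i = 0; i < n; i++) {
        v = ins[i].dreg;
        if (v >= 0) {
            assert(v < MAX_VREGS);
            if (first_def[v] < 0)
                first_def[v] = i;
        }
    }

    /* Backward pass: the first time a vreg is seen is its last use. */
    for (i = n - 1; i >= 0; i--) {
        int srcs[2];

        v = ins[i].dreg;
        if (v >= 0) {
            if (assign[v] < 0 && (assign[v] = take_reg(&free_mask)) < 0)
                return -1;
            /* Above its first definition the value is not live. */
            if (first_def[v] == i)
                free_mask |= 1u << assign[v];
        }
        srcs[0] = ins[i].sreg1;
        srcs[1] = ins[i].sreg2;
        for (k = 0; k < 2; k++) {
            v = srcs[k];
            if (v < 0)
                continue;
            assert(v < MAX_VREGS);
            if (assign[v] >= 0)
                continue;
            if ((assign[v] = take_reg(&free_mask)) < 0)
                return -1;
        }
    }
    return 0;
}
```

	Releasing the destination register before the sources are looked
	at lets a source that dies at an instruction share the register of
	the result, which suits two-address instructions such as the x86
	add.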
			
 
				+	
			
 
				+	When this part of code is implemented, some testing can be
			
 
				+	done with the generated code for the new architecture. Most
			
 
				+	helpful is the use of the --regression command line switch to
			
 
				+	run the regression tests (basic.cs, for example).
			
 
				+
			
 
				+	Note that the JIT will try to initialize the runtime, but it
			
 
				+	may not be able yet to compile and execute complex code:
			
 
				+	commenting out most of the code in the mini_init() function in
			
 
				+	mini.c is needed to let the JIT just compile the regression
			
 
				+	tests.  Also, using multiple -v switches on the command line
			
 
				+	makes the JIT dump an increasing amount of information during
			
 
				+	compilation.
			
 
				+	
			
 
				+	
			
 
				 * Method trampolines
			
 
				 
			
 
				-To get better startup performance, the JIT actually compiles a method only when
			
 
				-needed. To achieve this, when a call to a method is compiled, we actually emit a 
			
 
				-call to a magic trampoline. The magic trampoline is a function written in assembly
			
 
				-that invokes the compiler to compile the given method and jumps to the newly compiled
			
 
				-code, ensuring the arguments it received are passed correctly to the actual method.
			
 
				-Before jumping to the new code, though, the magic trampoline takes care of patching
			
 
				-the call site so that next time the call will go directly to the method instead of the
			
 
				-trampoline. How does this all work?
			
 
				-mono_arch_create_jit_trampoline () creates a small function that just
			
 
				-preserves the arguments passed to it and adds an additional argument (the method
			
 
				-to compile) before calling the generic trampoline. This small function is called 
			
 
				-the specific trampoline, because it is method-specific (the method to compile
			
 
				-is hard-code in the instruction stream).
			
 
				-The generic trampoline saves all the arguments that could get clobbered
			
 
				-and calls a C function that will do two things: 
			
 
				-
			
 
				-*) actually call the JIT to compile the method
			
 
				-*) identify the calling code so that it can be patched to call directly
			
 
				-the actual method
			
 
				-
			
 
				-If the 'this' argument to a method is a boxed valuetype that is passed to
			
 
				-a method that expects just a pointer to the data, an additional unboxing
			
 
				-trampoline will need to be inserted as well.
			
 
				-
			
 
				+	To get better startup performance, the JIT actually compiles a
			
 
				+	method only when needed. To achieve this, when a call to a
			
 
				+	method is compiled, we actually emit a call to a magic
			
 
				+	trampoline. The magic trampoline is a function written in
			
 
				+	assembly that invokes the compiler to compile the given method
			
 
				+	and jumps to the newly compiled code, ensuring the arguments
			
 
				+	it received are passed correctly to the actual method.
			
 
				+
			
 
				+	Before jumping to the new code, though, the magic trampoline
			
 
				+	takes care of patching the call site so that next time the
			
 
				+	call will go directly to the method instead of the
			
 
				+	trampoline. How does this all work?
			
 
				+
			
 
				+	mono_arch_create_jit_trampoline () creates a small function
			
 
				+	that just preserves the arguments passed to it and adds an
			
 
				+	additional argument (the method to compile) before calling the
			
 
				+	generic trampoline. This small function is called the specific
			
 
				+	trampoline, because it is method-specific (the method to
			
 
				+	compile is hard-coded in the instruction stream).
			
 
				+
			
 
				+	The generic trampoline saves all the arguments that could get
			
 
				+	clobbered and calls a C function that will do two things:
			
 
				+	
			
 
				+	*) actually call the JIT to compile the method
			
 
				+	*) identify the calling code so that it can be patched to call directly
			
 
				+	the actual method
			
 
				+	
			
 
				+	If the 'this' argument to a method is a boxed valuetype that
			
 
				+	is passed to a method that expects just a pointer to the data,
			
 
				+	an additional unboxing trampoline will need to be inserted as
			
 
				+	well.
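
	The compile-on-first-call and patching behaviour can be simulated
	in portable C with a function pointer standing in for the call
	site. All names below (Method, CallSite, generic_trampoline,
	invoke, square) are hypothetical, and the real trampolines are
	generated machine code, not C functions; the sketch only mirrors
	the control flow described above.

```c
#include <stddef.h>

typedef int (*Code)(int arg);

typedef struct {
    Code body;            /* stands in for the result of compiling the method */
    int compile_count;    /* how many times the "JIT" was invoked */
} Method;

typedef struct {
    Code target;          /* NULL while the call still goes to the trampoline */
    Method *method;       /* plays the role of the hard-coded method argument */
} CallSite;

/* Example method body used by the sketch. */
static int square(int x)
{
    return x * x;
}

/* Compile the method, patch the call site, then continue in the new code. */
static int generic_trampoline(Method *m, CallSite *site, int arg)
{
    Code code;

    m->compile_count++;   /* "compile" the method */
    code = m->body;
    site->target = code;  /* patch: later calls go directly to the method */
    return code(arg);     /* pass the original argument on to the method */
}

/* What executing the call instruction at the call site amounts to. */
static int invoke(CallSite *site, int arg)
{
    if (site->target)
        return site->target(arg);
    return generic_trampoline(site->method, site, arg);
}
```

	The first invoke () on a call site goes through the trampoline and
	compiles the method; every later invoke () on the same site reaches
	the method directly and the compile count stays at one.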
			
 
				+	
			
 
				 
			
 
				 * Exception handling
			
 
				 
			
 
				-Exception handling is likely the most difficult part of the port, as it needs
			
 
				-to deal with unwinding (both managed and unmanaged code) and calling
			
 
				-catch and filter blocks. It also needs to deal with signals, because mono
			
 
				-takes advantage of the MMU in the CPU and of the operation system to
			
 
				-handle dereferences of the NULL pointer. Some of the function needed
			
 
				-to implement the mechanisms are:
			
 
				-
			
 
				-mono_arch_get_throw_exception () returns a function that takes an exception object
			
 
				-and invokes an arch-specific function that will enter the exception processing.
			
 
				-To do so, all the relevant registers need to be saved and passed on.
			
 
				-
			
 
				-mono_arch_handle_exception () this function takes the exception thrown and
			
 
				-a context that describes the state of the CPU at the time the exception was 
			
 
				-thrown. The function needs to implement the exception handling mechanism,
			
 
				-so it makes a search for an handler for the exception and if none is found,
			
 
				-it follows the unhandled exception path (that can print a trace and exit or
			
 
				-just abort the current thread). The difficulty here is to unwind the stack 
			
 
				-correctly, by restoring the register state at each call site in the call chain,
			
 
				-calling finally, filters and handler blocks while doing so.
			
 
				-
			
 
				-As part of exception handling a couple of internal calls need to be implemented
			
 
				-as well.
			
 
				-ves_icall_get_frame_info () returns info about a specific frame.
			
 
				-mono_jit_walk_stack () walks the stack and calls a callback with info for
			
 
				-each frame found.
			
 
				-ves_icall_get_trace () return an array of StackFrame objects.
			
 
				-
			
 
				+	Exception handling is likely the most difficult part of the
			
 
				+	port, as it needs to deal with unwinding (both managed and
			
 
				+	unmanaged code) and calling catch and filter blocks. It also
			
 
				+	needs to deal with signals, because mono takes advantage of
			
 
				+	the MMU in the CPU and of the operating system to handle
			
 
				+	dereferences of the NULL pointer. Some of the functions needed
			
 
				+	to implement the mechanisms are:
			
 
				+	
			
 
				+	mono_arch_get_throw_exception () returns a function that takes
			
 
				+	an exception object and invokes an arch-specific function that
			
 
				+	will enter the exception processing.  To do so, all the
			
 
				+	relevant registers need to be saved and passed on.
			
 
				+	
			
 
				+	mono_arch_handle_exception () takes the
			
 
				+	exception thrown and a context that describes the state of the
			
 
				+	CPU at the time the exception was thrown. The function needs
			
 
				+	to implement the exception handling mechanism, so it makes a
			
 
				+	search for a handler for the exception and if none is found,
			
 
				+	it follows the unhandled exception path (that can print a
			
 
				+	trace and exit or just abort the current thread). The
			
 
				+	difficulty here is to unwind the stack correctly, by restoring
			
 
				+	the register state at each call site in the call chain,
			
 
				+	calling finally, filter and handler blocks while doing so.
			
 
				+	
			
 
				+	As part of exception handling a couple of internal calls need
			
 
				+	to be implemented as well.
			
 
				+
			
 
				+	ves_icall_get_frame_info () returns info about a specific
			
 
				+	frame.
			
 
				+
			
 
				+	mono_jit_walk_stack () walks the stack and calls a callback with info for
			
 
				+	each frame found.
			
 
				+
			
 
				+	ves_icall_get_trace () returns an array of StackFrame objects.
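As a rough illustration of the callback-driven walk described for mono_jit_walk_stack (), the sketch below iterates over a hand-built chain of frames. SketchFrame, SketchWalkFunc and sketch_walk_stack are hypothetical names made up for this example and are not the runtime's actual types or signatures; a real port has to recover each caller frame from saved registers instead of following a ready-made list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame record; a real port derives this from unwound registers. */
typedef struct SketchFrame {
	const char *method_name;
	struct SketchFrame *caller;   /* next frame up the call chain, NULL at the top */
} SketchFrame;

/* Hypothetical callback type: return non-zero to stop the walk early. */
typedef int (*SketchWalkFunc) (const SketchFrame *frame, void *user_data);

/* Visit frames from the innermost one outwards; returns the number visited. */
static int
sketch_walk_stack (const SketchFrame *innermost, SketchWalkFunc func, void *user_data)
{
	int visited = 0;
	const SketchFrame *f;

	for (f = innermost; f != NULL; f = f->caller) {
		visited++;
		if (func (f, user_data))
			break;
	}
	return visited;
}

/* Example callback: counts frames and never stops the walk. */
static int
sketch_count_frames (const SketchFrame *frame, void *user_data)
{
	(void) frame;
	(*(int *) user_data)++;
	return 0;
}
```

Passing the innermost frame of a three-frame chain together with sketch_count_frames visits all three frames; a callback that returns non-zero ends the walk at that frame.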
			
 
				+	
			
 
				 ** Code generation for filter/finally handlers
			
 
				 
			
 
				-Filter and finally handlers are called from 2 different locations:
			
 
				-
			
 
				-       1.) from within the method containing the exception clauses
			
 
				-       2.) from the stack unwinding code
			
 
				-
			
 
				-To make this possible we implement them like subroutines, ending with a
			
 
				-"return" statement. The subroutine does not save the base pointer, because we
			
 
				-need access to the local variables of the enclosing method. Its is possible
			
 
				-that instructions inside those handlers modify the stack pointer, thus we save
			
 
				-the stack pointer at the start of the handler, and restore it at the end. We
			
 
				-have to use a "call" instruction to execute such finally handlers.
			
 
				-
			
 
				-The MIR code for filter and finally handlers looks like:
			
 
				-
			
 
				-    OP_START_HANDLER
			
 
				-    ...
			
 
				-    OP_END_FINALLY | OP_ENDFILTER(reg)
			
 
				-
			
 
				-OP_START_HANDLER: should save the stack pointer somewhere
			
 
				-OP_END_FINALLY: restores the stack pointers and returns.
			
 
				-OP_ENDFILTER (reg): restores the stack pointers and returns the value in "reg".
			
 
				-
			
 
				+	Filter and finally handlers are called from 2 different locations:
			
 
				+	
			
 
				+	       1.) from within the method containing the exception clauses
			
 
				+	       2.) from the stack unwinding code
			
 
				+	
			
 
				+	To make this possible we implement them like subroutines,
			
 
				+	ending with a "return" statement. The subroutine does not save
			
 
				+	the base pointer, because we need access to the local
			
 
				+	variables of the enclosing method. It is possible that
			
 
				+	instructions inside those handlers modify the stack pointer,
			
 
				+	thus we save the stack pointer at the start of the handler,
			
 
				+	and restore it at the end. We have to use a "call" instruction
			
 
				+	to execute such finally handlers.
			
 
				+	
			
 
				+	The MIR code for filter and finally handlers looks like:
			
 
				+	
			
 
				+	    OP_START_HANDLER
			
 
				+	    ...
			
 
				+	    OP_END_FINALLY | OP_ENDFILTER(reg)
			
 
				+	
			
 
				+	OP_START_HANDLER: should save the stack pointer somewhere
			
 
				+	OP_END_FINALLY: restores the stack pointer and returns.
			
 
				+	OP_ENDFILTER (reg): restores the stack pointer and returns the value in "reg".
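The save/restore contract of the three opcodes can be modelled in a few lines of C. This is only a toy model under stated assumptions: ToyFrame, its sp_slot field and the toy_* functions are hypothetical, and a real port emits native instructions that spill the stack pointer to a frame slot rather than calling C functions.

```c
#include <assert.h>

/* Toy machine state; sp_slot stands in for the frame slot used by the handler. */
typedef struct {
	long sp;        /* simulated stack pointer */
	long sp_slot;   /* where the handler entry saves sp */
} ToyFrame;

/* Models OP_START_HANDLER: remember the stack pointer on entry. */
static void
toy_start_handler (ToyFrame *f)
{
	f->sp_slot = f->sp;
}

/* Models OP_END_FINALLY: restore the stack pointer, then "return". */
static void
toy_end_finally (ToyFrame *f)
{
	f->sp = f->sp_slot;
}

/* Models OP_ENDFILTER (reg): restore the stack pointer and return the value. */
static int
toy_endfilter (ToyFrame *f, int reg_value)
{
	f->sp = f->sp_slot;
	return reg_value;
}
```

Whatever the handler body does to sp between the start and end operations, the caller sees the stack pointer it had before the handler ran.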
			
 
				+	
			
 
				 ** Calling finally/filter handlers 
			
 
				 
			
 
				-There is a special opcode to call those handler, its called OP_CALL_HANDLER. It
			
 
				-simple emits a call instruction.
			
 
				-
			
 
				-Its a bit more complex to call handler from outside (in the stack unwinding
			
 
				-code), because we have to restore the whole context of the method first. After that
			
 
				-we simply emit a call instruction to invoke the handler. Its usually
			
 
				-possible to use the same code to call filter and finally handlers (see
			
 
				-arch_get_call_filter).
			
 
				-
			
 
				+	There is a special opcode to call those handlers; it is called
			
 
				+	OP_CALL_HANDLER. It simply emits a call instruction.
			
 
				+	
			
 
				+	It is a bit more complex to call a handler from outside (in the
			
 
				+	stack unwinding code), because we have to restore the whole
			
 
				+	context of the method first. After that we simply emit a call
			
 
				+	instruction to invoke the handler. It is usually possible to use
			
 
				+	the same code to call filter and finally handlers (see
			
 
				+	arch_get_call_filter).
			
 
				+	
			
 
				 ** Calling catch handlers
			
 
				 
			
 
				-Catch handlers are always called from the stack unwinding code. Unlike finally clauses
			
 
				-or filters, catch handler never return. Instead we simply restore the whole
			
 
				-context, and restart execution at the catch handler.
			
 
				-
			
 
				+	Catch handlers are always called from the stack unwinding
			
 
				+	code. Unlike finally clauses or filters, catch handlers never
			
 
				+	return. Instead we simply restore the whole context, and
			
 
				+	restart execution at the catch handler.
			
 
				+	
			
 
				 ** Passing Exception objects to catch handlers and filters.
			
 
				 
			
 
				-We use a local variable to store exception objects. The stack unwinding code
			
 
				-must store the exception object into this variable before calling catch handler
			
 
				-or filter.
			
 
				-
			
 
				+	We use a local variable to store exception objects. The stack
			
 
				+	unwinding code must store the exception object into this
			
 
				+	variable before calling a catch handler or filter.
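The two rules for catch handlers (store the exception object first, then restore the context and resume at the handler without ever returning to the thrower) have a close analogy in standard C setjmp/longjmp. The sketch below is an analogy only: sketch_throw, sketch_run_protected and sketch_exception_slot are hypothetical names, and the JIT restores a saved register context itself rather than using longjmp.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf sketch_catch_ctx;     /* saved context of the protected method */
static int sketch_exception_slot;    /* stands in for the exception local variable */

/* "Unwinder": store the exception, then restore the context. Never returns. */
static void
sketch_throw (int exc)
{
	sketch_exception_slot = exc;
	longjmp (sketch_catch_ctx, 1);
}

/* Returns 0 when nothing is thrown, else the value seen by the "catch handler". */
static int
sketch_run_protected (int should_throw)
{
	if (setjmp (sketch_catch_ctx) == 0) {
		if (should_throw)
			sketch_throw (42);
		return 0;
	}
	/* Execution restarts here, as at a catch handler; the thrower is abandoned. */
	return sketch_exception_slot;
}
```

Calling sketch_run_protected (1) takes the handler path and yields the stored value 42, while sketch_run_protected (0) returns 0 through the normal path.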
			
 
				+	
			
 
				 * Minor helper methods
			
 
				 
			
 
				-A few minor helper methods are referenced from the arch-independent code.
			
 
				-Some of them are:
			
 
				-
			
 
				-*) mono_arch_cpu_optimizations ()
			
 
				-	This function returns a mask of optimizations that should be enabled for the
			
 
				-	current CPU and a mask of optimizations that should be excluded, instead.
			
 
				-
			
 
				-*) mono_arch_regname ()
			
 
				-	Returns the name for a numeric register.
			
 
				-
			
 
				-*) mono_arch_get_allocatable_int_vars ()
			
 
				-	Returns a list of variables that can be allocated to the integer registers
			
 
				-	in the current architecture.
			
 
				-
			
 
				-*) mono_arch_get_global_int_regs ()
			
 
				-	Returns a list of caller-save registers that can be used to allocate variables 
			
 
				-	in the current method.
			
 
				-
			
 
				-*) mono_arch_instrument_mem_needs ()
			
 
				-*) mono_arch_instrument_prolog ()
			
 
				-*) mono_arch_instrument_epilog ()
			
 
				-	Functions needed to implement the profiling interface.
			
 
				-
			
 
				-
			
 
				+	A few minor helper methods are referenced from the arch-independent code.
			
 
				+	Some of them are:
			
 
				+	
			
 
				+	*) mono_arch_cpu_optimizations ()
			
 
				+		This function returns a mask of optimizations that
			
 
				+		should be enabled for the current CPU and a mask of
			
 
				+		optimizations that should be excluded, instead.
			
 
				+	
			
 
				+	*) mono_arch_regname ()
			
 
				+		Returns the name for a numeric register.
			
 
				+	
			
 
				+	*) mono_arch_get_allocatable_int_vars ()
			
 
				+		Returns a list of variables that can be allocated to
			
 
				+		the integer registers in the current architecture.
			
 
				+	
			
 
				+	*) mono_arch_get_global_int_regs ()
			
 
				+		Returns a list of callee-saved registers that can be
			
 
				+		used to allocate variables in the current method.
			
 
				+	
			
 
				+	*) mono_arch_instrument_mem_needs ()
			
 
				+	*) mono_arch_instrument_prolog ()
			
 
				+	*) mono_arch_instrument_epilog ()
			
 
				+		Functions needed to implement the profiling interface.
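Of the helpers listed above, the register-name lookup is the simplest to picture. The sketch below assumes the standard x86 hardware register numbering (eax = 0 through edi = 7); sketch_regname is a hypothetical stand-in and not the signature or output format of the actual mono_arch_regname ().

```c
#include <assert.h>
#include <string.h>

/* Map an x86 register number (hardware encoding order) to a printable name. */
static const char *
sketch_regname (int reg)
{
	static const char * const names [] = {
		"%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"
	};

	if (reg >= 0 && reg < (int) (sizeof (names) / sizeof (names [0])))
		return names [reg];
	return "unknown";
}
```

Such a function is mostly useful for debugging output, for example when dumping the result of register allocation.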
			
 
				+	
			
 
				+	
			
 
				 * Writing regression tests
			
 
				 
			
 
				-Regression tests for the JIT should be written for any bug found in the JIT
			
 
				-in one of the *.cs files in the mini directory. Eventually all the operations
			
 
				-of the JIT should be tested (including the ones that get selected only when 
			
 
				-some specific optimization is enabled).
			
 
				-
			
 
				+	For any bug found in the JIT, a regression test should be
			
 
				+	written in one of the *.cs files in the mini
			
 
				+	directory. Eventually all the operations of the JIT should be
			
 
				+	tested (including the ones that get selected only when some
			
 
				+	specific optimization is enabled).
			
 
				+	
			
 
				 
			
 
				 * Platform specific optimizations
			
 
				 
			
 
				-An example of a platform-specific optimization is the peephole optimization:
			
 
				-we look at a small window of code at a time and we replace one or more 
			
 
				-instructions with others that perform better for the given architecture or CPU.
			
 
				-
			
 
				+	An example of a platform-specific optimization is the peephole
			
 
				+	optimization: we look at a small window of code at a time and
			
 
				+	we replace one or more instructions with others that perform
			
 
				+	better for the given architecture or CPU.
			
 
				+	
			
--- a/mono/mini/mini-porting.txt
+++ b/mono/mini/mini-porting.txt
@@ -1,349 +1,424 @@
 
				-			Mono JIT porting guide.
			
 
				-		Paolo Molaro ([email protected])
			
 
				+		       Mono JIT porting guide.
			
 
				+		   Paolo Molaro ([email protected])
			
 
				 
			
 
				 * Introduction
			
 
				 
			
 
				-This documents describes the process of porting the mono JIT
			
 
				-to a new CPU architecture. The new mono JIT has been designed 
			
 
				-to make porting easier though at the same time enable the port
			
 
				-to take full advantage from the new architecture features and 
			
 
				-instructions. Knowledge of the mini architecture (described in the
			
 
				-mini-doc.txt file) is a requirement for understanding this guide,
			
 
				-as well as an earlier document about porting the mono interpreter
			
 
				-(available on the web site).
			
 
				-
			
 
				-There are six main areas that a port needs to implement to
			
 
				-have a fully-functional JIT for a given architecture:
			
 
				-
			
 
				-	1) instruction selection
			
 
				-	2) native code emission
			
 
				-	3) call conventions and register allocation
			
 
				-	4) method trampolines
			
 
				-	5) exception handling
			
 
				-	6) minor helper methods
			
 
				-
			
 
				-To take advantage of some not-so-common processor features (for example
			
 
				-conditional execution of instructions as may be found on ARM or ia64), it may
			
 
				-be needed to develop an high-level optimization, but doing so is not a 
			
 
				-requirement for getting the JIT to work.
			
 
				-
			
 
				-We'll see in more details each of the steps required, note, though,
			
 
				-that a new port may just as well start from a cut&paste of an existing
			
 
				-port to a similar architecture (for example from x86 to amd64, or from
			
 
				-powerpc to sparc).
			
 
				-The architecture specific code is split from the rest of the JIT,
			
 
				-for example the x86 specific code and data is all included in the 
			
 
				-following files in the distribution:
			
 
				-
			
 
				-	mini-x86.h mini-x86.c
			
 
				-	inssel-x86.brg
			
 
				-	cpu-pentium.md
			
 
				-	tramp-x86.c 
			
 
				-	exceptions-x86.c 
			
 
				-
			
 
				-I suggest a similar split for other architectures as well.
			
 
				-
			
 
				-Note that this document is still incomplete: some sections are only
			
 
				-sketched and some are missing, but the important info to get a port 
			
 
				-going is already described.
			
 
				+	This document describes the process of porting the mono JIT
			
 
				+	to a new CPU architecture. The new mono JIT has been designed
			
 
				+	to make porting easier while at the same time enabling the port
			
 
				+	to take full advantage of the new architecture features and
			
 
				+	instructions. Knowledge of the mini architecture (described in
			
 
				+	the mini-doc.txt file) is a requirement for understanding this
			
 
				+	guide, as well as an earlier document about porting the mono
			
 
				+	interpreter (available on the web site).
			
 
				+	
			
 
				+	There are six main areas that a port needs to implement to
			
 
				+	have a fully-functional JIT for a given architecture:
			
 
				+	
			
 
				+		1) instruction selection
			
 
				+		2) native code emission
			
 
				+		3) call conventions and register allocation
			
 
				+		4) method trampolines
			
 
				+		5) exception handling
			
 
				+		6) minor helper methods
			
 
				+	
			
 
				+	To take advantage of some not-so-common processor features
			
 
				+	(for example conditional execution of instructions as may be
			
 
				+	found on ARM or ia64), it may be necessary to develop a
			
 
				+	high-level optimization, but doing so is not a requirement for
			
 
				+	getting the JIT to work.
			
 
				+	
			
 
				+	We'll look at each of the required steps in more detail; note,
			
 
				+	though, that a new port may just as well start from a
			
 
				+	cut&paste of an existing port to a similar architecture (for
			
 
				+	example from x86 to amd64, or from powerpc to sparc).
			
 
				+	
			
 
				+	The architecture specific code is split from the rest of the
			
 
				+	JIT, for example the x86 specific code and data is all
			
 
				+	included in the following files in the distribution:
			
 
				+	
			
 
				+		mini-x86.h mini-x86.c
			
 
				+		inssel-x86.brg
			
 
				+		cpu-pentium.md
			
 
				+		tramp-x86.c 
			
 
				+		exceptions-x86.c 
			
 
				+	
			
 
				+	I suggest a similar split for other architectures as well.
			
 
				+	
			
 
				+	Note that this document is still incomplete: some sections are
			
 
				+	only sketched and some are missing, but the important info to
			
 
				+	get a port going is already described.
			
 
				 
			
 
				 
			
 
				 * Architecture-specific instructions and instruction selection.
			
 
				 
			
 
				-The JIT already provides a set of instructions that can be easily
			
 
				-mapped to a great variety of different processor instructions.
			
 
				-Sometimes it may be necessary or advisable to add a new instruction
			
 
				-that represent more closely an instruction in the architecture.
			
 
				-Note that a mini instruction can be used to represent also a short
			
 
				-sequence of CPU low-level instructions, but note that each
			
 
				-instruction represents the minimum amount of code the instruction 
			
 
				-scheduler will handle (i.e., the scheduler won't schedule the instructions
			
 
				-that compose the low-level sequence as individual instructions, but just
			
 
				-the whole sequence, as an indivisible block).
			
 
				-New instructions are created by adding a line in the mini-ops.h file,
			
 
				-assigning an opcode and a name. To specify the input and output for 
			
 
				-the instruction, there are two different places, depending on the context 
			
 
				-in which the instruction gets used.
			
 
				-If the instruction is used in the tree representation, the input and output
			
 
				-types are defined by the BURG rules in the *.brg files (the usual 
			
 
				-non-terminals are 'reg' to represent a normal register, 'lreg' to 
			
 
				-represent a register or two that hold a 64 bit value, freg for a
			
 
				-floating point register).
			
 
				-If an instruction is used as a low-level CPU instruction, the info
			
 
				-is specified in a machine description file. The description file is
			
 
				-processed by the genmdesc program to provide a data structure that
			
 
				-can be easily used from C code to query the needed info about the 
			
 
				-instruction.
			
 
				-As an example, let's consider the add instruction for both x86 and ppc:
			
 
				-
			
 
				-x86 version:
			
 
				-	add: dest:i src1:i src2:i len:2 clob:1
			
 
				-ppc version:
			
 
				-	add: dest:i src1:i src2:i len:4
			
 
				-
			
 
				-Note that the instruction takes two input integer registers on both CPU,
			
 
				-but on x86 the first source register is clobbered (clob:1) and the length
			
 
				-in bytes of the instruction differs.
			
 
				-Note that integer adds and floating point adds use different opcodes, unlike
			
 
				-the IL language (64 bit add is done with two instructions on 32 bit architectures,
			
 
				-using a add that sets the carry and an add with carry).
			
 
				-A specific CPU port may assign any meaning to the clob field for an instruction
			
 
				-since the value will be processed in an arch-specific file anyway.
			
 
				-See the top of the existing cpu-pentium.md file for more info on other fields:
			
 
				-the info may or may not be applicable to a different CPU, in this latter case
			
 
				-the info can be ignored.
			
 
				-The code in mini.c together with the BURG rules in inssel.brg, inssel-float.brg
			
 
				-and inssel-long32.brg provides general purpose mappings from the tree representation 
			
 
				-to a set of instructions that should be easily implemented in any architecture.
			
 
				-To allow for additional arch-specific functionality, an arch-specific BURG file
			
 
				-can be used: in this file arch-specific instructions can be selected that provide
			
 
				-better performance than the general instructions or that provide functionality
			
 
				-that is needed by the JIT but that cannot be expressed in a general enough way.
			
 
				-As an example, x86 has the special instruction "push" to make it easier to
			
 
				-implement the default call convention (passing arguments on the stack): almost
			
 
				-all the other architectures don't have such an instruction (and don't need it anyway),
			
 
				-so we added a special rule in the inssel-x86.brg file for it.
			
 
				-
			
 
				-So, one of the first things needed in a port is to write a cpu-$(arch).md machine
			
 
				-description file and fill it with the needed info. As a start, only a few
			
 
				-instructions can be specified, like the ones required to do simple integer
			
 
				-operations. The default rules of the instruction selector will emit the common
			
 
				-instructions and so we're ready to go for the next step in porting the JIT.
			
 
				-
			
 
				+	The JIT already provides a set of instructions that can be
			
 
				+	easily mapped to a great variety of different processor
			
 
				+	instructions.  Sometimes it may be necessary or advisable to
			
 
				+	add a new instruction that represents more closely an
			
 
				+	instruction in the architecture.  Note that a mini instruction
			
 
				+	can also be used to represent a short sequence of CPU
			
 
				+	low-level instructions, but note that each instruction
			
 
				+	represents the minimum amount of code the instruction
			
 
				+	scheduler will handle (i.e., the scheduler won't schedule the
			
 
				+	instructions that compose the low-level sequence as individual
			
 
				+	instructions, but just the whole sequence, as an indivisible
			
 
				+	block).
			
 
				+
			
 
				+	New instructions are created by adding a line in the
			
 
				+	mini-ops.h file, assigning an opcode and a name. To specify
			
 
				+	the input and output for the instruction, there are two
			
 
				+	different places, depending on the context in which the
			
 
				+	instruction gets used.
			
 
				+
			
 
				+	If the instruction is used in the tree representation, the
			
 
				+	input and output types are defined by the BURG rules in the
			
 
				+	*.brg files (the usual non-terminals are 'reg' to represent a
			
 
				+	normal register, 'lreg' to represent a register or two that
			
 
				+	hold a 64 bit value, 'freg' for a floating point register).
			
 
				+
			
 
				+	If an instruction is used as a low-level CPU instruction, the
			
 
				+	info is specified in a machine description file. The
			
 
				+	description file is processed by the genmdesc program to
			
 
				+	provide a data structure that can be easily used from C code
			
 
				+	to query the needed info about the instruction.
			
 
				+
			
 
				+	As an example, let's consider the add instruction for both x86
			
 
				+	and ppc:
			
 
				+	
			
 
				+	x86 version:
			
 
				+		add: dest:i src1:i src2:i len:2 clob:1
			
 
				+	ppc version:
			
 
				+		add: dest:i src1:i src2:i len:4
			
 
				+	
			
 
				+	Note that the instruction takes two input integer registers on
			
 
				+	both CPUs, but on x86 the first source register is clobbered
			
 
				+	(clob:1) and the length in bytes of the instruction differs.
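To make the field layout of such a description line concrete, here is a minimal parsing sketch. It is not how genmdesc works internally; SketchInsDesc and sketch_parse_desc are hypothetical, and the sketch only understands the five fields that appear in the two example lines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one description line. */
typedef struct {
	char name [32];
	char dest, src1, src2;   /* register class letter, 0 when absent */
	char clob;               /* clobber spec character, 0 when absent */
	int len;                 /* instruction length in bytes */
} SketchInsDesc;

/* Parse "name: key:value ..."; returns 1 on success, 0 on malformed input. */
static int
sketch_parse_desc (const char *line, SketchInsDesc *d)
{
	char buf [128];
	char *colon, *tok;

	memset (d, 0, sizeof (*d));
	while (*line == ' ' || *line == '\t')
		line++;
	if (strlen (line) >= sizeof (buf))
		return 0;
	strcpy (buf, line);

	/* The opcode name is everything before the first ':'. */
	colon = strchr (buf, ':');
	if (!colon || (size_t) (colon - buf) >= sizeof (d->name))
		return 0;
	*colon = '\0';
	strcpy (d->name, buf);

	for (tok = strtok (colon + 1, " \t"); tok; tok = strtok (NULL, " \t")) {
		if (!strncmp (tok, "dest:", 5))
			d->dest = tok [5];
		else if (!strncmp (tok, "src1:", 5))
			d->src1 = tok [5];
		else if (!strncmp (tok, "src2:", 5))
			d->src2 = tok [5];
		else if (!strncmp (tok, "clob:", 5))
			d->clob = tok [5];
		else if (!strncmp (tok, "len:", 4))
			d->len = atoi (tok + 4);
		else
			return 0;
	}
	return 1;
}
```

Parsing the x86 line yields len 2 and clob '1', while the ppc line yields len 4 and no clobber, which is exactly the difference the text points out.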
			
 
				+
			
 
				+	Note that integer adds and floating point adds use different
			
 
				+	opcodes, unlike the IL language (64 bit add is done with two
			
 
				+	instructions on 32 bit architectures, using an add that sets
			
 
				+	the carry and an add with carry).
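The two-instruction 64 bit add can be written out in portable C to show what the add/add-with-carry pair computes. SketchU64 and sketch_add64 are hypothetical names used only for this illustration; the JIT emits the two native instructions directly.

```c
#include <assert.h>
#include <stdint.h>

/* A 64 bit value held as two 32 bit halves, as on a 32 bit architecture. */
typedef struct {
	uint32_t lo;
	uint32_t hi;
} SketchU64;

static SketchU64
sketch_add64 (SketchU64 a, SketchU64 b)
{
	SketchU64 r;
	uint32_t carry;

	r.lo = a.lo + b.lo;           /* first add: wraps modulo 2^32 */
	carry = r.lo < a.lo;          /* carry is set exactly when the sum wrapped */
	r.hi = a.hi + b.hi + carry;   /* second add: add with carry */
	return r;
}
```

Adding 1 to 0x00000000FFFFFFFF, for instance, wraps the low half to 0 and carries into the high half, giving 0x0000000100000000.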
			
 
				+
			
 
				+	A specific CPU port may assign any meaning to the clob field
			
 
				+	for an instruction since the value will be processed in an
			
 
				+	arch-specific file anyway.
			
 
				+
			
 
				+	See the top of the existing cpu-pentium.md file for more info
			
 
				+	on other fields: the info may or may not be applicable to a
			
 
				+	different CPU; in the latter case the info can be ignored.
			
 
				+
			
 
				+	The code in mini.c together with the BURG rules in inssel.brg,
			
 
				+	inssel-float.brg and inssel-long32.brg provides general
			
 
				+	purpose mappings from the tree representation to a set of
			
 
				+	instructions that should be easily implemented in any
			
 
				+	architecture.  To allow for additional arch-specific
			
 
				+	functionality, an arch-specific BURG file can be used: in this
			
 
				+	file arch-specific instructions can be selected that provide
			
 
				+	better performance than the general instructions or that
			
 
				+	provide functionality that is needed by the JIT but that
			
 
				+	cannot be expressed in a general enough way.
			
 
				+	
			
 
				+	As an example, x86 has the special instruction "push" to make
			
 
				+	it easier to implement the default call convention (passing
			
 
				+	arguments on the stack): almost all the other architectures
			
 
				+	don't have such an instruction (and don't need it anyway), so
			
 
				+	we added a special rule in the inssel-x86.brg file for it.
			
 
				+	
			
 
				+	So, one of the first things needed in a port is to write a
			
 
				+	cpu-$(arch).md machine description file and fill it with the
			
 
				+	needed info. As a start, only a few instructions can be
			
 
				+	specified, like the ones required to do simple integer
			
 
				+	operations. The default rules of the instruction selector will
			
 
				+	emit the common instructions and so we're ready to go for the
			
 
				+	next step in porting the JIT.
			
 
				+	
			
 
				 
			
 
				 * Native code emission
			
 
				 
			
 
				-Since the first step in porting mono to a new CPU is to port the interpreter,
			
 
				-there should be already a file that allows the emission of binary native code
			
 
				-in a buffer for the architecture. This file should be placed in the 
			
 
				-	mono/arch/$(arch)/
			
 
				-directory.
			
 
				-
			
 
				-The bulk of the code emission happens in the mini-$(arch).c file, in a function
			
 
				-called mono_arch_output_basic_block (). This function takes a basic block, walks the
			
 
				-list of instructions in the block and emits the binary code for each.
			
 
				-Optionally a peephole optimization pass is done on the basic block, but this can be
			
 
				-left for later, when the port actually works.
			
 
				-This function is very simple, there is just a big switch on the instruction opcode
			
 
				-and in the corresponding case the functions or macros to emit the binary native code
			
 
				-are used. Note that in this function the lengths of the instructions are used to
			
 
				-determine if the buffer for the code needs enlarging.
			
 
				-
			
 
				-To complete the code emission for a method, a few other functions need
			
 
				-implementing as well:
			
 
				-
			
 
				-	mono_arch_emit_prolog ()
			
 
				-	mono_arch_emit_epilog ()
			
 
				-	mono_arch_patch_code ()
			
 
				-
			
 
				-mono_arch_emit_prolog () will emit the code to setup the stack frame for a method,
			
 
				-optionally call the callbacks used in profiling and tracing, and move the
			
 
				-arguments to their home location (in a caller-save register if the variable was 
			
 
				-allocated to one, or in a stack location if the argument was passed in a volatile 
			
 
				-register and wasn't allocated a non-volatile one). caller-save registers used by the
			
 
				-function are saved in the prolog as well.
			
 
				-
			
 
				-mono_arch_emit_epilog () will emit the code needed to return from the function,
			
 
				-optionally calling the profiling or tracing callbacks. At this point the basic blocks
			
 
				-or the code that was moved out of the normal flow for the function can be emitted 
			
 
				-as well (this is usually done to provide better info for the static branch predictor).
			
 
				-In the epilog, caller-save registers are restored if they were used.
			
 
				-Note that, to help exception handling and stack unwinding, when there is a transition
			
 
				-from managed to unmanaged code, some special processing needs to be done (basically,
			
 
				-saving all the registers and setting up the links in the Last Managed Frame
			
 
				-structure).
			
 
				-
			
 
				-When the epilog has been emitted, the upper level code arranges for the buffer of 
			
 
				-memory that contains the native code to be copied in an area of executable memory
			
 
				-and at this point, instructions that use relative addressing need to be patched
			
 
				-to have the right offsets: this work is done by mono_arch_patch_code ().
			
 
				+	Since the first step in porting mono to a new CPU is to port
			
 
				+	the interpreter, there should already be a file that allows
			
 
				+	the emission of binary native code in a buffer for the
			
 
				+	architecture. This file should be placed in the
			
 
				+
			
 
				+		mono/arch/$(arch)/
			
 
				+
			
 
				+	directory.
			
 
				+
			
 
				+	The bulk of the code emission happens in the mini-$(arch).c
			
 
				+	file, in a function called mono_arch_output_basic_block
			
 
				+	(). This function takes a basic block, walks the list of
			
 
				+	instructions in the block and emits the binary code for each.
			
 
				+	Optionally a peephole optimization pass is done on the basic
			
 
				+	block, but this can be left for later, when the port actually
			
 
				+	works.
			
 
				+
			
 
				+	This function is very simple: there is just a big switch on
			
 
				+	the instruction opcode and in the corresponding case the
			
 
				+	functions or macros to emit the binary native code are
			
 
				+	used. Note that in this function the lengths of the
			
 
				+	instructions are used to determine if the buffer for the code
			
 
				+	needs enlarging.
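The shape of such an emitter (a loop over instructions, a length check against the buffer, then a switch on the opcode) can be sketched as follows. Everything here is hypothetical: the TOY_* opcodes, ToyCodeBuf and the toy_* functions are invented for the example, and the real function works on the JIT's own basic block and instruction structures. The three byte values are the standard x86 encodings of nop, int3 and ret.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { TOY_NOP, TOY_BREAK, TOY_RET, TOY_N_OPS };

typedef struct {
	uint8_t *code;
	size_t size;   /* allocated bytes */
	size_t used;   /* emitted bytes */
} ToyCodeBuf;

/* Maximum encoded length per opcode, as a len: field would provide. */
static const size_t toy_max_len [TOY_N_OPS] = { 1, 1, 1 };

/* Grow the buffer if fewer than 'extra' bytes are left. */
static int
toy_ensure_space (ToyCodeBuf *buf, size_t extra)
{
	size_t new_size;
	uint8_t *new_code;

	if (buf->used + extra <= buf->size)
		return 1;
	new_size = buf->size ? buf->size * 2 : 16;
	while (new_size < buf->used + extra)
		new_size *= 2;
	new_code = realloc (buf->code, new_size);
	if (!new_code)
		return 0;
	buf->code = new_code;
	buf->size = new_size;
	return 1;
}

/* Emit native bytes for each opcode; returns 0 on a bad opcode or OOM. */
static int
toy_output_basic_block (ToyCodeBuf *buf, const int *ops, size_t n_ops)
{
	size_t i;

	for (i = 0; i < n_ops; i++) {
		if (ops [i] < 0 || ops [i] >= TOY_N_OPS)
			return 0;
		if (!toy_ensure_space (buf, toy_max_len [ops [i]]))
			return 0;
		switch (ops [i]) {
		case TOY_NOP:   buf->code [buf->used++] = 0x90; break;
		case TOY_BREAK: buf->code [buf->used++] = 0xCC; break;
		case TOY_RET:   buf->code [buf->used++] = 0xC3; break;
		}
	}
	return 1;
}
```

The length table is consulted before each instruction is emitted, so the switch cases can write bytes without any bounds checks of their own.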
			
 
				+	
			
 
				+	To complete the code emission for a method, a few other
			
 
				+	functions need implementing as well:
			
 
				+	
			
 
				+		mono_arch_emit_prolog ()
			
 
				+		mono_arch_emit_epilog ()
			
 
				+		mono_arch_patch_code ()
			
 
				+	
			
 
				+	mono_arch_emit_prolog () will emit the code to set up the stack
			
 
				+	frame for a method, optionally call the callbacks used in
			
 
				+	profiling and tracing, and move the arguments to their home
			
 
				+	location (in a callee-saved register if the variable was
			
 
				+	allocated to one, or in a stack location if the argument was
			
 
				+	passed in a volatile register and wasn't allocated a
			
 
				+	non-volatile one). Callee-saved registers used by the function
			
 
				+	are saved in the prolog as well.
			
 
				+	
			
 
				+	mono_arch_emit_epilog () will emit the code needed to return
			
 
				+	from the function, optionally calling the profiling or tracing
			
 
				+	callbacks. At this point the basic blocks or the code that was
			
 
				+	moved out of the normal flow for the function can be emitted
			
 
				+	as well (this is usually done to provide better info for the
			
 
				+	static branch predictor).  In the epilog, callee-saved
			
 
				+	registers are restored if they were used.
			
 
				+
			
 
				+	Note that, to help exception handling and stack unwinding,
			
 
				+	when there is a transition from managed to unmanaged code,
			
 
				+	some special processing needs to be done (basically, saving
			
 
				+	all the registers and setting up the links in the Last Managed
			
 
				+	Frame structure).
			
 
				+	
			
 
				+	When the epilog has been emitted, the upper level code
			
 
				+	arranges for the buffer of memory that contains the native
			
 
				+	code to be copied into an area of executable memory; at this
			
 
				+	point, instructions that use relative addressing need to be
			
 
				+	patched to have the right offsets: this work is done by
			
 
				+	mono_arch_patch_code ().
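As one concrete case of this patching, consider an x86 call or jmp with a 32 bit relative operand: the displacement is measured from the address of the following instruction, so it can only be computed once the final code address is known. The sketch below is a simplified illustration; sketch_patch_rel32 and its parameters are hypothetical and do not reflect the actual interface of mono_arch_patch_code ().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Patch the rel32 operand of an x86 call (0xE8) or jmp (0xE9) located at
 * code [offset]. code_base is the address the buffer will execute at.
 * Returns 0 if the byte at 'offset' is not one of those two opcodes.
 */
static int
sketch_patch_rel32 (uint8_t *code, size_t offset, uintptr_t code_base, uintptr_t target)
{
	uintptr_t next_ip;
	uint32_t disp;

	if (code [offset] != 0xE8 && code [offset] != 0xE9)
		return 0;

	/* The displacement is relative to the end of the 5 byte instruction. */
	next_ip = code_base + offset + 5;
	disp = (uint32_t) (target - next_ip);

	/* Store the operand in little-endian byte order, as x86 expects. */
	code [offset + 1] = (uint8_t) (disp & 0xFF);
	code [offset + 2] = (uint8_t) ((disp >> 8) & 0xFF);
	code [offset + 3] = (uint8_t) ((disp >> 16) & 0xFF);
	code [offset + 4] = (uint8_t) ((disp >> 24) & 0xFF);
	return 1;
}
```

Because the arithmetic is done modulo 2^32, the same code handles both forward and backward targets, provided the target lies within 2 GB of the call site.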
			
 
				 
			
 
				 
			
 
				 * Call conventions and register allocation
			
 
				 
			
 
				-To account for the differences in the call conventions, a few functions need to
			
 
				-be implemented.
			
 
				-
			
 
				-mono_arch_allocate_vars () assigns to both arguments and local variables
			
 
				-the offset relative to the frame register where they are stored, dead
			
 
				-variables are simply discarded. The total amount of stack needed is calculated.
			
 
				-
			
 
				-mono_arch_call_opcode () is the function that more closely deals with the call
			
 
				-convention on a given system. For each argument to a function call, an instruction
			
 
				-is created that actually puts the argument where needed, be it the stack or a 
			
 
				-specific register. This function can also re-arrange th order of evaluation
			
 
				-when multiple arguments are involved if needed (like, on x86 arguments are pushed
			
 
				-on the stack in reverse order). The function needs to carefully take into accounts
			
 
				-platform specific issues, like how structures are returned as well as the
			
 
				-differences in size and/or alignment of managed and corresponding unmanaged 
			
 
				-structures.
			
 
				-
			
 
				-The other chunk of code that needs to deal with the call convention and other
			
 
				-specifics of a CPU, is the local register allocator, implemented in a function
			
 
				-named mono_arch_local_regalloc (). The local allocator deals with a basic block 
			
 
				-at a time and basically just allocates registers for temporary
			
 
				-values during expression evaluation, spilling and unspilling as necessary.
			
 
				-The local allocator needs to take into account clobbering information, both
			
 
				-during simple instructions and during function calls and it needs to deal
			
 
				-with other architecture-specific weirdnesses, like instructions that take
			
 
				-inputs only in specific registers or output only is some.
			
 
				-Some effort will be put later in moving most of the local register allocator to 
			
 
				-a common file so that the code can be shared more for similar, risc-like CPUs.
			
 
				-The register allocator does a first pass on the instructions in a block, collecting
			
 
				-liveness information and in a backward pass on the same list performs the
			
 
				-actual register allocation, inserting the instructions needed to spill values,
			
 
				-if necessary.
			
 
				-
			
 
				-When this part of code is implemented, some testing can be done with the generated 
			
 
				-code for the new architecture. Most helpful is the use of the --regression
			
 
				-command line switch to run the regression tests (basic.cs, for example).
			
 
				-Note that the JIT will try to initialize the runtime, but it may not be able yet to
			
 
				-compile and execute complex code: commenting most of the code in the mini_init()
			
 
				-function in mini.c is needed to let the JIT just compile the regression tests.
			
 
				-Also, using multiple -v switches on the command line makes the JIT dump an 
			
 
				-increasing amount of information during compilation.
			
 
				-
			
 
				-
			
 
				+	To account for the differences in the call conventions, a few functions need to
			
 
				+	be implemented.
			
 
				+	
			
 
				+	mono_arch_allocate_vars () assigns to both arguments and local
			
 
				+	variables the offset relative to the frame register where they
			
 
				+	are stored; dead variables are simply discarded. The total
			
 
				+	amount of stack needed is calculated.
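As a rough illustration of this kind of offset assignment, the following C sketch lays out variables below the frame register. The VarInfo structure and the allocate_vars () helper are hypothetical simplifications, not the Mono data structures, and a downward-growing stack with power-of-two alignments is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified variable descriptor. */
typedef struct {
	int size;     /* size in bytes */
	int align;    /* required alignment, a power of two */
	int is_dead;  /* non-zero when the variable is never used */
	int offset;   /* output: offset from the frame register, 0 if dead */
} VarInfo;

/* Assign each live variable a negative offset from the frame register
 * and return the total stack size needed, rounded up to frame_align
 * (also assumed to be a power of two). */
static int
allocate_vars (VarInfo *vars, int nvars, int frame_align)
{
	int used = 0;
	int i;

	assert (nvars >= 0 && frame_align > 0);
	for (i = 0; i < nvars; ++i) {
		VarInfo *v = &vars [i];

		if (v->is_dead) {
			v->offset = 0;  /* dead variables get no stack slot */
			continue;
		}
		used += v->size;
		/* round up so that (frame - used) is aligned for this variable */
		used = (used + v->align - 1) & ~(v->align - 1);
		v->offset = -used;
	}
	return (used + frame_align - 1) & ~(frame_align - 1);
}
```

Arguments would be handled the same way, except that their offsets are dictated by the call convention instead of being chosen freely.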
			
 
				+	
			
 
				+	mono_arch_call_opcode () is the function that most closely
			
 
				+	deals with the call convention on a given system. For each
			
 
				+	argument to a function call, an instruction is created that
			
 
				+	actually puts the argument where needed, be it the stack or a
			
 
				+	specific register. This function can also re-arrange the order
			
 
				+	of evaluation when multiple arguments are involved if needed
			
 
				+	(for example, on x86 arguments are pushed on the stack in reverse
			
 
				+	order). The function needs to carefully take into account
			
 
				+	platform-specific issues, like how structures are returned as
			
 
				+	well as the differences in size and/or alignment of managed
			
 
				+	and corresponding unmanaged structures.
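A minimal sketch of the decision made for each argument might look like the following. The ArgLoc type and both helpers are hypothetical and ignore floating point values and structures; passing nregs = 0 models an x86-style convention where every argument is pushed.

```c
#include <assert.h>

enum { ARG_IN_REG, ARG_ON_STACK };

/* Hypothetical description of where one argument must be placed. */
typedef struct {
	int kind;       /* ARG_IN_REG or ARG_ON_STACK */
	int reg;        /* register index when kind == ARG_IN_REG, else -1 */
	int stack_off;  /* offset from the stack pointer at the call, else -1 */
} ArgLoc;

/* The first nregs arguments go in registers, the rest in stack slots.
 * Returns the number of stack bytes the call needs. */
static int
assign_arg_locations (int nargs, int nregs, int slot_size, ArgLoc *out)
{
	int i, stack_bytes = 0;

	assert (nargs >= 0 && nregs >= 0 && slot_size > 0);
	for (i = 0; i < nargs; ++i) {
		if (i < nregs) {
			out [i].kind = ARG_IN_REG;
			out [i].reg = i;
			out [i].stack_off = -1;
		} else {
			out [i].kind = ARG_ON_STACK;
			out [i].reg = -1;
			out [i].stack_off = stack_bytes;
			stack_bytes += slot_size;
		}
	}
	return stack_bytes;
}

/* Fill order[] with the indexes of the stack arguments in the order
 * they must be pushed: last argument first, so that the first stack
 * argument ends up at the lowest address. Returns how many there are. */
static int
push_order (const ArgLoc *locs, int nargs, int *order)
{
	int i, n = 0;

	for (i = nargs - 1; i >= 0; --i)
		if (locs [i].kind == ARG_ON_STACK)
			order [n++] = i;
	return n;
}
```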
			
 
				+	
			
 
				+	The other chunk of code that needs to deal with the call
			
 
				+	convention and other specifics of a CPU is the local register
			
 
				+	allocator, implemented in a function named
			
 
				+	mono_arch_local_regalloc (). The local allocator deals with a
			
 
				+	basic block at a time and basically just allocates registers
			
 
				+	for temporary values during expression evaluation, spilling
			
 
				+	and unspilling as necessary.
			
 
				+
			
 
				+	The local allocator needs to take into account clobbering
			
 
				+	information, both during simple instructions and during
			
 
				+	function calls and it needs to deal with other
			
 
				+	architecture-specific weirdnesses, like instructions that take
			
 
				+	inputs only in specific registers or produce output only in some.
			
 
				+
			
 
				+	Some effort will be put later in moving most of the local
			
 
				+	register allocator to a common file so that the code can be
			
 
				+	shared more for similar, risc-like CPUs.  The register
			
 
				+	allocator does a first pass on the instructions in a block,
			
 
				+	collecting liveness information and in a backward pass on the
			
 
				+	same list performs the actual register allocation, inserting
			
 
				+	the instructions needed to spill values, if necessary.
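The backward allocation pass can be sketched in a few lines of C. This is a deliberately simplified model, not the Mono allocator: the Ins structure is hypothetical, every virtual register is assumed to be defined exactly once in the block, and no spill code is emitted (a value that does not get a register is only marked as spilled). It also folds the liveness pass into the backward walk, using the fact that the first sighting of a virtual register when walking backward is its last use.

```c
#include <assert.h>

#define MAX_VREGS 16
#define NO_REG    (-1)
#define SPILLED   (-2)

/* Hypothetical three-address instruction on virtual registers;
 * NO_REG marks an unused operand. */
typedef struct {
	int dreg, sreg1, sreg2;
} Ins;

/* Give vreg a hard register if it has no location yet. */
static void
assign (int vreg, int *assignment, int *free_mask, int nhregs)
{
	int h;

	if (vreg == NO_REG)
		return;
	assert (vreg >= 0 && vreg < MAX_VREGS);
	if (assignment [vreg] != NO_REG)
		return;
	for (h = 0; h < nhregs; ++h) {
		if (*free_mask & (1 << h)) {
			*free_mask &= ~(1 << h);
			assignment [vreg] = h;
			return;
		}
	}
	assignment [vreg] = SPILLED;  /* no register left: use a stack slot */
}

/* Walk the block backward and fill assignment[] (MAX_VREGS entries).
 * Returns the number of virtual registers that had to be spilled. */
static int
local_regalloc (const Ins *code, int n, int nhregs, int *assignment)
{
	int free_mask, i, spills = 0;

	assert (n >= 0 && nhregs > 0 && nhregs < 31);
	free_mask = (1 << nhregs) - 1;
	for (i = 0; i < MAX_VREGS; ++i)
		assignment [i] = NO_REG;

	for (i = n - 1; i >= 0; --i) {
		const Ins *ins = &code [i];

		/* Seen backward, the definition ends the live range:
		 * the hard register becomes free for earlier values. */
		if (ins->dreg != NO_REG) {
			assign (ins->dreg, assignment, &free_mask, nhregs);
			if (assignment [ins->dreg] >= 0)
				free_mask |= 1 << assignment [ins->dreg];
		}
		/* Sources are live before this instruction. */
		assign (ins->sreg1, assignment, &free_mask, nhregs);
		assign (ins->sreg2, assignment, &free_mask, nhregs);
	}
	for (i = 0; i < MAX_VREGS; ++i)
		if (assignment [i] == SPILLED)
			spills++;
	return spills;
}
```

A real allocator additionally has to honor clobbered registers and fixed-register constraints, which is where most of the architecture-specific work goes.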
			
 
				+	
			
 
				+	When this part of code is implemented, some testing can be
			
 
				+	done with the generated code for the new architecture. Most
			
 
				+	helpful is the use of the --regression command line switch to
			
 
				+	run the regression tests (basic.cs, for example).
			
 
				+
			
 
				+	Note that the JIT will try to initialize the runtime, but it
			
 
				+	may not be able yet to compile and execute complex code:
			
 
				+	commenting out most of the code in the mini_init() function in
			
 
				+	mini.c is needed to let the JIT just compile the regression
			
 
				+	tests.  Also, using multiple -v switches on the command line
			
 
				+	makes the JIT dump an increasing amount of information during
			
 
				+	compilation.
			
 
				+	
			
 
				+	
			
 
				 * Method trampolines
			
 
				 
			
 
				-To get better startup performance, the JIT actually compiles a method only when
			
 
				-needed. To achieve this, when a call to a method is compiled, we actually emit a 
			
 
				-call to a magic trampoline. The magic trampoline is a function written in assembly
			
 
				-that invokes the compiler to compile the given method and jumps to the newly compiled
			
 
				-code, ensuring the arguments it received are passed correctly to the actual method.
			
 
				-Before jumping to the new code, though, the magic trampoline takes care of patching
			
 
				-the call site so that next time the call will go directly to the method instead of the
			
 
				-trampoline. How does this all work?
			
 
				-mono_arch_create_jit_trampoline () creates a small function that just
			
 
				-preserves the arguments passed to it and adds an additional argument (the method
			
 
				-to compile) before calling the generic trampoline. This small function is called 
			
 
				-the specific trampoline, because it is method-specific (the method to compile
			
 
				-is hard-code in the instruction stream).
			
 
				-The generic trampoline saves all the arguments that could get clobbered
			
 
				-and calls a C function that will do two things: 
			
 
				-
			
 
				-*) actually call the JIT to compile the method
			
 
				-*) identify the calling code so that it can be patched to call directly
			
 
				-the actual method
			
 
				-
			
 
				-If the 'this' argument to a method is a boxed valuetype that is passed to
			
 
				-a method that expects just a pointer to the data, an additional unboxing
			
 
				-trampoline will need to be inserted as well.
			
 
				-
			
 
				+	To get better startup performance, the JIT actually compiles a
			
 
				+	method only when needed. To achieve this, when a call to a
			
 
				+	method is compiled, we actually emit a call to a magic
			
 
				+	trampoline. The magic trampoline is a function written in
			
 
				+	assembly that invokes the compiler to compile the given method
			
 
				+	and jumps to the newly compiled code, ensuring the arguments
			
 
				+	it received are passed correctly to the actual method.
			
 
				+
			
 
				+	Before jumping to the new code, though, the magic trampoline
			
 
				+	takes care of patching the call site so that next time the
			
 
				+	call will go directly to the method instead of the
			
 
				+	trampoline. How does this all work?
			
 
				+
			
 
				+	mono_arch_create_jit_trampoline () creates a small function
			
 
				+	that just preserves the arguments passed to it and adds an
			
 
				+	additional argument (the method to compile) before calling the
			
 
				+	generic trampoline. This small function is called the specific
			
 
				+	trampoline, because it is method-specific (the method to
			
 
				+	compile is hard-coded in the instruction stream).
			
 
				+
			
 
				+	The generic trampoline saves all the arguments that could get
			
 
				+	clobbered and calls a C function that will do two things:
			
 
				+	
			
 
				+	*) actually call the JIT to compile the method
			
 
				+	*) identify the calling code so that it can be patched to call directly
			
 
				+	the actual method
			
 
				+	
			
 
				+	If the 'this' argument to a method is a boxed valuetype that
			
 
				+	is passed to a method that expects just a pointer to the data,
			
 
				+	an additional unboxing trampoline will need to be inserted as
			
 
				+	well.
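The lazy compilation flow described in this section can be modelled in portable C with function pointers. This is only an analogy: real trampolines are emitted as machine code, and the Method and CallSite structures below are hypothetical stand-ins. The call site's target pointer plays the role of the call instruction that gets patched.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*CompiledCode) (int arg);

/* Hypothetical stand-in for a managed method. */
typedef struct {
	CompiledCode body;      /* stands in for the method's IL */
	CompiledCode compiled;  /* NULL until the "JIT" has run */
	int compile_count;      /* how many times the "JIT" was invoked */
} Method;

/* Hypothetical stand-in for one call instruction in compiled code. */
typedef struct {
	Method *method;         /* the extra argument a specific trampoline adds */
	CompiledCode target;    /* NULL: the call still goes to the trampoline */
} CallSite;

/* Stand-in for the JIT: "compiles" the method once. */
static CompiledCode
compile_method (Method *m)
{
	if (!m->compiled) {
		m->compiled = m->body;
		m->compile_count++;
	}
	return m->compiled;
}

/* Generic trampoline: compile the method, patch the call site, then
 * continue into the compiled code with the original argument. */
static int
generic_trampoline (CallSite *site, int arg)
{
	CompiledCode code = compile_method (site->method);

	site->target = code;  /* later calls bypass the trampoline */
	return code (arg);
}

/* What executing the call instruction amounts to. */
static int
call_through_site (CallSite *site, int arg)
{
	assert (site != NULL && site->method != NULL);
	if (site->target)
		return site->target (arg);
	return generic_trampoline (site, arg);
}

/* Example method body. */
static int
add_one (int x)
{
	return x + 1;
}
```

The first call through a site pays for compilation and patching; every later call through the same site reaches the compiled code directly.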
			
 
				+	
			
 
				 
			
 
				 * Exception handling
			
 
				 
			
 
				-Exception handling is likely the most difficult part of the port, as it needs
			
 
				-to deal with unwinding (both managed and unmanaged code) and calling
			
 
				-catch and filter blocks. It also needs to deal with signals, because mono
			
 
				-takes advantage of the MMU in the CPU and of the operation system to
			
 
				-handle dereferences of the NULL pointer. Some of the function needed
			
 
				-to implement the mechanisms are:
			
 
				-
			
 
				-mono_arch_get_throw_exception () returns a function that takes an exception object
			
 
				-and invokes an arch-specific function that will enter the exception processing.
			
 
				-To do so, all the relevant registers need to be saved and passed on.
			
 
				-
			
 
				-mono_arch_handle_exception () this function takes the exception thrown and
			
 
				-a context that describes the state of the CPU at the time the exception was 
			
 
				-thrown. The function needs to implement the exception handling mechanism,
			
 
				-so it makes a search for an handler for the exception and if none is found,
			
 
				-it follows the unhandled exception path (that can print a trace and exit or
			
 
				-just abort the current thread). The difficulty here is to unwind the stack 
			
 
				-correctly, by restoring the register state at each call site in the call chain,
			
 
				-calling finally, filters and handler blocks while doing so.
			
 
				-
			
 
				-As part of exception handling a couple of internal calls need to be implemented
			
 
				-as well.
			
 
				-ves_icall_get_frame_info () returns info about a specific frame.
			
 
				-mono_jit_walk_stack () walks the stack and calls a callback with info for
			
 
				-each frame found.
			
 
				-ves_icall_get_trace () return an array of StackFrame objects.
			
 
				-
			
 
				+	Exception handling is likely the most difficult part of the
			
 
				+	port, as it needs to deal with unwinding (both managed and
			
 
				+	unmanaged code) and calling catch and filter blocks. It also
			
 
				+	needs to deal with signals, because mono takes advantage of
			
 
				+	the MMU in the CPU and of the operating system to handle
			
 
				+	dereferences of the NULL pointer. Some of the functions needed
			
 
				+	to implement the mechanisms are:
			
 
				+	
			
 
				+	mono_arch_get_throw_exception () returns a function that takes
			
 
				+	an exception object and invokes an arch-specific function that
			
 
				+	will enter the exception processing.  To do so, all the
			
 
				+	relevant registers need to be saved and passed on.
			
 
				+	
			
 
				+	mono_arch_handle_exception () takes the
			
 
				+	exception thrown and a context that describes the state of the
			
 
				+	CPU at the time the exception was thrown. The function needs
			
 
				+	to implement the exception handling mechanism, so it makes a
			
 
				+	search for a handler for the exception and if none is found,
			
 
				+	it follows the unhandled exception path (that can print a
			
 
				+	trace and exit or just abort the current thread). The
			
 
				+	difficulty here is to unwind the stack correctly, by restoring
			
 
				+	the register state at each call site in the call chain,
			
 
				+	calling finally, filter and handler blocks while doing so.
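The handler search at the heart of this function can be sketched as follows. The Clause and Frame structures are hypothetical simplifications: a real implementation recovers each caller frame by unwinding machine state, and also runs filter and finally blocks along the way, both of which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exception clause of a method. */
typedef struct {
	int try_start, try_end;  /* half-open range of protected code offsets */
	int handler_offset;      /* where the catch handler starts */
	int catch_type;          /* hypothetical type id; 0 catches everything */
} Clause;

/* Hypothetical, already-unwound view of one stack frame. */
typedef struct Frame {
	const Clause *clauses;
	int nclauses;
	int ip;                  /* offset of the throw or call site in the method */
	struct Frame *caller;    /* next frame up the call chain, NULL at the top */
} Frame;

/* Walk the call chain from the throwing frame upward and return the
 * first frame with a clause that protects the frame's ip and accepts
 * the exception type. NULL means the exception is unhandled. */
static Frame *
find_handler (Frame *top, int exc_type, int *handler_offset)
{
	Frame *f;
	int i;

	assert (handler_offset != NULL);
	for (f = top; f != NULL; f = f->caller) {
		for (i = 0; i < f->nclauses; ++i) {
			const Clause *c = &f->clauses [i];

			if (f->ip >= c->try_start && f->ip < c->try_end &&
			    (c->catch_type == 0 || c->catch_type == exc_type)) {
				*handler_offset = c->handler_offset;
				return f;
			}
		}
	}
	return NULL;
}
```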
			
 
				+	
			
 
				+	As part of exception handling a couple of internal calls need
			
 
				+	to be implemented as well.
			
 
				+
			
 
				+	ves_icall_get_frame_info () returns info about a specific
			
 
				+	frame.
			
 
				+
			
 
				+	mono_jit_walk_stack () walks the stack and calls a callback with info for
			
 
				+	each frame found.
			
 
				+
			
 
				+	ves_icall_get_trace () returns an array of StackFrame objects.
			
 
				+	
			
 
				 ** Code generation for filter/finally handlers
			
 
				 
			
 
				-Filter and finally handlers are called from 2 different locations:
			
 
				-
			
 
				-       1.) from within the method containing the exception clauses
			
 
				-       2.) from the stack unwinding code
			
 
				-
			
 
				-To make this possible we implement them like subroutines, ending with a
			
 
				-"return" statement. The subroutine does not save the base pointer, because we
			
 
				-need access to the local variables of the enclosing method. Its is possible
			
 
				-that instructions inside those handlers modify the stack pointer, thus we save
			
 
				-the stack pointer at the start of the handler, and restore it at the end. We
			
 
				-have to use a "call" instruction to execute such finally handlers.
			
 
				-
			
 
				-The MIR code for filter and finally handlers looks like:
			
 
				-
			
 
				-    OP_START_HANDLER
			
 
				-    ...
			
 
				-    OP_END_FINALLY | OP_ENDFILTER(reg)
			
 
				-
			
 
				-OP_START_HANDLER: should save the stack pointer somewhere
			
 
				-OP_END_FINALLY: restores the stack pointers and returns.
			
 
				-OP_ENDFILTER (reg): restores the stack pointers and returns the value in "reg".
			
 
				-
			
 
				+	Filter and finally handlers are called from two different locations:
			
 
				+	
			
 
				+	       1.) from within the method containing the exception clauses
			
 
				+	       2.) from the stack unwinding code
			
 
				+	
			
 
				+	To make this possible we implement them like subroutines,
			
 
				+	ending with a "return" statement. The subroutine does not save
			
 
				+	the base pointer, because we need access to the local
			
 
				+	variables of the enclosing method. It is possible that
			
 
				+	instructions inside those handlers modify the stack pointer,
			
 
				+	thus we save the stack pointer at the start of the handler,
			
 
				+	and restore it at the end. We have to use a "call" instruction
			
 
				+	to execute such finally handlers.
			
 
				+	
			
 
				+	The MIR code for filter and finally handlers looks like:
			
 
				+	
			
 
				+	    OP_START_HANDLER
			
 
				+	    ...
			
 
				+	    OP_END_FINALLY | OP_ENDFILTER(reg)
			
 
				+	
			
 
				+	OP_START_HANDLER: should save the stack pointer somewhere
			
 
				+	OP_END_FINALLY: restores the stack pointer and returns.
			
 
				+	OP_ENDFILTER (reg): restores the stack pointer and returns the value in "reg".
			
 
				+	
			
 
				 ** Calling finally/filter handlers 
			
 
				 
			
 
				-There is a special opcode to call those handler, its called OP_CALL_HANDLER. It
			
 
				-simple emits a call instruction.
			
 
				-
			
 
				-Its a bit more complex to call handler from outside (in the stack unwinding
			
 
				-code), because we have to restore the whole context of the method first. After that
			
 
				-we simply emit a call instruction to invoke the handler. Its usually
			
 
				-possible to use the same code to call filter and finally handlers (see
			
 
				-arch_get_call_filter).
			
 
				-
			
 
				+	There is a special opcode to call those handlers, called
			
 
				+	OP_CALL_HANDLER. It simply emits a call instruction.
			
 
				+	
			
 
				+	It is a bit more complex to call a handler from outside (in the
			
 
				+	stack unwinding code), because we have to restore the whole
			
 
				+	context of the method first. After that we simply emit a call
			
 
				+	instruction to invoke the handler. It is usually possible to use
			
 
				+	the same code to call filter and finally handlers (see
			
 
				+	arch_get_call_filter).
			
 
				+	
			
 
				 ** Calling catch handlers
			
 
				 
			
 
				-Catch handlers are always called from the stack unwinding code. Unlike finally clauses
			
 
				-or filters, catch handler never return. Instead we simply restore the whole
			
 
				-context, and restart execution at the catch handler.
			
 
				-
			
 
				+	Catch handlers are always called from the stack unwinding
			
 
				+	code. Unlike finally clauses or filters, catch handlers never
			
 
				+	return. Instead we simply restore the whole context, and
			
 
				+	restart execution at the catch handler.
			
 
				+	
			
 
				 ** Passing Exception objects to catch handlers and filters.
			
 
				 
			
 
				-We use a local variable to store exception objects. The stack unwinding code
			
 
				-must store the exception object into this variable before calling catch handler
			
 
				-or filter.
			
 
				-
			
 
				+	We use a local variable to store exception objects. The stack
			
 
				+	unwinding code must store the exception object into this
			
 
				+	variable before calling a catch handler or filter.
			
 
				+	
			
 
				 * Minor helper methods
			
 
				 
			
 
				-A few minor helper methods are referenced from the arch-independent code.
			
 
				-Some of them are:
			
 
				-
			
 
				-*) mono_arch_cpu_optimizations ()
			
 
				-	This function returns a mask of optimizations that should be enabled for the
			
 
				-	current CPU and a mask of optimizations that should be excluded, instead.
			
 
				-
			
 
				-*) mono_arch_regname ()
			
 
				-	Returns the name for a numeric register.
			
 
				-
			
 
				-*) mono_arch_get_allocatable_int_vars ()
			
 
				-	Returns a list of variables that can be allocated to the integer registers
			
 
				-	in the current architecture.
			
 
				-
			
 
				-*) mono_arch_get_global_int_regs ()
			
 
				-	Returns a list of caller-save registers that can be used to allocate variables 
			
 
				-	in the current method.
			
 
				-
			
 
				-*) mono_arch_instrument_mem_needs ()
			
 
				-*) mono_arch_instrument_prolog ()
			
 
				-*) mono_arch_instrument_epilog ()
			
 
				-	Functions needed to implement the profiling interface.
			
 
				-
			
 
				-
			
 
				+	A few minor helper methods are referenced from the arch-independent code.
			
 
				+	Some of them are:
			
 
				+	
			
 
				+	*) mono_arch_cpu_optimizations ()
			
 
				+		This function returns a mask of optimizations that
			
 
				+		should be enabled for the current CPU and a mask of
			
 
				+		optimizations that should be excluded, instead.
			
 
				+	
			
 
				+	*) mono_arch_regname ()
			
 
				+		Returns the name for a numeric register.
			
 
				+	
			
 
				+	*) mono_arch_get_allocatable_int_vars ()
			
 
				+		Returns a list of variables that can be allocated to
			
 
				+		the integer registers in the current architecture.
			
 
				+	
			
 
				+	*) mono_arch_get_global_int_regs ()
			
 
				+		Returns a list of callee-saved registers that can be
			
 
				+		used to allocate variables in the current method.
			
 
				+	
			
 
				+	*) mono_arch_instrument_mem_needs ()
			
 
				+	*) mono_arch_instrument_prolog ()
			
 
				+	*) mono_arch_instrument_epilog ()
			
 
				+		Functions needed to implement the profiling interface.
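To give an idea of how small these helpers usually are, here is a sketch of two of them. The function names, signatures and OPT_* flags below are hypothetical and do not match the real Mono declarations; the register names follow the standard x86 encoding order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical optimization flags, for illustration only. */
enum {
	OPT_PEEPHOLE = 1 << 0,
	OPT_CMOV     = 1 << 1,
	OPT_FCMOV    = 1 << 2
};

/* Map a numeric register to its name (x86 encoding order). */
static const char *
arch_regname (int reg)
{
	static const char *const names [] = {
		"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
	};

	if (reg >= 0 && reg < (int)(sizeof (names) / sizeof (names [0])))
		return names [reg];
	return "unknown";
}

/* Return the optimizations to enable for the current CPU and store in
 * *exclude_mask the ones that must stay off. has_cmov stands in for a
 * real CPU feature test such as a cpuid query. */
static unsigned
arch_cpu_optimizations (int has_cmov, unsigned *exclude_mask)
{
	unsigned opts = OPT_PEEPHOLE;

	assert (exclude_mask != NULL);
	*exclude_mask = 0;
	if (has_cmov)
		opts |= OPT_CMOV | OPT_FCMOV;
	else
		*exclude_mask |= OPT_CMOV | OPT_FCMOV;
	return opts;
}
```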
			
 
				+	
			
 
				+	
			
 
				 * Writing regression tests
			
 
				 
			
 
				-Regression tests for the JIT should be written for any bug found in the JIT
			
 
				-in one of the *.cs files in the mini directory. Eventually all the operations
			
 
				-of the JIT should be tested (including the ones that get selected only when 
			
 
				-some specific optimization is enabled).
			
 
				-
			
 
				+	Regression tests for the JIT should be written for any bug
			
 
				+	found in the JIT in one of the *.cs files in the mini
			
 
				+	directory. Eventually all the operations of the JIT should be
			
 
				+	tested (including the ones that get selected only when some
			
 
				+	specific optimization is enabled).
			
 
				+	
			
 
				 
			
 
				 * Platform specific optimizations
			
 
				 
			
 
				-An example of a platform-specific optimization is the peephole optimization:
			
 
				-we look at a small window of code at a time and we replace one or more 
			
 
				-instructions with others that perform better for the given architecture or CPU.
			
 
				-
			
 
				+	An example of a platform-specific optimization is the peephole
			
 
				+	optimization: we look at a small window of code at a time and
			
 
				+	we replace one or more instructions with others that perform
			
 
				+	better for the given architecture or CPU.
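A toy peephole pass over a made-up instruction list shows the idea. The Ins structure and the INS_* opcodes are hypothetical and are not Mono's instruction representation. The three rewrites are common x86-style examples; a real pass must also check that the flags clobbered by xor are not live at that point.

```c
#include <assert.h>

/* Hypothetical opcodes, for illustration only. */
typedef enum { INS_NOP, INS_MOVE, INS_ICONST, INS_XOR } Opcode;

/* Hypothetical instruction: dreg <- op (sreg, imm). */
typedef struct {
	Opcode op;
	int dreg;
	int sreg;
	int imm;
} Ins;

/* Look at a window of at most two instructions and rewrite patterns
 * that have a cheaper equivalent. Returns the number of rewrites. */
static int
peephole_pass (Ins *code, int n)
{
	int i, changed = 0;

	assert (n >= 0);
	for (i = 0; i < n; ++i) {
		Ins *ins = &code [i];

		if (ins->op == INS_ICONST && ins->imm == 0) {
			/* load of constant 0 -> xor reg, reg (shorter encoding) */
			ins->op = INS_XOR;
			ins->sreg = ins->dreg;
			changed++;
		} else if (ins->op == INS_MOVE && ins->dreg == ins->sreg) {
			/* mov reg, reg does nothing */
			ins->op = INS_NOP;
			changed++;
		} else if (ins->op == INS_MOVE && i > 0 &&
			   code [i - 1].op == INS_MOVE &&
			   code [i - 1].dreg == ins->sreg &&
			   code [i - 1].sreg == ins->dreg) {
			/* mov a, b followed by mov b, a: the second is redundant */
			ins->op = INS_NOP;
			changed++;
		}
	}
	return changed;
}
```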
			
 
				+