Browse Source

* Also take into account the node complexity of parameters to determine
the order in which they are evaluated (except for LOC_REFERENCE
parameters on i386, because the code generator expects them in their
original order). This saves quite a lot of spilling and uses of
non-volatile registers when the parameters themselves also contain
calls

git-svn-id: trunk@8603 -

Jonas Maebe 18 years ago
parent
commit
93aad97c22
1 changed files with 33 additions and 11 deletions
  1. 33 11
      compiler/ncal.pas

+ 33 - 11
compiler/ncal.pas

@@ -2470,11 +2470,27 @@ implementation
             hpnext:=tcallparanode(hpcurr.right);
             hpnext:=tcallparanode(hpcurr.right);
             { pull in at the correct place.
             { pull in at the correct place.
               Used order:
               Used order:
-                1. LOC_REFERENCE with smallest offset (x86 only)
-                2. LOC_REFERENCE with most registers
-                3. LOC_REGISTER with most registers
+                1. LOC_REFERENCE with smallest offset (i386 only)
+                2. LOC_REFERENCE with most registers and least complexity (non-i386 only)
+                3. LOC_REFERENCE with least registers and most complexity (non-i386 only)
+                4. LOC_REGISTER with most registers and most complexity
+                5. LOC_REGISTER with least registers and least complexity
               For the moment we only look at the first parameter field. Combining it
               For the moment we only look at the first parameter field. Combining it
-              with multiple parameter fields will make things a lot complexer (PFV) }
+              with multiple parameter fields will make things a lot complexer (PFV)
+
+              The reason for the difference regarding complexity ordering
+              between LOC_REFERENCE and LOC_REGISTER is mainly for calls:
+              we first want to treat the LOC_REFERENCE destinations whose
+              calculation does not require a call, because their location
+              may contain registers which might otherwise have to be saved
+              if a call has to be evaluated first. The calculated value is
+              stored on the stack and will thus no longer occupy any
+              register.
+
+              Similarly, for the register parameters we first want to
+              evaluate the calls, because otherwise the already loaded
+              register parameters will have to be saved so the intermediate
+              call can be evaluated (JM) }
             if not assigned(hpcurr.parasym.paraloc[callerside].location) then
             if not assigned(hpcurr.parasym.paraloc[callerside].location) then
               internalerror(200412152);
               internalerror(200412152);
             currloc:=hpcurr.parasym.paraloc[callerside].location^.loc;
             currloc:=hpcurr.parasym.paraloc[callerside].location^.loc;
@@ -2497,23 +2513,29 @@ implementation
                               That means the for pushes the para with the
                               That means the for pushes the para with the
                               highest offset (see para3) needs to be pushed first
                               highest offset (see para3) needs to be pushed first
                             }
                             }
-                            if (hpcurr.registersint>hp.registersint)
-{$ifdef x86}
-                               or (hpcurr.parasym.paraloc[callerside].location^.reference.offset>hp.parasym.paraloc[callerside].location^.reference.offset)
-{$endif x86}
-                               then
+{$ifdef i386}
+                            { the i386 code generator expects all reference }
+                            { parameter to be in this order so it can use   }
+                            { pushes                                        }
+                            if (hpcurr.parasym.paraloc[callerside].location^.reference.offset>hp.parasym.paraloc[callerside].location^.reference.offset) then
+{$else i386}
+                            if (hpcurr.registersint>hp.registersint) or
+                               (node_complexity(hpcurr)<node_complexity(hp))
+{$endif i386}
                               break;
                               break;
                           end;
                           end;
+                        LOC_MMREGISTER,
                         LOC_REGISTER,
                         LOC_REGISTER,
                         LOC_FPUREGISTER :
                         LOC_FPUREGISTER :
                           break;
                           break;
                       end;
                       end;
                     end;
                     end;
+                  LOC_MMREGISTER,
                   LOC_FPUREGISTER,
                   LOC_FPUREGISTER,
                   LOC_REGISTER :
                   LOC_REGISTER :
                     begin
                     begin
-                      if (hp.parasym.paraloc[callerside].location^.loc=currloc) and
-                         (hpcurr.registersint>hp.registersint) then
+                      if (hp.parasym.paraloc[callerside].location^.loc<>LOC_REFERENCE) and
+                         (node_complexity(hpcurr)>node_complexity(hp)) then
                         break;
                         break;
                     end;
                     end;
                 end;
                 end;