
Had to daemonize TFBReaper

The old implementation had a race condition with wait():
sometimes (often?) the first child process would exit
successfully and cause wait() to return, so TFBReaper exited
and left the remaining child processes orphaned.

The new implementation drops wait() in favor of simply
entering an infinite loop and relying on the suite to handle
the process cleanup.

The new implementation has an added wrinkle: TFBReaper
remained as a 'defunct' process, because it had been killed
and was then waiting for its parent (the TFB Python suite)
to reap it or exit. This left hundreds of defunct TFBReaper
processes in the process table as a full benchmark neared
conclusion (each test spawned a new, eventually defunct
TFBReaper). The fix for this issue is actually the original
problem - have TFBReaper fork a child process and exit the
parent.
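
The 'defunct' state described above can be reproduced in a few lines. The following is a minimal Python sketch, not part of this commit; `pid_exists` and `zombie_demo` are hypothetical names chosen for illustration. It shows that a child that has exited keeps its process-table entry until its parent calls a wait function on it.

```python
import os
import time


def pid_exists(pid):
    """Return True if a process-table entry for pid still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except OSError:
        return False
    return True


def zombie_demo():
    """Fork a child that exits at once, then reap it.

    Returns (entry_present_before_wait, entry_present_after_wait).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    time.sleep(0.2)  # the child has very likely exited by now and is defunct
    before = pid_exists(pid)
    os.waitpid(pid, 0)  # parent reaps the child, freeing its entry
    after = pid_exists(pid)
    return before, after
```

A parent that never calls `waitpid` (as the suite did not for each TFBReaper) leaves one such entry behind per child for as long as the parent keeps running.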

This causes TFBReaper's child process to become orphaned
and adopted by init(1), which by design reaps defunct
processes almost immediately.
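
The fork-and-exit step can be sketched in Python as well (the actual change is in C, in TFBReaper.c). This is an illustration under assumptions, not the commit's code: `spawn_detached` is a hypothetical name, and the pipe exists only so the caller can observe the result. The intermediate process forks a worker and exits; the worker then reports which process adopted it.

```python
import os
import time


def spawn_detached():
    """Fork an intermediate process that forks a worker and exits at once.

    Returns (intermediate_pid, worker_parent_pid_after_adoption).
    """
    read_fd, write_fd = os.pipe()
    intermediate_pid = os.fork()
    if intermediate_pid == 0:
        # Intermediate process: stands in for the TFBReaper parent.
        try:
            os.close(read_fd)
            original_parent = os.getpid()
            if os.fork() == 0:
                # Worker: poll (for at most 5 s) until it has been re-parented.
                deadline = time.time() + 5.0
                while os.getppid() == original_parent and time.time() < deadline:
                    time.sleep(0.01)
                os.write(write_fd, str(os.getppid()).encode())
        finally:
            os._exit(0)  # both the intermediate and the worker end here
    # Caller: reap the intermediate so it does not linger as defunct.
    os.close(write_fd)
    os.waitpid(intermediate_pid, 0)
    chunks = []
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    return intermediate_pid, int(b"".join(chunks).decode())
```

On a typical Linux system the second value is 1 (init), although inside a container or under a subreaper it may be a different PID. Either way, the worker is no longer a child of the process that launched it, so the launcher never has to reap it.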
msmith-techempower 8 years ago
commit baa8735e20
2 changed files with 23 additions and 10 deletions
  1. 6 6
      toolset/benchmark/benchmarker.py
  2. 17 4
      toolset/setup/linux/TFBReaper.c

+ 6 - 6
toolset/benchmark/benchmarker.py

@@ -693,13 +693,13 @@ class Benchmarker:
   # Stops all running tests
   ############################################################
   def __stop_test(self, ppid, out):
-    try:
-      subprocess.check_call(['pkill', '-P', str(ppid)], stderr=out, stdout=out)
-      retcode = 0
-    except Exception:
-      retcode = 1
+    # Find the PID of the daemon (and remove trailing newline)
+    pid = subprocess.check_output(['pgrep','TFBReaper']).strip()
+    # Kill the children
+    subprocess.call(['pkill', '-P', pid], stderr=out, stdout=out)
+    # Kill the parent
+    subprocess.call(['kill', pid], stderr=out, stdout=out)
 
 
-    return retcode
   ############################################################
   # End __stop_test
   ############################################################
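
The new `__stop_test` passes the raw `pgrep` output straight to `pkill` and `kill`, which assumes exactly one TFBReaper is running. A hedged sketch of a more defensive variant follows; `parse_pgrep_output` and `stop_reapers` are hypothetical helpers, not functions from the suite. It handles zero or several matching PIDs.

```python
import subprocess


def parse_pgrep_output(raw):
    """Split pgrep's newline-separated output (bytes or str) into PID strings."""
    if isinstance(raw, bytes):
        raw = raw.decode()
    return raw.split()


def stop_reapers(out=None, name='TFBReaper'):
    """Kill every process called name, children first. Returns the PIDs found."""
    try:
        raw = subprocess.check_output(['pgrep', name])
    except subprocess.CalledProcessError:
        return []  # pgrep exits non-zero when nothing matches
    pids = parse_pgrep_output(raw)
    for pid in pids:
        # Kill the children of this reaper, then the reaper itself.
        subprocess.call(['pkill', '-P', pid], stderr=out, stdout=out)
        subprocess.call(['kill', pid], stderr=out, stdout=out)
    return pids
```

Returning the list of PIDs lets a caller log what was stopped, or detect that no reaper was found, without raising.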

+ 17 - 4
toolset/setup/linux/TFBReaper.c

@@ -10,6 +10,20 @@
 
 
 int main(int argc, char *argv[])
 {
+  pid_t process_id = 0;
+  pid_t sid = 0;
+  // Create child process
+  process_id = fork();
+  // PARENT PROCESS. Need to kill it.
+  if (process_id > 0)
+  {
+    // Parent returns success in exit status
+    exit(0);
+  }
+
+  // Here we are as the child with no parent.
+
+  // Gather the command line arguments for the pass-through.
   int count = argc - 1;
   int *sizes = malloc(sizeof(int) * count);
   int total_size = 0;
@@ -47,10 +61,9 @@ int main(int argc, char *argv[])
   int ret = system(result);
   free(result);
 
 
-  // This tells the application to wait until all the child 
-  // processes have completed.
-  int status;
-  wait(&status);
+  // We need to wait forever; the suite will clean this child
+  // process up later.
+  for(;;){}
 
 
   return ret;
 }