<HTML>
<!-- $Id: sim.html,v 1.7 2007/08/27 09:57:35 dick Exp $ -->
<HEAD>
<TITLE>The software and text similarity tester SIM</TITLE>
</HEAD>
<BODY>
<H1>The software and text similarity tester SIM</H1>
<H2>
<A HREF="http://www.cs.vu.nl/~dick/">Dick Grune</A>
</H2>
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/README.1st">SIM</A>
tests lexical similarity in texts in C, Java, Pascal, Modula-2, Lisp, Miranda,
and natural language.
It is used
<UL>
<LI>
to detect potentially duplicated code fragments in large software
projects, in program text, in shell scripts and in documentation
</LI>
<LI>
to detect plagiarism in software projects, educational and otherwise
</LI>
</UL>
<P>
SIM 2.19 is available as
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/sim_2_19.shar">
C sources</A>
and as
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/sim_2_19.zip">
MSDOS binaries</A>.
It is also available through ftp; the directory is
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester">
ftp.cs.vu.nl:/pub/dick/similarity_tester</A>.
There is a
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/sim.pdf">
Unix-style manual page</A>.
</P>
<P>
The software similarity tester is very efficient and allows us to compare
this year's students' work with that collected from many past years (much to
the dismay of some, mostly non-CS, students).
Students are told that their work is going to be compared, but some are
non-believers ...
</P>
<P>
The output of the similarity tester can be processed by a number of shell
scripts by Matty Huntjens
(<A HREF="http://www.cs.vu.nl/~matty/">[email protected]</A>).
These shell scripts take sim output and produce lists of suspect submissions,
histograms and the like.
The present version of these scripts is very much geared to the local
situation at the
<A HREF="http://www.vu.nl/">VU University Amsterdam</A>,
though; they are low on portability.
</P>
<P>
We are not afraid that students would try to tune their work to the
similarity tester.
We reckon that if they can do that, they can also do the exercise.
</P>
<P>
Since this piece of handicraft does not qualify as research, there are no
international papers on it.
The work was described in Dutch in
Dick Grune,
Matty Huntjens,
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/publications/Het_detecteren_van_kopieen_bij_informatica-practica.ps">
Het detecteren van kopie&euml;n bij informatica-practica</A>,
Informatie,
<STRONG>31</STRONG>,
11,
Nov 1989,
pp. 864-867
(<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/artikel.lit">
lit. ref.</A>).
An
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/Paper.ps">
English translation
</A>
of the paper is also available.
The ftp directory contains a terse
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/TechnReport">
technical report</A>
about the internal workings of the program.
</P>
<H5>
<HR>
[<A HREF="CVS.html">Previous</A>]
[<A HREF="mag.html">Next</A>]
[<A HREF="http://www.cs.vu.nl/~dick/dick.html">Personal Page</A>]
[<A HREF="http://www.cs.vu.nl/~dick/">Professional Page</A>]
[<A HREF="http://www.cs.vu.nl/">CS</A>]
[<A HREF="http://www.few.vu.nl/">Faculty</A>]
[<A HREF="http://www.vu.nl/">VU University Amsterdam</A>]
<HR>
</H5>
<ADDRESS>
The software and text similarity tester SIM / Dick Grune /
<A HREF="mailto:[email protected]">[email protected]</A>
</ADDRESS>
</BODY>
</HTML>