- <HTML>
- <!-- $Id: sim.html,v 1.7 2007/08/27 09:57:35 dick Exp $ -->
- <HEAD>
- <TITLE>The software and text similarity tester SIM</TITLE>
- </HEAD>
- <BODY>
- <H1>The software and text similarity tester SIM</H1>
- <H2>
- <A HREF="http://www.cs.vu.nl/~dick/">Dick Grune</A>
- </H2>
- <A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/README.1st">SIM</A>
- tests lexical similarity in texts in C, Java, Pascal, Modula-2, Lisp, Miranda,
- and natural language.
- It is used
- <UL>
- <LI>
- to detect potentially duplicated code fragments in large software
- projects, in program text, in shell scripts and in documentation
- </LI>
- <LI>
- to detect plagiarism in software projects, educational and otherwise
- </LI>
- </UL>
- <P>
- SIM 2.19 is available as
- <A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/sim_2_19.shar">
- C sources</A>
- and as
- <A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/sim_2_19.zip">
- MSDOS binaries</A>.
- All files are also available from the ftp directory
- <A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester">
- ftp.cs.vu.nl:/pub/dick/similarity_tester</A>.
- There is a
- <A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/sim.pdf">
- Unix-style manual page</A>.
- </P>
- <P>
- The software similarity tester is very efficient and allows us to compare
- this year's students' work with that collected from many past years (much to
- the dismay of some, mostly non-CS, students).
- Students are told that their work is going to be compared, but some are
- non-believers ...
- </P>
- <P>
- The output of the similarity tester can be processed by a number of shell
- scripts written by Matty Huntjens
- (<A HREF="http://www.cs.vu.nl/~matty/">[email protected]</A>).
- These shell scripts take sim output and produce lists of suspect submissions,
- histograms and the like.
- The current version of these scripts is, however, strongly geared to the local
- situation at the
- <A HREF="http://www.vu.nl/">VU University Amsterdam</A>
- and is not very portable.
- </P>
- <P>
- We are not worried that students might try to tune their work to evade the
- similarity tester.
- We reckon that if they can do that, they can also do the exercise.
- </P>
- <P>
- Since this piece of handicraft does not qualify as research, there are no
- international papers on it.
- The work was described in Dutch in
- Dick Grune,
- Matty Huntjens,
- <A HREF="ftp://ftp.cs.vu.nl/pub/dick/publications/Het_detecteren_van_kopieen_bij_informatica-practica.ps">
- Het detecteren van kopieën bij informatica-practica</A>,
- Informatie,
- <STRONG>31</STRONG>,
- 11,
- Nov 1989,
- pp. 864-867
- (<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/artikel.lit">
- lit. ref.</A>).
- An
- <A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/Paper.ps">
- English translation
- </A>
- of the paper is also available.
- The ftp directory contains a terse
- <A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/TechnReport">
- technical report</A>
- about the internal workings of the program.
- </P>
- <H5>
- <HR>
- [<A HREF="CVS.html">Previous</A>]
- [<A HREF="mag.html">Next</A>]
- [<A HREF="http://www.cs.vu.nl/~dick/dick.html">Personal Page</A>]
- [<A HREF="http://www.cs.vu.nl/~dick/">Professional Page</A>]
- [<A HREF="http://www.cs.vu.nl/">CS</A>]
- [<A HREF="http://www.few.vu.nl/">Faculty</A>]
- [<A HREF="http://www.vu.nl/">VU University Amsterdam</A>]
- <HR>
- </H5>
- <ADDRESS>
- The software and text similarity tester SIM / Dick Grune /
- <A HREF="mailto:[email protected]">[email protected]</A>
- </ADDRESS>
- </BODY>
- </HTML>