User Commands                                                SIM(1)

NAME
     sim - find similarities in C, Java, Pascal, Modula-2, Lisp,
     Miranda or text files

SYNOPSIS
     sim_c [ -[defFnpsS] -r N -w N -o F ] file ... [ / [ file ... ] ]
     sim_c ...
     sim_java ...
     sim_pasc ...
     sim_m2 ...
     sim_lisp ...
     sim_mira ...
     sim_text ...

DESCRIPTION
     Sim_c reads the C files file ... and looks for pieces of text
     that are similar; two pieces of program text are similar if
     they differ only in layout, comments, identifiers and the
     contents of numbers, strings and characters. If any runs of
     sufficient length are found, they are reported on standard
     output; the number of significant tokens in the run is given
     between square brackets.
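
     As a rough illustration of the comparison rule above, the
     following Python sketch reduces C-like text to a token stream
     in which comments and layout are dropped and every identifier,
     number, string and character literal is replaced by a
     placeholder. The regular expression, the keyword list, the
     placeholder names and the function normalize are assumptions
     made for this example; they are not sim's actual lexical
     analyzer.

```python
import re

# Hypothetical, simplified lexer for C-like text; not sim's real one.
TOKEN_RE = re.compile(r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)      # comments are dropped
  | (?P<string>"(?:\\.|[^"\\])*")        # string literal
  | (?P<char>'(?:\\.|[^'\\])*')          # character literal
  | (?P<number>\d+(?:\.\d+)?)            # number
  | (?P<ident>[A-Za-z_]\w*)              # identifier or keyword
  | (?P<space>\s+)                       # layout is dropped
  | (?P<punct>.)                         # any other single character
""", re.VERBOSE | re.DOTALL)

# Assumed, deliberately incomplete keyword list for the example.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char"}


def normalize(text):
    """Return the token list with identifiers and literal contents erased."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue
        if kind == "ident":
            tokens.append(value if value in KEYWORDS else "ID")
        elif kind in ("string", "char", "number"):
            tokens.append(kind.upper())
        else:
            tokens.append(value)
    return tokens


a = "int total = 0; /* sum */ for (i = 0; i < n; i++) total += v[i];"
b = "int s=1;\nfor (k=2; k<len; k++)\n    s += data[k];  // other names"
print(normalize(a) == normalize(b))
```

     The two fragments differ in names, constants, comments and
     layout only, so their normalized token streams compare equal.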

     Sim_java does the same for Java, sim_pasc for Pascal, sim_m2
     for Modula-2, sim_lisp for Lisp, and sim_mira for Miranda.
     Sim_text works on arbitrary text; it is occasionally useful
     on shell scripts.

     The program can be used for finding copied pieces of code in
     purportedly unrelated programs (with -s or -S), or for
     finding accidentally duplicated code in larger projects (with
     -f).

     If a / is present among the input files, the files are
     divided into a group of "new" files (before the /) and a
     group of "old" files (after the /); if there is no /, all
     files are "new". Old files are never compared to each other.
     Since the similarity tester reads the files several times, it
     cannot read from standard input.
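
     The grouping rule can be sketched in Python as follows. The
     function names split_groups and comparison_pairs and the file
     names are hypothetical, and the comparison of a file with
     itself is left out for brevity; the sketch only shows which
     pairs of distinct files are eligible.

```python
from itertools import combinations


def split_groups(args):
    """Split file arguments at the first "/" into (new, old) lists."""
    if "/" in args:
        cut = args.index("/")
        return args[:cut], args[cut + 1:]
    return list(args), []


def comparison_pairs(new, old):
    """Pairs of distinct files that may be compared: new-new and new-old."""
    pairs = list(combinations(new, 2))
    pairs += [(n, o) for n in new for o in old]
    return pairs


new, old = split_groups(["a.c", "b.c", "/", "x.c", "y.c"])
for pair in comparison_pairs(new, old):
    print(pair)
```

     The pair (x.c, y.c) never appears, because both files are in
     the "old" group.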

     The following options are available:

     -d     The output is in a diff(1)-like format instead of the
            default 2-column format.

     -e     Each file is compared to each file in isolation; this
            will find all similarities between all texts involved,
            regardless of duplicates.

     -f     Runs are restricted to pieces with balancing
            parentheses, to isolate potential functions (C, Java,
            Pascal, Modula-2 and Lisp only).

     -F     The names of functions in calls are required to match
            exactly (C, Java, Pascal, Modula-2 and Lisp only).

     -n     Similarities found are only summarized, not displayed.

     -o F   The output is written to the file named F.

     -p     The output is given in similarity percentages; see
            below.

     -r N   The minimum run length is set to N (default is N = 24).

     -s     The contents of a file are not compared to itself (-s =
            not self).

     -S     The contents of the new files are compared to the old
            files only, not to each other.

     -w N   The page width used is set to N columns (default is N =
            80).
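
     The manual does not say how -f restricts a run. One plausible
     reading, shown here purely as an assumption, is that a run of
     tokens is cut back to its longest prefix in which all brackets
     balance. The function longest_balanced_prefix and the choice
     of which characters count as brackets are hypothetical.

```python
# Assumed bracket pairs; the manual only speaks of "parentheses".
OPEN = {"(": ")", "[": "]", "{": "}"}
CLOSE = set(OPEN.values())


def longest_balanced_prefix(tokens):
    """Length of the longest prefix of tokens whose brackets balance."""
    stack = []
    best = 0
    for i, tok in enumerate(tokens):
        if tok in OPEN:
            stack.append(OPEN[tok])
        elif tok in CLOSE:
            if not stack or stack.pop() != tok:
                break            # a closer without a matching opener ends the piece
        if not stack:
            best = i + 1         # everything up to and including i is balanced
    return best


run = ["ID", "(", "ID", ")", "{", "ID", "=", "NUMBER", ";", "}", "ID", "("]
print(run[:longest_balanced_prefix(run)])
```

     In this example the trailing unmatched "(" is dropped from the
     run, while the balanced body before it is kept.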

     The -p option results in lines of the form

          F consists for x % of G material

     meaning that x % of F's text can also be found in G. Note
     that this relation is not symmetric; it is in fact quite
     possible for one file to consist for 100 % of text from
     another file, while the other file consists for only 1 % of
     text of the first file, if their lengths differ enough. Note
     also that the granularity of the recognized text is still
     governed by the -r option or its default.
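
     The asymmetry can be reproduced with a small Python sketch.
     The function percentage, its run parameter (standing in for
     the -r value) and the token lists are assumptions for
     illustration; sim's own computation may differ in detail.

```python
def percentage(f_tokens, g_tokens, run=3):
    """Percentage of f_tokens covered by runs of `run` tokens also found in g_tokens."""
    if not f_tokens:
        return 0
    # Every window of `run` consecutive tokens that occurs in G.
    g_runs = {tuple(g_tokens[i:i + run])
              for i in range(len(g_tokens) - run + 1)}
    covered = [False] * len(f_tokens)
    for i in range(len(f_tokens) - run + 1):
        if tuple(f_tokens[i:i + run]) in g_runs:
            for j in range(i, i + run):
                covered[j] = True
    return 100 * sum(covered) // len(f_tokens)


small = ["a", "b", "c", "d"]
large = small + [str(i) for i in range(96)]   # 100 tokens in total

print(percentage(small, large))   # share of small that is found in large
print(percentage(large, small))   # share of large that is found in small
```

     Here the small file consists for 100 % of material from the
     large one, while the large file consists for only 4 % of
     material from the small one.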

     Care has been taken to keep all internal processes linear in
     the length of the input, with the exception of the matching
     process, which is almost linear thanks to a hash table;
     various other tables are used for speed-up. If, however,
     there is not enough memory for the tables, they are discarded
     in order of unimportance, under which conditions the
     algorithms revert to their quadratic nature.
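
     The difference between table-driven and quadratic matching
     can be sketched in Python. Both functions below are
     hypothetical illustrations, not sim's implementation: the
     first indexes every window of min_run tokens in a dictionary
     so that equal windows are found by lookup, the second compares
     every pair of start positions directly.

```python
from collections import defaultdict


def candidate_matches(tokens, min_run=4):
    """Return (i, j) pairs, i < j, where identical windows of min_run tokens start."""
    seen = defaultdict(list)      # window -> earlier start positions
    matches = []
    for j in range(len(tokens) - min_run + 1):
        window = tuple(tokens[j:j + min_run])
        for i in seen[window]:
            matches.append((i, j))
        seen[window].append(j)
    return matches


def candidate_matches_quadratic(tokens, min_run=4):
    """Same result without a table: compare every pair of start positions."""
    last = len(tokens) - min_run + 1
    return [(i, j) for j in range(last) for i in range(j)
            if tokens[i:i + min_run] == tokens[j:j + min_run]]


tokens = "a b c d x a b c d y".split()
print(candidate_matches(tokens))
print(candidate_matches_quadratic(tokens))
```

     Both calls report the same start positions; only the amount
     of work differs. With the table, each window costs one lookup
     as long as few windows repeat, whereas the fallback examines
     all pairs of positions.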

AUTHOR
     Dick Grune, Vrije Universiteit, Amsterdam.

BUGS
     Strong periodicity in the input text (like a table of N
     almost identical lines) causes problems. Sim tries to cope
     with this but cannot avoid giving approximately log N
     messages about it. The best advice is still to take the
     offending files out of the game.

     Since it uses lex(1) on some systems, it may dump core on
     any weird construction that overflows lex's internal
     buffers.

Vrije Universiteit          Last change: 2001/11/13