  1. 2007-08-23 Dick Grune <[email protected]>
  2. LICENSE.txt added.
  3. 2006-11-27 Dick Grune <[email protected]>
  4. Removal of setbuff() for compatibility.
  5. 2005-01-17 Dick Grune <[email protected]>
  6. Corrections by Jerry James <[email protected]>; ANSIizing, etc.
  7. 2004-08-05 Dick Grune <[email protected]>
  8. Finished the 'percentage' option.
  9. 08-Nov-2001 Dick Grune
  10. Begun to add a 'percentage' option, which will express the
  11. similarity between two files in percents.
  12. 27-Sep-2001 Dick Grune
  13. Split add_run() off from compare.c into add_run.c, to accommodate
  14. different add_run()s, for different types of processing.
  15. 27-Nov-1998 Dick Grune
  16. Installed a Miranda version supplied by Emma Norling ([email protected])
  17. 23-Feb-1998 Dick Grune
  18. Renamed text.l to textlang.l for uniformity and to make room for
  19. a possible module text.[ch].
  20. Isolated a module for handling the token array from buff.[ch] to
  21. tokenarray.[ch], and renamed buff.[ch] to text.[ch].
  22. 23-Feb-1998 Dick Grune
  23. There is probably not much point in abandoning the nl_buff list
  24. when running out of memory for TokenArray[]: each token costs 1
  25. byte for the token and 4 bytes for the entry in
  26. forward_references[], a total of 5 bytes. There are about 3
  27. tokens to a line, together requiring 15 bytes, plus 1 byte in
  28. nl_buff yields 16 bytes. So releasing nl_buff frees only 1/16 =
  29. 6.25 % of memory.
  30. Since the code is a bother, I removed it. Note that nl_buff is
  31. still abandoned when the number of tokens in a line does not fit
  32. in one unsigned char (but that is not very likely to happen).
  33. 21-Feb-1998 Dick Grune
  34. Printing got into an infinite loop when the last line of the
  35. input was not terminated by a newline AND contained tokens that
  36. were included in a matching run.
  37. This was due to a double bug: 1. the non-terminated line was not
  38. registered properly in NextTextTokenObtained() / CloseText(),
  39. and 2. the loop in pass 2 which sets the values of
  40. pos->ps_nl_cnt was terminated prematurely when the file turned
  41. out to be shorter than the list of pos-es indicated.
  42. Both bugs were corrected, the first by supplying an extra
  43. newline in CloseText() when one is found missing, and the second
  44. by rewriting the list-parallel loop in pass 2.
  45. 02-Feb-1998 Dick Grune
  46. Pascal does not differentiate between strings and characters
  47. (strings of one character); this difference has been removed
  48. from pascallang.l.
  49. 22-Jan-1998 Dick Grune
  50. Detection of non-ASCII characters added. Since the lexical
  51. analyser itself generates non-ASCII characters, the test must occur
  52. earlier. We could replace the input routine of lex by a
  53. checking routine, but with several lex-es going around, we want
  54. a more lex-independent solution. To allow each language its own
  55. restrictions about non-ASCII characters, the check is
  56. implemented in the *lang.l files.
  57. 28-Nov-1997 Dick Grune
  58. Changed the name of the C similarity tester 'sim' to 'sim_c', for
  59. uniformity with sim_java, etc.
  60. 23-Nov-1997 Dick Grune
  61. Java version finished; checked by Matty Huntjens and crew.
  62. 24-Jun-1997 Dick Grune
  63. Started on a Java version, by copying the C version.
  64. 22-Jun-1997 Dick Grune
  65. Modern lexical analysers, among which flex, read the entire input into
  66. a buffer before they issue the first token. As a result, ftell() no
  67. longer gives a usable indication of the position of a token in a file.
  68. This pulls the rug from under the nl_buff mechanism in buff.c, which
  69. is removed. We lose a valuable optimization this way, but there just
  70. seems to be no way to keep it.
  71. Note that this has nothing to do with the problem in MS-DOS of
  72. character count and fseek position not being synchronized. That
  73. problem has been solved on June 14, 1991 (which see) and the code has
  74. been running OK since.
  75. 18-Jun-1997 Dick Grune
  76. The thought has occurred to use McCreight's linear longest common
  77. substring algorithm rather than the existing algorithm, which has a
  78. small quadratic component. There are a couple of problems with this:
  79. 1. We need the longest >non-overlapping< common substring;
  80. McCreight provides just the longest. It is not at all clear
  81. how to modify the algorithm.
  82. 2. Once we have found our LCS, we want to find the
  83. one-but-longest; it is far from obvious how to do that in
  84. McCreight's algorithm.
  85. 3. Once we have found our LCS, we want to take one of its
  86. copies out of the game, to suppress duplicate messages.
  87. Again, it is difficult to see how to do that, without
  88. redoing all the calculations.
  89. 4. McCreight's algorithm seems to require about two binary
  90. tree nodes per token, say 8 bytes, which is double what we
  91. use now.
  92. 17-Jun-1997 Dick Grune
  93. Did some experimenting with the hash function; it is still
  94. pretty bad: the simple-minded second sweep through
  95. forward_references easily removes another 80-99% of false hits.
  96. Next, a third sweep that does a full comparison will remove another
  97. large percentage.
  98. So I have left in the second sweep in all cases.
  99. There are a couple of questions here:
  100. 1. Can we find a better hash function, or will we forever need a
  101. second sweep?
  102. 2. Does it actually matter, or will we lose on more expensive
  103. hashing what we gain by having a better set of forward
  104. references in compare.c?
  105. 16-Jun-1997 Dick Grune
  106. Cleaned up sim.h and renamed aiso.[ch] to runs.[ch] since they
  107. are instantiations of the aiso module concerned with runs.
  108. Aiso.[spc|bdy] stays aiso.[spc|bdy], of course.
  109. 16-Jun-1997 Dick Grune
  110. Redid largest_function() in algollike.c.
  111. Corrected bug in CheckRun; it now always removes NonFinals from
  112. the end, even when it has first applied largest_function().
  113. 15-Jun-1997 Dick Grune
  114. Reorganized the layers around the input file. There were and
  115. still are three layers: lang, stream and buff.
  116. Since the lex_X variables are hoisted unchanged through the levels
  117. lang, stream, and buff, to be used by pass1, pass2, etc., they
  118. have to be placed in a module of their own.
  119. The token-providing module 'lang' has three interfaces:
  120. - lang.h, which provides access to the lowest-level token
  121. routines, to be used by the next level.
  122. - lex.h, which provides the lex variables, to be used by
  123. all and sundry.
  124. - language.h, which provides language-specific info about
  125. tokens, concerning their suitability as initial
  126. and final tokens, to be used by higher levels.
  127. This structure is not satisfactory, but it is also unreasonable
  128. to combine them in one interface.
  129. There is no single lang.c; rather it is represented by the
  130. various Xlang.c files generated from the Xlang.l files.
  131. 14-Jun-1997 Dick Grune
  132. Added a Makefile zip entry to parallel the shar entry.
  133. 13-Jun-1997 Dick Grune
  134. A number of simplifications, in view of better software and bigger
  135. machines:
  136. - Removed good_realloc from hash.c; I don't think there are
  137. any bad reallocs left.
  138. - Removed the option to run without forward_references.
  139. On a 16Mb machine this means you have at least 2M tokens;
  140. using a quadratic algorithm will take 4*10^6 sec. at an
  141. impossible rate of 1M actions/sec., which is some 50 days.
  142. Forget it.
  143. - Renamed lang() to print_stream(), and incorporated it in sim.c
  144. - Removed the MSDOS subdirectory mechanism in the Makefile.
  145. - Removed the funny and sneaky double parameter expansion in
  146. the call of idf_in_list().
  147. 12-Jun-1997 Dick Grune
  148. Converted to ANSI C. Removed cport.h.
  149. 09-Jan-1995 Dick Grune
  150. Decided not to do directories: they usually contain extraneous
  151. files and doing sim * is simple enough anyway.
  152. 09-Sep-1994 Dick Grune
  153. Added system.h to cater for the (few) differences between Unix and DOS.
  154. The #define int32 is also supplied there.
  155. 05-Sep-1994 Dick Grune
  156. Added many prototype declarations using cport.h.
  157. Added a depend entry to the Makefile.
  158. 31-Aug-1994 Dick Grune
  159. All these changes require a 32 bit integer; introduced a #define
  160. int32, set from the command line in the Makefile.
  161. 25-Aug-1994 Dick Grune
  162. It turned out that one of the most often called routines was .rem,
  163. from idf_hashed() in idf.c. Moving the % out of the loop shaved off
  164. another 6% and reduced the time to 18.4 sec.
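The idea of hoisting the % can be sketched as below. This is only an illustration: the function names and the multiplier 31 are assumptions, and the real idf_hashed() in idf.c is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "before": one remainder operation per character. */
unsigned hash_mod_inside(const char *s, unsigned n)
{
    unsigned h = 0;
    assert(n > 0);
    while (*s)
        h = (h * 31 + (unsigned char)*s++) % n;
    return h;
}

/* Hypothetical "after": accumulate in unsigned arithmetic (which wraps
   around harmlessly) and take the remainder once per identifier. */
unsigned hash_mod_outside(const char *s, unsigned n)
{
    unsigned h = 0;
    assert(n > 0);
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % n;
}
```

For short identifiers the two give the same value; for long ones the wrap-around may make them differ, but both stay in the range 0 .. n-1, which is all a hash table needs.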
  165. 19-Aug-1994 Dick Grune
  166. With very large files (e.g., concatenated /usr/man/man1/*) the fixed
  167. built-in hash table size of 10639 is no longer satisfactory. Hash.c
  168. now finds a prime about 8 times smaller than the text_size to use
  169. for hash table size; this achieves optimal speed-up without gobbling
  170. up too much memory. Reduced the time for the above file from 30.2
  171. sec. to 19.6 sec.
  172. For checking, the same test was run with all hashing off; it took
  173. 20h 27m 19s = 73639 sec. But it worked.
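A minimal sketch of picking such a table size, assuming that "a prime about 8 times smaller" may be read as "the largest prime not exceeding text_size / 8". The helper names are hypothetical and the search in hash.c may well differ.

```c
#include <assert.h>
#include <stddef.h>

/* Trial division; good enough for a one-off size computation. */
static int is_prime(size_t n)
{
    size_t d;
    if (n < 2)
        return 0;
    for (d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Hypothetical helper: largest prime <= text_size / 8, but never below 2. */
size_t hash_table_size(size_t text_size)
{
    size_t n = text_size / 8;
    while (n > 2 && !is_prime(n))
        n--;
    if (n < 2)
        n = 2;
    assert(is_prime(n));
    return n;
}
```

With this reading, a text of 85112 tokens gets the old fixed size of 10639 back, since 85112 / 8 = 10639 is itself prime.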
  174. 11-Aug-1994 Dick Grune
  175. For large values of MinRunSize (>1000) a large part of the time
  176. (>two-thirds) was spent in calculating the hash values for each
  177. position in the input, since the cost of this calculation was
  178. proportional to MinRunSize. We now sample a maximum of 24 tokens
  179. from the input string to calculate the hash value, and avoid
  180. overflow. On my workstation, this reduces the time for
  181. sim_text -r 1000 -n /usr/man/man1/*
  182. from 60 sec to 21 sec.
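The sampling can be sketched as follows. Only the cap of 24 comes from the entry above; the even spacing of the samples, the multiplier 31 and the name sample_hash() are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_SAMPLES 24    /* the maximum named in the entry above */

/* Hypothetical: hash the run of min_run_size tokens starting at tk[0],
   looking at no more than N_SAMPLES of them, so the cost no longer grows
   with min_run_size. Unsigned arithmetic wraps around, so overflow is
   harmless. */
unsigned long sample_hash(const unsigned char *tk, size_t min_run_size)
{
    size_t n = min_run_size < N_SAMPLES ? min_run_size : N_SAMPLES;
    unsigned long h = 0;
    size_t i;

    assert(tk != NULL);
    for (i = 0; i < n; i++) {
        size_t pos = i * min_run_size / n;  /* spread samples over the run */
        h = h * 31 + tk[pos];
    }
    return h;
}
```

The price is more false hits: two runs that differ only in unsampled tokens get the same hash value, and have to be told apart by the full comparison later.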
  183. 30-Jun-1992 Dick Grune, room R4.40, tel. 5778
  184. There was an amazing bug in buff.c where NextTextToken() for pass 2
  185. omitted to set lex_token to EOL when retrieving newline info from
  186. nl_buff. Worked until now!?!
  187. 23-Sep-1991 Dick Grune
  188. Cport.h introduced, CONST and *.spc only.
  189. 17-Sep-1991 Dick Grune
  190. The position-sorting routine in pass2.c has been made into a
  191. separate generic module.
  192. 14-Jun-1991 Dick Grune ([email protected]) at dick.cs.vu.nl
  193. Replaced the determination of the input position through counting
  194. input characters by calls of ftell(); this is cleaner and the other
  195. method will never work on MSDOS.
  196. 30-May-1989 Dick Grune (dick) at dick
  197. Replaced the old top-100 module (which had been extended to top-10000
  198. already anyway) by the new aiso (arbitrary-in sorted-out) module.
  199. This caused a considerable speed-up on the Mod2 test bed:
  200. %time cumsecs #call ms/call name
  201. 17.9 99.20 7209 13.76 _InsertTop
  202. 0.3 1.37 7209 0.19 _InsertAiso
  203. It turns out that malloc() is not a serious problem, so no special
  204. version for the aiso module is required.
  205. 23-May-1989 Dick Grune (dick) at dick
  206. No more uncommented comment at the end of preprocessor lines, to
  207. conform to ANSI C.
  208. 23-May-1989 Dick Grune (dick) at dick
  209. Added code in the X.l files to (silently) reject characters over 0200.
  210. This does not really help, since lex stops on null chars. Ah, well.
  211. 19-May-1989 Dick Grune (dick) at dick
  212. Made the token as handled by sim into an abstract data type, for
  213. aesthetic reasons. Sign extension is still a problem.
  214. 03-May-1989 Dick Grune (dick) at dick
  215. Optimized lcs() by first checking from the end if a sufficiently long
  216. run is present; if in fact only the first 12 tokens match, chances
  217. are good that you can reject the run right away by first testing
  218. the 20th token, then the 19th, and so on.
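The end-first test can be sketched like this. It is a hypothetical stand-alone helper, not the actual lcs(); with min_run set to 20 it tests the 20th token first, then the 19th, and so on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: returns 1 if the first min_run tokens of a and b are
   equal. The comparison runs backwards, so a run that matches only in
   its first few tokens is rejected after very few comparisons. */
int run_is_long_enough(const unsigned char *a, const unsigned char *b,
                       size_t min_run)
{
    size_t i = min_run;

    assert(a != NULL && b != NULL);
    while (i-- > 0) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}
```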
  219. 21-Apr-1989 Dick Grune (dick) at dick
  220. A run of sim_m2 finding 7209 similarities raised the question of
  221. the appropriateness of the linear sort in sort_pos(). Profiling
  222. showed that in this case sorting takes all of 7.5 % of the total
  223. time. Putting the word 'register' in the right places in
  224. sort_pos() lowered this number to 4.6%.
  225. 20-Apr-1989 Dick Grune (dick) at dick
  226. Moved the test for MayBeStartOfRun() from compare.c (where it is
  227. done again and again) to hash.c, where its effect is incorporated in
  228. the forward reference chain.
  229. 14-Apr-1989 Dick Grune (dick) at dick
  230. Replaced elem_of() by bit tables, headers[] and trailers[], to be
  231. prefilled from Headers[] and Trailers[] by a call of
  232. InitLanguage(). This saves a few percents.
  233. 13-Apr-1989 Dick Grune (dick) at dick
  234. Implemented the -e and the -S option, by putting yet another loop
  235. in compare.c
  236. 13-Apr-1989 Dick Grune (dick) at dick
  237. The -- option (displaying the tokens) will now handle more than one
  238. file.
  239. 20-Jan-1989 Dick Grune (dick) at dick
  240. After the modification of 19-Dec-88, 12% of the time went into
  241. updating the positions in the chunks, as they were produced by the
  242. matching process. This matching process identifies runs (matches)
  243. by token position, which has to be recalculated to lseek positions
  244. and line numbers. To this end the files are read again, and for
  245. each line all positions found were checked to see if they applied
  246. to this line; this was an awfully stupid algorithm, but since much
  247. more time was spent elsewhere, it did not really matter. With all
  248. the saving below, however, it had risen to second position, after
  249. yylook() with 35%.
  250. The solution was to sort the positions in the same order in which
  251. they would be met by the reading of the files. The process is then
  252. linear. This required some extensive hacking in pass2.c
  253. 06-Jan-1989 Dick Grune (dick) at dick
  254. The modification below did indeed save 25%. The newline information
  255. is now reduced to 2 shorts; 2 chars were not enough, since some
  256. lines are longer than 127 bytes, and a char and a short together
  257. take as much room as two shorts.
  258. 19-Dec-1988 Dick Grune (dick) at dick
  259. To avoid reading the files twice (which is still taking 25% of the
  260. time), the first pass will now collect newline information for the
  261. second pass in a buffer called nl_buff[]. This buffer, and the
  262. original token buffer now named TokenArray[], are managed by the file
  263. buff.c, which implements a layer between stream.h and pass?.c. This
  264. layer provides OpenText(), NextTextToken() and CloseText(), each
  265. with a parameter telling which pass it is.
  266. 06-Dec-1988 Dick Grune (dick) at dick
  267. As an introduction to removing the second pass altogether, the
  268. first and second scan were unified, i.e., their input is identical.
  269. This also means that the call sim -[12] has now been replaced by
  270. one call: sim --.
  271. 23-Sep-1988 Dick Grune (dick) at dick
  272. Dynamic allocation of line buffers in pass 3. This removes the
  273. restriction on the page width.
  274. 22-Sep-1988 Dick Grune (dick) at dick
  275. In order to give better messages on incorrect calls to sim, the
  276. whole option handling has been concentrated in a file option.c and
  277. separated from the options and their messages themselves. See sim.c
  278. 07-Sep-1988 Dick Grune (dick) at dick
  279. For long text sequences (say hundreds of thousands of tokens),
  280. the hashing is not really efficient any more since too many
  281. spurious matches occur. Therefore, the forward reference table is
  282. scanned a second time, eliminating from any chain all references to
  283. runs that do not end in the same token. For the UNIX manuals this
  284. reduced the number of matches from 91.9% to 1.9% (of which 0.06%
  285. were genuine).
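A sketch of such a second scan, under assumed data structures: forw[i] holds the next position whose run has the same hash value as the run at i, 0 means "no further reference", and every position on a chain has a full run of min_run tokens after it. The names and the layout are hypothetical and need not match hash.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical second sweep: make each forw[i] skip all chained positions
   whose run does not end in the same token as the run starting at i. */
void prune_forward_references(const unsigned char *tk, size_t *forw,
                              size_t n_tokens, size_t min_run)
{
    size_t i;

    assert(min_run > 0);
    for (i = 0; i + min_run <= n_tokens; i++) {
        size_t j = forw[i];
        while (j != 0 && tk[i + min_run - 1] != tk[j + min_run - 1])
            j = forw[j];    /* follow the chain past the false hit */
        forw[i] = j;
    }
}
```

Position 0 can serve as the "none" marker here because a forward reference always points to a later position.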
  286. 30-Aug-1988 Dick Grune (dick) at dick
  287. For compatibility, NextTop has been rewritten to yield true or
  288. false and to accept a pointer to a run as a parameter.
  289. 30-Aug-1988 Dick Grune (dick) at dick
  290. When trying to find line-number and lseek position to beginnings
  291. and ends of runs found, the whole set of runs was scanned for each
  292. line in each file. Now only the runs belonging to that file are
  293. scanned; to this end another linked list has been braided through
  294. the data structures (tx_chunk).
  295. 30-Aug-1988 Dick Grune (dick) at dick
  296. The longest-common-substring algorithm was called much too often,
  297. mainly because the forward references made by hashing suffered from
  298. pollution. If you have say 1000 tokens and a hash range of say
  299. 10000, about 5 % of the hashings will be false matches, i.e. 50
  300. matches, which is quite a lot on a natural number of 2 to 3 matches.
  301. Improved by doing a second check in make_forw_ref().
  302. 12-Jun-1988 Dick Grune (dick) at dick
  303. Installed a Lisp version supplied by Gertjan Akkerman.
  304. 15-Jan-1988 Dick Grune (dick) at dick
  305. Added register declarations all over the place.
  306. 14-Jan-1988 Dick Grune (dick) at dick
  307. It is often useful to match a piece of code exactly, especially
  308. when function names (or, even more so, macro names) are involved.
  309. What one would want is having all the letters in the text array,
  310. but this is kind of hard, since each entry is one lexical item.
  311. This means that under the -F option each letter is a lex item, and
  312. normally each tag is a lex item; this requires two lex grammars in
  313. one program; no good. So, on the -F flag we hash the identifier
  314. into one lex item, which is hopefully characteristic enough. It
  315. works.
  316. 30-Sep-1987 Dick Grune (dick) at dick
  317. Some cosmetics.
  318. 31-Aug-1987 Dick Grune (dick) at dick
  319. Moved the whole thing to the SUN (while testing on a VAX and a
  320. MC68000)
  321. 16-Aug-1987 Dick Grune (dick) at dick
  322. The test program lang.c is no longer a main program, but rather a
  323. subroutine called in main() in sim.c, through the command line
  324. option -1 or -2.
  325. 23-Apr-1987 Dick Grune (dick) at tjalk
  326. Changed the name 'index' into 'elem_of', because of compatibility
  327. problems on different Unices. Added a declaration for it in
  328. the file algollike.c
  329. 10-Mar-1987 Dick Grune (dick) at tjalk
  330. Changed the printing of the header of a run so that:
  331. - long file names will no longer be truncated
  332. - the run length is displayed
  333. 27-Jan-1987 Dick Grune (dick) at tjalk
  334. Switched it right off again! Getting them in textual order is
  335. still more unpleasant, since now you cannot find the important
  336. ones if there are more than a few runs.
  337. 27-Jan-1987 Dick Grune (dick) at tjalk
  338. Going to experiment with leaving out the sorting; just all the
  339. runs, in the order we meet them. Should be as good or better.
  340. Comparisons of more than 100 runs are very rare anyway, so the
  341. fact that those over 100 are rejected is probably no great
  342. help. Getting them in a funny order is a nuisance, however. Down
  343. with featurism. Just to be safe, present version saved as
  344. 870127.SV
  345. 26-Dec-1986 Dick Grune (dick) at tjalk
  346. Names of overall parameters in params.h changed to more uniformity.
  347. 26-Dec-1986 Dick Grune (dick) at tjalk
  348. Since the top package and the instantiation system have grown
  349. apart so much, I have integrated the old top package into sim,
  350. i.e., done the instantiation by hand. This removes top.g and
  351. top.p, and will save outsiders from wondering what is going on
  352. here.
  353. 23-Dec-1986 Dick Grune (dick) at tjalk
  354. Use setbuf to print unbuffered while reading the files (lex core
  355. dumps, other mishaps) and print buffered while printing the real
  356. output (for speed).
  357. 30-Nov-1986 Dick Grune (dick) at tjalk
  358. Various small changes in *lang.l:
  359. ; ignored conditionally (!options['f'])
  360. new format for tokens in struct idf
  361. cosmetics: macro Layout, macro UnsafeComChar, no \n
  362. in character denotations, more than one char
  363. in a char denotation in Pascal, etc.
  364. 30-Nov-1986 Dick Grune (dick) at tjalk
  365. Added a Modula-2 version.
  366. 29-Nov-1986 Dick Grune (dick) at tjalk
  367. Restricting tokens to the ASCII95 character set is really too
  368. severe: some languages have many more reserved words (COBOL!).
  369. Corrected this by adding a couple of '&0377' in strategic places.
  370. Added a routine for printing the 8-bit beasties: show_token().
  371. 15-Aug-1986 Dick Grune (dick) at tjalk
  372. Since the ; is superfluous in both C and Pascal, it is now ignored
  373. by clang.l and pascallang.l
  374. 15-Aug-1986 Dick Grune (dick) at tjalk
  375. The code in CheckRun in Xlang.l was incorrect in that it used the
  376. wrong criterion for throwing away trailing garbage. I've taken
  377. CheckRun etc. out of the Xlang.l-s and turned them into a module
  378. "algollike.c". Made a cleaner interface and avoided duplication of
  379. code.
  380. 02-Jul-1986 Dick Grune (dick) at tjalk
  381. Looking backwards in compare.c to see if we are in the middle of a
  382. run is an atavism. You can be and still be all right, e.g., if
  383. part of the run was rejected as not fitting for a function.
  384. Removed from compare.c.
  385. 10-Jun-1986 Dick Grune (dick) at tjalk
  386. The function hash_code() in hash.c could yield a negative value;
  387. corrected.
  388. 09-Jun-1986 Dick Grune (dick) at tjalk
  389. Changed the name of the file text.h to sim.h. Sim.h is more
  390. appropriate and text.h sounds as if it belongs to text.l, with
  391. which it has no connection.
  392. 04-Jun-1986 Dick Grune (dick) at tjalk
  393. After having looked at a couple of hash functions and having done
  394. some calculations on the number of duplicates normally encountered
  395. in hash functions, I conclude that our function in hash.c is quite
  396. good. Removed all the statistics-gathering stuff.
  397. Actually, hash_table[] is not the hash table at all; it is a
  398. forward reference table; likewise, the real hash table was called
  399. last[]. Renamed both.
  400. There is a way to keep the hash table local without putting it on
  401. the stack: use malloc().
  402. 02-Jun-1986 Dick Grune (dick) at tjalk
  403. Added a simple lex file for text: each word is condensed into a
  404. hash code which is mapped on the ASCII95 character set. This
  405. turns out to be quite effective.
  406. 01-Jun-1986 Dick Grune (dick) at tjalk
  407. The macros cput(tk) and c_eol() both have a return in them, so any
  408. code after them may not be executed -> they have to be last in an
  409. entry. But they weren't, in many places; I can't imagine why it
  410. all worked nevertheless. They have been renamed return_tk(tk) and
  411. return_eol() and the entries have been restructured.
  412. 30-May-1986 Dick Grune (dick) at tjalk
  413. Moved the string and character entries in clang.l and pascallang.l
  414. to a place behind the comment entries, to avoid strings (and
  415. characters) being recognized inside comments. I first thought
  416. this would not happen, but as Maarten pointed out, if both
  417. interpretations have the same length, lex will take the first
  418. entry. Now this will happen if the string occupies the whole line
  419. that would otherwise be taken as a comment. In short,
  420. /*
  421. "hallo"
  422. */
  423. would return ".
  424. 28-May-1986 Dick Grune (dick) at tjalk
  425. Added -d option, to display the output in diff(1) format (courtesy
  426. of Maarten van der Meulen).
  427. Rewrote the lexical parsing of comments (likewise courtesy Maarten
  428. van der Meulen).
  429. 20-May-1986 Dick Grune (dick) at tjalk
  430. Added a routine to convert identifiers to lower case in
  431. pascallang.l .
  432. 19-May-1986 Dick Grune (dick) at tjalk
  433. Added -a option, to quickly check antecedent of a file (courtesy
  434. of Maarten van der Meulen).
  435. 18-May-1986 Dick Grune (dick) at tjalk
  436. Brought everything under RCS/CVS.
  437. 18-Mar-1986 Dick Grune (dick) at tjalk
  438. Added modifications by Paul Bame (hp-lsd!paul@hp-labs) to have an
  439. option -w to set the page width.
  440. 21-Feb-1986 Dick Grune (dick) at tjalk
  441. Took array last[N_HASH] out of make_hash() in hash.c, due to stack
  442. overflow on the Gould (reported by George Walker
  443. [email protected])
  444. 16-Feb-1986 Dick Grune (dick) at tjalk
  445. Corrected some subtractions that caused unsigned ints to turn
  446. pseudo-negative. (Reported by jaap@mcvax)
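The kind of guard involved can be sketched as below. The helper is hypothetical; the entry does not say which subtractions were corrected or how.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: number of tokens left between pos and end. A bare
   'end - pos' on unsigned values wraps around to a huge number when
   pos > end, so the comparison is done first. */
size_t tokens_left(size_t end, size_t pos)
{
    size_t left = pos < end ? end - pos : 0;
    assert(left <= end);    /* never "pseudo-negative" */
    return left;
}
```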
  447. 11-Jan-1986 Dick Grune (dick) at tjalk
  448. Touched up for distribution.
  449. 10-Jan-1986 Dick Grune (dick) at tjalk
  450. Fill_line was not called for empty lines, which caused them to be
  451. printed as repetitions of the previous line.
  452. 24-Dec-1985 Dick Grune (dick) at tjalk
  453. Reduced hash table to a single array of indices; it is used only
  454. in one place, which makes it very easy to make it (the hash table)
  455. optional. General tune-up of everything. This seems to be
  456. another stable "final" version.
  457. 14-Dec-1985 Dick Grune (dick) at tjalk
  458. Some experiments with hash formulas:
  459. h = (h OP CST) + *p++   OP   CST       yields   right   wrong
  460.                         *    96 - 32            205     562
  461.                         *    96 - 2             205     560
  462.                         *    96                 205     560
  463.                         *    97                 205     559
  464.                         <<   0                  66      3128
  465.                         <<   1                  203     555
  466.                         <<   2                  205     536
  467.                         <<   7                  203     540
  468. Conclusion: it doesn't matter, unless you do it wrong.
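The two families of formulas can be sketched as below, assuming h starts at 0 and p walks over a NUL-terminated string. The function names are hypothetical, and the test set that produced the right/wrong counts is not reproduced.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* h = (h * CST) + *p++ */
unsigned hash_mul(const char *p, unsigned cst)
{
    unsigned h = 0;
    while (*p)
        h = (h * cst) + (unsigned char)*p++;
    return h;
}

/* h = (h << CST) + *p++ */
unsigned hash_shift(const char *p, unsigned cst)
{
    unsigned h = 0;
    assert(cst < sizeof(unsigned) * CHAR_BIT);  /* keep the shift defined */
    while (*p)
        h = (h << cst) + (unsigned char)*p++;
    return h;
}
```

With a shift of 0 the formula degenerates into a plain sum of the characters, so any two strings with the same characters in a different order collide; that is the sense in which that row does it wrong.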
  469. 01-Oct-1983 Dick Grune (dick) at vu44
  470. Oldest known files.
  471. # This file is part of the software similarity tester SIM.
  472. # Written by Dick Grune, Vrije Universiteit, Amsterdam.
  473. # $Id: ChangeLog,v 2.12 2007/08/27 09:57:30 dick Exp $
  474. #