<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>zlib Usage Example</title>
<!-- Copyright (c) 2004-2023 Mark Adler. -->
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#00A000">
<h2 align="center"> zlib Usage Example </h2>
We often get questions about how the <tt>deflate()</tt> and <tt>inflate()</tt> functions should be used.
Users wonder when they should provide more input, when they should use more output,
what to do with a <tt>Z_BUF_ERROR</tt>, how to make sure the process terminates properly, and
so on. So for those who have read <tt>zlib.h</tt> (a few times), and
would like further edification, below is an annotated example in C of simple routines to compress and decompress
from an input file to an output file using <tt>deflate()</tt> and <tt>inflate()</tt> respectively. The
annotations are interspersed between lines of the code. So please read between the lines.
We hope this helps explain some of the intricacies of <em>zlib</em>.
<p>
Without further ado, here is the program <a href="zpipe.c"><tt>zpipe.c</tt></a>:
<pre><b>
/* zpipe.c: example of proper use of zlib's inflate() and deflate()
   Not copyrighted -- provided to the public domain
   Version 1.4  11 December 2005  Mark Adler */
/* Version history:
   1.0  30 Oct 2004  First version
   1.1   8 Nov 2004  Add void casting for unused return values
                     Use switch statement for inflate() return values
   1.2   9 Nov 2004  Add assertions to document zlib guarantees
   1.3   6 Apr 2005  Remove incorrect assertion in inf()
   1.4  11 Dec 2005  Add hack to avoid MSDOS end-of-line conversions
                     Avoid some compiler warnings for input and output buffers
 */
</b></pre><!-- -->
We now include the header files for the required definitions. From
<tt>stdio.h</tt> we use <tt>fread()</tt>, <tt>fwrite()</tt>,
<tt>feof()</tt>, and <tt>ferror()</tt> for file i/o, and
<tt>fputs()</tt> for error messages. From <tt>string.h</tt> we use
<tt>strcmp()</tt> for command line argument processing.
From <tt>assert.h</tt> we use the <tt>assert()</tt> macro.
From <tt>zlib.h</tt>
we use the basic compression functions <tt>deflateInit()</tt>,
<tt>deflate()</tt>, and <tt>deflateEnd()</tt>, and the basic decompression
functions <tt>inflateInit()</tt>, <tt>inflate()</tt>, and
<tt>inflateEnd()</tt>.
<pre><b>
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;assert.h&gt;
#include "zlib.h"
</b></pre><!-- -->
This is an ugly hack required to avoid corruption of the input and output data on
Windows/MS-DOS systems. Without this, those systems would assume that the input and output
files are text, and try to convert the end-of-line characters from one standard to
another. That would corrupt binary data, and in particular would render the compressed data unusable.
This sets the input and output to binary, which suppresses the end-of-line conversions.
<tt>SET_BINARY_MODE()</tt> will be used later on <tt>stdin</tt> and <tt>stdout</tt>, at the beginning of <tt>main()</tt>.
<pre><b>
#if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(__CYGWIN__)
#  include &lt;fcntl.h&gt;
#  include &lt;io.h&gt;
#  define SET_BINARY_MODE(file) setmode(fileno(file), O_BINARY)
#else
#  define SET_BINARY_MODE(file)
#endif
</b></pre><!-- -->
<tt>CHUNK</tt> is simply the buffer size for feeding data to and pulling data
from the <em>zlib</em> routines. Larger buffer sizes would be more efficient,
especially for <tt>inflate()</tt>. If the memory is available, buffer sizes
on the order of 128K or 256K bytes should be used.
<pre><b>
#define CHUNK 16384
</b></pre><!-- -->
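<p>
If you take the advice above and use 128K or 256K buffers, note that <tt>def()</tt> and <tt>inf()</tt>
declare <tt>in</tt> and <tt>out</tt> as local arrays, so two 256K buffers would take 512K of stack, which
may exceed the default stack size on some platforms or in some threads. The following is a minimal
sketch, not part of <tt>zpipe.c</tt>, that allocates the buffers on the heap instead. The names
<tt>BIG_CHUNK</tt>, <tt>struct zbufs</tt>, <tt>zbufs_alloc()</tt>, and <tt>zbufs_free()</tt> are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical larger buffer size, following the 128K to 256K advice. */
#define BIG_CHUNK 262144U

/* Hypothetical holder for the input and output buffers of def() or inf(). */
struct zbufs {
    unsigned char *in;
    unsigned char *out;
    unsigned size;      /* must fit in avail_in and avail_out, which are uInt */
};

/* Allocate both buffers on the heap instead of the stack.
   Returns 0 on success, or -1 (with nothing left allocated) on failure. */
static int zbufs_alloc(struct zbufs *b, unsigned size)
{
    b->in = malloc(size);
    b->out = malloc(size);
    if (b->in == NULL || b->out == NULL) {
        free(b->in);                    /* free(NULL) is a no-op */
        free(b->out);
        b->in = b->out = NULL;
        b->size = 0;
        return -1;
    }
    b->size = size;
    return 0;
}

/* Release both buffers; also safe after a failed zbufs_alloc(). */
static void zbufs_free(struct zbufs *b)
{
    free(b->in);
    free(b->out);
    b->in = b->out = NULL;
    b->size = 0;
}
```

In <tt>def()</tt>, <tt>in</tt> and <tt>out</tt> would then become <tt>b.in</tt> and <tt>b.out</tt>, uses of
<tt>CHUNK</tt> would become <tt>b.size</tt>, and every return path would also need to call
<tt>zbufs_free()</tt> to avoid a memory leak.
<p>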
The <tt>def()</tt> routine compresses data from an input file to an output file. The output data
will be in the <em>zlib</em> format, which is different from the <em>gzip</em> or <em>zip</em>
formats. The <em>zlib</em> format has a very small header of only two bytes to identify it as
a <em>zlib</em> stream and to provide decoding information, and a four-byte trailer with a fast
check value to verify the integrity of the uncompressed data after decoding.
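<p>
Both parts of that format are simple enough to check by hand. Per RFC 1950, the low four bits of the
first header byte give the compression method (8 for deflate) and the high four bits give the base-two
logarithm of the window size minus eight, and the two header bytes, read as a big-endian 16-bit number,
must be a multiple of 31. The trailer is the Adler-32 checksum of the uncompressed data, stored most
significant byte first. The sketch below is illustrative only and does not call <em>zlib</em>; the
function names are hypothetical, and a real application would use <em>zlib</em>'s own, much faster,
<tt>adler32()</tt>.

```c
#include <stddef.h>

/* Return 1 if cmf and flg form a valid zlib header using deflate (CM == 8)
   with a window of at most 32K (CINFO <= 7), else 0. */
static int zlib_header_ok(unsigned char cmf, unsigned char flg)
{
    if ((cmf & 0x0f) != 8)                  /* compression method: deflate */
        return 0;
    if ((cmf >> 4) > 7)                     /* window larger than 32K */
        return 0;
    return ((cmf << 8) | flg) % 31 == 0;    /* FCHECK */
}

/* Straightforward Adler-32: two sums modulo 65521, combined as (b << 16) | a. */
static unsigned long adler32_simple(const unsigned char *buf, size_t len)
{
    unsigned long a = 1, b = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Read the four-byte, big-endian check value at the end of a zlib stream. */
static unsigned long read_trailer(const unsigned char *stream, size_t len)
{
    const unsigned char *t = stream + len - 4;

    return ((unsigned long)t[0] << 24) | ((unsigned long)t[1] << 16) |
           ((unsigned long)t[2] << 8) | (unsigned long)t[3];
}
```

For example, <tt>78 9C</tt>, the header <em>zlib</em> writes at the default compression level,
passes the check, while the <em>gzip</em> magic bytes <tt>1F 8B</tt> do not.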
<pre><b>
/* Compress from file source to file dest until EOF on source.
   def() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_STREAM_ERROR if an invalid compression
   level is supplied, Z_VERSION_ERROR if the version of zlib.h and the
   version of the library linked do not match, or Z_ERRNO if there is
   an error reading or writing the files. */
int def(FILE *source, FILE *dest, int level)
{
</b></pre>
Here are the local variables for <tt>def()</tt>. <tt>ret</tt> will be used for <em>zlib</em>
return codes. <tt>flush</tt> will keep track of the current flushing state for <tt>deflate()</tt>,
which is either no flushing, or flush to completion after the end of the input file is reached.
<tt>have</tt> is the amount of data returned from <tt>deflate()</tt>. The <tt>strm</tt> structure
is used to pass information to and from the <em>zlib</em> routines, and to maintain the
<tt>deflate()</tt> state. <tt>in</tt> and <tt>out</tt> are the input and output buffers for
<tt>deflate()</tt>.
<pre><b>
    int ret, flush;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];
</b></pre><!-- -->
The first thing we do is to initialize the <em>zlib</em> state for compression using
<tt>deflateInit()</tt>. This must be done before the first use of <tt>deflate()</tt>.
The <tt>zalloc</tt>, <tt>zfree</tt>, and <tt>opaque</tt> fields in the <tt>strm</tt>
structure must be initialized before calling <tt>deflateInit()</tt>. Here they are
set to the <em>zlib</em> constant <tt>Z_NULL</tt> to request that <em>zlib</em> use
the default memory allocation routines. An application may also choose to provide
custom memory allocation routines here. <tt>deflateInit()</tt> will allocate on the
order of 256K bytes for the internal state.
(See <a href="zlib_tech.html"><em>zlib Technical Details</em></a>.)
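<p>
The 256K figure follows from the estimate given in <tt>zconf.h</tt>: <tt>deflate()</tt> needs about
<tt>(1 &lt;&lt; (windowBits+2)) + (1 &lt;&lt; (memLevel+9))</tt> bytes plus a few kilobytes for small objects,
and <tt>deflateInit()</tt> uses <tt>windowBits</tt> = 15 and <tt>memLevel</tt> = 8. The same file gives
<tt>1 &lt;&lt; windowBits</tt> bytes, plus about 7K, for <tt>inflate()</tt>. The hypothetical helpers below
only evaluate those formulas; they do not call <em>zlib</em> or measure real allocations.

```c
/* Approximate deflate state size in bytes, per the zconf.h formula
   (1 << (windowBits + 2)) + (1 << (memLevel + 9)); small objects excluded.
   memLevel ranges from 1 (least memory) to 9 (most memory). */
static unsigned long deflate_mem_estimate(int windowBits, int memLevel)
{
    return (1UL << (windowBits + 2)) + (1UL << (memLevel + 9));
}

/* Approximate inflate window size in bytes, 1 << windowBits, excluding the
   roughly 7K of smaller allocations mentioned in zconf.h. */
static unsigned long inflate_mem_estimate(int windowBits)
{
    return 1UL << windowBits;
}
```

With the defaults, the deflate estimate is 131072 + 131072 = 262144 bytes, which is the 256K
mentioned above, while the inflate window is only 32K.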
<p>
<tt>deflateInit()</tt> is called with a pointer to the structure to be initialized and
the compression level, which is an integer in the range of -1 to 9. Lower compression
levels result in faster execution, but less compression. Higher levels result in
greater compression, but slower execution. The <em>zlib</em> constant <tt>Z_DEFAULT_COMPRESSION</tt>,
equal to -1, provides a good compromise between compression and speed and is equivalent to level 6.
Level 0 actually does no compression at all, and in fact expands the data slightly to produce
the <em>zlib</em> format (it is not a byte-for-byte copy of the input).
More advanced applications of <em>zlib</em>
may use <tt>deflateInit2()</tt> here instead. Such an application may want to reduce how
much memory will be used, at some price in compression. Or it may need to request a
<em>gzip</em> header and trailer instead of a <em>zlib</em> header and trailer, or raw
encoding with no header or trailer at all.
<p>
We must check the return value of <tt>deflateInit()</tt> against the <em>zlib</em> constant
<tt>Z_OK</tt> to make sure that it was able to
allocate memory for the internal state, and that the provided arguments were valid.
<tt>deflateInit()</tt> will also check that the version of <em>zlib</em> that the <tt>zlib.h</tt>
file came from matches the version of <em>zlib</em> actually linked with the program. This
is especially important for environments in which <em>zlib</em> is a shared library.
<p>
Note that an application can initialize multiple, independent <em>zlib</em> streams, which can
operate in parallel. The state information maintained in the structure allows the <em>zlib</em>
routines to be reentrant.
<pre><b>
    /* allocate deflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&amp;strm, level);
    if (ret != Z_OK)
        return ret;
</b></pre><!-- -->
With the pleasantries out of the way, now we can get down to business. The outer <tt>do</tt>-loop
reads all of the input file and exits at the bottom of the loop once end-of-file is reached.
This loop contains the only call of <tt>deflate()</tt>. So we must make sure that all of the
input data has been processed and that all of the output data has been generated and consumed
before we fall out of the loop at the bottom.
<pre><b>
    /* compress until end of file */
    do {
</b></pre>
We start off by reading data from the input file. The number of bytes read is put directly
into <tt>avail_in</tt>, and a pointer to those bytes is put into <tt>next_in</tt>. We also
check to see if end-of-file on the input has been reached using <tt>feof()</tt>.
If we are at the end of file, then <tt>flush</tt> is set to the
<em>zlib</em> constant <tt>Z_FINISH</tt>, which is later passed to <tt>deflate()</tt> to
indicate that this is the last chunk of input data to compress.
If we are not yet at the end of the input, then the <em>zlib</em>
constant <tt>Z_NO_FLUSH</tt> will be passed to <tt>deflate()</tt> to indicate that we are still
in the middle of the uncompressed data.
<p>
If there is an error in reading from the input file, the process is aborted with
<tt>deflateEnd()</tt> being called to free the allocated <em>zlib</em> state before returning
the error. We wouldn't want a memory leak, now would we? <tt>deflateEnd()</tt> can be called
at any time after the state has been initialized. Once that's done, <tt>deflateInit()</tt> (or
<tt>deflateInit2()</tt>) would have to be called to start a new compression process. There is
no point here in checking the <tt>deflateEnd()</tt> return code. The deallocation can't fail.
<pre><b>
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)deflateEnd(&amp;strm);
            return Z_ERRNO;
        }
        flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
        strm.next_in = in;
</b></pre><!-- -->
The inner <tt>do</tt>-loop passes our chunk of input data to <tt>deflate()</tt>, and then
keeps calling <tt>deflate()</tt> until it is done producing output. Once there is no more
new output, <tt>deflate()</tt> is guaranteed to have consumed all of the input, i.e.,
<tt>avail_in</tt> will be zero.
<pre><b>
        /* run deflate() on input until output buffer not full, finish
           compression if all of source has been read in */
        do {
</b></pre>
Output space is provided to <tt>deflate()</tt> by setting <tt>avail_out</tt> to the number
of available output bytes and <tt>next_out</tt> to a pointer to that space.
<pre><b>
            strm.avail_out = CHUNK;
            strm.next_out = out;
</b></pre>
Now we call the compression engine itself, <tt>deflate()</tt>. It takes as many of the
<tt>avail_in</tt> bytes at <tt>next_in</tt> as it can process, and writes as many as
<tt>avail_out</tt> bytes to <tt>next_out</tt>. Those counters and pointers are then
updated past the input data consumed and the output data written. It is the amount of
output space available that may limit how much input is consumed.
Hence the inner loop to make sure that
all of the input is consumed by providing more output space each time. Since <tt>avail_in</tt>
and <tt>next_in</tt> are updated by <tt>deflate()</tt>, we don't have to mess with those
between <tt>deflate()</tt> calls until it's all used up.
<p>
The parameters to <tt>deflate()</tt> are a pointer to the <tt>strm</tt> structure containing
the input and output information and the internal compression engine state, and a parameter
indicating whether and how to flush data to the output. Normally <tt>deflate()</tt> will consume
several K bytes of input data before producing any output (except for the header), in order
to accumulate statistics on the data for optimum compression. It will then put out a burst of
compressed data, and proceed to consume more input before the next burst. Eventually,
<tt>deflate()</tt> must be told to terminate the stream, complete the compression with provided input data, and
write out the trailer check value. <tt>deflate()</tt> will continue to compress normally as long
as the flush parameter is <tt>Z_NO_FLUSH</tt>. Once the <tt>Z_FINISH</tt> parameter is provided,
<tt>deflate()</tt> will begin to complete the compressed output stream. However, depending on how
much output space is provided, <tt>deflate()</tt> may have to be called several times until it
has provided the complete compressed stream, even after it has consumed all of the input. The flush
parameter must continue to be <tt>Z_FINISH</tt> for those subsequent calls.
<p>
There are other values of the flush parameter that are used in more advanced applications. You can
force <tt>deflate()</tt> to produce a burst of output that encodes all of the input data provided
so far, even if it wouldn't have otherwise, for example to control data latency on a link with
compressed data. You can also ask that <tt>deflate()</tt> do that as well as erase any history up to
that point so that what follows can be decompressed independently, for example for random access
applications. Both requests will degrade compression by an amount depending on how often such
requests are made.
<p>
<tt>deflate()</tt> has a return value that can indicate errors, yet we do not check it here. Why
not? Well, it turns out that <tt>deflate()</tt> can do no wrong here. Let's go through
<tt>deflate()</tt>'s return values and dispense with them one by one. The possible values are
<tt>Z_OK</tt>, <tt>Z_STREAM_END</tt>, <tt>Z_STREAM_ERROR</tt>, or <tt>Z_BUF_ERROR</tt>. <tt>Z_OK</tt>
is, well, ok. <tt>Z_STREAM_END</tt> is also ok and will be returned for the last call of
<tt>deflate()</tt>. This is already guaranteed by calling <tt>deflate()</tt> with <tt>Z_FINISH</tt>
until it has no more output. <tt>Z_STREAM_ERROR</tt> is only possible if the stream is not
initialized properly, but we did initialize it properly. There is no harm in checking for
<tt>Z_STREAM_ERROR</tt> here, for example to check for the possibility that some
other part of the application inadvertently clobbered the memory containing the <em>zlib</em> state.
<tt>Z_BUF_ERROR</tt> will be explained further below, but
suffice it to say that this is simply an indication that <tt>deflate()</tt> could not consume
more input or produce more output. <tt>deflate()</tt> can be called again with more output space
or more available input, which it will be in this code.
<pre><b>
            ret = deflate(&amp;strm, flush);    /* no bad return value */
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
</b></pre>
Now we compute how much output <tt>deflate()</tt> provided on the last call, which is the
difference between how much space was provided before the call, and how much output space
is still available after the call. Then that data, if any, is written to the output file.
We can then reuse the output buffer for the next call of <tt>deflate()</tt>. Again, if there
is a file i/o error, we call <tt>deflateEnd()</tt> before returning to avoid a memory leak.
<pre><b>
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)deflateEnd(&amp;strm);
                return Z_ERRNO;
            }
</b></pre>
The inner <tt>do</tt>-loop is repeated until the last <tt>deflate()</tt> call fails to fill the
provided output buffer. Then we know that <tt>deflate()</tt> has done as much as it can with
the provided input, and that all of that input has been consumed. We can then fall out of this
loop and reuse the input buffer.
<p>
The way we tell that <tt>deflate()</tt> has no more output is by seeing that it did not fill
the output buffer, leaving <tt>avail_out</tt> greater than zero. However, suppose that
<tt>deflate()</tt> has no more output, but just so happened to exactly fill the output buffer!
<tt>avail_out</tt> is zero, and we can't tell that <tt>deflate()</tt> has done all it can.
As far as we know, <tt>deflate()</tt>
has more output for us. So we call it again. But now <tt>deflate()</tt> produces no output
at all, and <tt>avail_out</tt> remains unchanged as <tt>CHUNK</tt>. That <tt>deflate()</tt> call
wasn't able to do anything, neither consume input nor produce output, and so it returns
<tt>Z_BUF_ERROR</tt>. (See, I told you I'd cover this later.) However, this is not a problem at
all. Now we finally have the desired indication that <tt>deflate()</tt> is really done,
and so we drop out of the inner loop to provide more input to <tt>deflate()</tt>.
<p>
With <tt>flush</tt> set to <tt>Z_FINISH</tt>, this final set of <tt>deflate()</tt> calls will
complete the output stream. Once that is done, subsequent calls of <tt>deflate()</tt> would return
<tt>Z_STREAM_ERROR</tt> if the flush parameter is not <tt>Z_FINISH</tt>, and do no more processing
until the state is reinitialized.
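<p>
The extra call in the exact-fill case is easy to see in a simulation. The sketch below does not use
<em>zlib</em>: <tt>mock_deflate()</tt> is a hypothetical stand-in that still owes a fixed amount of
output and copies as much of it as fits into each buffer it is given. Driving it with the same
<tt>while (avail_out == 0)</tt> condition as <tt>def()</tt> shows that when the total output is an exact
multiple of the buffer size, the loop makes one more call that produces nothing, which is the call
described above.

```c
/* Hypothetical stand-in for deflate(): it still owes `pending` bytes of
   output and writes as many of them as fit in the space it is offered. */
struct mock {
    unsigned pending;
};

static void mock_deflate(struct mock *m, unsigned *avail_out)
{
    unsigned n = m->pending < *avail_out ? m->pending : *avail_out;

    m->pending -= n;            /* output delivered */
    *avail_out -= n;            /* space used, as deflate() updates avail_out */
}

/* Run the same inner loop as def() and return how many calls it makes.
   chunk must be nonzero. */
static unsigned count_calls(unsigned total_output, unsigned chunk)
{
    struct mock m;
    unsigned avail_out, calls = 0;

    m.pending = total_output;
    do {
        avail_out = chunk;      /* fresh output buffer for each call */
        mock_deflate(&m, &avail_out);
        calls++;
    } while (avail_out == 0);   /* a full buffer might mean more output */
    return calls;
}
```

With 32 bytes of output and a 16-byte buffer, the third call writes nothing and leaves
<tt>avail_out</tt> at 16, which ends the loop; with 31 or 33 bytes, the last call leaves part of
the buffer unused, so no empty call is made.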
<p>
Some applications of <em>zlib</em> have two loops that call <tt>deflate()</tt>
instead of the single inner loop we have here. The first loop would call <tt>deflate()</tt>
without flushing and feed it all of the data. The second loop would call
<tt>deflate()</tt> with no more
data and the <tt>Z_FINISH</tt> parameter to complete the process. As you can see from this
example, that can be avoided by simply keeping track of the current flush state.
<pre><b>
        } while (strm.avail_out == 0);
        assert(strm.avail_in == 0);     /* all input will be used */
</b></pre><!-- -->
Now we check to see if we have already processed all of the input file. That information was
saved in the <tt>flush</tt> variable, so we see if that was set to <tt>Z_FINISH</tt>. If so,
then we're done and we fall out of the outer loop. We're guaranteed to get <tt>Z_STREAM_END</tt>
from the last <tt>deflate()</tt> call, since we ran it until the last chunk of input was
consumed and all of the output was generated.
<pre><b>
        /* done when last data in file processed */
    } while (flush != Z_FINISH);
    assert(ret == Z_STREAM_END);        /* stream will be complete */
</b></pre><!-- -->
The process is complete, but we still need to deallocate the state to avoid a memory leak
(or rather more like a memory hemorrhage if you didn't do this). Then
finally we can return with a happy return value.
<pre><b>
    /* clean up and return */
    (void)deflateEnd(&amp;strm);
    return Z_OK;
}
</b></pre><!-- -->
Now we do the same thing for decompression in the <tt>inf()</tt> routine. <tt>inf()</tt>
decompresses what is hopefully a valid <em>zlib</em> stream from the input file and writes the
uncompressed data to the output file. Much of the discussion above for <tt>def()</tt>
applies to <tt>inf()</tt> as well, so the discussion here will focus on the differences between
the two.
<pre><b>
/* Decompress from file source to file dest until stream ends or EOF.
   inf() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_DATA_ERROR if the deflate data is
   invalid or incomplete, Z_VERSION_ERROR if the version of zlib.h and
   the version of the library linked do not match, or Z_ERRNO if there
   is an error reading or writing the files. */
int inf(FILE *source, FILE *dest)
{
</b></pre>
The local variables have the same functionality as they do for <tt>def()</tt>. The
only difference is that there is no <tt>flush</tt> variable, since <tt>inflate()</tt>
can tell from the <em>zlib</em> stream itself when the stream is complete.
<pre><b>
    int ret;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];
</b></pre><!-- -->
The initialization of the state is the same, except that there is no compression level,
of course, and two more elements of the structure are initialized. <tt>avail_in</tt>
and <tt>next_in</tt> must be initialized before calling <tt>inflateInit()</tt>. This
is because the application has the option to provide the start of the <em>zlib</em> stream in
order for <tt>inflateInit()</tt> to have access to information about the compression
method to aid in memory allocation. In the current implementation of <em>zlib</em>
(up through versions 1.2.x), the method-dependent memory allocations are deferred to the first call of
<tt>inflate()</tt> anyway. However, those fields must be initialized since later versions
of <em>zlib</em> that provide more compression methods may take advantage of this interface.
In any case, no decompression is performed by <tt>inflateInit()</tt>, so the
<tt>avail_out</tt> and <tt>next_out</tt> fields do not need to be initialized before calling.
<p>
Here <tt>avail_in</tt> is set to zero and <tt>next_in</tt> is set to <tt>Z_NULL</tt> to
indicate that no input data is being provided.
<pre><b>
    /* allocate inflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = 0;
    strm.next_in = Z_NULL;
    ret = inflateInit(&amp;strm);
    if (ret != Z_OK)
        return ret;
</b></pre><!-- -->
The outer <tt>do</tt>-loop decompresses input until <tt>inflate()</tt> indicates
that it has reached the end of the compressed data and has produced all of the uncompressed
output. This is in contrast to <tt>def()</tt>, which processes all of the input file.
If end-of-file is reached before the compressed data self-terminates, then the compressed
data is incomplete and an error is returned.
<pre><b>
    /* decompress until deflate stream ends or end of file */
    do {
</b></pre>
We read input data and set the <tt>strm</tt> structure accordingly. If we've reached the
end of the input file, then we leave the outer loop and report an error, since the
compressed data is incomplete. Note that we may read more data than is eventually consumed
by <tt>inflate()</tt>, if the input file continues past the <em>zlib</em> stream.
For applications where <em>zlib</em> streams are embedded in other data, this routine would
need to be modified to return the unused data, or at least indicate how much of the input
data was not used, so the application would know where to pick up after the <em>zlib</em> stream.
<pre><b>
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)inflateEnd(&amp;strm);
            return Z_ERRNO;
        }
        if (strm.avail_in == 0)
            break;
        strm.next_in = in;
</b></pre><!-- -->
The inner <tt>do</tt>-loop has the same function it did in <tt>def()</tt>, which is to
keep calling <tt>inflate()</tt> until it has generated all of the output it can with the
provided input.
<pre><b>
        /* run inflate() on input until output buffer not full */
        do {
</b></pre>
Just like in <tt>def()</tt>, the same output space is provided for each call of <tt>inflate()</tt>.
<pre><b>
            strm.avail_out = CHUNK;
            strm.next_out = out;
</b></pre>
Now we run the decompression engine itself. There is no need to adjust the flush parameter, since
the <em>zlib</em> format is self-terminating. The main difference here is that there are
return values that we need to pay attention to. <tt>Z_DATA_ERROR</tt>
indicates that <tt>inflate()</tt> detected an error in the <em>zlib</em> compressed data format,
which means that either the data is not a <em>zlib</em> stream to begin with, or that the data was
corrupted somewhere along the way since it was compressed. The other error to be processed is
<tt>Z_MEM_ERROR</tt>, which can occur since memory allocation is deferred until <tt>inflate()</tt>
needs it, unlike <tt>deflate()</tt>, whose memory is allocated at the start by <tt>deflateInit()</tt>.
<p>
Advanced applications may use
<tt>deflateSetDictionary()</tt> to prime <tt>deflate()</tt> with a set of likely data to improve the
first 32K or so of compression. This is noted in the <em>zlib</em> header, so <tt>inflate()</tt>
requests that the dictionary be provided before it can start to decompress. Without the dictionary,
correct decompression is not possible. For this routine, we have no idea what the dictionary is,
so the <tt>Z_NEED_DICT</tt> indication is converted to a <tt>Z_DATA_ERROR</tt>.
<p>
<tt>inflate()</tt> can also return <tt>Z_STREAM_ERROR</tt>, which should not be possible here,
but could be checked for as noted above for <tt>def()</tt>. <tt>Z_BUF_ERROR</tt> does not need to be
checked for here, for the same reasons noted for <tt>def()</tt>. <tt>Z_STREAM_END</tt> will be
checked for later.
<pre><b>
            ret = inflate(&amp;strm, Z_NO_FLUSH);
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
            switch (ret) {
            case Z_NEED_DICT:
                ret = Z_DATA_ERROR;     /* and fall through */
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
                (void)inflateEnd(&amp;strm);
                return ret;
            }
</b></pre>
The output of <tt>inflate()</tt> is handled identically to that of <tt>deflate()</tt>.
<pre><b>
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)inflateEnd(&amp;strm);
                return Z_ERRNO;
            }
</b></pre>
The inner <tt>do</tt>-loop ends when <tt>inflate()</tt> has no more output as indicated
by not filling the output buffer, just as for <tt>deflate()</tt>. In this case, we cannot
assert that <tt>strm.avail_in</tt> will be zero, since the deflate stream may end before the file
does.
<pre><b>
        } while (strm.avail_out == 0);
</b></pre><!-- -->
The outer <tt>do</tt>-loop ends when <tt>inflate()</tt> reports that it has reached the
end of the input <em>zlib</em> stream, has completed the decompression and integrity
check, and has provided all of the output. This is indicated by the <tt>inflate()</tt>
return value <tt>Z_STREAM_END</tt>. The inner loop is guaranteed to leave <tt>ret</tt>
equal to <tt>Z_STREAM_END</tt> if the last chunk of the input file read contained the end
of the <em>zlib</em> stream. So if the return value is not <tt>Z_STREAM_END</tt>, the
loop continues to read more input.
<pre><b>
        /* done when inflate() says it's done */
    } while (ret != Z_STREAM_END);
</b></pre><!-- -->
At this point, either decompression completed successfully, or we broke out of the loop because no
more data was available from the input file. If the last <tt>inflate()</tt> return value
is not <tt>Z_STREAM_END</tt>, then the <em>zlib</em> stream was incomplete and a data error
is returned. Otherwise, we return with a happy return value. Of course, <tt>inflateEnd()</tt>
is called first to avoid a memory leak.
<pre><b>
    /* clean up and return */
    (void)inflateEnd(&amp;strm);
    return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}
</b></pre><!-- -->
That ends the routines that directly use <em>zlib</em>. The following routines make this
a command-line program by running data through the above routines from <tt>stdin</tt> to
<tt>stdout</tt>, and handling any errors reported by <tt>def()</tt> or <tt>inf()</tt>.
<p>
<tt>zerr()</tt> is used to interpret the possible error codes from <tt>def()</tt>
and <tt>inf()</tt>, as detailed in their comments above, and print out an error message.
Note that these are only a subset of the possible return values from <tt>deflate()</tt>
and <tt>inflate()</tt>.
<pre><b>
/* report a zlib or i/o error */
void zerr(int ret)
{
    fputs("zpipe: ", stderr);
    switch (ret) {
    case Z_ERRNO:
        if (ferror(stdin))
            fputs("error reading stdin\n", stderr);
        if (ferror(stdout))
            fputs("error writing stdout\n", stderr);
        break;
    case Z_STREAM_ERROR:
        fputs("invalid compression level\n", stderr);
        break;
    case Z_DATA_ERROR:
        fputs("invalid or incomplete deflate data\n", stderr);
        break;
    case Z_MEM_ERROR:
        fputs("out of memory\n", stderr);
        break;
    case Z_VERSION_ERROR:
        fputs("zlib version mismatch!\n", stderr);
    }
}
</b></pre><!-- -->
Here is the <tt>main()</tt> routine used to test <tt>def()</tt> and <tt>inf()</tt>. The
<tt>zpipe</tt> command is simply a compression pipe from <tt>stdin</tt> to <tt>stdout</tt>, if
no arguments are given, or it is a decompression pipe if <tt>zpipe -d</tt> is used. If any other
arguments are provided, no compression or decompression is performed. Instead, a usage
message is displayed. Examples are <tt>zpipe &lt; foo.txt &gt; foo.txt.z</tt> to compress, and
<tt>zpipe -d &lt; foo.txt.z &gt; foo.txt</tt> to decompress.
<pre><b>
/* compress or decompress from stdin to stdout */
int main(int argc, char **argv)
{
    int ret;
    /* avoid end-of-line conversions */
    SET_BINARY_MODE(stdin);
    SET_BINARY_MODE(stdout);
    /* do compression if no arguments */
    if (argc == 1) {
        ret = def(stdin, stdout, Z_DEFAULT_COMPRESSION);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }
    /* do decompression if -d specified */
    else if (argc == 2 &amp;&amp; strcmp(argv[1], "-d") == 0) {
        ret = inf(stdin, stdout);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }
    /* otherwise, report usage */
    else {
        fputs("zpipe usage: zpipe [-d] &lt; source &gt; dest\n", stderr);
        return 1;
    }
}
</b></pre>
<hr>
<i>Last modified 24 January 2023<br>
Copyright &#169; 2004-2023 Mark Adler</i><br>
<a rel="license" href="http://creativecommons.org/licenses/by-nd/4.0/">
<img alt="Creative Commons License" style="border-width:0"
src="https://i.creativecommons.org/l/by-nd/4.0/88x31.png"></a>
<a rel="license" href="http://creativecommons.org/licenses/by-nd/4.0/">
Creative Commons Attribution-NoDerivatives 4.0 International License</a>.
</body>
</html>