11 years ago · 43fb9942de
--- a/docs/stb_resample_ideas.txt
+++ b/docs/stb_resample_ideas.txt
@@ -1,201 +0,0 @@
 
				-1.
			
 
				-
			
 
				-Consider just porting this C++ public domain
			
 
				-library back to C:
			
 
				-    https://code.google.com/p/imageresampler/source/browse/#svn%2Ftrunk
			
 
				-(recommended by @castano)
			
 
				-
			
 
				-
			
 
				-2.
			
 
				-
			
 
				-Consider three cases just to suggest the spectrum
			
 
				-of possibilities:
			
 
				-
			
 
				-a) linear upsample: each output pixel is a weighted sum
			
 
				-of 4 input pixels
			
 
				-
			
 
				-b) cubic upsample: each output pixel is a weighted sum
			
 
				-of 16 input pixels
			
 
				-
			
 
				-c) downsample by N with box filter: each output pixel
			
 
				-is a weighted sum of NxN input pixels, N can be very large
			
 
				-
			
 
				-Now, suppose you want to handle 8-bit input, 16-bit
			
 
				-input, and float input, and you want to do sRGB correction
			
 
				-or not.
			
 
				-
			
 
				-Suppose you create a temporary buffer of float pixels, say
			
 
				-one scanline tall. Actually two temp buffers, one for the
			
 
				-input and one for the output. You decode a scanline of the
			
 
				-input into the temp buffer which is always linear floats. This
			
 
				-isolates the handling of 8/16/float and sRGB to one place
			
 
				-(and still allows you to make optimized 8-bit-sRGB-to-float
			
 
				-lookup tables). This also allows you to put wrap logic here,
			
 
				-explicitly wrapping, reflecting, or replicating-from-edge
			
 
				-pixels that would come from off-edge.
			
 
				-
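As a rough illustration of the off-edge handling described above, here is a minimal C sketch that maps an out-of-range pixel coordinate back into a scanline. The names `edge_mode` and `edge_index` are hypothetical and not part of any stb API, and the reflect mode shown duplicates the edge pixel, which is only one of several possible conventions.

```c
#include <assert.h>

/* Hypothetical names for illustration only. */
typedef enum { EDGE_CLAMP, EDGE_WRAP, EDGE_REFLECT } edge_mode;

/* Map coordinate i (possibly outside 0..n-1) to a valid index
   in a scanline of n pixels. */
int edge_index(int i, int n, edge_mode mode)
{
    int m;
    assert(n > 0);
    switch (mode) {
    case EDGE_WRAP:                 /* tile the scanline */
        m = i % n;
        return m < 0 ? m + n : m;
    case EDGE_REFLECT:              /* mirror with period 2n, edge pixel repeated */
        m = i % (2 * n);
        if (m < 0) m += 2 * n;
        return m < n ? m : 2 * n - 1 - m;
    case EDGE_CLAMP:                /* replicate the edge pixel */
    default:
        return i < 0 ? 0 : (i >= n ? n - 1 : i);
    }
}
```

A decode step could call `edge_index` once per padded pixel while filling the float scanline, so the filtering loops never need to test for edges.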
			
 
				-You then do whatever the appropriate weighted sums are
			
 
				-into the output buffer, and you move on to the next
			
 
				-scanline of the input.
			
 
				-
			
 
				-The algorithm just described works directly for case (c).
			
 
				-Suppose you're downsampling by 2.5; then output scanline 0
			
 
				-sums from input scanlines 0, 1, and 2; output scanline 1
			
 
				-sums from 2,3,4; output 2 from 5,6,7; output 3 from 7,8,9.
			
 
				-Note how 2 & 7 get reused, but we don't have to recompute
			
 
				-them because we can do things in a single linear pass
			
 
				-through the input and output at the same time.
			
 
				-
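The scanline bookkeeping in the downsample-by-2.5 example can be checked with a small sketch. `box_input_range` is a hypothetical helper, and it assumes a plain box filter whose footprint for output scanline `out` is the interval [out*scale, (out+1)*scale) in input space.

```c
#include <assert.h>

/* Hypothetical helper: inclusive range of input scanlines touched by a
   box filter for output scanline `out` when downsampling by `scale`. */
void box_input_range(int out, double scale, int *first, int *last)
{
    double begin = out * scale;         /* start of the footprint in input space */
    double end   = (out + 1) * scale;   /* end of the footprint (exclusive) */
    assert(out >= 0 && scale >= 1.0);
    *first = (int)begin;                /* truncation equals floor for values >= 0 */
    *last  = (int)end;
    if ((double)*last == end)           /* footprint ends exactly on a boundary */
        *last -= 1;
}
```

With `scale` set to 2.5 this gives 0..2, 2..4, 5..7, and 7..9 for output scanlines 0 through 3, matching the ranges listed above, including the reuse of input scanlines 2 and 7.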
			
 
				-Now, consider case (a). When upsampling, the same two input
			
 
				-scanlines will get sampled-from for multiple output scanlines.
			
 
				-So, to avoid recomputing the input scanlines, we need either
			
 
				-multiple input or multiple output temp buffer lines. Since
			
 
				-the number of output lines a given pair of input scanlines
			
 
				-might touch scales with the upsample amount, it makes more
			
 
				-sense to use two input scanline buffers. For cubic, you'll
			
 
				-need four scanline buffers, and in general the number of
			
 
				-buffers will be limited by the max filter width, which is
			
 
				-presumably hardcoded.
			
 
				-
			
 
				-It turns out to be slightly different for two reasons:
			
 
				-
			
 
				-   1. when using an arbitrary filter and downsampling,
			
 
				-      you actually need N output buffers and 1 input buffer
			
 
				-      (vs 1 output buffer and N input buffers upsampling)
			
 
				-
			
 
				-   2. this approach will be very inefficient as written.
			
 
				-      you want to use separable filters and actually do
			
 
				-      separable computation: first decode an input scanline
			
 
				-      into a 'decode' buffer, then horizontally resample it
			
 
				-      into the "input" buffer (kind of a misnomer, but
			
 
				-      they're the inputs to the vertical resampler)
			
 
				-
			
 
				-(The above approach isn't optimal for non-uniform resampling;
			
 
				-optimal is to do whichever axis is smaller first, but I don't
			
 
				-think we have to care about doing that right.)
			
 
				-
			
 
				-
			
 
				-Now, you can either:
			
 
				-
			
 
				-    1. malloc the temp memory
			
 
				-    2. alloca it
			
 
				-    3. allocate a fixed amount on the stack
			
 
				-    4. let the user pass it in
			
 
				-
			
 
				-I forbid #2 in stb libraries for portability.
			
 
				-
			
 
				-If you're not allocating the output image, but rather requiring
			
 
				-the user to pass it in, it's probably worth trying to avoid #1
			
 
				-because people always want to use stb libs without any memory
			
 
				-allocations for various reasons. (Note that most stb libs go
			
 
				-crazy with memory allocations--you shouldn't use stb_image
			
 
				-in a console game--but I've tried to avoid it more in newer
			
 
				-libs.)
			
 
				-
			
 
				-The way #3 would work is instead of using a scanline-width
			
 
				-temp buffer, use some fixed-width temp buffer that's W pixels,
			
 
				-and scale the image in vertical stripes that are that wide.
			
 
				-Suppose you make the temp buffers 256 wide; then an upsample
			
 
				-by 8 computes 256-pixel-width strips (from ~32-pixel-wide input
			
 
				-strips), but a downsample by 8 computes ~32-pixel-width
			
 
				-strips (from a 256-pixel width strip). Note this limits
			
 
				-the max down/upsampling to be ballpark 256x along the
			
 
				-horizontal axis.
			
 
				-
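The strip arithmetic for option #3 can be sketched as follows. `TEMP_WIDTH` and `strip_widths` are hypothetical names, and the sketch ignores the extra pixels a filter needs on either side of a strip, which is why the text above says "~32".

```c
#include <assert.h>

#define TEMP_WIDTH 256   /* assumed fixed temp buffer width, in pixels */

/* scale = dest_width / src_width; values > 1 upsample, values < 1 downsample.
   The wider of the two strips is what must fit in the temp buffer. */
void strip_widths(double scale, int *src_strip, int *dst_strip)
{
    assert(scale > 0.0);
    if (scale >= 1.0) {
        *dst_strip = TEMP_WIDTH;
        *src_strip = (int)(TEMP_WIDTH / scale);
    } else {
        *src_strip = TEMP_WIDTH;
        *dst_strip = (int)(TEMP_WIDTH * scale);
    }
    /* Past roughly TEMP_WIDTH-to-1 in either direction the narrower strip
       rounds down to 0 pixels, which is the limit noted in the text. */
}
```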
			
 
				-In the following, I do #3 and allow #4 for cases where #3 is
			
 
				-too small, but it's not the only possibility:
			
 
				-
			
 
				-
			
 
				-
			
 
				-Function prototypes:
			
 
				-
			
 
				-the highest-level one could be:
			
 
				-
			
 
				-   stb_resample_8bit(uint8_t       *dest, int dest_width, int dest_height,
			
 
				-                     uint8_t const *src , int  src_width, int  src_height,
			
 
				-                     int channels,
			
 
				-                     stbr_filter filter);
			
 
				-
			
 
				-the lowest-level one could be:
			
 
				-
			
 
				-   stb_resample_arbitrary(void       *dst, stbr_type dst_type, int dst_width, int dst_height, int dst_stride_in_bytes,
			
 
				-                          void const *src, stbr_type src_type, int src_width, int src_height, int src_stride_in_bytes,
			
 
				-                          float s0, float t0, float s1, float t1, // range of source to use, 0..1 in GPU texture-coordinate style
			
 
				-                          int channels,
			
 
				-                          int nonpremul_alpha_channel_index,
			
 
				-                          stbr_wrapmode wrap,                     // clamp, wrap, mirror
			
 
				-                          stbr_filter filter,
			
 
				-                          void  *tempmem, size_t tempmem_size_in_bytes);
			
 
				-
			
 
				-And there would be a bunch of convenience functions in-between those two levels.
			
 
				-
			
 
				-
			
 
				-Some notes:
			
 
				-
			
 
				-   s0,t0,s1,t1:
			
 
				-       this allows fine subpixel-positioning and subpixel-resizing in an explicit way without
			
 
				-           things having to be exact pixel multiples. it allows people to pseudo-stream
			
 
				-           images by computing "tiles" of images a bit at a time without forcing those
			
 
				-           tiles to quantize their source data.
			
 
				-
			
 
				-   nonpremul_alpha_channel_index:
			
 
				-       if this is negative, no channels are processed specially
			
 
				-       if this is non-negative, then it's the index of the alpha channel,
			
 
				-           and the image should be treated as non-premultiplied alpha that
			
 
				-           needs to be resampled accounting for this (weight the sampling
			
 
				-           by the alpha channel, i.e. premultiply, filter, unpremultiply).
			
 
				-           this mechanism only allows one alpha channel and ALL channels 
			
 
				-           are scaled by it; an alternative would be to find some way to
			
 
				-           pass in which channels serve as alpha channels for which other
			
 
				-           channels, but eh.
			
 
				-
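The premultiply, filter, unpremultiply sequence can be illustrated with a minimal sketch for one output pixel. `filter_nonpremul_rgba` is a hypothetical helper that assumes four float channels with alpha at index 3; a real implementation would also need to decide what to do when a filter with negative lobes drives the summed alpha to zero or below.

```c
#include <assert.h>

/* Hypothetical helper: weighted sum of n non-premultiplied RGBA float pixels
   (alpha is channel 3). Color is weighted by alpha so that transparent
   pixels do not bleed their color into the result. */
void filter_nonpremul_rgba(const float *pixels, const float *weights, int n,
                           float out[4])
{
    float sum[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    int i, c;
    assert(n > 0);
    for (i = 0; i < n; ++i) {
        const float *p = pixels + 4 * i;
        float a = p[3];
        for (c = 0; c < 3; ++c)
            sum[c] += weights[i] * p[c] * a;    /* premultiply, then filter */
        sum[3] += weights[i] * a;               /* alpha is filtered as-is */
    }
    for (c = 0; c < 3; ++c)                     /* unpremultiply */
        out[c] = sum[3] != 0.0f ? sum[c] / sum[3] : 0.0f;
    out[3] = sum[3];
}
```

For example, averaging a fully transparent red pixel with an opaque blue pixel yields pure blue at half opacity, rather than the purple that unweighted filtering would produce.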
			
 
				-   tempmem, tempmem_size:
			
 
				-       all functions will need tempmem, but they can allocate a fixed tempmem buffer
			
 
				-           on the stack. providing an API that allows overriding the amount of tempmem
			
 
				-           available allows people to process arbitrarily large images. the return
			
 
				-           value for the function could be 0 on success or non-0 being the size of
			
 
				-           tempmem needed.
			
 
				-   
			
 
				-   src_stride, dest_stride:
			
 
				-       the stride variables are signed to allow you to describe both traditional
			
 
				-           top-to-bottom images (pass in a pointer to the top-left pixel and
			
 
				-           a positive stride) and bottom-to-top images (pass in a pointer to
			
 
				-           the bottom-left pixel and a negative stride)
			
 
				-
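A short sketch of why a signed stride covers both image orientations. `row_ptr` is a hypothetical helper; the point is only that row `y` is always found at `first_row + y * stride`, whatever the sign of the stride.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: address of logical row y, where row 0 is the row the
   caller considers the top. A negative stride walks backwards through memory. */
const unsigned char *row_ptr(const unsigned char *first_row,
                             int stride_in_bytes, int y)
{
    assert(y >= 0);
    return first_row + (ptrdiff_t)y * stride_in_bytes;
}
```

For a bottom-to-top image the caller passes a pointer to the last row in memory together with a negative stride, and the resampling code needs no special case.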
			
 
				-   ordering of src & dest:
			
 
				-       put these in whatever order you like, i just chose one arbitrarily
			
 
				-
			
 
				-   width & height
			
 
				-       these are ints not unsigned ints or size_ts because i personally forbid
			
 
				-           unsigned variables for almost everything to avoid signed/unsigned comparison
			
 
				-           issues, but this is a matter of personal taste and you can do differently
			
 
				-
			
 
				-   Intermediate-level functions should be provided for each source type & same dest type
			
 
				-   so that the code is typesafe; only when people fall back to stb_resample_arbitrary should
			
 
				-   they be at risk for type unsafety. (One way to avoid an explosion of functions of
			
 
				-   every possible *combination* of types in a type-safe way would be to define one function
			
 
				-   for each input type, and accept three separate output pointers, one for each type, only
			
 
				-   one of which can be non-NULL. 9 functions isn't that bad, but if you want to have three
			
 
				-   or four intermediate-level functions with fewer parameters, 9*4 gets silly. Could also
			
 
				-   use the same trick for stb_resample_arbitrary, replacing it with three typesafe functions.)
			
 
				-
			
 
				-
			
 
				-
			
 
				-
			
 
				-Reference:
			
 
				-
			
 
				-Cubic sampling function for separable cubic:
			
 
				-   f(x) = (a+2)*x^3 - (a+3)*x^2 + 1       for 0 <= x <= 1
			
 
				-   f(x) = a*x^3 - 5*a*x^2 + 8*a*x - 4*a   for 1 < x <= 2
			
 
				-   f(x) = 0                               otherwise
			
 
				-   "a" is configurable, try -1/2 (from http://pixinsight.com/forum/index.php?topic=556.0 )
			
 
				-
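The piecewise cubic above translates directly into C. `cubic_kernel` is a hypothetical name, and extending the formula to negative x by symmetry is an assumption (the reference only defines it for x >= 0).

```c
#include <assert.h>

/* Hypothetical helper: the cubic filter kernel from the reference above.
   `a` is the free parameter; the note suggests trying -0.5. */
float cubic_kernel(float x, float a)
{
    if (x < 0.0f)
        x = -x;                     /* assumed symmetric about 0 */
    if (x <= 1.0f)
        return (a + 2.0f) * x * x * x - (a + 3.0f) * x * x + 1.0f;
    if (x <= 2.0f)
        return a * x * x * x - 5.0f * a * x * x + 8.0f * a * x - 4.0f * a;
    return 0.0f;
}
```

With a = -0.5 the four taps for a sample halfway between two pixels (x = -1.5, -0.5, 0.5, 1.5) are -0.0625, 0.5625, 0.5625, and -0.0625, which sum to 1.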
			
 
				-
			
 
				-
			
 
				-Wish list:
			
 
				-   s0, t0, s1, t1 vs scale_x, scale_y, offset_x, offset_y - What's the best interface?
			
 
				-   Separate wrap modes and filter modes per axis
			
 
				-   Alpha test coverage respecting resize (FloatImage::alphaTestCoverage and FloatImage::scaleAlphaToCoverage: https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp)
			
 
				-   Installable filter kernels
			
 
				-
			
 
				-
			
--- a/stb_image_resize.h
+++ b/stb_image_resize.h
@@ -31,13 +31,9 @@
 
				    ADDITIONAL DOCUMENTATION
			
 
				 
			
 
				       SRGB & FLOATING POINT REPRESENTATION
			
 
				-         Some srgb-related code in this library relies on floats being 32-bit
			
 
				-         IEEE floating point, and relies on a specific bitpacking order of C
			
 
				-         bitfields. If you are on a system that uses non-IEEE floats or packs
			
 
				-         C bitfields in the opposite order, then you can use a slower fallback
			
 
				-         codepath by defining STBIR_NON_IEEE_FLOAT. (We didn't make this choice
			
 
				-         idly; using mostly-but-not-100%-portable-code for this is a massive
			
 
				-         speedup, especially upsampling where colorspace conversion dominates.)
			
 
				+         The sRGB functions presume IEEE floating point. If you do not have
			
 
				+         IEEE floating point, define STBIR_NON_IEEE_FLOAT. This will use
			
 
				+         a slower implementation.
			
 
				 
			
 
				       MEMORY ALLOCATION
			
 
				          The resize functions here perform a single memory allocation using
			
@@ -655,12 +651,6 @@ typedef union
 
				 {
			
 
				     stbir_uint32 u;
			
 
				     float f;
			
 
				-    struct
			
 
				-    {
			
 
				-        stbir_uint32 Mantissa : 23;
			
 
				-        stbir_uint32 Exponent : 8;
			
 
				-        stbir_uint32 Sign : 1;
			
 
				-    };
			
 
				 } stbir__FP32;
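For context on this hunk: the order in which a C compiler packs bitfields is implementation-defined, so a Mantissa/Exponent/Sign struct only lines up with the float's bits on some compilers. A minimal sketch of reading the same fields with shifts and masks on the `u` member, which does not depend on bitfield order, follows. The names `fp32_bits`, `fp32_sign`, `fp32_exponent`, and `fp32_mantissa` are hypothetical, the sketch still assumes `float` is a 32-bit IEEE 754 value, and it is not necessarily how stb_image_resize.h does it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a union like stbir__FP32, minus the bitfields. */
typedef union { uint32_t u; float f; } fp32_bits;

/* Assumes float is 32-bit IEEE 754: 1 sign bit, 8 exponent bits, 23 mantissa bits. */
uint32_t fp32_sign(float f)     { fp32_bits b; b.f = f; return b.u >> 31; }
uint32_t fp32_exponent(float f) { fp32_bits b; b.f = f; return (b.u >> 23) & 0xFFu; }
uint32_t fp32_mantissa(float f) { fp32_bits b; b.f = f; return b.u & 0x7FFFFFu; }
```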
			
 
				 
			
 
				 static const stbir_uint32 fp32_to_srgb8_tab4[104] = {