Browse Source

cleanup merge of ARM NEON support
tweak docs

Sean Barrett 10 years ago
parent
commit
53ca163e85
1 changed files with 50 additions and 16 deletions
  1. 50 16
      stb_image.h

+ 50 - 16
stb_image.h

@@ -39,12 +39,21 @@
       - PPM and PGM binary formats are now supported, thanks to Ken Miller.
 
       - x86 platforms now make use of SSE2 SIMD instructions for
-        JPEG decoding, and ARM platforms use NEON SIMD. This release is
-        2x faster on our test JPEGs on x86 (except progressive JPEGs,
-        which see much less speedup), mostly due to the addition of SIMD.
-        This work was done by Fabian "ryg" Giesen.
-
-      - Compilation of SIMD code can be suppressed with
+        JPEG decoding, and ARM platforms can use NEON SIMD if requested.
+        This work was done by Fabian "ryg" Giesen. SSE2 is used by
+        default, but NEON must be enabled explicitly; see docs.
+
+        With other JPEG optimizations included in this version, we see
+        2x speedup on a JPEGs on an x86 machine, and a 1.5x speedup
+        on a JPEG on an ARM machine, relative to previous versions of this
+        library. The same results will not obtain for all JPGs and for all
+        X86/ARM machines. (Note that progressive JPEGs are significantly
+        slower to decode than regular JPEGs.) This doesn't mean that this
+        is the fastest JPEG decoder in the land; rather, it brings it
+        closer to parity with standard libraries. If you want the fastest
+        decode, look elsewhere. (See "Philosophy" section of docs below.)
+
+      - Compilation of all SIMD code can be suppressed with
             #define STBI_NO_SIMD
         It should not be necessary to disable it unless you have issues
         compiling (e.g. using an x86 compiler which doesn't support SSE
@@ -76,7 +85,7 @@
 
         Please note that STBI_JPEG_OLD is a temporary feature; it will be
         removed in future versions of the library. It is only intended for
-        back-compatibility use.
+        near-term back-compatibility use.
 
       - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
         the memory allocator. Unlike other STBI libraries, these macros don't
@@ -86,7 +95,7 @@
 
 
    Latest revision history:
-      2.00 (2014-12-25) optimize JPEG, incl. x86 & NEON SIMD
+      2.00 (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
                         progressive JPEG
                         PGM/PPM support
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
@@ -136,7 +145,7 @@
 License:
    This software is in the public domain. Where that dedication is not
    recognized, you are granted a perpetual, irrevocable license to copy
-   and modify this file as you see fit.
+   and modify this file however you want.
 
 */
 
@@ -199,6 +208,29 @@ License:
 //
 // ===========================================================================
 //
+// Philosophy
+//
+// stb libraries are designed with the following priorities:
+//
+//    1. easy to use
+//    2. easy to maintain
+//    3. good performance
+//
+// Sometimes I let "good performance" creep up in priority over "easy to maintain",
+// and for best performance I may provide less-easy-to-use APIs that give higher
+// performance, in addition to the easy to use ones. Nevertheless, it's important
+// to keep in mind that from the standpoint of you, a client of this library,
+// all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
+//
+// Some secondary priorities arise directly from the first two, some of which
+// make more explicit reasons why performance can't be emphasized.
+//
+//    - Portable ("ease of use")
+//    - Small footprint ("easy to maintain")
+//    - No dependencies ("ease of use")
+//
+// ===========================================================================
+//
 // I/O callbacks
 //
 // I/O callbacks allow you to read from arbitrary sources, like packaged
@@ -213,16 +245,17 @@ License:
 //
 // SIMD support
 //
-// The JPEG decoder will try to automatically use SIMD kernels on when
-// supported by the compiler.
+// The JPEG decoder will try to automatically use SIMD kernels on x86 when
+// supported by the compiler. For ARM Neon support, you must explicitly
+// request it.
 //
 // (The old do-it-yourself SIMD API is no longer supported in the current
 // code.)
 //
-// On x86, SSE2 will automatically be used when available; if not, the
-// generic C versions are used as a fall-back. On ARM targets, the typical
-// path is to have separate builds for NEON and non-NEON devices (at least
-// this is true for iOS and Android). Therefore, the NEON support is
+// On x86, SSE2 will automatically be used when available based on a run-time
+// test; if not, the generic C versions are used as a fall-back. On ARM targets,
+// the typical path is to have separate builds for NEON and non-NEON devices
+// (at least this is true for iOS and Android). Therefore, the NEON support is
 // toggled by a build flag: define STBI_NEON to get NEON loops.
 //
 // The output of the JPEG decoder is slightly different from versions where
@@ -1722,7 +1755,6 @@ static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
 }
 
 #ifdef STBI_SSE2
-
 // sse2 integer IDCT. not the fastest possible implementation but it
 // produces bit-identical results to the generic C version so it's
 // fully "transparent".
@@ -2966,7 +2998,9 @@ static void stbi__setup_jpeg(stbi__jpeg *j)
 
 #ifdef STBI_NEON
    j->idct_block_kernel = stbi__idct_neon;
+   #ifndef STBI_JPEG_OLD
    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
+   #endif
    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
 #endif
 }