<!DOCTYPE html>
<html>
<head>
<title>String Buffer Library</title>
<meta charset="utf-8">
<meta name="Copyright" content="Copyright (C) 2005-2022">
<meta name="Language" content="en">
<link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
<link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
<style type="text/css">
.lib {
vertical-align: middle;
margin-left: 5px;
padding: 0 5px;
font-size: 60%;
border-radius: 5px;
background: #c5d5ff;
color: #000;
}
</style>
</head>
<body>
<div id="site">
<a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
</div>
<div id="head">
<h1>String Buffer Library</h1>
</div>
<div id="nav">
<ul><li>
<a href="luajit.html">LuaJIT</a>
<ul><li>
<a href="https://luajit.org/download.html">Download <span class="ext">&raquo;</span></a>
</li><li>
<a href="install.html">Installation</a>
</li><li>
<a href="running.html">Running</a>
</li></ul>
</li><li>
<a href="extensions.html">Extensions</a>
<ul><li>
<a href="ext_ffi.html">FFI Library</a>
<ul><li>
<a href="ext_ffi_tutorial.html">FFI Tutorial</a>
</li><li>
<a href="ext_ffi_api.html">ffi.* API</a>
</li><li>
<a href="ext_ffi_semantics.html">FFI Semantics</a>
</li></ul>
</li><li>
<a class="current" href="ext_buffer.html">String Buffers</a>
</li><li>
<a href="ext_jit.html">jit.* Library</a>
</li><li>
<a href="ext_c_api.html">Lua/C API</a>
</li><li>
<a href="ext_profiler.html">Profiler</a>
</li></ul>
</li><li>
<a href="status.html">Status</a>
</li><li>
<a href="faq.html">FAQ</a>
</li><li>
<a href="http://wiki.luajit.org/">Wiki <span class="ext">&raquo;</span></a>
</li><li>
<a href="https://luajit.org/list.html">Mailing List <span class="ext">&raquo;</span></a>
</li></ul>
</div>
<div id="main">
<p>
The string buffer library allows <b>high-performance manipulation of
string-like data</b>.
</p>
<p>
Unlike Lua strings, which are constants, string buffers are
<b>mutable</b> sequences of 8-bit (binary-transparent) characters. Data
can be stored, formatted and encoded into a string buffer and later
converted, extracted or decoded.
</p>
<p>
The convenient string buffer API simplifies common string manipulation
tasks that would otherwise require creating many intermediate strings.
String buffers improve performance by eliminating redundant memory
copies, object creation, string interning and garbage collection
overhead. In conjunction with the FFI library, they allow zero-copy
operations.
</p>
<p>
The string buffer library also includes a high-performance
<a href="#serialize">serializer</a> for Lua objects.
</p>
<h2 id="wip" style="color:#ff0000">Work in Progress</h2>
<p>
<b style="color:#ff0000">This library is a work in progress. More
functionality will be added soon.</b>
</p>
<h2 id="use">Using the String Buffer Library</h2>
<p>
The string buffer library is built into LuaJIT by default, but it's not
loaded by default. Add this to the start of every Lua file that needs
one of its functions:
</p>
<pre class="code">
local buffer = require("string.buffer")
</pre>
<p>
The convention for the syntax shown on this page is that <tt>buffer</tt>
refers to the buffer library and <tt>buf</tt> refers to an individual
buffer object.
</p>
<p>
Please note the difference between a Lua function call, e.g.
<tt>buffer.new()</tt> (with a dot), and a Lua method call, e.g.
<tt>buf:reset()</tt> (with a colon).
</p>
<h3 id="buffer_object">Buffer Objects</h3>
<p>
A buffer object is a garbage-collected Lua object. After creation with
<tt>buffer.new()</tt>, it can (and should) be reused for many operations.
When the last reference to a buffer object is gone, it will eventually
be freed by the garbage collector, along with the allocated buffer
space.
</p>
<p>
Buffers operate like a FIFO (first-in first-out) data structure. Data
can be appended (written) to the end of the buffer and consumed (read)
from the front of the buffer. These operations may be freely mixed.
</p>
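<p>
The following minimal sketch illustrates the FIFO behavior. It assumes LuaJIT 2.1 with the string buffer library. Appends and reads are interleaved, and each read returns the oldest unread data first. The strings are arbitrary examples.
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new()
buf:put("first ")            -- Append to the end.
buf:put("second ")
local a = buf:get(6)         -- Consume 6 bytes from the front: "first ".
buf:put("third")             -- Appending after a read is fine.
local rest = buf:get()       -- Consume everything left: "second third".
</pre>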
<p>
The buffer space that holds the characters is managed automatically
&mdash; it grows as needed and already consumed space is recycled. Use
<tt>buffer.new(size)</tt> and <tt>buf:free()</tt> if you need more
control.
</p>
<p>
The maximum size of a single buffer is the same as the maximum size of a
Lua string, which is slightly below two gigabytes. For huge data sizes,
neither strings nor buffers are the right data structure &mdash; use the
FFI library to directly map memory or files up to the virtual memory
limit of your OS.
</p>
<h3 id="buffer_overview">Buffer Method Overview</h3>
<ul>
<li>
The <tt>buf:put*()</tt>-like methods append (write) characters to the
end of the buffer.
</li>
<li>
The <tt>buf:get*()</tt>-like methods consume (read) characters from the
front of the buffer.
</li>
<li>
Other methods, like <tt>buf:tostring()</tt>, only read the buffer
contents and don't change the buffer.
</li>
<li>
The <tt>buf:set()</tt> method allows zero-copy consumption of a string
or an FFI cdata object as a buffer.
</li>
<li>
The FFI-specific methods allow zero-copy read/write-style operations or
modifying the buffer contents in-place. Please check the
<a href="#ffi_caveats">FFI caveats</a> below, too.
</li>
<li>
Methods that don't need to return anything specific return the buffer
object itself as a convenience. This allows method chaining, e.g.:
<tt>buf:reset():encode(obj)</tt> or <tt>buf:skip(len):get()</tt>
</li>
</ul>
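<p>
The following sketch shows method chaining. Because <tt>reset()</tt>, <tt>put()</tt>, <tt>putf()</tt> and <tt>skip()</tt> return the buffer, one expression can build or trim the data, and a final <tt>get()</tt> returns a string. The names <tt>line</tt> and <tt>value</tt> are illustrative.
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new()
-- reset(), put() and putf() all return buf, so the calls can be chained.
local line = buf:reset():put("id="):putf("%d", 42):put(";"):get()

-- skip() also returns buf: drop the "id=" prefix, then read the rest.
buf:put(line)
local value = buf:skip(3):get()
</pre>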
<h2 id="create">Buffer Creation and Management</h2>
<h3 id="buffer_new"><tt>local buf = buffer.new([size [,options]])<br>
local buf = buffer.new([options])</tt></h3>
<p>
Creates a new buffer object.
</p>
<p>
The optional <tt>size</tt> argument ensures a minimum initial buffer
size. This is strictly an optimization when the required buffer size is
known beforehand. The buffer space will grow as needed, in any case.
</p>
<p>
The optional table <tt>options</tt> sets various
<a href="#serialize_options">serialization options</a>.
</p>
<h3 id="buffer_reset"><tt>buf = buf:reset()</tt></h3>
<p>
Resets (empties) the buffer. The allocated buffer space is not freed and
may be reused.
</p>
<h3 id="buffer_free"><tt>buf = buf:free()</tt></h3>
<p>
The buffer space of the buffer object is freed. The object itself
remains intact, empty and may be reused.
</p>
<p>
Note: you normally don't need to use this method. The garbage collector
automatically frees the buffer space when the buffer object is
collected. Use this method if you need to free the associated memory
immediately.
</p>
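<p>
The following sketch walks through the lifecycle described above. A pre-sized buffer is reused with <tt>reset()</tt>, and <tt>free()</tt> then releases its space while the object stays usable. The initial size of 4096 bytes is an arbitrary example value.
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new(4096)   -- Pre-size: an optimization only.
for i = 1, 3 do
  buf:reset()                  -- Empty it, but keep the allocated space.
  buf:put("round ", i)
end
local last = buf:tostring()    -- "round 3"

buf:free()                     -- Release the buffer space right now.
local empty_len = #buf         -- 0: the object is empty...
buf:put("still usable")        -- ...but can be written to again.
</pre>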
<h2 id="write">Buffer Writers</h2>
<h3 id="buffer_put"><tt>buf = buf:put([str|num|obj] [,…])</tt></h3>
<p>
Appends a string <tt>str</tt>, a number <tt>num</tt> or any object
<tt>obj</tt> with a <tt>__tostring</tt> metamethod to the buffer.
Multiple arguments are appended in the given order.
</p>
<p>
Appending a buffer to a buffer is possible and short-circuited
internally. But it still involves a copy. It's better to combine the
buffer writes to use a single buffer.
</p>
<h3 id="buffer_putf"><tt>buf = buf:putf(format, …)</tt></h3>
<p>
Appends the formatted arguments to the buffer. The <tt>format</tt>
string supports the same options as <tt>string.format()</tt>.
</p>
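<p>
The following sketch calls <tt>put()</tt> with mixed argument types and then <tt>putf()</tt> with a format string. <tt>Vec2</tt> is a hypothetical example type whose <tt>__tostring</tt> metamethod lets <tt>put()</tt> accept its instances.
</p>
<pre class="code">
local buffer = require("string.buffer")

-- Hypothetical object type with a __tostring metamethod.
local Vec2 = {}
Vec2.__tostring = function(v) return "(" .. v.x .. "," .. v.y .. ")" end
local v = setmetatable({ x = 1, y = 2 }, Vec2)

local buf = buffer.new()
buf:put("v=", v, " n=", 42)        -- Strings, objects and numbers, in order.
buf:putf(" %s:%04d", "id", 7)      -- Same format options as string.format().
local s = buf:get()                -- "v=(1,2) n=42 id:0007"
</pre>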
<h3 id="buffer_putcdata"><tt>buf = buf:putcdata(cdata, len)</tt><span class="lib">FFI</span></h3>
<p>
Appends the given <tt>len</tt> number of bytes from the memory pointed
to by the FFI <tt>cdata</tt> object to the buffer. The object needs to
be convertible to a (constant) pointer.
</p>
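<p>
The following minimal sketch appends three bytes from a C array. It assumes the FFI library is available. A C array converts to a pointer, so it is accepted as the <tt>cdata</tt> argument. The byte values are arbitrary.
</p>
<pre class="code">
local ffi = require("ffi")
local buffer = require("string.buffer")

local bytes = ffi.new("uint8_t[4]", { 0x41, 0x42, 0x43, 0x44 })  -- "ABCD"
local buf = buffer.new()
buf:put("[")
buf:putcdata(bytes, 3)   -- Copies only the first 3 bytes: "ABC".
buf:put("]")
local s = buf:get()      -- "[ABC]"
</pre>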
<h3 id="buffer_set"><tt>buf = buf:set(str)<br>
buf = buf:set(cdata, len)</tt><span class="lib">FFI</span></h3>
<p>
This method allows zero-copy consumption of a string or an FFI cdata
object as a buffer. It stores a reference to the passed string
<tt>str</tt> or the FFI <tt>cdata</tt> object in the buffer. Any buffer
space originally allocated is freed. This is <i>not</i> an append
operation, unlike the <tt>buf:put*()</tt> methods.
</p>
<p>
After calling this method, the buffer behaves as if
<tt>buf:free():put(str)</tt> or <tt>buf:free():putcdata(cdata,&nbsp;len)</tt>
had been called. However, the data is only referenced and not copied, as
long as the buffer is only consumed.
</p>
<p>
If the buffer is written to later on, the referenced data is copied
and the object reference is removed (copy-on-write semantics).
</p>
<p>
The stored reference is an anchor for the garbage collector and keeps the
originally passed string or FFI cdata object alive.
</p>
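<p>
The following sketch shows both forms of <tt>set()</tt>. Reading from the front consumes the referenced data without copying it. The <tt>put()</tt> call triggers the copy-on-write step described above. The input data is arbitrary.
</p>
<pre class="code">
local ffi = require("ffi")
local buffer = require("string.buffer")

local buf = buffer.new()
buf:set("hello world")          -- References the string, no copy.
local first = buf:get(5)        -- "hello"
buf:skip(1)                     -- Drop the space.
buf:put("!")                    -- A write: the remaining data is copied first.
local rest = buf:get()          -- "world!"

-- cdata form: 'arr' itself is passed, so the buffer anchors it for the GC.
local arr = ffi.new("char[4]", "xyz")   -- 3 chars plus a zero byte.
buf:set(arr, 3)
local from_cdata = buf:get()    -- "xyz"
</pre>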
<h3 id="buffer_reserve"><tt>ptr, len = buf:reserve(size)</tt><span class="lib">FFI</span><br>
<tt>buf = buf:commit(used)</tt><span class="lib">FFI</span></h3>
<p>
The <tt>reserve</tt> method reserves at least <tt>size</tt> bytes of
write space in the buffer. It returns a <tt>uint8_t&nbsp;*</tt> FFI
cdata pointer <tt>ptr</tt> that points to this space.
</p>
<p>
The available length in bytes is returned in <tt>len</tt>. This is at
least <tt>size</tt> bytes, but may be more to facilitate efficient
buffer growth. You can either make use of the additional space or ignore
<tt>len</tt> and only use <tt>size</tt> bytes.
</p>
<p>
The <tt>commit</tt> method appends the <tt>used</tt> bytes of the
previously returned write space to the buffer data.
</p>
<p>
This pair of methods allows zero-copy use of C read-style APIs:
</p>
<pre class="code">
local ffi = require("ffi")
ffi.cdef[[ssize_t read(int fd, void *buf, size_t count);]]
local C = ffi.C

local MIN_SIZE = 65536
repeat
  local ptr, len = buf:reserve(MIN_SIZE)
  local n = tonumber(C.read(fd, ptr, len)) -- fd: an open file descriptor.
  if n == 0 then break end -- EOF.
  if n &lt; 0 then error("read error") end
  buf:commit(n)
until false
</pre>
<p>
The reserved write space is <i>not</i> initialized. At least the
<tt>used</tt> bytes <b>must</b> be written to before calling the
<tt>commit</tt> method. There's no need to call the <tt>commit</tt>
method if nothing is added to the buffer (e.g. on error).
</p>
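<p>
The same reserve/commit pattern can also be sketched without a C API, by filling the reserved space from Lua. Here <tt>ffi.copy()</tt> initializes the bytes, and only the bytes actually written are committed. The 16-byte reservation is an arbitrary example size.
</p>
<pre class="code">
local ffi = require("ffi")
local buffer = require("string.buffer")

local buf = buffer.new()
buf:put("head:")
local ptr, len = buf:reserve(16)  -- len is at least 16, maybe more.
ffi.copy(ptr, "payload", 7)       -- Initialize the bytes that will be committed.
buf:commit(7)                     -- Append exactly those 7 bytes.
local s = buf:tostring()          -- "head:payload"
</pre>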
<h2 id="read">Buffer Readers</h2>
<h3 id="buffer_length"><tt>len = #buf</tt></h3>
<p>
Returns the current length of the buffer data in bytes.
</p>
<h3 id="buffer_concat"><tt>res = str|num|buf .. str|num|buf […]</tt></h3>
<p>
The Lua concatenation operator <tt>..</tt> also accepts buffers, just
like strings or numbers. It always returns a string and not a buffer.
</p>
<p>
Note that although this is supported for convenience, it thwarts one
of the main reasons to use buffers, which is to avoid string
allocations. Rewrite such code with <tt>buf:put()</tt> and <tt>buf:get()</tt>.
</p>
<p>
Mixing this with unrelated objects that have a <tt>__concat</tt>
metamethod may not work, since these probably only expect strings.
</p>
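<p>
The following small sketch contrasts the two styles. Using <tt>..</tt> with a buffer operand produces a new string, while the <tt>put()</tt> form appends into a buffer. It assumes, as described above, that concatenation only reads the buffer and does not consume it. The values are arbitrary.
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new():put("abc")
local n = #buf                     -- 3
local joined = "x" .. buf .. 1     -- A new string: "xabc1".

-- Allocation-friendlier equivalent: append into a buffer instead.
local out = buffer.new():put("x", buf, 1)
local same = out:get()             -- "xabc1"
</pre>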
<h3 id="buffer_skip"><tt>buf = buf:skip(len)</tt></h3>
<p>
Skips (consumes) <tt>len</tt> bytes from the buffer, up to the current
length of the buffer data.
</p>
<h3 id="buffer_get"><tt>str, … = buf:get([len|nil] [,…])</tt></h3>
<p>
Consumes the buffer data and returns one or more strings. If called
without arguments, the whole buffer data is consumed. If called with a
number, up to <tt>len</tt> bytes are consumed. A <tt>nil</tt> argument
consumes the remaining buffer data (this only makes sense as the last
argument). Multiple arguments consume the buffer data in the given
order.
</p>
<p>
Note: a zero length or no remaining buffer data returns an empty string
and not <tt>nil</tt>.
</p>
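<p>
The following sketch exercises the argument forms of <tt>get()</tt> together with <tt>skip()</tt>. Several lengths are consumed in order, a trailing <tt>nil</tt> takes the rest, and reading past the end yields shorter or empty strings rather than <tt>nil</tt>.
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new():put("#abcdefg")
buf:skip(1)                              -- Drop the leading "#".
local a, b, rest = buf:get(2, 3, nil)    -- "ab", "cde", then all of "fg".
local tail = buf:get(10)                 -- Nothing left: "" (not nil).
local short = buf:put("xy"):get(10)      -- Up to 10 bytes: "xy".
</pre>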
<h3 id="buffer_tostring"><tt>str = buf:tostring()<br>
str = tostring(buf)</tt></h3>
<p>
Creates a string from the buffer data, but doesn't consume it. The
buffer remains unchanged.
</p>
<p>
Buffer objects also define a <tt>__tostring</tt> metamethod. This means
buffers can be passed to the global <tt>tostring()</tt> function and
many other functions that accept this in place of strings. The important
internal uses in functions like <tt>io.write()</tt> are short-circuited
to avoid the creation of an intermediate string object.
</p>
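<p>
The following sketch shows that both spellings of <tt>tostring</tt> leave the buffer intact, and that <tt>io.write()</tt> accepts the buffer directly. The text is an arbitrary example.
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new():put("keep me")
local s1 = buf:tostring()   -- A copy of the data...
local s2 = tostring(buf)    -- ...the same via the __tostring metamethod.
local len = #buf            -- Still 7: nothing was consumed.
io.write(buf, "\n")         -- Accepted directly, without an intermediate string.
</pre>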
<h3 id="buffer_ref"><tt>ptr, len = buf:ref()</tt><span class="lib">FFI</span></h3>
<p>
Returns a <tt>uint8_t&nbsp;*</tt> FFI cdata pointer <tt>ptr</tt> that
points to the buffer data. The length of the buffer data in bytes is
returned in <tt>len</tt>.
</p>
<p>
The returned pointer can be directly passed to C functions that expect a
buffer and a length. You can also do bytewise reads
(<tt>local&nbsp;x&nbsp;=&nbsp;ptr[i]</tt>) or writes
(<tt>ptr[i]&nbsp;=&nbsp;0x40</tt>) of the buffer data.
</p>
<p>
In conjunction with the <tt>skip</tt> method, this allows zero-copy use
of C write-style APIs:
</p>
<pre class="code">
local ffi = require("ffi")
ffi.cdef[[ssize_t write(int fd, const void *buf, size_t count);]]
local C = ffi.C

repeat
  local ptr, len = buf:ref()
  if len == 0 then break end
  local n = tonumber(C.write(fd, ptr, len)) -- fd: an open file descriptor.
  if n &lt; 0 then error("write error") end
  buf:skip(n)
until n >= len
</pre>
<p>
Unlike Lua strings, buffer data is <i>not</i> implicitly
zero-terminated. It's not safe to pass <tt>ptr</tt> to C functions that
expect zero-terminated strings. If you're not using <tt>len</tt>, then
you're doing something wrong.
</p>
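<p>
As a sketch of bytewise access through <tt>ref()</tt>, the loop below upper-cases ASCII letters in place. Every index stays within <tt>0</tt> to <tt>len-1</tt>, and the pointer is not used after the buffer is modified again. The helper name <tt>upcase_inplace</tt> is hypothetical.
</p>
<pre class="code">
require("ffi")  -- Make sure the FFI library is loaded for cdata results.
local buffer = require("string.buffer")

-- Hypothetical helper: upper-case ASCII a-z in place via the raw pointer.
local function upcase_inplace(buf)
  local ptr, len = buf:ref()
  for i = 0, len - 1 do              -- FFI pointers are zero-indexed.
    local c = ptr[i]                 -- uint8_t reads yield Lua numbers.
    if c >= 97 and 122 >= c then     -- 'a' (97) up to 'z' (122).
      ptr[i] = c - 32
    end
  end
  return buf
end

local buf = buffer.new():put("Hello, world 123")
local s = upcase_inplace(buf):get()  -- "HELLO, WORLD 123"
</pre>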
<h2 id="serialize">Serialization of Lua Objects</h2>
<p>
The following functions and methods allow <b>high-speed serialization</b>
(encoding) of a Lua object into a string and decoding it back to a Lua
object. This allows convenient storage and transport of <b>structured
data</b>.
</p>
<p>
The encoded data is in an <a href="#serialize_format">internal binary
format</a>. The data can be stored in files, binary-transparent
databases or transmitted to other LuaJIT instances across threads,
processes or networks.
</p>
<p>
Encoding speed can reach up to 1 Gigabyte/second on a modern desktop- or
server-class system, even when serializing many small objects. Decoding
speed is mostly constrained by object creation cost.
</p>
<p>
The serializer handles most Lua types, common FFI number types and
nested structures. Functions, thread objects, other FFI cdata and full
userdata cannot be serialized (yet).
</p>
<p>
The encoder serializes nested structures as trees. Multiple references
to a single object will be stored separately and create distinct objects
after decoding. Circular references cause an error.
</p>
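<p>
The tree semantics can be observed directly in a short sketch. A subtable referenced twice comes back as two distinct tables with equal contents, and a self-referencing table makes the encoder throw. The table contents are arbitrary.
</p>
<pre class="code">
local buffer = require("string.buffer")

local shared = { 1, 2, 3 }
local obj = { a = shared, b = shared }         -- Two references, one object.
local copy = buffer.decode(buffer.encode(obj))
local distinct = copy.a ~= copy.b              -- true: decoded separately.
local equal_len = #copy.a == #copy.b           -- true: same contents.

local cyclic = {}
cyclic.self = cyclic                           -- Circular reference.
local ok = pcall(buffer.encode, cyclic)        -- false: encoding throws.
</pre>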
<h3 id="serialize_methods">Serialization Functions and Methods</h3>
<h3 id="buffer_encode"><tt>str = buffer.encode(obj)<br>
buf = buf:encode(obj)</tt></h3>
<p>
Serializes (encodes) the Lua object <tt>obj</tt>. The stand-alone
function returns a string <tt>str</tt>. The buffer method appends the
encoding to the buffer.
</p>
<p>
<tt>obj</tt> can be any of the supported Lua types &mdash; it doesn't
need to be a Lua table.
</p>
<p>
This function may throw an error when attempting to serialize
unsupported object types, circular references or deeply nested tables.
</p>
<h3 id="buffer_decode"><tt>obj = buffer.decode(str)<br>
obj = buf:decode()</tt></h3>
<p>
The stand-alone function deserializes (decodes) the string
<tt>str</tt>; the buffer method deserializes one object from the
buffer. Both return a Lua object <tt>obj</tt>.
</p>
<p>
The returned object may be any of the supported Lua types &mdash;
even <tt>nil</tt>.
</p>
<p>
This function may throw an error when fed with malformed or incomplete
encoded data. The stand-alone function throws when there's left-over
data after decoding a single top-level object. The buffer method leaves
any left-over data in the buffer.
</p>
<p>
Attempting to deserialize an FFI type will throw an error if the FFI
library is not built in or has not been loaded yet.
</p>
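<p>
The following sketch contrasts how the two decoders handle left-over data. The stand-alone function rejects a string holding two encodings, while the buffer method takes them one at a time. A <tt>nil</tt> top-level object also round-trips.
</p>
<pre class="code">
local buffer = require("string.buffer")

local two = buffer.encode(1) .. buffer.encode("two")  -- Two encodings, back to back.

-- The stand-alone decode() throws because of the left-over second object.
local ok = pcall(buffer.decode, two)                  -- false

-- The buffer method decodes one object at a time.
local buf = buffer.new():set(two)
local first = buf:decode()                            -- 1
local second = buf:decode()                           -- "two"

-- nil is a valid top-level object, too.
local nothing = buffer.decode(buffer.encode(nil))     -- nil
</pre>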
<h3 id="serialize_options">Serialization Options</h3>
<p>
The <tt>options</tt> table passed to <tt>buffer.new()</tt> may contain
the following members (all optional):
</p>
<ul>
<li>
<tt>dict</tt> is a Lua table holding a <b>dictionary of strings</b> that
commonly occur as table keys of objects you are serializing. These keys
are compactly encoded as indexes during serialization. A well-chosen
dictionary saves space and improves serialization performance.
</li>
<li>
<tt>metatable</tt> is a Lua table holding a <b>dictionary of metatables</b>
for the table objects you are serializing.
</li>
</ul>
<p>
<tt>dict</tt> needs to be an array of strings and <tt>metatable</tt> needs
to be an array of tables. Both must start at index 1 and must not have
holes (no <tt>nil</tt> in between). The tables are anchored in the buffer
object and internally modified into a two-way index (don't do this
yourself, just pass a plain array). The tables must not be modified after
they have been passed to <tt>buffer.new()</tt>.
</p>
<p>
The <tt>dict</tt> and <tt>metatable</tt> tables used by the encoder and
decoder must be the same. Put the most common entries at the front. Extend
at the end to ensure backwards-compatibility &mdash; older encodings can
then still be read. You may also set some indexes to <tt>false</tt> to
explicitly drop backwards-compatibility. Old encodings that use these
indexes will throw an error when decoded.
</p>
<p>
Metatables that are not found in the <tt>metatable</tt> dictionary are
ignored when encoding. Decoding returns a table with a <tt>nil</tt>
metatable.
</p>
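<p>
The following sketch demonstrates the <tt>metatable</tt> option. It uses one shared options table for the encoder and the decoder buffer, as required above. <tt>Point</tt> and <tt>Other</tt> are hypothetical metatables: the listed one survives the round trip, and the unlisted one is dropped. The key dictionary is illustrative only.
</p>
<pre class="code">
local buffer = require("string.buffer")

-- Hypothetical metatables; only Point is listed in the dictionary.
local Point = { __index = { kind = "point" } }
local Other = {}

local options = { dict = { "x", "y" }, metatable = { Point } }
local buf_enc = buffer.new(options)
local buf_dec = buffer.new(options)

local function roundtrip(obj)
  local str = buf_enc:reset():encode(obj):get()
  return buf_dec:set(str):decode()
end

local p = roundtrip(setmetatable({ x = 1, y = 2 }, Point))  -- Keeps Point.
local o = roundtrip(setmetatable({ x = 3 }, Other))         -- Loses Other.
</pre>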
<p>
Note: parsing and preparation of the options table is somewhat
expensive. Create a buffer object only once and recycle it for multiple
uses. Avoid mixing encoder and decoder buffers, since the
<tt>buf:set()</tt> method frees the already allocated buffer space:
</p>
<pre class="code">
local buffer = require("string.buffer")

local options = {
  dict = { "commonly", "used", "string", "keys" },
}
local buf_enc = buffer.new(options)
local buf_dec = buffer.new(options)

local function encode(obj)
  return buf_enc:reset():encode(obj):get()
end

local function decode(str)
  return buf_dec:set(str):decode()
end
</pre>
<h3 id="serialize_stream">Streaming Serialization</h3>
<p>
In some contexts, it's desirable to do piecewise serialization of large
datasets, also known as <i>streaming</i>.
</p>
<p>
This serialization format can be safely concatenated and supports streaming.
Multiple encodings can simply be appended to a buffer and later decoded
individually:
</p>
<pre class="code">
local buf = buffer.new()
buf:encode(obj1)
buf:encode(obj2)
local copy1 = buf:decode()
local copy2 = buf:decode()
</pre>
<p>
Here's how to iterate over a stream:
</p>
<pre class="code">
while #buf ~= 0 do
  local obj = buf:decode()
  -- Do something with obj.
end
</pre>
<p>
Since the serialization format doesn't prepend a length to its encoding,
network applications may need to transmit the length, too.
</p>
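<p>
One way to transmit the length is sketched below. The framing scheme (8 ASCII hex digits of payload length before each encoding) and the helper names <tt>put_frame</tt> and <tt>take_frames</tt> are hypothetical choices for illustration. They are not part of the library, and a binary length header would be more compact. The receiver peeks at the header through <tt>ref()</tt> and only consumes a frame once all of it has arrived.
</p>
<pre class="code">
local ffi = require("ffi")
local buffer = require("string.buffer")

-- Hypothetical framing: 8 hex digits of payload length, then the payload.
local function put_frame(out, obj)
  local payload = buffer.encode(obj)
  return out:putf("%08x", #payload):put(payload)
end

-- Hypothetical receiver: decodes every complete frame in 'inbuf' and
-- leaves a trailing partial frame buffered until more data arrives.
local function take_frames(inbuf, handler)
  while #inbuf >= 8 do
    local ptr = inbuf:ref()                        -- Peek, don't consume.
    local n = tonumber(ffi.string(ptr, 8), 16)
    if 8 + n > #inbuf then break end               -- Frame still incomplete.
    inbuf:skip(8)
    handler(inbuf:decode())                        -- Consumes the n payload bytes.
  end
end

-- Usage: split the wire data to simulate two separate network reads.
local wire = put_frame(put_frame(buffer.new(), { id = 1 }), "hi"):get()
local inbuf, got = buffer.new(), {}
local function collect(obj) got[#got + 1] = obj end
inbuf:put(wire:sub(1, 5))
take_frames(inbuf, collect)
local after_first_read = #got                      -- 0: nothing complete yet.
inbuf:put(wire:sub(6))
take_frames(inbuf, collect)
</pre>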
<h3 id="serialize_format">Serialization Format Specification</h3>
<p>
This serialization format is designed for <b>internal use</b> by LuaJIT
applications. Serialized data is upwards-compatible and portable across
all supported LuaJIT platforms.
</p>
<p>
It's an <b>8-bit binary format</b> and not human-readable. It uses e.g.
embedded zeroes and stores embedded Lua string objects unmodified, which
are 8-bit-clean, too. Encoded data can be safely concatenated for
streaming and later decoded one top-level object at a time.
</p>
<p>
The encoding is reasonably compact, but tuned for maximum performance,
not for minimum space usage. It compresses well with any of the common
byte-oriented data compression algorithms.
</p>
<p>
Although documented here for reference, this format is explicitly
<b>not</b> intended to be a 'public standard' for structured data
interchange across computer languages (like JSON or MessagePack). Please
do not use it as such.
</p>
<p>
The specification is given below as a context-free grammar with a
top-level <tt>object</tt> as the starting point. Alternatives are
separated by the <tt>|</tt> symbol and <tt>*</tt> indicates repeats.
Grouping is implicit or indicated by <tt>{…}</tt>. Terminals are
either plain hex numbers, encoded as bytes, or have a <tt>.format</tt>
suffix.
</p>
<pre>
object    → nil | false | true
          | null | lightud32 | lightud64
          | int | num | tab | tab_mt
          | int64 | uint64 | complex
          | string

nil       → 0x00
false     → 0x01
true      → 0x02

null      → 0x03                                  // NULL lightuserdata
lightud32 → 0x04 data.I                           // 32 bit lightuserdata
lightud64 → 0x05 data.L                           // 64 bit lightuserdata

int       → 0x06 int.I                            // int32_t
num       → 0x07 double.L

tab       → 0x08                                  // Empty table
          | 0x09 h.U h*{object object}            // Key/value hash
          | 0x0a a.U a*object                     // 0-based array
          | 0x0b a.U a*object h.U h*{object object}        // Mixed
          | 0x0c a.U (a-1)*object                 // 1-based array
          | 0x0d a.U (a-1)*object h.U h*{object object}    // Mixed
tab_mt    → 0x0e (index-1).U tab                  // Metatable dict entry

int64     → 0x10 int.L                            // FFI int64_t
uint64    → 0x11 uint.L                           // FFI uint64_t
complex   → 0x12 re.L im.L                        // FFI complex

string    → (0x20+len).U len*char.B
          | 0x0f (index-1).U                      // String dict entry

.B = 8 bit
.I = 32 bit little-endian
.L = 64 bit little-endian
.U = prefix-encoded 32 bit unsigned number n:
     0x00..0xdf   → n.B
     0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B
     0x1fe0..     → 0xff n.I
</pre>
<h2 id="error">Error Handling</h2>
<p>
Many of the buffer methods can throw an error. Out-of-memory or usage
errors are best caught with an outer wrapper for larger parts of code.
There's not much one can do after that, anyway.
</p>
<p>
On the other hand, you may want to catch some errors individually. Buffer
methods need to receive the buffer object as the first argument. The Lua
colon-syntax <tt>obj:method()</tt> does that implicitly. But to wrap a
method with <tt>pcall()</tt>, the arguments need to be passed like this:
</p>
<pre class="code">
local ok, err = pcall(buf.encode, buf, obj)
if not ok then
  -- Handle error in err.
end
</pre>
<h2 id="ffi_caveats">FFI Caveats</h2>
<p>
The string buffer library has been designed to work well together with
the FFI library. But due to the low-level nature of the FFI library,
some care needs to be taken:
</p>
<p>
First, please remember that FFI pointers are zero-indexed. The space
returned by <tt>buf:reserve()</tt> and <tt>buf:ref()</tt> starts at the
returned pointer and spans exactly <tt>len</tt> bytes.
</p>
<p>
I.e. the first valid index is <tt>ptr[0]</tt> and the last valid index
is <tt>ptr[len-1]</tt>. If the returned length is zero, there's no valid
index at all. The returned pointer may even be <tt>NULL</tt>.
</p>
<p>
The space pointed to by the returned pointer is only valid as long as
the buffer is not modified in any way (neither append, nor consume, nor
reset, etc.). The pointer is also not a GC anchor for the buffer object
itself.
</p>
<p>
Buffer data is only guaranteed to be byte-aligned. Casting the returned
pointer to a data type with higher alignment may cause unaligned
accesses. It depends on the CPU architecture whether this is allowed or
not (it's always OK on x86/x64 and mostly OK on other modern
architectures).
</p>
<p>
FFI pointers or references do not count as GC anchors for an underlying
object. E.g. an <tt>array</tt> allocated with <tt>ffi.new()</tt> is
anchored by <tt>buf:set(array,&nbsp;len)</tt>, but not by
<tt>buf:set(array+offset,&nbsp;len)</tt>. The addition of the offset
creates a new pointer, even when the offset is zero. In this case, you
need to make sure there's still a reference to the original array as
long as its contents are in use by the buffer.
</p>
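<p>
The following sketch illustrates the offset case, assuming the FFI library. The buffer references a slice of <tt>arr</tt> through a derived pointer, so the code keeps <tt>arr</tt> in a live local variable until the buffer data has been consumed. The data is an arbitrary example.
</p>
<pre class="code">
local ffi = require("ffi")
local buffer = require("string.buffer")

local arr = ffi.new("char[10]", "xxHELLOxx")  -- 9 chars plus a zero byte.
local buf = buffer.new()
-- arr + 2 is a new pointer: it does NOT anchor arr for the GC,
-- so the local 'arr' must keep the array alive while buf uses it.
buf:set(arr + 2, 5)
local s = buf:get()   -- "HELLO"; consumed while arr is still referenced.
</pre>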
<p>
Even though each LuaJIT VM instance is single-threaded (but you can
create multiple VMs), FFI data structures can be accessed concurrently.
Be careful when reading/writing FFI cdata from/to buffers to avoid
concurrent accesses or modifications. In particular, the memory
referenced by <tt>buf:set(cdata,&nbsp;len)</tt> must not be modified
while buffer readers are working on it. Shared but read-only memory
mappings of files are OK, but only if the file does not change.
</p>
<br class="flush">
</div>
<div id="foot">
<hr class="hide">
Copyright &copy; 2005-2022
<span class="noprint">
&middot;
<a href="contact.html">Contact</a>
</span>
</div>
</body>
</html>