<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
<!ENTITY version SYSTEM "version.xml">
]>
<chapter id="what-is-harfbuzz">
<title>What is HarfBuzz?</title>
<para>
HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you
give HarfBuzz a font and a string containing a sequence of Unicode
codepoints, HarfBuzz selects and positions the corresponding
glyphs from the font, applying all of the necessary layout rules
and font features. HarfBuzz then returns the resulting sequence of
positioned glyphs, correctly arranged for the language and writing
system.
</para>
<para>
HarfBuzz can properly shape all of the world's major writing
systems. It runs on all major operating systems and software
platforms and it supports the major font formats in use
today.
</para>
<section id="what-is-text-shaping">
<title>What is text shaping?</title>
<para>
Text shaping is the process of translating a string of character
codes (such as Unicode codepoints) into a properly arranged
sequence of glyphs that can be rendered onto a screen or into
final output form for inclusion in a document.
</para>
<para>
The shaping process is dependent on the input string, the active
font, the script (or writing system) that the string is in, and
the language that the string is in.
</para>
<para>
Modern software systems generally only deal with strings in the
Unicode encoding scheme (although legacy systems and documents may
involve other encodings).
</para>
<para>
There are several font formats that a program might
encounter, each of which has a set of standard text-shaping
rules.
</para>
<para>The dominant format is <ulink
url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>. The
OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for
various scripts from around the world. These shaping models depend on
the font incorporating certain features as
<emphasis>lookups</emphasis> in its <literal>GSUB</literal>
and <literal>GPOS</literal> tables.
</para>
<para>
Alternatively, OpenType fonts can include shaping features for
the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model.
</para>
<para>
TrueType fonts can also include OpenType shaping
features. Alternatively, TrueType fonts can include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple
Advanced Typography</ulink> (AAT) tables to implement shaping
support. AAT fonts are generally only found on macOS and iOS systems.
</para>
<para>
Text strings will usually be tagged with a script and language
tag that provide the context needed to perform text shaping
correctly. The necessary <ulink
url="https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags">script</ulink>
and <ulink
url="https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags">language</ulink>
tags are defined by OpenType.
</para>
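<para>
OpenType script and language tags are four-byte identifiers. As a
minimal sketch, the following Python packs such a tag into a 32-bit
integer, most significant byte first, and unpacks it again. The
helper names <literal>make_tag</literal> and
<literal>tag_to_text</literal> are hypothetical and are not part of
HarfBuzz; the sketch assumes that tags shorter than four characters
are padded with trailing spaces.
</para>
<programlisting language="python"><![CDATA[
```python
def make_tag(text):
    """Pack an OpenType tag of up to four ASCII characters into a 32-bit int."""
    raw = text.encode("ascii")
    if not 1 <= len(raw) <= 4:
        raise ValueError("an OpenType tag has between one and four characters")
    # Assumption: shorter tags are padded with trailing spaces.
    raw = raw.ljust(4, b" ")
    return int.from_bytes(raw, "big")


def tag_to_text(tag):
    """Unpack a 32-bit tag back into its four-character form."""
    return tag.to_bytes(4, "big").decode("ascii")


if __name__ == "__main__":
    # "arab" and "latn" are script tags; "ARA" is a language tag.
    for name in ("arab", "latn", "ARA"):
        print(f"{name!r:7} -> 0x{make_tag(name):08X}")
```
]]></programlisting>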
</section>
<section id="why-do-i-need-a-shaping-engine">
<title>Why do I need a shaping engine?</title>
<para>
Text shaping is an integral part of preparing text for
display. Before a Unicode sequence can be rendered, the
codepoints in the sequence must be mapped to the corresponding
glyphs provided in the font, and those glyphs must be positioned
correctly relative to each other. For many of the scripts
supported in Unicode, these steps involve script-specific layout
rules, including complex joining, reordering, and positioning
behavior. Implementing these rules is the job of the shaping engine.
</para>
<para>
Text shaping is a fairly low-level operation. HarfBuzz is
used directly by text-handling libraries like <ulink
url="https://www.pango.org/">Pango</ulink>, as well as by the layout
engines in Firefox, LibreOffice, and Chromium. Unless you are
<emphasis>writing</emphasis> one of these layout engines
yourself, you will probably not need to use HarfBuzz: normally,
a layout engine, toolkit, or other library will turn text into
glyphs for you.
</para>
<para>
However, if you <emphasis>are</emphasis> writing a layout engine
or graphics library yourself, then you will need to perform text
shaping, and this is where HarfBuzz can help you.
</para>
<para>
Here are some specific scenarios where a text-shaping engine
like HarfBuzz helps you:
</para>
<itemizedlist>
<listitem>
<para>
OpenType fonts contain a set of glyphs (that is, shapes
to represent the letters, numbers, punctuation marks, and
all other symbols), which are indexed by a <literal>glyph ID</literal>.
</para>
<para>
A particular glyph ID within the font does not necessarily
correlate to a predictable Unicode codepoint. For instance,
some fonts have the letter &quot;a&quot; as glyph ID 1, but
many others do not. In order to retrieve the right glyph
from the font to display &quot;a&quot;, you need to consult
the table inside the font (the <literal>cmap</literal>
table) that maps Unicode codepoints to glyph IDs. In other
words, <emphasis>text shaping turns codepoints into glyph
IDs</emphasis>.
</para>
</listitem>
<listitem>
<para>
Many OpenType fonts contain ligatures: combinations of
characters that are rendered as a single unit. For instance,
it is common for the &quot;f, i&quot; letter
sequence to appear in print as the single ligature glyph
&quot;ﬁ&quot;.
</para>
<para>
Whether you should render an &quot;f, i&quot; sequence
as <literal>fi</literal> or as &quot;ﬁ&quot; does not
depend on the input text. Instead, it depends on whether
or not the font includes an &quot;ﬁ&quot; glyph and on the
level of ligature application you wish to perform. The font
and the amount of ligature application used are under your
control. In other words, <emphasis>text shaping involves
querying the font's ligature tables and determining what
substitutions should be made</emphasis>.
</para>
</listitem>
<listitem>
<para>
While ligatures like &quot;ﬁ&quot; are optional typographic
refinements, some languages <emphasis>require</emphasis> certain
substitutions to be made in order to display text correctly.
</para>
<para>
For example, in Tamil, when the letter &quot;TTA&quot; (ட)
is followed by the vowel sign &quot;U&quot; (ு), the pair
must be replaced by the single glyph &quot;டு&quot;. The
sequence of Unicode characters &quot;ட,ு&quot; needs to be
substituted with a single &quot;டு&quot; glyph from the
font.
</para>
<para>
But &quot;டு&quot; does not have a Unicode codepoint. To
find this glyph, you need to consult the table inside
the font (the <literal>GSUB</literal> table) that contains
substitution information. In other words, <emphasis>text shaping
chooses the correct glyph for a sequence of characters
provided</emphasis>.
</para>
</listitem>
<listitem>
<para>
Similarly, each Arabic letter has up to four different variants,
corresponding to the different positions in which it might appear
within a sequence. Inside a font, there will be separate
glyphs for the initial, medial, final, and isolated forms of
such a letter, each at a different glyph ID.
</para>
<para>
Unicode only assigns one codepoint per character, so a
Unicode string will not tell you which glyph variant to use
for each character. To decide, you need to analyze the whole
string and determine the appropriate glyph for each character
based on its position. In other words, <emphasis>text
shaping chooses the correct form of the letter by its
position and returns the correct glyph from the font</emphasis>.
</para>
</listitem>
<listitem>
<para>
Other languages involve marks and accents that need to be
rendered in specific positions relative to a base character. For
instance, the Moldovan language includes the Cyrillic letter
&quot;zhe&quot; (ж) with a breve accent, like so: &quot;ӂ&quot;.
</para>
<para>
Some fonts will provide this character as a single
zhe-with-breve glyph, but other fonts will not and, instead,
will expect the rendering engine to form the character by
superimposing the separate &quot;ж&quot; and &quot;˘&quot;
glyphs.
</para>
<para>
But exactly where you should draw the breve depends on the
height and width of the preceding zhe glyph. To find the
right position, you need to consult the table inside
the font (the <literal>GPOS</literal> table) that contains
positioning information.
In other words, <emphasis>text shaping tells you whether you
have a precomposed glyph within your font or if you need to
compose a glyph yourself out of combining marks and,
if so, where to position those marks.</emphasis>
</para>
</listitem>
</itemizedlist>
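<para>
The first two scenarios in the list above can be sketched in a few
lines of Python. This is a toy model only: the glyph IDs, the
<literal>CMAP</literal> and <literal>LIGATURES</literal> tables, and
the function names are all invented for illustration and do not
reflect how HarfBuzz or any real font stores its data. The sketch
maps codepoints to glyph IDs and then, optionally, replaces the
&quot;f, i&quot; glyph pair with a single ligature glyph.
</para>
<programlisting language="python"><![CDATA[
```python
# Hypothetical font data: every glyph ID below is invented for illustration.
CMAP = {ord("f"): 71, ord("i"): 74, ord("n"): 79, ord("d"): 69}

# A GSUB-style ligature rule: a pair of glyph IDs -> one ligature glyph ID.
LIGATURES = {(71, 74): 300}  # f + i -> fi ligature

NOTDEF = 0  # glyph ID 0 is conventionally the "missing glyph"


def map_codepoints(text):
    """Turn each codepoint into a glyph ID, as a cmap lookup would."""
    return [CMAP.get(ord(ch), NOTDEF) for ch in text]


def apply_ligatures(glyphs, enabled=True):
    """Replace glyph pairs that have a ligature rule with the ligature glyph."""
    if not enabled:
        return list(glyphs)
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2  # both input glyphs were consumed
        else:
            out.append(glyphs[i])
            i += 1
    return out


def toy_shape(text, ligatures=True):
    """Codepoints in, glyph IDs out; the caller decides on ligatures."""
    return apply_ligatures(map_codepoints(text), enabled=ligatures)


if __name__ == "__main__":
    print(toy_shape("find"))                   # ligature applied
    print(toy_shape("find", ligatures=False))  # one glyph per character
```
]]></programlisting>
<para>
Note that the same input text yields different glyph sequences
depending on the font data and on the caller's ligature setting,
which is the point made in the list above.
</para>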
<para>
If tasks like these are something that you need to do, then you
need a text shaping engine. You could use Uniscribe if you are
writing Windows software; you could use CoreText on macOS; or
you could use HarfBuzz.
</para>
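<para>
To give a sense of the analysis a shaping engine takes off your
hands, here is a rough Python sketch of the positional-form
selection described above for Arabic. It assumes a heavily
simplified model: only a handful of letters are classified, each as
joining on both sides or joining only to the preceding letter, and
marks, ligatures, and every other rule are ignored. The names
<literal>DUAL</literal>, <literal>RIGHT</literal>, and
<literal>positional_forms</literal> are hypothetical.
</para>
<programlisting language="python"><![CDATA[
```python
# Illustrative subset of letters that join on both sides.
DUAL = {
    "\u0628",  # BEH
    "\u062A",  # TEH
    "\u0633",  # SEEN
    "\u0643",  # KAF
    "\u0644",  # LAM
    "\u0645",  # MEEM
    "\u0646",  # NOON
}

# Illustrative subset of letters that join only to the preceding letter.
RIGHT = {
    "\u0627",  # ALEF
    "\u062F",  # DAL
    "\u0631",  # REH
    "\u0648",  # WAW
}


def positional_forms(text):
    """Return "isol", "init", "medi", or "fina" for each character.

    The text is processed in logical (typing) order. Characters outside the
    two sets above never join and are reported as "isol".
    """
    forms = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        # A letter connects backwards only if the previous letter joins forward.
        joins_prev = prev in DUAL and (ch in DUAL or ch in RIGHT)
        # A letter connects forwards only if it is dual-joining itself.
        joins_next = ch in DUAL and (nxt in DUAL or nxt in RIGHT)
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_next:
            forms.append("init")
        elif joins_prev:
            forms.append("fina")
        else:
            forms.append("isol")
    return forms


if __name__ == "__main__":
    # KAF, TEH, ALEF, BEH
    print(positional_forms("\u0643\u062A\u0627\u0628"))
```
]]></programlisting>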
<note>
<para>
In the rest of this manual, the text will assume that the reader
is the implementor of a text-layout engine.
</para>
</note>
</section>
<section id="what-does-harfbuzz-do">
<title>What does HarfBuzz do?</title>
<para>
HarfBuzz provides text shaping through a cross-platform
C API that accepts sequences of Unicode codepoints as input. Currently,
the following OpenType shaping models are supported:
</para>
<itemizedlist>
<listitem>
<para>
Indic (covering Devanagari, Bengali, Gujarati,
Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu)
</para>
</listitem>
<listitem>
<para>
Arabic (covering Arabic, N'Ko, Syriac, and Mongolian)
</para>
</listitem>
<listitem>
<para>
Thai and Lao
</para>
</listitem>
<listitem>
<para>
Khmer
</para>
</listitem>
<listitem>
<para>
Myanmar
</para>
</listitem>
<listitem>
<para>
Tibetan
</para>
</listitem>
<listitem>
<para>
Hangul
</para>
</listitem>
<listitem>
<para>
Hebrew
</para>
</listitem>
<listitem>
<para>
The Universal Shaping Engine or <emphasis>USE</emphasis>
(covering complex scripts not covered by the above shaping
models)
</para>
</listitem>
<listitem>
<para>
A default shaping model for non-complex scripts
(covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh,
and many others)
</para>
</listitem>
<listitem>
<para>
Emoji (including emoji modifier sequences, flag sequences,
and ZWJ sequences)
</para>
</listitem>
</itemizedlist>
<para>
In addition to OpenType shaping, HarfBuzz supports the latest
version of Graphite shaping (the "Graphite 2" model) and AAT
shaping.
</para>
<para>
HarfBuzz can read and understand TrueType fonts (.ttf), TrueType
collections (.ttc), and OpenType fonts (.otf, including those
fonts that contain TrueType-style outlines and those that
contain PostScript CFF or CFF2 outlines).
</para>
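<para>
File extensions are only a hint; the flavor of a font file can
usually be told from its first four bytes. The following Python is
a rough sketch of such a check, not something HarfBuzz requires you
to do. The function name
<literal>sniff_font_container</literal> and the returned labels are
made up for this example.
</para>
<programlisting language="python"><![CDATA[
```python
def sniff_font_container(data):
    """Classify font data by its first four bytes (the version tag)."""
    tag = data[:4]
    # 0x00010000 marks TrueType outlines; "true" is an older Apple variant.
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType outlines"
    # "OTTO" marks an OpenType font with CFF or CFF2 outlines.
    if tag == b"OTTO":
        return "CFF or CFF2 outlines"
    # "ttcf" marks a collection holding several fonts in one file.
    if tag == b"ttcf":
        return "font collection"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py SomeFont.otf
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, "->", sniff_font_container(handle.read(4)))
```
]]></programlisting>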
<para>
HarfBuzz is designed and tested to run on top of the FreeType
font renderer. It can run on Linux, Android, Windows, macOS, and
iOS systems.
</para>
<para>
In addition to its core shaping functionality, HarfBuzz provides
functions for accessing other font features, including optional
GSUB and GPOS OpenType features, as well as
all color-font formats (<literal>CBDT</literal>,
<literal>sbix</literal>, <literal>COLR/CPAL</literal>, and
<literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz
also includes a font-subsetting feature. HarfBuzz can perform
some low-level math-shaping operations, although it does not
currently perform full shaping for mathematical typesetting.
</para>
<para>
A suite of command-line utilities is also provided in the
source-code tree, designed to help users test and debug
HarfBuzz's features on real-world fonts and input.
</para>
</section>
<section id="what-harfbuzz-doesnt-do">
<title>What HarfBuzz doesn't do</title>
<para>
HarfBuzz will take a Unicode string, shape it, and give you the
information required to lay it out correctly on a single
horizontal (or vertical) line using the font provided. That is the
extent of HarfBuzz's responsibility.
</para>
<para>
It is important to note that if you are implementing a complete
text-layout engine you may have other responsibilities that
HarfBuzz will <emphasis>not</emphasis> help you with. For example:
</para>
<itemizedlist>
<listitem>
<para>
HarfBuzz won't help you with bidirectionality. If you want to
lay out text that includes a mix of Hebrew and English, you
will need to ensure that each buffer provided to HarfBuzz
has all of its characters in the same order and that the
directionality of the buffer is set correctly. This may mean
segmenting the text before it is placed into HarfBuzz buffers.
For example, the user might hit the keys in the following
sequence:
</para>
<programlisting>
A B C [space] ג ב א [space] D E F
</programlisting>
<para>
but will expect to see in the output:
</para>
<programlisting>
ABC אבג DEF
</programlisting>
<para>
This reordering is called <emphasis>bidi processing</emphasis>
(&quot;bidi&quot; is short for bidirectional). The Unicode
Bidirectional Algorithm, published as Unicode Standard Annex #9,
tells you how to process a string of mixed directionality.
Before sending your string to HarfBuzz, you may need to apply the
bidi algorithm to it. Libraries such as <ulink
url="http://icu-project.org/">ICU</ulink> and <ulink
url="http://fribidi.org/">fribidi</ulink> can do this for you.
</para>
</listitem>
<listitem>
<para>
HarfBuzz won't help you with text that contains different font
properties. For instance, if you have the string &quot;a
<emphasis>huge</emphasis> breakfast&quot;, and you expect
&quot;huge&quot; to be italic, then you will need to send three
strings to HarfBuzz: <literal>a</literal>, in your Roman font;
<literal>huge</literal> using your italic font; and
<literal>breakfast</literal> using your Roman font again.
</para>
<para>
Similarly, if you change the font, font size, script,
language, or direction within your string, then you will
need to shape each run independently and output them
independently. HarfBuzz expects to shape a run of characters
that all share the same properties.
</para>
</listitem>
<listitem>
<para>
HarfBuzz won't help you with line breaking, hyphenation, or
justification. As mentioned above, HarfBuzz lays out the string
along a <emphasis>single line</emphasis> of, notionally,
infinite length. If you want to find out where the potential
word, sentence and line break points are in your text, you
could use the ICU library's break iterator functions.
</para>
<para>
HarfBuzz can tell you how wide a shaped piece of text is, which is
useful input to a justification algorithm, but it knows nothing
about paragraphs, lines or line lengths. Nor will it adjust the
space between words to fit them proportionally into a line.
</para>
</listitem>
</itemizedlist>
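<para>
The segmentation into same-direction runs mentioned in the list
above might be sketched as follows. This is deliberately
<emphasis>not</emphasis> an implementation of the Unicode bidi
algorithm: it only groups characters by their strong direction, as
reported by Python's standard <literal>unicodedata</literal> module,
and attaches spaces and other neutral characters to the run that
precedes them. The function names are hypothetical, and a real
layout engine should rely on ICU or fribidi instead.
</para>
<programlisting language="python"><![CDATA[
```python
import unicodedata


def strong_direction(ch):
    """Return "rtl", "ltr", or None for characters with no strong direction."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "rtl"
    if bidi_class == "L":
        return "ltr"
    return None


def split_into_runs(text):
    """Group text into (direction, substring) runs, in logical order.

    Simplification: neutral characters join the preceding run, and text that
    starts with neutral characters is assumed to be left-to-right.
    """
    runs = []
    for ch in text:
        direction = strong_direction(ch)
        if runs and direction in (None, runs[-1][0]):
            runs[-1][1] += ch
        else:
            runs.append([direction or "ltr", ch])
    return [(direction, chars) for direction, chars in runs]


if __name__ == "__main__":
    # "ABC", then Hebrew ALEF BET GIMEL, then "DEF", in typing order.
    sample = "ABC \u05D0\u05D1\u05D2 DEF"
    for direction, chars in split_into_runs(sample):
        print(direction, [f"U+{ord(c):04X}" for c in chars])
```
]]></programlisting>
<para>
Each resulting run could then be placed in its own buffer, with the
buffer direction set to match.
</para>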
<para>
If you are a layout-engine implementor, HarfBuzz will help you with the
interface between your text and your font, and that's something
that you'll need. What you then do with the glyphs that your font
returns is up to you.
</para>
</section>
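<para>
As an illustration of that division of labor, the Python sketch
below shows a justification step that a layout engine might perform
on its own once it knows how wide each shaped word is. The function
name and all of the numbers are invented; the sketch assumes that
the word widths were obtained from a shaper beforehand, and that
the line is justified by distributing the leftover space evenly
between words.
</para>
<programlisting language="python"><![CDATA[
```python
def justify_gap(word_widths, line_width):
    """Return the width of each inter-word gap so the words fill the line.

    word_widths: widths of the already-shaped words, in font units.
    line_width:  target width of the line, in the same units.
    """
    if len(word_widths) < 2:
        return 0.0  # nothing to distribute with fewer than two words
    free = line_width - sum(word_widths)
    if free < 0:
        raise ValueError("words do not fit on the line")
    return free / (len(word_widths) - 1)


if __name__ == "__main__":
    # Hypothetical widths for three shaped words on a 1200-unit line.
    print(justify_gap([300, 450, 250], 1200))
```
]]></programlisting>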
<section id="why-is-it-called-harfbuzz">
<title>Why is it called HarfBuzz?</title>
<para>
HarfBuzz began its life as text-shaping code within the FreeType
project (and you will see references to the FreeType authors
within the source code copyright declarations), but was then
extracted into its own project. This project is maintained by
Behdad Esfahbod, who named it HarfBuzz. Originally, it was a
shaping engine for OpenType fonts: &quot;HarfBuzz&quot; is
Persian for &quot;open type&quot;.
</para>
</section>
</chapter>