- <?xml version="1.0"?>
- <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
- "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
- <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
- <!ENTITY version SYSTEM "version.xml">
- ]>
- <chapter id="what-is-harfbuzz">
- <title>What is HarfBuzz?</title>
- <para>
- HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you
- give HarfBuzz a font and a string containing a sequence of Unicode
- codepoints, HarfBuzz selects and positions the corresponding
- glyphs from the font, applying all of the necessary layout rules
- and font features. HarfBuzz then returns the resulting glyph
- sequence to you in a form that is correctly arranged for the
- language and writing system.
- </para>
- <para>
- HarfBuzz can properly shape all of the world's major writing
- systems. It runs on all major operating systems and software
- platforms and it supports the major font formats in use
- today.
- </para>
- <section id="what-is-text-shaping">
- <title>What is text shaping?</title>
- <para>
- Text shaping is the process of translating a string of character
- codes (such as Unicode codepoints) into a properly arranged
- sequence of glyphs that can be rendered onto a screen or into
- final output form for inclusion in a document.
- </para>
- <para>
- The shaping process depends on the input string, the active
- font, the script (or writing system) that the string is in, and
- the language that the string is in.
- </para>
- <para>
- Modern software systems generally only deal with strings in the
- Unicode encoding scheme (although legacy systems and documents may
- involve other encodings).
- </para>
- <para>
- There are several font formats that a program might
- encounter, each of which has a set of standard text-shaping
- rules.
- </para>
- <para>The dominant format is <ulink
- url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>. The
- OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for
- various scripts from around the world. These shaping models depend on
- the font incorporating certain features as
- <emphasis>lookups</emphasis> in its <literal>GSUB</literal>
- and <literal>GPOS</literal> tables.
- </para>
- <para>
- Alternatively, OpenType fonts can include shaping features for
- the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model.
- </para>
- <para>
- TrueType fonts can also include OpenType shaping
- features. Alternatively, TrueType fonts can include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple
- Advanced Typography</ulink> (AAT) tables to implement shaping
- support. AAT fonts are generally only found on macOS and iOS systems.
- </para>
- <para>
- Text strings will usually be tagged with script and language
- tags that provide the context needed to perform text shaping
- correctly. The necessary <ulink
- url="https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags">script</ulink>
- and <ulink
- url="https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags">language</ulink>
- tags are defined by OpenType.
- </para>
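- <para>
- As a minimal sketch of how such tags can be represented in code: an
- OpenType tag is four ASCII characters, and a common way to store one
- is to pack it into a 32-bit integer with the first character in the
- most significant byte. The helper name <literal>make_tag</literal> is
- hypothetical and is not part of any library.
- </para>
- <programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack an OpenType tag such as "arab" into a 32-bit value, with the
 * first character in the most significant byte. Tags shorter than four
 * characters are padded with trailing spaces. (Hypothetical helper.) */
static uint32_t make_tag(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);
    uint32_t tag = 0;
    for (size_t i = 0; i < 4; i++) {
        unsigned char c = (i < n) ? (unsigned char)s[i] : ' ';
        tag = (tag << 8) | c;
    }
    return tag;
}
```
- ]]></programlisting>
- <para>
- For instance, <literal>make_tag("arab")</literal> packs the Arabic
- script tag, and <literal>make_tag("URD")</literal> packs the Urdu
- language tag, padded to four characters.
- </para>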
- </section>
-
- <section id="why-do-i-need-a-shaping-engine">
- <title>Why do I need a shaping engine?</title>
- <para>
- Text shaping is an integral part of preparing text for
- display. Before a Unicode sequence can be rendered, the
- codepoints in the sequence must be mapped to the corresponding
- glyphs provided in the font, and those glyphs must be positioned
- correctly relative to each other. For many of the scripts
- supported in Unicode, these steps involve script-specific layout
- rules, including complex joining, reordering, and positioning
- behavior. Implementing these rules is the job of the shaping engine.
- </para>
- <para>
- Text shaping is a fairly low-level operation. HarfBuzz is
- used directly by text-handling libraries like <ulink
- url="https://www.pango.org/">Pango</ulink>, as well as by the layout
- engines in Firefox, LibreOffice, and Chromium. Unless you are
- <emphasis>writing</emphasis> one of these layout engines
- yourself, you will probably not need to use HarfBuzz: normally,
- a layout engine, toolkit, or other library will turn text into
- glyphs for you.
- </para>
- <para>
- However, if you <emphasis>are</emphasis> writing a layout engine
- or graphics library yourself, then you will need to perform text
- shaping, and this is where HarfBuzz can help you.
- </para>
- <para>
- Here are some specific scenarios where a text-shaping engine
- like HarfBuzz helps you:
- </para>
- <itemizedlist>
- <listitem>
- <para>
- OpenType fonts contain a set of glyphs (that is, shapes
- to represent the letters, numbers, punctuation marks, and
- all other symbols), which are indexed by a <literal>glyph ID</literal>.
- </para>
- <para>
- A particular glyph ID within the font does not necessarily
- correlate to a predictable Unicode codepoint. For instance,
- some fonts have the letter "a" as glyph ID 1, but
- many others do not. In order to retrieve the right glyph
- from the font to display "a", you need to consult
- the table inside the font (the <literal>cmap</literal>
- table) that maps Unicode codepoints to glyph IDs. In other
- words, <emphasis>text shaping turns codepoints into glyph
- IDs</emphasis>.
- </para>
- </listitem>
- <listitem>
- <para>
- Many OpenType fonts contain ligatures: combinations of
- characters that are rendered as a single unit. For instance,
- it is common for the "f, i" letter
- sequence to appear in print as the single ligature glyph
- "ﬁ".
- </para>
- <para>
- Whether you should render an "f, i" sequence
- as <literal>fi</literal> or as "ﬁ" does not
- depend on the input text. Instead, it depends on whether
- or not the font includes an "ﬁ" glyph and on the
- level of ligature application you wish to perform. The font
- and the amount of ligature application used are under your
- control. In other words, <emphasis>text shaping involves
- querying the font's ligature tables and determining what
- substitutions should be made</emphasis>.
- </para>
- </listitem>
- <listitem>
- <para>
- While ligatures like "ﬁ" are optional typographic
- refinements, some languages <emphasis>require</emphasis> certain
- substitutions to be made in order to display text correctly.
- </para>
- <para>
- For example, in Tamil, when the letter "TTA" (ட)
- is followed by the vowel sign "U" (ு), the pair
- must be replaced by the single glyph "டு". The
- sequence of Unicode characters "ட,ு" needs to be
- substituted with a single "டு" glyph from the
- font.
- </para>
- <para>
- But "டு" does not have a Unicode codepoint. To
- find this glyph, you need to consult the table inside
- the font (the <literal>GSUB</literal> table) that contains
- substitution information. In other words, <emphasis>text shaping
- chooses the correct glyph for a sequence of characters
- provided</emphasis>.
- </para>
- </listitem>
- <listitem>
- <para>
- Similarly, an Arabic letter can have up to four different variants,
- depending on the position in which it appears
- within a sequence. Inside a font, there will be separate
- glyphs for the initial, medial, final, and isolated forms of
- a letter, each at a different glyph ID.
- </para>
- <para>
- Unicode only assigns one codepoint per character, so a
- Unicode string will not tell you which glyph variant to use
- for each character. To decide, you need to analyze the whole
- string and determine the appropriate glyph for each character
- based on its position. In other words, <emphasis>text
- shaping chooses the correct form of the letter by its
- position and returns the correct glyph from the font</emphasis>.
- </para>
- </listitem>
- <listitem>
- <para>
- Other languages involve marks and accents that need to be
- rendered in specific positions relative to a base character. For
- instance, the Moldovan language includes the Cyrillic letter
- "zhe" (ж) with a breve accent, like so: "ӂ".
- </para>
- <para>
- Some fonts will provide this character as a single
- zhe-with-breve glyph, but other fonts will not and, instead,
- will expect the rendering engine to form the character by
- superimposing the separate "ж" and "˘"
- glyphs.
- </para>
- <para>
- But exactly where you should draw the breve depends on the
- height and width of the preceding zhe glyph. To find the
- right position, you need to consult the table inside
- the font (the <literal>GPOS</literal> table) that contains
- positioning information.
- In other words, <emphasis>text shaping tells you whether you
- have a precomposed glyph within your font or if you need to
- compose a glyph yourself out of combining marks—and,
- if so, where to position those marks.</emphasis>
- </para>
- </listitem>
- </itemizedlist>
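- <para>
- The first two scenarios above can be sketched in a few lines of C.
- This is a toy illustration only: the glyph IDs, the
- <literal>toy_cmap</literal> table, and the function names are all
- hypothetical, and a real font stores this data in its
- <literal>cmap</literal> and <literal>GSUB</literal> tables in a far
- richer form.
- </para>
- <programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical glyph IDs for a toy font; real fonts assign their own. */
enum { GID_NOTDEF = 0, GID_A = 36, GID_F = 41, GID_I = 44, GID_FI = 190 };

typedef struct { uint32_t codepoint; uint16_t glyph; } cmap_entry;

/* Toy stand-in for a cmap table, sorted by codepoint. */
static const cmap_entry toy_cmap[] = {
    { 0x0061, GID_A }, { 0x0066, GID_F }, { 0x0069, GID_I },
};

/* Binary-search the toy cmap; unmapped codepoints get glyph 0 (.notdef). */
static uint16_t lookup_glyph(uint32_t cp)
{
    size_t lo = 0, hi = sizeof toy_cmap / sizeof toy_cmap[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (toy_cmap[mid].codepoint == cp)
            return toy_cmap[mid].glyph;
        if (toy_cmap[mid].codepoint < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return GID_NOTDEF;
}

/* Map codepoints to glyphs, then replace each f+i pair with the single
 * ligature glyph when ligatures are enabled. Returns the glyph count;
 * out must have room for len glyphs. */
static size_t toy_shape(const uint32_t *text, size_t len, int ligatures,
                        uint16_t *out)
{
    assert(text != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t g = lookup_glyph(text[i]);
        if (ligatures && n > 0 && out[n - 1] == GID_F && g == GID_I)
            out[n - 1] = GID_FI;   /* two characters become one glyph */
        else
            out[n++] = g;
    }
    return n;
}
```
- ]]></programlisting>
- <para>
- Note that the output can contain fewer glyphs than the input has
- characters, which is one reason why a layout engine cannot assume
- a one-to-one mapping between the two.
- </para>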
- <para>
- If tasks like these are something that you need to do, then you
- need a text shaping engine. You could use Uniscribe if you are
- writing Windows software; you could use CoreText on macOS; or
- you could use HarfBuzz.
- </para>
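- <para>
- To make the positional-form scenario concrete, here is a simplified
- sketch of how a form might be chosen for each letter of an Arabic
- word. It assumes that the caller already knows each letter's joining
- class, and it ignores marks, joiner control characters, and every
- other detail that a real shaper must handle. All type and function
- names are hypothetical.
- </para>
- <programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Simplified joining classes: a dual-joining letter connects on both
 * sides; a right-joining letter connects only to the letter before it. */
typedef enum { JOIN_NONE, JOIN_RIGHT, JOIN_DUAL } join_type;
typedef enum { FORM_ISOLATED, FORM_INITIAL, FORM_MEDIAL, FORM_FINAL } form;

/* Choose a positional form for every letter of a word given in logical
 * (reading) order. The caller supplies each letter's joining class. */
static void pick_forms(const join_type *types, size_t len, form *out)
{
    assert(types != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        /* Connects backwards if the previous letter can join forwards
         * and this letter accepts a connection. */
        int joins_prev = i > 0 && types[i - 1] == JOIN_DUAL
                         && types[i] != JOIN_NONE;
        /* Connects forwards if this letter is dual-joining and the
         * next letter accepts a connection. */
        int joins_next = i + 1 < len && types[i] == JOIN_DUAL
                         && types[i + 1] != JOIN_NONE;
        if (joins_prev && joins_next)
            out[i] = FORM_MEDIAL;
        else if (joins_prev)
            out[i] = FORM_FINAL;
        else if (joins_next)
            out[i] = FORM_INITIAL;
        else
            out[i] = FORM_ISOLATED;
    }
}
```
- ]]></programlisting>
- <para>
- For the word "باب" (beh, alef, beh), with beh dual-joining and alef
- right-joining, this sketch selects the initial, final, and isolated
- forms respectively.
- </para>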
- <note>
- <para>
- The rest of this manual assumes that the reader
- is the implementor of a text-layout engine.
- </para>
- </note>
- </section>
-
- <section id="what-does-harfbuzz-do">
- <title>What does HarfBuzz do?</title>
- <para>
- HarfBuzz provides text shaping through a cross-platform
- C API that accepts sequences of Unicode codepoints as input. Currently,
- the following OpenType shaping models are supported:
- </para>
- <itemizedlist>
- <listitem>
- <para>
- Indic (covering Devanagari, Bengali, Gujarati,
- Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu)
- </para>
- </listitem>
- <listitem>
- <para>
- Arabic (covering Arabic, N'Ko, Syriac, and Mongolian)
- </para>
- </listitem>
- <listitem>
- <para>
- Thai and Lao
- </para>
- </listitem>
- <listitem>
- <para>
- Khmer
- </para>
- </listitem>
- <listitem>
- <para>
- Myanmar
- </para>
- </listitem>
-
- <listitem>
- <para>
- Tibetan
- </para>
- </listitem>
-
- <listitem>
- <para>
- Hangul
- </para>
- </listitem>
-
- <listitem>
- <para>
- Hebrew
- </para>
- </listitem>
- <listitem>
- <para>
- The Universal Shaping Engine or <emphasis>USE</emphasis>
- (covering complex scripts not covered by the above shaping
- models)
- </para>
- </listitem>
- <listitem>
- <para>
- A default shaping model for non-complex scripts
- (covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh,
- and many others)
- </para>
- </listitem>
- <listitem>
- <para>
- Emoji (including emoji modifier sequences, flag sequences,
- and ZWJ sequences)
- </para>
- </listitem>
- </itemizedlist>
- <para>
- In addition to OpenType shaping, HarfBuzz supports the latest
- version of Graphite shaping (the "Graphite 2" model) and AAT
- shaping.
- </para>
-
- <para>
- HarfBuzz can read and understand TrueType fonts (.ttf), TrueType
- collections (.ttc), and OpenType fonts (.otf, including those
- fonts that contain TrueType-style outlines and those that
- contain PostScript CFF or CFF2 outlines).
- </para>
- <para>
- HarfBuzz is designed and tested to run on top of the FreeType
- font renderer. It can run on Linux, Android, Windows, macOS, and
- iOS systems.
- </para>
-
- <para>
- In addition to its core shaping functionality, HarfBuzz provides
- functions for accessing other font features, including optional
- GSUB and GPOS OpenType features, as well as
- all color-font formats (<literal>CBDT</literal>,
- <literal>sbix</literal>, <literal>COLR/CPAL</literal>, and
- <literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz
- also includes a font-subsetting feature. HarfBuzz can perform
- some low-level math-shaping operations, although it does not
- currently perform full shaping for mathematical typesetting.
- </para>
-
- <para>
- A suite of command-line utilities is also provided in the
- source-code tree, designed to help users test and debug
- HarfBuzz's features on real-world fonts and input.
- </para>
- </section>
- <section id="what-harfbuzz-doesnt-do">
- <title>What HarfBuzz doesn't do</title>
- <para>
- HarfBuzz will take a Unicode string, shape it, and give you the
- information required to lay it out correctly on a single
- horizontal (or vertical) line using the font provided. That is the
- extent of HarfBuzz's responsibility.
- </para>
- <para>
- It is important to note that if you are implementing a complete
- text-layout engine you may have other responsibilities that
- HarfBuzz will <emphasis>not</emphasis> help you with. For example:
- </para>
- <itemizedlist>
- <listitem>
- <para>
- HarfBuzz won't help you with bidirectionality. If you want to
- lay out text that includes a mix of Hebrew and English, you
- will need to ensure that each buffer provided to HarfBuzz
- has all of its characters in the same order and that the
- directionality of the buffer is set correctly. This may mean
- segmenting the text before it is placed into HarfBuzz buffers. For
- example, the user might press the keys in the following
- sequence:
- </para>
- <programlisting>
- A B C [space] ג ב א [space] D E F
- </programlisting>
- <para>
- but will expect to see in the output:
- </para>
- <programlisting>
- ABC אבג DEF
- </programlisting>
- <para>
- This reordering is called <emphasis>bidi processing</emphasis>
- ("bidi" is short for bidirectional). The Unicode Bidirectional
- Algorithm, published as Unicode Standard Annex #9, describes how
- to process a string of mixed directionality.
- Before sending your string to HarfBuzz, you may need to apply the
- bidi algorithm to it. Libraries such as <ulink
- url="http://icu-project.org/">ICU</ulink> and <ulink
- url="http://fribidi.org/">fribidi</ulink> can do this for you.
- </para>
- </listitem>
- <listitem>
- <para>
- HarfBuzz won't help you with text that contains different font
- properties. For instance, if you have the string "a
- <emphasis>huge</emphasis> breakfast", and you expect
- "huge" to be italic, then you will need to send three
- strings to HarfBuzz: <literal>a</literal>, in your Roman font;
- <literal>huge</literal> using your italic font; and
- <literal>breakfast</literal> using your Roman font again.
- </para>
- <para>
- Similarly, if you change the font, font size, script,
- language, or direction within your string, then you will
- need to shape each run independently and output them
- independently. HarfBuzz expects to shape a run of characters
- that all share the same properties.
- </para>
- </listitem>
- <listitem>
- <para>
- HarfBuzz won't help you with line breaking, hyphenation, or
- justification. As mentioned above, HarfBuzz lays out the string
- along a <emphasis>single line</emphasis> of, notionally,
- infinite length. If you want to find out where the potential
- word, sentence and line break points are in your text, you
- could use the ICU library's break iterator functions.
- </para>
- <para>
- HarfBuzz can tell you how wide a shaped piece of text is, which is
- useful input to a justification algorithm, but it knows nothing
- about paragraphs, lines or line lengths. Nor will it adjust the
- space between words to fit them proportionally into a line.
- </para>
- </listitem>
- </itemizedlist>
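- <para>
- The run segmentation described above might be sketched as follows.
- This is an illustration under simplifying assumptions: it splits only
- on font and direction, whereas a real layout engine would also split
- on script, language, and font size. The struct and function names are
- hypothetical.
- </para>
- <programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Per-character properties decided by the layout engine before shaping.
 * All names here are hypothetical. */
typedef struct {
    int font_id;   /* which font the character is set in */
    int rtl;       /* 1 for right-to-left, 0 for left-to-right */
} char_props;

typedef struct { size_t start, length; } run;

/* Split text into maximal runs whose characters share the same font and
 * direction. Writes at most max_runs entries and returns the number of
 * runs found. */
static size_t split_runs(const char_props *props, size_t len,
                         run *runs, size_t max_runs)
{
    assert(props != NULL && runs != NULL);
    size_t count = 0;
    size_t start = 0;
    for (size_t i = 1; i <= len; i++) {
        /* A run ends at the end of the text or where a property changes. */
        int boundary = i == len
            || props[i].font_id != props[start].font_id
            || props[i].rtl != props[start].rtl;
        if (boundary) {
            if (count < max_runs) {
                runs[count].start = start;
                runs[count].length = i - start;
            }
            count++;
            start = i;
        }
    }
    return count;
}
```
- ]]></programlisting>
- <para>
- Each resulting run would then be placed in its own buffer and shaped
- independently. In this sketch the spaces around "huge" in the
- earlier example stay attached to whichever run shares their font.
- </para>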
- <para>
- If you are implementing a layout engine, HarfBuzz will help you with the
- interface between your text and your font, and that's something
- that you'll need. What you then do with the glyphs that HarfBuzz
- returns is up to you.
- </para>
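- <para>
- As one example of what a layout engine might do with shaped output,
- the sketch below sums the horizontal advances of a run to obtain its
- width, then computes how much extra space a simple justification
- pass could add at each word gap. The function names are hypothetical,
- and the advances are assumed to be plain integers in the same units
- as the line width.
- </para>
- <programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Sum the horizontal advances of a shaped run to get its total width.
 * The advances are assumed to be in the same units as the line width. */
static long run_width(const int *advances, size_t count)
{
    assert(advances != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += advances[i];
    return total;
}

/* Extra space to add at each word gap so that the text fills the line.
 * Returns 0 when there are no gaps or the text is already too wide.
 * Integer division means a small remainder may be left over. */
static long extra_per_gap(long text_width, long line_width, size_t gaps)
{
    if (gaps == 0 || text_width >= line_width)
        return 0;
    return (line_width - text_width) / (long)gaps;
}
```
- ]]></programlisting>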
- </section>
-
- <section id="why-is-it-called-harfbuzz">
- <title>Why is it called HarfBuzz?</title>
- <para>
- HarfBuzz began its life as text-shaping code within the FreeType
- project (and you will see references to the FreeType authors
- within the source code copyright declarations), but was then
- extracted out to its own project. This project is maintained by
- Behdad Esfahbod, who named it HarfBuzz. Originally, it was a
- shaping engine for OpenType fonts—"HarfBuzz" is
- the Persian for "open type".
- </para>
- </section>
- </chapter>