- Encapsulation of FLAC in ISO Base Media File Format
- Version 0.0.4 (draft)
- Table of Contents
- 1 Scope
- 2 Supporting Normative References
- 3 Design Rules of Encapsulation
- 3.1 File Type Identification
- 3.2 Overview of Track Structure
- 3.3 Definition of a FLAC sample
- 3.3.1 Sample entry format
- 3.3.2 FLAC Specific Box
- 3.3.3 Sample format
- 3.3.4 Duration of a FLAC sample
- 3.3.5 Sub-sample
- 3.3.6 Random Access
- 3.3.6.1 Random Access Point
- 3.4 Basic Structure (informative)
- 3.4.1 Initial Movie
- 3.5 Example of Encapsulation (informative)
- 4 Acknowledgements
- 5 Author's Address
- 1 Scope
- This document specifies the normative mapping for encapsulation of
- FLAC coded audio bitstreams in ISO Base Media file format and its
- derivatives. The encapsulation of FLAC coded bitstreams in
- QuickTime file format is outside the scope of this specification.
- 2 Supporting Normative References
- [1] ISO/IEC 14496-12:2012 Corrected version
- Information technology — Coding of audio-visual objects — Part
- 12: ISO base media file format
- [2] ISO/IEC 14496-12:2012/Amd.1:2013
- Information technology — Coding of audio-visual objects — Part
- 12: ISO base media file format AMENDMENT 1: Various
- enhancements including support for large metadata
- [3] FLAC format specification
- https://xiph.org/flac/format.html
- Definition of the FLAC Audio Codec stream format
- [4] FLAC-in-Ogg mapping specification
- https://xiph.org/flac/ogg_mapping.html
- Ogg Encapsulation for the FLAC Audio Codec
- [5] Matroska specification
- 3 Design Rules of Encapsulation
- 3.1 File Type Identification
-
- This specification does not define any brand to declare files
- which conform to this specification. Files which conform to
- this specification shall list, in the compatible brands list
- of the File Type Box, at least one brand whose requirements
- do not contradict the requirements described in this clause.
- The minimal support of the encapsulation of FLAC bitstreams
- in ISO Base Media file format requires the 'isom' brand.
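The brand requirement above can be checked mechanically. The following is a minimal sketch, not part of the specification: the function names are hypothetical, and it assumes the caller already holds the complete File Type Box as bytes with a 32-bit box size (no largesize form).

```python
import struct


def compatible_brands(ftyp_box: bytes):
    """Return the compatible_brands list of a complete File Type Box."""
    size, box_type = struct.unpack(">I4s", ftyp_box[:8])
    if box_type != b"ftyp" or size != len(ftyp_box):
        raise ValueError("not a complete ftyp box")
    # Skip the box header (8 bytes), major_brand (4) and minor_version (4).
    brands = ftyp_box[16:size]
    return [brands[i:i + 4].decode("ascii") for i in range(0, len(brands), 4)]


def has_isom_brand(ftyp_box: bytes) -> bool:
    """True if 'isom' appears in the compatible brands list."""
    return "isom" in compatible_brands(ftyp_box)


# The 24-byte ftyp box from the informative example in section 3.5.
example_ftyp = struct.pack(">I4s4sI4s4s", 24, b"ftyp", b"mp42", 0, b"mp42", b"isom")
```

With the example box, compatible_brands returns the two brands listed in section 3.5 and has_isom_brand reports that the minimal brand is present.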
-
- 3.2 Overview of Track Structure
- FLAC coded audio shall be encapsulated into the ISO Base
- Media File Format as media data within an audio track.
-
- + The handler_type field in the Handler Reference Box
- shall be set to 'soun'.
-
- + The Media Information Box shall contain the Sound Media
- Header Box.
-
- + The codingname of the sample entry is 'fLaC'.
-
- This specification does not define any encapsulation
- using MP4AudioSampleEntry with objectTypeIndication
- specified by the MPEG-4 Registration Authority
- (http://www.mp4ra.org/). See section 'Sample entry
- format' for the definition of the sample entry.
-
- + The 'dfLa' box is added to the sample entry to convey
- initializing information for the decoder.
- See section 'FLAC Specific Box' for the definition of
- the box contents.
- + A FLAC sample is exactly one FLAC frame as described
- in the format specification[3]. See section
- 'Sample format' for details of the frame contents.
- + Every FLAC sample is a sync sample. No pre-roll or
- lapping is required. See section 'Random Access' for
- further details.
- 3.3 Definition of a FLAC sample
-
- 3.3.1 Sample entry format
- For any track containing one or more FLAC bitstreams, a
- sample entry describing the corresponding FLAC bitstream
- shall be present inside the Sample Table Box. This version
- of the specification defines only one sample entry format
- named FLACSampleEntry whose codingname is 'fLaC'. This
- sample entry includes exactly one FLAC Specific Box
- defined in section 'FLAC Specific Box' as a mandatory box
- and indicates that FLAC samples described by this sample
- entry are stored by the sample format described in section
- 'Sample format'.
- The syntax and semantics of the FLACSampleEntry are shown
- below. The data fields of this box and native
- FLAC[3] structures encoded within FLAC blocks are both
- stored in big-endian format, though for purposes of the
- ISO BMFF container, FLAC native metadata and data blocks
- are treated as unstructured octet streams.
-
- class FLACSampleEntry() extends AudioSampleEntry ('fLaC'){
-     FLACSpecificBox();
- }
- The fields of the AudioSampleEntry portion shall be set as
- follows:
- + channelcount:
- The channelcount field shall be set equal to the
- channel count specified by the FLAC bitstream's native
- METADATA_BLOCK_STREAMINFO header as described in [3].
- Note that the FLAC FRAME_HEADER structure that begins
- each FLAC sample redundantly encodes the channel count;
- the number of channels declared in each FRAME_HEADER
- MUST match the number of channels declared here and in
- the METADATA_BLOCK_STREAMINFO header.
- + samplesize:
- The samplesize field shall be set equal to the bits
- per sample specified by the FLAC bitstream's native
- METADATA_BLOCK_STREAMINFO header as described in [3].
- Note that the FLAC FRAME_HEADER structure that begins
- each FLAC sample redundantly encodes the number of
- bits per sample; the bits per sample declared in each
- FRAME_HEADER MUST match the samplesize declared here
- and the bits per sample field declared in the
- METADATA_BLOCK_STREAMINFO header.
-
- + samplerate:
- When possible, the samplerate field shall be set
- equal to the sample rate specified by the FLAC
- bitstream's native METADATA_BLOCK_STREAMINFO header
- as described in [3], left-shifted by 16 bits to
- create the appropriate 16.16 fixed-point
- representation.
- When the bitstream's native sample rate is greater
- than the maximum expressible value of 65535 Hz,
- the samplerate field shall hold the greatest
- expressible regular division of that rate. For example,
- the samplerate field shall hold 48000.0 for
- native sample rates of 96 and 192 kHz. In the
- case of unusual sample rates which do not have
- an expressible regular division, the maximum value
- of 65535.0 Hz should be used.
- High-rate FLAC bitstreams are common, and the native
- value from the METADATA_BLOCK_STREAMINFO header in
- the FLACSpecificBox MUST be read to determine the
- correct sample rate of the bitstream.
- Note that the FLAC FRAME_HEADER structure that begins
- each FLAC sample redundantly encodes the sample rate;
- the sample rate declared in each FRAME_HEADER MUST
- match the sample rate declared in the
- METADATA_BLOCK_STREAMINFO header, and here in the
- AudioSampleEntry portion of the FLACSampleEntry
- as much as is allowed by the encoding restrictions
- described above.
-
- Finally, the FLACSpecificBox carries codec headers:
- + FLACSpecificBox
-
- This box contains initializing information for the
- decoder as defined in section 'FLAC Specific Box'.
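The samplerate rule in the AudioSampleEntry field list above can be sketched as follows. This is an illustration under a stated assumption, not normative text: "regular division" is interpreted here as repeated halving, which reproduces the 96 kHz and 192 kHz cases given above. The function name is hypothetical.

```python
MAX_RATE = 65535  # largest integer part of an unsigned 16.16 fixed-point value


def sample_entry_samplerate(native_rate: int) -> int:
    """Return the 32-bit 16.16 samplerate field for a native FLAC sample rate."""
    rate = native_rate
    # Assumption: "regular division" means halving until the rate fits.
    while rate > MAX_RATE and rate % 2 == 0:
        rate //= 2
    if rate > MAX_RATE:
        # No expressible division was found: fall back to the maximum value.
        rate = MAX_RATE
    # Shift the integer rate into the upper 16 bits (16.16 fixed point).
    return rate << 16
```

A reader still has to take the true rate from the METADATA_BLOCK_STREAMINFO header, since the field computed here is lossy for any rate above 65535 Hz.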
- 3.3.2 FLAC Specific Box
-
- Exactly one FLAC Specific Box shall be present in each
- FLACSampleEntry. This specification defines version 0
- of this box. If incompatible changes occur in future
- versions of this specification, another version number
- will be defined. The data fields of this box and native
- FLAC[3] structures encoded within FLAC blocks are both
- stored in big-endian format, though for purposes of the
- ISO BMFF container, FLAC native metadata and data blocks
- are treated as unstructured octet streams.
- The syntax and semantics of the FLAC Specific Box are shown
- below.
- class FLACMetadataBlock {
-     unsigned int(1) LastMetadataBlockFlag;
-     unsigned int(7) BlockType;
-     unsigned int(24) Length;
-     unsigned int(8) BlockData[Length];
- }
- aligned(8) class FLACSpecificBox
-         extends FullBox('dfLa', version=0, 0){
-     for (i=0; ; i++) { // to end of box
-         FLACMetadataBlock();
-     }
- }
- + Version:
- The Version field shall be set to 0.
- Future versions of this specification may set this
- field to other values. A reader that does not support
- a given value shall not read the fields that follow
- it within the FLACSpecificBox.
- + Flags:
- The Flags field shall be set to 0.
- After the FullBox header, the box contains a sequence of
- FLAC[3] native-metadata block structures that fill the
- remainder of the box.
- Each FLACMetadataBlock structure consists of three fields
- filling a total of four bytes that form a FLAC[3] native
- METADATA_BLOCK_HEADER, followed by raw octet bytes that
- comprise the FLAC[3] native METADATA_BLOCK_DATA.
- + LastMetadataBlockFlag:
- The LastMetadataBlockFlag field maps semantically to
- the FLAC[3] native METADATA_BLOCK_HEADER
- Last-metadata-block flag as defined in the FLAC[3]
- file specification.
-
- The LastMetadataBlockFlag is set to 1 if this
- FLACMetadataBlock is the last metadata block in the
- FLACSpecificBox. It is set to 0 otherwise.
-
- + BlockType:
- The BlockType field maps semantically to the FLAC[3]
- native METADATA_BLOCK_HEADER BLOCK_TYPE field as
- defined in the FLAC[3] file specification.
- The BlockType is set to a valid FLAC[3] BLOCK_TYPE
- value that identifies the type of this native metadata
- block. The BlockType of the first FLACMetadataBlock
- must be set to 0, signifying this is a FLAC[3] native
- METADATA_BLOCK_STREAMINFO block.
-
- + Length:
- The Length field maps semantically to the FLAC[3]
- native METADATA_BLOCK_HEADER Length field as
- defined in the FLAC[3] file specification.
- The Length field specifies the number of bytes of
- BlockData to follow.
- + BlockData:
- The BlockData field maps semantically to the FLAC[3]
- native METADATA_BLOCK_DATA as
- defined in the FLAC[3] file specification.
- Taken together, the bytes of the FLACMetadataBlock form a
- complete FLAC[3] native METADATA_BLOCK structure.
- Note that a minimum of a single FLACMetadataBlock,
- consisting of a FLAC[3] native METADATA_BLOCK_STREAMINFO
- structure, is required. Should the FLACSpecificBox
- contain more than a single FLACMetadataBlock structure,
- the FLACMetadataBlock containing the FLAC[3] native
- METADATA_BLOCK_STREAMINFO must occur first in the list.
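The box layout and ordering rules above can be illustrated with a small reader. This is a sketch only: the function name is hypothetical, and the input is assumed to be the box body that follows the 8-byte size and 'dfLa' type header, starting at the FullBox version and flags.

```python
def parse_dfla_payload(payload: bytes):
    """Split a 'dfLa' box body into (BlockType, BlockData) tuples."""
    if len(payload) < 4:
        raise ValueError("missing FullBox version/flags")
    version = payload[0]
    if version != 0:
        # Unknown version: do not read any further fields.
        raise ValueError(f"unsupported FLACSpecificBox version {version}")
    blocks = []
    pos = 4  # skip version (1 byte) and flags (3 bytes)
    last = False
    while pos < len(payload):
        if last:
            raise ValueError("data follows the block flagged as last")
        if pos + 4 > len(payload):
            raise ValueError("truncated metadata block header")
        last = bool(payload[pos] & 0x80)   # LastMetadataBlockFlag, 1 bit
        block_type = payload[pos] & 0x7F   # BlockType, 7 bits
        length = int.from_bytes(payload[pos + 1:pos + 4], "big")  # Length, 24 bits
        data = payload[pos + 4:pos + 4 + length]
        if len(data) != length:
            raise ValueError("truncated metadata block data")
        blocks.append((block_type, data))
        pos += 4 + length
    if not blocks or blocks[0][0] != 0:
        raise ValueError("first block must be STREAMINFO (BlockType 0)")
    if not last:
        raise ValueError("final block lacks LastMetadataBlockFlag")
    return blocks
```

Each returned BlockData is left as raw bytes, in keeping with the treatment of FLAC native metadata as an unstructured octet stream at the container level.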
- Other containers that package FLAC audio streams, such as
- Ogg[4] and Matroska[5], wrap FLAC[3] native metadata without
- modification, as this specification does. When
- repackaging or remuxing FLAC[3] streams from another
- format that contains FLAC[3] native metadata into an ISO
- BMFF file, the complete FLAC[3] native metadata should be
- preserved in the ISO BMFF stream as described above. It
- is also allowed to parse this native metadata and include
- contextually redundant ISO BMFF-native repackagings and/or
- reparsings of FLAC[3] native metadata, so long as the
- native metadata is also preserved.
- 3.3.3 Sample format
-
- A FLAC sample is exactly one FLAC audio FRAME (as defined
- in the FLAC[3] file specification) belonging to a FLAC
- bitstream. The FLAC sample data begins with a complete
- FLAC FRAME_HEADER, followed by one FLAC SUBFRAME per
- channel, any necessary bit padding, and ends with the
- usual FLAC FRAME_FOOTER.
- Note that the FLAC native FRAME_HEADER structure that
- begins each FLAC sample redundantly encodes channel count,
- sample rate, and sample size. The values of these fields
- must agree both with the values declared in the FLAC
- METADATA_BLOCK_STREAMINFO structure as well as the
- FLACSampleEntry box.
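Checking that agreement requires the native values from the STREAMINFO block. The sketch below unpacks the 34-byte METADATA_BLOCK_STREAMINFO data following the bit layout in the FLAC format specification [3]. The function name and the returned dictionary keys are hypothetical, chosen for illustration.

```python
def parse_streaminfo(data: bytes) -> dict:
    """Unpack the fields of a 34-byte STREAMINFO block that matter for
    cross-checking against the FLACSampleEntry."""
    if len(data) != 34:
        raise ValueError("STREAMINFO block data must be 34 bytes")
    # Bytes 10..17 pack, big-endian: 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total sample count.
    packed = int.from_bytes(data[10:18], "big")
    return {
        "min_block_size": int.from_bytes(data[0:2], "big"),
        "max_block_size": int.from_bytes(data[2:4], "big"),
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }
```

A writer could compare "channels" and "bits_per_sample" directly with the channelcount and samplesize fields; the sample rate comparison has to allow for the 16.16 encoding restriction described in section 3.3.1.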
- 3.3.4 Duration of a FLAC sample
- The duration of any given FLAC sample is determined by
- dividing the decoded block size of a FLAC frame, as
- encoded in the FLAC FRAME's FRAME_HEADER structure, by the
- value of the timescale field in the Media Header Box.
- FLAC samples are permitted to have variable durations
- within a given audio stream. FLAC does not use padding
- values.
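As a worked example of the duration rule, the sketch below uses the values from the informative example in section 3.5. It assumes the media timescale equals the stream's sample rate, as it does in that example; with a different timescale the block size would first have to be rescaled. The function name is hypothetical.

```python
from fractions import Fraction


def sample_duration_seconds(block_size: int, timescale: int) -> Fraction:
    """Duration of one FLAC sample, as an exact fraction of a second."""
    return Fraction(block_size, timescale)


# Section 3.5: sample_delta = 1920, mdhd timescale = 48000, 18 samples.
per_sample = sample_duration_seconds(1920, 48000)  # 1/25 s, i.e. 40 ms
total = 18 * per_sample                            # 18/25 s, i.e. 0.72 s
```

The total agrees with the Media Header Box duration of 34560 units (00:00:00.720) shown in section 3.5.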
- 3.3.5 Sub-sample
- Sub-samples are not defined for FLAC samples in this
- specification.
- 3.3.6 Random Access
-
- This subclause describes the random access behaviour
- of FLAC samples.
- 3.3.6.1 Random Access Point
-
- All FLAC samples can be independently decoded,
- i.e. every FLAC sample is a sync sample. The Sync
- Sample Box shall not be present as long as there are
- no samples other than FLAC samples in the same
- track. The sample_is_non_sync_sample field for FLAC
- samples shall be set to 0.
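Because every sample is a sync sample, seeking reduces to locating the sample that contains the target time. The following is a sketch, with a hypothetical function name, that walks Decoding Time to Sample Box entries given as (sample_count, sample_delta) pairs.

```python
def sample_index_at(time_units: int, stts_entries) -> int:
    """Return the zero-based index of the sample containing the given media
    time, expressed in timescale units."""
    index = 0   # samples passed so far
    start = 0   # media time at which the current stts entry begins
    for count, delta in stts_entries:
        span = count * delta
        if time_units < start + span:
            return index + (time_units - start) // delta
        index += count
        start += span
    raise ValueError("time is beyond the end of the track")
```

With the single stts entry from section 3.5, (18, 1920), a seek to 0.5 s (24000 units at a timescale of 48000) lands on sample index 12, and decoding can begin at that sample without pre-roll.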
- 3.4 Basic Structure (informative)
- 3.4.1 Initial Movie
-
- This subclause shows a basic structure of the Movie Box as follows:
- +----+----+----+----+----+----+----+----+------------------------------+
- |moov| | | | | | | | Movie Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | |mvhd| | | | | | | Movie Header Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | |trak| | | | | | | Track Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | |tkhd| | | | | | Track Header Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | |edts|* | | | | | Edit Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | |elst|* | | | | Edit List Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | |mdia| | | | | | Media Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | |mdhd| | | | | Media Header Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | |hdlr| | | | | Handler Reference Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | |minf| | | | | Media Information Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | |smhd| | | | Sound Media Header Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | |dinf| | | | Data Information Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | | |dref| | | Data Reference Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | | | |url | | DataEntryUrlBox |
- +----+----+----+----+----+----+ or +----+------------------------------+
- | | | | | | |urn | | DataEntryUrnBox |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | |stbl| | | | Sample Table |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | | |stsd| | | Sample Description Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | | | |fLaC| | FLACSampleEntry |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | | | | |dfLa| FLAC Specific Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | | |stts| | | Decoding Time to Sample Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | | |stsc| | | Sample To Chunk Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | | |stsz| | | Sample Size Box |
- +----+----+----+----+----+ or +----+----+------------------------------+
- | | | | | |stz2| | | Compact Sample Size Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | | | | |stco| | | Chunk Offset Box |
- +----+----+----+----+----+ or +----+----+------------------------------+
- | | | | | |co64| | | Chunk Large Offset Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | |mvex|* | | | | | | Movie Extends Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- | | |trex|* | | | | | Track Extends Box |
- +----+----+----+----+----+----+----+----+------------------------------+
- Figure 1 - Basic structure of Movie Box
- It is strongly recommended that the order of boxes
- follow the above structure. Boxes marked with an asterisk
- (*) may or may not be present depending on context. Most
- boxes listed above are as defined
- in ISO/IEC 14496-12 [1]. The additional boxes and the
- additional requirements, restrictions and recommendations
- for the other boxes are described in this specification.
-
- 3.5 Example of Encapsulation (informative)
- [File]
- size = 17790
- [ftyp: File Type Box]
- position = 0
- size = 24
- major_brand = mp42 : MP4 version 2
- minor_version = 0
- compatible_brands
- brand[0] = mp42 : MP4 version 2
- brand[1] = isom : ISO Base Media file format
- [moov: Movie Box]
- position = 24
- size = 757
- [mvhd: Movie Header Box]
- position = 32
- size = 108
- version = 0
- flags = 0x000000
- creation_time = UTC 2014/12/12, 18:41:19
- modification_time = UTC 2014/12/12, 18:41:19
- timescale = 48000
- duration = 33600 (00:00:00.700)
- rate = 1.000000
- volume = 1.000000
- reserved = 0x0000
- reserved = 0x00000000
- reserved = 0x00000000
- transformation matrix
- | a, b, u | | 1.000000, 0.000000, 0.000000 |
- | c, d, v | = | 0.000000, 1.000000, 0.000000 |
- | x, y, w | | 0.000000, 0.000000, 1.000000 |
- pre_defined = 0x00000000
- pre_defined = 0x00000000
- pre_defined = 0x00000000
- pre_defined = 0x00000000
- pre_defined = 0x00000000
- pre_defined = 0x00000000
- next_track_ID = 2
- [iods: Object Descriptor Box]
- position = 140
- size = 33
- version = 0
- flags = 0x000000
- [tag = 0x10: MP4_IOD]
- expandableClassSize = 16
- ObjectDescriptorID = 1
- URL_Flag = 0
- includeInlineProfileLevelFlag = 0
- reserved = 0xf
- ODProfileLevelIndication = 0xff
- sceneProfileLevelIndication = 0xff
- audioProfileLevelIndication = 0xfe
- visualProfileLevelIndication = 0xff
- graphicsProfileLevelIndication = 0xff
- [tag = 0x0e: ES_ID_Inc]
- expandableClassSize = 4
- Track_ID = 1
- [trak: Track Box]
- position = 173
- size = 608
- [tkhd: Track Header Box]
- position = 181
- size = 92
- version = 0
- flags = 0x000007
- Track enabled
- Track in movie
- Track in preview
- creation_time = UTC 2014/12/12, 18:41:19
- modification_time = UTC 2014/12/12, 18:41:19
- track_ID = 1
- reserved = 0x00000000
- duration = 33600 (00:00:00.700)
- reserved = 0x00000000
- reserved = 0x00000000
- layer = 0
- alternate_group = 0
- volume = 1.000000
- reserved = 0x0000
- transformation matrix
- | a, b, u | | 1.000000, 0.000000, 0.000000 |
- | c, d, v | = | 0.000000, 1.000000, 0.000000 |
- | x, y, w | | 0.000000, 0.000000, 1.000000 |
- width = 0.000000
- height = 0.000000
- [mdia: Media Box]
- position = 273
- size = 472
- [mdhd: Media Header Box]
- position = 281
- size = 32
- version = 0
- flags = 0x000000
- creation_time = UTC 2014/12/12, 18:41:19
- modification_time = UTC 2014/12/12, 18:41:19
- timescale = 48000
- duration = 34560 (00:00:00.720)
- language = und
- pre_defined = 0x0000
- [hdlr: Handler Reference Box]
- position = 313
- size = 51
- version = 0
- flags = 0x000000
- pre_defined = 0x00000000
- handler_type = soun
- reserved = 0x00000000
- reserved = 0x00000000
- reserved = 0x00000000
- name = Xiph Audio Handler
- [minf: Media Information Box]
- position = 364
- size = 381
- [smhd: Sound Media Header Box]
- position = 372
- size = 16
- version = 0
- flags = 0x000000
- balance = 0.000000
- reserved = 0x0000
- [dinf: Data Information Box]
- position = 388
- size = 36
- [dref: Data Reference Box]
- position = 396
- size = 28
- version = 0
- flags = 0x000000
- entry_count = 1
- [url : Data Entry Url Box]
- position = 412
- size = 12
- version = 0
- flags = 0x000001
- location = in the same file
- [stbl: Sample Table Box]
- position = 424
- size = 321
- [stsd: Sample Description Box]
- position = 432
- size = 79
- version = 0
- flags = 0x000000
- entry_count = 1
- [fLaC: Audio Description]
- position = 448
- size = 63
- reserved = 0x000000000000
- data_reference_index = 1
- reserved = 0x0000
- reserved = 0x0000
- reserved = 0x00000000
- channelcount = 2
- samplesize = 16
- pre_defined = 0
- reserved = 0
- samplerate = 48000.000000
- [dfLa: FLAC Specific Box]
- position = 484
- size = 50
- version = 0
- flags = 0x000000
- [FLACMetadataBlock]
- LastMetadataBlockFlag = 1
- BlockType = 0
- Length = 34
- BlockData[34];
- [stts: Decoding Time to Sample Box]
- position = 492
- size = 24
- version = 0
- flags = 0x000000
- entry_count = 1
- entry[0]
- sample_count = 18
- sample_delta = 1920
- [stsc: Sample To Chunk Box]
- position = 516
- size = 40
- version = 0
- flags = 0x000000
- entry_count = 2
- entry[0]
- first_chunk = 1
- samples_per_chunk = 13
- sample_description_index = 1
- entry[1]
- first_chunk = 2
- samples_per_chunk = 5
- sample_description_index = 1
- [stsz: Sample Size Box]
- position = 556
- size = 92
- version = 0
- flags = 0x000000
- sample_size = 0 (variable)
- sample_count = 18
- entry_size[0] = 977
- entry_size[1] = 938
- entry_size[2] = 939
- entry_size[3] = 938
- entry_size[4] = 934
- entry_size[5] = 945
- entry_size[6] = 948
- entry_size[7] = 956
- entry_size[8] = 955
- entry_size[9] = 930
- entry_size[10] = 933
- entry_size[11] = 934
- entry_size[12] = 972
- entry_size[13] = 977
- entry_size[14] = 958
- entry_size[15] = 949
- entry_size[16] = 962
- entry_size[17] = 848
- [stco: Chunk Offset Box]
- position = 648
- size = 24
- version = 0
- flags = 0x000000
- entry_count = 2
- chunk_offset[0] = 686
- chunk_offset[1] = 12985
- [free: Free Space Box]
- position = 672
- size = 8
- [mdat: Media Data Box]
- position = 680
- size = 17001
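The sample table in the dump above can be resolved into one file offset per FLAC frame. The sketch below is illustrative only: the function name is hypothetical, and stsc entries are reduced to (first_chunk, samples_per_chunk) pairs because the example uses a single sample description.

```python
def sample_offsets(stsc_entries, chunk_offsets, sample_sizes):
    """Return the file offset of every sample, in decoding order."""
    offsets = []
    sample = 0
    for chunk_number, chunk_offset in enumerate(chunk_offsets, start=1):
        # The applicable stsc entry is the last one whose first_chunk
        # (1-based) does not exceed this chunk number.
        per_chunk = [n for first, n in stsc_entries if first <= chunk_number][-1]
        pos = chunk_offset
        for _ in range(per_chunk):
            offsets.append(pos)
            pos += sample_sizes[sample]
            sample += 1
    return offsets


# Values copied from the stsc, stco and stsz boxes in the dump above.
stsc = [(1, 13), (2, 5)]
stco = [686, 12985]
stsz = [977, 938, 939, 938, 934, 945, 948, 956, 955,
        930, 933, 934, 972, 977, 958, 949, 962, 848]
offsets = sample_offsets(stsc, stco, stsz)
```

The first 13 samples fill the first chunk exactly, so the 14th sample starts at the second chunk offset, 12985.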
- 4 Acknowledgements
- This spec draws heavily from the Opus-in-ISOBMFF specification
- work done by Yusuke Nakamura <muken.the.vfrmaniac |at| gmail.com>.
- Thank you to Tim Terriberry, David Evans, and Yusuke Nakamura
- for valuable feedback. Thank you to Ralph Giles for editorial
- help.
- 5 Author's Address
- Monty Montgomery <[email protected]>