123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436 |
- LIBARCHIVE-FORMATS(5) manual page
- == NAME ==
- '''libarchive-formats'''
- - archive formats supported by the libarchive library
- == DESCRIPTION ==
- The
- [[ManPageLibarchive3]]
- library reads and writes a variety of streaming archive formats.
- Generally speaking, all of these archive formats consist of a series of
- "entries".
- Each entry stores a single file system object, such as a file, directory,
- or symbolic link.
- The following provides a brief description of each format supported
- by libarchive, with some information about recognized extensions or
- limitations of the current library support.
- Note that just because a format is supported by libarchive does not
- imply that a program that uses libarchive will support that format.
- Applications that use libarchive specify which formats they wish
- to support, though many programs do use libarchive convenience
- functions to enable all supported formats.
- === Tar Formats===
- The
- [[ManPageLibarchive3]]
- library can read most tar archives.
- It can write POSIX-standard
- "ustar"
- and
- "pax interchange"
- formats as well as v7 tar format and a subset of the legacy GNU tar format.
- All tar formats store each entry in one or more 512-byte records.
- The first record is used for file metadata, including filename,
- timestamp, and mode information, and the file data is stored in
- subsequent records.
- Later variants have extended this by either appropriating undefined
- areas of the header record, extending the header to multiple records,
- or by storing special entries that modify the interpretation of
- subsequent entries.
- <dl>
- <dt>'''gnutar'''</dt><dd>
- The
- [[ManPageLibarchive3]]
- library can read most GNU-format tar archives.
- It currently supports the most popular GNU extensions, including
- modern long filename and linkname support, as well as atime and ctime data.
- The libarchive library does not support multi-volume
- archives, nor the old GNU long filename format.
- It can read GNU sparse file entries, including the new POSIX-based
- formats.
- The
- [[ManPageLibarchive3]]
- library can write GNU tar format, including long filename
- and linkname support, as well as atime and ctime data.
- </dd><dt>'''pax'''</dt><dd>
- The
- [[ManPageLibarchive3]]
- library can read and write POSIX-compliant pax interchange format
- archives.
- Pax interchange format archives are an extension of the older ustar
- format that adds a separate entry with additional attributes stored
- as key/value pairs immediately before each regular entry.
- The presence of these additional entries is the only difference between
- pax interchange format and the older ustar format.
- The extended attributes are of unlimited length and are stored
- as UTF-8 Unicode strings.
- Keywords defined in the standard are in all lowercase; vendors are allowed
- to define custom keys by preceding them with the vendor name in all uppercase.
- When writing pax archives, libarchive uses many of the SCHILY keys
- defined by Joerg Schilling's
- "star"
- archiver and a few LIBARCHIVE keys.
- The libarchive library can read most of the SCHILY keys
- and most of the GNU keys introduced by GNU tar.
- It silently ignores any keywords that it does not understand.
- The pax interchange format converts filenames to Unicode
- and stores them using the UTF-8 encoding.
- Prior to libarchive 3.0, libarchive erroneously assumed
- that the system wide-character routines natively supported
- Unicode.
- This caused it to mis-handle non-ASCII filenames on systems
- that did not satisfy this assumption.
- </dd><dt>'''restricted''' pax</dt><dd>
- The libarchive library can also write pax archives in which it
- attempts to suppress the extended attributes entry whenever
- possible.
- The result will be identical to a ustar archive unless the
- extended attributes entry is required to store a long file
- name, long linkname, extended ACL, file flags, or if any of the standard
- ustar data (user name, group name, UID, GID, etc) cannot be fully
- represented in the ustar header.
- In all cases, the result can be dearchived by any program that
- can read POSIX-compliant pax interchange format archives.
- Programs that correctly read ustar format (see below) will also be
- able to read this format; any extended attributes will be extracted as
- separate files stored in
- ''PaxHeader''
- directories.
- </dd><dt>'''ustar'''</dt><dd>
- The libarchive library can both read and write this format.
- This format has the following limitations:
- <ul>
- <li>
- Device major and minor numbers are limited to 21 bits.
- Nodes with larger numbers will not be added to the archive.
- </li><li>
- Path names in the archive are limited to 255 bytes.
- (Shorter if there is no / character in exactly the right place.)
- </li><li>
- Symbolic links and hard links are stored in the archive with
- the name of the referenced file.
- This name is limited to 100 bytes.
- </li><li>
- Extended attributes, file flags, and other extended
- security information cannot be stored.
- </li><li>
- Archive entries are limited to 8 gigabytes in size.
- </li></ul>
- Note that the pax interchange format has none of these restrictions.
- The ustar format is old and widely supported.
- It is recommended when compatibility is the primary concern.
- </dd><dt>'''v7'''</dt><dd>
- The libarchive library can read and write the legacy v7 tar format.
- This format has the following limitations:
- <ul>
- <li>
- Only regular files, directories, and symbolic links can be archived.
- Block and character device nodes, FIFOs, and sockets cannot be archived.
- </li><li>
- Path names in the archive are limited to 100 bytes.
- </li><li>
- Symbolic links and hard links are stored in the archive with
- the name of the referenced file.
- This name is limited to 100 bytes.
- </li><li>
- User and group information are stored as numeric IDs; there
- is no provision for storing user or group names.
- </li><li>
- Extended attributes, file flags, and other extended
- security information cannot be stored.
- </li><li>
- Archive entries are limited to 8 gigabytes in size.
- </li></ul>
- Generally, users should prefer the ustar format for portability
- as the v7 tar format is both less useful and less portable.
- </dd></dl>
- The libarchive library also reads a variety of commonly-used extensions to
- the basic tar format.
- These extensions are recognized automatically whenever they appear.
- <dl>
- <dt>Numeric extensions.</dt><dd>
- The POSIX standards require fixed-length numeric fields to be written with
- some character position reserved for terminators.
- Libarchive allows these fields to be written without terminator characters.
- This extends the allowable range; in particular, ustar archives with this
- extension can support entries up to 64 gigabytes in size.
- Libarchive also recognizes base-256 values in most numeric fields.
- This essentially removes all limitations on file size, modification time,
- and device numbers.
- </dd><dt>Solaris extensions</dt><dd>
- Libarchive recognizes ACL and extended attribute records written
- by Solaris tar.
- </dd></dl>
- The first tar program appeared in Seventh Edition Unix in 1979.
- The first official standard for the tar file format was the
- "ustar"
- (Unix Standard Tar) format defined by POSIX in 1988.
- POSIX.1-2001 extended the ustar format to create the
- "pax interchange"
- format.
- === Cpio Formats===
- The libarchive library can read and write a number of common cpio
- variants. A cpio archive stores each entry as a fixed-size header
- followed by a variable-length filename and variable-length data.
- Unlike the tar format, the cpio format does only minimal padding of
- the header or file data. There are several cpio variants, which
- differ primarily in how they store the initial header: some store the
- values as octal or hexadecimal numbers in ASCII, others as binary
- values of varying byte order and length.
- <dl>
- <dt>'''binary'''</dt><dd>
- The libarchive library transparently reads both big-endian and
- little-endian variants of the the two binary cpio formats; the
- original one from PWB/UNIX, and the later, more widely used, variant.
- This format used 32-bit binary values for file size and mtime, and
- 16-bit binary values for the other fields. The formats support only
- the file types present in UNIX at the time of their creation. File
- sizes are limited to 24 bits in the PWB format, because of the limits
- of the file system, and to 31 bits in the newer binary format, where
- signed 32 bit longs were used.
- </dd><dt>'''odc'''</dt><dd>
- This is the POSIX standardized format, which is officially known as the
- "cpio interchange format"
- or the
- "octet-oriented cpio archive format"
- and sometimes unofficially referred to as the
- "old character format".
- This format stores the header contents as octal values in ASCII.
- It is standard, portable, and immune from byte-order confusion.
- File sizes and mtime are limited to 33 bits (8GB file size),
- other fields are limited to 18 bits.
- </dd><dt>'''SVR4/newc'''</dt><dd>
- The libarchive library can read both CRC and non-CRC variants of
- this format.
- The SVR4 format uses eight-digit hexadecimal values for
- all header fields.
- This limits file size to 4GB, and also limits the mtime and
- other fields to 32 bits.
- The SVR4 format can optionally include a CRC of the file
- contents, although libarchive does not currently verify this CRC.
- </dd></dl>
- Cpio first appeared in PWB/UNIX 1.0, which was released within
- AT&T in 1977.
- PWB/UNIX 1.0 formed the basis of System III Unix, released outside
- of AT&T in 1981.
- This makes cpio older than tar, although cpio was not included
- in Version 7 AT&T Unix.
- As a result, the tar command became much better known in universities
- and research groups that used Version 7.
- The combination of the
- '''find'''
- and
- '''cpio'''
- utilities provided very precise control over file selection.
- Unfortunately, the format has many limitations that make it unsuitable
- for widespread use.
- Only the POSIX format permits files over 4GB, and its 18-bit
- limit for most other fields makes it unsuitable for modern systems.
- In addition, cpio formats only store numeric UID/GID values (not
- usernames and group names), which can make it very difficult to correctly
- transfer archives across systems with dissimilar user numbering.
- === Shar Formats===
- A
- "shell archive"
- is a shell script that, when executed on a POSIX-compliant
- system, will recreate a collection of file system objects.
- The libarchive library can write two different kinds of shar archives:
- <dl>
- <dt>'''shar'''</dt><dd>
- The traditional shar format uses a limited set of POSIX
- commands, including
- [[echo(1)|http://www.freebsd.org/cgi/man.cgi?query=echo&sektion=1]],
- [[mkdir(1)|http://www.freebsd.org/cgi/man.cgi?query=mkdir&sektion=1]],
- and
- [[sed(1)|http://www.freebsd.org/cgi/man.cgi?query=sed&sektion=1]].
- It is suitable for portably archiving small collections of plain text files.
- However, it is not generally well-suited for large archives
- (many implementations of
- [[sh(1)|http://www.freebsd.org/cgi/man.cgi?query=sh&sektion=1]]
- have limits on the size of a script) nor should it be used with non-text files.
- </dd><dt>'''shardump'''</dt><dd>
- This format is similar to shar but encodes files using
- [[uuencode(1)|http://www.freebsd.org/cgi/man.cgi?query=uuencode&sektion=1]]
- so that the result will be a plain text file regardless of the file contents.
- It also includes additional shell commands that attempt to reproduce as
- many file attributes as possible, including owner, mode, and flags.
- The additional commands used to restore file attributes make
- shardump archives less portable than plain shar archives.
- </dd></dl>
- === ISO9660 format===
- Libarchive can read and extract from files containing ISO9660-compliant
- CDROM images.
- In many cases, this can remove the need to burn a physical CDROM
- just in order to read the files contained in an ISO9660 image.
- It also avoids security and complexity issues that come with
- virtual mounts and loopback devices.
- Libarchive supports the most common Rockridge extensions and has partial
- support for Joliet extensions.
- If both extensions are present, the Joliet extensions will be
- used and the Rockridge extensions will be ignored.
- In particular, this can create problems with hardlinks and symlinks,
- which are supported by Rockridge but not by Joliet.
- Libarchive reads ISO9660 images using a streaming strategy.
- This allows it to read compressed images directly
- (decompressing on the fly) and allows it to read images
- directly from network sockets, pipes, and other non-seekable
- data sources.
- This strategy works well for optimized ISO9660 images created
- by many popular programs.
- Such programs collect all directory information at the beginning
- of the ISO9660 image so it can be read from a physical disk
- with a minimum of seeking.
- However, not all ISO9660 images can be read in this fashion.
- Libarchive can also write ISO9660 images.
- Such images are fully optimized with the directory information
- preceding all file data.
- This is done by storing all file data to a temporary file
- while collecting directory information in memory.
- When the image is finished, libarchive writes out the
- directory structure followed by the file data.
- The location used for the temporary file can be changed
- by the usual environment variables.
- === Zip format===
- Libarchive can read and write zip format archives that have
- uncompressed entries and entries compressed with the
- "deflate"
- algorithm.
- Other zip compression algorithms are not supported.
- It can extract jar archives, archives that use Zip64 extensions and
- self-extracting zip archives.
- Libarchive can use either of two different strategies for
- reading Zip archives:
- a streaming strategy which is fast and can handle extremely
- large archives, and a seeking strategy which can correctly
- process self-extracting Zip archives and archives with
- deleted members or other in-place modifications.
- The streaming reader processes Zip archives as they are read.
- It can read archives of arbitrary size from tape or
- network sockets, and can decode Zip archives that have
- been separately compressed or encoded.
- However, self-extracting Zip archives and archives with
- certain types of modifications cannot be correctly
- handled.
- Such archives require that the reader first process the
- Central Directory, which is ordinarily located
- at the end of a Zip archive and is thus inaccessible
- to the streaming reader.
- If the program using libarchive has enabled seek support, then
- libarchive will use this to processes the central directory first.
- In particular, the seeking reader must be used to
- correctly handle self-extracting archives.
- Such archives consist of a program followed by a regular
- Zip archive.
- The streaming reader cannot parse the initial program
- portion, but the seeking reader starts by reading the
- Central Directory from the end of the archive.
- Similarly, Zip archives that have been modified in-place
- can have deleted entries or other garbage data that
- can only be accurately detected by first reading the
- Central Directory.
- === Archive (library) file format===
- The Unix archive format (commonly created by the
- [[ar(1)|http://www.freebsd.org/cgi/man.cgi?query=ar&sektion=1]]
- archiver) is a general-purpose format which is
- used almost exclusively for object files to be
- read by the link editor
- [[ld(1)|http://www.freebsd.org/cgi/man.cgi?query=ld&sektion=1]].
- The ar format has never been standardised.
- There are two common variants:
- the GNU format derived from SVR4,
- and the BSD format, which first appeared in 4.4BSD.
- The two differ primarily in their handling of filenames
- longer than 15 characters:
- the GNU/SVR4 variant writes a filename table at the beginning of the archive;
- the BSD format stores each long filename in an extension
- area adjacent to the entry.
- Libarchive can read both extensions,
- including archives that may include both types of long filenames.
- Programs using libarchive can write GNU/SVR4 format
- if they provide an entry called
- ''//''
- containing a filename table to be written into the archive
- before any of the entries.
- Any entries whose names are not in the filename table
- will be written using BSD-style long filenames.
- This can cause problems for programs such as
- GNU ld that do not support the BSD-style long filenames.
- === mtree===
- Libarchive can read and write files in
- [[ManPageMtree5]]
- format.
- This format is not a true archive format, but rather a textual description
- of a file hierarchy in which each line specifies the name of a file and
- provides specific metadata about that file.
- Libarchive can read all of the keywords supported by both
- the NetBSD and FreeBSD versions of
- [[mtree(8)|http://www.freebsd.org/cgi/man.cgi?query=mtree&sektion=8]],
- although many of the keywords cannot currently be stored in an
- '''archive_entry'''
- object.
- When writing, libarchive supports use of the
- [[ManPageArchiveWriteSetOptions3]]
- interface to specify which keywords should be included in the
- output.
- If libarchive was compiled with access to suitable
- cryptographic libraries (such as the OpenSSL libraries),
- it can compute hash entries such as
- '''sha512'''
- or
- '''md5'''
- from file data being written to the mtree writer.
- When reading an mtree file, libarchive will locate the corresponding
- files on disk using the
- '''contents'''
- keyword if present or the regular filename.
- If it can locate and open the file on disk, it will use that
- to fill in any metadata that is missing from the mtree file
- and will read the file contents and return those to the program
- using libarchive.
- If it cannot locate and open the file on disk, libarchive
- will return an error for any attempt to read the entry
- body.
- === 7-Zip===
- Libarchive can read and write 7-Zip format archives.
- TODO: Need more information
- === CAB===
- Libarchive can read Microsoft Cabinet (
- "CAB )"
- format archives.
- TODO: Need more information.
- === LHA===
- TODO: Information about libarchive's LHA support
- === RAR===
- Libarchive has limited support for reading RAR format archives.
- Currently, libarchive can read RARv3 format archives
- which have been either created uncompressed, or compressed using
- any of the compression methods supported by the RARv3 format.
- Libarchive can also read self-extracting RAR archives.
- === Warc===
- Libarchive can read and write
- "web archives".
- TODO: Need more information
- === XAR===
- Libarchive can read and write the XAR format used by many Apple tools.
- TODO: Need more information
- == SEE ALSO ==
- [[ar(1)|http://www.freebsd.org/cgi/man.cgi?query=ar&sektion=1]],
- [[ManPageBsdcpio1]],
- [[mkisofs(1)|http://www.freebsd.org/cgi/man.cgi?query=mkisofs&sektion=1]],
- [[shar(1)|http://www.freebsd.org/cgi/man.cgi?query=shar&sektion=1]],
- [[ManPageBsdtar1]],
- [[zip(1)|http://www.freebsd.org/cgi/man.cgi?query=zip&sektion=1]],
- [[zlib(3)|http://www.freebsd.org/cgi/man.cgi?query=zlib&sektion=3]],
- [[ManPageCpio5]],
- [[ManPageMtree5]],
- [[ManPageTar5]]
|