- .TH CPIO 5 "December 23, 2011" ""
- .SH NAME
- .ad l
- \fB\%cpio\fP
- \- format of cpio archive files
- .SH DESCRIPTION
- .ad l
- The
- \fB\%cpio\fP
- archive format collects any number of files, directories, and other
- file system objects (symbolic links, device nodes, etc.) into a single
- stream of bytes.
- .SS General Format
- Each file system object in a
- \fB\%cpio\fP
- archive comprises a header record with basic numeric metadata
- followed by the full pathname of the entry and the file data.
- The header record stores a series of integer values that generally
- follow the fields in
- \fIstruct\fP stat.
- (See
- \fBstat\fP(2)
- for details.)
- The variants differ primarily in how they store those integers
- (binary, octal, or hexadecimal).
- The header is followed by the pathname of the
- entry (the length of the pathname is stored in the header)
- and any file data.
- The end of the archive is indicated by a special record with
- the pathname
- ``TRAILER!!!''.
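- .PP
- As a minimal sketch of the end-of-archive test described above, a reader
- might compare each entry's pathname against the trailer name.
- The helper name
- \fBis_trailer\fP()
- is hypothetical, and the sketch assumes the stored name length counts
- the terminating NUL byte, as the binary formats below specify.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return nonzero if the entry name marks the end
 * of the archive. namesize is assumed to include the trailing NUL.
 */
static int is_trailer(const char *name, size_t namesize)
{
    /* "TRAILER!!!" is 10 characters, 11 bytes with its NUL. */
    return namesize == 11 && memcmp(name, "TRAILER!!!", 11) == 0;
}
```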
- .SS PWB Format
- The PWB binary
- \fB\%cpio\fP
- format is the original format, dating from the introduction of cpio as
- part of the Programmer's Work Bench system, a variant of 6th Edition UNIX.
- It stores numbers as 2-byte and 4-byte binary values.
- Each entry begins with a header in the following format:
- .PP
- .RS 4
- .nf
- struct header_pwb_cpio {
- short h_magic;
- short h_dev;
- short h_ino;
- short h_mode;
- short h_uid;
- short h_gid;
- short h_nlink;
- short h_majmin;
- long h_mtime;
- short h_namesize;
- long h_filesize;
- };
- .RE
- .PP
- The
- \fIshort\fP
- fields here are 16-bit integer values, while the
- \fIlong\fP
- fields are 32-bit integers. Since PWB UNIX, like the 6th Edition UNIX
- it was based on, ran only on PDP-11 computers, all of these fields
- are in PDP-endian format: each 16-bit word is stored little-endian,
- and a 32-bit long is stored as two such words with the most
- significant word first. That is, the long integer whose hexadecimal
- representation is 0x12345678 would be stored in four successive bytes
- as 0x34, 0x12, 0x78, 0x56.
- The fields are as follows:
- .RS 5
- .TP
- \fIh_magic\fP
- The integer value octal 070707.
- .TP
- \fIh_dev\fP, \fIh_ino\fP
- The device and inode numbers from the disk.
- These are used by programs that read
- \fB\%cpio\fP
- archives to determine when two entries refer to the same file.
- Programs that synthesize
- \fB\%cpio\fP
- archives should be careful to set these to distinct values for each entry.
- .TP
- \fIh_mode\fP
- The mode specifies both the regular permissions and the file type, and
- it also holds a couple of bits that are irrelevant to the cpio format,
- because the field is actually a raw copy of the mode field in the inode
- representing the file. These are the IALLOC flag, which shows that
- the inode entry is in use, and the ILARG flag, which shows that the
- file it represents is large enough to have indirect block pointers in
- the inode.
- The mode is decoded as follows:
- .PP
- .RS 5
- .TP
- 0100000
- IALLOC flag - irrelevant to cpio.
- .TP
- 0060000
- This masks the file type bits.
- .TP
- 0040000
- File type value for directories.
- .TP
- 0020000
- File type value for character special devices.
- .TP
- 0060000
- File type value for block special devices.
- .TP
- 0010000
- ILARG flag - irrelevant to cpio.
- .TP
- 0004000
- SUID bit.
- .TP
- 0002000
- SGID bit.
- .TP
- 0001000
- Sticky bit.
- .TP
- 0000777
- The lower 9 bits specify read/write/execute permissions
- for world, group, and user following standard POSIX conventions.
- .RE
- .TP
- \fIh_uid\fP, \fIh_gid\fP
- The numeric user id and group id of the owner.
- .TP
- \fIh_nlink\fP
- The number of links to this file.
- Directories always have a value of at least two here.
- Note that hardlinked files include file data with every copy in the archive.
- .TP
- \fIh_majmin\fP
- For block special and character special entries,
- this field contains the associated device number, with the major
- number in the high byte, and the minor number in the low byte.
- For all other entry types, it should be set to zero by writers
- and ignored by readers.
- .TP
- \fIh_mtime\fP
- Modification time of the file, indicated as the number
- of seconds since the start of the epoch,
- 00:00:00 UTC January 1, 1970.
- .TP
- \fIh_namesize\fP
- The number of bytes in the pathname that follows the header.
- This count includes the trailing NUL byte.
- .TP
- \fIh_filesize\fP
- The size of the file. Note that this archive format is limited to 16
- megabyte file sizes, because PWB UNIX, like 6th Edition, only used
- an unsigned 24-bit integer for the file size internally.
- .RE
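- .PP
- The PDP-endian layout of the
- \fIlong\fP
- fields can be decoded one byte at a time.
- The following is a minimal sketch, not code from any cpio implementation;
- the helper names
- \fBpwb_get16\fP()
- and
- \fBpwb_get32\fP()
- are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: read a little-endian 16-bit value. */
static uint16_t pwb_get16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: read a PDP-endian 32-bit value, that is, two
 * little-endian 16-bit words with the most significant word first.
 */
static uint32_t pwb_get32(const unsigned char *p)
{
    return ((uint32_t)pwb_get16(p) << 16) | pwb_get16(p + 2);
}
```

- .PP
- Given the four bytes 0x34, 0x12, 0x78, 0x56 from the example above,
- \fBpwb_get32\fP()
- yields 0x12345678.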
- .PP
- The pathname immediately follows the fixed header.
- If
- \fBh_namesize\fP
- is odd, an additional NUL byte is added after the pathname.
- The file data is then appended, again with an additional NUL
- appended if needed to get the next header at an even offset.
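- .PP
- The padding rule can be expressed as a small size computation.
- This is a sketch under the assumption that the fixed header occupies
- 26 bytes (eight shorts, a long, a short, and a long, with no compiler
- padding); the names
- \fBpad2\fP()
- and
- \fBbin_entry_size\fP()
- are hypothetical.

```c
#include <stdint.h>

#define BIN_HEADER_SIZE 26  /* assumed: 8 shorts + long + short + long */

/* Round n up to the next even number. */
static uint64_t pad2(uint64_t n)
{
    return n + (n & 1);
}

/*
 * Hypothetical helper: number of bytes from the start of one header
 * to the start of the next, given the two size fields.
 */
static uint64_t bin_entry_size(uint16_t namesize, uint32_t filesize)
{
    return BIN_HEADER_SIZE + pad2(namesize) + pad2(filesize);
}
```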
- .PP
- Hardlinked files are not given special treatment;
- the full file contents are included with each copy of the
- file.
- .SS New Binary Format
- The new binary
- \fB\%cpio\fP
- format showed up when cpio was adopted into late 7th Edition UNIX.
- It is exactly like the PWB binary format, described above, except for
- three changes:
- .PP
- First, UNIX now ran on more than one hardware type, so the endianness
- of 16-bit integers must be determined by observing the magic number at
- the start of the header. The 32-bit integers are still always stored
- with the most significant word first, though, so each of the two
- \fIlong\fP
- fields in the struct shown above was stored as an array of two 16-bit
- integers, in the traditional order. Those 16-bit integers, like all
- the others in the struct, were accessed using a macro that byte-swapped
- them if necessary.
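- .PP
- A reader might detect the byte order from the first two bytes of the
- header and then decode every field accordingly.
- The sketch below is illustrative only; the enum and the helper names
- \fBbin_detect_order\fP(),
- \fBbin_get16\fP(),
- and
- \fBbin_get32\fP()
- are hypothetical.

```c
#include <stdint.h>

enum bin_order { BIN_UNKNOWN, BIN_LITTLE, BIN_BIG };

/* Octal 070707 is 0x71C7; see which byte of it comes first. */
static enum bin_order bin_detect_order(const unsigned char *p)
{
    if (p[0] == 0xC7 && p[1] == 0x71)   /* low byte first */
        return BIN_LITTLE;
    if (p[0] == 0x71 && p[1] == 0xC7)   /* high byte first */
        return BIN_BIG;
    return BIN_UNKNOWN;
}

/* Read one 16-bit field; anything other than BIN_BIG is read low byte first. */
static uint16_t bin_get16(const unsigned char *p, enum bin_order o)
{
    return (o == BIN_BIG) ? (uint16_t)((p[0] << 8) | p[1])
                          : (uint16_t)(p[0] | (p[1] << 8));
}

/* 32-bit fields: most significant 16-bit word first in either order. */
static uint32_t bin_get32(const unsigned char *p, enum bin_order o)
{
    return ((uint32_t)bin_get16(p, o) << 16) | bin_get16(p + 2, o);
}
```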
- .PP
- Next, 7th Edition had more file types to store, and the IALLOC and ILARG
- flag bits were re-purposed to accommodate these. The revised use of the
- various bits is as follows:
- .PP
- .RS 5
- .TP
- 0170000
- This masks the file type bits.
- .TP
- 0140000
- File type value for sockets.
- .TP
- 0120000
- File type value for symbolic links.
- For symbolic links, the link body is stored as file data.
- .TP
- 0100000
- File type value for regular files.
- .TP
- 0060000
- File type value for block special devices.
- .TP
- 0040000
- File type value for directories.
- .TP
- 0020000
- File type value for character special devices.
- .TP
- 0010000
- File type value for named pipes or FIFOs.
- .TP
- 0004000
- SUID bit.
- .TP
- 0002000
- SGID bit.
- .TP
- 0001000
- Sticky bit.
- .TP
- 0000777
- The lower 9 bits specify read/write/execute permissions
- for world, group, and user following standard POSIX conventions.
- .RE
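- .PP
- The table above maps directly onto a mask-and-switch decoder.
- The following sketch returns a one-character type code in the style of
- a long directory listing; the function name
- \fBmode_type_char\fP()
- and the choice of characters are assumptions made for illustration.

```c
/* Hypothetical helper: type character for a new-binary-format mode. */
static char mode_type_char(unsigned int mode)
{
    switch (mode & 0170000) {       /* file type mask */
    case 0140000: return 's';       /* socket */
    case 0120000: return 'l';       /* symbolic link */
    case 0100000: return '-';       /* regular file */
    case 0060000: return 'b';       /* block special */
    case 0040000: return 'd';       /* directory */
    case 0020000: return 'c';       /* character special */
    case 0010000: return 'p';       /* named pipe or FIFO */
    default:      return '?';       /* not a type listed above */
    }
}
```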
- .PP
- Finally, the file size field now represents a signed 32-bit integer in
- the underlying file system, so the maximum file size has increased to
- 2 gigabytes.
- .PP
- Note that there is no obvious way to tell which of the two binary
- formats an archive uses, other than to see which one makes more
- sense. The typical error scenario is that a PWB format archive
- unpacked as if it were in the new format will create named sockets
- instead of directories, and then fail to unpack files that should
- go in those directories. Running
- \fIbsdcpio\fP -itv
- on an unknown archive will make it obvious which it is: if it's
- PWB format, directories will be listed with an 's' instead of
- a 'd' as the first character of the mode string, and the larger
- files will have a '?' in that position.
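- .PP
- The confusion arises because a PWB reader looks only at the two bits
- under the 0060000 mask, while the IALLOC bit (0100000) is set on every
- in-use inode.
- A sketch of the PWB interpretation follows; the function name
- \fBpwb_type_char\fP()
- is hypothetical.

```c
/* Hypothetical helper: type character for a PWB-format mode. */
static char pwb_type_char(unsigned int mode)
{
    switch (mode & 0060000) {       /* PWB file type mask */
    case 0040000: return 'd';       /* directory */
    case 0020000: return 'c';       /* character special */
    case 0060000: return 'b';       /* block special */
    default:      return '-';       /* no type bits set */
    }
}
```

- .PP
- Under this interpretation the mode 0140755 is a directory, whereas
- under the new-format mask 0170000 the same value carries the socket
- type 0140000.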
- .SS Portable ASCII Format
- Version 2 of the Single UNIX Specification (``SUSv2'')
- standardized an ASCII variant that is portable across all
- platforms.
- It is commonly known as the
- ``old character''
- format or as the
- ``odc''
- format.
- It stores the same numeric fields as the old binary format, but
- represents them as 6-character or 11-character octal values.
- .PP
- .RS 4
- .nf
- struct cpio_odc_header {
- char c_magic[6];
- char c_dev[6];
- char c_ino[6];
- char c_mode[6];
- char c_uid[6];
- char c_gid[6];
- char c_nlink[6];
- char c_rdev[6];
- char c_mtime[11];
- char c_namesize[6];
- char c_filesize[11];
- };
- .RE
- .PP
- The fields are identical to those in the new binary format.
- The name and file body follow the fixed header.
- Unlike the binary formats, there is no additional padding
- after the pathname or file contents.
- If the files being archived are themselves entirely ASCII, then
- the resulting archive will be entirely ASCII, except for the
- NUL byte that terminates the name field.
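- .PP
- Because the octal fields are fixed-width and not NUL-terminated, they
- cannot be passed directly to
- \fBstrtoul\fP(3).
- A minimal sketch of a bounded parser follows; the name
- \fBodc_parse_octal\fP()
- and its return convention are assumptions, not part of any
- existing interface.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: parse an n-character octal field that is not
 * NUL-terminated. Returns 0 on success, -1 on a non-octal character.
 */
static int odc_parse_octal(const char *field, size_t n, uint64_t *out)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++) {
        if (field[i] < '0' || field[i] > '7')
            return -1;
        v = (v << 3) | (uint64_t)(field[i] - '0');
    }
    *out = v;
    return 0;
}
```

- .PP
- With the struct above, the fixed header is 76 bytes long, so
- \fIc_namesize\fP
- would be parsed from the 6 bytes at offset 59 and
- \fIc_filesize\fP
- from the 11 bytes at offset 65.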
- .SS New ASCII Format
- The "new" ASCII format uses 8-byte hexadecimal fields for
- all numbers and separates device numbers into separate fields
- for major and minor numbers.
- .PP
- .RS 4
- .nf
- struct cpio_newc_header {
- char c_magic[6];
- char c_ino[8];
- char c_mode[8];
- char c_uid[8];
- char c_gid[8];
- char c_nlink[8];
- char c_mtime[8];
- char c_filesize[8];
- char c_devmajor[8];
- char c_devminor[8];
- char c_rdevmajor[8];
- char c_rdevminor[8];
- char c_namesize[8];
- char c_check[8];
- };
- .RE
- .PP
- Except as specified below, the fields here match those specified
- for the new binary format above.
- .RS 5
- .TP
- \fImagic\fP
- The string
- ``070701''.
- .TP
- \fIcheck\fP
- This field is always set to zero by writers and ignored by readers.
- See the next section for more details.
- .RE
- .PP
- The pathname is followed by NUL bytes so that the total size
- of the fixed header plus pathname is a multiple of four.
- Likewise, the file data is padded to a multiple of four bytes.
- Note that this format supports only 4 gigabyte files (unlike the
- older ASCII format, which supports 8 gigabyte files).
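- .PP
- The field parsing and alignment rules can be sketched as follows.
- The sketch assumes a 110-byte fixed header (a 6-byte magic plus
- thirteen 8-byte fields, per the struct above) and accepts hexadecimal
- digits in either case; all helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NEWC_HEADER_SIZE 110   /* 6-byte magic + thirteen 8-byte fields */

/* Parse an 8-character hexadecimal field (not NUL-terminated). */
static int newc_parse_hex(const char *field, uint32_t *out)
{
    uint32_t v = 0;

    for (size_t i = 0; i < 8; i++) {
        char c = field[i];
        uint32_t d;

        if (c >= '0' && c <= '9')
            d = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            d = (uint32_t)(c - 'A' + 10);
        else
            return -1;
        v = (v << 4) | d;
    }
    *out = v;
    return 0;
}

/* Round n up to the next multiple of four. */
static uint64_t pad4(uint64_t n)
{
    return (n + 3) & ~(uint64_t)3;
}

/* Offset of the file data, relative to the start of this header. */
static uint64_t newc_data_offset(uint32_t namesize)
{
    return pad4(NEWC_HEADER_SIZE + (uint64_t)namesize);
}

/* Offset of the next header, relative to the start of this header. */
static uint64_t newc_next_header(uint32_t namesize, uint32_t filesize)
{
    return newc_data_offset(namesize) + pad4(filesize);
}
```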
- .PP
- In this format, hardlinked files are handled by setting the
- filesize to zero for each entry except the first one that
- appears in the archive.
- .SS New CRC Format
- The CRC format is identical to the new ASCII format described
- in the previous section except that the magic field is set
- to
- ``070702''
- and the
- \fIcheck\fP
- field is set to the sum of all bytes in the file data.
- This sum is computed treating all bytes as unsigned values
- and using unsigned arithmetic.
- Only the least-significant 32 bits of the sum are stored.
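- .PP
- A minimal sketch of the checksum computation follows; the function name
- \fBcpio_checksum\fP()
- is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum all bytes as unsigned values, keeping the low 32 bits. */
static uint32_t cpio_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += data[i];     /* uint32_t arithmetic wraps modulo 2^32 */
    return sum;
}
```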
- .SS HP Variants
- The
- \fB\%cpio\fP
- implementation distributed with HPUX used XXXX but stored
- device numbers differently XXX.
- .SS Other Extensions and Variants
- Sun Solaris uses additional file types to store extended file
- data, including ACLs and extended attributes, as special
- entries in cpio archives.
- .PP
- XXX Others? XXX
- .SH SEE ALSO
- .ad l
- \fBcpio\fP(1),
- \fBtar\fP(5)
- .SH STANDARDS
- .ad l
- The
- \fB\%cpio\fP
- utility is no longer a part of POSIX or the Single UNIX Specification.
- It last appeared in
- Version 2 of the Single UNIX Specification (``SUSv2'').
- It has been supplanted in subsequent standards by
- \fBpax\fP(1).
- The portable ASCII format is currently part of the specification for the
- \fBpax\fP(1)
- utility.
- .SH HISTORY
- .ad l
- The original cpio utility was written by Dick Haight
- while working in AT&T's Unix Support Group.
- It appeared in 1977 as part of PWB/UNIX 1.0, the
- ``Programmer's Work Bench''
- derived from
- Version 6 AT&T UNIX
- that was used internally at AT&T.
- Both the new binary and old character formats were in use
- by 1980, according to the System III source released
- by SCO under their
- ``Ancient Unix''
- license.
- The character format was adopted as part of
- IEEE Std 1003.1-1988 (``POSIX.1'').
- XXX when did "newc" appear? Who invented it? When did HP come out with their variant? When did Sun introduce ACLs and extended attributes? XXX
- .SH BUGS
- .ad l
- The
- ``CRC''
- format is mis-named, as it uses a simple checksum and
- not a cyclic redundancy check.
- .PP
- The binary formats are limited to 16 bits for user id, group id,
- device, and inode numbers. They are limited to 16 megabyte and 2
- gigabyte file sizes for the older and newer variants, respectively.
- .PP
- The old ASCII format is limited to 18 bits for
- the user id, group id, device, and inode numbers.
- It is limited to 8 gigabyte file sizes.
- .PP
- The new ASCII format is limited to 4 gigabyte file sizes.
- .PP
- None of the cpio formats store user or group names,
- which are essential when moving files between systems with
- dissimilar user or group numbering.
- .PP
- Especially when writing older cpio variants, it may be necessary
- to map actual device/inode values to synthesized values that
- fit the available fields.
- With very large filesystems, this may be necessary even for
- the newer formats.
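- .PP
- One way a writer might perform such a mapping is to hand out small
- sequential numbers, reusing the same number whenever a device/inode
- pair repeats so that hard links still match.
- The sketch below uses a fixed-size table with linear search purely for
- brevity; the structure, the capacity, and the name
- \fBino_map_lookup\fP()
- are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_MAX 1024   /* arbitrary capacity chosen for this sketch */

struct ino_map {
    uint64_t dev[MAP_MAX];
    uint64_t ino[MAP_MAX];
    size_t   count;
};

/*
 * Return a synthesized inode number (starting at 1) for the given
 * device/inode pair. A pair seen before gets the number it was
 * assigned earlier. Returns 0 when the table is full.
 */
static uint32_t ino_map_lookup(struct ino_map *m, uint64_t dev, uint64_t ino)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->dev[i] == dev && m->ino[i] == ino)
            return (uint32_t)(i + 1);
    if (m->count == MAP_MAX)
        return 0;
    m->dev[m->count] = dev;
    m->ino[m->count] = ino;
    return (uint32_t)(++m->count);
}
```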