What is Mac OS X?
Mac OS X Filesystems
Like most modern operating systems, Mac OS X uses an object-oriented vnode layer. XNU's VFS layer is based on FreeBSD's, although there are numerous minor differences: for example, FreeBSD uses mutexes while XNU uses simple locks, and XNU's unified buffer cache is integrated with Mach's virtual memory layer.
Local Filesystems
HFS
HFS (Hierarchical File System) was the primary filesystem format used on the Macintosh Plus and later models, until Mac OS 8.1, when HFS was replaced by HFS Plus.
This section briefly describes the various filesystems supported by "stock" Mac OS X.
HFS+
HFS+ is the preferred filesystem on Mac OS X. It supports journaling, quotas, byte-range locking, Finder information in metadata, multiple encodings, hard and symbolic links, aliases, support for hiding file extensions on a per-file basis, etc. HFS+ uses B-Trees heavily for many of its internals.
Like most current journaling filesystems, HFS+ only journals meta-data. Journaling support was retrofitted into HFS+ via a simple VFS journaling layer in XNU that's actually filesystem independent. The journal files on an HFS+ volume are called .journal and .journal_info_block (type jrnl and creator code hfs+). HFS+, although not a cutting-edge filesystem, supports some unique features and has worked well for Apple.
Similar to HFS
HFS+ is architecturally similar to HFS, with several important improvements such as:
- 32 bits used for allocation blocks (instead of 16). HFS divides the disk space on a partition into equal-sized allocation blocks. Since 16 bits are used to refer to an allocation block, there can be at most 2^16 allocation blocks on an HFS filesystem. Thus, using 32 bits for identifying allocation blocks results in much less wasted space (and more files).
- Long file names up to 255 characters
- Unicode based file name encoding
- File/Directory attributes can be extended in future (as opposed to being fixed size)
- In addition to a System Folder ID (for starting Apple operating systems), a dedicated startup file is supported so that non-Apple systems can boot from an HFS+ filesystem. The file is easy to find during startup because its location and size are stored in the volume header, at a fixed location
- Largest file size is 2^63 bytes
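The effect of the wider allocation-block number can be checked with a little arithmetic. The following is a minimal Python sketch, not taken from any Apple source: the helper name and the rounding to 512-byte multiples are assumptions made for illustration.

```python
def min_allocation_block_size(volume_bytes, block_number_bits, sector_size=512):
    """Smallest allocation block (in bytes) that lets block numbers of the
    given width cover the whole volume.

    Rounding up to a multiple of sector_size is an illustrative assumption.
    """
    max_blocks = 2 ** block_number_bits
    raw = -(-volume_bytes // max_blocks)   # ceiling division
    sectors = -(-raw // sector_size)       # round up to whole sectors
    return sectors * sector_size


one_gib = 2 ** 30
print(min_allocation_block_size(one_gib, 16))  # 16-bit block numbers (HFS)
print(min_allocation_block_size(one_gib, 32))  # 32-bit block numbers (HFS+)
```

On a 1 GiB volume the 16-bit case needs 16 KB allocation blocks, so even a one-byte file occupies 16 KB, whereas the 32-bit case can get by with 512-byte blocks.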
Aliases
Aliases are similar to symbolic links in the sense that they allow multiple references to a file or directory. However, if you move the target (without replacing it), a symlink would break, while an alias would not. This is possible because under HFS+, each file/directory has a unique, persistent identity, which is stored in the alias along with the pathname. If one of the two (pathname or unique identity) is wrong (the file cannot be found using it), the alias updates it with the "right" one (using which the file could be found). This feature is the reason why you can keep moving applications to different places on your disk without having to worry about breaking their Dock "shortcuts".
In order to make use of aliases, an application must use either Carbon or Cocoa APIs, since this feature is not available through the POSIX API.
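The self-repairing behaviour of an alias can be modeled with a toy in-memory volume. This is only a sketch of the idea described above, not the Carbon or Cocoa alias API: the class names are hypothetical, and trying the pathname before the file ID is an assumption.

```python
class ToyVolume:
    """In-memory stand-in for a volume: maps persistent file IDs to paths."""

    def __init__(self):
        self.path_by_id = {}

    def place(self, file_id, path):
        """Create the file, or move it if the ID already exists."""
        self.path_by_id[file_id] = path

    def id_at(self, path):
        """Return the ID of the file currently at path, or None."""
        for file_id, current in self.path_by_id.items():
            if current == path:
                return file_id
        return None


class ToyAlias:
    """Stores both a pathname and a persistent file ID for its target."""

    def __init__(self, path, file_id):
        self.path = path
        self.file_id = file_id

    def resolve(self, volume):
        # Assumed order: pathname first, then the persistent ID.
        found_id = volume.id_at(self.path)
        if found_id is not None:
            self.file_id = found_id        # refresh the stored identity
            return self.path
        current_path = volume.path_by_id.get(self.file_id)
        if current_path is not None:
            self.path = current_path       # refresh the stale pathname
            return current_path
        return None                        # neither key finds the target


volume = ToyVolume()
volume.place(42, "/Applications/Editor.app")
alias = ToyAlias("/Applications/Editor.app", 42)
volume.place(42, "/Users/Shared/Editor.app")  # the target is moved
print(alias.resolve(volume))
```

A symbolic link corresponds to keeping only the pathname, which is why it breaks in the same situation.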
Optimizations
HFS+ also has a few specific optimizations. When a file is opened on an HFS+ volume, the following conditions are tested:
- The file is less than 20 MB in size
- The file is not already busy
- The file is not read only
- The file is fragmented (the eighth extent descriptor in its extent record has a non-zero block count)
- The system uptime is at least 3 minutes
If all of the above are satisfied, the file is relocated (defragmented) on the fly.
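The conditions above amount to a simple predicate. The sketch below restates them in Python. The record type and field names are hypothetical, and treating "MB" as 2^20 bytes is an assumption. It does not reflect how the kernel code is actually organized.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024  # assumption: "MB" is taken as 2**20 bytes


@dataclass
class OpenedFile:
    size_bytes: int
    busy: bool
    read_only: bool
    extent_block_counts: List[int]  # block counts of the 8 extent descriptors


def should_relocate(f: OpenedFile, uptime_seconds: float) -> bool:
    """True when every condition for on-the-fly relocation holds."""
    fragmented = (len(f.extent_block_counts) >= 8
                  and f.extent_block_counts[7] != 0)
    return (f.size_bytes < 20 * MB
            and not f.busy
            and not f.read_only
            and fragmented
            and uptime_seconds >= 3 * 60)


candidate = OpenedFile(5 * MB, False, False, [4, 4, 4, 4, 4, 4, 4, 4])
print(should_relocate(candidate, uptime_seconds=600))
```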
Another optimization is "Hot File Clustering". This is a multi-staged (the stages being DISABLED, IDLE, BUSY, RECORDING, EVALUATION, EVICTION and ADOPTION) clustering scheme that records "hot" files (except journal files, and ideally quota files) on a volume, and moves "hot" files to the "hot" space on the disk (0.5% of the total filesystem size located at the end of the default metadata zone - at the start of the volume). The scheme uses an on-disk B-Tree file for tracking (/.hotfiles.btree on a volume):
# ls -l /.hotfiles.btree
-rw------- 1 root admin 196608 17 Dec 10:09 /.hotfiles.btree
At most 5000 files, and only files less than 10 MB in size are "adopted" under this scheme.
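The numeric limits mentioned above can be put together in a short sketch. The "temperature" score and the tuple layout are hypothetical, since how a file's hotness is measured is not covered here. Only the 0.5% figure, the 5000-file cap and the 10 MB size limit come from the text, and "MB" is assumed to mean 2^20 bytes.

```python
MB = 1024 * 1024  # assumption: "MB" is taken as 2**20 bytes


def hot_space_bytes(filesystem_bytes):
    """Size of the hot space: 0.5% of the total filesystem size."""
    return filesystem_bytes * 5 // 1000


def files_to_adopt(candidates, max_files=5000, max_size=10 * MB):
    """Pick the files to adopt from (name, size_bytes, temperature) tuples.

    The temperature value is a hypothetical hotness score: higher is hotter.
    """
    eligible = [c for c in candidates if c[1] < max_size]
    eligible.sort(key=lambda c: c[2], reverse=True)
    return [name for name, _, _ in eligible[:max_files]]


recorded = [("a", 1 * MB, 3.0), ("b", 50 * MB, 9.0), ("c", 2 * MB, 7.0)]
print(files_to_adopt(recorded))
print(hot_space_bytes(80 * 10 ** 9))
```

In this toy run, file "b" is skipped despite its high score because it exceeds the size limit.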
Multiple Forks
HFS+/HFS files have two "forks" - traditionally called the data and resource forks, either of which may be empty. Historically, a resource fork has been used for various things such as custom icons, preferences, license information, etc. As expected, this is incompatible with traditional Unix filesystems, and care must be taken while moving files across filesystems. The article Command Line Archival in Mac OS X takes a brief look at the issue and some solutions. From a BSD command line, a file's resource fork can be accessed thus:
% ls -l Icon
128 -rwxrwx--- 1 amit amit 0 11 Jun 2003 Icon
# this means the data fork is empty
% ls -l Icon/rsrc
128 -rwxrwx--- 1 amit amit 65535 11 Jun 2003 Icon/rsrc
# the resource fork is 65535 bytes
Thus, unlike aliases, multiple forks can be accessed via the POSIX API.
Device Driver Partitions
Although this is not related to HFS+, Mac OS X can load block device drivers from various places: the ROM, a USB or FireWire device and special partitions on a fixed disk. In order to support multiple operating systems or other features, a disk can have more than one device driver installed, each in its own partition. For example, viewing your disk drive's information in /Applications/Utilities/Disk Utility.app might tell you that Mac OS 9 drivers are installed on this disk. A "standard" partition scheme on the PowerBook might look like the following:
# pdisk /dev/rdisk0 -dump
/dev/rdisk0 map block size=512
#: type name length base ( size )
1: Apple_partition_map Apple 63 @ 1
2: Apple_Driver43*Macintosh 56 @ 64
3: Apple_Driver43*Macintosh 56 @ 120
4: Apple_Driver_ATA*Macintosh 56 @ 176
5: Apple_Driver_ATA*Macintosh 56 @ 232
6: Apple_FWDriver Macintosh 512 @ 288
7: Apple_Driver_IOKit Macintosh 512 @ 800
8: Apple_Patches Patch Partition 512 @ 1312
9: Apple_HFS Untitled 156299656 @ 1824 ( 74.5G)
0: Apple_Free 0+@ 156301480
In the above dump, the Apple_partition_map is a meta-data partition describing the partitions on the disk. The patch partition is a meta-data partition containing patches that must be applied to the system before it can boot. The various "Driver" partitions contain drivers (Apple_Driver43 contains SCSI Manager 4.3, for example).
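The figure in the size column of the dump follows from the partition length and the map block size. A small Python check, assuming that pdisk's "G" means 2^30 bytes (the function name is made up for this example):

```python
MAP_BLOCK_SIZE = 512  # from "map block size=512" in the dump


def blocks_to_gib(length_in_blocks, block_size=MAP_BLOCK_SIZE):
    """Convert a partition length in map blocks to units of 2**30 bytes."""
    return length_in_blocks * block_size / 2 ** 30


# Length of the Apple_HFS partition (entry 9) in the dump
print(f"{blocks_to_gib(156299656):.1f}G")
```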
Insensitive!
Note that HFS+ is a case preserving, case insensitive filesystem, which can be rather jarring in situations such as the following:
# tar -tf freebsd.tar
FreeBSD.txt
freebsd.txt
# The tar file contains two files
# tar -xvf freebsd.tar
FreeBSD.txt
freebsd.txt
# ls *.txt
freebsd.txt
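One way to avoid this surprise is to check an archive's file list for names that differ only in case before extracting it. The sketch below uses Python's str.casefold() as a stand-in for the filesystem's comparison, which is an assumption: HFS+ has its own case-folding rules for Unicode names, and they may differ for some characters.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that would collide on a case-insensitive filesystem."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(case_collisions(["FreeBSD.txt", "freebsd.txt", "README"]))
```

The list of names could come from the output of tar -tf, as in the transcript above.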
The Apple Technical Note titled HFS Plus Volume Format describes HFS+ internals in great detail.
ISO9660
ISO9660 is a system-independent file system for read-only data CDs. Apple has its own set of ISO9660 extensions. Moreover, you would likely run into Mac HFS/ISO9660 hybrid discs that contain both a valid HFS and a valid ISO9660 filesystem. Both filesystems can be read on a Mac, while on "other" systems, you would typically read the ISO9660 data. Note that this doesn't mean there has to be redundant data on the disc: usually the data that needs to be accessed from both Macs and PCs is kept on the ISO9660 volume, and is aliased on the HFS volume.
MSDOS
Mac OS X includes support for the MSDOS filesystem (FAT12, FAT16 and FAT32).
NTFS
Mac OS X includes read-only support for NTFS.
UDF
UDF (Universal Disk Format) is the filesystem used by DVD-ROM (including DVD-video and DVD-audio) discs, and by many CD-R/RW packet-writing programs. Note that at the time of this writing, Mac OS X "Panther" only supports UDF 1.5, and not UDF 2.0.
UFS
Darwin's implementation of UFS is similar to that on *BSD, as was NEXTSTEP's, but they are not really compatible. Currently, only NetBSD supports it. Apple's UFS is big endian (as was NeXT's) - even on x86 hardware. It includes the new Directory Allocation Algorithm for FFS (DirPref). The author of the algorithm offers more details, including some test results, on his site.
Network Filesystems
AFP
The Apple Filing Protocol (AFP) is an Apple proprietary protocol for file sharing over the network. A comparison of AFP and NFS is outside of the scope of this document, but there exists software that enables the two to co-exist (AFP shares can be made to look like NFS shares and vice-versa).
/usr/sbin/AppleFileServer is the AFP daemon, which is launched when you select the "Personal File Sharing" checkbox under System Preferences/Sharing.
FTP
The mount_ftp command mounts a directory on an FTP server locally. Note that this functionality is currently read-only. It works transparently through the Finder and the Web Browser.
mount_ftp ftp://user:password@hostname/directory/path node
NFS
Mac OS X includes NFS client and server support (version 3) from BSD, including the "NQ" (NFS with leases) extensions. The usual supporting daemons (rpc.lockd, rpc.statd, nfsiod, etc.) are present as well.
SMB/CIFS
Mac OS X "Panther" includes Samba 3.0 to support SMB/CIFS.
WebDAV
A WebDAV enabled directory located at a server specified by an appropriate URL can be mounted as a filesystem via the mount_webdav command. Since a .Mac account's iDisk is available through WebDAV, it can be mounted this way.
Note that Mac OS X has a (preliminary) system level event notification framework built around FreeBSD's kqueue/kevent. This can allow graceful mounts and unmounts of network volumes based on changes in network connectivity.
Other/Pseudo Filesystems
cddafs
The cdda filesystem is used to make the tracks of an audio CD appear as AIFF files. Moreover, if the track names can be looked up successfully, the track "files" have corresponding names.
When you insert an audio CD in the drive, Mac OS X "mounts" it using cddafs by default, or you can manually do so as follows:
# mount_cddafs /dev/disk<N> /tmp/audiocd
deadfs
When the underlying filesystem is disassociated from a vnode (in the vclean() operation), its vnode operations vector is set to that of the dead filesystem. All operations in the dead filesystem fail, except for close().
deadfs essentially facilitates revocation (of access to the controlling terminal, to a forcibly unmounted filesystem, etc.). Consider a situation where you want a backgrounded job of a logged out user to finish what it's doing, and yet have no access to its (erstwhile) controlling terminal. This is achieved by detaching the terminal from the vnode and replacing it with deadfs.
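The mechanism is essentially a swap of the operations vector. The following Python model is only an analogy of the kernel's C structures: the class, the operation tables and the error type are hypothetical, and only the name vclean() and the "everything fails except close" rule come from the description above.

```python
class DeadVnodeError(OSError):
    """Raised by every dead-filesystem operation except close."""


def _dead_op(vnode, *args):
    raise DeadVnodeError("vnode has been revoked")


def _dead_close(vnode, *args):
    return 0  # close() is the one operation that still succeeds


# Hypothetical operation vectors: name -> function taking the vnode
LIVE_OPS = {
    "read": lambda vnode: "data from terminal",
    "write": lambda vnode, data: len(data),
    "close": lambda vnode: 0,
}
DEAD_OPS = {"read": _dead_op, "write": _dead_op, "close": _dead_close}


class Vnode:
    def __init__(self, ops):
        self.ops = ops

    def call(self, name, *args):
        return self.ops[name](self, *args)


def vclean(vnode):
    """Disassociate the underlying filesystem: point the vnode at deadfs."""
    vnode.ops = DEAD_OPS


terminal = Vnode(LIVE_OPS)
print(terminal.call("read"))
vclean(terminal)
print(terminal.call("close"))  # still works; read and write now raise
```

A process holding the old vnode keeps a valid reference, but every attempt to use it other than closing it now fails.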
devfs
devfs, the device filesystem, provides access to the kernel's device namespace in the global filesystem namespace. devfs is typically mounted on /dev and allows entries in there to be built automatically.
devfs, if enabled, is mounted from within the Mac OS X kernel during BSD initialization, although instances of it can be mounted later on using mount_devfs.
# mount -t devfs devfs /tmp/dev
fdesc
The fdesc filesystem is typically mounted on /dev/fd. Its functionality is similar to /proc/<pid>/fd (or simply /proc/self/fd) on Linux, that is, it provides a list of all active file descriptors for the currently running process. Note that a typical Linux system has /dev/fd symbolically linked to /proc/self/fd.
/etc/rc mounts the fdesc filesystem during system startup:
# mount -t fdesc -o union stdin /dev
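From a program's point of view, the descriptor filesystem is just a directory that can be listed. A minimal Python sketch follows (the function name is made up for this example). It returns None on systems where /dev/fd is not available, and the listing may include the descriptor temporarily opened to read the directory itself.

```python
import os


def open_descriptors(fd_dir="/dev/fd"):
    """Return the descriptor numbers listed under fd_dir, or None if the
    directory does not exist on this system."""
    if not os.path.isdir(fd_dir):
        return None
    return sorted(int(name) for name in os.listdir(fd_dir) if name.isdigit())


with open(os.devnull) as handle:
    # The descriptor of the file just opened should appear in the listing.
    print(handle.fileno(), open_descriptors())
```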
fifofs
The purpose of fifofs is similar to that of specfs.
loop*
Functionality similar to Linux's "loop" mounts (or "lofi" on Solaris) is available via the Finder (or simply on the Desktop): double-clicking on a disk image file mounts its filesystem (if supported). The command line utility hdid can be used for finer-grained control of this functionality:
# hdid floppy.img
/dev/disk3
# hdid http://127.0.0.1/disk.img
/dev/disk4
The "disks" disk3 and disk4 can be accessed as regular disks. Note that if the disk image to be mounted using HTTP is a dual-fork file, then it is trickier to use it.
Moreover, hdid can be directed to use only a subset (a range of sectors) of a disk image. There is also support for encryption, and more importantly, shadowing, wherein a "shadow" file can be used to which all writes can be redirected. When a read occurs in such a case, blocks present in the shadow file have precedence over the ones in the image.
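The read and write rules for a shadowed image can be sketched in a few lines. This is a toy model of the idea, with a list standing in for the image and a dictionary for the shadow file. It says nothing about hdid's actual on-disk shadow format.

```python
class ShadowedImage:
    """Toy model of a disk image with a shadow file attached."""

    def __init__(self, image_blocks):
        self.image = list(image_blocks)  # the image itself is never modified
        self.shadow = {}                 # block number -> data written later

    def write(self, block_number, data):
        # All writes are redirected to the shadow.
        self.shadow[block_number] = data

    def read(self, block_number):
        # Blocks present in the shadow take precedence over the image.
        if block_number in self.shadow:
            return self.shadow[block_number]
        return self.image[block_number]


disk = ShadowedImage([b"boot", b"data", b"more"])
disk.write(1, b"edit")
print(disk.read(0), disk.read(1), disk.image[1])
```

Discarding the shadow returns the disk to its original contents, which is what makes the scheme useful for read-only or shared images.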
nullfs
The null mount filesystem is a stackable filesystem in 4.4BSD. It allows mounting of one part of the filesystem in a different location. This can be used to join together multiple directories into a new directory tree. Thus, filesystem hierarchies on various disks can be presented as one directory tree, subtrees of a writable filesystem can be made read-only, and so on.
Note that this is slightly different from (and less seamless than) a union mount (see below). While the latter seamlessly combines the filesystems of the mount point and the mounted filesystem, nullfs simply intercepts VFS/vnode operations and passes them through to the original filesystem (of the mount point), with the exception of vop_getattr(), vop_lock(), vop_unlock(), vop_inactive(), vop_reclaim(), and vop_print().
Note that the null filesystem layer also serves as a prototype filesystem, and new layers can be implemented by using the null layer as a template.
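The pass-through idea, and the use of the null layer as a template, can be illustrated with a small Python analogy. All class and method names here are hypothetical, and the sketch forwards every operation, whereas the real null layer handles a few operations itself.

```python
class LowerFilesystem:
    """Stand-in for the original filesystem underneath the layer."""

    def lookup(self, name):
        return f"vnode:{name}"

    def read(self, name):
        return f"contents of {name}"


class NullLayer:
    """Forwards every operation it does not define to the lower layer."""

    def __init__(self, lower):
        self.lower = lower

    def __getattr__(self, operation):
        # Only called for attributes not found on the layer itself.
        return getattr(self.lower, operation)


class UppercaseLayer(NullLayer):
    """A new layer built from the template: overrides a single operation."""

    def read(self, name):
        return self.lower.read(name).upper()


stacked = UppercaseLayer(LowerFilesystem())
print(stacked.lookup("notes.txt"))  # passed through unchanged
print(stacked.read("notes.txt"))    # intercepted by the new layer
```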
Finally, it should be noted that although nullfs is present in the bsd subtree of Darwin's kernel source, null mounts are not really used by Mac OS X.
ramfs
A ram filesystem can be created under Mac OS X as follows:
# hdid -nomount ram://1024
/dev/disk3
The above command creates a ram disk with 1024 sectors (sector size being 512 bytes), and prints the name of the resultant device on the standard output. Thereafter, a filesystem can be created on this device (the corresponding raw device, technically) as follows:
# newfs_msdos /dev/rdisk3
/dev/rdisk3: 985 sectors in 985 FAT12 clusters \
(512 bytes/cluster) \
bps=512 spc=1 res=1 nft=2 rde=512 sec=1024 mid=0xf0 \
spf=3 spt=32 hds=16 hid=0
The disk can be mounted as usual:
# mount -t msdos /dev/disk3 /tmp/msdos
# mount
/dev/disk3 on /private/tmp/msdos (local)
# df /tmp/msdos
Filesystem 512-blocks Used Avail Capacity Mounted on
/dev/disk3 987 2 985 0% /private/tmp/msdos
Finally, you can get rid of the ram disk as follows:
# hdiutil detach /dev/disk3
"disk3" unmounted.
"disk3" ejected.
specfs
Devices (the so called "special" files) and FIFOs can reside on any arbitrary filesystem (that can house such files). This means their names and attributes are maintained by this "host" filesystem. However, their operations cannot be handled by this filesystem - accesses to such device files need to be mapped to their underlying devices (more specifically, the respective device drivers). Moreover, device aliases (for example, same major/minor numbers, but different pathnames on a filesystem, or maybe even different filesystems) need to be detected and handled appropriately.
The specfs layer facilitates the above. Note that specfs is not a user visible filesystem, and it's not "mounted" anywhere.
synthfs
synthfs is a pseudo (in-memory) filesystem used to create arbitrary directory trees (if you wanted to "synthesize" mount points for random things, for example). synthfs is not derived from FreeBSD.
A synthfs mount is similar to a typical pseudo filesystem mount:
# mount -t synthfs synthfs /tmp/synthfs
union
A detailed description of 4.4BSD's "union" mounts, including a short history of similar filesystems, can be found in the USENIX paper titled Union Mounts in 4.4BSD-Lite. In the simplest terms, the union mount filesystem extends the null filesystem by not hiding the files in the "mounted on" directory. It merges the two directories (and their trees) into a single view. Note that duplicate names are suppressed and a lookup locates the logically topmost entity with that name.
Consider the following sequence of commands that illustrates the basic concepts of union mounts:
# hdiutil create /tmp/msdos1 -volname one \
-megabytes 1 -fs MS-DOS
...
created: /tmp/msdos1.dmg
# hdiutil create /tmp/msdos2 -volname two \
-megabytes 1 -fs MS-DOS
...
created: /tmp/msdos2.dmg
# hdid -nomount /tmp/msdos1.dmg
/dev/disk3
# hdid -nomount /tmp/msdos2.dmg
/dev/disk4
# mount -t msdos /dev/disk3 /tmp/union
# echo "msdos1: a" > /tmp/union/a.txt
# umount /dev/disk3
# mount -t msdos /dev/disk4 /tmp/union
# echo "msdos2: a" > /tmp/union/a.txt
# echo "msdos2: b" > /tmp/union/b.txt
# umount /dev/disk4
# mount -t msdos -o union /dev/disk3 /tmp/union
# mount -t msdos -o union /dev/disk4 /tmp/union
# ls /tmp/union
a.txt b.txt
# cat /tmp/union/a.txt
msdos2: a
# umount /dev/disk4
# ls /tmp/union
a.txt
# cat /tmp/union/a.txt
msdos1: a
# umount /dev/disk3
# mount -t msdos -o union /dev/disk4 /tmp/union
# mount -t msdos -o union /dev/disk3 /tmp/union
# cat /tmp/union/a.txt
msdos1: a
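The lookup rule seen in the transcript above (the most recently mounted layer wins, and duplicate names are listed once) can be modeled as follows. This is a simplified sketch with hypothetical names. It ignores the contents of the "mounted on" directory itself as well as writes.

```python
class UnionView:
    """Toy model of a stack of union mounts on a single directory."""

    def __init__(self):
        self.layers = []  # (label, files) pairs, most recently mounted first

    def mount(self, label, files):
        self.layers.insert(0, (label, dict(files)))

    def umount(self, label):
        self.layers = [layer for layer in self.layers if layer[0] != label]

    def listing(self):
        names = set()
        for _, files in self.layers:
            names.update(files)      # duplicate names are suppressed
        return sorted(names)

    def read(self, name):
        for _, files in self.layers:  # topmost layer first
            if name in files:
                return files[name]
        raise FileNotFoundError(name)


disk3 = {"a.txt": "msdos1: a"}
disk4 = {"a.txt": "msdos2: a", "b.txt": "msdos2: b"}

view = UnionView()
view.mount("disk3", disk3)
view.mount("disk4", disk4)
print(view.listing(), view.read("a.txt"))
view.umount("disk4")
print(view.listing(), view.read("a.txt"))
```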
As a real-life example, /etc/rc mounts the "descriptor" filesystem as a union mount:
# mount -t fdesc -o union stdin /dev
volfs
volfs, the "volume" filesystem, is a virtual filesystem that exists over the HFS+ (or a filesystem that supports volfs) VFS and serves the needs of two differing APIs (POSIX/Unix pathnames and Mac OS <Volume ID><Directory><File Name>). It is there to support the Carbon File Manager APIs on top of the BSD filesystem.
The filesystems that support volfs are HFS+, HFS, ISO9660 and UDF.
Consider the following example:
# mount
/dev/disk0s9 on / (local, journaled)
devfs on /dev/ (local)
fdesc on /dev (union)
<volfs> on /.vol
...
# ls -l /.vol
total 0
dr--r--r-- 2 root wheel 64 25 Dec 12:45 234881033
The entry in /.vol is nothing but a representation of /. Each mounted volume (a partition, if you will), such as those on external storage devices, would have a representation under /.vol. Consider a file, say /mach_kernel:
# ls -li /mach_kernel
1045670 -rw-r--r-- 1 root wheel 3824080 11 Dec 16:20 /mach_kernel
This file would be accessed under /.vol as follows (note that 1045670 is the file's inode number):
# ls -li /.vol/234881033/1045670
1045670 -rw-r--r-- 1 root wheel 3824080 11 Dec 16:20 /.vol/234881033/1045670
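The pathname used above is simply the volume ID and the file ID joined under /.vol. A small Python sketch of that construction follows. Both helper names are hypothetical. The second one assumes that the volume ID is the device number reported by stat(), which is consistent with the example but should be treated as an assumption.

```python
import os


def volfs_path(volume_id, file_id, root="/.vol"):
    """Build a volfs-style pathname: <root>/<volume id>/<file id>."""
    return f"{root}/{volume_id}/{file_id}"


def volfs_path_for(path):
    """Derive a volfs-style pathname for an existing file.

    Assumption: the volume ID equals stat()'s st_dev for the file.
    """
    info = os.stat(path)
    return volfs_path(info.st_dev, info.st_ino)


print(volfs_path(234881033, 1045670))
```

Such a path is only meaningful on a system where volfs is mounted and the file lives on a filesystem that supports it.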
volfs is mounted during system startup (in /etc/rc):
# mkdir -p -m 0555 /.vol && chmod 0555 /.vol && mount_volfs /.vol
What about XYZ?
While Mac OS X supports many filesystems, you might run into some that are not supported. Linux's ext2/3 and Reiser, for example, are not supported, although you can find an open source implementation of ext2 for Mac OS X.
Interestingly, BootX, the Mac OS X bootloader, does understand the ext2 filesystem, and can load kernels from it.
An important (though not necessarily critical) omission is that of the proc filesystem. This issue, including a description of a minimal port of /proc from FreeBSD to Mac OS X, is discussed in /proc on Mac OS X.