Windows File Systems

by Drew Robb

While Windows wrapped up control of the desktop market some years ago, it still has a way to go before it dominates the corporate server market. That's why the last few years have seen a rise in the number of Microsoft server offerings as the company moves toward higher-end computing. Windows 2000 was a major improvement over NT in enterprise stability and scalability, and it saw the launch of Active Directory (AD), Microsoft's enterprise directory offering. Windows Advanced Server, Limited Edition offers 64-bit processing with up to 16TB of addressable memory. At the high end of the Microsoft operating system spectrum, Windows 2000 Datacenter Server and SQL Server 2000 support up to 32 processors. Windows can therefore now compete well in the midrange server market, if not yet at the very highest end.

Along the way, the company has also been upgrading its file systems to provide the features, reliability and security that corporate users need. In this article we look at the various file systems Windows now supports.

Magnetic Disk Systems
Windows supports two file systems for magnetic disks - FAT and NTFS. NTFS (short for NT File System) is the preferred system for Windows NT 4.0, 2000, XP and .NET server and workstation operating systems, but these O/Ses still support the older FAT (File Allocation Table) file system for use with floppy disks and with older versions of Windows in multiboot systems. To take full advantage of the security, disk quota and other features of the later software, however, you will need to be using NTFS.

FAT
FAT dates back to DOS, and there are now three versions of the File Allocation Table file system: FAT12, FAT16 and FAT32. The number in each name is the number of bits used to identify a cluster.

FAT12: The earliest version of the file system, FAT12 allows a partition to contain up to 4,096 (2^12) clusters. Since it supports clusters of one to sixteen sectors, the maximum partition size is 32MB. Windows 2000 uses FAT12 for floppy disks and for partitions of 16MB or smaller.

FAT16: FAT16 provides a sixteen-fold expansion in the number of clusters it can identify, supporting volumes of up to 65,536 (2^16) clusters. It also raises the maximum cluster size to 128 sectors (64KB) and the maximum volume size to 4GB.

FAT32: To meet the need for even larger storage capacity, Microsoft introduced FAT32 with Windows 95 OSR2. Following the pattern of the earlier versions, it uses 32 bits to designate a cluster, although the high four bits are reserved. Each volume can therefore contain nearly 270 million (2^28) clusters, which theoretically translates into an 8TB volume made up of 32KB clusters. In practice, however, while Windows 2000 will manage larger volumes created in other operating systems, it limits new FAT32 volumes to 32GB, since the file system becomes quite inefficient beyond that size. If you do want a larger volume, you can dual boot the system into Windows 95 or 98, create the volume using that OS, and then manage it using Windows 2000.
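The capacity figures above follow from multiplying the number of addressable clusters by the largest cluster size. Here is a minimal Python sketch, assuming 512-byte sectors and taking the cluster-number bits and maximum cluster sizes from the text:

```python
# Theoretical maximum volume size for each FAT variant.
# Assumes 512-byte sectors; bit counts and cluster sizes come from the text.
SECTOR = 512

FAT_VARIANTS = {
    "FAT12": (12, 16 * SECTOR),   # 2^12 clusters, up to 16 sectors per cluster
    "FAT16": (16, 128 * SECTOR),  # 2^16 clusters, up to 128 sectors (64KB)
    "FAT32": (28, 64 * SECTOR),   # 28 usable bits, 32KB clusters
}


def max_volume_bytes(cluster_bits: int, cluster_bytes: int) -> int:
    """Addressable clusters times cluster size."""
    return (1 << cluster_bits) * cluster_bytes


def human(n: int) -> str:
    """Format a byte count in binary units (exact for the powers of two used here)."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n}{unit}"
        n //= 1024
    return f"{n}TB"


if __name__ == "__main__":
    for name, (bits, cluster) in FAT_VARIANTS.items():
        print(f"{name}: {human(max_volume_bytes(bits, cluster))}")
```

Running it prints 32MB, 4GB and 8TB for FAT12, FAT16 and FAT32, matching the figures above.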

All of these FAT versions share the same key architectural elements. A FAT volume consists of several sections, starting with the boot sector. Next come the file allocation table and, for protection, a duplicate copy of it. Then comes the root directory, followed by the other directories and files on the volume. FAT32 also stores a backup copy of the boot sector for added reliability.

The FAT boot sector starts with three bytes containing a jump instruction that tells the CPU to skip over the next several bytes, which hold data rather than executable code. After that comes an eight-byte OEM ID identifying the operating system that formatted the volume. Next comes the BIOS Parameter Block (BPB), which lists the parameters of the volume. These include the number of bytes per sector, sectors per cluster, number of reserved sectors, number of file allocation table copies (generally two), number of possible entries in the root directory and number of sectors in the volume.
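To make the layout concrete, here is a minimal Python sketch that reads the OEM ID and the leading BPB fields from a raw 512-byte boot sector. The byte offsets are those of the standard DOS BPB, which the text does not list, and `make_example_sector` is a hypothetical helper that builds a sector purely for illustration:

```python
import struct
from typing import NamedTuple


class BootSectorInfo(NamedTuple):
    oem_id: str
    bytes_per_sector: int
    sectors_per_cluster: int
    reserved_sectors: int
    fat_copies: int
    root_entries: int
    total_sectors: int


# Little-endian layout of the first 21 bytes: 3-byte jump, 8-byte OEM ID,
# then the leading BPB fields at offsets 11-20.
_HEADER = struct.Struct("<3s8sHBHBHH")


def parse_boot_sector(sector: bytes) -> BootSectorInfo:
    _jump, oem, bps, spc, reserved, fats, root_entries, total16 = _HEADER.unpack_from(sector, 0)
    # A 16-bit total of 0 means the count is kept in the 32-bit field at offset 32.
    total = total16 if total16 else struct.unpack_from("<I", sector, 32)[0]
    return BootSectorInfo(oem.decode("ascii", "replace").rstrip(), bps, spc,
                          reserved, fats, root_entries, total)


def make_example_sector() -> bytes:
    """Build a hypothetical 512-byte boot sector for demonstration."""
    sector = bytearray(512)
    sector[0:3] = b"\xEB\x3C\x90"   # a common short jump followed by NOP
    sector[3:11] = b"MSDOS5.0"      # example OEM ID
    struct.pack_into("<HBHBHH", sector, 11, 512, 4, 1, 2, 512, 0)
    struct.pack_into("<I", sector, 32, 1_000_000)  # 32-bit total sectors
    return bytes(sector)


if __name__ == "__main__":
    print(parse_boot_sector(make_example_sector()))
```

On a real system the input would be the first sector read from the volume; the example sector simply avoids needing disk access.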

Each cluster in the volume has an entry in the file allocation table indicating whether the cluster is free or bad, whether it belongs to a file and, if so, whether it is the last cluster of that file. If a file continues into another cluster, the entry contains the number of that next cluster. If the entire file fits in a single cluster, or if the cluster is the final one of a file spanning several clusters, the entry contains an end-of-chain marker, such as 0xFFF in FAT12, indicating that it is the last cluster.
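Reading a file therefore means following its chain of FAT entries. The Python sketch below uses 16-bit (FAT16) entry values, where 0xFFF8 through 0xFFFF mark the end of a chain and 0xFFF7 marks a bad cluster; the FAT contents are hypothetical:

```python
from __future__ import annotations

# FAT16 entry values (FAT12 and FAT32 follow the same scheme with other widths).
FREE = 0x0000
BAD_CLUSTER = 0xFFF7
END_OF_CHAIN = 0xFFF8  # any value from 0xFFF8 to 0xFFFF marks a file's last cluster


def cluster_chain(fat: list[int], first_cluster: int) -> list[int]:
    """Follow a file's clusters through the FAT, starting from its first cluster."""
    chain: list[int] = []
    cluster = first_cluster
    while True:
        # Clusters 0 and 1 are reserved; data clusters start at 2.
        if not 2 <= cluster < len(fat):
            raise ValueError(f"cluster {cluster} is out of range")
        if cluster in chain:
            raise ValueError(f"loop detected at cluster {cluster}")
        entry = fat[cluster]
        if entry in (FREE, BAD_CLUSTER):
            raise ValueError(f"cluster {cluster} is free or bad, not part of a file")
        chain.append(cluster)
        if entry >= END_OF_CHAIN:
            return chain
        cluster = entry  # the entry holds the number of the next cluster


# Hypothetical FAT: entries 0-1 reserved; a file occupies 2 -> 3 -> 5;
# cluster 4 is free and cluster 6 is marked bad.
EXAMPLE_FAT = [0xFFF8, 0xFFFF, 3, 5, 0x0000, 0xFFFF, 0xFFF7]

if __name__ == "__main__":
    print(cluster_chain(EXAMPLE_FAT, 2))  # [2, 3, 5]
```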

In FAT12 and FAT16, the root directory (root folder) has a fixed size set when the volume is formatted, typically 512 entries of 32 bytes each. FAT32 stores the root directory like any other directory, so it is not limited to a fixed number of entries and can be located anywhere on the disk. Each entry begins with the file name in 8.3 format. Next is the attribute byte, which contains six binary flags indicating whether the file has been archived, whether the entry is a directory or a file, and whether it is read-only, hidden, a system file or a volume label.

Next come eleven bytes recording when the file was created, last accessed and last modified. After that are two bytes giving the location of the file's first cluster (the locations of the remaining clusters are stored in the file allocation table; FAT32 keeps two more high-order bytes of the first cluster number earlier in the entry). The entry finishes with a four-byte number giving the size of the file in bytes, which yields a maximum file size of 4GB (2^32 bytes).
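The 32-byte entry described in the last two paragraphs can be decoded with Python's `struct` module. This sketch assumes the standard on-disk field offsets and the usual packed FAT date/time format (year since 1980, two-second resolution), neither of which the text spells out; it assumes ASCII names and valid dates, and does not handle deleted or long-name entries. The example entry is hypothetical:

```python
from __future__ import annotations

import struct
from datetime import datetime

# Attribute bits, lowest first.
ATTRIBUTES = {
    0x01: "read-only",
    0x02: "hidden",
    0x04: "system",
    0x08: "volume label",
    0x10: "directory",
    0x20: "archive",
}

# 32-byte entry: name(8) ext(3) attr, reserved, create-time tenths,
# then seven 16-bit fields and the 32-bit file size.
ENTRY = struct.Struct("<8s3sBBB7HI")


def fat_datetime(date: int, time: int) -> datetime:
    """Decode a packed FAT date and time."""
    return datetime(1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
                    time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)


def parse_entry(raw: bytes) -> dict:
    (name, ext, attr, _reserved, _ctenths, _ctime, _cdate, _adate,
     cluster_hi, wtime, wdate, cluster_lo, size) = ENTRY.unpack(raw)
    base, extension = name.decode("ascii").rstrip(), ext.decode("ascii").rstrip()
    return {
        "name": f"{base}.{extension}" if extension else base,
        "attributes": [label for bit, label in ATTRIBUTES.items() if attr & bit],
        "modified": fat_datetime(wdate, wtime),
        # FAT32 stores the high 16 bits of the first cluster at offset 20.
        "first_cluster": (cluster_hi << 16) | cluster_lo,
        "size": size,
    }


# Hypothetical entry: README.TXT, read-only + archive, modified 2002-06-15 10:30:20.
EXAMPLE = ENTRY.pack(b"README  ", b"TXT", 0x21, 0, 0, 0, 0, 0, 0,
                     (10 << 11) | (30 << 5) | 10, (22 << 9) | (6 << 5) | 15,
                     5, 1234)

if __name__ == "__main__":
    print(parse_entry(EXAMPLE))
```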

This directory structure natively supports only 8.3 file names. For long file names, FAT uses a truncated version of the name (e.g., FILENA~1.EXE) for the main entry and adds extra entries to the directory holding the long name in Unicode. Because FAT12 and FAT16 limit the size of the root directory, long names reduce the total number of files or directories that can be listed in the root. This is not a problem with FAT32, which has no such root directory limit.
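The effect on a fixed-size root directory is easy to quantify. The Python sketch below assumes the standard long-file-name layout, which the text does not spell out: each extra long-name entry holds 13 Unicode (UTF-16) characters, and every file still needs its 8.3 entry. It also assumes the name is not a valid 8.3 name, so long-name entries are required; the default of 512 root entries is the typical figure mentioned above:

```python
import math

CHARS_PER_LFN_ENTRY = 13  # each 32-byte long-name entry holds 13 UTF-16 code units


def entries_for_long_name(long_name: str) -> int:
    """Directory entries used by one long-named file: long-name entries + the 8.3 entry."""
    code_units = len(long_name.encode("utf-16-le")) // 2
    return math.ceil(code_units / CHARS_PER_LFN_ENTRY) + 1


def files_that_fit(long_name: str, root_entries: int = 512) -> int:
    """How many files with names this long fit in a fixed-size FAT12/FAT16 root."""
    return root_entries // entries_for_long_name(long_name)


if __name__ == "__main__":
    name = "Quarterly Sales Report 2002.xls"  # 31 characters
    print(entries_for_long_name(name), files_that_fit(name))  # 4 128
```

With 31-character names, each file consumes four root entries, so a 512-entry root holds only 128 such files instead of 512.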

NTFS
While each FAT version is an incremental improvement over its predecessor, NTFS takes a completely different approach to organizing data. It grew out of Microsoft's desire to increase its share of the corporate market. FAT was a very simple system that worked for PCs, but it lacked the management and security features needed to compete with UNIX in a high-end, networked environment. The first attempt at a newer file system was the High Performance File System (HPFS), introduced with OS/2 Version 1.2. When the IBM/Microsoft partnership that created OS/2 fell apart and Microsoft decided to create Windows NT, the company incorporated some of the features of HPFS into its New Technology File System (NTFS).

NTFS is the native file system for Windows NT 4.0, 2000 and XP. While these operating systems can run on FAT, many of their features work only with NTFS, so NTFS should be used whenever possible. Fortunately, if you have one or more FAT partitions on one of these systems, it is a simple matter to convert them to NTFS without losing any data, using the convert command (for example, convert D: /fs:ntfs).

NTFS, a log-based file system, addresses FAT's reliability and recoverability problems. A partition's clusters are numbered sequentially using a 64-bit logical cluster number (LCN). Theoretically this would allow access to 16 exabytes (16 billion GB), far more than current storage needs. For now, Windows 2000 limits volumes to 128TB, but later operating systems could take advantage of even larger capacities. Like FAT, NTFS sets a default cluster size based on the size of the partition, assigning 4KB to anything over 2GB. Also like FAT, it lets administrators override the default and use a drop-down box to specify cluster sizes of up to 64KB.
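Because clusters are numbered sequentially, locating a cluster on disk takes a single multiplication. Here is a minimal Python sketch, assuming LCN 0 is the first cluster of the volume and using the 4KB default cluster size for volumes over 2GB; the 512-byte lower bound in the validity check is an assumption, since the text gives only the 64KB upper limit:

```python
from __future__ import annotations

# Convert between NTFS logical cluster numbers (LCNs) and byte offsets
# from the start of the volume. Assumes LCN 0 is the volume's first cluster.


def check_cluster_size(cluster_size: int) -> None:
    """Require a power of two between 512 bytes (assumed) and 64KB (from the text)."""
    if cluster_size < 512 or cluster_size > 64 * 1024 or cluster_size & (cluster_size - 1):
        raise ValueError(f"unsupported cluster size: {cluster_size}")


def lcn_to_offset(lcn: int, cluster_size: int = 4096) -> int:
    """Byte offset of the first byte of a cluster."""
    check_cluster_size(cluster_size)
    if not 0 <= lcn < 2**64:
        raise ValueError("LCN must fit in 64 bits")
    return lcn * cluster_size


def offset_to_lcn(offset: int, cluster_size: int = 4096) -> tuple[int, int]:
    """Return (LCN, byte position within that cluster) for a volume offset."""
    check_cluster_size(cluster_size)
    return divmod(offset, cluster_size)


if __name__ == "__main__":
    print(lcn_to_offset(6097))        # 24973312
    print(offset_to_lcn(24_973_412))  # (6097, 100)
```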

An NTFS partition is divided into four sections: the Partition Boot Sector, the Master File Table, the file system data and a backup copy of the Master File Table. The partition boot sector consists of two sections and occupies the first sixteen sectors. The first section holds the BIOS Parameter Block, which contains information on the layout of the volume and the structure of the file system, similar to the FAT BPB described above. The second section holds the boot code that loads Windows NT/2000/XP.

The next section contains the Master File Table (MFT). When an NTFS partition is created, the system reserves a block called the MFT Zone, equal to 12.5 percent of the volume's capacity. This is the amount considered necessary to support a volume with an average file size of 8KB. If the volume will hold a large number of files in the 2KB to 7KB range, you can enlarge the MFT Zone with the fsutil behavior set mftzone command. The command accepts four settings: 1 is the same as the default (12.5 percent), 2 reserves 25 percent of the disk, 3 reserves 37.5 percent and 4 reserves 50 percent. When you enlarge the MFT Zone, NTFS doesn't immediately allocate the additional space; it does so only once the originally allocated space fills up. Because this fragments the MFT and hurts performance, it is best to choose the setting before creating the volume. Keep in mind, too, that the MFT Zone setting applies to every NTFS volume on the computer; it can't be changed for a single volume.
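The arithmetic behind these settings is simple: each step reserves another eighth of the volume, and the 12.5 percent default corresponds to one 1KB MFT record for every 8KB of data. The Python sketch below computes the reserved space and suggests a setting for a given average file size. `suggested_setting` is a hypothetical helper, not part of fsutil, and it uses a deliberately rough model (one record per file, ignoring directories and files that need extra records):

```python
def mft_zone_bytes(volume_bytes: int, setting: int = 1) -> int:
    """Space reserved for the MFT Zone: each setting step adds 12.5% (1/8)."""
    if setting not in (1, 2, 3, 4):
        raise ValueError("mftzone setting must be 1, 2, 3 or 4")
    return volume_bytes * setting // 8


def suggested_setting(avg_file_bytes: int, record_bytes: int = 1024) -> int:
    """Rough model: one MFT record per file, so the MFT needs about
    record_bytes / avg_file_bytes of the volume. Return the smallest
    setting whose setting/8 share covers that fraction, capped at 4."""
    needed = -(-8 * record_bytes // avg_file_bytes)  # ceiling of 8 * record / avg
    return min(max(needed, 1), 4)


if __name__ == "__main__":
    volume = 80 * 2**30  # hypothetical 80GB volume
    for s in (1, 2, 3, 4):
        print(s, mft_zone_bytes(volume, s) // 2**30, "GB")
    print(suggested_setting(4 * 1024))  # 4KB average files -> setting 2
```

The value returned by `suggested_setting` is what you would pass to fsutil behavior set mftzone under this model.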

The MFT consists of a series of 1KB records, one for each file in the partition. The first sixteen records are reserved for the NTFS system files. Record 0 describes the MFT itself. The next ten include a log file of changes used for system recovery, information about the volume, the index of the root folder and a bitmap showing which clusters are allocated. The final five are reserved for future use.
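Since the records are a fixed 1KB, locating a given record is again simple arithmetic. This Python sketch assumes, for simplicity, that the MFT occupies a single contiguous extent starting at a known LCN (a real MFT can itself become fragmented, as noted above); the starting LCN in the example is hypothetical. The record numbers listed are the standard NTFS assignments for some of the system files the text mentions:

```python
MFT_RECORD_SIZE = 1024  # 1KB records, as described above
RESERVED_RECORDS = 16   # records 0-15 are reserved for NTFS system files

# A few well-known system records.
WELL_KNOWN = {0: "$MFT", 2: "$LogFile", 3: "$Volume", 5: "root folder", 6: "$Bitmap"}


def is_system_record(record_number: int) -> bool:
    return 0 <= record_number < RESERVED_RECORDS


def mft_record_offset(mft_start_lcn: int, cluster_size: int, record_number: int) -> int:
    """Byte offset of an MFT record, assuming the MFT is one contiguous extent."""
    if record_number < 0:
        raise ValueError("record number must be non-negative")
    return mft_start_lcn * cluster_size + record_number * MFT_RECORD_SIZE


if __name__ == "__main__":
    # Hypothetical MFT starting at LCN 1000 with 4KB clusters.
    print(mft_record_offset(1000, 4096, 5), WELL_KNOWN[5])  # 4101120 root folder
```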

After the MFT come the volume's non-system files, followed by a backup copy of the MFT.

NTFS distinguishes two types of attributes that can describe a file in its MFT record: resident and non-resident. Resident attributes fit within the MFT record, while non-resident attributes are too large to fit there.

A typical MFT record contains four attributes. The first, Standard Information, holds file attributes such as the archive bit, which shows whether the file has been backed up, and timestamps recording when the file was created, last modified and last accessed. The second, File Name, holds the file's names. A file can have several names, for example both a long and a short name, and all of them are listed here. NTFS supports names of up to 255 Unicode characters. The third, Security Descriptor, contains the file's Access Control List (ACL) data. The last, Data, contains the contents of the file itself. Small files, well under 1KB, can therefore fit entirely within their MFT record. This speeds up access, since the system doesn't have to read the MFT to find the file's location and then fetch the file from elsewhere on the disk; a single read retrieves both.

Most files, however, are far too large to fit in an MFT record, either because the file itself is too large or because its ACL is too big. In that case, the data attribute of the record lists the locations in the main data area of the volume where the file's contents can be found. Each location is defined by a virtual cluster number (VCN), a logical cluster number (LCN) and a number of clusters.

The VCN numbers the clusters within the file sequentially, starting at 0, while the LCN gives the position on disk of the first cluster of each extent (run of consecutive clusters). If, for example, the file was fragmented into four extents, and the first began at cluster 6097 and included clusters 6098 and 6099, the MFT record would list VCN=0, LCN=6097, #c=3 for that extent. The second extent would then begin at VCN=3, and the others would be described the same way. If the file becomes severely fragmented, additional MFT records must be used to list the extra extents.
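Given the list of extents, finding where any part of a file lives on disk means walking the extents in order. This Python sketch counts the file's clusters from 0 and represents each extent only by its starting LCN and cluster count; `Extent` and `file_cluster_to_lcn` are hypothetical names, and decoding the on-disk encoding of the extent list is outside its scope:

```python
from __future__ import annotations

from typing import NamedTuple


class Extent(NamedTuple):
    lcn: int       # first cluster of the extent on disk
    clusters: int  # number of consecutive clusters in the extent


def file_cluster_to_lcn(extents: list[Extent], file_cluster: int) -> int:
    """Map the Nth cluster of a file (counting from 0) to its LCN on disk."""
    if file_cluster < 0:
        raise IndexError("cluster index must be non-negative")
    remaining = file_cluster
    for extent in extents:  # extents are listed in file order
        if remaining < extent.clusters:
            return extent.lcn + remaining
        remaining -= extent.clusters
    raise IndexError(f"file has only {file_cluster - remaining} clusters")


# The first extent matches the example above; the other three are hypothetical.
FRAGMENTED_FILE = [Extent(6097, 3), Extent(8200, 2), Extent(12000, 4), Extent(500, 1)]

if __name__ == "__main__":
    print([file_cluster_to_lcn(FRAGMENTED_FILE, i) for i in range(10)])
```

For the ten clusters of this file the sketch yields 6097, 6098, 6099, 8200, 8201, 12000, 12001, 12002, 12003 and 500; asking for cluster 10 raises IndexError.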

Additional NTFS Features
In addition to supporting larger volumes than FAT, NTFS contains other features that make it better for corporate operations:

These and other features are not available with FAT.

Optical Disk Formats
In addition to the magnetic disk formats, Windows 2000 also supports two different optical file formats - CDFS and UDF.

CD-ROM File System (CDFS) is based on ISO 9660, a read-only file system standard written by an industry group called High Sierra. The group took its name from its first meeting, in 1985 at Del Webb's High Sierra Hotel and Casino at Lake Tahoe, Nevada, where company representatives began working together on a non-proprietary file system format for CD-ROM. The initial standard was named after the group, High Sierra, but the International Organization for Standardization (ISO) requested an international version of the standard, which was formalized as ISO 9660, Volume and File Structure of CD-ROM for Information Interchange. Macintosh, DOS, Windows, UNIX and Linux all support the standard, which calls for 2048-byte sectors.

CDFS Level 1, like MS-DOS, follows the 8.3 file name format. Directory names are limited to eight characters, and directories to eight nested levels. Levels 2 and 3 permit longer file and directory names, up to thirty-two characters, and allow the use of lowercase letters.
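A Level 1 name check can be sketched in Python. It assumes the ISO 9660 restriction of Level 1 names to "d-characters" (uppercase A-Z, digits and underscore), which the text doesn't spell out, and it ignores the ";1" version suffix that ISO 9660 appends to file names on disc; the directory-depth limit is not checked:

```python
import re

# ISO 9660 "d-characters": uppercase letters, digits and underscore.
D_CHARS = r"[A-Z0-9_]"
LEVEL1_FILE = re.compile(D_CHARS + r"{1,8}(?:\." + D_CHARS + r"{0,3})?")
LEVEL1_DIR = re.compile(D_CHARS + r"{1,8}")


def is_level1_file_name(name: str) -> bool:
    """Simplified 8.3 check using only d-characters (no ';1' version suffix)."""
    return LEVEL1_FILE.fullmatch(name) is not None


def is_level1_dir_name(name: str) -> bool:
    """Directory names: up to eight d-characters, no extension."""
    return LEVEL1_DIR.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("README.TXT", "readme.txt", "LONGFILENAME.TXT"):
        print(candidate, is_level1_file_name(candidate))
```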

Microsoft developed an extension to ISO 9660, called Joliet, which is supported in Windows 95/98/2000/Me/NT 4.0/.NET as well as on the Mac and Linux. Joliet allows Unicode characters in file names of up to 64 characters.

Universal Disk Format (UDF) is a later standard, created in 1995 by the Optical Storage Technology Association and defined in ISO 13346. It permits 255-character names with a maximum path length of 1,023 characters and can be used with CD-ROMs, CD-Rs, CD-RWs and DVD-ROMs. While ISO 9660 calls for data to be written to the disk continuously, UDF uses packet writing. Windows 98 and later versions of Windows support UDF, but only for reading files, not writing them. Writing to CDs or DVDs requires additional software.

Partitioning Strategies
With earlier file systems, partitioning was a way of working around built-in size limits. Anyone still using a version of FAT on a large disk may still face that problem, and partitioning addresses it. With NTFS supporting huge multi-disk volumes, that is no longer a consideration. There are, however, other reasons to partition. These include:

Multiple OS Support: This can include supporting multiple versions of Windows, or adding a boot partition for Linux or another OS.

Separating the OS from the Data Files: Putting these in separate partitions makes the system easier to administer. Restricting the OS partition to users with administrative privileges, while letting ordinary users save and open files on the data partition, helps keep users away from files they shouldn't be accessing.

Optimizing Cluster Sizes: Matching the cluster size to the type of files can improve overall performance. When a partition holds both large and small files, it is hard to find one cluster size that suits both. A large database or graphics file, for example, could benefit from 64KB clusters, since they mean fewer read/write operations and less fragmentation. A collection of text files, on the other hand, would do better with small clusters, since each small file would waste most of the space in a large cluster. Putting these in separate partitions, each with an appropriate cluster size, boosts performance for both types of files.
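The trade-off can be put in numbers. This Python sketch, using hypothetical file sizes (1,000 text files of 2KB each and one 500MB database), compares the space lost in partly filled clusters with the number of clusters the large file occupies; fewer clusters is only a rough proxy for fewer read/write operations:

```python
def clusters_needed(file_bytes: int, cluster_bytes: int) -> int:
    """Clusters allocated to a file (ceiling division); empty files use none."""
    return -(-file_bytes // cluster_bytes)


def slack_bytes(file_bytes: int, cluster_bytes: int) -> int:
    """Space allocated to a file but not used by it (the tail of its last cluster)."""
    return clusters_needed(file_bytes, cluster_bytes) * cluster_bytes - file_bytes


if __name__ == "__main__":
    for cluster in (4 * 1024, 64 * 1024):
        text_waste = 1000 * slack_bytes(2 * 1024, cluster)  # 1,000 files of 2KB
        big_file = clusters_needed(500 * 1024**2, cluster)  # one 500MB database
        print(f"{cluster // 1024}KB clusters: {text_waste:,} bytes wasted, "
              f"{big_file:,} clusters for the database")
```

With 4KB clusters the text files waste 2,048,000 bytes and the database spans 128,000 clusters; with 64KB clusters the waste grows to 63,488,000 bytes while the database needs only 8,000 clusters.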


About the Author
Drew Robb is a Los Angeles-based freelance writer specializing in technology issues. He has had over 100 articles published in the past two years, both under his own name and ghostwritten for corporate executives too busy to take care of the writing themselves. He is author of Server Disk Management in a Windows Environment.

