Operating System Learning Notes (6) Implementation of File Management System

Thinking Before Reading

  • What method can be used to find a file in the directory?
  • What is the difference between the logical and physical structure of a file? Are there any constraints between the logical and physical structure of an individual file?

The above file system basics introduced the logical structure of directories and files, now let’s introduce their physical structure.

Hierarchy of file systems

Modern operating systems have multiple file system types, such as FAT32, NTFS, ext2, ext3, ext4, etc., so the hierarchy of file systems is also different. The following is a more general hierarchy.

Pasting blocks outside Docs is not supported.

  1. User call interface

The file system provides users with calls related to files and directories, such as creating, opening, reading and writing, closing, deleting files, creating, deleting directories, etc. This layer consists of several modules, each module corresponds to a system call, and when the user issues a system call, control is transferred to the corresponding module.

  1. File directory system

The main function of the file directory system is to manage the file directory, and its tasks are:

  • Manage the active file directory table.
  • Manage read and write status information tables.
  • Manage user processes to open tables.
  • Manage and organize the directory structure of files on storage devices.
  • invoke the next level access control module.
  1. Access control verification module

The realization of file protection is mainly completed by this level of software, which compares the user’s access requirements with the access control permissions indicated in the FCB to confirm the legitimacy of the access.

  1. Logical file system and file information buffer

According to the logical structure of the file, the logical record to be read and written by the user is converted into the corresponding block number in the logical structure of the file.

  1. Physical file system

Convert the relative block number of the logical record into the actual physical address.

  1. Auxiliary distribution module

The main function is to manage auxiliary storage space, that is, to allocate free auxiliary storage space and recycle auxiliary storage space

You should go back and contact the memory allocation here.

  1. Facility Management Program Module

Allocating devices, allocating device read and write buffers, disk scheduling, starting devices, processing device interrupts, releasing device read and write buffers, releasing devices, etc.

Implementation of directory

The basic methods of directory implementation are linear list and hash table. It should be noted that the implementation of directory is to search, so linear list corresponds to linear search, and hash table corresponds to hash search.

  1. Linear list

The simplest way to implement a directory is to use a linear table that stores file names and data block pointers.

When creating a new file, you must search the directory for a file with the same name, then add a directory entry to the directory table, delete the file to search the directory according to the given file name, and then free up its space.

There are many ways to reuse directory entries.

  • Catalog entries can be marked as no longer in use.
  • Add it to the free directory entry table.
  • You can copy the last directory entry in the directory table to a free location, reducing the length of the directory table.

The idea of the free directory table here is a bit like the page allocation strategy of memory (fixed allocation local replacement, variable allocation global replacement, variable allocation local replacement).

The deletion time can be reduced by using the linked list structure, and the advantage is that it is simple to implement, but it is time-consuming due to the particularity of the linear table.

  1. Hash table

The biggest difficulty is the fixed length of the hash table and the dependence of the hash function on the table length.

Directory query is done through repeated search on the disk, which requires continuous I/O operations and is expensive.

Therefore, in order to reduce I/O operations, copy the currently used file directory to memory.

Document implementation

The implementation of files studies the physical structure of files, that is, how file data is distributed and organized on physical storage devices.

There are two aspects to this problem:

  • File allocation method, talking about the management of non-free disk blocks.
  • File storage space management, management of free disk blocks.

Distribution of files

Continuous distribution

The continuous allocation method requires each file to occupy a set of contiguous blocks on the disk. As shown in the figure below, the disk address defines a linear ordering on the disk. This ordering minimizes the number of seek and seek times required for jobs to access the disk.

Consecutive allocation of files can be defined by the disk address of the first block and the number of consecutive blocks. The directory entry of a file contains the address of the starting block and the length of the area allocated to the file.

img

Continuous allocation supports ** sequential access and direct access **, which has the advantages of simple implementation and fast access speed. The disadvantage is that the length should not be dynamically increased, because the disk block at the end of a file may have been allocated to other files. Once it needs to be increased, it is necessary to find a larger contiguous memory and then copy the data. Therefore, this method is only suitable for files of fixed length.

Link allocation takes the form of discrete allocation, eliminates external fragmentation, significantly improves disk utilization, and files can easily grow dynamically without prior guidance on file size.

Link allocation can be divided into two forms: implicit links and explicit links.

  1. Implicit links

Implicit link as shown, each file corresponds to a linked list of disk blocks, disk blocks distributed anywhere on the disk, excluding the last disk block, each disk block has a pointer to the next disk block.

img

When creating a new file, an entry is added to the directory. Each directory entry has a pointer to the first block of the file. The pointer is initialized to NULL for an empty file, and the size field is 0. Writing to the file will find the space block through the free space management system, link the block to the end of the file, and then write.

The disadvantage of implicit links is that disk blocks cannot be accessed directly, files can only be accessed sequentially through pointers, and disk block pointers will consume a certain amount of space, and their stability is also a problem. Once a pointer is lost, file data will be lost.

  1. Explicit linking

The pointers used to link each physical block of the file are extracted from the end of each physical block and explicitly stored in a linked list in memory. This table is set only once in the entire disk (that is, not every file has its own table), which is called the File Allocation Table (FAT). Each table entry stores the next block link pointer of the corresponding block, that is, the next disk block number. The first disk block number of the file is recorded in the directory, and the subsequent disk block numbers are found by checking FAT.

img

Note that the disk block number in the above figure is just an implicit sequence number and does not occupy space, just like the index of an array.

FAT entries correspond to all disk blocks one-to-one. Use a special number -1 to represent the end of the file, and -2 to represent free blocks.

FAT is read into memory at system start-up.

Index allocation

Link allocation solves the problem of continuous allocation of external fragments and file size management, but link allocation does not effectively support direct access (except FAT). Index allocation solves this problem by combining all disk block numbers of each file to form an index block.

img

Each file has its own index block. This is an array of disk block addresses. The i-th entry of the index block points to the i-th block of the file. The directory entry includes the index block address. To read the i-th block, find and read the required block through the pointer to the i-th entry of the index block.

When creating a file, all pointers of the index block are set to null. When writing the i-th block for the first time, take a block from the free block space and write its address to the i-th entry of the index block.

The size of the index block is an important issue. Every file must have an index block, so the index block should be as small as possible. However, if the index block is too small, it cannot support large files. We have several solutions to solve this problem:

  • Linking scheme. An index block is usually a disk block, so it can be read and written directly. In order to handle large files, multiple disk blocks can be linked together.
  • Multi-layer indexing. Multi-layer indexing makes the first layer index block point to the second layer index block, and the second layer index block points to the file block. This method can continue to the third or fourth layer according to the requirements of the maximum file size.
  • Hybrid index. Allocation method that combines multiple index methods. For example, the system uses both direct address, click index allocation method or secondary index allocation method.
Accessing the nth recordAdvantagesDisadvantages
Continuous allocationNeed to access the disk onceFast speed when accessing sequentially, and random access according to the file starting address and record length when the file is fixed length.File storage requires continuous storage space, which will generate fragmentation, which is not conducive to dynamic expansion of files.
Link allocationNeed to access disk n timesCan solve the problem of external memory fragmentation, improve the utilization of external memory space, and dynamic growth is more convenient.Can only be accessed in the order of the pointer chain of the file, the search efficiency is low, and the storage of pointer information consumes extra space.
Index allocationm level requires access to disk m + 1 timescan be accessed randomly, files are easy to add and deleteindex table increases the cost of storage space, and the search strategy of index table has a great impact on file system efficiency.

All three schemes require the address of the index block stored in the directory entry or in the index node (i-node).

File storage space management

Partitioning and initialization of file storage space

Generally speaking, a file is stored in a file volume. A file volume can be part of a physical disk or the entire physical disk. A file volume that supports oversized files can also be composed of multiple physical disks.

img

In a file volume, the space for file data information (file area) and the space for storing file control information FCB (directory area) are separated. Due to the existence of many types of file representation and storage formats, modern operating systems generally have many different file management modules through which files in different formats of logical volumes can be accessed.

Logical volume before providing file services, must be initialized by the corresponding file program, a good directory area and the file area, the establishment of a free space management table and a logical volume information storage superblock.

File storage space management

File storage devices are divided into many physical blocks of the same size and exchange information in blocks. Therefore, the management of file storage devices is essentially the organization and management of free blocks. It contains the organization, allocation and recycling of free blocks.

This physical block division idea is somewhat like the division of pages in memory, called pages in processes, and page boxes in memory.

Idle table method

The free table method is a continuous allocation method, which is similar to the dynamic allocation scheme of memory, allocating a contiguous storage space for each file.

The system establishes a free disk block table for all free areas on the external memory. Each free area corresponds to a free table entry, including the entry number, the first block number of the free area, the number of disk blocks in the free area, etc. Then arrange all free areas in the order of increasing their starting disk block numbers.

First of all, here belongs to the continuous allocation, refers to the organization of the free extent is continuous manner.

Secondly, after looking at this method, there will be a familiar feeling, much like the page allocation of memory (the strategy has the best adaptation algorithm, etc.).

Idle linked list method

Pull all free extents into a free chain, depending on the basic elements used to form the chain, the linked list can be divided into two forms, free disk block chain and free extent chain.

The free disk block chain pulls all the free space on the disk into a chain in units of disk blocks. When a user requests to allocate space due to creating a file, the system starts at the beginning of the chain and removes the appropriate number of free disk blocks in turn to allocate to the user. When the user frees up storage space due to deleting files, the system inserts the recovered disk blocks into the end of the free disk block chain in turn. The advantage of this method is that the process of allocating and recycling a disk block is very simple, but the operation may be repeated multiple times when allocating disk blocks for a file.

The free extent chain is a chain of all free extents on a disk (each extent may contain multiple blocks). In addition to containing a pointer to indicate the next free extent, there should also be information indicating the size of the extent on each extent. The method of allocating extents is similar to dynamic partitioning of memory. The first adaptation algorithm is usually used. When reclaiming extents, the reclaimed area should also be merged with adjacent free extents.

Free table method, the free sector chain is similar, similar to the dynamic partition of memory.

Potential diagram method

Use one bit in binary to represent the usage of a disk block in the disk, and all disk blocks on the disk have a binary corresponding to it. When its value is 0, it means that the disk block is idle, and when it is 1, it means that it has been allocated.

Group linked list method

Neither the free list method nor the free linked list method is suitable for large file systems, because this will make the free list or free linked list too large.

In UNIX, the group linked list method is used, which combines two methods of free list and free linked list to overcome the disadvantage of too large table

The general idea is: the sequence of n free sector address stored in the first free sector, followed by a free sector address is stored in another sequence of free sectors, and so on until all free sectors are linked.

Assuming that the disk is initially all free sectors, which are linked into groups as shown, in this way a large number of free block addresses can be quickly found.

img

This group linked list method is not exactly the same as the idea of multi-level.

The “bit vector” table representing the free space of the file storage or the first group chained block, as well as the directory area in the volume. The file area division information needs to be stored in the secondary storage, usually in the volume header position, which is called a superblock in UNIX.

Before operating on files in the volume, the superblock needs to be read into the system’s free main memory in advance, and the consistency between the main memory superblock and the secondary memory superblock in the volume is often maintained.

Answer the opening question

  1. What is the way to find a file in the directory?

You can use linear list method or hash table method. Linear list organizes file names into a linear table, which is compared with each table entry in the linear table in turn when searching.

If the file names are sorted in order, the dichotomy can reduce the average search time, but creating new files will increase the cost of maintaining the linear table.

The hash table uses the file name to obtain a pointer to the file through the hash function. This method is very fast, but be aware of hash conflicts.

  1. What is the difference between the logical structure and the physical structure of a file? Is there some kind of restrictive relationship between the logical structure and the physical structure of a single file?

The logical structure of the file is the visible structure of the user, that is, the structure of the file used by the user.

The physical structure of the file is the organizational structure of the file on the memory. It indicates the placement, linking, and cataloging method of a file on the auxiliary memory. It is closely related to the access method of the file and the characteristics of the auxiliary memory device.

Although there is no obvious constraint or correlation between the logical structure and the physical structure of a single file, if the physical structure is chosen inadvertently, it is difficult to reflect the characteristics of the logical structure. For example, a logical structure is a sequential structure, but the physical structure is an implicitly linked structure file. Even if it can be found quickly in theory, it still needs to be found piece by piece on the disk block.

Note again that logical structure refers to the management between file records, such as sorting by keyword order.