Three Modes of CPU Work
The most important piece of hardware is the CPU, the core component that executes programs. The computers we commonly use are built on the x86 platform, so we need a basic understanding of the x86 CPU. In the order in which its capabilities evolved, the CPU has three working modes: real mode, protected mode, and long mode. The way the CPU executes programs differs greatly between these modes. Let's discuss them one by one.
Consider the following application code. If it could run successfully, what would the consequences be?
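```c
int main()
{
    int *addr = (int *)0; /* start at memory address 0 */
    cli();                /* stands for the x86 CLI instruction: disable interrupts */
    while (1)             /* infinite loop */
    {
        *addr = 0;        /* write 0 to the current address ... */
        addr++;           /* ... then move on to the next one */
    }
    return 0;
}
```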
The code above first disables CPU interrupts, so the CPU stops responding to interrupt signals, and then enters an infinite loop that writes 0 to memory one location after another, starting at address 0.
You will immediately see that this code does only two things: it locks up the CPU and it wipes memory. If such code could run normally, the results would be disastrous.
However, in real mode such code really could run normally. Long ago, computing resources were scarce, memory was small, and a computer ran only one program at a time. Most programs were written and debugged by specialists, who then had to book time to run them on the machine. There was no concept of a modern operating system.
Later came the DOS operating system, which was still a single-tasking system that could not run multiple programs at once, so this CPU mode remained adequate.
Real mode
Real mode is also called real-address mode. "Real" has two aspects. First, the instructions are real: the CPU does not distinguish between instructions and **directly performs each instruction's actual function**. Second, **the addresses sent to memory are real**: any address can be sent to memory without restriction.
Real mode registers
The CPU carries out work according to instructions. For example, the instruction ADD AX, CX performs an addition: AX and CX are the operands of the ADD instruction, which can be thought of as the two parameters of an ADD function, and the instruction adds the data in AX and CX and stores the sum in AX.
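To make the semantics of a 16-bit ADD concrete, here is a minimal C sketch. The struct `Regs16` and the function `add_ax_cx` are hypothetical names used only for illustration, and only the carry flag (CF) is modeled; real hardware also updates OF, SF, ZF, AF and PF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of two 16-bit real-mode registers plus the carry flag. */
typedef struct {
    uint16_t ax;
    uint16_t cx;
    bool cf; /* carry flag: set when the unsigned sum does not fit in 16 bits */
} Regs16;

/* ADD AX, CX: AX = AX + CX, truncated to 16 bits; CX is left unchanged. */
static void add_ax_cx(Regs16 *r)
{
    uint32_t sum = (uint32_t)r->ax + r->cx; /* widen so the carry is visible */
    r->ax = (uint16_t)sum;                  /* keep only the low 16 bits */
    r->cf = sum > 0xFFFFu;                  /* bit 16 of the sum is the carry */
}
```

Because the registers are only 16 bits wide, 0xFFFF + 2 leaves 0x0001 in AX and sets CF.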
Instruction operands can be registers, memory addresses, or constants; in practice they are usually registers. AX and CX are registers of the x86 CPU.
Let's look at the registers of the x86 CPU in real mode. Every register in the table is 16 bits wide.
Accessing memory in real mode
Although the CPU has registers, data and instructions live in memory. Data usually has to be loaded into a register before it can be operated on, and instructions must also be fetched from memory. All of this requires accessing memory, and accessing memory requires knowing the memory address.
So how is this address calculated?
Looking at the figure above, we can see that every memory address is formed by shifting the value in a segment register left by 4 bits and adding a value from a general-purpose register or a constant (the offset). The CPU then accesses memory at the resulting address. This produces a 20-bit address, so real mode can reach about 1 MB of memory.
This is the famous **segmented memory management model** (this form of address calculation applies only to real mode; modern operating systems use paging or segmentation combined with paging). Note in particular that **the code segment is determined by CS and IP, and the stack segment by SS and SP** (together, the CS and IP registers are conceptually equivalent to what is commonly called the PC register).
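The segment-times-16-plus-offset rule fits in one line of C. The function name `real_mode_address` is hypothetical; the sketch returns the raw sum and does not model the 8086's wrap-around at 1 MB.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode physical address: (segment << 4) + offset.
 * The result needs up to 21 bits: 0xFFFF:0xFFFF gives 0x10FFEF.
 * The original 8086 has only 20 address lines, so there that value wraps
 * to 0x0FFEF; the wrap-around is not modeled here. */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

The next instruction is fetched from `real_mode_address(CS, IP)`, and the top of the stack is at `real_mode_address(SS, SP)`. Different segment:offset pairs can name the same byte: 0x07C0:0x0000 and 0x0000:0x7C00 both give 0x7C00.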
Next, let's write a DOS Hello World program. It is 16-bit assembly code that runs in real mode:
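```asm
data SEGMENT                  ; define a data segment to store Hello World!
    hello DB 'Hello World!$'  ; DOS function 09h expects the string to end with '$'
data ENDS

code SEGMENT                  ; define a code segment to store the program's instructions
    ASSUME CS:code, DS:data   ; tell the assembler that CS points to code and DS to data
start:
    MOV AX, data              ; load the data segment's address into AX
    MOV DS, AX                ; copy AX into DS so that DS points to the data segment
    LEA DX, hello             ; DX = offset of hello within the data segment
    MOV AH, 09h               ; DOS function 09h: print the '$'-terminated string at DS:DX
    INT 21h                   ; call the DOS service interrupt
    MOV AX, 4C00h             ; DOS function 4Ch (in AH) with return code 0 (in AL)
    INT 21h                   ; exit the program
code ENDS
END start
```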
The structure of this code also follows the real-mode segmented memory model: after the assembler converts it to binary, it still exists as segments. The comments are clear, so you should be able to follow the code easily. Most of it manipulates registers: LEA loads an effective address and MOV transfers data. The INT instruction, which triggers an interrupt, may be unfamiliar, so let's study it next.
Real mode interrupt
An interrupt stops execution of the current program and jumps to another specific address to run specific code. In real mode, the CPU first saves FLAGS, CS and IP on the stack and then loads new values into CS and IP. So how are interrupts generated?
In the first case, the interrupt controller sends an electrical signal to the CPU, the CPU responds to it, and the interrupt controller then sends the interrupt number to the CPU. This is a hardware interrupt.
In the second case, the CPU executes the INT instruction, which is followed by a constant: the software interrupt number. This is a software interrupt.
Both hardware and software interrupts are ways for the CPU to divert execution to handle an event, whether that event is an external signal or a request from the running program.
To implement interrupts, an interrupt vector table must be placed in memory. The address and length of this table are held in a dedicated CPU register, IDTR. In real mode, each entry in the table consists of a code segment address and an offset within that segment (4 bytes per entry: the 16-bit offset followed by the 16-bit segment), as shown in the following figure.
As the figure shows, the IDTR register holds the start address and length of the interrupt vector table, which defines where interrupt vectors may be stored. When the CPU receives an interrupt number, it first checks whether the corresponding entry would exceed the table length. If not, it multiplies the interrupt number by the entry size and adds the result to the start address to locate the entry for that interrupt.
With the interrupt number, the CPU uses the information in IDTR to locate the entry in the interrupt vector table, loads CS (the code segment) and IP (the offset within the code segment) from it, and then starts executing the interrupt handler.
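The lookup can be sketched in C over a simulated memory array. The types `Idtr` and `FarPtr` and the function `ivt_lookup` are hypothetical; the sketch follows the real-mode layout of 4-byte entries stored little-endian, IP first and CS second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IDTR model for real mode. */
typedef struct {
    uint32_t base;  /* linear address of the interrupt vector table */
    uint16_t limit; /* last valid byte offset, e.g. 0x3FF for 256 entries */
} Idtr;

typedef struct {
    uint16_t cs;
    uint16_t ip;
} FarPtr;

static uint16_t read16(const uint8_t *mem, uint32_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8)); /* little-endian */
}

/* Look up vector n. Each real-mode entry is 4 bytes: IP, then CS.
 * Returns false if the entry lies beyond the table limit. */
static bool ivt_lookup(const uint8_t *mem, Idtr idtr, uint8_t n, FarPtr *out)
{
    uint32_t off = (uint32_t)n * 4; /* scale the vector by the entry size */
    if (off + 3 > idtr.limit)
        return false;               /* entry not inside the table */
    uint32_t entry = idtr.base + off;
    out->ip = read16(mem, entry);
    out->cs = read16(mem, entry + 2);
    return true;
}
```

The CPU would then continue at the physical address `(cs << 4) + ip`, exactly as for any other real-mode address.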
Protected mode
As software kept growing, it demanded more computing power and more memory. With more memory, the first problem to solve is addressing: a 16-bit register can represent at most 2^16 = 65,536 addresses, so the CPU's registers and arithmetic units had to be widened to 32 bits, which can address 2^32 bytes = 4 GB.
However, although widening the CPU's internal units solves the computation and addressing problems, it does not solve the problem in the earlier real-mode scenario. That problem has two causes.
First, the CPU executes any instruction indiscriminately.
Second, the CPU places no restrictions on the memory addresses it accesses.
For these reasons, the CPU implements protected mode, which adds a layer of virtualization and abstraction on top of the bare CPU.
Protected mode registers
Compared with real mode, protected mode adds some control registers and segment registers (FS and GS) and widens the general-purpose registers to 32 bits. The lower 16 bits of each general-purpose register can be used on their own, and for EAX, EBX, ECX and EDX those 16 bits can be further split into two 8-bit registers, as shown in the table below.
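The overlapping register names are just bit ranges of one physical register. This C sketch models the views of EAX with masks and shifts; the helper names are hypothetical, and the same layout applies to EBX, ECX and EDX.

```c
#include <assert.h>
#include <stdint.h>

/* Views of the 32-bit EAX register: AX is bits 15..0, AH bits 15..8, AL bits 7..0.
 * Writing a narrower view leaves the other bits of EAX unchanged. */
static uint16_t get_ax(uint32_t eax) { return (uint16_t)eax; }
static uint8_t  get_ah(uint32_t eax) { return (uint8_t)(eax >> 8); }
static uint8_t  get_al(uint32_t eax) { return (uint8_t)eax; }

static uint32_t set_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;                  /* replace only bits 7..0 */
}

static uint32_t set_ah(uint32_t eax, uint8_t ah)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)ah << 8); /* replace only bits 15..8 */
}
```

With EAX = 0x12345678, AX reads 0x5678, AH 0x56 and AL 0x78.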
Protected mode privilege level
To control which instructions (such as IN, OUT, CLI) may be executed and which resources (such as registers, I/O ports, memory addresses) may be accessed, the CPU implements privilege levels.
There are 4 privilege levels, R0 to R3, and each level may execute a different set of instructions. R0 can execute all instructions; R1, R2 and R3 follow in descending order, each able to execute only a subset of the instructions permitted at the level above it. Memory access control is achieved by combining privilege levels with segment descriptors, which are described next.
R0 has the most power and can access resources at lower privilege levels, but not the other way around.
Protected mode segment descriptor
Because the CPU has been extended, the 32-bit segment base address, the segment length (limit), and some other information no longer fit in a 16-bit segment register. Since they don't fit, space has to be borrowed from memory: the information describing a segment is packed into a segment descriptor with a specific format and stored in memory. The format is as follows.
A segment descriptor is 64 bits (8 bytes) of data. It contains the segment base address, the segment length (limit), the segment's permission level (DPL), the segment type (system, code, or data segment), whether the segment is readable or executable, and so on. The fields are scattered somewhat untidily across the 8 bytes; this is due to historical reasons.
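Packing the fields by hand shows where the "historical" scattering comes from: the base is split into three pieces and the limit into two. This is a minimal C sketch; `make_descriptor` and its parameter split into an access byte and a 4-bit flags nibble are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a protected-mode segment descriptor.
 *   base   : 32-bit segment base, stored as bits 15..0, 23..16 and 31..24
 *   limit  : 20-bit segment limit, stored as bits 15..0 and 19..16
 *   access : P | DPL(2 bits) | S | Type(4 bits), e.g. 0x9A = present, DPL 0, readable code
 *   flags  : G | D/B | L | AVL as a 4-bit value */
static uint64_t make_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu && flags <= 0xFu);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);            /* bits  0..15: limit 15..0     */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;     /* bits 16..39: base 23..0      */
    d |= (uint64_t)access << 40;                 /* bits 40..47: access byte     */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48; /* bits 48..51: limit 19..16    */
    d |= (uint64_t)(flags & 0xFu) << 52;         /* bits 52..55: G, D/B, L, AVL  */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56; /* bits 56..63: base 31..24     */
    return d;
}
```

For example, `make_descriptor(0, 0xFFFFF, 0x9A, 0xC)` yields 0x00CF9A000000FFFF, a 4GB code segment starting at 0.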
Multiple segment descriptors form the Global Descriptor Table (GDT) in memory. The base address and length of this table are held in the CPU's GDTR register, as shown in the figure below.
**We can see at a glance that the segment register no longer stores the segment base address but the index of a specific segment descriptor. When a memory address is accessed, the CPU uses the index in the segment register together with the GDTR register to find the segment descriptor in memory, and then uses the segment information to decide whether the access is allowed.**
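The table walk is simple arithmetic: descriptor number times 8, checked against the GDTR limit. This C sketch uses a hypothetical `Gdtr` struct pointing at simulated memory; where it returns false, real hardware raises a general-protection fault (#GP).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GDTR model: table base and limit (table size in bytes minus 1). */
typedef struct {
    const uint8_t *base; /* first byte of the GDT in (simulated) memory */
    uint16_t limit;
} Gdtr;

/* Fetch descriptor number `index`; each descriptor occupies 8 bytes. */
static bool gdt_fetch(Gdtr gdtr, uint16_t index, uint64_t *out)
{
    uint32_t off = (uint32_t)index * 8;
    if (off + 7 > gdtr.limit)
        return false;                        /* entry lies past the table limit */
    uint64_t d = 0;
    for (int i = 7; i >= 0; i--)
        d = (d << 8) | gdtr.base[off + i];   /* descriptors are stored little-endian */
    *out = d;
    return true;
}
```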
Protected mode segment selector
If you think the segment registers CS, DS, ES, SS, FS and GS simply store the index of a segment descriptor in memory, you are oversimplifying. In fact, each consists of a shadow register plus a visible part made up of the descriptor index, the table indicator (TI), and the requested privilege level (RPL), as shown in the figure below.
The shadow register in the figure is managed by hardware and is invisible to system programmers. It is a hardware cache of the segment descriptor, designed to reduce overhead: without it, every memory access would have to look up the table in memory, at a huge performance cost. The shadow register holds the information from the 8-byte segment descriptor.
The low three bits can hold TI and RPL because segment descriptors are 8 bytes long, so the byte offset of every descriptor in the table is a multiple of 8 and its low three bits are always 0 (for example, binary 1000, 10000, 11000). Those three bits are therefore free for other uses, here TI and RPL. We don't need the LDT (local descriptor table); we only use the GDT (global descriptor table), so TI is always 0.
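Decoding a selector is a matter of shifts and masks. In this C sketch the `Selector` struct and the helper names are hypothetical; the bit positions (index in bits 15..3, TI in bit 2, RPL in bits 1..0) follow the layout described above.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 16-bit segment selector (the visible part of a segment register). */
typedef struct {
    uint16_t index; /* bits 15..3: descriptor number in the table */
    uint8_t  ti;    /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* bits 1..0: requested privilege level */
} Selector;

static Selector decode_selector(uint16_t sel)
{
    Selector s = { (uint16_t)(sel >> 3), (uint8_t)((sel >> 2) & 1u), (uint8_t)(sel & 3u) };
    return s;
}

/* Byte offset of the descriptor: index * 8, i.e. the selector with its low
 * three bits cleared, which is why those bits are free for TI and RPL. */
static uint16_t descriptor_offset(uint16_t sel)
{
    return (uint16_t)(sel & ~7u);
}

static uint16_t make_selector(uint16_t index, uint8_t ti, uint8_t rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

Selector 0x08 therefore names GDT entry 1 at RPL 0, and 0x1B names GDT entry 3 at RPL 3.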
The RPL field in CS and SS serves as the CPL (current privilege level), so usually RPL = CPL. The CPL indicates the privilege with which the initiator is trying to access the target segment. Privilege numbers grow as privilege shrinks, so when the CPL is numerically greater than the target segment's DPL, the CPU denies access; access is allowed only when the CPL is less than or equal to the target segment's DPL.
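The comparison can be written as a one-line predicate. The text above compares CPL alone; the rule Intel documents for loading a data segment register uses the less privileged (numerically larger) of CPL and RPL, which this C sketch follows. The function name is hypothetical, and code segments and conforming segments have different rules that are not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Data-segment privilege check (0 = most privileged, 3 = least).
 * The effective level is max(CPL, RPL), so a selector with RPL 3 cannot
 * reach a DPL 0 segment even when the code runs at CPL 0. */
static bool data_segment_access_ok(uint8_t cpl, uint8_t rpl, uint8_t dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    uint8_t effective = cpl > rpl ? cpl : rpl; /* max(CPL, RPL) */
    return effective <= dpl;                    /* numerically <= DPL means allowed */
}
```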
Protected mode flat model
The segmentation model has many flaws, which will be covered in detail in the later memory management lessons. In practice, modern operating systems use the paging model (discussed in the later MMU lesson).
**However, the x86 CPU cannot use the paging model directly; paging can only be turned on, as needed, on top of the segmentation model.** This is a hardware rule that programmers cannot change. But we can simplify the design and turn segmentation into a “dummy”, which is the flat model of protected mode.
As described earlier, the CPU's 32-bit registers can generate addresses of at most 4GB, so a segment can be at most 4GB long. We therefore set the base address of every segment to 0, set the segment limit to its maximum value 0xFFFFF, and set the limit granularity to 4KB, so that all segments cover the same address space, bytes 0 to 4GB-1.
Let's look at the segment descriptor table from the earlier Hello OS, shown below:
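```asm
GDT_START:
knull_dsc: dq 0                    ; the first descriptor must be a null descriptor (CPU requirement)
kcode_dsc: dq 0x00cf9e000000ffff   ; code segment: base = 0, limit = 0xfffff
                                   ; G=1, D/B=1, L=0, AVL=0
                                   ; P=1, DPL=0, S=1
                                   ; type: code, C=1, R=1, A=0
kdata_dsc: dq 0x00cf92000000ffff   ; data segment: base = 0, limit = 0xfffff
                                   ; G=1, D/B=1, L=0, AVL=0
                                   ; P=1, DPL=0, S=1
                                   ; type: data, E=0, W=1, A=0
GDT_END:

GDT_PTR:
GDTLEN  dw GDT_END - GDT_START - 1 ; GDT limit (table size in bytes minus 1)
GDTBASE dd GDT_START               ; GDT base address
```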
The comments in the code above are self-explanatory. The segment limit must be read together with the G bit: when G is 1, the limit is counted in 4KB units, so a limit of 0xFFFFF gives a segment of (0xFFFFF + 1) × 4KB = 4GB. The DPL of these segment descriptors is 0, which means the highest privilege is required: only code running at CPL = 0 can access them.
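Reading the fields back out confirms the flat-model arithmetic. This C sketch decodes an 8-byte descriptor and computes the highest valid offset from the limit and the G bit; the helper names are hypothetical, and the example value 0x00CF9E000000FFFF is a flat-model 32-bit code descriptor with base 0, limit 0xFFFFF, G = 1 and DPL 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field extractors for an 8-byte segment descriptor. */
static uint32_t desc_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFFu) | (((d >> 56) & 0xFFu) << 24));
}
static uint32_t desc_raw_limit(uint64_t d)
{
    return (uint32_t)((d & 0xFFFFu) | (((d >> 48) & 0xFu) << 16));
}
static uint8_t desc_dpl(uint64_t d) { return (uint8_t)((d >> 45) & 3u); } /* bits 45..46 */
static bool    desc_g(uint64_t d)   { return (d >> 55) & 1u; }            /* bit 55 */

/* Highest valid offset in the segment. With G = 1 the 20-bit limit counts
 * 4KB pages, so the low 12 bits of the byte limit are filled with ones. */
static uint32_t desc_effective_limit(uint64_t d)
{
    uint32_t raw = desc_raw_limit(d);
    return desc_g(d) ? (raw << 12) | 0xFFFu : raw;
}
```

For the example value the effective limit is 0xFFFFFFFF, so with a base of 0 every 32-bit offset maps to the same linear address.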
Protected mode interrupt
Because the CPU performs no permission checks in real mode, it can load CS:IP directly from the values in the interrupt vector table.
Interrupts in protected mode require permission checks and privilege-level switching, so the information in the interrupt vector table must be extended: each interrupt is represented by an interrupt gate descriptor, or interrupt gate for short. The interrupt gate descriptor has its own format, as shown in the figure below.
Likewise, to implement interrupts in protected mode there must be an interrupt vector table in memory, again pointed to by the IDTR register, but its entries are now interrupt gate descriptors, as shown in the following figure.
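A 32-bit gate packs a selector, a split 32-bit offset and an attribute byte into 8 bytes. This C sketch builds one; `make_gate32`, `idt_entry_address` and the constant names are hypothetical, while the type values 0xE (interrupt gate, clears IF on entry) and 0xF (trap gate, leaves IF unchanged) are the architectural 32-bit gate types.

```c
#include <assert.h>
#include <stdint.h>

#define GATE_TYPE_INT32  0xEu /* 32-bit interrupt gate (clears IF on entry) */
#define GATE_TYPE_TRAP32 0xFu /* 32-bit trap gate (leaves IF unchanged) */

/* Pack an 8-byte protected-mode interrupt/trap gate descriptor.
 * Layout: offset 15..0 | selector | reserved byte | P,DPL,S=0,type | offset 31..16 */
static uint64_t make_gate32(uint32_t offset, uint16_t selector, uint8_t dpl, uint8_t type)
{
    assert(dpl <= 3 && (type == GATE_TYPE_INT32 || type == GATE_TYPE_TRAP32));
    uint8_t attr = (uint8_t)(0x80u | (dpl << 5) | type); /* P = 1, S = 0 (system) */
    return (uint64_t)(offset & 0xFFFFu)
         | (uint64_t)selector << 16
         | (uint64_t)attr << 40
         | (uint64_t)(offset >> 16) << 48;
}

/* Address of the gate for vector n: 8 bytes per gate in protected mode. */
static uint32_t idt_entry_address(uint32_t idt_base, uint8_t n)
{
    return idt_base + (uint32_t)n * 8;
}
```

A DPL 0 gate to handler 0x12345678 in code segment 0x08 encodes as 0x12348E0000085678.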
After an interrupt occurs, the CPU first checks whether the interrupt number goes beyond the last interrupt gate descriptor in the table (the x86 CPU supports at most 256 interrupt sources, that is, interrupt numbers 0 to 255). It then checks the descriptor type (interrupt gate or trap gate), whether it is a system descriptor, and whether it is present in memory.
Next, it checks the segment descriptor that the segment selector in the interrupt gate points to.
Finally, it performs the permission check: the CPL must be less than or equal to the DPL of the interrupt gate (this check applies to software interrupts such as INT n), and greater than or equal to the DPL of the code segment descriptor that the gate's segment selector points to.
Furthermore, if the CPL equals the DPL of that target code segment descriptor, the privilege level does not change and no stack switch is performed; otherwise a stack switch is performed. A stack switch loads the SS and ESP for the target privilege level from the TSS, and the segment descriptor referenced by the new SS selector is checked as well. After this series of checks, the CPU loads the target code segment selector from the interrupt gate descriptor into the CS register and the target code segment offset into the EIP register.
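The privilege decision in the last two steps can be condensed into one function. This C sketch assumes a non-conforming target code segment and leaves out the earlier descriptor-type and presence checks; `check_interrupt` and `IntResult` are hypothetical names, and a denied result corresponds to a general-protection fault on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { INT_DENIED, INT_SAME_STACK, INT_SWITCH_STACK } IntResult;

/*   cpl         : current privilege level (from CS)
 *   gate_dpl    : DPL of the interrupt/trap gate
 *   target_dpl  : DPL of the code segment named by the gate's selector
 *   is_software : true for INT n; the gate-DPL check applies only then */
static IntResult check_interrupt(uint8_t cpl, uint8_t gate_dpl, uint8_t target_dpl,
                                 bool is_software)
{
    assert(cpl <= 3 && gate_dpl <= 3 && target_dpl <= 3);
    if (is_software && cpl > gate_dpl)
        return INT_DENIED;       /* e.g. user code may not INT through a DPL 0 gate */
    if (target_dpl > cpl)
        return INT_DENIED;       /* the handler may not be less privileged than the caller */
    if (target_dpl == cpl)
        return INT_SAME_STACK;   /* same level: keep the current SS:ESP */
    return INT_SWITCH_STACK;     /* more privileged: load SS:ESP for target_dpl from the TSS */
}
```

A DPL 3 gate into a DPL 0 handler is the classic system-call entry: user code at CPL 3 may use it, and the CPU switches to the kernel stack.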
Long mode
Long mode is also known as AMD64 because the standard was first defined by AMD. It gives the CPU 64-bit processing capability on top of the existing architecture: it can perform 64-bit data operations and address a 64-bit address space (current implementations use 48-bit or 57-bit virtual addresses). This matters on large machines, whose physical memory often amounts to several hundred GB.
Compared with protected mode, long mode adds general-purpose registers (R8 to R15) and widens all general-purpose registers to 64 bits. The lower 32 bits of each can be used on their own, the lower 16 bits of those can in turn be used alone, and those 16 bits can be split further into two 8-bit registers.
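The same nesting extends to RAX, with one architectural twist worth knowing: in 64-bit mode, writing a 32-bit register zero-extends the result into the full 64-bit register, while writing a 16-bit or 8-bit register preserves the remaining bits. This C sketch models both behaviors with hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Sub-register writes in 64-bit mode.
 * EAX writes zero the upper 32 bits of RAX; AX and AL writes keep the other bits. */
static uint64_t write_eax(uint64_t rax, uint32_t v) { (void)rax; return v; }
static uint64_t write_ax(uint64_t rax, uint16_t v)  { return (rax & ~0xFFFFull) | v; }
static uint64_t write_al(uint64_t rax, uint8_t v)   { return (rax & ~0xFFull) | v; }

static uint32_t read_eax(uint64_t rax) { return (uint32_t)rax; } /* low 32 bits */
```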
Long mode still has most of the features of protected mode, such as privilege level and permission checking. The same parts will not be repeated, and only the differences between long mode and protected mode will be explained here.
Long mode segment descriptor
In long mode, the CPU no longer checks the segment base address or segment length; it checks only the DPL, in the same way as in protected mode. When L = 1 and D/B = 0 in a code segment descriptor, it describes a 64-bit code segment, and its DPL is still a privilege level from 0 to 3. As before, multiple segment descriptors in memory form the Global Descriptor Table, which is pointed to by the CPU's GDTR register.
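Recognizing a 64-bit code segment only needs a few bits of the descriptor. In this C sketch the macro and function names are hypothetical; the bit positions are those of the standard descriptor layout (S in bit 44, the code/data type bit in bit 43, L in bit 53, D/B in bit 54). Base and limit are ignored, matching the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_EXEC (1ull << 43) /* type bit 3: 1 = code segment */
#define DESC_S    (1ull << 44) /* 1 = code/data, 0 = system */
#define DESC_L    (1ull << 53) /* 64-bit code segment */
#define DESC_DB   (1ull << 54) /* default operand size; must be 0 when L = 1 */

/* A descriptor selects 64-bit mode when it is a code segment with L = 1 and D/B = 0. */
static bool is_64bit_code(uint64_t d)
{
    return (d & DESC_S) && (d & DESC_EXEC) && (d & DESC_L) && !(d & DESC_DB);
}

static uint8_t desc_dpl(uint64_t d) { return (uint8_t)((d >> 45) & 3u); }
```

The descriptor 0x00209A0000000000 is a kernel 64-bit code segment; 0x0020FA0000000000 is the same with DPL 3.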
Long mode interrupt
To implement interrupt permission checks, protected mode uses interrupt gate descriptors, which store the target segment selector, the target offset within that segment, and the DPL. If the permission check passes, the CPU loads CS:EIP from that segment selector and offset.
If you recall the interrupt gate descriptor, its in-segment offset is only 32 bits, but long mode supports 64-bit memory addressing, so the interrupt gate descriptor has to be extended. Let's look at the format of the interrupt gate descriptor in long mode.
First, to support 64-bit addressing, the interrupt gate descriptor gains 8 more bytes that store the high 32 bits of the target offset. Second, the code segment descriptor referenced by the target code segment selector must describe a 64-bit code segment. Finally, the IST field is an index into the interrupt stack table held in the 64-bit TSS; since we do not use this feature, we won't cover it in detail. Long mode also has an interrupt gate descriptor table in memory, but its entries (as shown in the figure above) are 16 bytes each, supporting up to 256 interrupt sources. The interrupt response and the related permission checks are the same as in protected mode.
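The 16-byte layout can be sketched in C as two 64-bit halves, with the extra 8 bytes carrying the high 32 bits of the handler address. `Gate64`, `make_gate64` and `idt64_entry_address` are hypothetical names, and only the 64-bit interrupt gate type (0xE) is built.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-byte long-mode interrupt gate, stored as two 64-bit halves. */
typedef struct {
    uint64_t lo; /* offset 15..0 | selector | IST | P,DPL,0,type | offset 31..16 */
    uint64_t hi; /* offset 63..32 | reserved (zero) */
} Gate64;

static Gate64 make_gate64(uint64_t offset, uint16_t selector, uint8_t ist, uint8_t dpl)
{
    assert(ist <= 7 && dpl <= 3);
    uint8_t attr = (uint8_t)(0x80u | (dpl << 5) | 0xEu); /* P = 1, 64-bit interrupt gate */
    Gate64 g;
    g.lo = (offset & 0xFFFFu)
         | (uint64_t)selector << 16
         | (uint64_t)ist << 32                  /* IST index, 0 = not used */
         | (uint64_t)attr << 40
         | ((offset >> 16) & 0xFFFFu) << 48;
    g.hi = offset >> 32;                        /* the extra 8 bytes: offset 63..32 */
    return g;
}

/* Address of the gate for vector n: 16 bytes per gate in long mode. */
static uint64_t idt64_entry_address(uint64_t idt_base, uint8_t n)
{
    return idt_base + (uint64_t)n * 16;
}
```

Because each entry is 16 bytes, a full 256-entry table occupies 4KB, twice the size of the protected-mode table.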
Reference link: https://time.geekbang.org/column/article/375278