Computer Components (2) - Representation of Data in Computers

Posted on 2022-03-19 Edited on 2025-07-02 In ComputerOrganization Waline:

We all know that the core of the computer is the CPU, and the CPU is the arithmetic unit and controller, and its role is to perform mathematical operations.

If you want to understand the working principle of the CPU, you must start from two aspects, on the one hand, how the data in the computer is represented, and on the other hand, how the operation rules are and how the computer uses the circuit to represent this operation.

Let’s first understand the representation of data in a computer.

Within a computing system, all information is encoded in binary for the following reasons:

Binary has only two states, using a physical device with two stable states can represent each bit of the binary number at low cost.
Binary 1 and 0 correspond exactly to the “true” and “false” of logical values, which provides convenience for computers to implement logical operations and logical judgments in programs.
The operation rules of binary coding are simple, and arithmetic operations can be easily realized through logic gates.

Carry counting

The value represented by each digit is equal to multiplying the digit itself by a constant related to its digit, which is called ** bit weight **. Is the numerical size of a carry number its digits? Adding up the weights, the value of a base r number can be expressed as:

K_nr^n + K_{n-1}r^{n-1}+ ... + K_0r^0 + K_{-1}r_{-1} + ... + K_{-m}r^{-m} = \sum_{i=n}^{-m}K_ir^i

Based on this formula, we can obtain the conversion formula between different bases.

We commonly use binary, octal, decimal and hexadecimal.

Among them, binary, octal, and hexadecimal are all integer powers of 2. There are simpler ways to convert between them, and the conversion between these three types of binary and decimal should be deduced through the numerical representation above.

The first is the conversion between ** binary, octal, and hexadecimal, which is actually carried out through binary bit bridges. One octal can be represented by three binary numbers, and one hexadecimal number can be represented by four binary numbers. The conversion between octal and hexadecimal needs to be converted to binary first.

And ** any binary conversion to decimal **, only need to multiply the digits and bit weight on the line, in fact, is the above formula, but the left side of the formula is a string of numbers added, and the right side although the number, with how many binary can be, but we are accustomed to using decimal to express.

The decimal conversion to any binary is divided into two cases:

For the integer part, the base division method is used. The lowest bit of the remainder digit obtained first, the highest bit of the remainder digit obtained last, and the quotient ends at 0.
For the fractional part, the multiplication basis rounding method is used. The fractional part is rounded by the basis, the first integer obtained is the highest digit of the number, and the last integer obtained is the lowest digit of the number. It stops when the product is 1.0 or when the accuracy requirement is met.

There may be some confusion here, why integer part division, fractional part division, which can be seen by combining the formula we just saw, we have the integer part of the formula just now

\frac {K_nr^n + K_{n-1}r^{n-1}+ ... + K_0r^0} r

K0 is the remainder, that is, the first remainder we get, it should be the lowest bit, then divide by r, the remainder is k1, continue to divide, you will get kn, so the last remainder kn is the highest bit.

And for the fractional part, we want to get a number with a bit weight of r ^ 0, we have to keep multiplying by r

Coded representation of fixed-point number

Depending on whether the position of the decimal is fixed or not, there are two data formats in the computer: fixed-point representation and floating-point representation.

In the computer, ** usually with fixed-point complement integer represents the integer, fixed-point original decimal represents the floating-point number of bits, with frameshift represents the order of the floating-point number part **.

Fixed-point representation of number of machines

Fixed-point notation is used to represent fixed-point decimals and fixed-point integers:

Fixed-point decimal. Fixed-point decimal is a pure decimal, and the decimal point is agreed to be after the sign bit and before the highest bit of the effective value part. If the data X is of the form $X = x_0 x_1x_2… x_n $, where $x_0 $is the sign bit and $x_1~ x_n $is the effective part of the value.
Fixed-point integer. Fixed-point integers are pure integers, and it is agreed that the decimal point is after the lowest bit of the effective value part. If the data X is in the form of $X = x_0x_1x_2 x_n $, where $x_0 $is the sign bit, and $x_1~ x_n $is the effective part of the value.

Note that whether it is a fixed-point decimal or a fixed-point integer, whether it is a signed number or an unsigned number, it is a string of binary in the register, that is, the machine number, and how to interpret this string of binary is a matter of the program. For example, if the type is unsignint, it is to be interpreted by unsigned integers.
It is precisely because of this that no matter the integer decimal, signed and unsigned are the same operation method, so we propose the complement code, because only this way of representation can achieve this.

Original code, reverse code, complement code and frameshift code

First, give a few more confusing conclusions:

The original code, inverse code, and complement code of positive numbers are all the same. Pay attention to the complement operation of positive and negative numbers, and do not use the method of converting negative numbers to complement numbers to positive numbers.
The original code and frameshift of positive numbers are different.
Since we are talking about negative numbers, it must be signed numbers, so let’s not talk about unsigned numbers.
A positive number is not equal to an unsigned number, but the inverse complement of the original code of the unsigned number is the same.

Personal speculation, before the data enters the adder, there is one thing that must be done. If it is not done by the program compiler, it is done by a special circuit, that is, if it is a signed negative number, it will be converted into complement code and enter the adder. Unsigned numbers and signed positive numbers directly enter the adder. It feels more likely to have a dedicated circuit to do this.

Original code

The highest bit of the machine number is used to represent the sign of the number, and the rest of the bits represent the absolute value of the number. The definition of the original code is as follows:

The original code definition of pure decimal:

[x]_{原} = \begin{cases} x & 1\gt x\ge0\\ 1 - x = 1 + |x| & 0 \gt x\gt-1 \end{cases}

X is the true value, and $[x] _ {original} $is the number of original code machines.

If $x_1 =+ 0.1101, x_2 = -0.1101 $and the word length is 8 bits, the original code is represented as: $[x_1] _ original = 0.1101000, [x_2] _ original = 1- (-0.1101) = 1.1101000 $, where the highest bit is the sign bit

In fact, there is another knowledge point implied here: the original code is filled with 0.

If the word length is n + 1, then the decimal representation range of the original code is: $- (1-2 ^ {-n}) \ le x\ le 1 - 2 ^ {-n} $, symmetric about the origin.

The original code definition of pure integers:

[x]_{原} = \begin{cases} 0,x & 2^n\gt x\ge0\\ 2^n - x = 2^n + |x| & 0 \gt x\gt-2^n \end{cases}

The representation range of n + 1-bit original code positive numbers is: $- (2 ^ n-1) \ le x\ le 2 ^ n-1 $, symmetry about the origin

For the original code, there are positive 0 and negative 0 two representations, namely + 0 = 00000, -0 = 10000.

Complement

The original code addition and subtraction operation rules are more complicated. For the addition of two different signed numbers (or the subtraction of the same signed number), the absolute value size of the two numbers must be compared first, and then the large absolute value is subtracted from the small absolute value. Finally, it is more troublesome to choose the appropriate symbol for the result, and the addition and subtraction operation of the supplementary code is unified.

Complement definition of pure decimal:

[x]_{补} = \begin{cases} x & 1\gt x\ge0\\ 2 + x = 2 - |x| & 0 \gt x\ge-1 \end{cases} \ mod \ 2

Complement definition of pure integers:

[x]_{补} = \begin{cases} 0,x & 2^n\gt x\ge0\\ 2^{n+1} + x = 2^{n+1} - |x| & 0 \gt x\ge-2^n \end{cases} \ mod\ 2^{n+1}

The number of representations of the complement code is one more than the original code, that is, the minimum number that can represent one more number. The reason is that the positive 0 and negative 0 of the complement code are the same, so we define the original -0 as $-2 ^ n $

Complement is the most important representation. First of all, we need to understand why it is called complement. Because the quantitative relationship between the original code and the complement is complementary. For negative numbers, the original code + complement corresponding to a positive number is equal to 2 dollars ^ {n + 1} $

As for the latter modulo 2 is a natural operation of the computer, the number of machines is a total of 8 bits, that is, when n = 7, if you add 2 dollars ^ 8 $, that is, there is an extra 1 before the highest bit, but this bit In fact, it will overflow, which is equivalent to modulo 2 dollars ^ {n + 1} $.

So why do we have to go around in such a big circle?

We use the definition of pure integers, assuming that the machine number is 8 bits long, that is, n = 7 of the above formula, there is a negative number -9, we usually calculate 126 plus -9, which can be directly calculated, that is 117.

But if you add 2 to the 8th power (which is 256), that’s 256-9 = 247, ** we’ll put 126-9 as bit (126 + 247) mod 256, or 117 **

After making a circle like this, we can also express subtraction as addition.

** When x is negative, the complement is 2 dollars ^ {n + 1} + x\ mod\ 2 ^ {n + 1} $is a numerical representation, indicating that the result of the calculation is correct, but in fact we find the complement by 2 dollars ^ {n + 1} - | x | $**

Reverse code

The inverse code is relatively simple and does not have much meaning. It is just an intermediate state in which the original code becomes a complement. The inverse of the original code except the sign bit is the inverse code, and the inverse code plus 1 is the complement code.

Note that the above set is for negative numbers, and the positive inverse code is the same as the original code.

Frameshift

The definition of frameshift is:

[x_{移} = 2 ^ n + x]

His name is frameshift, which is actually equivalent to the whole representation of the original code that was symmetrical on the number line, and the whole is moved to the right to the positive part of the number line. Its function is to compare the frameshift bit by bit, and the big one is the big one. Unlike signed numbers, the first bit of 0 is smaller than 1, but in fact 0 is a positive sign.

And it is interesting that the shift code and the complement code are opposite except for the symbol.
That is to say, the process of converting the original code to the inverse code and the process of transferring the complement code are the same, both the symbols are invariant, and the others are inverted.