Introduction to the HTTPS Protocol

HTTPS is HTTP layered on top of SSL/TLS. HTTP is the protocol used to obtain information from World Wide Web servers, so let’s start with the World Wide Web.

The World Wide Web and HTTP

The World Wide Web (WWW) is not a special kind of computer network. It is a large-scale, online information repository, usually just called the Web. The World Wide Web uses links to make it very convenient to move from one site on the Internet to another.

Every site on the World Wide Web contains many documents. Some of the text in these documents is displayed in a special way. When we move the mouse over such text, the pointer changes into the shape of a hand, which indicates that there is a link. Clicking the link takes us from this document to another document that may be stored far away.

It is precisely the emergence of the World Wide Web that turned the Internet from a tool used by a few computer experts into something used by ordinary people.

We usually open web pages in a browser to obtain information from World Wide Web servers, and this uses the HTTP protocol. But when we play a game such as LOL, the communication between the client and the server does not have to use HTTP.

The World Wide Web is a distributed hypermedia system, an extension of the hypertext system. Hypertext is text that contains links to other documents. A hypertext is made up of multiple information sources that may be distributed around the world.

**The client program of the World Wide Web is the browser**, and the host where World Wide Web documents are stored is the World Wide Web server, which runs the server program. **The client program sends a request to the server program, and the server program returns the World Wide Web document that the client requested**. A World Wide Web document displayed in the browser is called a page.

Questions

From the description above, the World Wide Web must solve several problems:

  • How to identify documents that are distributed across the entire Internet.
  • What protocol to use to implement the various links on the World Wide Web.
  • How to let documents of different styles, created by different authors, be displayed on all kinds of hosts on the Internet, and make it clear to users where links exist.
  • How to make it easy for users to find the information they need.

To solve the first problem, the World Wide Web uses Uniform Resource Locators (URLs) to identify documents, giving each document a unique identifier across the entire Internet.

To solve the second problem, the interaction between World Wide Web client programs and server programs follows a strict protocol, the HyperText Transfer Protocol (HTTP). HTTP is an application-layer protocol that uses TCP for reliable transmission.

To solve the third problem, the World Wide Web uses the HyperText Markup Language (HTML).

The last problem is solved by search engines.

Uniform Resource Locator (URL)

A URL is essentially a file name extended to the scope of the whole network. It is a pointer to any accessible object on a machine connected to the Internet (not only objects served over HTTP). Since different objects are accessed with different protocols, a URL must also indicate the protocol used to read the object.

**The general form of a URL consists of four parts: <protocol>://<host>:<port>/<path>**
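As a small illustration of the four-part form, the sketch below splits a URL with Python’s standard `urllib.parse.urlsplit`. The helper name `split_url`, the `DEFAULT_PORTS` table, and the example URL are assumptions made for this example only.

```python
from urllib.parse import urlsplit

# Default ports for two common protocols, used when the URL omits the port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url: str) -> dict:
    """Split a URL into its protocol, host, port, and path."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path or "/",  # an empty path means the root document
    }


print(split_url("http://www.example.com:8080/docs/index.html"))
```

When the port is left out, the sketch falls back to the well-known port of the protocol, which is also what browsers do for `http` and `https`.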

**For user convenience, some browsers now let you omit the leading http:// or even www when entering a URL.**

HyperText Transfer Protocol (HTTP)

HTTP is a transaction-oriented application-layer protocol and an important foundation for reliably exchanging files on the World Wide Web.

Each World Wide Web server has a server process that constantly listens on TCP port 80 to detect whether a browser has sent it a connection request. Once a connection request is detected and a TCP connection is established, the browser sends a request for a page to the World Wide Web server, and the server returns the requested page as the response.

HTTP specifies that each interaction between an HTTP client and an HTTP server consists of a request made up of an ASCII string and a response that resembles the Multipurpose Internet Mail Extensions format, known as a “MIME-like” response.
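To make the “ASCII string” request concrete, here is a minimal sketch that builds a GET request by hand and, optionally, sends it over a TCP connection to port 80. The function names `build_get_request` and `fetch`, and the choice of headers, are assumptions for illustration; real clients send more headers and handle many more cases.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP GET request as an ASCII byte string."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line
        f"Host: {host}",          # header lines
        "Connection: close",
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Open a TCP connection, send the request, and read the whole response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("example.com")` would need network access; `build_get_request` alone is enough to see what goes on the wire.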

HTTP uses connection-oriented TCP as its transport-layer protocol to ensure reliable data transmission, so HTTP itself does not have to deal with retransmitting data that is lost in transit.

When the user clicks a link to a World Wide Web document, HTTP first establishes a TCP connection with the server, which requires a three-way handshake. Once the first two segments of the three-way handshake are complete, the browser sends the HTTP request message as the data of the third segment. After the server receives the request, it returns the requested document to the client in a response message.

The process above is HTTP/1.0. Requesting one document takes the document’s transmission time plus two round-trip times (RTT): one RTT to establish the TCP connection and one RTT to request and receive the document.

The main disadvantage of HTTP/1.0 is that every document request pays this 2 × RTT overhead. If a home page links to many objects and each one needs its own connection, every download costs 2 RTT. Another overhead is that the browser and the server must allocate buffers and variables each time a new TCP connection is established. Because a server has to serve a large number of browsers, these non-persistent connections place a heavy burden on the World Wide Web server.

HTTP/1.1 uses persistent connections: the server keeps the connection open for a period of time after returning a response, so the same client and server can continue to send subsequent HTTP requests and responses over it. HTTP/1.1 has two modes of operation, non-pipelined and pipelined. In non-pipelined mode the client must wait for the response to the previous request before sending the next one; in pipelined mode it does not.
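The RTT arithmetic can be sketched as a back-of-the-envelope model. The three functions below are an idealized estimate, not a measurement: they assume `n` objects fetched one after another from the same server, a fixed `rtt`, the same `transmit` time per object, and (for the pipelined case) that all requests are sent back to back. The function names are made up for this example.

```python
def non_persistent(n: int, rtt: float, transmit: float) -> float:
    """HTTP/1.0 style: every object pays 2 RTT (connect + request/response)."""
    return n * (2 * rtt + transmit)


def persistent_non_pipelined(n: int, rtt: float, transmit: float) -> float:
    """One RTT to connect, then one RTT per object on the same connection."""
    return rtt + n * (rtt + transmit)


def persistent_pipelined(n: int, rtt: float, transmit: float) -> float:
    """One RTT to connect, one RTT until the first response, then back-to-back data."""
    return 2 * rtt + n * transmit


if __name__ == "__main__":
    n, rtt, transmit = 10, 0.100, 0.010  # 10 objects, 100 ms RTT, 10 ms each
    for model in (non_persistent, persistent_non_pipelined, persistent_pipelined):
        print(f"{model.__name__}: {model(n, rtt, transmit):.2f} s")
```

Under these assumptions the gap grows with the number of objects, which is why the per-request 2 RTT cost matters most for pages with many links.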

Proxy server

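A proxy server, also known as a web cache, is a network entity that temporarily stores recent requests and responses in its cache. When a new request matches a stored one, the proxy can return the stored response instead of fetching the resource from the origin server again.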
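The caching idea can be sketched with a small in-memory store. This is only an illustration: the class name `WebCache`, its `capacity` limit, and the `fetch_from_origin` callback are assumptions, and a real proxy also has to honor expiry times and cache-control headers, which are omitted here.

```python
from collections import OrderedDict
from typing import Callable


class WebCache:
    """Keep the most recently used responses, keyed by URL."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], capacity: int = 100):
        self._fetch = fetch_from_origin   # called only on a cache miss
        self._capacity = capacity
        self._store = OrderedDict()       # URL -> response body

    def get(self, url: str) -> bytes:
        if url in self._store:
            self._store.move_to_end(url)  # mark as recently used
            return self._store[url]
        body = self._fetch(url)           # miss: ask the origin server
        self._store[url] = body
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return body
```

A caller would pass in whatever function actually retrieves a page, for example a wrapper around an HTTP client.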

HTTP message structure

HTTP has two types of messages:

  • Request messages
  • Response messages

HTTPS://res.cloudinary.com/dvtfhjxi4/image/upload/v1587259061/computer_network/微信截图_20200419091617_aku8ru.png

HTTP request and response messages are both composed of three parts:

  • Start line: distinguishes a request message from a response message. In a request message it is called the request line; in a response message it is called the status line.
  • Header lines: describe the browser, the server, or the message body. There may be several header lines or none.
  • Entity body: generally not used in request messages, and may also be absent from response messages.
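The three parts can be separated with a few lines of code. The sketch below is a simplified parser written for illustration (the name `parse_http_message` is an assumption); it does not handle folded headers, repeated header names, or chunked bodies.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into (kind, start line, headers, entity body)."""
    # A blank line separates the start line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # A status line starts with the HTTP version; a request line starts with a method.
    kind = "response" if start_line.startswith("HTTP/") else "request"
    return kind, start_line, headers, body


sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
print(parse_http_message(sample))
```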

This is just a brief introduction to HTTP. In practice, HTTP is very complex, and there are many types of headers used.

SSL/TLS

Before reading this part, you need to understand asymmetric encryption, digital signatures, and certificates.

Role

HTTP communication without SSL/TLS is unencrypted. Transmitting all information in plaintext brings three major risks.

(1)

(2)

(3)

The SSL/TLS protocol is designed to address these three risks, with the following goals:

(1)

(2)

(3)

The Internet is an open environment in which neither party to a communication knows the other in advance, which makes protocol design very difficult. The protocol must also withstand all kinds of unexpected attacks, which makes SSL/TLS extremely complex.

Basic flow

The basic idea of the SSL/TLS protocol is to use public-key encryption: the client first asks the server for its public key and then encrypts the information with that public key. After the server receives the ciphertext, it decrypts it with its own private key.

However, there are two problems here.

**(1) How to ensure that the public key is not tampered with?**

Solution: Put the public key in a [digital certificate](http://en.wikipedia.org/wiki/Digital_certificate). As long as the certificate is trusted, the public key is trusted.

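**(2) Public-key encryption is computationally expensive. How can the time cost be reduced?**

Solution: For each session, the client and server generate a “session key” and use it to encrypt the information. The session key is used for symmetric encryption, which is very fast, so the server’s public key only has to encrypt the session key itself. This keeps the expensive public-key operations to a minimum.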

Therefore, the basic process of the SSL/TLS protocol is as follows:

(1)

(2)

(3)

The first two steps of the above process are also known as the “handshake phase”.
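The combination of public-key encryption and a per-session key can be sketched with toy numbers. Everything below is for illustration only and is NOT secure: the RSA key is the classic textbook pair built from the tiny primes 61 and 53, and the “symmetric cipher” is a simple XOR with a SHA-256 keystream standing in for a real cipher such as AES. All function names are assumptions.

```python
import hashlib
import secrets

# Toy RSA key pair from p = 61, q = 53. Public key: (N, E). Private key: (N, D).
N, E, D = 3233, 17, 2753


def encrypt_session_key(session_key: int) -> int:
    """Client side: encrypt the session key with the server's public key."""
    return pow(session_key, E, N)


def decrypt_session_key(encrypted: int) -> int:
    """Server side: recover the session key with the private key."""
    return pow(encrypted, D, N)


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 based keystream.
    Applying it twice with the same key returns the original data."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# The client picks a random session key and sends it encrypted.
session_key = secrets.randbelow(N - 2) + 2
on_the_wire = encrypt_session_key(session_key)

# The server recovers the same session key with its private key.
recovered = decrypt_session_key(on_the_wire)

# From here on, both sides use the fast symmetric cipher.
ciphertext = xor_stream(session_key.to_bytes(2, "big"), b"GET / HTTP/1.1")
print(xor_stream(recovered.to_bytes(2, "big"), ciphertext))
```

The slow public-key operation runs once per session on a very small input, while the bulk of the traffic goes through the symmetric cipher.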

Detailed process

The “handshake phase” involves four communications; let’s look at them one by one. Note that all communications in the “handshake phase” are in plaintext.

**4.1 Client sends a request (ClientHello)**

First, the client (usually a browser) sends the server a request for encrypted communication, which is called a ClientHello request.

In this step, the client mainly provides the following information to the server.

(1)

(2)

(3)

(4)

Note that the information sent by the client does not include the server’s domain name. In theory, then, a server can host only one website, otherwise it would be unclear which website’s digital certificate to send to the client. This is why a server can usually have only one digital certificate.

For users of virtual hosts this is of course very inconvenient. In 2006, the Server Name Indication extension was added to the TLS protocol, allowing the client to provide the requested domain name to the server.
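With Python’s standard `ssl` module, the first handshake message can be produced without touching the network by wrapping in-memory buffers instead of a socket. The sketch below generates a ClientHello for a hypothetical host name and checks that the name appears in it in plaintext, which is the Server Name Indication extension at work. The helper name `client_hello_bytes` and the host `www.example.com` are assumptions for this example.

```python
import ssl


def client_hello_bytes(server_name: str) -> bytes:
    """Return the raw bytes of the ClientHello the client would send."""
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # data from the server (stays empty here)
    outgoing = ssl.MemoryBIO()   # data the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()       # writes the ClientHello, then needs a reply
    except ssl.SSLWantReadError:
        pass                     # expected: no server has answered
    return outgoing.read()


hello = client_hello_bytes("www.example.com")
print(len(hello), "bytes in the ClientHello record")
print("server name visible in plaintext:", b"www.example.com" in hello)
```

The `server_hostname` argument is what tells the library which name to put in the extension; the same argument is accepted by `SSLContext.wrap_socket` for real connections.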

**4.2 Server response (ServerHello)**

After receiving the client’s request, the server sends a response to the client, which is called ServerHello. The server’s response includes the following content.

(1)

(2)

(3)

(4)

In addition to the above information, if the server needs to confirm the identity of the client, it also includes a request for the client to provide a “client certificate”. For example, financial institutions often allow only authenticated customers to connect to their networks and give official customers a USB key that contains a client certificate.

**4.3 Client response**

After the client receives the server response, it first verifies the server certificate. If the certificate is not issued by a trusted authority, or the domain name in the certificate does not match the actual domain name, or the certificate has expired, a warning will be displayed to the visitor, who can choose whether to continue communication.
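In Python, these checks are switched on by default when a context is created with `ssl.create_default_context()`: the certificate chain must lead to a trusted authority, the certificate must be within its validity period, and the name in the certificate must match the requested host. The sketch below shows one way to connect with those checks enabled; the function names are assumptions, and the connection helper needs network access to run.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Create a client context that verifies the server certificate."""
    context = ssl.create_default_context()
    # These two settings are already the defaults; they are spelled out
    # here only to make the checks visible.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def fetch_server_certificate(host: str, port: int = 443) -> dict:
    """Complete a TLS handshake and return the verified server certificate.
    Raises ssl.SSLCertVerificationError if any of the checks fails."""
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

Unlike a browser, this code does not offer a “continue anyway” choice: a failed check aborts the handshake with an exception.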

If there is no problem with the certificate, the client extracts the server’s public key from the certificate. It then sends the following three pieces of information to the server.

(1)

(2)

(3)

The random number in the first item above is the third random number to appear in the handshake phase, also known as the “pre-master key”. With it, the client and the server both hold the same three random numbers, and each side then uses the previously agreed encryption method to generate the same “session key” for this session.
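As one concrete instance of turning three random numbers into shared key material, TLS 1.2 (RFC 5246) defines a pseudorandom function built from HMAC and derives a 48-byte master secret from the pre-master key and the two hello random numbers. The sketch below follows that definition for cipher suites whose PRF uses SHA-256; the function names are chosen for this example, and in real TLS the master secret is expanded further into the actual encryption keys.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash from RFC 5246: chain HMAC outputs until enough bytes exist."""
    out = b""
    a = seed                                                      # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()          # A(i)
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Combine the three random values into a 48-byte master secret."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master, seed, 48)


# Both sides know all three values, so both compute the same secret.
client_random = os.urandom(32)
server_random = os.urandom(32)
pre_master = os.urandom(48)
client_side = master_secret(pre_master, client_random, server_random)
server_side = master_secret(pre_master, client_random, server_random)
print(client_side == server_side, len(client_side))
```

Because the two hello random numbers are part of the seed, the result changes for every session even if the same pre-master key were reused.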

As for why three random numbers are needed to generate the “session key”, dog250 explains it well:

"Both the client and the server need random numbers so that the generated key is different every time. Because certificates are static in the SSL protocol, a random factor must be introduced to guarantee the randomness of the negotiated key.

For the RSA key exchange algorithm, the pre-master key is itself a random number. Combined with the random numbers in the hello messages, the three random numbers are fed through a key derivation function to produce a symmetric key.

pre

In addition, if the server requested a client certificate in the previous step, the client sends the certificate and related information in this step.

**4.4 Server’s final response**

After the server receives the third random number (the pre-master key) from the client, it computes the “session key” for this session. It then sends the following final information to the client.

(1) Change-of-cipher notification, indicating that subsequent information will be sent using the encryption method and key agreed on by both parties.

(2) Server handshake-finished notification, indicating that the server’s handshake phase has ended. This item also carries a hash of all the content sent earlier, which the client uses for verification.

At this point the entire handshake phase is over. The client and server then communicate over the ordinary HTTP protocol, with the content encrypted using the “session key”.
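The hash in the handshake-finished notification lets each side detect whether any earlier handshake message was altered in transit. The sketch below is a deliberately simplified stand-in: it hashes the list of handshake messages and authenticates that hash with a single HMAC keyed by the shared secret, whereas real TLS 1.2 feeds the hash through its pseudorandom function. The function name `finished_value`, the label, and the sample messages are all assumptions.

```python
import hashlib
import hmac


def finished_value(shared_secret: bytes, label: bytes, handshake_messages: list) -> bytes:
    """Compute a short check value over everything sent during the handshake."""
    transcript_hash = hashlib.sha256(b"".join(handshake_messages)).digest()
    mac = hmac.new(shared_secret, label + transcript_hash, hashlib.sha256)
    return mac.digest()[:12]


secret = b"shared secret from the handshake"
sent_by_server = [b"ClientHello...", b"ServerHello...", b"Certificate..."]
seen_by_client = [b"ClientHello...", b"ServerHello...", b"Certificate..."]

server_check = finished_value(secret, b"server finished", sent_by_server)
client_check = finished_value(secret, b"server finished", seen_by_client)

# compare_digest avoids leaking information through timing differences.
print(hmac.compare_digest(server_check, client_check))
```

If an attacker had changed even one byte of an earlier message, the two values would differ and the client would abort the connection.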

Reference article:

HTTPS://www.ruanyifeng.com/blog/2014/02/SSL_TLS.html