Web Instant Messaging Solutions: A Roundup

Posted on 2023-04-08 Edited on 2025-07-02 In Sundry

Recently, I was curious about which protocol ChatGPT uses to return streaming responses, so I went to learn about it. At first I thought it was HTTP2 or the WebSocket (WS) protocol, but it turned out to be neither: it is SSE (Server-Sent Events).

Instant messaging protocol classification

HTTP and HTTP2

HTTP is the most common protocol for communication between the front end and the back end. The most widely used version is HTTP1; HTTP2 was later proposed to solve some of its problems, but it does not change the nature of the protocol.

Disadvantages of HTTP1

Head-of-line blocking: requests on a connection are queued and processed serially, so each request has to wait for the previous one to return before it gets a chance to run. If one request stalls (for example, it times out), every request behind it is blocked with no way around it. This is what is usually called head-of-line blocking.
Insufficient use of TCP connections: in HTTP 1.x, making several requests at the same time requires several TCP connections, and to control resource usage browsers limit the number of connections to a single domain to around 6 to 8.
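The following toy timing model in TypeScript makes head-of-line blocking concrete. It assumes every request is issued at time 0 and has a fixed, made-up processing time. On a single HTTP1 connection, responses come back in order. Idealized multiplexing lets each request finish independently.

```typescript
// Toy model (an assumption for illustration): each request has a fixed
// processing time in ms, and all requests are issued at t = 0.

/** HTTP1 on one connection: requests are handled strictly in order,
 *  so each one finishes only after every request ahead of it. */
function serialCompletionTimes(durationsMs: number[]): number[] {
  const done: number[] = [];
  let clock = 0;
  for (const d of durationsMs) {
    clock += d; // cannot start until the previous request has finished
    done.push(clock);
  }
  return done;
}

/** Idealized HTTP2 multiplexing: requests proceed independently on one
 *  connection, so each finishes after its own duration. */
function multiplexedCompletionTimes(durationsMs: number[]): number[] {
  return durationsMs.map((d) => d);
}

const durations = [300, 50, 50]; // a slow first request, two fast ones
console.log(serialCompletionTimes(durations));      // [ 300, 350, 400 ]
console.log(multiplexedCompletionTimes(durations)); // [ 300, 50, 50 ]
```

The two fast requests take 350 and 400 ms instead of 50 ms, purely because of their position in the queue.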

Advantages of HTTP2

Multiplexing: the most valuable advantage. It solves head-of-line blocking at the HTTP layer by letting a single HTTP2 connection carry multiple requests and responses at the same time, making full use of the TCP connection. As a result, tricks such as domain sharding, CSS sprites, and inlined styles are no longer needed.
Header compression: HTTP2 keeps a static table and a dynamic table on both the client and the server to compress headers and send only incremental updates, which greatly reduces the traffic spent on headers. Headers that are in neither table can be compressed with Huffman coding.
New binary format: HTTP1.x transfers data in a text format, while HTTP2 transfers it in a binary format.
Server push: the server can actively push resources to the client.
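A toy table shared by sender and receiver can illustrate the header-compression idea. This is only a sketch of the concept, and the class name is hypothetical. It leaves out HPACK's static table, Huffman coding, and table size limits, so it is not the real HTTP2 wire format.

```typescript
// Toy model of the *idea* behind HTTP2 header compression: both sides keep
// the same table, so a header sent once can later be sent as a small index.
// NOT the RFC 7541 format: no static table, no Huffman coding, no size limit.

type Header = [name: string, value: string];
type Encoded = { index: number } | { literal: Header };

class ToyHeaderTable {
  private entries: Header[] = [];

  private find([name, value]: Header): number {
    return this.entries.findIndex(([n, v]) => n === name && v === value);
  }

  /** Sender side: emit an index for known pairs, else a literal (and remember it). */
  encode(headers: Header[]): Encoded[] {
    return headers.map((h): Encoded => {
      const i = this.find(h);
      if (i >= 0) return { index: i };
      this.entries.push(h);
      return { literal: h };
    });
  }

  /** Receiver side: mirror the sender's table so indexes resolve identically. */
  decode(encoded: Encoded[]): Header[] {
    return encoded.map((e) => {
      if ("index" in e) return this.entries[e.index];
      this.entries.push(e.literal);
      return e.literal;
    });
  }
}

const client = new ToyHeaderTable();
const server = new ToyHeaderTable();
const req: Header[] = [[":method", "GET"], ["user-agent", "demo/1.0"]];

const first = client.encode(req);  // two literals: full names and values
const second = client.encode(req); // [{ index: 0 }, { index: 1 }]
server.decode(first);
console.log(server.decode(second)); // both headers recovered from indexes
```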

Here I will focus on server push. This push is different from WebSocket. It means that while sending a page's HTML, the server can proactively push other resources, without waiting for the browser to parse the HTML, reach the corresponding location, and issue the request. For example, the server can push the JS and CSS files to the client, so the client does not need to parse the HTML and then request them.

Although the server can push proactively, the client still decides whether to accept. If the pushed resources are already in the browser cache, the browser can reject them by sending an RST_STREAM frame. Push also obeys the same-origin policy: the server cannot push arbitrary third-party resources to the client.

WebSocket

HTML5, the new generation of the HTML standard, provides WebSocket, a technology for full-duplex communication between browsers and servers. According to the WebSocket draft, WebSocket is a new, independent protocol built on TCP. It is compatible with HTTP but not part of it; on the browser side it is exposed as part of HTML5. This gives scripts a new ability: opening WebSocket connections. The pattern should feel familiar, because Ajax works the same way, except that Ajax can only send HTTP requests.

Unlike HTTP's request/response model, WebSocket performs an opening handshake before the connection is established and a closing handshake before it is closed. Once the connection is established, both parties can send data in either direction.
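During the opening handshake, the server shows that it understood the WebSocket request by hashing the client's Sec-WebSocket-Key header. Below is a minimal sketch of that computation. It assumes a Node.js runtime for SHA-1 (`node:crypto`). The helper names are hypothetical. The key/accept pair in the example is the sample given in RFC 6455.

```typescript
// Sketch of the server's side of the opening handshake (RFC 6455),
// assuming Node.js for the SHA-1 hash.
import { createHash } from "node:crypto";

// Fixed GUID defined by RFC 6455; the server appends it to the client's key.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

/** Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key. */
function computeAcceptKey(secWebSocketKey: string): string {
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

/** Build the 101 response that switches the connection from HTTP to WebSocket. */
function handshakeResponse(secWebSocketKey: string): string {
  return [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${computeAcceptKey(secWebSocketKey)}`,
    "",
    "", // the empty line that ends the HTTP headers
  ].join("\r\n");
}

// Sample key from RFC 6455:
console.log(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ=="));
// → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The browser performs the same hash and refuses the connection if the value does not match. This prevents a plain HTTP server from accidentally "accepting" a WebSocket request.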

In terms of browser support, WebSocket is within reach but still has a long way to go, especially in China, where IE6, 7, and 8 are still prevalent. Old browser versions will take a long time to disappear, so until full browser compatibility is achieved, Comet technology may remain the best solution. That said, there are already mature wrapper libraries that work around this compatibility limitation.

SSE

SSE (Server-Sent Events) is an HTML5 technology that lets the server push new data to the client. This is a better solution than having the client poll the server for new data every few seconds.

Like WebSocket, SSE can push data from the server to the client. So how do you decide between SSE and WebSocket? In general, whatever WebSocket can do, SSE can do, and vice versa, but each has its own strengths for certain tasks.

WebSocket is more complex to implement on the server, but it is true two-way transport: data can flow from the server to the client and from the client to the server.

WebSocket and SSE have similar browser support rates, and most mainstream desktop browsers support both. In Android 4.3 and earlier, the system default browser does not support either, while Firefox and Chrome fully support both; in Android 4.4, the system default browser supports both; Safari supports SSE since 5.0 (iOS systems from 4.0), but does not properly support WebSocket until 6.0 (the WebSocket protocol implemented by Safari before 6.0 has security issues, so some mainstream browsers have disabled implementations based on this protocol).

Compared to WebSocket, SSE has some significant advantages. Personally, I think the biggest is convenience: you don't need any new components, and you can keep using whatever backend language and framework you're used to. You don't have to worry about setting up a new virtual machine, a new IP address, or a new port number; it's as simple as adding a new page to an existing website. I like to call this the existing-infrastructure advantage.

The second advantage of SSE is that the server side is simple. WebSocket, by comparison, is very complex and hard to implement without helper libraries.

Because SSE runs over ordinary HTTP/HTTPS, it works with existing proxy servers and authentication schemes. For WebSocket, proxy servers need extra development (or other work) to support it, and at the time of writing many servers still lack that support (although this will improve). SSE has one more advantage: it is a text protocol, so it is easy to debug with simple scripts.
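Because SSE is plain text, what the server writes is easy to read and debug. Here is a sketch of a helper that formats one event. The function name is hypothetical. The field names (data, event, id, retry) and the blank-line terminator come from the SSE format.

```typescript
// Sketch of the text a server writes into a `Content-Type: text/event-stream`
// response. The helper is hypothetical; the fields follow the SSE format.

interface SseEvent {
  data: string;
  event?: string; // custom event type; if omitted, the client fires "message"
  id?: string;    // echoed back by the browser as Last-Event-ID on reconnect
  retry?: number; // reconnection delay hint in milliseconds
}

function formatSseEvent(e: SseEvent): string {
  let out = "";
  if (e.event !== undefined) out += `event: ${e.event}\n`;
  if (e.id !== undefined) out += `id: ${e.id}\n`;
  if (e.retry !== undefined) out += `retry: ${e.retry}\n`;
  // A line break would end the field, so each payload line gets its own
  // "data:" field; the client rejoins them with "\n".
  for (const line of e.data.split(/\r\n|\r|\n/)) out += `data: ${line}\n`;
  return out + "\n"; // a blank line marks the end of the event
}

console.log(formatSseEvent({ event: "chat", id: "42", data: "hello\nworld" }));
// event: chat
// id: 42
// data: hello
// data: world
// (followed by a blank line)
```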

However, this points to a potential advantage of WebSocket over SSE: WebSocket is a binary protocol, while SSE is a text protocol (usually UTF-8 encoded). We can still send binary data over an SSE connection. Only two characters, CR and LF, have special meaning in SSE, and escaping them is not difficult. But binary data sent over SSE gets larger after encoding (Base64, for example, adds about a third). If you need to send large amounts of binary data from the server to the client, WebSocket is the better choice.
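One way to carry binary data over SSE is to Base64-encode it. This removes CR and LF from the payload at the cost of roughly 4/3 the size. A hand-rolled encoder keeps the sketch free of platform-specific APIs. The helper names are hypothetical.

```typescript
// Sketch: Base64 output uses only A-Z, a-z, 0-9, "+", "/" and "=",
// so it can never contain the CR or LF that SSE treats specially.
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function base64Encode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (b0 << 16) | (b1 << 8) | b2; // 24 bits = four 6-bit groups
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "="; // pad if short
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

/** Wrap binary data as a single SSE event (hypothetical helper). */
function binarySseEvent(bytes: Uint8Array): string {
  return `data: ${base64Encode(bytes)}\n\n`;
}

const raw = new Uint8Array([0x00, 0x0a, 0x0d]); // contains LF and CR
console.log(binarySseEvent(raw)); // data: AAoN  (3 bytes became 4 characters)
```

The client then decodes the `data` string back into bytes. The growth from 3 bytes to 4 characters is the size penalty mentioned above.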

The biggest advantage of WebSocket over SSE is two-way communication: sending data to the server is as easy as receiving data from it. With SSE, data usually travels from the client to the server through a separate Ajax request. Compared with WebSocket, this Ajax approach adds overhead, but only a little. So the question becomes: when does this difference matter? If you need to send data to the server once per second or more often, use WebSocket. Between 0.2 and 1 times per second is a gray area where WebSocket and SSE perform about the same, but if you expect heavy load, it is worth benchmarking. Below roughly 0.2 times per second, there is little difference between the two.

What about sending data from the server to the client? For text data (as opposed to binary, discussed above), SSE and WebSocket perform the same. Both run over TCP/IP sockets and both are lightweight protocols. There is no difference in latency, bandwidth, server load, and so on, unless… eh? Unless what?

The difference shows up when you take advantage of SSE's existing infrastructure and put a web server between the client and your server-side script. An SSE connection then uses not only a socket but also an Apache thread or process. If you use PHP, a new PHP instance is created just for this connection. Apache and PHP use a lot of memory, which limits how many concurrent connections the server can support. So to match WebSocket's data transfer performance with SSE, you need to write your own backend server. Of course, readers who write their own server anyway, for example in Node.js, will find this point odd.

Now for WebSocket compatibility with older browsers. Currently, more than two-thirds of browsers support these new technologies, and support among mobile browsers is lower. Traditionally, Flash has been used whenever a two-way socket is needed, so WebSocket fallbacks are usually built on Flash, which is quite complicated. If Flash is not available in the browser, things get even worse. In general, WebSocket is hard to make backward compatible, while SSE is easy.

Instant messaging method overview

In 1996, the IETF HTTP Working Group released version 1.0 of the HTTP protocol. Up to version 1.1, the one commonly used now, HTTP has gone through 17 years of development. Compared with the rapid growth of the Internet, this distributed, stateless, TCP-based request/response protocol, which is everywhere on today's Internet, seems to have progressed slowly. Since its rise, the Internet has gone through the web1.0 era dominated by portal websites. Then, with the emergence of Ajax, came the web2.0 era dominated by web applications, and it is now moving toward web3.0. Meanwhile, apart from making persistent connections the default, the move from HTTP 1.0 to 1.1 brought only painless improvements in caching, bandwidth optimization, and security. HTTP has always kept its stateless request/response model and never seems to have recognized that this should change.

Fortunately, the HTML5 era has arrived, bringing WebSocket and SSE (Server-Sent Events) and making instant messaging on the Web practical.

Ajax short polling: script sends http request

For a traditional web application to interact with the server, it must submit a form. The server receives and processes the form, then returns a brand-new page. Because most of the data on the two pages is the same, this process transmits a lot of redundant data and wastes bandwidth. Ajax was created to solve this.

Ajax stands for Asynchronous JavaScript and XML, a term first coined by Jesse James Garrett. Its key innovation was letting browser scripts (JS) send HTTP requests. The underlying technique was developed by the Outlook Web Access team in 1998 and soon shipped with IE 5.0. It remained niche until early 2005, when Google used it extensively in Google Groups, Gmail, and other interactive applications, and Ajax quickly became widely accepted.

Ajax greatly reduced the amount of data exchanged between client and server and made the exchange much faster. This met the needs of the early web2.0 era, with its emphasis on rich user experience. Over time, though, its drawbacks became clear. For example, it cannot meet the real-time update requirements of rich interactive applications such as instant messaging. After all, this browser-side technology is still built on HTTP, and HTTP's request/response model cannot change unless the protocol itself does.

Comet: A Hack Technology (Still HTTP Protocol)

Traditional polling can no longer meet the low-latency requirements of web applications such as instant messaging, and it also leads to a poor user experience. So a “server push” technique based on long-lived HTTP connections was hacked together. It was named Comet, a term first used by Alex Russell, project lead of the Dojo Toolkit, in his blog post “Comet: Low Latency Data for the Browser”, and the name has been used ever since.

In fact, server push has existed for a long time and is widely used in the classic client/server model. Browsers were simply too lazy to provide good support for it. The advent of Ajax made it possible to implement the technique in browsers, and Google's integration of Gtalk into Gmail was among the first to use it. Once some key issues were resolved (such as loading indicators and display problems in IE), the technique was quickly adopted, and there are now many mature open source Comet frameworks.

The following compares typical Ajax and Comet data transmission. The difference is simple and clear. Typical Ajax communication is also the classic use of HTTP: to get data, you must first send a request. In web applications with strict low-latency requirements, the only option is to send requests more often. Comet is different. The client keeps a long-lived connection to the server, and the server pushes data to the client only when the data the client needs has been updated.

There are two main ways to implement Comet: Ajax-based long polling, and HTTP streaming based on iframes and htmlfile.

Long polling based on Ajax

The browser sends an XMLHttpRequest. On receiving it, the server holds the request and does not respond until there is data or a timeout occurs. After handling the response (timeout or valid data), the browser's JS immediately sends another request to re-establish the connection. New data may arrive on the server in the meantime. The server keeps it until the connection is re-established, and the browser then retrieves all of it at once.
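The server side of long polling can be sketched without any HTTP framework. The sketch assumes a single client, and the class and method names are hypothetical. In a real server, `poll` would be called from the request handler and its result written as the response.

```typescript
// Server side of long polling (sketch, single client). A poll returns
// buffered messages at once, or waits until a message arrives or the
// timeout fires, in which case it returns an empty batch.

class LongPollChannel {
  private buffer: string[] = [];
  private waiter: ((msgs: string[]) => void) | null = null;

  /** Called when new data is produced on the server. */
  publish(msg: string): void {
    this.buffer.push(msg); // kept until a request is there to take it
    if (this.waiter) this.flush(this.waiter);
  }

  /** Called for each incoming request; resolves with the response body. */
  poll(timeoutMs: number): Promise<string[]> {
    return new Promise((resolve) => {
      if (this.buffer.length > 0) return this.flush(resolve); // data already waiting
      const timer = setTimeout(() => {
        this.waiter = null;
        resolve([]); // timeout: empty response, the client simply polls again
      }, timeoutMs);
      this.waiter = (msgs) => {
        clearTimeout(timer);
        resolve(msgs);
      };
    });
  }

  // Hand all buffered messages to the waiting request at once.
  private flush(send: (msgs: string[]) => void): void {
    const msgs = this.buffer;
    this.buffer = [];
    this.waiter = null;
    send(msgs);
  }
}

const ch = new LongPollChannel();
ch.poll(30_000).then((msgs) => console.log(msgs)); // held open...
setTimeout(() => ch.publish("hello"), 100);       // ...until this arrives
```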

Streams based on Iframe and htmlfile

An iframe is an HTML tag. Its src attribute keeps a long-lived request open to the specified server, and the server can keep returning data over it. Compared with the first method, this is closer to traditional server push.

In the first method, the browser calls a JS callback directly after receiving the data. How does the page respond to data in this method? The server can embed a JS script in the returned data, such as <script type="text/javascript">js_func("data from server")</script>, passing the data as the callback's argument. The browser executes the script as soon as it receives it.

WebSocket

WebSocket is a new protocol and, naturally, a new solution.

SSE

SSE is a new protocol and, naturally, a new solution.

A brief analysis of how instant messaging schemes work

Communication Principles of Traditional Web

The browser, as a thin client, cannot communicate directly with another browser on a remote machine through system calls. This differs from desktop applications. They can usually open a TCP connection through a socket to a process on the remote host and thereby achieve full-duplex instant communication.

Since browsers were born, the model has been the client requesting and the server responding, and this has not changed despite all the development since. So for two clients to communicate, messages must be relayed through the server. For example, if A wants to talk to B, A first sends the message to the IM application server. The server then forwards it to B based on the recipient specified in A's message. Messages from B to A work the same way.
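The relay pattern can be reduced to a few lines. The server keeps one delivery channel per connected user and forwards each message by recipient. The names are hypothetical. How a message actually reaches a browser (polling, SSE, WebSocket) is abstracted as a callback.

```typescript
// Sketch of the server as a relay: clients never talk to each other
// directly; the server looks up the recipient and forwards.

interface ChatMessage { from: string; to: string; text: string; }
type Deliver = (msg: ChatMessage) => void; // stands in for the real transport

class RelayServer {
  private clients = new Map<string, Deliver>();

  connect(userId: string, deliver: Deliver): void {
    this.clients.set(userId, deliver);
  }

  /** A sends to the server; the server forwards to B by the `to` field. */
  send(msg: ChatMessage): boolean {
    const deliver = this.clients.get(msg.to);
    if (!deliver) return false; // recipient offline: a real app would queue it
    deliver(msg);
    return true;
  }
}

const relay = new RelayServer();
relay.connect("B", (m) => console.log(`B got "${m.text}" from ${m.from}`));
relay.send({ from: "A", to: "B", text: "hi" }); // B got "hi" from A
```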

Problems that need to be solved to realize IM application in traditional communication mode

Since a web-based IM application still has to follow the model of the browser requesting the server, building one requires solving the following three problems:

Duplex communication: the browser pulls data from the server, and the server pushes data to the browser;
Low latency: a message sent from browser A to B must be forwarded quickly through the server to B, and B's messages must reach A just as quickly. In effect, any browser must be able to fetch data from the server quickly, and the server must be able to push data to the browser quickly;
Cross-domain support: the client browser and the server are usually at different places on the network, and for security reasons browsers do not let scripts directly access servers on a different domain. This applies even when two domain names resolve to the same IP address, and even when the domain name is the same but the port is different.
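The same-origin rule behind the cross-domain point can be checked with the standard URL class. Its origin property combines scheme, host and port. The helper name is hypothetical, and the example domains are placeholders.

```typescript
// Two URLs share an origin only if scheme, host and port all match.
// The URL class (in browsers and Node.js) computes and normalizes the origin.

function isSameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOrigin("https://chat.example.com/a", "https://chat.example.com/b"));  // true
console.log(isSameOrigin("https://chat.example.com", "https://api.example.com"));       // false: host differs
console.log(isSameOrigin("https://chat.example.com", "https://chat.example.com:8443")); // false: port differs
console.log(isSameOrigin("http://chat.example.com", "https://chat.example.com"));       // false: scheme differs
```

The comparison is by host name, not by IP address. This is why two domains that resolve to the same IP are still treated as different origins.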

Full duplex low latency solution

Client browser short polling

This is the simplest solution. The client sends an Ajax request to the server at short intervals, the server returns the latest data, and the client updates the interface with it. This indirectly achieves instant communication. The advantage is simplicity. The disadvantage is heavy load on the server and wasted bandwidth, since the data usually has not changed.
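A minimal short-polling loop might look like this. `fetchLatest` is a hypothetical stand-in for the Ajax call, for example a fetch to some messages endpoint. It is injected so the sketch runs without a server.

```typescript
// Short polling: ask the server every intervalMs, whether or not anything
// changed. Returns a function that stops polling.

function startShortPolling<T>(
  fetchLatest: () => Promise<T>, // hypothetical stand-in for the Ajax request
  onData: (data: T) => void,
  intervalMs: number,
): () => void {
  const timer = setInterval(() => {
    fetchLatest().then(onData, (err) => console.error("poll failed:", err));
  }, intervalMs);
  return () => clearInterval(timer);
}

// Usage with a fake server that never has anything new:
let requests = 0;
const stop = startShortPolling(
  async () => {
    requests++; // every tick costs a full request/response
    return "no new messages";
  },
  () => {}, // nothing to update
  1000,
);
setTimeout(() => {
  stop();
  console.log(`${requests} requests sent, none with new data`);
}, 3500);
```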

Long polling

In short polling, a request is sent every time, the server returns data whether or not it has changed, and the connection is closed when the request completes. Much of this traffic is unnecessary, which led to long polling. Here the client sends a request, and the server checks whether the requested data has changed (whether there is newer data). If it has, the server responds immediately. Otherwise it keeps the connection open and checks periodically until the data updates or the connection times out. As soon as the response arrives or the connection is dropped, the client issues a new request. This greatly reduces the number of requests the client sends to the server over the same period.
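On the client, long polling is a loop that sends the next request as soon as the previous one returns. `pollOnce` is a hypothetical stand-in for the XHR or fetch call. The `rounds` limit only exists so the sketch terminates.

```typescript
// Client side of long polling: re-request immediately after each response.

async function longPollLoop(
  pollOnce: () => Promise<string[]>, // hypothetical: one request to the server
  onMessages: (msgs: string[]) => void,
  rounds: number,
): Promise<void> {
  for (let i = 0; i < rounds; i++) {
    try {
      const msgs = await pollOnce(); // the server holds this until data or timeout
      if (msgs.length > 0) onMessages(msgs); // empty batch means "timed out"
    } catch {
      // Network error: wait briefly before reconnecting.
      await new Promise<void>((resolve) => setTimeout(resolve, 1000));
    }
  }
}

// Usage with a fake server: data, then a timeout, then two queued messages.
const batches: string[][] = [["hi"], [], ["how are you?", "still there?"]];
longPollLoop(async () => batches.shift() ?? [], (m) => console.log(m), 3);
// logs ["hi"], then the two later messages as one batch
```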

Communication based on HTTP streaming

Another approach is HTTP streaming. The client keeps its connection to the server open for the duration of a single request, and the server sends data to the client continuously, like a stream, instead of all at once. The difference from polling is that the client sends only one request during the whole exchange. The server keeps a long-lived connection with the client and uses it to send data back.

This scheme comes in several variants that differ in how the stream is transmitted.

Streaming method based on XHR objects

The idea is to create an XHR object and listen to its onreadystatechange event. Whenever readyState is 3, read its responseText and process it. A readyState of 3 means data is still being transferred and the exchange has not finished, so the client is still receiving data from the server. A readyState of 4 means the data has been sent completely and the exchange has ended. During this process the server sends data to the client as a stream, in multiple pieces, and the client receives it as a stream too, hence the name http-streaming.

Because the client receives the data in segments, it is best to keep a cursor marking how much has already been received. Each time, use the cursor to take only the newly arrived data and skip what was processed before. Then print the complete responseText once the exchange ends.
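The cursor technique can be sketched as a readystatechange handler. It only relies on the readyState and responseText properties, so a fake object can stand in for XMLHttpRequest when testing. The endpoint in the browser usage comment is hypothetical.

```typescript
// While readyState === 3 (LOADING), responseText keeps growing; the cursor
// remembers how much was already consumed so each call handles only new data.

// Only the two XHR properties the handler reads; a real XMLHttpRequest fits.
interface StreamingXhr {
  readyState: number;
  responseText: string;
}

function makeStreamHandler(
  xhr: StreamingXhr,
  onChunk: (chunk: string) => void,
  onDone: (fullText: string) => void,
): () => void {
  let cursor = 0; // characters of responseText already handled
  return () => {
    if (xhr.readyState === 3 || xhr.readyState === 4) {
      const chunk = xhr.responseText.slice(cursor);
      cursor = xhr.responseText.length;
      if (chunk) onChunk(chunk); // only the newly arrived data
    }
    if (xhr.readyState === 4) onDone(xhr.responseText); // stream finished
  };
}

// In a browser (hypothetical endpoint):
//   const xhr = new XMLHttpRequest();
//   xhr.open("GET", "/stream");
//   xhr.onreadystatechange = makeStreamHandler(xhr, console.log, console.log);
//   xhr.send();
```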

Based on iframe data stream

Older versions of IE do not allow the responseText property of an XHR to be read while readyState is 3, so an iframe-based streaming method emerged to make this technique work in IE. The browser dynamically loads an iframe whose src points to the server URL, which effectively sends an HTTP request to the server. It also defines a data-handling function on the page. The server then writes data to the client over the long-lived connection held by the iframe. The returned data is not plain data but a script such as <script type="text/javascript">parent.process('...')</script>, where the argument is the data (in the original example, randomNum.toString()). The browser parses each such piece as JS and calls the named function on the page. In effect, the server uses its own data to call the client's code indirectly, updating the client in real time.
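On the server, each update in the iframe stream is just a small script chunk. Below is a sketch of building one. The function name is hypothetical, and `parent.process` follows the example above. The escaping (JSON.stringify plus turning `</` into `<\/`) is an added safety step, not part of the original description.

```typescript
// Build one chunk of the iframe stream: a <script> that calls a function on
// the parent page with the data as its argument.

function iframeScriptChunk(callback: string, data: string | number): string {
  // JSON.stringify yields a valid JS literal; "<\/" keeps a "</script>"
  // inside the data from closing the tag early.
  const payload = JSON.stringify(data).replace(/<\//g, "<\\/");
  return `<script type="text/javascript">parent.${callback}(${payload});</script>\n`;
}

console.log(iframeScriptChunk("process", 42));
// <script type="text/javascript">parent.process(42);</script>
```

The server writes such chunks into the still-open response one after another. The browser runs each one as soon as it has been parsed.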

Data stream communication based on htmlfile

This raised a new problem. When an iframe is used to request the server in IE, the page looks unfinished as long as the server keeps the connection open. The browser's title bar and status bar keep showing that the page is loading, which is a poor user experience for a product. So Google engineers came up with a hack: in IE, dynamically create an htmlfile object. This is an ActiveX COM component that is essentially an HTML document held in memory. The generated iframe is added to this in-memory htmlfile, and the iframe streaming method is used as before. Because the htmlfile object is never added to the page itself, the browser does not show the loading indicator.

SSE

On its own, the browser can only send data to the server one way, by making requests. To address this, HTML5 provides a technology called Server-Sent Events (SSE). The client requests the server, the server then uses the connection established with the client to push data, and the client receives and processes it. By itself, SSE only pushes data one way, from server to browser. Combined with ordinary requests from the browser, however, it effectively achieves two-way communication between client and server. On the client, you create an EventSource object, whose readyState property takes the following values:

0: Connecting to the server.
1: The connection is opened.
2: The connection is closed.

The EventSource object keeps a long-lived connection to the server and reconnects automatically if it drops. To close the connection for good, call its close method. It can listen for the message event (onmessage). The server sends data to the client following the SSE format, and the onmessage handler fires when data arrives so the client can process it.
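To see what happens between the bytes on the wire and the onmessage callback, here is a simplified parser. It is modeled on the event-stream parsing rules in the HTML Living Standard and omits retry handling and reconnection. Note that onmessage only receives events of type message. Events with a custom event: field need addEventListener.

```typescript
// Simplified model of how an EventSource turns text/event-stream into events.

// Matches the readyState values listed above (EventSource.CONNECTING/OPEN/CLOSED).
const EventSourceState = { CONNECTING: 0, OPEN: 1, CLOSED: 2 } as const;

interface ParsedEvent { type: string; data: string; id?: string; }

function parseEventStream(text: string): ParsedEvent[] {
  const events: ParsedEvent[] = [];
  let data: string[] = [];
  let type = "message";
  let id: string | undefined; // last event ID persists across events
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // Blank line: dispatch the accumulated event, if it had any data.
      if (data.length > 0) events.push({ type, data: data.join("\n"), id });
      data = [];
      type = "message";
      continue;
    }
    if (line.startsWith(":")) continue; // comment, often used as a keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
    if (field === "data") data.push(value);
    else if (field === "event") type = value;
    else if (field === "id") id = value;
  }
  return events; // an unterminated final event is discarded, as in the spec
}

const stream = ": keep-alive\n\nid: 1\ndata: hello\ndata: world\n\nevent: typing\ndata: A\n\n";
console.log(parseEventStream(stream));
// a "message" event with data "hello\nworld", then a "typing" event with data "A"
```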

Browser Native API

WebSocket

All of the solutions above are hacks built by combining one-way requests from the browser to the server, or one-way pushes from the server to the browser. To strengthen the web platform, HTML5 provides WebSocket. It is not only a web communication technique but also an application-layer protocol, providing native full-duplex, cross-domain communication between browser and server. Once a WebSocket connection (really a TCP connection) is established between the browser and the server, data can flow from client to server and from server to client at the same time. For the underlying principles, see “WebSocket Detailed Explanation (1): Preliminary Understanding of WebSocket Technology”, “WebSocket Detailed Explanation (2): Technical Principle, Code Demonstration and Application Cases”, and “WebSocket Detailed Explanation (3): In-depth WebSocket Communication Protocol Details”; I won't go into the details here. Before looking at code, you need to understand the overall WebSocket workflow.

First, the client creates a WebSocket object, which sends an HTTP request to the server. The server recognizes it as a WebSocket request, agrees to switch protocols, and sends back a response with status code 101. This exchange is called the handshake. After the handshake, the TCP connection that carried it stays open, and the server and client can communicate in both directions over it. From then on, the application-layer protocol is ws (or wss), no longer HTTP. The WS protocol requires the client and server to send data in a specific message format (frames) so that the other side can understand it.
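The frame format mentioned above can be illustrated with the header of an unfragmented text frame sent by the server, following RFC 6455. Only the server-to-client case is covered. Those frames are unmasked, whereas client-to-server frames must be masked. The 64-bit length branch assumes payloads under 4 GiB.

```typescript
// Encode one unfragmented, unmasked text frame (server to client).

function encodeServerTextFrame(text: string): Uint8Array {
  const payload = new TextEncoder().encode(text); // text frames carry UTF-8
  const len = payload.length;
  let header: number[];
  if (len < 126) {
    header = [0x81, len]; // FIN=1, opcode 0x1 (text); mask bit 0, 7-bit length
  } else if (len < 65536) {
    header = [0x81, 126, len >> 8, len & 0xff]; // 16-bit extended length
  } else {
    // 64-bit extended length; this sketch only fills the low 32 bits.
    header = [0x81, 127, 0, 0, 0, 0,
      (len >>> 24) & 0xff, (len >>> 16) & 0xff, (len >>> 8) & 0xff, len & 0xff];
  }
  const frame = new Uint8Array(header.length + len);
  frame.set(header, 0);
  frame.set(payload, header.length);
  return frame;
}

console.log(Array.from(encodeServerTextFrame("hi"), (b) => b.toString(16)));
// → [ '81', '2', '68', '69' ]
```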

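Reference articles:
A Survey of Web Instant Messaging Technologies: Short Polling, Comet, WebSocket, SSE
Beginner's Guide: The Most Complete Explanation of Web Instant Messaging Principles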