JavaScript Execution Mechanism (8) Garbage Collection Mechanism

As we all know, JavaScript is a language for automatic garbage collection, which means that we don’t need to manually collect garbage data. All of this is done by V8’s garbage collector. In order to collect garbage more efficiently, V8 introduced two garbage collectors, each targeting different scenarios.

Let’s analyze the usage scenarios and working principles of these two recycling mechanisms together.

How is junk data generated?

For example, the following code:

1
2
3
function gc() {
const a = new Object();
}

This code will first create the function execution context on the stack, then create a new object on the heap, and then store the pointer to that object in the context of’gc '.

Then when the’gc 'function exits, the a in the stack is deleted, and the object corresponding to the pointer stored in the a will become garbage data in the heap, which is invalid.

From this example, we can also see that the garbage collection algorithm targets the heap.

Garbage collection algorithm steps

So how is garbage collection achieved? It can be roughly divided into the following steps:

The first step is to tag the active objects and inactive objects in the GC Root space.

The current reachability algorithm used by V8 determines whether objects in the heap are active objects. Specifically, this algorithm uses some GC Roots as a set of initially viable objects, starting from the GC Roots object, and traversing all objects in the GC Root:

This is a bit like multi-source BFS/DFS. Starting from multiple root nodes, search the entire tree. If it can be searched, it means that the corresponding memory address is still in use, and the remaining space in the heap is unused.

  • Through the GC Root traverse to the object, we consider the object is accessible (reachable), then we must ensure that these objects should be retained in memory, we also call the accessible object as the active object;

  • Objects that are not traversed by GC Roots are unreachable, so these unreachable objects may be reclaimed. We call inaccessible objects inactive objects.

In the browser environment, there are many GC Roots, usually including the following (but more than these):

  • the global window object (located in each iframe);

  • doc DOM tree, consisting of all native DOM nodes that can be reached by traversing the doc;

  • Store variables on the stack.

The second step is to reclaim the memory occupied by the inactive object. In fact, after all the marks are completed, all the objects marked as recyclable in the memory are cleaned up uniformly.

** The third step ** is to do memory defragmentation. Generally speaking, after frequent collection of objects, there will be a large amount of discontinuous space in memory. We call these discontinuous Memory Spaces memory fragments. When a large number of memory fragments appear in the memory, if a large contiguous memory needs to be allocated, there may be insufficient memory, so the last step needs to ** defragment these memory fragments **. But this step is actually optional, because some garbage collectors do not generate memory fragments, such as the secondary garbage collector we will introduce next.

Two types of garbage collectors

V8 currently uses two garbage collectors, ** the main garbage collector - Major GC and the secondary garbage collector - Minor GC (Scavenger) **. The reason why V8 uses two garbage collectors is mainly influenced by The Generational Hypothesis

The intergenerational hypothesis is an important term in the field of garbage collection, which has the following two characteristics:

  • The first is that most objects are “live and die”, which means that most objects live in memory for a short time, such as variables declared inside a function, or variables in a block-level scope. When the function or Code Block ends, the variables defined in the scope will be destroyed. Therefore, once this type of object is allocated memory, it quickly becomes inaccessible;

  • The second is immortal objects that will live longer, such as global window, DOM, Web API, etc.

For those objects with short lifetime, due to their short lifetime characteristics, we need to recycle them frequently, so we first do not need to apply for much space for such objects, because they will be recycled soon, and secondly, Due to frequent recycling, if you have to defragment the memory every time, it will consume performance.

In view of these two points, we need to apply a small Memory Space for this variable, called the new generation, the new generation of garbage collector, is the secondary collector, which is characterized by not generating memory fragments.

For objects that survive for a long time, a large Memory Space is needed, called the old generation. The garbage collector of the old generation is the main collector, which is characterized by memory fragmentation and requires memory sorting, but because it is almost Will not die, so there is almost no need to sort out memory.

Secondary garbage collector

The secondary garbage collector is mainly responsible for garbage collection of the new generation. Usually, most small objects are allocated to the new generation, so although this area is not large, garbage collection is still relatively frequent. The garbage data in the new generation is processed by the Scavenge algorithm. The so-called Scavenge algorithm divides the new generation space into two regions in half, ** half is the object area (from-space) and half is the free area (to-space) **, as shown in the figure below:

Newly added objects will be stored in the object area. ** When the object area is almost full, a garbage cleaning operation needs to be performed. **

In the garbage collection process, the garbage in the object area must first be marked; after the marking is completed, it enters the garbage cleaning stage. The secondary garbage collector will copy these surviving objects to the free area, and it will also arrange these objects in an orderly manner, so this copying process is equivalent to completing the memory sorting operation, and there will be no memory fragmentation in the free area after copying.

After copying, the object area and the free area perform role flipping, that is, the original object area becomes the free area, and the original free area becomes the object area. This completes the garbage object collection operation, and at the same time, this role reversal operation can also make these two areas in the new generation infinitely reused.

Each time the sub-garbage collector performs a clean-up operation, it is necessary to copy the surviving object from the object area to the free area. The copy operation requires time cost. If the space in the new area is set too large, the time for each clean-up will be too long. Therefore, in order to perform efficiently, the space in the general new area will be set relatively small.

It is precisely because the space in the new area is not large, so it is easy to fill the entire area with surviving objects. Once the secondary garbage collector is full of monitoring objects, it will perform garbage collection. At the same time, the secondary garbage collector will also use the Object Promotion Strategy, which is to move objects that are still alive after two garbage collections to the old generation.

Main garbage collector

The main garbage collector is mainly responsible for garbage collection in the old generation. In addition to objects promoted in the new generation, some large objects are directly assigned to the old generation. Therefore, objects in the old generation have two characteristics:

  • One is that the object occupies a large space;

  • The other is that the object survives for a long time.

Since the objects of the old generation are relatively large, if you want to use the Scavenge algorithm for garbage collection in the old generation, it will take more time to copy these large objects, resulting in inefficient collection execution and wasting half of the space. Therefore, the main garbage collector uses the Mark-Sweep algorithm for garbage collection.

The first is the marking process stage. The marking stage starts from a set of root elements, and recursion traverses this set of root elements. During this traversal process, the elements that can be reached are called active objects, and the elements that cannot be reached can be judged as garbage data.

The next step is the garbage removal process. It is completely different from the garbage removal process of the secondary garbage collector. The main garbage collector will directly clean up the data marked as garbage.

Mark the garbage data and then clear it. This is the mark-clear algorithm. However, after executing the mark-clear algorithm multiple times on a piece of memory, a large number of discontinuous memory fragments will be generated. Too much fragmentation will cause large objects to fail to allocate enough contiguous memory, so another algorithm is introduced - Mark-Compact.

The labeling process of this algorithm is still the same as in the labeling-clearing algorithm. The recyclable object is marked first, but the subsequent steps are not to clean up the recyclable object directly, but to move all the surviving objects to one end, and then clean up directly. Remove the memory outside this end. You can refer to the following figure:

How to improve garbage collection efficiency with V8

Since JavaScript runs on the main thread, once the garbage collection algorithm is executed, it is necessary to pause the executing JavaScript script and resume the script execution after the garbage collection is completed. We call this behavior Stop-The-World.

A complete garbage collection is divided into two stages: marking and cleaning. After the garbage data is marked, V8 will continue to perform cleaning and sorting operations. Although the main garbage collector and the secondary garbage collector have slightly different processing methods, they are all main threads. During the garbage collection process, other tasks on the main thread will be suspended. The specific execution effect of the full pause is shown in the following figure:

It can be seen that executing garbage collection will occupy the time of the main thread. If the garbage collector occupies the main thread for too long during the process of executing garbage collection, as shown in the picture above, it takes 200 milliseconds. During these 200 milliseconds, the main thread cannot do other things. For example, the page is executing a JavaScript animation, and because the garbage collector is working, the animation cannot be executed within these 200 milliseconds, resulting in poor page Jank and poor user experience.

In order to solve the User Experience problem caused by the full pause, the V8 team has worked hard for many years to add garbage collection technologies such as parallelism, concurrency, and increment to the existing garbage collector, and has also achieved some results. These technologies mainly solve the garbage collection efficiency problem from two aspects:

  • First, split a complete garbage collection task into ** multiple small tasks **, thus eliminating a single long garbage collection task;

  • second, transfer tasks such as marking objects and moving objects to the background thread, which will greatly reduce the time of the main thread to pause, improve the problem of page Stuttering, and make animation, Scrolling and user interaction smoother.

Parallel recovery

Since it is time-consuming to perform a complete garbage collection process, the first idea to solve the efficiency problem is to introduce multiple auxiliary threads to process in parallel when the main thread performs the task of garbage collection, which will accelerate the execution speed of garbage collection. Therefore, the V8 team introduced a parallel collection mechanism

The so-called parallel collection means that during the execution of the garbage collector on the main thread, it will also open multiple assistant threads to perform the same collection work at the same time.

When using parallel collection, the time consumed by garbage collection is equal to the time consumed by the total helper threads (the number of helper threads multiplied by the time consumed by individual threads), plus some synchronization overhead time. This approach is relatively simple because during the execution of garbage marking, the main thread does not execute JavaScript code at the same time, so the JavaScript code will not change the collection process. ** So we can assume that the memory state is static **, so just make sure that only one helper thread is accessing the object at the same time.

** V8’s secondary garbage collector adopts a parallel strategy **. During the process of executing garbage collection, it starts multiple threads to be responsible for garbage cleaning operations in the new generation. These threads also move the data in the object space to the free area. Since the address of the data has changed, the pointers that refer to these objects need to be updated synchronously.

Incremental recycling (dismantling tasks)

Although the parallel strategy can increase the efficiency of garbage collection and optimize the secondary garbage collector well, it is still a full-stop garbage collection method. The auxiliary thread will only be started when the main thread executes the collection work, which will still exist. Efficiency issues. For example, the old generation stores some large objects, such as window and DOM, and it will still take a long time to fully execute the garbage collection of the old generation. These large objects are all part of the main garbage collector, so in 2011, V8 introduced incremental markup, which we call incremental garbage collection.

The so-called incremental garbage collection refers to the garbage collector breaking the labeling work into smaller blocks and interspersing it between different tasks of the main thread. When using incremental garbage collection, it is not necessary for the garbage collector to perform the complete garbage collection process at once, but only a small part of the entire garbage collection process is executed each time.

Incremental markup algorithm is slightly more complex than the full pause algorithm, mainly because incremental collection is concurrent (concurrent). To achieve incremental execution, two requirements need to be met:

  • Garbage collection can be paused and restarted at any time. ** When paused, you need to save the scan results at that time **, and you can continue to start it after the next wave of garbage collection comes.

  • During the pause, marked garbage data needs to be handled correctly by the garbage collector if it has been modified by JavaScript code.

Three-color mark

Here we need to know that V8 uses black and white to label data before using the incremental algorithm. Before performing a full garbage collection, the garbage collector will set all data to white to indicate that it has not been marked yet, and then the garbage collector will start from GC Roots and mark all accessible data as black. After the traversal, the data marked black is the active data, and the white data is the garbage data.

If the data in memory has only two states, black and white, then when you pause the current garbage collector and resume the garbage collector again, the garbage collector will not know where to continue execution. For example, after the garbage collector performs a short incremental collection, it is paused by V8, and then the main thread executes a piece of JavaScript code, and then the garbage collector is restored, so the memory state when it resumes is shown in the following figure

** If you simply think of continuing to execute the marker from black, which black node should you start from? ** From A? B? Or other black nodes, or even start from scratch?

In order to solve this problem, V8 adopts a three-color marking method, in addition to black and white, gray is also introduced:

  • Black indicates that this node is referenced by GC Root, and the sub-nodes of this node have been marked

  • Gray indicates that this node is referenced by GC Root, but the sub-node has not been marked for processing by the garbage collector, and it also indicates that this node is currently being processed;

  • White indicates that this node has not been visited. If it is still white at the end of this round of traversal, then this piece of data will be recovered.

After introducing the gray mark, the garbage collector can judge whether the entire mark is complete based on whether there are gray nodes in the current memory. If there are no gray nodes, the cleaning work can be carried out. If there are still gray marks, the next time the garbage collector is restored, execution will continue from the gray node.

Write barrier

Simple three-color markers are actually still a bit problematic. For example, the following code:

1
2
3
window.a = Object()
window.a.b = Object()
window.a.b.c=Object()

Then another code was executed, which is as follows:

1
window.a.b = Object() //d

After execution, the garbage collector resumes the incremental marking process. Since b repoints to the d object, the connection between the b and c objects is disconnected. The application of the code at this time is shown in the following figure:

This shows a problem. When the garbage collector marks a node as black, and then the black node is followed by a white node, then the garbage collector will not mark the white node as a black node again, because It has already traveled this path. But this new white node is indeed referenced, so we still need to find a way to mark it as black.

To solve this problem, the incremental garbage collector adds a constraint: black nodes cannot point to white nodes. Usually we use the write-barrier mechanism to implement this constraint, that is, when a black node references a white node, the write barrier mechanism will force the referenced white node to turn gray, thus ensuring that black Nodes cannot point to white nodes constraints.

This method, also known as strong tricolor invariance, guarantees that the garbage collector can correctly recycle data, because all white objects at the end of the marker are unreachable to the garbage collector and can be safely released.

Comprehensive garbage collection strategy for V8

It can be seen that the main garbage collector uses these three strategies at the same time.

  • First of all, the main garbage collector mainly uses concurrent tags. We can see that when the main thread executes JavaScript, the auxiliary thread starts to execute the tagging operation, so the tagging is done in the auxiliary thread.

  • After marking is complete, perform parallel cleanup operations. While the main thread is performing cleanup operations, multiple worker threads are also performing cleanup operations.

In addition, the main garbage collector also uses incremental markup, and the cleaning task will be interspersed between various JavaScript tasks.