JavaScript Execution Mechanism (9) Phase Summary

How Google V8 executes a piece of JavaScript code

This article is a simple, unified summary of some well-worn JavaScript features, together with an explanation of how V8 implements them.

The main purpose is to walk through V8's execution flow and string these familiar features together, so that we form an overall picture and understand why JavaScript is designed the way it is.

Some reflections are interspersed along the way, such as problems we should avoid when coding.

How to Execute JavaScript in V8

Two ways of high-level language execution

A piece of code must eventually become binary code that the CPU can recognize before it can run.

Therefore, the code of the high-level language must be compiled before it can be executed.

There are two ways to execute it, however: interpreted execution and compiled execution.

Note that interpreted execution and compiled execution here are not exactly the same as the usual notions of interpreted languages and compiled languages (see: Difference between interpreted language and compiled language).

In interpreted execution, the parser parses the high-level language into intermediate code, and the virtual machine that ships with the language then executes this intermediate code by simulating a CPU and its stack.

img

In compiled execution, the compiler compiles the intermediate code into binary code, which is then handed directly to the CPU for execution.

img

It looks as if we have to choose one, but V8's attitude is that only children make choices, and it wants both.

V8 uses JIT (Just In Time) technology, which mixes a compiler with an interpreter. This is a trade-off, because each approach has its own advantages and disadvantages: interpreted execution starts quickly but runs slowly, while compiled execution starts slowly but runs fast. The following flowchart shows the complete process of V8 executing JavaScript:

img

After V8 starts, it requests memory for the heap and the stack. Storage on the stack is contiguous, while data in the heap can be stored non-contiguously.

Then V8 initializes the global context, which contains a lot of information, such as the variable environment of the current context and the global-scope variables in that variable environment.

Finally, V8 initializes the event loop system. In essence it is a message queue, and each function in the queue is what we usually call a macro task.

Why does V8 use bytecode?

Bytecode is the intermediate code produced during compilation. You can think of bytecode as an abstraction of machine code. In V8, bytecode serves two purposes:

  • First, the interpreter can interpret and execute bytecode directly;
  • Second, the optimizing compiler can compile bytecode into binary machine code, which is then executed.

Although the current architecture uses bytecode, early V8 was not designed this way. At that time, the V8 team believed that generating bytecode first and then executing it added an extra intermediate step, and that the extra step would cost execution speed.

To improve execution speed, early V8 compiled JavaScript source code directly into unoptimized binary machine code. If a piece of binary code was executed very frequently, V8 marked it as hot code, and the optimizing compiler recompiled it into more efficient machine code.

When JavaScript runs in the browser, V8 has to compile it first. Early V8 compiled JavaScript into unoptimized binary machine code and then executed it. Compilation usually takes up a large share of the time. The following graph shows the compilation and execution time of a piece of code:

img

As the figure shows, compilation and execution take about the same amount of time. Now imagine the same page being opened again in the browser. If the JavaScript file in the page has not been modified, compiling it again produces exactly the same binary code, which means the compilation step wastes CPU resources, because the code has already been compiled once.

This is why Chrome introduced binary code caching, which eliminates redundant compilation by keeping the compiled binary code in memory and reusing it on later calls, thus saving the time needed to compile again.

However, as mobile devices became popular, the V8 team gradually found two fatal problems with compiling JavaScript source code directly into binary code:

  • Time: compilation takes too long, which hurts startup speed;
  • Space: the cached binary code takes up a lot of memory.

These two problems will undoubtedly hinder the popularity of V8 on mobile devices, so the V8 team refactored the code on a large scale and introduced bytecode in the middle. The advantages of bytecode are as follows:

  • Solve the startup problem: the time to generate bytecode is very short;
  • Solve the space problem: bytecode does not occupy much memory, and caching bytecode will greatly reduce memory usage;
  • Clear code architecture: Using bytecode can simplify the complexity of the program and make it easier to port V8 to different CPU architecture platforms.

Variables: How to store and find them quickly

Hoisting: without a block-level context, there was no block-level scope

JavaScript Execution Mechanism (1): Variable Hoisting

Hoisting exists because JavaScript was designed in a very short time, and having only a global scope and a function scope kept compilation simple.

Why is that? Looking at the call stack, we can see that it contains only the global context and function contexts, which map naturally onto the global scope and function scope.

Without a block-level context, no block-level scope was designed.

Since a variable's scope is the whole function, the variable must be accessible anywhere in that function, so its declaration is hoisted.

“Variable hoisting” is usually described, and simulated, as the declarations of variables and functions being physically moved to the top of the code. That description is not accurate. In fact, the position of variable and function declarations in the code does not change; the JavaScript engine puts them into memory during the compilation stage.

https://res.cloudinary.com/dvtfhjxi4/image/upload/v1615119851/origin-of-ray/微信截图_20210307201954_os3t8h.png
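
The effect of putting declarations into memory at compile time can be observed from ordinary code. The following is a minimal sketch; `hoistingDemo`, `message`, and `greet` are made-up names used only for illustration.

```javascript
// Illustrative sketch: all names here are hypothetical.
function hoistingDemo() {
  const seen = [];

  // `message` already exists here because its declaration was registered
  // before execution started; only the assignment has not run yet.
  seen.push(typeof message); // "undefined"

  // A function declaration is registered together with its body,
  // so it can be called before the line that declares it.
  seen.push(greet()); // "hello"

  var message = 'assigned';
  seen.push(message); // "assigned"

  function greet() {
    return 'hello';
  }

  return seen;
}

const hoistingResult = hoistingDemo();
console.log(hoistingResult); // [ 'undefined', 'hello', 'assigned' ]
```

Nothing in the source text moved; the declarations were simply visible from the first line of the function.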

Scope: Add block-level scope, how to find variables in the current context

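    function foo() {
      var a = 1
      let b = 2
      {
        let b = 3
        var c = 4
        let d = 5
        console.log(a) // 1
        console.log(b) // 3 (the inner b shadows the outer one)
      }
      console.log(b) // 2
      console.log(c) // 4 (var is function-scoped)
      console.log(d) // ReferenceError: d is not defined (let is block-scoped)
    }
    foo()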

img

JavaScript Execution Mechanism (2): Scope

Scope chain: How to find variables in the call stack if they cannot be found in the current context

    function bar() {
      // Prints "bigTree": bar is declared in the global scope,
      // so the lookup continues there, not in foo
      console.log(myName)
    }
    function foo() {
      var myName = "big tree"
      bar()
    }
    var myName = "bigTree"
    foo()

After finding a variable, how to find a property on a variable itself?

JavaScript Execution Mechanism (7): How to Quickly Look Up Properties on an Object

Prototype chain: If the variable itself does not have this property, where to find it

JavaScript Execution Mechanism (5): Explaining the Prototype Chain with Formulas

    function Parent() {}
    const p1 = new Parent();

If const a = new A(), then a.__proto__ === A.prototype. The one special case is Object.prototype.__proto__ === null.

    function Parent() {}

    // Because
    const p1 = new Parent();
    // therefore
    p1.__proto__ === Parent.prototype;                 // true

    // Parent is a function, conceptually Parent = new Function(), so
    Parent.__proto__ === Function.prototype;           // true

    // All prototypes are ordinary objects, conceptually created by new Object(), so
    Parent.prototype.__proto__ === Object.prototype;   // true
    Function.prototype.__proto__ === Object.prototype; // true

    // Object and Function can follow `new`, so they are constructors too,
    // conceptually Object = new Function(), so
    Object.__proto__ === Function.prototype;           // true
    Function.__proto__ === Function.prototype;         // true
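
To make the lookup rule concrete, here is a small sketch of walking the chain by hand. `findOwner` is a hypothetical helper written for this article, not an engine API; it only mirrors the order in which a property lookup visits objects.

```javascript
// Hypothetical helper: walks the prototype chain the way a property lookup does
// and returns the object that actually owns `key`, or null if no object does.
function findOwner(obj, key) {
  let current = obj;
  while (current !== null) {
    if (Object.prototype.hasOwnProperty.call(current, key)) {
      return current;
    }
    current = Object.getPrototypeOf(current); // the same link that __proto__ exposes
  }
  return null; // the walk ended at Object.prototype.__proto__ === null
}

function Parent() {}
Parent.prototype.sayHi = function () { return 'hi'; };
const p1 = new Parent();
p1.ownValue = 1;

console.log(findOwner(p1, 'ownValue') === p1);               // true
console.log(findOwner(p1, 'sayHi') === Parent.prototype);    // true
console.log(findOwner(p1, 'toString') === Object.prototype); // true
console.log(findOwner(p1, 'missing'));                       // null
```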

Thinking: What code security issues will the prototype chain bring?

Can a front-end prototype pollution vulnerability really lead to a server shell?
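
As a minimal sketch of the problem, consider a naive recursive merge applied to untrusted JSON. The `merge` function, the `safe` option, and `BLOCKED_KEYS` are hypothetical names invented for this illustration and are not taken from any library.

```javascript
// Hypothetical illustration of prototype pollution through a naive deep merge.
const BLOCKED_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

function merge(target, source, { safe }) {
  for (const key of Object.keys(source)) {
    // In safe mode, refuse keys that can reach the prototype chain.
    if (safe && BLOCKED_KEYS.has(key)) continue;
    const value = source[key];
    if (value !== null && typeof value === 'object') {
      if (target[key] === null || typeof target[key] !== 'object') {
        target[key] = {};
      }
      merge(target[key], value, { safe });
    } else {
      target[key] = value;
    }
  }
  return target;
}

// JSON.parse creates an own property literally named "__proto__".
const payload = JSON.parse('{"__proto__": {"polluted": true}}');

merge({}, payload, { safe: false });
const pollutedAfterUnsafe = ({}).polluted === true; // every object now "has" the property
delete Object.prototype.polluted;                   // undo the damage for this demo

merge({}, payload, { safe: true });
const pollutedAfterSafe = ({}).polluted === true;

console.log(pollutedAfterUnsafe, pollutedAfterSafe); // true false
```

The unsafe call writes through `target.__proto__` into `Object.prototype`, which every ordinary object inherits from. Skipping the blocked keys is one possible guard among several.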

Where are variables stored: on the heap or on the stack?

Are primitive values stored on the call stack, and objects and functions on the heap?

If primitive values live on the stack, how can closures be implemented?

this

this is a pointer that stores the address of an object, just as an ordinary object variable stores the address of the object in the heap.
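
Which object `this` points to is decided when the function is called. The sketch below uses made-up names (`whoAmI`, `user`, `admin`) and puts the function in strict mode so that a plain call leaves `this` undefined instead of falling back to the global object.

```javascript
// Illustrative names only.
function whoAmI() {
  'use strict';
  return this === undefined ? 'undefined' : this.name;
}

const user = { name: 'user', whoAmI };
const admin = { name: 'admin' };

const results = [
  user.whoAmI(),        // called as a method: this === user
  whoAmI(),             // plain call: this === undefined in strict mode
  whoAmI.call(admin),   // explicit binding: this === admin
  whoAmI.bind(admin)(), // bound function: this === admin
];

console.log(results); // [ 'user', 'undefined', 'admin', 'admin' ]
```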

JavaScript Execution Mechanism (4): What Does this Point To?

Garbage collection mechanism

The first step is to mark active and inactive objects in the heap, starting from the GC roots. V8 currently uses a reachability algorithm to decide whether an object in the heap is active: an object that can be reached from a GC root is active, and one that cannot is inactive.

The second step is to reclaim the memory occupied by inactive objects. Once marking is complete, all objects marked as reclaimable are cleaned up in one pass.

The third step is memory defragmentation. After objects have been collected many times, memory contains a large amount of non-contiguous free space, which we call memory fragments. When there are many fragments, allocating a large contiguous block may fail for lack of memory, so the last step is to defragment. This step is optional, because some garbage collectors do not produce fragments, such as the secondary garbage collector introduced next.
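
The three steps can be sketched with a toy model. This is an illustration under simplifying assumptions, not how V8 represents its heap: each "heap object" is a plain record with a hypothetical `id` and a `refs` list of the ids it references, and compaction is modeled by returning a dense array of survivors.

```javascript
// Toy model only: real V8 heaps are not arrays of plain objects.
function markAndSweep(heap, rootIds) {
  const byId = new Map(heap.map((obj) => [obj.id, obj]));
  const marked = new Set();
  const pending = [...rootIds];

  // Step 1: mark everything reachable from the roots.
  while (pending.length > 0) {
    const id = pending.pop();
    if (marked.has(id) || !byId.has(id)) continue;
    marked.add(id);
    pending.push(...byId.get(id).refs);
  }

  // Step 2: sweep, keeping only marked objects.
  // Step 3: the filtered array is dense, which stands in for compaction.
  return heap.filter((obj) => marked.has(obj.id));
}

const heap = [
  { id: 'window', refs: ['app'] },
  { id: 'app', refs: ['cache'] },
  { id: 'cache', refs: [] },
  { id: 'orphanA', refs: ['orphanB'] },
  { id: 'orphanB', refs: ['orphanA'] }, // a cycle, but unreachable from the root
];

const survivors = markAndSweep(heap, ['window']).map((obj) => obj.id);
console.log(survivors); // [ 'window', 'app', 'cache' ]
```

The two orphans reference each other, yet both are collected, because reachability is measured from the roots and not by counting references.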

V8 currently uses two garbage collectors: the main garbage collector (Major GC) and the secondary garbage collector (Minor GC, also called the Scavenger). V8 uses two collectors mainly because of the generational hypothesis, which makes two observations:

  • First, most objects die young: they live in memory only briefly, such as variables declared inside a function or in a block-level scope. When the function or code block finishes, the variables defined in that scope are destroyed, so objects of this kind become unreachable soon after they are allocated;
  • Second, objects that do not die young tend to live much longer, such as the global window, DOM, and Web API objects.

Therefore, in V8, the heap will be divided into two areas: the new generation and the old generation. The objects stored in the new generation are objects with short survival time, and the objects stored in the old generation have long survival time.

The new generation usually has a capacity of only 1 to 8 MB, while the old generation is much larger. V8 uses a different garbage collector for each region so that garbage collection is more efficient.

Secondary garbage collector - Minor GC (Scavenger), mainly responsible for garbage collection of the new generation. Main garbage collector - Major GC, mainly responsible for garbage collection of the old generation.

JavaScript Execution Mechanism (8): Garbage Collection Mechanism

Thinking: Several common memory problems

Memory problems can be defined in the following three categories:

  • Memory leaks, which cause page performance to get worse and worse;
  • Memory bloat, which causes page performance to stay poor;
  • Frequent garbage collection, which causes the page to lag or pause frequently.

Memory leak

Essentially, a memory leak can be defined as: when the process no longer needs some memory, the memory that is no longer needed is still not reclaimed by the process. In JavaScript, the main cause of memory leaks is that memory data that is no longer needed (useless) is still referenced by other objects.

  1. Mounting data on window

    function foo() {
      // Create a temporary temp_array
      temp_array = new Array(200000)
      /**
       * Use temp_array
       */
    }

Because temp_array in the function body is not declared with var, let, or const, in non-strict mode V8 treats it as this.temp_array when this code runs.

In browsers, this points to the window object by default, and the window object stays in memory for the lifetime of the page. So even after foo returns, temp_array is still referenced by window and stays in memory along with it. temp_array is no longer used, yet it is still referenced by the window object, and that is the leak.
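
One way to catch this mistake early is strict mode, where assigning to an undeclared name throws instead of creating a global. The sketch below repeats the same mistake inside a strict-mode function; `fooStrict` is a made-up name.

```javascript
// Sketch: the same missing declaration, but inside a strict-mode function.
function fooStrict() {
  'use strict';
  try {
    temp_array = new Array(200000); // still missing var/let/const
    return 'leaked';
  } catch (err) {
    return err.name; // strict mode refuses to create the global
  }
}

const outcome = fooStrict();
console.log(outcome);                      // ReferenceError
console.log(typeof globalThis.temp_array); // undefined
```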

  2. Closures

    function foo() {
      var temp_object = new Object()
      temp_object.x = 1
      temp_object.y = 2
      temp_object.array = new Array(200000)
      /**
       * Use temp_object
       */
      return function () {
        console.log(temp_object.x);
      }
    }

As you can see, foo uses a local temporary variable temp_object, which has three properties: x, y, and a very memory-intensive array. Finally, foo returns an anonymous function that references temp_object. After foo is called, the returned anonymous function still references temp_object, so temp_object cannot be destroyed. Even though the closure only uses temp_object.x, the entire temp_object object stays in memory.

To solve this problem, decide case by case which data the returned closure really needs to reference, and do not reference anything else. In the example above, the returned function only needs the value of temp_object.x, so the code can be rewritten as follows:

    function foo() {
      var temp_object = new Object()
      temp_object.x = 1
      temp_object.y = 2
      temp_object.array = new Array(200000)
      /**
       * Use temp_object
       */
      // Copy out the one value the closure needs
      let closure = temp_object.x
      return function () {
        console.log(closure);
      }
    }

  3. Detached nodes

Memory leaks can also be caused by JavaScript holding references to DOM nodes. A DOM node is garbage collected only when neither the DOM tree nor JavaScript code references it. If a node has been removed from the DOM tree but JavaScript still references it, we call the node “detached”.

    let detachedTree;
    function create() {
      var ul = document.createElement('ul');
      for (var i = 0; i < 100; i++) {
        var li = document.createElement('li');
        ul.appendChild(li);
      }
      // The ul is never attached to the DOM tree, but this global keeps it alive
      detachedTree = ul;
    }
    create()

Memory bloat

Memory bloat differs from a memory leak. It mainly comes from poor memory management by the programmer: for example, a task needs only 50 MB of memory, but the program uses 500 MB.

Excessive memory use may come from poor use of caching or from loading resources that are not needed. It typically shows up as rapid memory growth over a period of time, after which memory use plateaus at a high level while the page keeps running.

Frequent garbage collection

Besides memory leaks and memory bloat, there is another kind of memory problem: frequent use of large temporary variables fills the new generation quickly, which triggers garbage collection again and again. Frequent garbage collection makes the page feel as if it is stuttering. Consider the following code:

    function strToArray(str) {
      let i = 0
      const len = str.length
      let arr = new Uint16Array(str.length)
      for (; i < len; ++i) {
        arr[i] = str.charCodeAt(i)
      }
      return arr;
    }

    function foo() {
      let i = 0
      let str = 'test V8 GC'
      while (i++ < 1e5) {
        strToArray(str);
      }
    }

    foo()

This code creates temporary variables over and over, which quickly fills the new generation and triggers garbage collection frequently. To avoid this, consider moving such temporary variables into an outer scope so that they can be reused.
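
One possible shape of that change is sketched below. It assumes that callers are done with the previous result before requesting the next one, because the buffer is shared; `MAX_LEN`, `sharedBuffer`, and `strToArrayInto` are hypothetical names introduced for this example.

```javascript
// Sketch: reuse one buffer from an outer scope instead of allocating per call.
const MAX_LEN = 64; // hypothetical upper bound on the input length
const sharedBuffer = new Uint16Array(MAX_LEN);

function strToArrayInto(str, buffer) {
  const len = Math.min(str.length, buffer.length);
  for (let i = 0; i < len; ++i) {
    buffer[i] = str.charCodeAt(i);
  }
  return len; // number of code units written into the buffer
}

function foo() {
  const str = 'test V8 GC';
  let count = 0;
  for (let i = 0; i < 1e5; i++) {
    count = strToArrayInto(str, sharedBuffer); // no new array per iteration
  }
  return count;
}

const written = foo();
console.log(written, sharedBuffer[0]); // 10 116
```

The trade-off is that the result is only valid until the next call, so this fits hot loops where the data is consumed immediately.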

Function

Why functions are first-class citizens in JavaScript

If functions in a programming language can do everything the language's other data types can do, we say that functions are first-class citizens in that language.

  • A function is an object: it has its own properties and values, so it is associated with that underlying property and value storage;
  • A function is a special object because it can be called, so when it is called it also needs to be associated with the relevant execution context.

Because functions are first-class citizens, JavaScript can easily implement features such as closures and functional programming, which are harder to implement in some other languages.
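
The following sketch shows what "doing the same things as other data types" looks like in practice. All names (`add`, `operations`, `applyTwice`, `makeAdder`) are illustrative.

```javascript
// Illustrative names only.
function add(a, b) { return a + b; }

// 1. A function can be stored in a variable or a data structure.
const operations = { add };

// 2. A function is an object, so it can carry its own properties.
add.description = 'adds two numbers';

// 3. A function can be passed as an argument.
function applyTwice(fn, value) { return fn(fn(value, 1), 1); }

// 4. A function can be returned as a value.
function makeAdder(n) { return (x) => add(x, n); }

console.log(operations.add(1, 2)); // 3
console.log(add.description);      // adds two numbers
console.log(applyTwice(add, 0));   // 2
console.log(makeAdder(10)(5));     // 15
```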

Lazy parsing and closures

Lazy parsing

While compiling JavaScript code, V8 does not parse all of the JavaScript into intermediate code at once. There are two main reasons:

  • First, if all the JavaScript code were parsed and compiled at once, the sheer amount of code would increase compilation time, which would seriously slow down the first execution and make the page feel stuck. A page's JavaScript sometimes exceeds 10 MB, and parsing and compiling all of it in one go would greatly increase the time users have to wait;
  • Second, the parsed bytecode and the compiled machine code are stored in memory. If all the JavaScript code were parsed and compiled at once, this intermediate code and machine code would occupy memory the whole time, and in an era when mobile phones are everywhere, memory is a very valuable resource.

For these reasons, all mainstream JavaScript virtual machines implement lazy parsing. Lazy parsing means that when the parser encounters a function declaration, it skips the code inside the function and does not generate an AST or bytecode for it; it only generates the AST and bytecode for the top-level code.

    function foo(a, b) {
      var d = 100
      var f = 10
      return d + f + a + b;
    }
    var a = 1
    var c = 4
    foo(1, 5)

When V8 processes this code, it parses it from top to bottom. It first encounters the foo function. Since this is only a function declaration, at this stage V8 only needs to convert the function into a function object. Note that the declaration is merely turned into a function object; the code inside the function is not parsed or compiled, so no syntax tree is generated for the body of foo.

After parsing is complete, V8 executes the executable code from top to bottom. It first executes the two assignments “a = 1” and “c = 4”, and then executes the call to foo. To do so, it extracts the function code from the foo function object and, just as it did for the top-level code, compiles foo first: the function is turned into a syntax tree and bytecode, and then interpreted and executed.

Basis of closures: functions as first-class citizens

JavaScript allows new functions to be defined inside functions.

Variables defined in the parent function can be accessed in the inner function.

Because functions are first-class citizens, a function can be used as a return value.

Thinking: How are closures implemented under lazy parsing?

    function foo() {
      var d = 20
      return function inner(a, b) {
        const c = a + b + d
        return c
      }
    }
    const f = foo()

We can analyze how the code above executes:

  • When foo is called, it returns its inner function, which is assigned to the global variable f;
  • Then foo finishes executing and its execution context is destroyed by V8;
  • Although the execution context of foo is destroyed, the inner function is still alive and references the variable d in foo's scope.

Therefore, if closure variables were stored on the call stack, they would be destroyed when the function call ends.

To solve this problem, V8 uses the pre-parser to detect variables that are referenced by inner functions, and copies those variables from the stack to the heap.
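
From the language side, the observable result is that a captured variable outlives the call that created it. The sketch below only demonstrates that behavior; where the engine actually stores `count` and `scratch` is an internal detail that the code cannot show. `makeCounter` and its variables are made-up names.

```javascript
// Illustrative sketch of a captured variable surviving its function call.
function makeCounter() {
  let count = 0;        // referenced by the inner function, so it must survive
  let scratch = 'temp'; // not referenced by the inner function
  scratch = scratch.toUpperCase();
  return function next() {
    return ++count;
  };
}

const counterA = makeCounter();
const counterB = makeCounter();

counterA();
const second = counterA();      // counterA's own `count` persisted between calls
const firstOfB = counterB();    // counterB has a separate `count`

console.log(second, firstOfB); // 2 1
```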

Event

Prior knowledge:

Process, thread, coroutine

Electron multi-process solution

Message queue (macro task)

Micro-tasks (solved the problem of uncontrollable macro task execution timing)

Macro tasks must be placed in the message queue first. If some macro task takes too long to execute, it delays the macro tasks behind it in the queue, and this delay is uncontrollable, because you cannot know how long the preceding macro tasks will take. So JavaScript introduced microtasks, which are executed when the current task is about to finish. With microtasks, you can control the timing of your callback much more precisely.

V8 maintains a microtask queue for each macro task. When V8 executes a piece of JavaScript, it creates an environment object for that code, and the microtask queue is stored in this environment object. When you create a microtask through Promise.resolve, V8 automatically adds it to the microtask queue. When the whole piece of code is about to finish executing, the environment object is destroyed, but before destroying it V8 first processes the microtasks in the microtask queue.

  • First, if a microtask is created in the current task, for example through Promise.resolve() or Promise.reject(), it is not executed inside the current function, so executing microtasks does not make the stack grow without bound;
  • Second, unlike other asynchronous calls, microtasks are still executed before the current task ends, which means that other tasks in the message queue cannot run until the current microtasks have finished.

    function bar() {
      console.log('bar')
      Promise.resolve().then(
        (str) => console.log('micro-bar')
      )
      setTimeout((str) => console.log('macro-bar'), 0)
    }

    function foo() {
      console.log('foo')
      Promise.resolve().then(
        (str) => console.log('micro-foo')
      )
      setTimeout((str) => console.log('macro-foo'), 0)

      bar()
    }
    foo()
    console.log('global')
    Promise.resolve().then(
      (str) => console.log('micro-global')
    )
    setTimeout((str) => console.log('macro-global'), 0)

    foo
    bar
    global
    micro-foo
    micro-bar
    micro-global
    macro-foo
    macro-bar
    macro-global
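
The ordering rule behind this output can be reproduced with a toy model. The sketch below is not how V8 or the browser implements the event loop; `runToyLoop` and its `api` object are hypothetical, with `api.macro` standing in for setTimeout(fn, 0) and `api.micro` standing in for Promise.resolve().then(fn).

```javascript
// Toy model of the ordering rule only.
function runToyLoop(firstTask) {
  const log = [];
  const macroQueue = [firstTask];
  const microQueue = [];

  const api = {
    log: (msg) => log.push(msg),
    macro: (fn) => macroQueue.push(fn),
    micro: (fn) => microQueue.push(fn),
  };

  while (macroQueue.length > 0) {
    const task = macroQueue.shift();
    task(api);
    // Drain every microtask, including ones queued by other microtasks,
    // before the next macro task is allowed to start.
    while (microQueue.length > 0) {
      microQueue.shift()(api);
    }
  }
  return log;
}

const order = runToyLoop((api) => {
  api.log('foo');
  api.micro((a) => a.log('micro-foo'));
  api.macro((a) => a.log('macro-foo'));
  api.log('global');
  api.micro((a) => {
    a.log('micro-global');
    a.micro((b) => b.log('micro-nested'));
  });
  api.macro((a) => a.log('macro-global'));
});

console.log(order.join(' > '));
// foo > global > micro-foo > micro-global > micro-nested > macro-foo > macro-global
```

Note that the microtask queued from inside another microtask still runs before any macro task, which is why a microtask that keeps queuing microtasks can starve the message queue.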

Evolution of asynchronous syntax

JavaScript Asynchronous Programming Syntax

Think: Many APIs in Node provide synchronous and asynchronous functions. What is the difference?

Take readFile as an example

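    var fs = require('fs')

    // Synchronous: blocks until the file has been read
    var data = fs.readFileSync('test.js')

    // Asynchronous: the callback runs after the read completes
    function fileHandler(err, data) {
      if (err) throw err
      data.toString()
    }

    fs.readFile('test.txt', fileHandler)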

Read files asynchronously

For example, when readFile is executed on Node's main thread, the main thread hands the file name and the callback function to a file read/write thread. The process is as follows:

After the file read/write thread finishes reading, it stores the file content in memory, points the data pointer at that memory, wraps the data and the callback function as a new event, and adds the event to the message queue.

Synchronous read file

Some people find asynchronous file reads and writes too complicated. If the file is small, or file I/O is not the bottleneck of the project, using asynchronous calls and callbacks is more complexity than the job needs. Node therefore also provides a set of synchronous read and write APIs. The readFileSync call in the first code sample is synchronous. The synchronous path is simple: when libuv handles a readFileSync task, it performs the read directly on the main thread, waits for it to finish, and returns the result directly. This is also an application of synchronous callbacks. Of course, while the read or write is in progress, other tasks in the message queue cannot be executed.
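
The difference in timing can be observed with a small sketch. It assumes a Node environment with CommonJS `require` and a writable temporary directory; the scratch file and the `events` array are made up for the demonstration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file so the sketch does not depend on existing files.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'readfile-demo-'));
const file = path.join(dir, 'test.txt');
fs.writeFileSync(file, 'hello');

const events = [];

// Asynchronous: the call returns immediately; the callback runs in a later task.
fs.readFile(file, 'utf8', (err, data) => {
  events.push(err ? 'async failed' : `async got ${data}`);
  console.log(events);
  fs.rmSync(dir, { recursive: true, force: true }); // clean up the scratch directory
});
events.push('readFile returned');

// Synchronous: the main thread blocks here until the content is available.
const text = fs.readFileSync(file, 'utf8');
events.push(`sync got ${text}`);
```

By the time the synchronous code has finished, `events` already holds 'readFile returned' and 'sync got hello'; the asynchronous result is appended only after the current task ends.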

Summary

Execution process

  • Before executing a piece of code, V8 first initializes the runtime environment, such as the heap and stack space and the message queue.
  • V8 then compiles the whole piece of code to generate the AST. After the AST is generated:
  • Variables declared with var are hoisted and placed in the variable environment of the global context.
  • Variables declared with let and const are added to the lexical environment of the global context.
  • If a variable holds a primitive value, it is stored directly on the call stack. If it holds an object or a function, the value is stored on the heap and a pointer to it is stored in the context. A function body is not compiled at this point but only pre-parsed. If the pre-parser finds a closure, the captured variables are copied to the heap.
  • The whole piece of code is put into the message queue as a macro task and the executable code starts running. When a function call is encountered, the function is parsed, its context is created on the stack, and the three steps above are repeated with the function context in place of the global context.
  • If a macro task callback is encountered during execution, it is put at the end of the message queue. If a microtask callback is encountered, it is put in the microtask queue of the global context.
  • Before the global context exits, functions are taken out of the microtask queue one by one and executed.
  • After the microtasks have been executed, the global context is destroyed and the next macro task is taken from the message queue and executed.

Coding notes

  • Avoid changing the structure of an object casually
  • Pay attention to memory issues
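
As a sketch of the first note, the code below contrasts objects that are always built with the same properties in the same order with objects whose layout depends on the code path. The performance effect depends on engine internals and is not measured here; the functions `createPoint` and `createPointInconsistently` are hypothetical.

```javascript
// Preferred: every point is created with the same properties in the same order.
function createPoint(x, y) {
  return { x, y, label: null }; // reserve `label` up front instead of adding it later
}

// Avoid: the property order depends on the code path taken.
function createPointInconsistently(x, y, flip) {
  const p = {};
  if (flip) {
    p.y = y;
    p.x = x;
  } else {
    p.x = x;
    p.y = y;
  }
  return p;
}

const a = createPoint(1, 2);
const b = createPoint(3, 4);
b.label = 'b'; // same structure as `a`; only a value changed

const c = createPointInconsistently(1, 2, false);
const d = createPointInconsistently(1, 2, true);

const sameLayoutAB = Object.keys(a).join() === Object.keys(b).join();
const sameLayoutCD = Object.keys(c).join() === Object.keys(d).join();
console.log(sameLayoutAB, sameLayoutCD); // true false
```

Comparing key order is only a visible proxy for the object's structure; it does not inspect what the engine does internally.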