Unity Memory Best Practices

Recently I noticed that my Unity project consumes a lot of memory at runtime and triggers frequent GC, which makes the game's CPU run hot. So I read the official best-practices documentation, learned a bit about Unity's memory management, and summarized it here.

What does "managed" mean?

When reading about Unity memory management, the term "managed heap" comes up constantly, along with related terms such as "managed code" and "managed thread". So what does this "managed" actually mean?

In layman's terms, Unity itself is written in C++. The reason you can write scripts in C# is that a virtual machine sits underneath, and that virtual machine manages the code and the memory for you.

C# has its own memory collection mechanism: we only new objects and never worry about deleting them, because the garbage collector cleans the memory up for us. This memory is managed memory, and most of the time, working in the C# environment, we are using managed memory. However, C# ultimately runs on top of C++, and sometimes (for example, when we need to call into a third-party C++ or native code library, which is very common in Unity development) we have to work with unmanaged code directly from C#. For that unmanaged memory we must handle allocation and release ourselves; C# provides interfaces for converting between managed and unmanaged representations and for operating on this kind of memory.
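C# exposes this through the System.Runtime.InteropServices.Marshal class, among other APIs. The sketch below is only an illustration of manual unmanaged allocation, not something most Unity code needs to write:

```csharp
using System;
using System.Runtime.InteropServices;

static class UnmanagedExample {
    static void Demo() {
        // This buffer lives in unmanaged memory: the GC never sees it,
        // so we are responsible for releasing it ourselves.
        IntPtr buffer = Marshal.AllocHGlobal(1024);   // 1 KB of unmanaged memory
        try {
            Marshal.WriteInt32(buffer, 42);           // write through the Marshal API
        }
        finally {
            Marshal.FreeHGlobal(buffer);              // manual release, no GC involved
        }
    }
}
```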

Another example: an object created with new in C# never needs an explicit delete, but an asset loaded with Resources.Load must be unloaded manually once we are done with it.
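A minimal sketch of the Unity case (the class name and asset path below are hypothetical):

```csharp
using UnityEngine;

public class ResourceUser : MonoBehaviour {
    Texture2D m_Texture;

    void OnEnable() {
        // The loaded asset lives in native engine memory, not the managed heap,
        // so the garbage collector will not reclaim it for us.
        m_Texture = Resources.Load<Texture2D>("Textures/Example"); // hypothetical path
    }

    void OnDisable() {
        // Release it explicitly once it is no longer needed.
        Resources.UnloadAsset(m_Texture);
        m_Texture = null;
    }
}
```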

Managed heap

Another common problem faced by many Unity developers is unexpected expansion of the managed heap. In Unity, expansion of the managed heap is much easier than contraction. In addition, Unity’s garbage collection strategy tends to fragment memory and therefore may prevent contraction of large heaps.

A "managed heap" is a section of memory that is automatically managed by the memory manager of the project's scripting runtime (Mono or IL2CPP). All objects created in managed code must be allocated on the managed heap (note: strictly speaking, all non-null reference-type objects and all boxed value-type objects must be allocated on the managed heap).

In the above figure, the white box represents the amount of memory allocated to the managed heap, while the colored boxes inside it represent the data values stored in that memory. When more values are needed, more space is allocated from the managed heap.

The garbage collector runs periodically (note: how often depends on the platform). It scans all objects on the heap and marks any that are no longer referenced for deletion. The unreferenced objects are then deleted, freeing up memory.

Crucially, Unity's garbage collector (which uses the Boehm GC algorithm) is non-generational and non-compacting. "Non-generational" means the GC must scan the entire heap on every collection pass, so its performance degrades as the heap grows. "Non-compacting" means objects are never relocated in memory to close the gaps between them.

The "generations" here are the young generation and the old generation. JavaScript engines, for example, use two garbage collection algorithms, and short-lived temporary variables are handled by a generational collector. If you want to know more, see my other blog post: https://sunra.top/posts/29695/

The image above shows an example of memory fragmentation. When an object is freed, its memory is released. However, the freed space does not become part of one large pool of "available memory". The objects on either side of the freed object may still be in use, so the freed space becomes a "gap" between other memory segments (indicated by the red circle in the image above). The newly freed space can therefore only be used to store objects of the same size or smaller than the one that was freed.

Also note that when an object is allocated, it must always occupy a contiguous block of memory.

This leads to the core problem of memory fragmentation: although the total amount of free space in the heap may be large, some or all of it may sit in small "gaps" between allocated objects. In that case, even though the total free space exceeds the amount requested, the managed heap cannot find a contiguous block of memory large enough to satisfy the allocation.

If a large object must be allocated but there is not enough contiguous free space to hold it (as shown above), the Unity memory manager performs two operations.

First, if the garbage collector is not already running, it runs. The collector attempts to free enough space to satisfy the allocation request.

**If, after the GC has run, there is still not enough contiguous space for the requested amount of memory, the heap must expand. The exact amount by which the heap expands is platform-dependent; however, most Unity platforms double the size of the managed heap.**

There are two notable issues with managed heap expansion:

  • Unity does not often free the memory pages allocated to the managed heap after it expands the managed heap; it optimistically preserves the expanded heap, even when most of the heap is empty. This is to prevent the need to re-expand the heap when a large allocation occurs again.

  • On most platforms, Unity does eventually release the pages backing the empty portion of the managed heap to the operating system, but the interval at which this happens is not guaranteed, so do not rely on it to free memory.

The address space used by the managed heap is never returned to the operating system.

  • For 32-bit programs, if the managed heap expands and contracts multiple times, it may cause address space exhaustion. If a program’s available memory address space is exhausted, the operating system will terminate the program.

  • For 64-bit programs, the address space is large enough that a program could run for longer than the average human lifespan without exhausting it, so address-space exhaustion is extremely unlikely.

Temporary allocation

Many Unity projects have tens or hundreds of KB of temporary data allocated to the managed heap per frame. This situation is often extremely detrimental to the performance of the project. Consider the following math:

If a program allocates one kilobyte (1 KB) of temporary memory per frame and runs at 60 frames per second, it allocates 60 KB of temporary memory per second, which adds up to 3.6 MB of garbage per minute. Calling the garbage collector once per second is detrimental to performance, but so is allocating 3.6 MB per minute on a memory-constrained device.

In addition, consider load operations. If a large number of temporary objects are generated during a large resource load operation, and references to these objects persist until the operation is completed, the garbage collector cannot release these temporary objects, and the managed heap needs to be expanded, even if many of the objects it contains will be released shortly after.

Basic memory saving methods

In fact, this advice is not specific to C#; many of the techniques mirror the precautions you would take in JavaScript. For example: reuse collections and arrays (if you repeatedly need a helper array inside a loop, allocate it outside the loop and reuse it on each iteration to avoid frequent GC); reuse empty arrays (if many code paths need to return an empty array, define a single empty array up front and return that one everywhere); and cache frequently accessed object properties (if a property is read on every iteration of a loop, declare a variable outside the loop that points to it so it does not have to be looked up each time).

Collection and Array Reuse

When using a C# collection class or an array, consider reusing or pooling the allocated collection or array whenever possible. Collection classes expose a Clear method that removes the values held in the collection but does not release the memory allocated to it.

This is especially useful when allocating a temporary "helper" collection for a complex calculation. The following code is a very simple example:

```csharp
void Update() {
    List<float> nearestNeighbors = new List<float>();
    findDistancesToNearestNeighbors(nearestNeighbors);
    nearestNeighbors.Sort();
    // ... use the sorted list in some way ...
}
```

In this example, the nearestNeighbors List is allocated once per frame in order to collect a set of data points. It is very simple to hoist this List out of the method and into the containing class, which avoids allocating a new List every frame:

```csharp
List<float> m_NearestNeighbors = new List<float>();

void Update() {
    m_NearestNeighbors.Clear();
    findDistancesToNearestNeighbors(m_NearestNeighbors);
    m_NearestNeighbors.Sort();
    // ... use the sorted list in some way ...
}
```

In this version, the memory of the List is reserved and reused between multiple frames. New memory is allocated only when the List needs to be expanded.

Closures and anonymous methods

There are two things to note when using closures and anonymous methods.

First, all method references in C# are reference types and are therefore allocated on the heap. Passing a method reference as an argument can easily create temporary allocations, and this happens regardless of whether the method being passed is an anonymous method or a predefined one.

Second, turning an anonymous method into a closure significantly increases the amount of memory required to pass the closure to the method that receives it.

Please refer to the following code:

```csharp
List<float> listOfNumbers = createListOfRandomNumbers();

listOfNumbers.Sort( (x, y) =>
    (int)x.CompareTo((int)(y/2))
);
```

This code uses a simple anonymous method to control the sort order of the list of numbers created on the first line. However, to make the snippet reusable, it is tempting to replace the constant 2 with a variable in local scope, like this:

```csharp
List<float> listOfNumbers = createListOfRandomNumbers();

int desiredDivisor = getDesiredDivisor();

listOfNumbers.Sort( (x, y) =>
    (int)x.CompareTo((int)(y / desiredDivisor))
);
```

The anonymous method now needs to access variable state outside its own scope, so it has become a closure. The desiredDivisor variable must somehow be passed into the closure so that the closure's code can actually use it.

To do this, C # generates an anonymous class that holds the outer-scope variables required by the closure. When the closure is passed to the Sort method, a copy of this class is instantiated and initialized with the value of the desiredDivisor integer.

Because executing the closure requires instantiating a copy of this generated class, and all classes in C# are reference types, executing the closure allocates an object on the managed heap.

In general, avoid closures in C # whenever possible. Anonymous methods and method references should be minimized in performance-sensitive code, especially code that needs to be executed every frame.
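One way to follow this advice in the Sort example above is to keep the divisor in a small reusable comparer object instead of capturing a local variable in a lambda. The class and member names below are hypothetical; this is a sketch of the idea, not the only option:

```csharp
using System.Collections.Generic;

// Holds the state that a closure would otherwise have to capture.
class DividedComparer : IComparer<float> {
    public int Divisor = 1;   // set before each Sort call instead of capturing a local

    public int Compare(float x, float y) {
        return (int)x.CompareTo((int)(y / Divisor));
    }
}

class Sorter {
    // Allocated once and reused across calls; no per-call closure object.
    readonly DividedComparer m_Comparer = new DividedComparer();

    public void SortByDivisor(List<float> listOfNumbers, int desiredDivisor) {
        m_Comparer.Divisor = desiredDivisor;
        listOfNumbers.Sort(m_Comparer);   // List<T>.Sort(IComparer<T>) takes no delegate
    }
}
```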

IL2CPP

Currently, inspection of the code generated by IL2CPP shows that declaring and assigning a variable of type System.Function allocates a new object. This is true whether the variable is explicit (declared in a method or class) or implicit (declared as a parameter of another method).

Therefore, using the anonymous method under the IL2CPP script backend necessarily allocates managed memory. This is not the case under the Mono script backend.

In addition, the amount of managed memory IL2CPP allocates varies greatly depending on how the method parameter is declared. As expected, closures consume the most memory per call.

Less intuitively, predefined methods __allocate almost as much memory as closures__ when passed as arguments under the IL2CPP scripting backend. Anonymous methods generate the least temporary garbage on the heap, by one or more orders of magnitude.

Therefore, if you plan to release your project on the IL2CPP script backend, there are three main recommendations:

  • Prefer a coding style that does not require passing methods as parameters.

  • When this is unavoidable, prefer anonymous methods over predefined methods.

  • Avoid closures, regardless of the scripting backend.

Boxing

Boxing is one of the most common sources of unintended temporary memory allocation in Unity projects. Boxing occurs whenever a value of a value type is used as a reference type; this most often happens when passing variables of primitive value types (such as int and float) to methods with object-typed parameters.

In the very simple example below, the integer x is boxed so that it can be passed to the object.Equals method, because Equals on object takes a parameter of type object.

```csharp
int x = 1;
object y = new object();
y.Equals(x);
```

C# IDEs and compilers usually do not warn about boxing, even when it leads to unintended memory allocations. This is because the C# language was designed on the assumption that small temporary allocations are handled efficiently by generational garbage collectors and allocation-size-sensitive memory pools.

While Unity’s allocator actually uses different memory pools for small and large allocations, Unity’s garbage collector is “not” generational, so it cannot effectively clean up small, frequent temporary allocations generated by boxing.

When writing C # code for the Unity runtime, you should avoid boxing as much as possible.
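As a minimal sketch of the difference (plain C#, nothing Unity-specific), the same comparison can be written in boxing and non-boxing forms:

```csharp
int x = 1;
int y = 1;

object target = new object();
bool viaObject = target.Equals(x);   // x is boxed: object.Equals takes an object parameter

bool viaValue  = x.Equals(y);        // int.Equals(int) overload is used: no boxing
bool viaOp     = (x == y);           // plain value comparison: no boxing
```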

Identifying boxing
Boxing shows up in CPU traces as calls to one of a few specific methods, depending on the scripting backend in use. These calls typically take one of the following forms, where <some class> is the name of another class or struct and the ellipsis stands for the arguments:

```
<some class>::Box(…)
Box(…)
<some class>_Box(…)
```

Dictionaries and enumerations

A common cause of boxing is using an enum type as a dictionary key. Declaring an enum creates a new value type that is treated as an integer behind the scenes but enforces type-safety rules at compile time.

By default, calling Dictionary.Add(key, value) results in a call to Object.GetHashCode(Object). This method obtains the hash code for the dictionary's key and is used by every method that accepts a key, such as Dictionary.TryGetValue, Dictionary.Remove, and so on.

Object.GetHashCode takes a reference-type parameter, but an enum value is always a value type. Therefore, for a dictionary keyed on an enum, every such method call boxes the key at least once.

The following code snippet shows a simple example that illustrates this boxing problem.

```csharp
enum MyEnum { a, b, c };

var myDictionary = new Dictionary<MyEnum, object>();
myDictionary.Add(MyEnum.a, new object());
```

To solve this problem, write a custom class that implements the IEqualityComparer interface and pass an instance of it to the dictionary as its comparer (note: such an object is usually stateless, so it can be shared across different dictionary instances to save memory).

The following is a simple IEqualityComparer implementation for the code snippet above.

```csharp
public class MyEnumComparer : IEqualityComparer<MyEnum> {
    public bool Equals(MyEnum x, MyEnum y) {
        return x == y;
    }

    public int GetHashCode(MyEnum x) {
        return (int)x;
    }
}
```

An instance of the above class can be passed to the dictionary's constructor.
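For example, reusing the comparer defined above (a minimal usage sketch):

```csharp
// The comparer is stateless, so a single shared instance can serve every dictionary.
var comparer = new MyEnumComparer();
var myDictionary = new Dictionary<MyEnum, object>(comparer);

// Keys are now hashed and compared through MyEnumComparer,
// so Add, TryGetValue and Remove no longer box the enum key.
myDictionary.Add(MyEnum.a, new object());
```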

Foreach

In Unity's version of the Mono C# compiler, using a foreach loop forces Unity to box a value each time the loop terminates (note: the value is boxed once each time the entire loop completes, not once per iteration, so memory usage is the same whether the loop runs twice or 200 times). This happens because the IL generated by Unity's C# compiler constructs a value-typed generic enumerator to traverse the collection of values.

This enumerator implements the IDisposable interface, and its Dispose method must be called when the loop terminates. However, calling interface methods on value-typed objects, such as structs and enumerators, requires boxing them.

Please refer to the following very simple example code.

```csharp
int accum = 0;
foreach(int x in myList) {
    accum += x;
}
```

Running the above code through Unity's C# compiler produces the following intermediate language:

```
.method private hidebysig instance void
        ILForeach() cil managed
{
  .maxstack 8
  .locals init (
    [0] int32 num,
    [1] int32 current,
    [2] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32> V_2
  )

  // [67 5 - 67 16]
  IL_0000: ldc.i4.0
  IL_0001: stloc.0      // num

  // [68 5 - 68 74]
  IL_0002: ldarg.0      // this
  IL_0003: ldfld class [mscorlib]System.Collections.Generic.List`1<int32> test::myList
  IL_0008: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0/*int32*/> class [mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator()
  IL_000d: stloc.2      // V_2
  .try
  {
    IL_000e: br IL_001f

    // [72 9 - 72 41]
    IL_0013: ldloca.s V_2
    IL_0015: call instance !0/*int32*/ valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::get_Current()
    IL_001a: stloc.1    // current

    // [73 9 - 73 23]
    IL_001b: ldloc.0    // num
    IL_001c: ldloc.1    // current
    IL_001d: add
    IL_001e: stloc.0    // num

    // [70 7 - 70 36]
    IL_001f: ldloca.s V_2
    IL_0021: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::MoveNext()
    IL_0026: brtrue IL_0013

    IL_002b: leave IL_003c
  } // end of .try
  finally
  {
    IL_0030: ldloc.2    // V_2
    IL_0031: box valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>
    IL_0036: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    IL_003b: endfinally
  } // end of finally
  IL_003c: ret
} // end of method test::ILForeach
```

The most relevant code is the finally { ... } block near the bottom. The callvirt instruction must discover the location of the IDisposable.Dispose method in memory before calling it, and this requires the enumerator to be boxed.

In general, foreach loops should be avoided in Unity, not only because of this boxing but also because traversing a collection through enumerator method calls is more expensive, and often considerably slower, than iterating manually with a for or while loop.
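For example, the accumulation loop shown earlier can be rewritten against the list's indexer, which involves no enumerator at all (a minimal sketch, assuming myList is the same List<int> as before):

```csharp
int accum = 0;
for (int i = 0; i < myList.Count; i++) {
    accum += myList[i];   // indexer access: no enumerator, no boxing, no Dispose call
}
```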

Array-valued Unity APIs

A more harmful and less obvious cause of spurious array allocation is repeatedly accessing Unity APIs that return arrays. Every Unity API that returns an array creates a new copy of that array each time it is accessed, so unnecessary accesses to array-valued Unity APIs should be avoided.

For example, the following code needlessly creates four copies of the vertices array in every iteration of the loop; an allocation occurs each time the .vertices property is accessed.

```csharp
for(int i = 0; i < mesh.vertices.Length; i++)
{
    float x, y, z;
    x = mesh.vertices[i].x;
    y = mesh.vertices[i].y;
    z = mesh.vertices[i].z;

    // ...

    DoSomething(x, y, z);
}
```

By capturing the vertices array before entering the loop, this can be refactored into a single array allocation, regardless of the number of loop iterations:

```csharp
var vertices = mesh.vertices;

for(int i = 0; i < vertices.Length; i++)
{
    float x, y, z;
    x = vertices[i].x;
    y = vertices[i].y;
    z = vertices[i].z;

    // ...

    DoSomething(x, y, z);
}
```

Although the CPU cost of a single property access is not very high, repeated accesses inside tight loops create CPU performance hotspots. In addition, repeated accesses expand the managed heap unnecessarily.

This issue is extremely common on mobile, because the Input.touches API behaves the same way. It is very common for projects to contain code like the following, where an allocation occurs every time the .touches property is accessed.

```csharp
for ( int i = 0; i < Input.touches.Length; i++ )
{
    Touch touch = Input.touches[i];
    // ...
}
```

Of course, this is easily improved by hoisting the array access out of the loop condition:

```csharp
Touch[] touches = Input.touches;

for ( int i = 0; i < touches.Length; i++ )
{
    Touch touch = touches[i];
    // ...
}
```

However, many Unity APIs now have versions that do not allocate memory; prefer these whenever they are available. Converting the above example to the allocation-free Touch API is simple:

```csharp
int touchCount = Input.touchCount;

for ( int i = 0; i < touchCount; i++ )
{
    Touch touch = Input.GetTouch(i);
    // ...
}
```

Note that the property access (Input.touchCount) is kept outside the loop condition, saving the CPU cost of calling the property's getter on every iteration.

Empty array reuse

When a method that returns an array needs to return an empty result, some development teams prefer to return a zero-length array rather than null. This coding pattern is common in many managed languages, particularly C# and Java.

In general, when returning a zero-length array from a method, it is much more efficient to return a pre-allocated singleton instance of the zero-length array than to create a new empty array on every call (note: the exception, of course, is when the array is resized after being returned).
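A minimal sketch of this pattern (the field and method names are hypothetical; on newer runtimes, Array.Empty<float>() provides an equivalent cached instance):

```csharp
static readonly float[] s_EmptyFloatArray = new float[0];

static float[] GetValues() {
    // When there is nothing to return, hand back the shared instance
    // instead of allocating a new zero-length array on every call.
    return s_EmptyFloatArray;
}
```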

Footnotes

(1) This is because reading back from GPU memory is extremely slow on most platforms. Reading a texture from GPU memory into a temporary buffer for use by CPU code (for example via Texture.GetPixel) would be very inefficient.

(2) Strictly speaking, all non-null reference type objects and all boxed value type objects must be allocated on the managed heap.

(3) The specific running time depends on the platform.

(4) Note that this is not the same as the number of bytes temporarily allocated during a given frame. The profiler displays the number of bytes allocated in a particular frame, regardless of whether some/all of the allocated memory has been reused in subsequent frames.

(5) Of course, adjusting the size of the array after returning the array is an exception.

Reference article

Understanding the Managed Heap (Unity official documentation)