Part 2: Memory Optimization in Embedded Systems in the C Language – RAM

As the first part of this series showed, optimizing memory consumption in embedded systems starts with a look at the memory of the microcontroller and with the choice of tools that streamline the process and help ensure security. Now it's time to analyze specific cases in which RAM can be used more efficiently.
 
Read-only variables in RAM
Symbols in the .data section are initialized variables that occupy both RAM and ROM. They live in RAM because they act as variables, just like .bss symbols, while their initial values are stored in ROM and copied to RAM at startup. For this reason, the size of .data variables should be counted twice: once for ROM and once for RAM.
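//RAM (.data): the initial value is additionally kept in ROM
char manufacture_name[] = "Company Name";
//ROM only (alternative declaration of the same symbol)
const char manufacture_name[] = "Company Name"; // [] leaves room for the terminating '\0'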
If you see data in the .data section that never changes while the application is running, move it to ROM by marking the variable as const (constant).
Static variables in a function
Variables declared in the body of a function usually end up on the stack. The stack is part of RAM, but the life of a variable on the stack ends when the function returns. Static variables in the function body "live" for the life of the application, much like global variables, but with access limited to that function. Because they occupy RAM permanently, use static variables in a function only when a value really has to persist between calls.
Padding or structure completion in C
The compiler can insert extra bytes between structure elements to provide address alignment for faster memory access. On a typical 32-bit platform, a 32-bit field of a structure requires an address divisible by four, while a 16-bit field requires an address divisible by two. If the next element of the structure would otherwise land on a misaligned address, the compiler places it at the next suitably aligned address instead. The unused bytes left between fields of a structure are called padding. Suboptimal ordering of structure elements therefore means wasted memory.
//You can run this example on <https://www.onlinegdb.com/>
#include <stdio.h>
#include <stdint.h>
typedef struct
{
uint8_t sign1; // 1B
// padding 3B
uint32_t number1; // 4B
uint8_t sign2; // 1B
// padding 3B
uint32_t number2; // 4B
uint8_t sign3; // 1B
uint8_t sign4; // 1B
// padding 2B
} type_1;
typedef struct
{
uint32_t number1; // 4B
uint32_t number2; // 4B
uint8_t sign1; // 1B
uint8_t sign2; // 1B
uint8_t sign3; // 1B
uint8_t sign4; // 1B
} type_2;
int main()
{
printf("Size of type_1 is %zuB\n", sizeof(type_1)); // %zu is the conversion for size_t
printf("Size of type_2 is %zuB\n", sizeof(type_2));
return 0;
}
/* Output:
Size of type_1 is 20B
Size of type_2 is 12B
*/
In the example above, the size of the useful data in type_1 and type_2 is the same. They contain two uint32_t elements and four uint8_t elements. The size of this data is 12B. Comments indicate where the paddings are. The size of these structures depends on how their members are arranged. With optimal placement, type_2 has a size of 12B, with suboptimal placement type_1 has a size of 20B.
Eight bytes is not a huge loss, but with large structures, bad organization can cause us to waste more memory. Especially when we store structures in arrays, the loss is multiplied by the number of elements.
"Packed" attribute
The "packed" attribute tells the compiler not to put padding in the structure. On the surface, this looks like a solution to the problem, but it comes at a cost. Padding provides address alignment to speed up memory access, so its absence results in slower access: on cores that do not support unaligned access, the compiler has to read and write misaligned fields in smaller pieces.
//You can run this example on <https://www.onlinegdb.com/>
#include <stdio.h>
#include <stdint.h>
typedef struct __attribute__((__packed__))
{
uint8_t sign1; // 1B
uint32_t number1; // 4B
uint8_t sign2; // 1B
uint32_t number2; // 4B
uint8_t sign3; // 1B
uint8_t sign4; // 1B
} type_1; //sizeof(type_1): 12B
typedef struct __attribute__((__packed__))
{
uint32_t number1; // 4B
uint32_t number2; // 4B
uint8_t sign1; // 1B
uint8_t sign2; // 1B
uint8_t sign3; // 1B
uint8_t sign4; // 1B
} type_2; //sizeof(type_2): 12B
int main()
{
printf("Size of type_1 is %zuB\n", sizeof(type_1)); // %zu is the conversion for size_t
printf("Size of type_2 is %zuB\n", sizeof(type_2));
return 0;
}
/* Output:
Size of type_1 is 12B
Size of type_2 is 12B
*/
The attribute is still useful as a quick way to check how much memory the padding consumes. Just replace typedef struct with typedef struct __attribute__((__packed__)) in the project files and compare the sizes of the two versions of the application. The resulting difference is an upper bound: it will be higher than what you actually gain by manually rearranging the structures, but it lets you estimate whether the padding is worth dealing with at all. You can also use this method on a smaller scale, on a single structure, to get quick feedback on whether rearranging it is worth the effort.
Dynamic allocation
Embedded systems can also use dynamic memory allocation. The memory from which blocks are allocated is called the heap. It is managed entirely by the developer, and its maximum size is fixed before compilation. It is good practice to use dynamic allocation only at the initialization stage. This makes it possible to estimate the maximum amount of memory the application needs to operate correctly, and it avoids the memory fragmentation caused by continuous allocation and deallocation. It also eliminates rare, hard-to-reproduce cases in which the application tries to allocate memory and not enough is available. In practice, such allocation is not entirely dynamic: it happens at run-time, during initialization, which is why we call it dynamic, but after initialization the amount of allocated memory remains constant.
Modules in embedded applications often need their own runtime memory to store context or buffers for data that cannot reside on the stack. Such data can be stored either as global variables or as variables on the heap. It is useful to find modules that are independent of each other and not running at the same time, which can share the same area of RAM on the heap.
For example, if a device operates in either send or receive mode and can never do both at once, the two modes can share one memory area for the data being sent and received.
Duplicating data between layers
Applications are often divided into abstraction layers that perform different tasks. Data is passed between these layers, and some layers act only as intermediaries. If the application architecture allows it, make sure that data is not copied between layers but passed by pointer.

Bitfields
The C language allows the developer to specify the size of a structure member in bits. This allows efficient use of memory in situations where not all bits of a type are needed. An extreme case is a bool variable: one bit is enough to store a TRUE/FALSE value, yet the variable takes up an entire byte of memory. One way to save a few bytes is to use bitfields and group variables so that every bit of memory is used.
//You can run this example on <https://www.onlinegdb.com/>
#include <stdio.h>
#include <stdbool.h>
typedef struct {
bool f0;
bool f1;
bool f2;
bool f3;
bool f4;
bool f5;
bool f6;
bool f7;
} customData_t;
typedef struct {
bool f0:1;
bool f1:1;
bool f2:1;
bool f3:1;
bool f4:1;
bool f5:1;
bool f6:1;
bool f7:1;
} customData2_t;
int main()
{
printf("Size of customData_t is %zuB\n", sizeof(customData_t)); // %zu is the conversion for size_t
printf("Size of customData2_t is %zuB\n", sizeof(customData2_t));
return 0;
}
/* Output:
Size of customData_t is 8B
Size of customData2_t is 1B
*/
Using bitfields is not free – it costs time. It takes more processor instructions to read and change the value of such a field. Therefore, you should use this option only in cases where you do not care about the execution time of the operation.
Structures vs. unions
It is worth remembering the difference between a structure and a union in C. The size of a structure is the sum of the sizes of all its elements (plus padding). The size of a union, on the other hand, is determined by the size of its largest element.
//You can run this example on <https://www.onlinegdb.com/>
 
#include <stdio.h>
#include <stdint.h>
 
typedef struct {
    uint32_t var1;
} configurationV1_t;
 
typedef struct {
    uint32_t var1;
    uint32_t var2;
} configurationV2_t;
 
typedef struct {
    uint32_t var1;
    uint32_t var2;
    uint32_t var3;
} configurationV3_t;
 
typedef union {
    configurationV1_t confV1;
    configurationV2_t confV2;
    configurationV3_t confV3;
} configurationUnion_t;
 
typedef struct {
    configurationV1_t confV1;
    configurationV2_t confV2;
    configurationV3_t confV3;
} configurationStruct_t;
 
typedef struct {
    configurationUnion_t conf;
    uint32_t version;
} configuration1_t;
 
typedef struct {
    configurationStruct_t conf;
    uint32_t version;
} configuration2_t;
 
int main()
{
    /* %zu is the conversion specifier for size_t, the type returned by sizeof */
    printf("Size of configurationV1_t is %zuB\n", sizeof(configurationV1_t));
    printf("Size of configurationV2_t is %zuB\n", sizeof(configurationV2_t));
    printf("Size of configurationV3_t is %zuB\n", sizeof(configurationV3_t));
    printf("Size of configurationUnion_t is %zuB\n", sizeof(configurationUnion_t));
    printf("Size of configurationStruct_t is %zuB\n", sizeof(configurationStruct_t));
    printf("Size of configuration1_t is %zuB\n", sizeof(configuration1_t));
    printf("Size of configuration2_t is %zuB\n", sizeof(configuration2_t));
    return 0;
}
 
/* Output:
Size of configurationV1_t is 4B
Size of configurationV2_t is 8B
Size of configurationV3_t is 12B
Size of configurationUnion_t is 12B
Size of configurationStruct_t is 24B
Size of configuration1_t is 16B
Size of configuration2_t is 28B
*/
One place where the advantage of a union over a structure is visible is storing device data with backward compatibility. As the product grows, so do the features in the application. A good ecosystem should be compatible with previous generations of devices. To support each device, you need storage for the largest possible context, but only for one context at a time.
A union works perfectly for such an application. The example above stores version-dependent configurations in two ways: configuration1_t uses a union, and configuration2_t uses a structure. The application logic should select the union member to use based on the value of the version variable.
Depth of the stack

The stack, heap, .bss and .data are all part of RAM. If your application lacks memory for global variables (.bss and .data) or for dynamic allocation (heap), consider whether the amount of memory reserved for the stack can be reduced.
The stack is operational memory used by the program to store local variables, function arguments, and stack frames, among other things. The developer determines its maximum size. It is a good practice to measure the depth of the stack, which is the minimum size that allows the application to function properly. Such a measurement will allow you to safely reduce the size of the stack in favor of the heap or variable space.
GCC supports static stack analysis through the -fstack-usage flag, which gives the developer stack usage information for each function separately. Unfortunately, these figures do not include the functions that a given function calls, so in complex systems with deep call chains they are not enough on their own to estimate the maximum stack usage.
One method of estimating the maximum stack usage is called "stack painting". It involves filling the stack with a known string of data. Then, you should perform a series of system operations that you want to analyze, read the memory, and see how much of it was overwritten.
If the measured maximum stack depth is too high, optimization can be attempted. GDB allows you to read the current stack pointer. You can use the debugger to step through the entire code (single-stepping) and calculate the stack size line by line. This is an accurate way to detect extremes, which is a good place to start optimization.
If you want to reduce the stack usage of called functions, there are several ways:
- Converting local function variables to static or global variables. This moves the variable from the stack to the .bss section (or to .data if it has a non-zero initial value), reducing stack usage at the expense of that section.
- Dynamically allocating local function variables (if the application uses dynamic memory allocation). This places them on the heap. The heap's memory is reserved anyway, so if other modules leave part of it unused, it is worth using.
- Using inline functions. Each function call involves the creation of a stack frame containing the information needed to execute the function. Using inline functions reduces the number of function calls, and thus the load on the stack.
- Passing a large structure to the function via a pointer. Function arguments also go to the stack, so passing by pointer is usually a better solution. It reduces stack usage because a pointer usually has a smaller size than a structure (4 bytes on 32-bit architecture). It also improves performance because copying a pointer to the stack is faster than copying a structure.
Key Takeaways
Structure optimization, smart heap management, minimization of stack usage, and the other methods described above are proven ways for software developers to optimize RAM usage. They are worth remembering when working on embedded systems, where resources are limited. In the third part, we will look at another optimization option, this time using ROM.
Author
Mateusz Szpiech
Embedded Software Engineer at Comarch