ARM Learning (31) Compiler Support for Overlay Methods

1. Introduction to overlay

Overlay means overlapping: a region of storage, usually memory, is reused. Take the Windows operating system as an example: its storage (ROM/flash) is generally fairly large while its memory is relatively small, so to load more data from flash than memory can hold, the memory space has to be reused.

For example, a game may contain many dynamic link libraries (DLLs). Because memory is limited, only some of them can be loaded at once; when other libraries are needed, some loaded libraries are overwritten and must be loaded again before they are called later. Many replacement algorithms exist for this, such as LRU (least recently used), which replaces the library that has gone unused the longest. Because the library to be replaced is not known in advance, the load address is not known either. This requires that a DLL can be loaded dynamically and addressed by offset from wherever it is loaded. That is PIC (position-independent code): the code inside the DLL uses relative addressing, so in theory it can be loaded at any address and still run.
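
The LRU replacement mentioned above can be sketched in a few lines of C. This is only an illustration: the slot count, the `slot_t` type and the `lru_access` function are hypothetical names, and a logical counter stands in for real timestamps.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 2   /* hypothetical: memory holds only two libraries at once */

typedef struct {
    int      lib_id;     /* -1 means the slot is empty */
    uint32_t last_used;  /* logical timestamp of the latest access */
} slot_t;

static slot_t   slots[SLOT_COUNT] = { { -1, 0 }, { -1, 0 } };
static uint32_t clock_ticks = 0;

/* Returns the slot index that holds lib_id after the call, evicting the
 * least recently used slot when the library is not resident. */
int lru_access(int lib_id)
{
    size_t victim = 0;

    clock_ticks++;
    for (size_t i = 0; i < SLOT_COUNT; i++) {
        if (slots[i].lib_id == lib_id) {       /* hit: refresh the timestamp */
            slots[i].last_used = clock_ticks;
            return (int)i;
        }
        if (slots[i].last_used < slots[victim].last_used) {
            victim = i;                        /* oldest slot seen so far */
        }
    }
    slots[victim].lib_id = lib_id;             /* miss: replace the victim */
    slots[victim].last_used = clock_ticks;
    return (int)victim;
}
```

Because the victim slot depends on the access history, the slot (and therefore the address) a library ends up in cannot be known at link time, which is why this scheme needs position-independent code.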

To run efficiently, an embedded system often executes its code from RAM (which is faster than flash). The RAM then has to hold both code and data, which makes it relatively tight, so the memory space is reused.

Embedded systems often use absolute addressing rather than relative addressing; the scenario considered here is an application without an operating system such as Linux. So to reuse memory space, you must fix the address to be reused, and all code loaded into that block must be built for that address, i.e. the absolute address is specified in the link script.
[Figure: four pieces of function code (1-4) mapped to one dynamic memory address]
Take the figure above as an example: there are four pieces of function code, 1 to 4, all of which need to run at the dynamic memory address. The author therefore links all four for the same dynamic memory address; whichever function is needed is first moved to that address and then jumped to for execution.
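
The "move it, then run it" idea can be modeled on a host machine. The sketch below is only a model: it copies byte arrays that stand in for the four code images and does not execute them, and every name in it (`images`, `shared_slot`, `overlay_select`, `overlay_copy_count`) is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE      8   /* hypothetical size of the shared region */
#define FUNCTION_COUNT 4   /* the four pieces of function code */

/* Stand-ins for the four code images stored in flash. */
static const uint8_t images[FUNCTION_COUNT][SLOT_SIZE] = {
    "func1", "func2", "func3", "func4"
};

static uint8_t shared_slot[SLOT_SIZE];  /* stand-in for the dynamic memory address */
static int     resident = -1;           /* index of the image currently loaded */
static int     copy_count = 0;          /* how many times an image was moved */

/* Make image `index` resident, copying only when it is not already there.
 * Returns the shared slot, or NULL for an invalid index. */
const uint8_t *overlay_select(int index)
{
    if (index < 0 || index >= FUNCTION_COUNT) {
        return NULL;
    }
    if (resident != index) {
        memcpy(shared_slot, images[index], SLOT_SIZE);
        resident = index;
        copy_count++;
    }
    return shared_slot;
}

/* Number of copies performed so far. */
int overlay_copy_count(void)
{
    return copy_count;
}
```

On a real target the caller would jump to the shared address after the copy; here the returned pointer only shows which image currently occupies the slot.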

2. Compiler (armcc/armclang) support for overlay

The armcc/armclang toolchain supports overlays mainly through the link script (scatter file). Normally the linker throws an error in either of these cases:

  • two .o files are placed at the same address, or
  • two functions are placed at the same address.

For example, the three execution regions below overlap. The linker's job is to assign run-time addresses, and these normally cannot overlap; otherwise it would not know where to place the code or which code executes at that address.
LR_OVERLAY0 0x30000000  0x1000
{
  ER_OVERLAY0 0x2001E000    0x1000 
  {
    overlay0.o(BANK_SEC, +FIRST)
    overlay0.o(+RO)
    overlay0.o(.text)
  }
}

LR_OVERLAY1 0x30001000  0x1000
{
  ER_OVERLAY1 0x2001E000   0x1000 
  {
    overlay1.o(BANK_SEC, +FIRST)
    overlay1.o(+RO)
    overlay1.o(.text)
  }
}

LR_OVERLAY2 0x30002000  0x1000
{
  ER_OVERLAY2 0x2001E000   0x1000 
  {
    overlay2.o(BANK_SEC, +FIRST)
    overlay2.o(+RO)
    overlay2.o(.text)
  }
}
"", line 32 (column 17): Warning: L6329W: Pattern overlay0.o(RO) only matches removed unused sections.
"", line 33 (column 16): Warning: L6314W: No section matches pattern overlay0.o(.text).
"", line 42 (column 17): Warning: L6329W: Pattern overlay1.o(RO) only matches removed unused sections.
"", line 43 (column 16): Warning: L6314W: No section matches pattern overlay1.o(.text).
"", line 52 (column 17): Warning: L6329W: Pattern overlay2.o(RO) only matches removed unused sections.
"", line 53 (column 16): Warning: L6314W: No section matches pattern overlay2.o(.text).
Error: L6221E: Execution region ER_OVERLAY0 with Execution range [0x2001e000,0x2001e080) overlaps with Execution region ER_OVERLAY1 with Execution range [0x2001e000,0x2001e074).
Error: L6221E: Execution region ER_OVERLAY0 with Execution range [0x2001e000,0x2001e080) overlaps with Execution region ER_OVERLAY2 with Execution range [0x2001e000,0x2001e074).
Error: L6221E: Execution region ER_OVERLAY1 with Execution range [0x2001e000,0x2001e074) overlaps with Execution region ER_OVERLAY2 with Execution range [0x2001e000,0x2001e074).
Finished: 0 information, 6 warning and 3 error messages.
make: *** [out/AdvancedClock.axf] Error 1
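
The condition behind L6221E can be read as an intersection test on half-open address ranges. The following is a rough sketch of that test, not the linker's actual implementation; `range_t` and `ranges_overlap` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, limit), as printed in the L6221E message. */
typedef struct {
    uint32_t start;
    uint32_t limit;
} range_t;

/* Two half-open ranges overlap when each one starts before the other ends. */
bool ranges_overlap(range_t a, range_t b)
{
    return a.start < b.limit && b.start < a.limit;
}
```

With the ranges from the log, [0x2001e000,0x2001e080) and [0x2001e000,0x2001e074) share their start address, so the test reports an overlap.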

For the linker to accept this situation and place multiple functions at the same address, a keyword must be added. The author found the OVERLAY keyword in the manual.

As described in the following example, no error is reported as long as the OVERLAY keyword is added to the description of each execution region that is intended to share the same address.
[Figure: manual example using the OVERLAY keyword]
The author tried it, and it does work.

LR_OVERLAY0 0x30000000  0x1000
{
  ER_OVERLAY0 0x2001E000 OVERLAY   0x1000
  {
    overlay0.o(OVERLAY_SEC, +FIRST)
    overlay0.o(+RO)
    overlay0.o(.text)
  }
}

LR_OVERLAY1 0x30001000  0x1000
{
  ER_OVERLAY1 0x2001E000 OVERLAY  0x1000
  {
    overlay1.o(OVERLAY_SEC, +FIRST)
    overlay1.o(+RO)
    overlay1.o(.text)
  }
}

LR_OVERLAY2 0x30002000  0x1000
{
  ER_OVERLAY2 0x2001E000 OVERLAY  0x1000
  {
    overlay2.o(OVERLAY_SEC, +FIRST)
    overlay2.o(+RO)
    overlay2.o(.text)
  }
}

Four points need to be noted:
1. If each overlay is given its own load region, the entry function must be declared with the root attribute; otherwise the jump address is wrong and the program may run away.
2. Make sure the function is declared as used; otherwise the linker strips the function from the overlay because nothing references it.
3. The OVERLAY attribute must be placed before the length attribute of the execution region; otherwise the linker reports an error.
4. Because the author uses the Cortex-M4 (cm4) architecture, the jump must use an odd address; otherwise the program may run away.

LR_OVERLAY0 0x30000000  0x1000
{
  ER_OVERLAY0 0x2001E000 OVERLAY   0x1000
  {
    overlay0.o(+RO)
    overlay0.o(.text)
  }
}

LR_OVERLAY1 0x30001000  0x1000
{
  ER_OVERLAY1 0x2001E000 OVERLAY  0x1000
  {
    overlay1.o(+RO)
    overlay1.o(.text)
  }
}

LR_OVERLAY2 0x30002000  0x1000
{
  ER_OVERLAY2 0x2001E000 OVERLAY  0x1000
  {
    overlay2.o(+RO)
    overlay2.o(.text)
  }
}
An instance of the overlay1 function with only the used attribute and no root attribute:
__attribute__((used))  static  void overlay_handler(u8 overlay_id, u8 func_id)
{
    switch(func_id)
    {
        case 1:
        {
            rt_kprintf("this is overlay func,overlay id=%d func id=%d\r\n", overlay_id, func_id);
        }break;
        default:
        rt_kprintf("this is overlay func,func id=%d,err\r\n", func_id);
        break;
    }
}

The author found that without the root attribute, the entry of overlay1 (and of the other overlays) is compiler-generated code rather than the function itself, which causes an exception.
The generated code is as follows: it contains no stack operations, and the function's entry address becomes 0x2001E00C, so jumping to the start of the region may fail at run time.
[Figure: disassembly of the generated code at the start of the overlay region]
On where this veneer code comes from, the author found the following after searching: "veneer code" is a special kind of code used to implement function calls and branch jumps.
The following scenarios are the more common ones:

  1. Long branches: a veneer is used to jump from one place to another when the target function or code fragment is too far away. Direct branch instructions have a limited range, which also differs between the ARM and Thumb instruction sets, so a direct branch may not cover the entire address space; the veneer bridges the gap.
  2. Exception handling: in exception handlers and interrupt service routines, a veneer may likewise be inserted when the handler has to reach code that a direct branch cannot.
  3. ABI (Application Binary Interface) interworking: a veneer lets ARM-state and Thumb-state code call each other by switching the instruction set state on the way.

This code is usually generated by the linker and is used to manage jumps and calls on the ARM architecture.
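
The branch-range case can be made concrete with a small check. The sketch assumes the 32-bit Thumb BL encoding, whose reach is roughly ±16 MB relative to the PC (instruction address plus 4); the function name and the example addresses are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reach of a 32-bit Thumb BL: a signed 25-bit byte offset from the PC,
 * where the PC reads as the instruction address plus 4. */
#define BL_MIN_OFFSET (-16777216LL)
#define BL_MAX_OFFSET ( 16777214LL)

/* True when a direct BL located at `from` cannot reach `to`, which is the
 * situation where a long-branch veneer would be needed. */
bool needs_long_branch(uint32_t from, uint32_t to)
{
    int64_t offset = (int64_t)to - ((int64_t)from + 4);
    return offset < BL_MIN_OFFSET || offset > BL_MAX_OFFSET;
}
```

For instance, a call from code running at 0x2001E010 in RAM to a function at an assumed flash address such as 0x08001000 is far outside this range, while a call within the same overlay region is not.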

The normal code should look like the following:
[Figure: disassembly of the expected entry code]

If neither the used nor the root attribute is added, none of the symbols are linked in. Which overlay function runs is decided at run time, so the linker cannot tell at static link time which one is needed; without the used attribute, all of these symbols are stripped out.

 static  void overlay_handler(u8 overlay_id, u8 func_id)
{
    switch(func_id)
    {
        case 2:
        {
            rt_kprintf("this is overlay func,overlay id=%d func id=%d\r\n", overlay_id, func_id);
        }break;
        default:
        rt_kprintf("this is overlay func,func id=%d,err\r\n", func_id);
        break;
    }
}

If the OVERLAY attribute is misplaced, the following error is reported.

"", line 29 (column 36): Error: L6228E: Expected '{', found 'O...'.
"", line 29 (column 36): Error: L6228E: Expected '}', found 'EOF'.
Not enough information to list the image map.
Finished: 1 information, 0 warning and 2 error messages.
make: *** [out/AdvancedClock.axf] Error 1

The remaining rules for writing overlay regions in the link script are as follows:

  • If region1 does not have the OVERLAY attribute, region2 starts at the end address of region1.
  • If region1 has the OVERLAY attribute and the offset is 0, region2 has the same address as region1.
  • If region1 has the OVERLAY attribute and the offset is not 0, region2 starts at the end address of region1 plus the offset.
    [Figure omitted]
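
The three placement rules listed above can be written as one small function. This is a sketch of the rules as stated, not of the linker itself; `region2_base` is a hypothetical name, and for a non-overlay region1 the sketch also adds the offset, on the assumption that `+offset` is always measured from the end of the preceding region.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base address of region2 when it is written as "+offset" directly after
 * region1 in the link script. */
uint32_t region2_base(uint32_t region1_base, uint32_t region1_limit,
                      bool region1_is_overlay, uint32_t offset)
{
    if (region1_is_overlay && offset == 0) {
        return region1_base;        /* overlay with +0: share the same address */
    }
    return region1_limit + offset;  /* otherwise: follow the end of region1 */
}
```

For a region1 occupying [0x2001E000, 0x2001E080), this gives 0x2001E000 for an overlay with offset 0 and 0x2001E090 for an overlay with offset 0x10.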
Use an odd address when jumping, or an error occurs: the Cortex-M4 executes Thumb instructions, and bit 0 of the jump target must be set to indicate Thumb state.
overlay_handler_fun overlay_handler_func = (overlay_handler_fun)(overlay_EXEC_ADDR+1);
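
The `+1` in the line above sets bit 0 of the target address. A slightly more defensive way to express the same thing is a bitwise OR, which is harmless if the address is already odd. The helper name `thumb_entry` is hypothetical, and the macro repeats the execution address used in this article.

```c
#include <stdint.h>

#define OVERLAY_EXEC_ADDR 0x2001E000u  /* execution address used in this article */

/* Set bit 0 so that a branch to the address stays in Thumb state.
 * OR-ing is idempotent, unlike adding 1 to an address that is already odd. */
uint32_t thumb_entry(uint32_t code_addr)
{
    return code_addr | 1u;
}
```

The result would then be cast to the function pointer type, as the original line does with `overlay_EXEC_ADDR+1`.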

The author has written a reference example below.
Overlay manager:
  • set_overlay_id requests a switch of the current overlay.
  • overlay_process handles the pending request and executes the function.

#include ""


#define overlay_EXEC_ADDR 0x2001E000


#define overlay0_SAVE_ADDR  0x08020000
#define overlay1_SAVE_ADDR  0x08020400
#define overlay2_SAVE_ADDR  0x08020800
#define overlay_FLASH_BASE  overlay0_SAVE_ADDR


/* Entry function type shared by all overlays. */
typedef void (*overlay_handler_fun)(u8 overlay_id,u8 func_id);

u8 current_overlay_id_g = 0;  /* overlay currently loaded at overlay_EXEC_ADDR */
u8 set_overlay_id_g = 0;      /* overlay requested through set_overlay_id() */

/* Load the initial overlay from flash into RAM and run its entry function. */
void overlay_init(void)
{
    u32 current_overlay_flash_addr = overlay_FLASH_BASE + current_overlay_id_g*0x400;
    STMFLASH_Read(current_overlay_flash_addr, (u32*)overlay_EXEC_ADDR, 0x400);
    /* +1 sets bit 0 so that the call stays in Thumb state */
    overlay_handler_fun overlay_handler_func = (overlay_handler_fun)(overlay_EXEC_ADDR+1);
    (*overlay_handler_func)(current_overlay_id_g, current_overlay_id_g);
}

/* Record a switch request; the switch itself happens in overlay_process(). */
void set_overlay_id(u8 req_overlay_id)
{
    set_overlay_id_g = req_overlay_id;
}

/* Reload and run the overlay only when the requested id differs from the loaded one. */
void overlay_process(void)
{
    if(set_overlay_id_g != current_overlay_id_g)
    {
        current_overlay_id_g = set_overlay_id_g;
        u32 current_overlay_flash_addr = overlay_FLASH_BASE + current_overlay_id_g*0x400;
        STMFLASH_Read(current_overlay_flash_addr, (u32*)overlay_EXEC_ADDR, 0x400);
        overlay_handler_fun overlay_handler_func = (overlay_handler_fun)(overlay_EXEC_ADDR+1);
        (*overlay_handler_func)(current_overlay_id_g, current_overlay_id_g);
    }
}
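
The manager above computes the flash address as base plus id times 0x400 without checking the id. A minimal sketch of that computation with a range check might look as follows; `overlay_flash_addr`, `OVERLAY_FLASH_STEP` and `OVERLAY_COUNT` are hypothetical names, and the count of three is an assumption based on overlay0 to overlay2.

```c
#include <stdint.h>

#define OVERLAY_FLASH_BASE 0x08020000u  /* same value as overlay0_SAVE_ADDR */
#define OVERLAY_FLASH_STEP 0x400u       /* spacing between stored overlays */
#define OVERLAY_COUNT      3u           /* assumption: overlay0..overlay2 only */

/* Flash address of a stored overlay, or 0 when the id is out of range. */
uint32_t overlay_flash_addr(uint8_t overlay_id)
{
    if (overlay_id >= OVERLAY_COUNT) {
        return 0u;
    }
    return OVERLAY_FLASH_BASE + (uint32_t)overlay_id * OVERLAY_FLASH_STEP;
}
```

The results for ids 0, 1 and 2 match the overlay0_SAVE_ADDR, overlay1_SAVE_ADDR and overlay2_SAVE_ADDR definitions, and rejecting larger ids avoids copying unrelated flash contents into the execution region.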
#include ""


overlay0.c
__attribute__((section("overlay_SEC"),used))  static void overlay_handler(u8 overlay_id, u8 func_id)
{
    switch(func_id)
    {
        case 0:
        {
            rt_kprintf("this is overlay func,overlay id=%d func id=%d\r\n", overlay_id, func_id);
        }break;
        default:
         rt_kprintf("this is overlay func,func id=%d,err\r\n", func_id);
        break;
    }
}
overlay1.c
__attribute__((section("overlay_SEC"),used))  static  void overlay_handler(u8 overlay_id, u8 func_id)
{
    switch(func_id)
    {
        case 1:
        {
            rt_kprintf("this is overlay func,overlay id=%d func id=%d\r\n", overlay_id, func_id);
        }break;
        default:
        rt_kprintf("this is overlay func,func id=%d,err\r\n", func_id);
        break;
    }
}
overlay2.c
 __attribute__((section("overlay_SEC"),used)) static  void overlay_handler(u8 overlay_id, u8 func_id)
{
    switch(func_id)
    {
        case 2:
        {
            rt_kprintf("this is overlay func,overlay id=%d func id=%d\r\n", overlay_id, func_id);
        }break;
        default:
        rt_kprintf("this is overlay func,func id=%d,err\r\n", func_id);
        break;
    }
}

  • 46

The actual results are as follows:
[Two figures omitted]
If you have a TRACE32 debugger, you can use its overlay support for debugging.
TRACE32 setup commands:

  1. ON
  2. Automatic ID Recognition
  3. See which overlay you are currently in
    In the figure below, the printout on the right shows that the author's program is in overlay1, and the TRACE32 debugger also shows overlay1.
    [Figure omitted]
    After the author switches to overlay2, the debugger shows overlay2 accordingly.
    [Figure omitted]

3. References

armcc official manual
DUI0472M_armcc_user_guide
DUI0474M_armlink_user_guide