Basic use of the arm-none-eabi toolchain, or what happens before main

_lazor_, 18 Nov 2018 17:26

Introduction

Tutorials on software development for microcontrollers focus mainly on the peripherals of the microcontroller itself, which is understandable, but they rarely cover the part of the program that executes before the main function.
This tutorial introduces that part of the program, using the GNU Arm toolchain and the STM32F334 microcontroller with a Cortex-M4 core.

Tools used:
- GNU Arm Embedded Toolchain 7-2018-q2-update
https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/downloads

- OpenOCD
- cygwin (with make)
- PuTTY
- drivers required by STLINK

Install the above programs and add the path to the toolchain's bin folder to the Path environment variable (this is for convenience; otherwise you have to type the path to the tools every time).

1. Startup

Where to start writing code? By finding out how the core designer (in this case, ARM) intended the processor to start up:

https://www.keil.com/pack/doc/CMSIS/Core/html/startup_s_pg.html

After reset, the core reads the initial value of the stack pointer from address 0x00000000 and then the address of the "Reset_Handler" function from address 0x00000004. Together these two words form the beginning of the vector table. On the STM32, flash memory at 0x08000000 is aliased to 0x00000000 when the chip boots from main flash, which is why the same two words are described below at 0x08000000 and 0x08000004.
It is possible to change the memory from which the program starts. This is done through the boot configuration selected with BOOT0 and nBOOT1. Interestingly, we can also start from "System memory", which holds the "Embedded boot loader" that is programmed during production of the chip and cannot be modified.
More information about the Embedded boot loader can be found at the following link:
https://www.st.com/content/ccc/resource/techn...0/0c/CD00167594.pdf/files/CD00167594.pdf/jcr: content/translations/en.CD00167594.pdf

We can now proceed to writing our own startup.S code. What is this file? It is a file written in assembly language. ARM cores define two instruction sets: ARM and Thumb (Thumb-2). You can read more about it in this thread:
https://stackoverflow.com/questions/28669905/...e-arm-thumb-and-thumb-2-instruction-encodings
In our case, we use the Thumb-2 instruction set:
http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf

1. The .global directive makes the _start symbol visible to the linker (ld).
2. The .thumb_func directive tells the assembler (as) that the next symbol points to Thumb instructions.
3. The stack pointer address, whose value will be stored at flash address 0x08000000. For now it is set to the end of SRAM, which can be read from the microcontroller documentation.
4. The reset vector, which will be located at flash address 0x08000004. This word holds the address of the reset function.
5. Branch: a jump to the function with the given label (address) without saving the return address in lr (link register).
6. Execution should never reach this point.
7. Infinite loop function.
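
Points 3 and 4 can be pictured as a rough C model of the first two words of the vector table. This is only a sketch and not the author's startup.S, which is written in assembly. The struct and function names are hypothetical, and the SRAM size of 12 KB is an assumption taken from the example figures quoted in the FAQ further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed memory map: SRAM at 0x20000000, 12 KB long. */
#define SRAM_BASE 0x20000000u
#define SRAM_SIZE (12u * 1024u)

/* The first two words the core fetches after reset (hypothetical model). */
struct vector_table_head {
    uint32_t initial_sp;   /* word at flash offset 0x0 */
    uint32_t reset_vector; /* word at flash offset 0x4 */
};

/* Compile-time check that the reset vector really sits at offset 4. */
static_assert(offsetof(struct vector_table_head, reset_vector) == 4,
              "reset vector must be the second word");

/* Build the table head for a reset handler located at reset_handler_addr. */
struct vector_table_head make_vector_table_head(uint32_t reset_handler_addr)
{
    struct vector_table_head t;
    t.initial_sp = SRAM_BASE + SRAM_SIZE;     /* stack grows down from the end of SRAM */
    t.reset_vector = reset_handler_addr | 1u; /* bit 0 set marks Thumb code */
    return t;
}
```

For a reset handler placed at the hypothetical address 0x08000008, the model yields an initial stack pointer of 0x20003000 and a reset vector of 0x08000009.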

2. Linker script

Now that we have our startup, we can start writing our linker script. The script below describes what code (and in what order) will go to each section and will provide us with information about the beginning and end of each section.

1. MEMORY assigns names to memory regions.
2. Flash start address and length. The values come from the documentation.
3. RAM start address and length. Again, the values come from the documentation.
4. The .text section; this section holds the executable code.
5. Section start address. We can also set the section address manually via . = 0x80000;
6. Put all text from the objects into this section.
7 and 8. Put specific text from the objects into this space. Keep in mind that swapping the order of the objects will cause the code to not work properly!
9. Section end address.
10. The memory region in which the given section is placed.
11. The .rodata section contains all constant data.
12. The .bss (block started by symbol) section contains all uninitialized static variables.
13. The .data section contains all initialized static variables. The entry "AT (__rodata_end__)" means that although the section runs from SRAM, its initial contents are stored in flash, right after .rodata, and must be copied to RAM in cstartup.
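
The address arithmetic behind points 9 to 13 can be sketched in C. This is a simplified model and not the linker script itself: the function and field names are hypothetical, the region origins are assumed values, alignment is ignored, and .bss is placed before .data only because that is the order of the list above.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_ORIGIN 0x08000000u /* assumed flash start */
#define SRAM_ORIGIN  0x20000000u /* assumed SRAM start */

struct layout {
    uint32_t text_end;        /* end of .text in flash */
    uint32_t rodata_end;      /* end of .rodata in flash */
    uint32_t bss_start, bss_end;
    uint32_t data_run_start;  /* where the program accesses .data (SRAM) */
    uint32_t data_run_end;
    uint32_t data_load_start; /* where the initial values are stored (flash) */
};

/* Place the four sections one after another, as the script is described to do. */
struct layout place_sections(uint32_t text_size, uint32_t rodata_size,
                             uint32_t bss_size, uint32_t data_size)
{
    struct layout l;

    /* Host-side sanity check: the sketch ignores alignment,
       so it only accepts sizes that are whole words. */
    assert((text_size | rodata_size | bss_size | data_size) % 4u == 0u);

    l.text_end = FLASH_ORIGIN + text_size;
    l.rodata_end = l.text_end + rodata_size;
    l.bss_start = SRAM_ORIGIN;
    l.bss_end = l.bss_start + bss_size;
    l.data_run_start = l.bss_end;
    l.data_run_end = l.data_run_start + data_size;
    l.data_load_start = l.rodata_end; /* the effect of AT (__rodata_end__) */
    return l;
}
```

The point of the model is that .data has two addresses: the run address in SRAM that the program uses, and the load address in flash from which the startup code has to copy the initial values.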

3. cstartup

Now that we have our linker script, we can write our own crt0. This is the piece of code executed before the main function; it copies all initialized static data from non-volatile memory to volatile memory and resets the uninitialized static variables to zero.
So this is what our _cstartup looks like:

This is code written in C, but it could be written in asm just as well.
The global variables declared extern hold the addresses of the beginning and end of each section, which are described in detail in the linker section. The code clears the .bss area, where the uninitialized static variables are located, and then copies all initialized static variables from non-volatile memory to the .data section.
It is worth mentioning that if we want other sections to be in RAM as well (e.g. a section with code that is to be executed from RAM), they must be copied here too.
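
The two steps described above can be sketched as a plain C function. This is an illustration under stated assumptions, not the author's _cstartup: the function name init_ram and its parameters are hypothetical, and the section boundaries are passed in as arguments so the logic can be tried on a PC. On the target they would come from the extern symbols defined by the linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Zero .bss, then copy the initial values of .data from flash to RAM.
   All five pointers stand in for linker-provided section boundaries. */
void init_ram(uint32_t *bss, const uint32_t *bss_end,
              uint32_t *data, const uint32_t *data_end,
              const uint32_t *data_load)
{
    /* Host-side sanity check; drop it on the target,
       where assert() is not available without a C library. */
    assert(bss <= bss_end && data <= data_end);

    while (bss < bss_end)
        *bss++ = 0u;            /* uninitialized statics start as zero */

    while (data < data_end)
        *data++ = *data_load++; /* initialized statics get their flash image */
}
```

On the microcontroller, _cstartup would call such a routine with the section symbols (for example the address of __rodata_end__ as the load address of .data) and then call main().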

4. Main

Once we have written all the elements related to the startup, we can proceed to writing our main function:
[syntax=c]
#define RCCBASE 0x40021000
#define GPIOBBASE 0x48000400

static int wymuszenie_bss;             /* uninitialized: forces an entry in .bss */
int wymuszenie_data = GPIOBBASE;       /* initialized: forces an entry in .data */
const int wymuszenie_rodata = 400000;  /* constant: forces an entry in .rodata */

int main ( void )
{
    unsigned int* ptr;

    wymuszenie_bss = 0x40021000;

    ptr = (unsigned int*)(wymuszenie_bss+0x14);
    *ptr |= 1
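
The listing breaks off at a read-modify-write of the register at RCCBASE + 0x14. As a hedged sketch of that kind of access: the helper name reg_set_bit is hypothetical, and the bit number is an assumption (on STM32F3 parts the register at offset 0x14 is RCC_AHBENR and bit 18 enables the GPIOB clock; the source does not confirm which bit the author sets).

```c
#include <assert.h>
#include <stdint.h>

#define RCCBASE        0x40021000u /* from the listing */
#define RCC_AHBENR_OFF 0x14u       /* offset used in the listing */
#define IOPBEN_BIT     18u         /* assumption: GPIOB clock enable on STM32F3 */

/* Set one bit in a memory-mapped register without touching the others. */
void reg_set_bit(volatile uint32_t *reg, unsigned bit)
{
    assert(bit < 32u); /* host-side sanity check */
    *reg |= (uint32_t)1u << bit;
}
```

On the target the call would look like reg_set_bit((volatile uint32_t *)(RCCBASE + RCC_AHBENR_OFF), IOPBEN_BIT); on a PC the function can be exercised against an ordinary variable instead of a real register.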

Comments

chudybyk 19 Nov 2018 14:04

For those who are completely new and trying to make their first project, it is worth adding that most of the above activities can be covered with a neat environment. There were some examples of...

Janusz_kk 19 Nov 2018 19:24

And this is an example of why I will stay with avrstudio and avr, because 8 bits is enough for me, and for larger calculations I will use some pi or orange, because unfortunately, but you did not convince...

_lazor_ 19 Nov 2018 20:31

Raspberry Pi is already ARM + GPU from Broadcom ;) and this article was just inspired by writing bare metal for the Raspberry Pi :) So I didn't even have to convince you :)

simw 19 Nov 2018 21:05

It would be good when writing something like this to give some arguments or examples, because you only sow unnecessary confusion. Your statement makes absolutely no sense. What didn't convince you...

_lazor_ 19 Nov 2018 22:04

Unfortunately, you are wrong, what you are writing about is optimization, which in the example I gave is not present at all (default value). The -g option for the compiler means: "Produce debugging information...

simw 19 Nov 2018 22:18

Well, in the back of my mind I felt that it was about debugging options, but I didn't check it out of momentum. After all, in System Workbench we have: https://obrazki.elektroda.pl/8017753000_1542662242_thumb.jpg...

_lazor_ 19 Nov 2018 22:24

Now I've actually tested how the code will work with -O3 and interestingly the compiler wants to optimize the code by using memset and memcpy, which we don't have because we don't use standard...

vp32 20 Nov 2018 09:42

Could you elaborate a bit on the "soft reset" thread? This is the first time I have met this term and I am interested in what it is about, how it is triggered and maybe where it is described, e.g. for STM32?

_lazor_ 20 Nov 2018 11:09

In general, it turns out that I mixed up the information from cortex-M with cortex-A. In the case of cortex-M, it seems that there is always an internal pull of Vdd to GND and starting the uC again. ...

vp32 20 Nov 2018 12:31

You know, you didn't explain how it relates to STM32. How to implement it and why after this soft reset the program starts from the address 0x00000004

_lazor_ 20 Nov 2018 13:31

On the cortex-m it is possible to detect the source of the reset, but the code will always start from the address 0x00000000, even if the VTOR was previously set to a different value (even software reset...

vp32 20 Nov 2018 13:34

Ok thanks, I didn't notice the correction and still had what I read in my mind. :D

Bojleros 20 Nov 2018 17:56

When it comes to documenting this knowledge, the article is great. I used to dig into this type of settings myself, but those were the times when the IDE for ARM could be either bought or assembled by...

Atlantis86 30 Nov 2018 09:32

After all, no one is forcing you to operate STMs in a low-level way. You might as well use CubeMX, some convenient IDE, and HAL libraries. Then you won't even need to know that something like a Makefile...

_lazor_ 30 Nov 2018 09:51

It's not like you don't reference registers in cortex-m, it's just hidden under definitions that are more descriptive, but they're still registers. The libraries themselves are convenient,...

Atlantis86 30 Nov 2018 09:59

It's clear. However, the entry point is lower, precisely because using these definitions is more intuitive than writing registers directly. I'm just arguing with the myth that you should start...

_lazor_ 30 Nov 2018 10:18

A twin article for raspberry bare metal is already being prepared :) Cortex-A is definitely a different world than cortex-m, but it's worth getting interested in them because the documentation provided...

_jta_ 30 Nov 2018 18:42

"2. the .thumb_func directive tells the assembler (as) that the next symbol points to the thumb instructions" As far as I know, .thumb_func causes the function name symbol to have a value 1 greater than...

_lazor_ 30 Nov 2018 19:02

https://sourceware.org/binutils/docs-2.15/as/ARM-Directives.html "This directive specifies that the following symbol is the name of a Thumb encoded function. This information is necessary in order to...

FAQ

TL;DR: Enabling -O3 without -ffreestanding on bare-metal STM32 almost doubles code size (+100 %) and breaks builds, while “The -g option produces debugging information GDB can use” [Elektroda, lazor, #17571595; #17571519].

Why it matters: Correct startup, flags, and linker settings decide whether your Cortex-M even reaches main().

Quick Facts

• Reset vector lives at 0x00000004; initial SP at 0x00000000 [Elektroda, _lazor_, post #17568319]
• Example flash region: 0x0800 0000, length 0x1000 (4 KB) [Elektroda, _lazor_, post #17568319]
• Example SRAM region: 0x2000 0000, length 12 KB [Elektroda, _lazor_, post #17568319]
• -ffreestanding stops GCC auto-calling memset/memcpy under -O3 [Elektroda, _lazor_, post #17571595]
• NUCLEO-L432KC with on-board ST-Link ≈ PLN 49 [Elektroda, _lazor_, post #17612659]

What exactly runs before main() on a Cortex-M?

After reset the core loads the initial stack pointer from address 0x00000000 and the address of Reset_Handler from 0x00000004, then starts executing Reset_Handler. Reset_Handler calls _cstartup, which zeroes .bss, copies .data from flash to SRAM, sets up clocks if needed, and finally calls main() [Elektroda, lazor, post #17568319]

Why do I need my own startup.S file?

Startup.S defines the vector table, initial SP value, and Reset_Handler. Without it, the linker has no entry point, and the MCU boots into an undefined state. Writing it yourself lets you keep binaries small and free of vendor libraries [Elektroda, lazor, post #17568319]

What does .thumb_func really do?

.thumb_func marks the following symbol as a Thumb function. References to that symbol then resolve to an odd address, so loading it into PC sets bit 0 and keeps the core in Thumb state. A vector table entry with an even address makes a Cortex-M fault as soon as that vector is taken [Elektroda, jta, post #17597051]
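
The bit 0 convention can be illustrated with a few lines of C. This is a hedged sketch with hypothetical helper names; it only models how a vector table entry relates to the address of the code, and it assumes the code address itself is even, since Thumb instructions are halfword aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Turn the (even) address of a Thumb function into a vector table entry. */
uint32_t thumb_entry(uint32_t code_addr)
{
    assert((code_addr & 1u) == 0u); /* host-side check: code is halfword aligned */
    return code_addr | 1u;          /* bit 0 = 1 selects Thumb state */
}

/* True when a vector table entry has the Thumb bit set. */
bool is_thumb_entry(uint32_t entry)
{
    return (entry & 1u) != 0u;
}
```

A function at the hypothetical address 0x08000008 therefore appears in the vector table as 0x08000009, and an entry of 0x08000008 would be rejected by is_thumb_entry.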

How does the linker script map sections?

The sample script places .text and .rodata in flash, .bss and .data in SRAM. .data uses “AT (__rodata_end__)” so its initial values reside in flash but get copied to RAM during _cstartup [Elektroda, lazor, post #17568319]

How do I copy .data and clear .bss in C?

Use pointers provided by the linker:

while (bss < bss_end) *bss++ = 0;
while (data < data_end) *data++ = *rodata_end++;
main();

The two loops run in _cstartup, which then calls main [Elektroda, lazor, post #17568319]

Which compiler flags are mandatory for debugging?

Compile with -g to embed DWARF symbols; without it, GDB cannot match addresses to source lines [Elektroda, lazor, post #17571519]. Keep optimisation at -O0 or -Og for single-stepping clarity [Elektroda, simw, post #17571220].

Why add -ffreestanding when I raise optimisation to -O3?

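At -O3 GCC may replace the clear and copy loops with calls to memset and memcpy. In a bare-metal project built without the standard library these functions are missing, so linking fails. -ffreestanding tells GCC not to assume a hosted standard library [Elektroda, lazor, post #17571595]. Note that GCC still expects a freestanding environment to provide memcpy, memmove, memset and memcmp, so supplying minimal versions of them is the more robust fix.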
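
One way to satisfy such compiler-generated calls is to provide minimal versions of the two functions yourself. The following is a sketch only. The names bm_memset and bm_memcpy are hypothetical and chosen so the code can be tried on a PC without clashing with the C library; on the target they would have to be named memset and memcpy for the linker to resolve the calls.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise fill; stands in for memset on a target without a C library. */
void *bm_memset(void *dst, int value, size_t n)
{
    unsigned char *d = dst;

    assert(dst != NULL || n == 0u); /* host-side sanity check */
    while (n--)
        *d++ = (unsigned char)value;
    return dst;
}

/* Byte-wise copy of non-overlapping regions; stands in for memcpy. */
void *bm_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

Be aware that at high optimisation levels GCC can recognise these very loops and turn them back into calls to memset and memcpy; compiling this one file with -fno-tree-loop-distribute-patterns is a common way to avoid that.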

How can I issue a software reset on STM32 and where does execution resume?

Write 0x05FA0004 to SCB->AIRCR. After reset, the core always restarts from the vector table at 0x00000000 regardless of previous VTOR settings; the reset flags in RCC_CSR survive the reset, so the reset source can be detected afterwards [Elektroda, lazor, #17572569; RM0364, 2021].
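
The value 0x05FA0004 is not arbitrary: it is the write key 0x05FA in the upper halfword combined with the SYSRESETREQ bit (bit 2). A minimal sketch of how it is composed follows; the macro and function names are hypothetical, and the register address is the standard Cortex-M location of SCB->AIRCR.

```c
#include <assert.h>
#include <stdint.h>

#define SCB_AIRCR_ADDR      0xE000ED0Cu /* SCB->AIRCR on Cortex-M */
#define AIRCR_VECTKEY       0x05FAu     /* key that unlocks writes to AIRCR */
#define AIRCR_VECTKEY_SHIFT 16u
#define AIRCR_SYSRESETREQ   (1u << 2)   /* request a system reset */

/* Compile-time check that the pieces add up to the value quoted in the text. */
static_assert(((AIRCR_VECTKEY << AIRCR_VECTKEY_SHIFT) | AIRCR_SYSRESETREQ)
                  == 0x05FA0004u,
              "AIRCR reset value mismatch");

/* Value to write to AIRCR in order to request a software reset. */
uint32_t aircr_reset_request(void)
{
    return ((uint32_t)AIRCR_VECTKEY << AIRCR_VECTKEY_SHIFT) | AIRCR_SYSRESETREQ;
}
```

On the target the request would be issued with *(volatile uint32_t *)SCB_AIRCR_ADDR = aircr_reset_request(); followed by an endless loop while the reset takes effect. Writing the whole register like this also clears its other fields, which does not matter when a reset follows immediately.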

What’s an edge case that bricks the boot?

If any vector entry holds an even address, the core tries to leave Thumb state, which Cortex-M does not support, and faults when that vector is taken. If the reset vector is affected, main is never reached, and there is no error message [Elektroda, jta, post #17597051]

How do I flash and debug without a full IDE?

3-step how-to:

1. openocd -f board/st_nucleo_f3.cfg
2. arm-none-eabi-gdb build.elf
3. In GDB: target extended-remote :3333, then monitor reset halt, then load, then continue

Works with any ST-Link board [OpenOCD manual; Elektroda, lazor, #17568319].

Are 32-bit MCUs really harder than 8-bit AVRs?

No. Modern STM32 tools generate Makefiles, startup code and linker scripts automatically. Quote: “The plugin will generate makefiles, startups, and linker scripts” [Elektroda, chudybyk, post #17570228]. The datasheets are larger, but higher-level libraries and debuggers speed up learning.

What low-cost boards include a ready programmer/debugger?

ST STM32L100C-Discovery (≈ PLN 39), Infineon XMC 2Go (≈ PLN 40), and NUCLEO-L432KC (≈ PLN 49) ship with on-board ST-Link or Segger probes [Elektroda, lazor, post #17612659]