
Basic use of the arm-none-eabi toolchain or what happens before main

_lazor_ 10962 31
The content has been translated from Polish into English.
  • Introduction

    Tutorials on developing microcontroller software focus mainly on using the microcontroller's peripherals, which is understandable, but they rarely cover the part of the program that executes before the main function.
    This tutorial introduces that part of the program, using the GNU Arm toolchain and the STM32F334 microcontroller with a Cortex-M4 core.


    Tools used:
    - GNU Arm Embedded Toolchain 7-2018-q2-update
    https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/downloads

    - OpenOCD
    - Cygwin (with make)
    - PuTTY
    - ST-LINK drivers

    Install the above programs and add the toolchain's bin folder to the Path environment variable (for convenience; otherwise you have to specify the path to the tools every time).

    1. Startup

    Where to start writing code? By finding out how the core designer (in this case, ARM) intended the processor to start up:

    https://www.keil.com/pack/doc/CMSIS/Core/html/startup_s_pg.html

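    After reset, the core loads the stack pointer with the word at address 0x00000000 and then fetches the reset vector from address 0x00000004. These two words are the first entries of the vector table; the reset vector holds the address of the "Reset_Handler" function, where execution begins. When an STM32 boots from main flash, the flash at 0x08000000 is also visible at address 0x00000000.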
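    To make those two words concrete, here is a small host-side C sketch (not part of the article) that extracts them from the first eight bytes of a raw flash image, such as a .bin file meant for 0x08000000. It assumes little-endian storage, as on STM32; `boot_words`, `decode_boot_words` and `reset_handler_address` are hypothetical names. Bit 0 of the reset vector is the Thumb bit, so the handler's first instruction sits at the address with that bit cleared.

```c
#include <stddef.h>
#include <stdint.h>

/* The first two words of the vector table, as the core fetches them after reset. */
typedef struct {
    uint32_t initial_sp;   /* word 0 (address 0x00000000): loaded into SP */
    uint32_t reset_vector; /* word 1 (address 0x00000004): loaded into PC */
} boot_words;

/* Assemble a 32-bit word stored little-endian, as on STM32. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode the boot words from the start of a raw image.
   Returns 0 on success, -1 if the image is shorter than two words. */
static int decode_boot_words(const uint8_t *image, size_t len, boot_words *out)
{
    if (len < 8u) {
        return -1;
    }
    out->initial_sp = read_le32(image);
    out->reset_vector = read_le32(image + 4);
    return 0;
}

/* Bit 0 of the reset vector is the Thumb bit; the handler code itself
   starts at the address with that bit cleared. */
static uint32_t reset_handler_address(const boot_words *w)
{
    return w->reset_vector & ~1u;
}
```

    For an image starting with the words 0x20003000 and 0x08000009, this reports an initial SP of 0x20003000 and a handler at 0x08000008.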
    It is possible to change the memory from which our program starts. This is done with the BOOT0 pin and the nBOOT1 option bit. Interestingly, we can start the program from "System memory", where the "Embedded boot loader" is located; it is programmed during production and cannot be modified.
    More information about the Embedded boot loader can be found at the following link:
    https://www.st.com/content/ccc/resource/techn...0/0c/CD00167594.pdf/files/CD00167594.pdf/jcr: content/translations/en.CD00167594.pdf

    We can now proceed to writing our own startup.S. What is this file? It is written in assembly language. ARM processors have two instruction sets: ARM and Thumb (extended by Thumb-2). You can read more about the difference in this thread:
    https://stackoverflow.com/questions/28669905/...e-arm-thumb-and-thumb-2-instruction-encodings
    Cortex-M cores, including the Cortex-M4, execute only the Thumb (Thumb-2) instruction set, so that is what we use:
    http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf



    1. The .global directive makes the _start symbol visible to the linker (ld).
    2. The .thumb_func directive tells the assembler (as) that the following symbol is a Thumb function, so its address is emitted with bit 0 set.
    3. Initial stack pointer value, stored at address 0x08000000 in flash. Here it is set to the end of SRAM (the stack grows downward); the address can be read from the microcontroller documentation.
    4. Reset vector, stored at address 0x08000004 in flash. It contains the address of the reset function.
    5. Branch (b): jumps to the labeled function without saving the return address in LR (link register).
    6. Execution should never get here.
    7. Infinite loop function.
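    For comparison, the same two-entry table can be written in C. This is a sketch, not the article's startup.S: `SRAM_END` assumes the 12 KB of SRAM of the STM32F334x8 (check the datasheet of your exact part), while `.isr_vector`, `Reset_Handler` and `Default_Handler` are hypothetical names that must match your linker script. The `#if` only exists so the file also compiles on a PC.

```c
#include <stdint.h>

/* Assumed end of the 12 KB SRAM of the STM32F334x8 (0x20000000 + 0x3000). */
#define SRAM_END 0x20003000u

/* On the target, put the table in its own section, which the linker script
   must place first in FLASH. ".isr_vector" is a hypothetical section name. */
#if defined(__arm__)
#define VECTOR_SECTION __attribute__((section(".isr_vector"), used))
#else
#define VECTOR_SECTION /* host build: ordinary read-only data */
#endif

typedef void (*vector_fn)(void);

/* Counterpart of the "infinite loop" function in startup.S. */
void Default_Handler(void)
{
    for (;;) {
    }
}

/* A real reset handler would initialize .data/.bss and call main;
   here it only falls into the infinite loop to keep the sketch self-contained. */
void Reset_Handler(void)
{
    Default_Handler();
}

/* Word 0: initial stack pointer (a value, not code).
   Word 1: reset vector. When compiling for Thumb, GCC sets bit 0 of the
   function address itself, so no manual "| 1" is needed. */
VECTOR_SECTION const vector_fn vector_table[2] = {
    (vector_fn)(uintptr_t)SRAM_END,
    Reset_Handler,
};
```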

    2. Linker script

    Now that we have our startup code, we can write the linker script. The script describes which code goes into each section, and in what order, and provides symbols marking the beginning and end of each section.



    1. MEMORY assigns names to memory regions.
    2. Flash start address and length (see the microcontroller documentation).
    3. RAM start address and length (again, see the documentation).
    4. The .text section holds the executable code.
    5. Section start address. The address can also be set manually, e.g. . = 0x80000;
    6. Place the .text sections of all object files in this section.
    7 and 8. Place the .text sections of specific object files here. Keep in mind that swapping the order of these objects will make the code work incorrectly!
    9. Section end address.
    10. The memory region into which the section is placed.
    11. The .rodata section contains constant (read-only) data.
    12. The .bss (block started by symbol) section contains uninitialized static variables, which must be zeroed at startup.
    13. The .data section contains initialized static variables. The entry "AT (__rodata_end__)" sets the section's load address: at run time the section lives in SRAM, but its initial values are stored in flash right after .rodata, so cstartup must copy them to RAM.
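    The practical effect of AT (__rodata_end__) is that .data has two addresses, a load address in flash and a run-time address in RAM, so it occupies space in both. The host-side sketch below models the placement described in the list (.text and .rodata in flash, then .bss followed by .data in RAM). The region sizes in the test (64 KB flash, 12 KB RAM, as on the STM32F334x8) are assumptions, alignment padding and the stack are ignored, and `place_sections` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t origin, length; } mem_region;   /* one MEMORY entry */
typedef struct { uint32_t text, rodata, bss, data; } section_sizes;

typedef struct {
    uint32_t rodata_end;  /* __rodata_end__: end of .rodata in FLASH          */
    uint32_t data_lma;    /* .data load address in FLASH = AT (__rodata_end__) */
    uint32_t bss_start;   /* .bss run-time address in RAM                      */
    uint32_t data_vma;    /* .data run-time address in RAM                     */
    bool fits_flash;
    bool fits_ram;        /* the stack at the top of RAM is NOT counted        */
} layout;

/* Sizes are assumed to be already aligned (no ALIGN() padding modelled). */
static layout place_sections(mem_region flash, mem_region ram, section_sizes s)
{
    layout l;
    l.rodata_end = flash.origin + s.text + s.rodata;
    l.data_lma = l.rodata_end;         /* initial values stored after .rodata */
    l.bss_start = ram.origin;
    l.data_vma = l.bss_start + s.bss;  /* .data follows .bss in RAM */

    /* .data costs space twice: initial values in FLASH, variables in RAM. */
    uint32_t flash_used = s.text + s.rodata + s.data;
    uint32_t ram_used = s.bss + s.data;
    l.fits_flash = flash_used <= flash.length;
    l.fits_ram = ram_used <= ram.length;
    return l;
}
```

    With .text = 0x200 bytes and .rodata = 0x10 bytes, the .data image starts at 0x08000210 in flash, which is exactly where cstartup has to copy it from.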

    3. cstartup

    Now that we have our linker script, we can write our own crt0. This is code executed before the main function that copies the initial values of static data from non-volatile to volatile memory and zeroes static uninitialized variables.
    This is what our _cstartup looks like:


    This code is written in C, but it could just as well be written in assembly.
    The extern variables are symbols holding the start and end addresses of the sections; they are defined in the linker script described above. The code zeroes the .bss area, where uninitialized static variables live, and copies the initial values of initialized static variables from non-volatile memory to the .data section.
    Note that if other sections should also live in RAM (e.g. a section with code that is to be executed from RAM), they must be copied here as well.
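    A minimal sketch of the two loops described above (not the author's listing), written against plain pointers so it can be tried on a PC. The symbol names in the comment are hypothetical and must match the linker script; copying word by word assumes the script keeps the section boundaries 4-byte aligned.

```c
#include <stdint.h>

/* Copy the initial values of .data from their load address in flash (src)
   to the run-time area in RAM [dst, dst_end). */
static void copy_data(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Zero the .bss area [start, end). */
static void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}

/* On the target, _cstartup would call them with linker-script symbols,
   e.g. (hypothetical names, they must match the script):

     extern uint32_t __data_start__[], __data_end__[], __rodata_end__[];
     extern uint32_t __bss_start__[], __bss_end__[];
     extern int main(void);

     void _cstartup(void)
     {
         copy_data(__data_start__, __data_end__, __rodata_end__);
         zero_bss(__bss_start__, __bss_end__);
         main();
     }
*/
```

    At higher optimization levels GCC may recognize loops like these and replace them with calls to memcpy/memset, which a program built without the standard library then has to provide.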


    4. Main

    With all the startup elements written, we can move on to the main function:
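```c
#define RCCBASE 0x40021000
#define GPIOBBASE 0x48000400

/* "wymuszenie" means "forcing": each variable forces the compiler to emit
   one of the sections handled by the linker script and cstartup. */
static int wymuszenie_bss;              /* uninitialized -> .bss   */
int wymuszenie_data = GPIOBBASE;        /* initialized   -> .data  */
const int wymuszenie_rodata = 400000;   /* constant      -> .rodata */

int main(void)
{
    volatile unsigned int *ptr;         /* volatile: points at a hardware register */

    wymuszenie_bss = 0x40021000;        /* RCCBASE */

    ptr = (volatile unsigned int *)(wymuszenie_bss + 0x14);  /* RCC_AHBENR */
    *ptr |= 1
```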
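    The listing above stops right after the write to RCCBASE + 0x14, which on the STM32F3 is RCC_AHBENR. The sketch below only shows one plausible next step, under stated assumptions: enabling the GPIOB clock (IOPBEN, bit 18), switching a pin to output in GPIOB_MODER, and driving it through GPIOB_BSRR. The LED on PB6 and all helper names are hypothetical; the helpers only compute register values, so they can be checked on a PC, and the target usage is shown in a comment.

```c
#include <stdint.h>

/* STM32F3 register layout (offsets from the reference manual). */
#define RCC_AHBENR_OFFSET 0x14u        /* same as wymuszenie_bss + 0x14 above */
#define RCC_AHBENR_IOPBEN (1u << 18)   /* GPIOB clock enable */
#define GPIO_MODER_OFFSET 0x00u
#define GPIO_BSRR_OFFSET  0x18u

/* Return MODER with `pin` (0..15) switched to general-purpose output (01). */
static uint32_t moder_set_output(uint32_t moder, unsigned pin)
{
    moder &= ~(3u << (2u * pin));  /* clear the 2-bit mode field */
    moder |= 1u << (2u * pin);     /* 01 = output */
    return moder;
}

/* BSRR: writing bit n sets pin n, writing bit n+16 resets it.
   A single write, so no read-modify-write of ODR is needed. */
static uint32_t bsrr_set(unsigned pin) { return 1u << pin; }
static uint32_t bsrr_reset(unsigned pin) { return 1u << (pin + 16u); }

/* Target usage (hypothetical LED on PB6):

     volatile uint32_t *ahbenr = (volatile uint32_t *)(0x40021000u + RCC_AHBENR_OFFSET);
     volatile uint32_t *moder  = (volatile uint32_t *)(0x48000400u + GPIO_MODER_OFFSET);
     volatile uint32_t *bsrr   = (volatile uint32_t *)(0x48000400u + GPIO_BSRR_OFFSET);

     *ahbenr |= RCC_AHBENR_IOPBEN;
     *moder = moder_set_output(*moder, 6);
     *bsrr = bsrr_set(6);      LED on
     *bsrr = bsrr_reset(6);    LED off
*/
```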

    About Author
    _lazor_
    Moderator of Designing
    _lazor_ wrote 3795 posts with rating 1127, helped 259 times. Lives in Wrocław. Member since 2016.
  • #2 17570228
    chudybyk
    Level 31  
    For those who are completely new and trying to make their first project, it is worth adding that a good IDE can take care of most of the above. There were some examples of using Eclipse C++ on the forum. For Eclipse, I suggest the plugin https://gnu-mcu-eclipse.github.io/; getting started is a bit easier, because the plugin generates the makefiles, startup code, and linker scripts. The downside is that it forces ST's old Standard Peripheral Libraries on you, but you can throw them away and program without this "happiness package". Alternatively, you can combine it with STM32_CUBE and struggle with "luck squared".
    It's definitely worth playing with this plugin and seeing how configuration files and compilation parameters change depending on the option you click.
  • #3 17570982
    Janusz_kk
    Level 39  
    And this is an example of why I will stay with AVR Studio and AVR: 8 bits is enough for me, and for heavier computation I will use some Pi or Orange Pi,
    because, unfortunately, you did not convince me to switch to ARM.
  • #4 17571210
    _lazor_
    Moderator of Designing
    Raspberry Pi is already ARM + a Broadcom GPU ;) and this article was actually inspired by writing bare-metal code for the Raspberry Pi :)

    So I didn't even have to convince you :)
  • #5 17571220
    simw
    Level 27  
    Janusz_kk wrote:
    And this is an example of why I will stay with AVR Studio and AVR: 8 bits is enough for me, and for heavier computation I will use some Pi or Orange Pi,
    because, unfortunately, you did not convince me to switch to ARM.

    It would be good, when writing something like this, to give some arguments or examples, because otherwise you only sow unnecessary confusion. Your statement makes absolutely no sense. What didn't convince you about ARM? That the boot procedure needs to be understood? That the user can write their own startup routine?
    Several things convinced me to go with STM32:

    - mature, ready-made development environments in which you can install everything and run your first working program in literally 5 minutes,

    - countless cheap development boards to choose from, most of them equipped with a programmer and debugger,

    - unified programming across many processor families and easy code portability - between some families it is often enough to generate the appropriate template and copy the code over to move the program to a more powerful µC,

    - for the "same" price you usually get a more powerful µC with richer peripherals - 8-bit users often dispute this argument, I don't know why,

    - advanced USB and Ethernet stack libraries available off the shelf - I haven't needed much in this area yet, but I easily get HID devices or a virtual COM port running on $2 development boards,

    - an advanced debugger available right away without any fiddling - I used to underestimate it, a blinking LED was enough for me, but now I can't imagine working without a debugger - it makes work much easier,

    - interrupt prioritization or DMA controller,

    - the CubeMX (Cube) environment definitely makes it easier to analyze a project's needs,

    - a free, easy-to-use DSP library - I got my first working pseudo spectrum analyzer running in the first week of learning.


    I'm not a professional myself; I tinker with µCs in the evenings, making various more or less necessary gadgets, and I'm no expert in C. Despite this, a 1300-page reference manual (RM) no longer intimidates me. I can find my way around it quite freely, which does not mean that I understand everything - on the contrary. I recommend that everyone "switch" to ARM - there is no need to limit yourself to 8 bits.
    It's fast and convenient, with good prospects, I hope, and on top of that the documentation is quite detailed - so far I haven't even had to look at the errata, though that is probably still ahead of me.

    Sorry, maybe this is a little off topic, but I decided that Janusz_kk's "unclear" claim required a firm response :)

    P.S.
    I don't understand how you can torture yourself by staying with such an old avrstudio :) This Eclipse environment is like a poor relative from across the ocean :) One could still consider Atmel Studio, but the old one?

    Added after 30 [minutes]:

    _lazor_ wrote:
    6. Debugging
    Important! The compiler must build with the -g flag!


    Why does it have to? How does that relate to the ready-made environments I tinker with, where, as far as I know, the "-Og" option very often "damages" proper debugging and it is rather better to debug with "-O0"? I hope I have not confused the terms - I think this is compiler optimization.
    Why do I ask? In my experience with TrueStudio or System Workbench, debugging is better without code optimization, because every optimization level, including "-Og", only makes analysis harder - stepping makes the cursor "jump" around incomprehensibly, so to speak.
    Perhaps I have misconfigured OCD, but I'm using the environments' default settings.
  • #6 17571519
    _lazor_
    Moderator of Designing
    simw wrote:
    _lazor_ wrote:
    6. Debugging
    Important! The compiler must build with the -g flag!


    Why does it have to? How does that relate to the ready-made environments I tinker with, where, as far as I know, the "-Og" option very often "damages" proper debugging and it is rather better to debug with "-O0"? I hope I have not confused the terms - I think this is compiler optimization.
    Why do I ask? In my experience with TrueStudio or System Workbench, debugging is better without code optimization, because every optimization level, including "-Og", only makes analysis harder - stepping makes the cursor "jump" around incomprehensibly, so to speak.
    Perhaps I have misconfigured OCD, but I'm using the environments' default settings.


    Unfortunately, you are wrong: what you are writing about is optimization, which is not used at all in my example (the default level).
    The -g option for the compiler means:
    "Produce debugging information in the operating system's native format (stabs, COFF, XCOFF, or DWARF). GDB can work with this debugging information."

    https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options

    Optimization options can be found at this link:

    https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
  • #7 17571575
    simw
    Level 27  
    _lazor_ wrote:
    Unfortunately, you are wrong: what you are writing about is optimization, which is not used at all in my example (the default level).

    Well, I had a feeling it was about debugging options, but I didn't check it in the heat of the moment.
    After all, in System Workbench we have: [screenshot]

    Thanks for the links, I'll read them at bedtime :)
  • #8 17571595
    _lazor_
    Moderator of Designing
    I've now actually tested how the code behaves with -O3, and interestingly the compiler tries to optimize the code using memset and memcpy, which we don't have because we don't use the standard libraries. The solution to this problem is to add the -ffreestanding compiler option.

    The code size almost doubles and the LED blinks much faster (because the delay is implemented with "nop" instructions).
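    The faster blinking follows from how such a delay works: the loop runs the same number of iterations, but at -O3 each iteration compiles to fewer instructions. In the sketch below (hypothetical names, not from the article) the `asm volatile` statement keeps the optimizer from deleting the loop, yet its duration still depends on the -O level. The second function computes a reload value for the Cortex-M SysTick timer, a different technique from the article's, whose timing does not depend on optimization; its reload register is 24 bits wide (SYST_RVR at 0xE000E014).

```c
#include <stdint.h>

/* Busy-wait for `count` iterations. The asm volatile statement cannot be
   removed by the optimizer, but the time per iteration still changes
   with the optimization level. */
static void delay_loop(uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        __asm__ volatile ("nop");
    }
}

/* SysTick reload value for `tick_hz` interrupts per second from a
   `core_hz` clock. Returns 0 (invalid) if the value does not fit
   in the 24-bit reload register. */
static uint32_t systick_reload(uint32_t core_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u || core_hz < tick_hz) {
        return 0u;
    }
    uint32_t reload = core_hz / tick_hz - 1u;
    return (reload <= 0x00FFFFFFu) ? reload : 0u;
}
```

    For example, with the 8 MHz internal HSI oscillator the STM32F334 starts on after reset, a 1 ms tick needs a reload value of 7999.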



    Also remember that every vendor or IDE uses slightly different tools, and the compiler, assembler and linker flags may differ significantly.
  • #9 17572144
    vp32
    Level 11  
    Could you elaborate a bit on the "soft reset" topic?
    This is the first time I've come across the term and I'm curious what it is about, how it is triggered, and where it is described, e.g. for STM32?
  • #10 17572297
    _lazor_
    Moderator of Designing
    In general, it turns out that I mixed up information about Cortex-M with Cortex-A. On Cortex-M, it seems that any reset internally pulls the reset line to GND and starts the µC again.

    As for Cortex-A, I will add some information to this post tonight.
    On hard and soft resets in general:
    https://en.wikipedia.org/wiki/Reboot
  • #11 17572449
    vp32
    Level 11  
    _lazor_ wrote:
    On hard and soft resets in general:
    https://en.wikipedia.org/wiki/Reboot

    You know, you didn't explain how it relates to STM32: how to trigger it, and why after this soft reset the program starts from address 0x00000004.
  • #12 17572569
    _lazor_
    Moderator of Designing
    On Cortex-M it is possible to detect the source of a reset, but after any reset the core always starts from address 0x00000000 (stack pointer at 0x0, reset vector at 0x4), even if VTOR was previously set to a different value: even a software reset restores the default register values (on STM32, except for the reset flags in RCC_CSR and the Backup domain registers).

    So I gave the wrong information in the article, which I corrected.
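    As for vp32's question of how a software reset is done on an STM32: the usual Cortex-M recipe (not from the thread) is to write SYSRESETREQ together with the 0x05FA key to the SCB AIRCR register, which is what CMSIS NVIC_SystemReset() does; afterwards the core again fetches SP from 0x00000000 and the reset vector from 0x00000004. The sketch also decodes the STM32F3 RCC_CSR reset flags to find the reset source. Addresses and bit positions come from the ARMv7-M architecture and the STM32F3 reference manual and should be verified for your part; the helper names are hypothetical.

```c
#include <stdint.h>

/* Cortex-M System Control Block: Application Interrupt and Reset Control. */
#define SCB_AIRCR_ADDR      0xE000ED0Cu
#define AIRCR_VECTKEY       (0x05FAu << 16)  /* writes without this key are ignored */
#define AIRCR_PRIGROUP_MASK (7u << 8)        /* keep the priority grouping */
#define AIRCR_SYSRESETREQ   (1u << 2)        /* request a system reset */

/* Value to write to AIRCR to request a software reset (same recipe as
   CMSIS NVIC_SystemReset()). On the target:
     volatile uint32_t *aircr = (volatile uint32_t *)SCB_AIRCR_ADDR;
     *aircr = aircr_reset_value(*aircr);
     for (;;) { }      wait until the reset takes effect */
static uint32_t aircr_reset_value(uint32_t current_aircr)
{
    return AIRCR_VECTKEY | (current_aircr & AIRCR_PRIGROUP_MASK) | AIRCR_SYSRESETREQ;
}

typedef enum {
    RESET_UNKNOWN,
    RESET_LOW_POWER,
    RESET_WINDOW_WATCHDOG,
    RESET_INDEPENDENT_WATCHDOG,
    RESET_SOFTWARE,
    RESET_POWER_ON,
    RESET_OPTION_BYTE_LOADER,
    RESET_PIN
} reset_cause;

/* Decode the STM32F3 RCC_CSR reset flags (RCC base + 0x24). Internal resets
   typically also pulse NRST, so PINRSTF is checked last. The flags
   accumulate until cleared by writing 1 to RMVF (bit 24). */
static reset_cause decode_reset_cause(uint32_t rcc_csr)
{
    if (rcc_csr & (1u << 31)) return RESET_LOW_POWER;            /* LPWRRSTF */
    if (rcc_csr & (1u << 30)) return RESET_WINDOW_WATCHDOG;      /* WWDGRSTF */
    if (rcc_csr & (1u << 29)) return RESET_INDEPENDENT_WATCHDOG; /* IWDGRSTF */
    if (rcc_csr & (1u << 28)) return RESET_SOFTWARE;             /* SFTRSTF  */
    if (rcc_csr & (1u << 27)) return RESET_POWER_ON;             /* PORRSTF  */
    if (rcc_csr & (1u << 25)) return RESET_OPTION_BYTE_LOADER;   /* OBLRSTF  */
    if (rcc_csr & (1u << 26)) return RESET_PIN;                  /* PINRSTF  */
    return RESET_UNKNOWN;
}
```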
  • #13 17572577
    vp32
    Level 11  
    OK, thanks. I didn't notice the correction and still had in mind what I had read. :D
  • #14 17573152
    Bojleros
    Level 16  
    As documentation of this knowledge, the article is great. I used to dig into this kind of setup myself, but those were the days when an ARM IDE had to be either bought or assembled yourself from publicly available pieces (I'm skipping those with a code size limit). It was always a bit painful, but now, at least for ST, it's no longer a problem because there's AC6.
  • #15 17595053
    Atlantis86
    Level 19  
    Janusz_kk wrote:
    And this is an example of why I will stay with AVR Studio and AVR: 8 bits is enough for me, and for heavier computation I will use some Pi or Orange Pi,
    because, unfortunately, you did not convince me to switch to ARM.


    After all, no one is forcing you to program STM32s at a low level. You might as well use CubeMX, a convenient IDE, and the HAL libraries. Then you won't even need to know that things like makefiles or linker scripts exist. It also works the other way - if you wanted, you could program AVRs at a similarly low level.

    You perpetuate the well-known myth that 8-bit microcontrollers are easier than modern 32-bit ones. And they are not. On the contrary. When programming AVRs, you usually need to access registers directly. On most modern chips, given how extensive the hardware is, that would be too cumbersome in most cases (although of course possible), so manufacturers provide convenient libraries for configuring and operating peripherals in a more natural, "human" way.

    In addition, the nature of 8-bit chips creates traps that are absent on more modern ones, such as the need to disable interrupts for operations on variables that cannot be performed in a single cycle.

    RasPi-style single board computers are not always a good alternative to microcontrollers. Their restart after reboot requires power, disconnecting the power without shutting down the system can generate problems, despite the large processor, the speed of operation is limited due to the need to support an extensive operating system.
  • #16 17595103
    _lazor_
    Moderator of Designing
    Atlantis86 wrote:
    You perpetuate the well-known myth that 8-bit microcontrollers are easier than modern 32-bit ones. And they are not. On the contrary. When programming AVRs, you usually need to access registers directly. On most modern chips, given how extensive the hardware is, that would be too cumbersome in most cases (although of course possible), so manufacturers provide convenient libraries for configuring and operating peripherals in a more natural, "human" way.

    It's not as if you don't access registers on Cortex-M; they are just hidden behind more descriptive definitions, but they are still registers.
    The libraries themselves are convenient, but as soon as you want to do something more complex, you still need to know how things work and write it yourself. There is no difference between AVR and ARM here.

    Atlantis86 wrote:
    In addition, the nature of 8-bit chips creates traps that are absent on more modern ones, such as the need to disable interrupts for operations on variables that cannot be performed in a single cycle.

    You still need to disable interrupts in critical code, and there are still instructions that don't execute in one cycle.

    Atlantis86 wrote:
    RasPi-style single board computers are not always a good alternative to microcontrollers. Their restart after reboot requires power, disconnecting the power without shutting down the system can generate problems, despite the large processor, the speed of operation is limited due to the need to support an extensive operating system.

    You don't need an operating system to program the Raspberry Pi :)
  • #17 17595128
    Atlantis86
    Level 19  
    _lazor_ wrote:

    It's not as if you don't access registers on Cortex-M; they are just hidden behind more descriptive definitions, but they are still registers.
    The libraries themselves are convenient, but as soon as you want to do something more complex, you still need to know how things work and write it yourself. There is no difference between AVR and ARM here.


    That's clear. However, the barrier to entry is lower, precisely because using these definitions is more intuitive than writing registers directly. I'm just arguing against the myth that you should start with 8-bit processors because they are easier.
    I say this as someone who started with AVRs a few years ago, then switched to PIC32, and is now experimenting with STM32s. In the meantime, out of curiosity, I also did projects on "antique" platforms such as the 8051, Z80 and 6502. In general, after a while the platform stops mattering, as long as a standards-compliant compiler is available.


    Quote:
    You still need to disable interrupts in critical code, and there are still instructions that don't execute in one cycle.


    However, such situations are much rarer on 32-bit microcontrollers. On AVRs you could only skip this when assigning a value to an 8-bit variable. Modern processors even allow individual bits to be manipulated in a single cycle, so situations where you can make a critical mistake come up less often.
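    One mechanism behind this on Cortex-M3/M4 is bit-banding: every bit of a bit-band region has its own 32-bit alias word, and a single store to that word changes only that bit. Bit-banding is optional on the Cortex-M4, so check the reference manual of your part; on STM32 the GPIO BSRR register gives a similar atomic set/reset. The sketch applies the architecture's alias formula; `bitband_alias` and `in_periph_bitband` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit-band regions defined by the ARMv7-M architecture (1 MB each). */
#define SRAM_BB_REGION   0x20000000u
#define SRAM_BB_ALIAS    0x22000000u
#define PERIPH_BB_REGION 0x40000000u
#define PERIPH_BB_ALIAS  0x42000000u
#define BB_REGION_SIZE   0x00100000u

/* True if addr lies in the peripheral bit-band region. Unsigned wrap-around
   makes addresses below the region fail the check as well. On the STM32F3
   the RCC (0x40021000) is inside it, the GPIO ports (0x48000000...) are not. */
static bool in_periph_bitband(uint32_t addr)
{
    return addr - PERIPH_BB_REGION < BB_REGION_SIZE;
}

/* alias_word = alias_base + (byte_offset * 32) + (bit_number * 4) */
static uint32_t bitband_alias(uint32_t region_base, uint32_t alias_base,
                              uint32_t addr, unsigned bit)
{
    return alias_base + (addr - region_base) * 32u + bit * 4u;
}
```

    For bit 18 of RCC_AHBENR (0x40021014), the alias word is 0x424202C8; writing 1 there enables the GPIOB clock with one store.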

    Quote:
    You don't need an operating system to program the Raspberry Pi :)


    Right. However, when it comes to bare-metal programming, the environment support is much better for chips like the STM32 than for the various SoCs on which popular single-board computers are based.
  • #18 17595173
    _lazor_
    Moderator of Designing
    Atlantis86 wrote:
    Quote:
    You don't need an operating system to program the Raspberry Pi :)


    Right. However, when it comes to bare-metal programming, the environment support is much better for chips like the STM32 than for the various SoCs on which popular single-board computers are based.


    A companion article on Raspberry Pi bare metal is already in preparation :) Cortex-A is definitely a different world from Cortex-M, but it's worth getting into, because the documentation provided by ARM is still very good; the documentation of the SoC itself is worse...
  • #19 17596249
    _jta_
    Electronics specialist
    2. the .thumb_func directive tells the assembler (as) that the next symbol points to the thumb instructions

    As far as I know, .thumb_func causes the function name symbol to have a value 1 greater than its address (i.e. odd - code addresses are even). And this applies not only to the next function, but to all functions after this directive.

    Thus, if this symbol is used as a jump address, the lowest bit of the PC (program counter) will be loaded with 1, which means that instructions will be executed in Thumb mode. Not every instruction does this - only some instructions may load this bit into the PC (the others keep it as it was). These include BX and BLX (jump to the address in a register; BLX also saves the return address in the LR register), and - which is especially important for the exception/interrupt vector table - exception/interrupt entry. Using an even address there would mean that the exception or interrupt handler tries to execute instructions in ARM mode, which the Cortex-M3 (an ARMv7-M core) does not have, so it hangs.

    If the processor can execute instructions in both modes, they can be mixed within one program - in particular, exceptions and interrupts can be handled in different modes, and on return the processor goes back to the previous mode. The difficulty is that BL (function call) cannot change mode; you need to use BLX (or BX).
  • #20 17596290
    _lazor_
    Moderator of Designing
    https://sourceware.org/binutils/docs-2.15/as/ARM-Directives.html

    "This directive specifies that the following symbol is the name of a Thumb encoded function. This information is necessary in order to allow the assembler and linker to generate correct code for interworking between Arm and Thumb instructions and should be used even if interworking is not going to be performed. The presence of this directive also implies .thumb"

    Of course, in the above code it is redundant, because Cortex-M does not support the ARM instruction set. Cortex-M covers cores from ARMv6-M up to, in the near future, ARMv8-M (from ST, the STM32L5 series with the Cortex-M33).

    Of course, thank you for the substantive remark; if you notice anything else or are unsure whether something is correct, feel free to write, I will be happy to discuss it.
  • #21 17597051
    _jta_
    Electronics specialist
    Of course, in the code above, this is redundant

    It is not redundant; on the contrary, it is necessary if the name is used to generate the 32-bit address of the function. The fact that the processor has only Thumb instructions does not mean you can load an even address into the PC with impunity - that freezes the processor, I checked.

    I used this construction, and it worked:

    - all functions defined by _func had odd addresses.

    However, the address parity problem only shows up when an instruction uses a 32-bit absolute address - instructions containing a relative address do not encode the lowest bit of the address and preserve the lowest bit of the PC.
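    _jta_'s point can be turned into a quick host-side check: scan the words of a vector table image and flag any handler entry whose bit 0 is clear, since on Cortex-M that bit becomes the T bit when the address is loaded into the PC by exception entry, BX/BLX or a load into PC. The function names are hypothetical, and treating zero entries as unused vectors is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cortex-M only has Thumb state, so bit 0 of any code address that is
   loaded into the PC (exception entry, BX/BLX, load into PC) must be 1. */
static bool is_valid_thumb_target(uint32_t value)
{
    return (value & 1u) != 0u;
}

/* Address of the handler's first instruction (bit 0 stripped). */
static uint32_t code_address(uint32_t value)
{
    return value & ~1u;
}

/* Check a vector table image. Entry 0 is the initial stack pointer, not code,
   so it is skipped; zero entries are treated as unused vectors.
   Returns the index of the first even (bad) entry, or -1 if all are fine. */
static int first_bad_vector(const uint32_t *table, int count)
{
    for (int i = 1; i < count; i++) {
        if (table[i] != 0u && !is_valid_thumb_target(table[i])) {
            return i;
        }
    }
    return -1;
}
```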
  • #22 17597540
    _lazor_
    Moderator of Designing
    Yes, I was wrong and you are right about .thumb_func. This is a very helpful note, _jta_.
  • #23 17600380
    Janusz_kk
    Level 39  
    Atlantis86 wrote:
    You perpetuate the well-known myth that 8-bit microcontrollers are easier than modern 32-bit ones. And they are not.

    No myth: of course they are easier to master, and there are more materials. Besides, the need for 32 bits
    in simple controllers without a graphic display is close to 0%.
  • #24 17600579
    _lazor_
    Moderator of Designing
    I would like the thesis to be supported with arguments, preferably technical ones. Playground-level arguments will not be tolerated.
  • #25 17601791
    chudybyk
    Level 31  
    Janusz_kk wrote:
    the need for 32 bits
    in simple controllers without a graphic display is close to 0%

    Something like that. I suppose I'm making a mistake in that, where I used AVRs before, I now put in a Cortex-M0. They are cheaper and just as good, sometimes even better.
    Word width doesn't matter much for ease of programming. For me, the model of a friendly core design is the Motorola MC68000 series: a clear instruction set, a very versatile register set, a flat memory space, etc.; coding it in assembly language is a pleasure.
    But who plays with assembler today? Microcontrollers are programmed in C or even C++, without caring about the CPU's architecture. I think that 8-bit enthusiasts are won over by the low complexity of the peripherals, not the core itself. On Cortex, you need to read a bit about setting up the clocks, enabling peripheral blocks, or using DMA. "The horned devil is kinder and the porcupine is cozier!" ;-)
    I agree that if 8-bit chips still had an economic justification, I would find a use for them, but it does not look like it.
  • #26 17602108
    _lazor_
    Moderator of Designing
    If we look at packages, 8-bit architectures do have a big plus, because you can get really tiny packages, which you won't find with Cortex-M or other 32-bit architectures.

    However, I saw a company build a home temperature control system, and they used... a Cortex-A with Linux... Unfortunately, things will go in this direction rather than toward the most optimal solutions...
  • #27 17612271
    _jta_
    Electronics specialist
    It seems that when it comes to modules, some Arduinos can be programmed directly via USB, whereas the cheaper STM32 modules require adapters, e.g. USB to UART TTL, RS-232 to UART TTL, or USB to SWD. An RS-232 to UART TTL adapter is easy to make, I guess, but it's extra work before you can get started. The STM32F103C8T6 module, which is price-competitive with Arduino clones, has USB, but its bootloader works only with USART1 - you need to upload a USB bootloader to the flash.
  • #28 17612321
    _lazor_
    Moderator of Designing
    That's not true: ST boards have an ST-LINK on board, which is a debugger and programmer. Its JTAG and SWD lines are exposed, so it can also be used with home-made circuits (it is even possible to snap the ST-LINK off and use it completely separately).

    Arduino is an ecosystem with its own abstraction layer, not a specific core architecture. ST releases boards with specific chips, no frills. Of course, I'm not saying ST is the only right manufacturer for Cortex-M; there are also NXP with their Kinetis, TI with their LaunchPads, and Infineon with XMC (see XMC 2Go).
  • #29 17612591
    _jta_
    Electronics specialist
    So which ST board costing up to a dozen or so zlotys can I connect straight to a PC? Because an Arduino clone at that price I can. The LPC4370 can too, but it's in a different price range.
  • #30 17612659
    _lazor_
    Moderator of Designing
    Quick prices from Kamami:
    STM32L100C-DISCO - PLN 39
    XMC 2Go - PLN 40
    NUCLEO-L432KC - PLN 49
    STM32F3348 - PLN 80 (and it has everything you need to start playing with power electronics)

    Maybe PLN 20 makes a difference to some; to me it is negligible.

Topic summary

The discussion revolves around the use of the arm-none-eabi toolchain for programming microcontrollers, specifically focusing on the startup code that executes before the main function. Participants share insights on development environments, with suggestions for using Eclipse with the GNU MCU Eclipse plugin for easier project setup. The conversation highlights the advantages of STM32 microcontrollers, such as mature development environments, cost-effective boards, and code portability. There are debates on the ease of programming 8-bit versus 32-bit microcontrollers, with some arguing that modern 32-bit systems offer more intuitive programming experiences despite their complexity. The topic also touches on debugging options, reset types in Cortex-M, and the importance of understanding the boot procedure. Additionally, there are discussions about the availability and pricing of various STM32 boards compared to Arduino clones.
Summary generated by the language model.