STM Code bloated?

doctek
Wed Jul 25, 2018 4:25 am
I am attempting to create some very small Arduinos. I’ve used the ATtiny84, but it only has 8K of flash. Some of the ST chips are not much larger (in UFQFN packages) but have 32K of flash, still in a very low-power, inexpensive package. I’m making a breakout board, but to get the software going, I’m working with an L053 Nucleo and an L031 Nucleo. My first projects (after blink) have been a simple I2C interface with a few reads and writes. Very simple.

The software has all worked just fine! I’m very happy except for one problem. The code size is huge! It’s over 22K with debug on, and still over 20K with Smallest optimization. That’s nearly 70% of the code space chewed up to do a simple task! Since I can do this easily in the Tiny84, I see this as a show stopper!

Perhaps this forum can shed some light on why the ST code is so bloated and what approach I might use to deal with it.

Thanks for any suggestions.


doctek
Wed Jul 25, 2018 5:13 am
Hmmm. After my bold accusation of code bloat, I thought I should collect some data.

Program   Teensy2   Teensy3.2   L031
Blink     2,410     11,592      10,248
I2C       5,016     21,212      20,780

Looks like the problem is the use of the ARM, not the STM implementation. So I guess my question just needs to be rephrased: Why do the ARM processors have so much overhead? Is there a way to reduce it, or am I faced with needing to use a device with a larger flash (ala Teensy 3.2)? I don’t see how using an L031 would be a gain over just using the Tiny84.


fpiSTM
Wed Jul 25, 2018 6:08 am
Hi, there are some optimizations in the pipeline:
https://github.com/stm32duino/Arduino_C … issues/228
https://github.com/stm32duino/Arduino_C … issues/274

One other optimization that saves space is to use LL for the clock config; example here:
https://github.com/stm32duino/Arduino_C … -373646769


stevestrong
Wed Jul 25, 2018 6:59 am
A short search in the forum (you know, the small unimportant field on the top right part of the page which is not much used ;) ) would have revealed a lot of discussions about that…

dannyf
Wed Jul 25, 2018 10:26 am
Why do the ARM processors have so much overhead?

1. programming approach: for the larger / more complex chips, the programming approach tends to be more modular. that means more layers and more code for the same functionality than on an 8-bit chip;
2. code density is generally lower on a 32-bit chip vs. an 8-bit chip.
3. your particular implementation.


ag123
Wed Jul 25, 2018 3:46 pm
the libmaple stm32duino core has a usb-serial component built into the sketch; it has been part of the stm32duino core since the leaflabs maple. This is more than simply a uart and it does away with a separate usb-serial chip on board, and now we are going beyond simply usb-serial to other usb functionalities, but that’s a different topic

if you are willing to do without usb-serial, you could edit boards.txt and undefine -DSERIAL_USB build flag
i’m not too sure if you simply include a header file and place #undef SERIAL_USB would that be the same as building without usb-serial
without usb-serial you’d not be able to do things like Serial.write(“any message”); to send anything back to your serial console

the other things are the compiler and various optimization flags e.g.
-fno-exceptions – no exceptions
-fno-rtti – no run time type info
-fno-use-cxa-atexit – no use cxa atexit
-fno-threadsafe-statics – no thread safe statics
-nostdlib – no startup or default libs
-Xlinker --gc-sections – remove unused sections (this can save lots of space)
-specs=nano.specs – use newlib-nano

some of these are probably in platform.txt or boards.txt, so have backups if you are editing them.
using all the above flags actually still allow my sketches to run ok, but your mileage may vary


doctek
Wed Jul 25, 2018 9:45 pm
Thanks for the prompt, helpful responses!
fpiSTM – I’ll have a look at those items tonight.
stevestrong – Searching for L053 or L031 brought up only my own very recent posts. That’s why I posted.
ag123 – Hmmm. I’ll look closely into the usb-serial. I’ll also check out the other items you mention.

Thanks, all!


stevestrong
Thu Jul 26, 2018 7:36 am
I just realized that it is ST core related, so I move this topic there.

doctek
Fri Jul 27, 2018 12:24 am
Thanks for putting this in the proper forum, stevestrong.

Following what seems like the most direct path to reducing the code size, I’m trying to get rid of the uart, and hopefully the usb. But I’m not making very good headway. Here’s what I’ve done.

In variants/NUCLEO_L031K6, I modified variant.h and stm32l0xx_hal_conf.h to comment out uart and serial port defines. However, when I try to compile (from the Arduino environment), UART_HandleTypeDef is undefined. The problem is that the term is defined in stm32l0xx_hal_conf.h, but that file is not included because of the mods I’ve described. So when uart.h is included from boards.h, I get the error.

You’re likely thinking, Why not modify boards.h to not call uart.h? Well, that leads to further problems. I went down that path, even commenting out large sections of code in various files. Where it ends is that the Mass_StorageUpLoadMethod calls syscalls_stm32.c which calls uart_debug_write which is not found (because I’ve pulled out the uart, remember?). The source for these is not included, so I can’t do anything about it. Switching to StLink upload has the same problem. I don’t see how to specify “no upload” in the Arduino environment, so I’m stuck in this route also.

Any suggestions for how to proceed will be most appreciated. Getting rid of the uart seems like an important step.


doctek
Fri Jul 27, 2018 4:13 am
More experimentation. Following another ag123 suggestion, I modified boards.txt to remove {build.enable_usb} {build.xSerial} from the build.extra_flags. I removed all other changes. Code built with no problems, but is exactly the same size.

Perhaps I am not making the modifications in the right place. I really have no good idea and appreciate all suggestions.


Rick Kimball
Fri Jul 27, 2018 4:22 am
use arm-none-eabi-nm on the .elf file, sort by size, start with the largest memory users and figure out how they get included and if you really need what they provide

fpiSTM
Fri Jul 27, 2018 6:17 am
By default USB is not defined.
For USART simply select disable in the Serial interface menu.

stevestrong
Fri Jul 27, 2018 7:49 am
I use AMAP to analyze the generated map file, you can then figure out which segment is the largest.

doctek
Sat Jul 28, 2018 8:37 pm
Thanks to all for the postings! I’ve used nm in the past, so I tried AMAP. Worked great and is very helpful.

First thing I discovered is that selecting No Serial made no difference in code size. Neither did removing the Upload method menu and actions in boards.txt. Still 22,680 in Debug mode.

Since what I wanted to do was get rid of uart stuff, I simply moved the uart.c library out of the build path. Then I discovered that uart_debug_write was being called from syscalls_stm32.c. I removed that call and things built. Code size is now 17,208 in Debug mode. Drops to 15,048 if optimization is set to Smallest. Checking the application, it still works!

Still more work to do, but this is real progress!

Again, thanks so much for the help!


fpiSTM
Sat Jul 28, 2018 10:15 pm
Sounds strange, as uart.c is under:
#if defined(HAL_UART_MODULE_ENABLED)

doctek
Sun Jul 29, 2018 5:20 pm
fpiSTM: I’m using version 1.2.
ag123: All of the optimization flags except -fno-use-cxa-atexit are already in the platform.txt. I added the cxa one, but didn’t get a separate measurement of just how much difference it makes.

Further experiments: I noticed that syscalls_stm32.c has stuff like _sbrk, _signal, etc. I don’t think I need these, but removing syscalls caused lots of problems, so I couldn’t do that.

I did remove ipAddress.cpp and Print.cpp, as well as the -Dprintf=iprintf in platform.txt. Now the code size is 15,500 (Debug) and 13,636 (Smallest).

For comparison, I compiled with Teensy3.2, no usb and Smallest optimization. Code size is 11,372. I also modified the teensy platform.txt to create a map so I could use AMAP to do some more comparison.

Looks like the teensy includes about 5,452 bytes of serial and HardwareSerial stuff, so the actual code size is closer to 5,467 for just the I2C stuff. Trying to figure out just how much of the L031 code is likely unnecessary is more difficult, but here are some observations.
Stuff I might be able to remove or reduce:
lib_a: 2608 (has the _sbrk stuff, etc.)
HAL_rcc: 3256 (seems like a huge amount to set up the clocking?)
Stuff for the i2C:
HAL_i2c: 3256
Wire.cpp: 1184
twi.c: 928

Anyway, I feel like I’m making some headway toward reducing the code size. Thanks again for all the help and advice.


Rick Kimball
Sun Jul 29, 2018 6:20 pm
Maybe you should just grab STM32CubeMX and truestudio, generate for LL and move on?

doctek
Sun Jul 29, 2018 6:54 pm
That’s a good idea and a reasonable approach. My only concern is whether or not LL will be supported going forward. Seems like I’ve seen notices that said it wasn’t. Should I care about that?

Thanks!


Rick Kimball
Sun Jul 29, 2018 7:04 pm
They wrote LL because everyone was being hostile towards HAL and its bloat.

LL mostly looks like older SPL (Standard Peripheral Library) stuff. I don’t work for ST, but I would guess that it is going to be supported going forward as a way to quell the revolt towards HAL.

There is an SPL -> LL code converter application. I think many developers were sticking with SPL because of HAL. (Again, anecdotal observations on my part.)


fpiSTM
Sun Jul 29, 2018 9:20 pm
LL is supported.
Move to 1.3.0 to remove USART related code.
And use LL for systemclock config will reduce code size.

Rick Kimball
Wed Aug 01, 2018 12:51 am
I have a personal project that uses my own core. I’m focused on making the code small by using c++ templates and register access using the defines provided by the stm32f103xb.h device header from STM32CubeMX. I don’t use any of the cube code, just that header and their clock initialization routine.

The other day someone ( @human890209 ) posted some code using new/malloc() and it didn’t work with Sloeber. I ported that code over to my core and compiled it with the latest arm-none-eabi-gcc (7.3.1) using the gnu++17 standard. It surprised me in that it completely eliminated the new and malloc calls and replaced them with a constant. Granted the code was very simple so the compiler could see what is going on. What I find really nice is that, while I sleep happily dreaming, someone else is making improvements that just show up on my doorstep.

Here is the code:
/* fabooh code highlighting the efficiency of gnu++17 and gcc 7.3.1 */
class MYCLASS {
public:
  uint8_t content;

  MYCLASS() : content(99) { }
  ~MYCLASS() { }
};

LED_BUILTIN LED_BUILTIN_;
serial_default_t<115200, CPU::frequency, TX_PIN, NO_PIN> Serial;

void setup() {
  Serial.begin();
  pinMode(LED_BUILTIN_, OUTPUT);
}

void loop() {
  static unsigned x = 0;
  unsigned ts;

  digitalWrite(LED_BUILTIN_, HIGH);
  do {
    ts = millis();
  } while ( (ts + 1000) < x);
  x += 1000;

  Serial << "curr millis()=" << ts << "\r\n";
  digitalWrite(LED_BUILTIN_, LOW);
  delay(50);

  { // case 1: object on the stack
    MYCLASS mc;
    MYCLASS * const mcp = &mc;

    Serial << (mcp->content + 1) << "\r\n";
  }

  { // case 2: object on the heap via new/delete
    MYCLASS * const mcp = new MYCLASS;

    Serial << (mcp->content + 1) << "\r\n";
    delete(mcp);
  }

  { // case 3: placement new into storage obtained from malloc()
    void * p = malloc(sizeof(MYCLASS));
    MYCLASS * const mcp = new(p) MYCLASS;

    Serial << ( mcp->content + 1) << "\r\n";
    free(mcp);
  }

}


doctek
Fri Aug 03, 2018 4:34 am
Very inspiring! Right now I’m looking at using the LL stuff from CubeMX to create my own I2C code for my board that will work with Arduino. It may be too much for me, but it’s good exercise just figuring out how to do it. Thanks for the encouragement!

doctek
Fri Aug 10, 2018 4:30 am
As I explore further, I’ve made a couple of discoveries that may be helpful.

First, I noticed that all the LL interface code is in with the HAL interface code, so I’m hoping that all I need to do is use it. That means sections like twi.c have to be rewritten, but the LL pieces should be available. I’m going to start by trying to use the LL version of system_clock_config.

Second, I had a look at an old favorite: Geoffrey Brown’s classic Discovering the STM32 Microcontroller. One section that really caught my eye is on newlib (libc). This libc adds about 2600 bytes to my code and is totally unnecessary for my simple i2c program. The question is: HOW do I get rid of it?? I haven’t found where it is being pulled into the linked code. Anyone have any idea on this?

TIA!


tve
Fri Aug 10, 2018 5:26 am
I’m not an authority, but as far as I can tell newlib is being pulled in with the GCC toolchain config and it provides a pile of pretty essential functions that make C & C++ work. There are smaller libraries that provide the same functionality, you could go down that path… The thing is that your code may turn out not to call anything in that library, but some code in the STM Core probably does. In that sense I have good news and bad news for you ;-). The good news is that GCC is very good at only linking in the functions that are actually referenced. So while newlib may be large, the portion that ends up in your binary may be very small. The bad news is that the biggest pig you’re dragging in from newlib is (most likely) printf (and all it depends on) and if you grep for printf in the STM Core you will find that it is used to print some very simple error messages that are actually plain strings and could be printed using puts. I hesitated whether to submit a PR but then decided I’m willing to trade-off the convenience of printf in my own sketches for the size, so I gave up. :lol: :lol: :lol:
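To make the printf point concrete, here is a minimal desktop-compilable sketch, not code from the STM core. It assumes the error messages really are plain strings, in which case puts() is enough, and that an occasional number can be formatted by hand. The names report_error and u32_to_dec are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: write `value` in decimal into `out` (capacity `cap`
// bytes, including the terminating NUL). Returns the number of digits
// written, or 0 if the buffer is too small.
std::size_t u32_to_dec(std::uint32_t value, char* out, std::size_t cap) {
    assert(out != nullptr);
    char tmp[10];                       // 4294967295 has 10 digits
    std::size_t n = 0;
    do {                                // digits come out least significant first
        tmp[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    if (n + 1 > cap) return 0;          // no room for digits plus NUL
    for (std::size_t i = 0; i < n; ++i) out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}

// Hypothetical plain-string error report: no format string is parsed,
// so nothing here references printf.
void report_error(const char* msg) {
    std::puts(msg);
}
```

Whether this actually shrinks a given sketch depends on whether anything else in the link still references printf; the map file or nm output will show that.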

fpiSTM
Fri Aug 10, 2018 6:46 am
Right, printf is the right track. That’s why I’ve already removed it from Error_Handler.
This is another item on my huge todo list. :roll:

tve
Fri Aug 10, 2018 7:46 pm
In the context of avoiding bloat…, having done some ADC implementation using the LL libraries I’m wondering what the pros and cons are compared to using just the CMSIS headers and I thought I’d ask those of you that have more experience with the various ST libraries and uC series…

Here is a trivial example, which is to start an ADC conversion. Using the LL library I’d write:
LL_ADC_REG_StartConversion(ADC1);


doctek
Fri Aug 10, 2018 10:36 pm
The little test program I am working with (only does a few I2C transactions – results just saved to an array) has been compiled for the STM32L031 and for the Teensy 3.2 (uses a Kinetis ARM). I think it’s very worth noting that the Teensy3.2 map does not show any sign that libc is being used! It is also noteworthy that a direct implementation using the code from STM32CubeMX does not invoke libc. Since it’s clearly not needed, I want to get rid of it!

How do I pull the hooks out that invoke it??

Thanks!

BTW, regarding the LL functions, that’s the other thing I’m working on. Right now I’m trying to use the LL system_clock_config, but it won’t build. I have to include the LL header files. Then I can know how much difference it makes.


doctek
Fri Aug 10, 2018 10:41 pm
One more point regarding libc. It seems that libc brings in several “support” functions even if they are not called elsewhere in the code. And the code that is used is designed for larger systems (read: memory is free), and so is bloated by design for use in small systems. I refer you to Brown’s book referenced above.

doctek
Sat Aug 11, 2018 12:48 am
Here are the results of using the LL version of system_clock_config. I find this almost amazing!

With the HAL version, code size was 13,784 (smallest optimization). Looking at the map, I was able to attribute 2,112 to HAL_RCC calls.
Using the LL version (nothing else changed!), the size is now 11,556! The difference is almost exactly the HAL_RCC code size. I now see NO calls to HAL_RCC stuff in the map. In fact, I don’t even see any calls to LL_RCC functions. And, yes, the code still runs correctly.


doctek
Sat Aug 11, 2018 8:26 pm
Results of another experiment:

Before I tackle the I2C implementation using LL, I thought I’d start with the simple Blink program. Using the same mods I did to the I2C program, Blink compiles to 6420 bytes. Looking at the map shows that Lib_a (libc) is gone (2608), Wire.cpp (1184) is gone, and all but 150 or so of twi.c (~930-150=780) is gone. Together that is 2608 + 1184 + 780 = 4572, so compared to my I2C program I’d expect 11556 – 4572 = 6984. Actual is 6420, so there is roughly a 560-byte reduction besides the big ones. BUT the libc is clearly NOT required for a basic Arduino sketch. I will continue to try to figure out how to get rid of it.
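The size bookkeeping in this post can be checked mechanically. The sketch below only restates the byte counts quoted from the map file as constants and adds them up; the constant and function names are made up for the illustration.

```cpp
#include <cstdint>

// Sizes in bytes, copied from the figures quoted in the post.
constexpr std::uint32_t kI2cSketch   = 11556; // I2C sketch with LL clock config
constexpr std::uint32_t kLibc        = 2608;  // lib_a (libc) contribution
constexpr std::uint32_t kWire        = 1184;  // Wire.cpp
constexpr std::uint32_t kTwiRemoved  = 780;   // twi.c: ~930 minus the ~150 that stays
constexpr std::uint32_t kBlinkActual = 6420;  // measured Blink size

// Total of the three components that disappear in Blink.
constexpr std::uint32_t removed_total() { return kLibc + kWire + kTwiRemoved; }

// Size Blink would have if only those three components were removed.
constexpr std::uint32_t expected_blink() { return kI2cSketch - removed_total(); }

// Saving not accounted for by the three named components.
constexpr std::uint32_t unexplained_saving() { return expected_blink() - kBlinkActual; }
```

With these inputs the three components sum to 4572 bytes, the expected Blink size is 6984, and the remaining unexplained saving is 564 bytes.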

A further observation is that the HAL_I2C continues to be pulled in, even though it is not used at all. That’s 2408 bytes of bloat. So I’m trying to find out why that’s in Blink when it is clearly not used.

And I haven’t even considered how to use LL for doing Blink!

Lots of fun!


heisan
Sun Aug 12, 2018 11:56 am
The gcc static linker is actually very good. It does not include ANY code from libc.a in the final binary, unless it is actually used.

Looking through the .elf file for one of my projects, I see the biggest chunk of libc is malloc_r, which is used by the cxa_atexit code (which is required for strict C++ standards compliance). It should be possible to turn this off with the -fno-use-cxa-atexit compiler flag.

For the rest of libc, many build environments provide optimised replacements for common functions, like malloc() and friends, and/or include stripped down libcs like uclibc to reduce the size of these required functions when they are used.

Looking at the dumps, the biggest win would be to add simpler malloc/free implementations.

Removing libc entirely is not really an option (for example, static array initialisers are implemented internally with libc functions). Any use of a non-static C++ object uses a LOT of libc functions. You can try to modify your code to avoid all such internal uses, but then you will end up with trivial code. Much better to look into replacing the default libc functions with simpler/smaller variants.


Squonk42
Sun Aug 12, 2018 12:37 pm
Check https://keithp.com/blogs/embedded-arm-libc/ and the corresponding repo https://keithp.com/cgit/newlib.git/.

doctek
Mon Aug 13, 2018 4:38 am
heisan: Are your observations for the Arduino environment? I ask because I see a large chunk of libc pulled in, yet my code shouldn’t be using it.

Squonk42: Thank you for this pointer! Keith P. is a brilliant guy and does great work. Now if I can just figure out how to incorporate his library into the Arduino environment, my problem could be on the way to a solution!


heisan
Mon Aug 13, 2018 7:51 am
[doctek – Mon Aug 13, 2018 4:38 am] –
heisan: Are your observations for the Arduino environment? I ask because I see a large chunk of libc pulled in, yet my code shouldn’t be using it.

The symbols I gave were from the spiscanner in Roger’s core, compiled in Arduino 1.8.5. I followed all the calls by using ‘objdump -d xxxx.elf’ and all included libc functions were called from the application.


heisan
Mon Aug 13, 2018 7:54 pm
Was just double checking. If you look at the map file, there are a number of sections. At first all symbols are listed – but later there is a section for ‘Discarded symbols’… So it is quite difficult to see what is linked by looking at the map file.

The ELF file does not list the source file, but it only lists the symbols that were actually included.


Squonk42
Mon Aug 13, 2018 8:59 pm
[heisan – Mon Aug 13, 2018 7:54 pm] –
Was just double checking. If you look at the map file, there are a number of sections. At first all symbols are listed – but later there is a section for ‘Discarded symbols’… So it is quite difficult to see what is linked by looking at the map file.
The ELF file does not list the source file, but it only lists the symbols that were actually included.

The map file first lists the included archive members, then the allocated common symbols, then the discarded sections, the general memory configuration, the linker script, and eventually the memory map, such that it actually contains 6 different line formats within one file (!!!).

Example:
Archive member included to satisfy reference by file (symbol)

/tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(startup_stm32yyxx.S.o)
(--whole-archive)
/tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(board.c.o)
(--whole-archive)
...
Allocating common symbols
Common symbol size file

errno 0x4 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7-m/libc_nano.a(lib_a-reent.o)
uwTick 0x4 /tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(stm32yyxx_hal.c.o)
pFlash 0x20 /tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(stm32yyxx_hal_flash.c.o)

Discarded input sections

.text 0x0000000000000000 0x0 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crti.o
.data 0x0000000000000000 0x0 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crti.o
.bss 0x0000000000000000 0x0 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crti.o
...
Memory Configuration

Name Origin Length Attributes
RAM 0x0000000020000000 0x0000000000005000 xrw
FLASH 0x0000000008000000 0x0000000000010000 xr
*default* 0x0000000000000000 0xffffffffffffffff

Linker script and memory map

LOAD /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crti.o
LOAD /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crtbegin.o
LOAD /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7-m/crt0.o
LOAD /home/mstempin/.arduino15/packages/STM32/tools/CMSIS/5.3.0/CMSIS/Lib/GCC//libarm_cortexM3l_math.a
START GROUP
LOAD /tmp/arduino_build_254597/sketch/slave_sender_receiver.ino.cpp.o
...
0x0000000020005000 _estack = 0x20005000
0x0000000000000200 _Min_Heap_Size = 0x200
0x0000000000000400 _Min_Stack_Size = 0x400

.isr_vector 0x0000000008000000 0x10c
0x0000000008000000 . = ALIGN (0x4)
*(.isr_vector)
.isr_vector 0x0000000008000000 0x10c /tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(startup_stm32yyxx.S.o)
0x0000000008000000 g_pfnVectors
0x000000000800010c . = ALIGN (0x4)

.text 0x000000000800010c 0x44f4
0x000000000800010c . = ALIGN (0x4)
*(.text)
.text 0x000000000800010c 0x6c /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crtbegin.o
.text 0x0000000008000178 0x10 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7-m/libc_nano.a(lib_a-strlen.o)
0x0000000008000178 strlen
*(.text*)
.text.loop 0x0000000008000188 0x2 /tmp/arduino_build_254597/sketch/slave_sender_receiver.ino.cpp.o
...


heisan
Mon Aug 13, 2018 9:23 pm
Thanks, forgot about nm…

`nm -C --size-sort i2c_scanner_wire.ino.elf | less`

gives a very nice idea of where to start optimising…


000002bc B tft
000002d0 T Setup0_Process
000003f0 T Adafruit_ILI9341_STM::begin(SPIClass&, unsigned long)
00000408 D __malloc_av_
00000428 d impure_data
00000500 r font
00000538 T _malloc_r

Malloc itself is the single biggest function, and its initialised __malloc_av_ structure is the single biggest chunk of RAM…
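A listing like this can also be totalled per memory type. The following is a small host-side sketch, not part of any toolchain: it assumes each line has the `<hex size> <type> <name>` layout shown above, and it counts initialised data (D/d) against both flash and RAM on the assumption that the linker script stores the initial values in flash and copies them to RAM at startup. Totals and add_nm_line are hypothetical names.

```cpp
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>

// Running totals in bytes.
struct Totals {
    std::uint32_t flash = 0;
    std::uint32_t ram = 0;
};

// Add one line of `nm --size-sort` output to the totals.
// Returns false (and leaves the totals untouched) if the line does not parse.
bool add_nm_line(const std::string& line, Totals& totals) {
    std::istringstream in(line);
    std::string size_hex;
    char type = '\0';
    if (!(in >> size_hex >> type)) return false;

    char* end = nullptr;
    const unsigned long parsed = std::strtoul(size_hex.c_str(), &end, 16);
    if (end == size_hex.c_str() || *end != '\0') return false;
    const std::uint32_t size = static_cast<std::uint32_t>(parsed);

    switch (type) {
        case 'T': case 't':             // code
        case 'R': case 'r':             // read-only data
            totals.flash += size;
            break;
        case 'D': case 'd':             // initialised data: image in flash, copy in RAM
            totals.flash += size;
            totals.ram += size;
            break;
        case 'B': case 'b':             // zero-initialised data
            totals.ram += size;
            break;
        default:                        // other symbol classes are ignored
            break;
    }
    return true;
}
```

Fed the seven lines above, this counts __malloc_av_ and impure_data on both sides, which matches the observation that malloc's table costs RAM as well as flash.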

EDIT: Bah – just tried replacing it as a test, but malloc() and free() are strong symbols in the default libc, so can not be overloaded at compile time…


heisan
Mon Aug 13, 2018 10:03 pm
EDIT2: A quick and dirty hack of:

objcopy --weaken libc.a

And then copy/paste the K&R reference malloc/free into the .ino file saves ~3k of flash and 1k of RAM…
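For readers who have not seen that style of allocator, here is a minimal first-fit free-list sketch in the same spirit. It is not the K&R listing and not the code used above: the names tiny_malloc and tiny_free, the fixed static arena, and its size of 256 units are all assumptions made for the illustration, and it is written so it can be tried on a desktop compiler first.

```cpp
#include <cassert>
#include <cstddef>

// Block header; sizes are counted in Header-sized units, header included.
struct alignas(std::max_align_t) Header {
    Header*     next;   // next free block (meaningful only while on the free list)
    std::size_t units;  // size of this block in units
};

constexpr std::size_t kArenaUnits = 256;   // assumed arena size
static Header  arena[kArenaUnits];         // fixed pool instead of sbrk()
static Header* free_list = nullptr;        // address-ordered list of free blocks
static bool    arena_ready = false;

void* tiny_malloc(std::size_t nbytes) {
    if (!arena_ready) {                    // first call: one big free block
        arena[0].next = nullptr;
        arena[0].units = kArenaUnits;
        free_list = &arena[0];
        arena_ready = true;
    }
    if (nbytes == 0 || nbytes > (kArenaUnits - 1) * sizeof(Header)) return nullptr;
    const std::size_t need = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;

    Header** link = &free_list;
    for (Header* p = free_list; p != nullptr; link = &p->next, p = p->next) {
        if (p->units < need) continue;     // too small, try the next block
        if (p->units == need) {
            *link = p->next;               // exact fit: unlink the whole block
        } else {
            p->units -= need;              // otherwise carve the tail off
            p += p->units;
            p->units = need;
        }
        return p + 1;                      // payload starts after the header
    }
    return nullptr;                        // nothing large enough
}

void tiny_free(void* ptr) {
    if (ptr == nullptr) return;
    Header* block = static_cast<Header*>(ptr) - 1;
    assert(block >= arena && block < arena + kArenaUnits);

    Header* prev = nullptr;                // find the neighbours in address order
    Header* cur = free_list;
    while (cur != nullptr && cur < block) { prev = cur; cur = cur->next; }

    if (cur != nullptr && block + block->units == cur) {
        block->units += cur->units;        // merge with the following free block
        block->next = cur->next;
    } else {
        block->next = cur;
    }
    if (prev != nullptr && prev + prev->units == block) {
        prev->units += block->units;       // merge with the preceding free block
        prev->next = block->next;
    } else if (prev != nullptr) {
        prev->next = block;
    } else {
        free_list = block;
    }
}
```

To stand in for the libc versions, the functions would have to be named malloc and free, which only links once the libc symbols have been weakened as described above, and anything else that touches the heap would need matching replacements.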


doctek
Tue Aug 14, 2018 1:07 am
stevestrong suggested using AMAP (there’s a link on Page 2) to look at the map. I find it very helpful.

Squonk42 – Regarding the newlib version from KeithP, I cloned the repository but I don’t see the debian directory that shows when I look at his web site. It is also not at all clear to me how to build the library so I can use it. Any advice would be welcome! This looks very promising if I can just figure it out.

heisan – Your use of objcopy is very interesting. I have not seen this trick. Thanks for sharing it.


heisan
Tue Aug 14, 2018 7:40 am
Thanks. AMAP looks interesting… I have always used binutils, but this does seem to have a lot of nice features.

As for objcopy, use with extreme care (or better yet, only weaken specific symbols you want to replace). If you do what I did, then ALL symbols in libc are weak. Using a symbol of the same name in your application (or any library used by your application) will replace the libc symbol with no warning. If you accidentally replace an important one (like brk()) your application will break in extremely interesting ways.

Also need to keep in mind that some functions operate on the same internal data structures, so must be replaced as a set (eg malloc+free+calloc+realloc+memalign+friends). I only replaced malloc and free – but only after ensuring none of the other functions were referenced.


heisan
Wed Aug 15, 2018 6:14 pm
I have been playing a bit more, and I realise that there is actually quite a bit of code for managing ‘exiting’ from a program. But this is not possible in an Arduino environment, and it can all be removed.

On the original Arduino core, exit(0) compiles to ‘cli(); while(1);’ – so disable interrupts and spin forever. There is no lower level OS to return to, so we can’t actually exit!

First step in cleaning up is to add ‘-fno-use-cxa-atexit’ to cpp flags. This removes code to call an indefinite number of static object destructors, when the application exits (which can never happen, so is useless).

Even with this flag, the compiler will call atexit() instead, which is almost as inefficient.

With weakened libc symbols, you can add:
int atexit(void (*function)(void)) {return 0;}
void exit(int status) {while(1);}


doctek
Mon Aug 20, 2018 11:18 pm
The last few days I’ve been investigating the libraries that are linked into my Arduino Blink program. The main tools I’ve used are AMAP (mentioned earlier) to look at the symbol map, and the verbose flag for the linker. The verbose option is passed by adding -Wl,-verbose to the linker recipe in platform.txt.

heisan – How did you figure out which version of libc to weaken? There seem to be so many versions, and I can’t really figure out how the Arduino build system decides what ones to use. I see nowhere that the paths are clearly defined. All I can go by is the verbose output from the linker.

The verbose output of the linker reveals that several libraries are accessed. These include libc_nano.a, libm.a, libgcc.a, libstdc++_nano.a, and libc.a. However, the map shows that only libc_nano.a and libgcc.a actually contribute code to Blink. What part do the other libraries play and why are they accessed? I’d like to understand this. All libraries except libgcc.a are located at STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/arm-none-eabi/lib/thumb/v6-m/. The libgcc.a is located at STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/lib/gcc/arm-none-eabi/6.3.1/thumb/v6-m/. Note that the verbose output does not reveal which symbols from the accessed libraries will appear in the map, and presumably the executable image.

What I would really like to figure out is how to build Keith Packard’s version of newlib. I’ve looked at trying to use the Makefiles that come with the git archive from KeithP, but the automake/autoconf stuff is just too complex for me to decipher. I’m thinking that just copying the files and doing the appropriate gcc compilation and building the library is the correct approach. But then I get to the question: What flags should I specify? And then how do I get the linker to use the new library so I can test it? Finally, what header file should be specified for programs to use and what linker flags should I use for them? I’m sure a lot of this I’ll have to answer for myself, but if anyone else can help out, I’d be eternally grateful! And I expect so would the STM32Duino community!!


heisan
Tue Aug 21, 2018 8:03 am
[doctek – Mon Aug 20, 2018 11:18 pm] – heisan – How did you figure out which version of libc to weaken? There seem to be so many versions, and I can’t really figure out how the Arduino build system decides what ones to use. I see nowhere that the paths are clearly defined. All I can go by is the verbose output from the linker.

Look in the .map file – it provides the full path of the libraries being included. I think it was the ‘v7-m’ version, but I don’t have my Arduino stuff at work…

EDIT: For linker flags, look in the .spec files in the core tree – they contain examples of how to switch to the nano lib, should be possible to mod them to switch to other libs too. Although you should swap the header files too, this is usually not necessary, as C prototypes are standardised.


doctek
Mon Aug 27, 2018 4:51 am
After a few false starts, I managed to build the tinystdio section of newlib. Then I modified the link recipe in platform.txt to use my library. While the library was found and opened by the link process, none of the files were used. A quick investigation showed the obvious reason: the names of the functions in the library (thumb/v6-m/libc_nano.a) all begin with lib_a-. For example, the getc function is named getc in the tinystdio library, but is lib_a-getc in the libc_nano version. Therefore, my attempts to provide an alternate library fail.

So what to do? First, why are standard library functions given names with a lib_a- prefix? Second, how should I replace them? Rename each one to have the prefix, or is there a more effective way?

As always, any help or guidance is greatly appreciated!


heisan
Mon Aug 27, 2018 7:41 am
I think you need to check your link recipe.

I have double checked, and the symbols in libc_s.a are ‘getc’. The symbol comes from the file ‘lib_a-getc.o’ – but the original filename makes no difference at link time, only the symbol name.

Unless you weaken libc.a (or libc_s.a if you are using nano.specs), you can not just replace stdio – you need to replace the whole libc…


Squonk42
Mon Aug 27, 2018 7:53 am
IIRC, the toolchain (compiler, assembler, linker and other binary tools) are tied to a given libc.

This comes from the chicken and egg problem: the compiler needs a libc, but you need a compiler to compile the libc…

In fact, when you build the toolchain, a first libc based on newlib is created, which is used to compile the compiler, which in turn is used to compile the final libc (standard libc or newlib) before compiling the final compiler. It gets more complex when building a cross compiler or, even worse, a Canadian-Cross compiler (host != build != target).

My guess is that you have to recompile the whole cross-toolchain to work with the final newlib.

I suggest using an automated tool like crosstool-ng.


heisan
Mon Aug 27, 2018 7:59 am
[Squonk42 – Mon Aug 27, 2018 7:53 am] –
IIRC, the toolchain (compiler, assembler, linker and other binary tools) is tied to a given libc.

This comes from the chicken and egg problem: the compiler needs a libc, but you need a compiler to compile the libc…

While it is true that you need libc to build and run the compiler, the final libc on the target does not have to be the same one. As long as it provides all the symbols required by the C standard the compiler is built to, you should be fine.

All the compiler specific library code is placed in libgcc – and that must not change.


Squonk42
Mon Aug 27, 2018 8:59 am
It is not that simple, the compiler may use builtin functions, and functions may call other functions.

So as you said, unless you weaken the whole libc, you should replace the whole thing by newlib, not only the stdio part.

What about using the smaller newlib-nano if multithreading is not required?


heisan
Mon Aug 27, 2018 10:02 am
[Squonk42 – Mon Aug 27, 2018 8:59 am] –
It is not that simple, the compiler may use builtin functions, and functions may call other functions.

So as you said, unless you weaken the whole libc, you should replace the whole thing by newlib, not only the stdio part.

What about using the smaller newlib-nano if multithreading is not required?

If you stick to public APIs you can mix and match pretty much as you want. The API documents will list if there are related symbols that need to be replaced as a family.

I have an ARMv6 product in the field with only stdio and malloc functions replaced over a stock libc. Well over a million operational hours (aggregate) without any software issues.


Squonk42
Mon Aug 27, 2018 11:46 am
APIs do not describe internal states (causing re-entrance problems), side-effects or timing-related issues, and the fact that something did not happen is not proof that it will not happen :)

I got bitten when changing a malloc/free implementation on a memory-constrained device because of memory fragmentation. I can give some more examples related to different stack usage, structure alignment that only caused problems in very specific conditions (the worst, according to Murphy’s law).

Changing only part of a libc is more or less tinkering, a more general approach would be to have the choice between several libc implementations like plain GNU libc, uclibc, newlib, newlib-nano…


heisan
Mon Aug 27, 2018 12:04 pm
[Squonk42 – Mon Aug 27, 2018 11:46 am] –
APIs do not describe internal states (causing re-entrance problems), side-effects or timing-related issues, and the fact that something did not happen is not proof that it will not happen :)

I got bitten when changing a malloc/free implementation on a memory-constrained device because of memory fragmentation. I can give some more examples related to different stack usage, structure alignment that only caused problems in very specific conditions (the worst, according to Murphy’s law).

Changing only part of a libc is more or less tinkering, a more general approach would be to have the choice between several libc implementations like plain GNU libc, uclibc, newlib, newlib-nano…

You will never find a single libc that meets all your operational requirements. Just read the library documentation, and you can safely replace the bits that don’t work for you. The malloc family of functions shares internal state and must be replaced as a group. Stdio is stateless and can be replaced piecewise – but that generally defeats the object, so rather replace the entire family too.

See here for glibc details:
https://www.gnu.org/software/libc/manua … ing-malloc
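
To illustrate why allocation functions have to be swapped as a group, here is a minimal sketch of a fixed-size block pool. The names pool_alloc and pool_free and the block sizes are hypothetical and are not taken from any libc; the point is only that both functions read and write the same free list, so replacing one without the other would corrupt it.

```c
#include <stddef.h>

#define BLOCK_SIZE  16   /* bytes per block (illustrative value) */
#define BLOCK_COUNT 4    /* number of blocks (illustrative value) */

/* A block is either user payload or a link in the free list. */
typedef union block {
    union block *next;
    unsigned char payload[BLOCK_SIZE];
} block_t;

static block_t pool[BLOCK_COUNT];
static block_t *free_list;   /* shared state: both functions touch it */
static int pool_ready;

/* Chain every block into the free list on first use. */
static void pool_init(void)
{
    for (size_t i = 0; i + 1 < BLOCK_COUNT; i++)
        pool[i].next = &pool[i + 1];
    pool[BLOCK_COUNT - 1].next = NULL;
    free_list = &pool[0];
    pool_ready = 1;
}

/* Hypothetical allocator: returns one block, or NULL when the pool is empty. */
void *pool_alloc(void)
{
    if (!pool_ready)
        pool_init();
    block_t *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Hypothetical release: pushes the block back onto the shared free list. */
void pool_free(void *p)
{
    if (p == NULL)
        return;
    block_t *b = (block_t *)p;
    b->next = free_list;
    free_list = b;
}
```

Because every block has the same size, this kind of pool cannot fragment, at the cost of wasting space on small requests.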


Squonk42
Mon Aug 27, 2018 12:44 pm
[heisan – Mon Aug 27, 2018 12:04 pm] –
You will never find a single libc that meets all your operational requirements. Just read the library documentation, and you can safely replace the bits that don’t work for you. The malloc family of functions shares internal state and must be replaced as a group. Stdio is stateless and can be replaced piecewise – but that generally defeats the object, so rather replace the entire family too.

… And then you find out that stdio depends on a malloc family of functions using internal/external defragmentation using buddy recombination and same-size pooling to avoid pooling structures by itself, but is not the case in the other simpler malloc/free that you chose and then you get out of memory sooner… Been there, done that. :roll:


heisan
Mon Aug 27, 2018 1:05 pm
If you plug in a memory allocator which is not sufficient for the workload, then you can obviously expect problems.

And that is the main reason I usually replace the stdio functions almost immediately. The full C specification for the format string cannot be implemented without malloc/free. Intensively using formatted strings will trash your heap no matter how good your allocator is. So rather replace the stdio functions with deterministic ones. Have to take short cuts on some formatting options but rather that than have things crash randomly.
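
As a rough idea of what a deterministic formatter looks like, here is a minimal sketch that converts an unsigned integer into a caller-supplied buffer without touching the heap. The function name fmt_uint is hypothetical, and this is not the implementation referred to above, just the general shape of the technique.

```c
#include <stddef.h>

/* Write `value` in decimal into buf (size bytes, including the NUL).
 * Returns the number of characters written, or 0 if it does not fit,
 * in which case buf is left untouched. No heap, fixed stack usage. */
size_t fmt_uint(char *buf, size_t size, unsigned long value)
{
    char tmp[3 * sizeof value];   /* more than enough digits for any unsigned long */
    size_t n = 0;

    /* Produce digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    if (n + 1 > size)
        return 0;

    /* Copy them out in the right order and terminate. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

The worst-case stack and time cost is known at compile time, which is the property that matters on a small MCU.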


Squonk42
Mon Aug 27, 2018 4:16 pm
[heisan – Mon Aug 27, 2018 1:05 pm] –
If you plug in a memory allocator which is not sufficient for the workload, then you can obviously expect problems.

Intensively using formatted strings will trash your heap no matter how good your allocator is.

So it looks like no memory allocator is then “sufficient” for handling formatted strings :) Which is of course not true, since there are Linux servers running worldwide without problem, although they handle formatted strings routinely.

[heisan – Mon Aug 27, 2018 1:05 pm] – And that is the main reason I usually replace the stdio functions almost immediately. The full C specification for the format string cannot be implemented without malloc/free.

Please note that the strict Arduino environment does not include stdio and xxprintf() routines:
https://playground.arduino.cc/main/printf

However, if you need one, here is a tiny printf that may be useful:
http://www.sparetimelabs.com/printfrevi … isited.php


heisan
Mon Aug 27, 2018 5:39 pm
[Squonk42 – Mon Aug 27, 2018 4:16 pm] –

[heisan – Mon Aug 27, 2018 1:05 pm] –
If you plug in a memory allocator which is not sufficient for the workload, then you can obviously expect problems.

Intensively using formatted strings will trash your heap no matter how good your allocator is.

So it looks like no memory allocator is then “sufficient” for handling formatted strings :) Which is of course not true, since there are Linux servers running worldwide without problem, although they handle formatted strings routinely.

I was obviously talking about an MCU environment with limited heap space.

[Squonk42 – Mon Aug 27, 2018 4:16 pm] –

[heisan – Mon Aug 27, 2018 1:05 pm] – And that is the main reason I usually replace the stdio functions almost immediately. The full C specification for the format string cannot be implemented without malloc/free.

Please note that the strict Arduino environment does not include stdio and xxprintf() routines:
https://playground.arduino.cc/main/printf

However, if you need one, here is a tiny printf that may be useful:
http://www.sparetimelabs.com/printfrevi … isited.php

Thanks. I already have a very small and full featured printf implementation which I use on my other projects. For applications which require a lot of text formatting, adding printf saves a lot of space over hand coding or building custom formatters.


heisan
Fri Aug 31, 2018 11:15 am
For anybody who wants to play with plug-in libraries for reducing size, I have put my first efforts on github:

viewtopic.php?f=9&t=4066


heisan
Wed Sep 05, 2018 7:03 pm
Just added printf support to my libc replacement. There were licensing issues with my original version, so I wrote a new one from scratch. I am quite pleased with the results: very nearly character-for-character perfect with the glibc version, yet 19kB smaller.

Surprisingly, it is even 400 bytes smaller than using Print.println((float)), and you have feature-rich formatting (precision, padding, justification, etc). The only real drawback is that the float conversion is only accurate to around 7 significant digits.
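
A limit of roughly 7 significant digits is about what a 32-bit float can hold in any case. The sketch below assumes the conversion works in single precision, which the post does not state; the helper name survives_float is hypothetical. It checks whether an integer comes back unchanged after being stored in a float.

```c
/* Returns 1 if `n` survives a round trip through a 32-bit float, else 0. */
int survives_float(long n)
{
    volatile float f = (float)n;   /* volatile: force a real 32-bit store */
    return (long)f == n;
}
```

A 7-digit value such as 1234567 survives, while 16777217 (2^24 + 1, eight digits) does not, because a float carries only 24 bits of mantissa.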


doctek
Thu Sep 06, 2018 3:36 am
Thanks for the LibC! While it didn’t make my code smaller (I’m not using malloc or the other functions), it gave me the key to replacing libc functions. The abort function seemed to be adding code to no useful purpose. So I copied it from Keith Packard’s github download of newlib, stuck it in the LibC directory and modified it to do nothing. I recompiled, and Lo! my code was nearly 400 bytes smaller. I think this is a valid approach I plan to use more often.

Next I attacked unneeded interrupt routines. For example, the GPIO pin IRQs in interrupt.cpp. I moved that file so it would not be compiled (it’s not needed in Blink!), took out #include interrupt.h from board.h, and moved WInterrupts.cpp. That reduced the executable by almost 1100 bytes!

Doing that last little bit helped me see where a lot of bloat comes from: If a file is compiled that has a definition of a function that is declared as “weak” somewhere else, the stronger version is used. The most obvious case of this is in ISRs! The ISR/IRQ code gets pulled in. Since it’s unused: instant bloat. The ISRs often pull in other functions, adding more bloat. The GPIO pin interrupts are a good example. I’m looking at the analog.c and timer.c code as well. This gets pulled in by the need for pwm_stop() by wiring_digital.c. I’ll attack that next.
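
The mechanism can be sketched with GCC's weak attribute. All names below (demo_irq_handler, demo_vectors, default_hits) are hypothetical stand-ins, not the real startup code. The table holds the handler's address, so the linker has to keep whichever definition wins, plus everything that definition calls, even if the interrupt never fires.

```c
/* Counts calls to the default handler, only so the sketch is observable. */
volatile unsigned default_hits;

/* Weak default: an ordinary (strong) definition of demo_irq_handler in
 * any other compiled file would replace this one at link time. */
__attribute__((weak)) void demo_irq_handler(void)
{
    default_hits++;
}

/* Stand-in for the vector table: it references the handler by address,
 * which is why the winning definition can never be discarded as unused. */
typedef void (*isr_t)(void);
const isr_t demo_vectors[] = { demo_irq_handler };
```

Leaving the file with the strong definition out of the build lets the small weak default win again, which matches the savings seen after moving interrupt.cpp.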

Again, thanks for LibC!


heisan
Thu Sep 06, 2018 6:52 am
Thanks. If you have any tested changes which you think will be useful to others, please submit them via github. Even one-off stuff, as this can be packaged in standalone .h files for inclusion on a case-by-case basis.

fpiSTM
Mon Sep 17, 2018 6:23 pm
Just FYI, I’ve removed the use of printf by default.
https://github.com/stm32duino/Arduino_C … c367c8c6a5

Sketch size is now smaller. ;)


doctek
Fri Oct 05, 2018 3:57 am
Summary of my work so far. Next up is the I2C functionality. After getting through the GPIO and SysTick stuff, I think I’m ready to tackle it.

STM32L031 Size Reduction History 10/4/18 (started 7/24/18)

Began with a simple I2C program. It just did a simple configure and a read into an array.
Used an IMU board for I2C slave.
With debug, 22,680; with smallest, about 20K.
Removed any Upload directions and commented out uart_debug_write in syscalls_stm32.c
Debug: 17,208 Smallest: 15,048.
Removed -Dprintf=iprintf and did -fno-use-cxa-atexit
Debug: 15,500 Smallest: 13,636
Put in LL version of clock configuration.
Debug: 13,128 Smallest: 11,556
Switched to Blink at this stage.
Smallest: 6420
Moved the HAL I2C stuff and twi.c so no I2C stuff (maybe ISRs?)
Smallest: 3692 – but not blinking!
Fixed conflict between HAL and LL SysTick (used HAL) – Blinks!
Debug: 4880
Using <LibC> from heisan, I moved and gutted abort.c.
Debug: 4500
After discovering how much ISRs could add, I removed them. Got rid of Interrupt.cpp and WInterrupts.cpp.
Debug: 3420
Tried to remove timer.c and analog.c, but pwm_stop() in wiring_digital wanted to pull them in. So I commented out the call to pwm_stop(). That’s OK for now, I’ll fix it later when I want to use PWM.
Debug: 2376.
Got LL version of SysTick working.
Debug: 2180.
Added LL version of GPIO.
Debug: 1856. Smallest: 1780. RAM: 72 Bytes.
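
To put those numbers in proportion, here is a small helper (the name saved_permille is hypothetical) that expresses a reduction in tenths of a percent using integer math only. With the Debug figures above, the I2C program went from 22,680 to 13,128 bytes (about 42.1% saved) before the switch to Blink, and Blink went from 4,880 to 1,856 bytes (about 61.9% saved). The two series are different programs, so they should not be compared with each other.

```c
/* Reduction from `before` to `after`, in tenths of a percent, rounded down.
 * Integer math only, so no floating-point support code is pulled in. */
unsigned saved_permille(unsigned long before, unsigned long after)
{
    if (before == 0 || after >= before)
        return 0;
    return (unsigned)((before - after) * 1000UL / before);
}
```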

