The software has all worked just fine! I’m very happy except for one problem. The code size is huge! It’s over 22K with debug on, and still over 20K with Smallest optimization. That’s nearly 70% of the code space chewed up to do a simple task! Since I can do this easily in the Tiny84, I see this as a show stopper!
Perhaps this forum can shed some light on why the ST code is so bloated and what approach I might use to deal with it.
Thanks for any suggestions.
| Program | Teensy2 | Teensy3.2 | L031 |
| Blink | 2,410 | 11,592 | 10,248 |
| I2C | 5,016 | 21,212 | 20,780 |
Looks like the problem is the use of the ARM, not the STM implementation. So I guess my question just needs to be rephrased: Why do the ARM processors have so much overhead? Is there a way to reduce it, or am I faced with needing to use a device with a larger flash (ala Teensy 3.2)? I don’t see how using an L031 would be a gain over just using the Tiny84.
https://github.com/stm32duino/Arduino_C … issues/228
https://github.com/stm32duino/Arduino_C … issues/274
One other optimization which saves space is to use LL for the clock config. Example here:
https://github.com/stm32duino/Arduino_C … -373646769
1. programming approach: for the larger / more complex chips, the programming approach tends to be more modular. that means more layers and more code for the same functionality than on an 8-bit chip;
2. code density is generally lower on a 32-bit chip vs. an 8-bit chip.
3. your particular implementation.
…
if you are willing to do without usb-serial, you could edit boards.txt and remove the -DSERIAL_USB build flag
i’m not sure whether simply including a header file containing #undef SERIAL_USB would be the same as building without usb-serial
without usb-serial you’d not be able to do things like Serial.write("any message"); to send anything back to your serial console
the other things are the compiler and various optimization flags e.g.
-fno-exceptions – no exceptions
-fno-rtti – no run time type info
-fno-use-cxa-atexit – do not use __cxa_atexit
-fno-threadsafe-statics – no thread safe statics
-nostdlib – no startup or default libs
-Xlinker --gc-sections – remove unused sections (this can save lots of space)
-specs=nano.specs – use newlib-nano
some of these are probably in platform.txt or boards.txt, so have backups if you are editing them.
using all the above flags still allows my sketches to run ok, but your mileage may vary
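To make two of the flags above more concrete, here is a minimal sketch (not taken from the thread; `Counter` and `next_id` are made-up names) of the kind of code they act on. With GCC, a function-local static that needs run-time construction is normally wrapped in thread-safe guard calls unless -fno-threadsafe-statics is given, and a static object with a non-trivial destructor may be registered for exit-time cleanup through __cxa_atexit unless -fno-use-cxa-atexit is given. How many bytes this costs depends on the toolchain and optimization level, so treat it as an illustration only.

```cpp
#include <cstdint>

// Hypothetical class for illustration; not part of any core.
struct Counter {
    std::uint32_t value;
    Counter() : value(0) {}  // run-time construction: needs a "has this run yet?" guard
    ~Counter() {}            // user-provided destructor: may be registered for exit-time cleanup
};

// Returns 1, 2, 3, ... on successive calls.
std::uint32_t next_id() {
    static Counter c;        // the guarded, one-time initialisation happens here
    return ++c.value;
}
```

Comparing the map file of a sketch like this with and without each flag is probably the quickest way to see whether the flag matters for a given build.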
fpiSTM – I’ll have a look at those items tonight.
stevestrong – Searching for L053 or L031 brought up only my own very recent posts. That’s why I posted.
ag123 – Hmmm. I’ll look closely into the usb-serial. I’ll also check out the other items you mention.
Thanks, all!
Following what seems like the most direct path to reducing the code size, I’m trying to get rid of the uart, and hopefully the usb. But I’m not making very good headway. Here’s what I’ve done.
In variants/NUCLEO_L031K6, I modified variant.h and stm32l0xx_hal_conf.h to comment out uart and serial port defines. However, when I try to compile (from the Arduino environment), UART_HandleTypeDef is undefined. The problem is that the term is defined in stm32l0xx_hal_conf.h, but that file is not included because of the mods I’ve described. So when uart.h is included from boards.h, I get the error.
You’re likely thinking, Why not modify boards.h to not call uart.h? Well, that leads to further problems. I went down that path, even commenting out large sections of code in various files. Where it ends is that the Mass_StorageUpLoadMethod calls syscalls_stm32.c which calls uart_debug_write which is not found (because I’ve pulled out the uart, remember?). The source for these is not included, so I can’t do anything about it. Switching to StLink upload has the same problem. I don’t see how to specify “no upload” in the Arduino environment, so I’m stuck in this route also.
Any suggestions for how to proceed will be most appreciated. Getting rid of the uart seems like an important step.
Perhaps I am not making the modifications in the right place. I really have no good idea and appreciate all suggestions.
For USART simply select disable in the Serial interface menu.
First thing I discovered is that selecting No Serial made no difference in code size. Neither did removing the Upload method menu and actions in boards.txt. Still 22,680 in Debug mode.
Since what I wanted to do was get rid of uart stuff, I simply moved the uart.c library out of the build path. Then I discovered that uart_debug_write was being called from syscalls_stm32.c. I removed that call and things built. Code size is now 17,208 in Debug mode. Drops to 15,048 if optimization is set to Smallest. Checking the application, it still works!
Still more work to do, but this is real progress!
Again, thanks so much for the help!
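The syscalls change described above could be sketched roughly as follows. This is an assumption-laden illustration, not the actual contents of syscalls_stm32.c: `debug_write` is a hypothetical stand-in for the low-level write hook, and the declared signature of `uart_debug_write` is assumed rather than checked against the core. The idea is only that the UART call sits behind the HAL_UART_MODULE_ENABLED guard, and the fallback discards the data while reporting success.

```cpp
#include <cstddef>
#include <cstdint>

// Leave this undefined to build without any UART code.
// #define HAL_UART_MODULE_ENABLED

#if defined(HAL_UART_MODULE_ENABLED)
// Assumed signature; check the core's uart.h before relying on it.
extern "C" std::size_t uart_debug_write(std::uint8_t* data, std::uint32_t size);
#endif

// Hypothetical write hook: forwards to the UART when it is enabled,
// otherwise drops the bytes and reports them as written.
extern "C" int debug_write(const char* buf, int len) {
#if defined(HAL_UART_MODULE_ENABLED)
    return static_cast<int>(uart_debug_write(
        reinterpret_cast<std::uint8_t*>(const_cast<char*>(buf)),
        static_cast<std::uint32_t>(len)));
#else
    (void)buf;   // nothing to send the data to
    return len;  // claim success so callers do not retry forever
#endif
}
```

With the guard in place the linker no longer needs uart.c to resolve the symbol, which is the effect the manual edit achieved.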
#if defined(HAL_UART_MODULE_ENABLED)
ag123: All of the optimization flags except -fno-use-cxa-atexit are already in the platform.txt. I added the cxa one, but didn’t get a separate measurement of just how much difference it makes.
Further experiments: I noticed that syscalls_stm32.c has stuff like _sbrk, _signal, etc. I don’t think I need these, but removing syscalls caused lots of problems, so I couldn’t do that.
I did remove IPAddress.cpp and Print.cpp, as well as the -Dprintf=iprintf in platform.txt. Now the code size is 15,500 (Debug) and 13,636 (Smallest).
For comparison, I compiled with Teensy3.2, no usb and Smallest optimization. Code size is 11,372. I also modified the teensy platform.txt to create a map so I could use AMAP to do some more comparison.
Looks like the teensy includes about 5,452 bytes of serial and HardwareSerial stuff, so the actual code size is closer to 5,467 for just the I2C stuff. Trying to figure out just how much of the L031 code is likely unnecessary is more difficult, but here are some observations.
Stuff I might be able to remove or reduce:
lib_a: 2608 (has the _sbrk stuff, etc.)
HAL_rcc: 3256 (seems like a huge amount to set up the clocking?)
Stuff for the I2C:
HAL_i2c: 3256
Wire.cpp: 1184
twi.c: 928
Anyway, I feel like I’m making some headway toward reducing the code size. Thanks again for all the help and advice.
Thanks!
LL mostly looks like older SPL (Standard Peripheral Library) stuff. I don’t work for ST but I would make a guess that is going to be supported going forward as a way to quell the revolt towards HAL.
There is an SPL -> LL code converter application. I think many developers were sticking with SPL because of HAL. (Again, anecdotal observations on my part.)
Move to 1.3.0 to remove USART related code.
And using LL for the system clock config will reduce code size.
The other day someone ( @human890209 ) posted some code using new/malloc() and it didn’t work with Sloeber. I ported that code over to my core and compiled it with the latest arm-none-eabi-gcc (7.3.1) using the gnu++17 standard. It surprised me in that it completely eliminated the new and malloc calls and replaced them with a constant. Granted the code was very simple so the compiler could see what is going on. What I find really nice is that, while I sleep happily dreaming, someone else is making improvements that just show up on my doorstep.
Here is the code:
/* fabooh code highlighting the efficiency of gnu++17 and gcc 7.3.1 */
class MYCLASS {
public:
  uint8_t content;
  MYCLASS() : content(99) { }
  ~MYCLASS() { }
};

LED_BUILTIN LED_BUILTIN_;
serial_default_t<115200, CPU::frequency, TX_PIN, NO_PIN> Serial;

void setup() {
  Serial.begin();
  pinMode(LED_BUILTIN_, OUTPUT);
}

void loop() {
  static unsigned x = 0;
  unsigned ts;

  digitalWrite(LED_BUILTIN_, HIGH);
  do {
    ts = millis();
  } while ( (ts + 1000) < x);
  x += 1000;
  Serial << "curr millis()=" << ts << "\r\n";
  digitalWrite(LED_BUILTIN_, LOW);
  delay(50);

  {
    MYCLASS mc;
    MYCLASS * const mcp = &mc;
    Serial << (mcp->content + 1) << "\r\n";
  }
  {
    MYCLASS * const mcp = new MYCLASS;
    Serial << (mcp->content + 1) << "\r\n";
    delete(mcp);
  }
  {
    void * p = malloc(sizeof(MYCLASS));
    MYCLASS * const mcp = new(p) MYCLASS;
    Serial << ( mcp->content + 1) << "\r\n";
    free(mcp);
  }
}
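For anyone who wants to try the same three object lifetimes without fabooh or a board, here is a host-side sketch of the idea. The names (`MyClass`, `on_stack`, `on_heap`, `in_malloced_storage`) are made up for this example. Each function should return 100; whether the compiler actually removes the new/malloc calls and folds the result to a constant depends on the compiler version and optimization level, so inspect the generated assembly rather than assume it.

```cpp
#include <cstdint>
#include <cstdlib>
#include <new>

// Same shape as MYCLASS in the post above, renamed for this host-side sketch.
struct MyClass {
    std::uint8_t content;
    MyClass() : content(99) {}
};

// Automatic storage: no allocator involved at all.
int on_stack() {
    MyClass mc;
    return mc.content + 1;
}

// operator new / delete.
int on_heap() {
    MyClass* const p = new MyClass;
    const int result = p->content + 1;
    delete p;
    return result;
}

// malloc plus placement new, with an explicit destructor call before free.
int in_malloced_storage() {
    void* raw = std::malloc(sizeof(MyClass));
    if (raw == nullptr) return -1;  // allocation failed
    MyClass* const p = new (raw) MyClass;
    const int result = p->content + 1;
    p->~MyClass();
    std::free(raw);
    return result;
}
```

Compiling this with -Os and looking at the disassembly of each function shows quickly whether a given toolchain performs the elision described above.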
First, I noticed that all the LL interface code is in with the HAL interface code, so I’m hoping that all I need to do is use it. That means sections like TWI.c have to be re-written, but the LL pieces should be available. I’m going to start by trying to use the LL version of system_clock_config.
Second, I had a look at an old favorite: Geoffrey Brown’s classic Discovering the STM32 Microcontroller. One section that really caught my eye is on newlib (libc). This libc adds about 2600 bytes to my code and is totally unnecessary for my simple i2c program. The question is: HOW do I get rid of it?? I haven’t found where it is being pulled into the linked code. Anyone have any idea on this?
TIA!
This is another point on my huge todo list.
Here is a trivial example, which is to start an ADC conversion. Using the LL library I’d write:
LL_ADC_REG_StartConversion(ADC1);
How do I pull the hooks out that invoke it??
Thanks!
BTW, regarding the LL functions, that’s the other thing I’m working on. Right now I’m trying to use the LL system_clock_config, but it won’t build. I have to include the LL header files. Then I can know how much difference it makes.
With the HAL version, code size was 13,784 (smallest optimization). Looking at the map, I was able to attribute 2,112 to HAL_RCC calls.
Using the LL version (nothing else changed!), the size is now 11,556! The difference is almost exactly the HAL_RCC code size. I now see NO calls to HAL_RCC stuff in the map. In fact, I don’t even see any calls to LL_RCC functions. And, yes, the code still runs correctly.
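A likely reason no LL_RCC calls show up in the map: as far as I know, most LL functions are small static inline functions defined in the LL headers, so they compile down to a register write at the call site instead of a separate linked function. The sketch below imitates that pattern with a mock register block so it can run on a PC. `MockAdc`, `MOCK_ADC_CR_ADSTART` and the two functions are hypothetical names, and the bit position is an assumption for illustration, not taken from the reference manual.

```cpp
#include <cstdint>

// Mock register block; on real hardware this would be the CMSIS peripheral struct.
struct MockAdc {
    volatile std::uint32_t CR;  // control register
};

// Assumed bit position, for illustration only.
constexpr std::uint32_t MOCK_ADC_CR_ADSTART = 1u << 2;

// LL-style helper: one read-modify-write, inlined into the caller.
static inline void mock_ll_adc_start(MockAdc* adc) {
    adc->CR = adc->CR | MOCK_ADC_CR_ADSTART;
}

// LL-style query helper.
static inline bool mock_ll_adc_is_started(const MockAdc* adc) {
    return (adc->CR & MOCK_ADC_CR_ADSTART) != 0u;
}
```

A HAL call doing the same job typically goes through a handle structure, state checks and timeouts, which is where the extra flash goes.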
Before I tackle the I2C implementation using LL, I thought I’d start with the simple Blink program. Using the same mods I did to the I2C program, Blink compiles to 6420 bytes. Looking at the map shows that Lib_a (libc) is gone (2608), Wire.cpp (1184) is gone, and all but 150 or so of twi.c (~930-150=780) is gone. That is 2608 + 1184 + 780 = 4572 bytes, so compared to my I2C program I would expect 11556 – 4572 = 6984. Actual is 6420; so about 560 reduction besides the big ones. BUT the libc is clearly NOT required for a basic Arduino sketch. I will continue to try to figure out how to get rid of it.
A further observation is that the HAL_I2C continues to be pulled in, even though it is not used at all. That’s 2408 bytes of bloat. So I’m trying to find out why that’s in Blink when it is clearly not used.
And I haven’t even considered how to use LL for doing Blink!
Lots of fun!
Looking through the .elf file for one of my projects, I see the biggest chunk of libc is malloc_r, which is used by the cxa_atexit code (which is required for strict C++ standards compliance). It should be possible to turn this off with the -fno-use-cxa-atexit compiler flag.
For the rest of libc, many build environments provide optimised replacements for common functions, like malloc() and friends, and/or include stripped down libcs like uclibc to reduce the size of these required functions when they are used.
Looking at the dumps, the biggest win would be to add simpler malloc/free implementations.
Removing libc entirely is not really an option (for example, static array initialisers are implemented internally with libc functions). Any use of a non-static C++ object uses a LOT of libc functions. You can try to modify your code to avoid all such internal uses, but then you will end up with trivial code. Much better to look into replacing the default libc functions with simpler/smaller variants.
Squonk42: Thank you for this pointer! Keith P. is a brilliant guy and does great work. Now if I can just figure out how to incorporate his library into the Arduino environment, my problem could be on the way to a solution!
[doctek – Mon Aug 13, 2018 4:38 am] –
heisan: Are your observations for the Arduino environment? I ask because I see a large chunk of libc pulled in, yet my code shouldn’t be using it.
The symbols I gave were from the spiscanner in Roger’s core, compiled in Arduino 1.8.5. I followed all the calls by using ‘objdump -d xxxx.elf’ and all included libc functions were called from the application.
[heisan – Mon Aug 13, 2018 7:54 pm] –
Was just double checking. If you look at the map file, there are a number of sections. At first all symbols are listed – but later there is a section for ‘Discarded symbols’… So it is quite difficult to see what is linked by looking at the map file.
The ELF file does not list the source file, but it only lists the symbols that were actually included.
The map file first lists the included archive members, then the allocated common symbols, then the discarded sections, the general memory configuration, the linker script, and finally the memory map, so it actually contains 6 different line formats within one file (!!!).
Example:
Archive member included to satisfy reference by file (symbol)
/tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(startup_stm32yyxx.S.o)
(--whole-archive)
/tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(board.c.o)
(--whole-archive)
...
Allocating common symbols
Common symbol size file
errno 0x4 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7-m/libc_nano.a(lib_a-reent.o)
uwTick 0x4 /tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(stm32yyxx_hal.c.o)
pFlash 0x20 /tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(stm32yyxx_hal_flash.c.o)
Discarded input sections
.text 0x0000000000000000 0x0 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crti.o
.data 0x0000000000000000 0x0 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crti.o
.bss 0x0000000000000000 0x0 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crti.o
...
Memory Configuration
Name Origin Length Attributes
RAM 0x0000000020000000 0x0000000000005000 xrw
FLASH 0x0000000008000000 0x0000000000010000 xr
*default* 0x0000000000000000 0xffffffffffffffff
Linker script and memory map
LOAD /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crti.o
LOAD /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crtbegin.o
LOAD /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7-m/crt0.o
LOAD /home/mstempin/.arduino15/packages/STM32/tools/CMSIS/5.3.0/CMSIS/Lib/GCC//libarm_cortexM3l_math.a
START GROUP
LOAD /tmp/arduino_build_254597/sketch/slave_sender_receiver.ino.cpp.o
...
0x0000000020005000 _estack = 0x20005000
0x0000000000000200 _Min_Heap_Size = 0x200
0x0000000000000400 _Min_Stack_Size = 0x400
.isr_vector 0x0000000008000000 0x10c
0x0000000008000000 . = ALIGN (0x4)
*(.isr_vector)
.isr_vector 0x0000000008000000 0x10c /tmp/arduino_cache_163115/core/core_STM32_stm32_GenF103_pnum_BLUEPILL_F103C8,flash_C8,upload_method_serialMethod,xserial_generic,opt_osstd_37797f055aaae0f124e551a637c4a2ed.a(startup_stm32yyxx.S.o)
0x0000000008000000 g_pfnVectors
0x000000000800010c . = ALIGN (0x4)
.text 0x000000000800010c 0x44f4
0x000000000800010c . = ALIGN (0x4)
*(.text)
.text 0x000000000800010c 0x6c /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7-m/crtbegin.o
.text 0x0000000008000178 0x10 /home/mstempin/.arduino15/packages/STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/bin/../lib/gcc/arm-none-eabi/6.3.1/../../../../arm-none-eabi/lib/thumb/v7-m/libc_nano.a(lib_a-strlen.o)
0x0000000008000178 strlen
*(.text*)
.text.loop 0x0000000008000188 0x2 /tmp/arduino_build_254597/sketch/slave_sender_receiver.ino.cpp.o
...
`nm -C --size-sort i2c_scanner_wire.ino.elf | less`
gives a very nice idea of where to start optimising…
…
000002bc B tft
000002d0 T Setup0_Process
000003f0 T Adafruit_ILI9341_STM::begin(SPIClass&, unsigned long)
00000408 D __malloc_av_
00000428 d impure_data
00000500 r font
00000538 T _malloc_r
Malloc itself is the single biggest function, and its initialised __malloc_av_ structure is the single biggest chunk of RAM…
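To turn a listing like the one above into totals, something along these lines could be used on the PC side. It is a rough sketch with made-up names (`SizeTotals`, `sum_nm_listing`): it reads lines of the form "hex size, type letter, name", counts text and read-only data (T/t, R/r) as flash, zero-initialised data (B/b) as RAM, and initialised data (D/d) as both, since the initial values are stored in flash and copied to RAM at startup. Other symbol types are ignored.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct SizeTotals {
    std::size_t flash = 0;
    std::size_t ram = 0;
};

// Sums symbol sizes from `nm --size-sort` style text: "<hex size> <type> <name>".
SizeTotals sum_nm_listing(const std::string& listing) {
    SizeTotals totals;
    std::istringstream lines(listing);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string size_hex;
        char type = 0;
        if (!(fields >> size_hex >> type)) continue;  // not a symbol line
        std::size_t size = 0;
        try {
            size = std::stoul(size_hex, nullptr, 16);
        } catch (...) {
            continue;                                 // first field was not a hex number
        }
        switch (type) {
            case 'T': case 't': case 'R': case 'r':
                totals.flash += size;                 // code and constants
                break;
            case 'B': case 'b':
                totals.ram += size;                   // zero-initialised data
                break;
            case 'D': case 'd':
                totals.flash += size;                 // initial values live in flash
                totals.ram += size;                   // and are copied to RAM
                break;
            default:
                break;
        }
    }
    return totals;
}
```

Fed the seven symbols shown above, this gives 6440 bytes of flash and 2796 bytes of RAM for those entries alone.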
EDIT: Bah – just tried replacing it as a test, but malloc() and free() are strong symbols in the default libc, so can not be overloaded at compile time…
objcopy --weaken libc.a
And then copy/paste the K&R reference malloc/free into the .ino file saves ~3k of flash and 1k of RAM…
Squonk42 – Regarding the newlib version from KeithP, I cloned the repository but I don’t see the debian directory that shows when I look at his web site. It is also not at all clear to me how to build the library so I can use it. Any advice would be welcome! This looks very promising if I can just figure it out.
heisan – Your use of objcopy is very interesting. I have not seen this trick. Thanks for sharing it.
As for objcopy, use with extreme care (or better yet, only weaken specific symbols you want to replace). If you do what I did, then ALL symbols in libc are weak. Using a symbol of the same name in your application (or any library used by your application) will replace the libc symbol with no warning. If you accidentally replace an important one (like brk()) your application will break in extremely interesting ways.
Also need to keep in mind that some functions operate on the same internal data structures, so must be replaced as a set (eg malloc+free+calloc+realloc+memalign+friends). I only replaced malloc and free – but only after ensuring none of the other functions were referenced.
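To give an idea of what a "simpler malloc/free" can look like, here is a small first-fit allocator over a fixed arena. It is neither the K&R reference code nor the code used in the post above; the `tiny` namespace, the function names and the 1024-byte arena are all assumptions for illustration. It lives in its own namespace so it can be tried on a PC without touching the real malloc(). As noted above, a real replacement would also have to cover calloc, realloc and friends if anything references them.

```cpp
#include <cstddef>
#include <new>

namespace tiny {

struct Block {
    std::size_t size;  // payload bytes that follow this header
    bool free;
    Block* next;       // next block in address order
};

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}
constexpr std::size_t kHeader = round_up(sizeof(Block));
constexpr std::size_t kArenaBytes = 1024;  // assumed heap budget

alignas(std::max_align_t) static unsigned char arena[kArenaBytes];
static Block* head = nullptr;

// First-fit allocation; returns nullptr when no block is large enough.
void* alloc(std::size_t bytes) {
    if (head == nullptr) {  // lazy setup: one free block spanning the arena
        head = new (arena) Block{kArenaBytes - kHeader, true, nullptr};
    }
    if (bytes == 0) return nullptr;
    bytes = round_up(bytes);
    for (Block* b = head; b != nullptr; b = b->next) {
        if (!b->free || b->size < bytes) continue;
        if (b->size >= bytes + kHeader + kAlign) {  // split off the unused tail
            unsigned char* tail = reinterpret_cast<unsigned char*>(b) + kHeader + bytes;
            Block* rest = new (tail) Block{b->size - bytes - kHeader, true, b->next};
            b->size = bytes;
            b->next = rest;
        }
        b->free = false;
        return reinterpret_cast<unsigned char*>(b) + kHeader;
    }
    return nullptr;
}

// Marks the block free and merges adjacent free blocks.
void release(void* p) {
    if (p == nullptr) return;
    Block* b = reinterpret_cast<Block*>(static_cast<unsigned char*>(p) - kHeader);
    b->free = true;
    for (Block* cur = head; cur != nullptr; cur = cur->next) {
        while (cur->free && cur->next != nullptr && cur->next->free) {
            cur->size += kHeader + cur->next->size;
            cur->next = cur->next->next;
        }
    }
}

}  // namespace tiny
```

There is no locking and no protection against double frees, which keeps it small but also means it is only suitable where those cannot happen.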
On the original Arduino core, exit(0) compiles to ‘cli(); while(1);’ – so disable interrupts and spin forever. There is no lower level OS to return to, so we can’t actually exit!
First step in cleaning up is to add ‘-fno-use-cxa-atexit’ to cpp flags. This removes code to call an indefinite number of static object destructors, when the application exits (which can never happen, so is useless).
Even with this flag, the compiler will call atexit() instead, which is almost as inefficient.
With weakened libc symbols, you can add:
int atexit(void (*function)(void)) {return 0;}
void exit(int status) {while(1);}
heisan – How did you figure out which version of libc to weaken? There seem to be so many versions, and I can’t really figure out how the Arduino build system decides what ones to use. I see nowhere that the paths are clearly defined. All I can go by is the verbose output from the linker.
The verbose output of the linker reveals that several libraries are accessed. These include libc_nano.a, libm.a, libgcc.a, libstdc++_nano.a, and libc.a. However, the map shows that only libc_nano.a and libgcc.a actually contribute code to Blink. What part do the other libraries play and why are they accessed? I’d like to understand this. All libraries except libgcc.a are located at STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/arm-none-eabi/lib/thumb/v6-m/. The libgcc.a is located at STM32/tools/arm-none-eabi-gcc/6-2017-q2-update/lib/gcc/arm-none-eabi/6.3.1/thumb/v6-m/. Note that the verbose output does not reveal which symbols from the accessed libraries will appear in the map, and presumably the executable image.
What I would really like to figure out is how to build Keith Packard’s version of newlib. I’ve looked at trying to use the Makefiles that come with the git archive from KeithP, but the automake/autoconfig stuff is just too complex for me to decipher. I’m thinking that just copying the files and doing the appropriate gcc compilation and building the library is the correct approach. But then I get to the question: What flags should I specify? And then how do I get the linker to use the new library so I can test it? Finally, what header file should be specified for programs to use and what linker flags should I use for them? I’m sure a lot of this I’ll have to answer for myself, but if anyone else can help out, I’d be eternally grateful! And I expect so would the STM32Duino community!!
[doctek – Mon Aug 20, 2018 11:18 pm] – heisan – How did you figure out which version of libc to weaken? There seem to be so many versions, and I can’t really figure out how the Arduino build system decides what ones to use. I see nowhere that the paths are clearly defined. All I can go by is the verbose output from the linker.
Look in the .map file – it provides the full path of the libraries being included. I think it was the ‘v7-m’ version, but I don’t have my Arduino stuff at work…
EDIT: For linker flags, look in the .spec files in the core tree – they contain examples of how to switch to the nano lib, should be possible to mod them to switch to other libs too. Although you should swap the header files too, this is usually not necessary, as C prototypes are standardised.
So what to do? First, why are standard library functions given names with a lib_a- prefix? Second, how should I replace them? Rename each one to have the prefix, or is there a more effective way?
As always, any help or guidance is greatly appreciated!
I have double checked, and the symbols in libc_s.a are ‘getc’. The symbol comes from the file ‘lib_a-getc.o’ – but the original filename makes no difference at link time, only the symbol name.
Unless you weaken libc.a (or libc_s.a if you are using nano.specs), you can not just replace stdio – you need to replace the whole libc…
This comes from the chicken and egg problem: the compiler needs a libc, but you need a compiler to compile the libc…
In fact, when you build the toolchain, a first libc based on newlib is created, that is used to compile the compiler, which is used to compile the final libc (standard libc or newlib) before compiling the final compiler. It gets more complex when building a cross compiler or, even worse, a Canadian-Cross compiler (host != build != target).

My guess is that you have to recompile the whole cross-toolchain to work with the final newlib.
I suggest to use an automated tool like crosstool-ng.
[Squonk42 – Mon Aug 27, 2018 7:53 am] –
IIRC, the toolchain (compiler, assembler, linker and other binary tools) are tied to a given libc. This comes from the chicken and egg problem: the compiler needs a libc, but you need a compiler to compile the libc…
While it is true that you need libc to build and run the compiler, the final libc on the target does not have to be the same one. As long as it provides all the symbols required by the C standard the compiler is built to, you should be fine.
All the compiler specific library code is placed in libgcc – and that must not change.
[Squonk42 – Mon Aug 27, 2018 8:59 am] –
It is not that simple, the compiler may use builtin functions, and functions may call other functions.
So as you said, unless you weaken the whole libc, you should replace the whole thing by newlib, not only the stdio part.
What about using the smaller newlib-nano if multithreading is not required?
If you stick to public APIs you can mix and match pretty much as you want. The API documents will list if there are related symbols that need to be replaced as a family.
I have an ARMv6 product in the field with only stdio and malloc functions replaced over a stock libc. Well over a million operational hours (aggregate) without any software issues.
[Squonk42 – Mon Aug 27, 2018 11:46 am] –
APIs do not describe internal states (causing re-entrance problems), side-effects or timing-related issues, and the fact that something did not happen is not proof that it will not happen!
I got bitten when changing a malloc/free implementation on a memory-constrained device because of memory fragmentation. I can give some more examples related to different stack usage, structure alignment that only caused problems in very specific conditions (the worst, according to Murphy’s law).
Changing only part of a libc is more or less tinkering, a more general approach would be to have the choice between several libc implementations like plain GNU libc, uclibc, newlib, newlib-nano…
You will never find a single libc that meets all your operational requirements. Just read the library documentation, and you can safely replace the bits that don’t work for you. Malloc family of instructions share internal state and must be replaced as a group. Stdio is stateless and can be replaced piecewise – but that generally defeats the object, so rather replace the entire family too.
See here for glibc details:
https://www.gnu.org/software/libc/manua … ing-malloc
[heisan – Mon Aug 27, 2018 12:04 pm] –
You will never find a single libc that meets all your operational requirements. Just read the library documentation, and you can safely replace the bits that don’t work for you. Malloc family of instructions share internal state and must be replaced as a group. Stdio is stateless and can be replaced piecewise – but that generally defeats the object, so rather replace the entire family too.
… And then you find out that stdio depends on a malloc family of functions using internal/external defragmentation using buddy recombination and same-size pooling to avoid pooling structures by itself, but is not the case in the other simpler malloc/free that you chose and then you get out of memory sooner… Been there, done that.
And that is the main reason I usually replace the stdio functions almost immediately. The full C specification for the format string cannot be implemented without malloc/free. Intensively using formatted strings will trash your heap no matter how good your allocator is. So rather replace the stdio functions with deterministic ones. Have to take short cuts on some formatting options but rather that than have things crash randomly.
[heisan – Mon Aug 27, 2018 1:05 pm] –
If you plug in a memory allocator which is not sufficient for the workload, then you can obviously expect problems.
…
Intensively using formatted strings will trash your heap no matter how good your allocator is.
So it looks like that no memory allocator is then “sufficient” for handling formatted strings
Which is of course not true, since there are Linux servers running worldwide without problem, although they handle formatted string routinely.
[heisan – Mon Aug 27, 2018 1:05 pm] – And that is the main reason I usually replace the stdio functions almost immediately. The full C specification for the format string cannot be implemented without malloc/free.
Please note that the strict Arduino environment does not include stdio and xxprintf() routines:
https://playground.arduino.cc/main/printf
However, if you need one, here is a tiny printf that may be useful:
http://www.sparetimelabs.com/printfrevi … isited.php
[Squonk42 – Mon Aug 27, 2018 4:16 pm] –
[heisan – Mon Aug 27, 2018 1:05 pm] –
If you plug in a memory allocator which is not sufficient for the workload, then you can obviously expect problems.
…
Intensively using formatted strings will trash your heap no matter how good your allocator is.
So it looks like that no memory allocator is then “sufficient” for handling formatted strings
Which is of course not true, since there are Linux servers running worldwide without problem, although they handle formatted string routinely.
I was obviously talking about an MCU environment with limited heap space.
[Squonk42 – Mon Aug 27, 2018 4:16 pm] –
[heisan – Mon Aug 27, 2018 1:05 pm] – And that is the main reason I usually replace the stdio functions almost immediately. The full C specification for the format string cannot be implemented without malloc/free.
Please note that the strict Arduino environment does not include stdio and xxprintf() routines:
https://playground.arduino.cc/main/printf
However, if you need one, here is a tiny printf that may be useful:
http://www.sparetimelabs.com/printfrevi … isited.php
Thanks. I already have a very small and full featured printf implementation which I use on my other projects. For applications which require a lot of text formatting, adding printf saves a lot of space over hand coding or building custom formatters.
Surprisingly, it is even 400 bytes smaller than using Print.println((float)), and you have feature rich formatting (precision, padding, justification, etc). Only real drawback is that the float conversion is only accurate to around 7 significant digits.
Next I attacked unneeded interrupt routines. For example, the GPIO pin IRQs in interrupt.cpp. I moved that file so it would not be compiled (it’s not needed in Blink!), took out #include interrupt.h from board.h, and moved WInterrupts.cpp. Reduced the executable almost 1100 bytes!
Doing that last little bit helped me see where a lot of bloat comes from: If a file is compiled that has a definition of a function that is declared as “weak” somewhere else, the stronger version is used. The most obvious case of this is in ISRs! The ISR/IRQ code gets pulled in. Since it’s unused: instant bloat. The ISRs often pull in other functions, adding more bloat. The GPIO pin interrupts are a good example. I’m looking at the analog.c and timer.c code as well. This gets pulled in by the need for pwm_stop() in wiring_digital.c. I’ll attack that next.
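The weak-symbol mechanism behind this can be sketched in a few lines. This is an illustration using the GCC/Clang `weak` attribute and a one-entry stand-in for the vector table, not the core's actual startup code; `EXTI0_1_IRQHandler` is used only as a plausible handler name. The table always references the handler symbol, so the linker has to keep whichever definition wins. If only the weak default exists, that is a few bytes. If some compiled file provides a strong definition, that one is linked instead, together with everything it calls, even if the sketch never enables the interrupt.

```cpp
static int default_calls = 0;

// Shared fallback for every interrupt nobody handles.
extern "C" void Default_Handler() { ++default_calls; }

// Weak definition: used only when no other file defines the same symbol strongly.
extern "C" __attribute__((weak)) void EXTI0_1_IRQHandler() { Default_Handler(); }

using Handler = void (*)();

// Stand-in for the vector table: it references the handler, which keeps it in the image.
const Handler vector_table[] = { EXTI0_1_IRQHandler };
```

Adding a second source file that defines `EXTI0_1_IRQHandler` without the weak attribute would silently replace the default at link time, which is exactly how the GPIO interrupt code ended up in Blink.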
Again, thanks for LibC!
https://github.com/stm32duino/Arduino_C … c367c8c6a5
Sketch size is now smaller.
STM32L031 Size Reduction History 10/4/18 (started 7/24/18)
Began with a simple I2C program. It just did a simple configure and a read into an array.
Used an IMU board for I2C slave.
With debug, 22,680; with smallest, about 20K.
Removed any Upload directions and commented out uart_debug_write in syscalls_stm32.c
Debug: 17,208 Smallest: 15,048.
Removed -Dprintf=iprintf and did -fno-use-cxa-atexit
Debug: 15,500 Smallest: 13,636
Put in LL version of clock configuration.
Debug: 13,128 Smallest: 11,556
Switched to Blink at this stage.
Smallest: 6420
Moved the HAL I2C stuff and twi.c so no I2C stuff (maybe ISRs?)
Smallest: 3692 – but not blinking!
Fixed conflict between HAL and LL SysTick (used HAL) – Blinks!
Debug: 4880
Using <LibC> from heisan, I moved and gutted abort.c.
Debug: 4500
After discovering how much ISRs could add, I removed them. Got rid of Interrupt.cpp and WInterrupts.cpp.
Debug: 3420
Tried to remove timer.c and analog.c, but pwm_stop() in wiring_digital wanted to pull them in. So I commented out the call to pwm_stop(). That’s OK for now, I’ll fix it later when I want to use PWM.
Debug: 2376.
Got LL version of SysTick working.
Debug: 2180.
Added LL version of GPIO.
Debug: 1856. Smallest: 1780. RAM: 72 Bytes.
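For a quick view of what each Blink step bought, the Debug figures from the history above can be put into a small table and differenced. The numbers are copied from the list; the array and function names are made up for this sketch.

```cpp
#include <cstddef>

// Blink, Debug build, in the order reported above:
// SysTick fix, abort.c gutted, ISRs removed, pwm_stop() call removed,
// LL SysTick, LL GPIO.
const int kBlinkDebugBytes[] = {4880, 4500, 3420, 2376, 2180, 1856};
const std::size_t kSteps = sizeof(kBlinkDebugBytes) / sizeof(kBlinkDebugBytes[0]);

// Bytes saved by step `step` relative to the step before it (step 0 has no predecessor).
int saved_at_step(std::size_t step) {
    if (step == 0 || step >= kSteps) return 0;
    return kBlinkDebugBytes[step - 1] - kBlinkDebugBytes[step];
}

// Bytes saved between the first and last Debug measurement.
int total_saved() {
    return kBlinkDebugBytes[0] - kBlinkDebugBytes[kSteps - 1];
}
```

The two largest single steps in this series are removing the interrupt files (1080 bytes) and dropping the pwm_stop() dependency on timer.c and analog.c (1044 bytes); together with the smaller steps that is 3024 bytes between the first and last Debug build.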




