https://github.com/rogerclarkmelbourne/ … 2/pull/313
e.g. it adds this to every board
mapleMini.menu.opt.o2std=Faster
mapleMini.menu.opt.o2std.build.flags.optimize=-O2
mapleMini.menu.opt.o2std.build.flags.ldspecs=
mapleMini.menu.opt.o2lto=Faster with LTO
mapleMini.menu.opt.o2lto.build.flags.optimize=-O2 -flto
mapleMini.menu.opt.o2lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o1std=Fast
mapleMini.menu.opt.o1std.build.flags.optimize=-O1
mapleMini.menu.opt.o1std.build.flags.ldspecs=
mapleMini.menu.opt.o1lto=Fast with LTO
mapleMini.menu.opt.o1lto.build.flags.optimize=-O1 -flto
mapleMini.menu.opt.o1lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o3std=Fastest
mapleMini.menu.opt.o3std.build.flags.optimize=-O3
mapleMini.menu.opt.o3std.build.flags.ldspecs=
mapleMini.menu.opt.o3lto=Fastest with LTO
mapleMini.menu.opt.o3lto.build.flags.optimize=-O3 -flto
mapleMini.menu.opt.o3lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.ogstd=Debug
mapleMini.menu.opt.ogstd.build.flags.optimize=-Og
mapleMini.menu.opt.ogstd.build.flags.ldspecs=
mapleMini.menu.opt.oglto=Debug with LTO
mapleMini.menu.opt.oglto.build.flags.optimize=-Og -flto
mapleMini.menu.opt.oglto.build.flags.ldspecs=-flto
mapleMini.menu.opt.osstd=Smallest Code
mapleMini.menu.opt.osstd.build.flags.optimize=-Os
mapleMini.menu.opt.osstd.build.flags.ldspecs=
mapleMini.menu.opt.oslto=Smallest Code with LTO
mapleMini.menu.opt.oslto.build.flags.optimize=-Os -flto
mapleMini.menu.opt.oslto.build.flags.ldspecs=-flto
If -flto generally brings an improvement in code size without affecting execution speed, then I think it would be wise to integrate it into platform.txt (assuming that nothing else breaks).
If execution speed suffers, then I don’t welcome it. The reason why many people use this chip is its speed, which must be kept.
And there is enough room in flash for most user applications; the max 5% saving will not bring much, except some time saved at upload.
Regarding speed optimization, I think -Os gives good overall performance for most apps, unless one has special cases (liveOV7670).
Where necessary, the speed can be increased by using a special coding style, which would then bring the same advantage as -O2 or -O3 while keeping the general setting of -Os in place.
An alternative solution is to locally change the optimization level:
#pragma GCC push_options
#pragma GCC optimize ("O0")
// your code
#pragma GCC pop_options
I think -O2 may be faster than -Os, as that works for LiveOV7670, but I already posted about using -O2 and the consensus was that we should stay with -Os.
[stevestrong – Thu Jul 20, 2017 9:29 am] –
I think this bloat is not going to solve any problem.
If -flto generally brings an improvement in code size without affecting execution speed, then I think it would be wise to integrate it into platform.txt (assuming that nothing else breaks).
#pragma GCC optimize ("string"...)
This pragma allows you to set global optimization options for functions defined later in the source file. One or more strings can be specified. Each function that is defined after this point is as if attribute((optimize("STRING"))) was specified for that function. The parenthesis around the options is optional. See Function Attributes, for more information about the optimize attribute and the attribute syntax.
We already have at least one example that needs -O2, which at the moment requires changes to platform.txt.
The problem with the menu system is that if someone opened the “Ov7670 live” demo, which needs -O2, it would not automatically select -O2, but the pragma would fix this situation.
https://gcc.gnu.org/onlinedocs/gcc/Func … agmas.html
It says
The #pragma GCC target pragma is presently implemented for x86, PowerPC, and Nios II targets only.
which implies it's not available for ARM.
I tried adding the pragma to various headers and also directly into the core code which needs this optimisation, and it made no difference.
So at the moment using #pragma GCC is not an option.
I’ve tried the PR and I actually quite like it… However it will need some changes
I ran the graphics test, which draws lines, fills, text etc. onto the ILI9341 display, and compared our current optimisation of -Os (this generally means optimise for size, at optimisation level -O2) with -O3 plus LTO.
-Os code and RAM sizes were
Sketch uses 30292 bytes (46%) of program storage space. Maximum is 65536 bytes.
Global variables use 3728 bytes (18%) of dynamic memory
-O3 & LTO code and RAM sizes were
Sketch uses 32720 bytes (49%) of program storage space. Maximum is 65536 bytes.
Global variables use 3704 bytes (18%) of dynamic memory
Which is what would be expected, as the code size increased by about 2.4k (around 8%).
The speed test was also interesting and does what would be expected, with noticeable gains in some places.
| Operation | -Os optimisation | -O3 & LTO optimisation | Improvement % |
| ScreenFill | 170789 | 170739 | 0.03 |
| Text | 39905 | 31275 | 21.63 |
| Lines | 228371 | 169259 | 25.88 |
| Horiz/VertLines | 15736 | 15031 | 4.48 |
| Rectangles (outline) | 11469 | 10496 | 8.48 |
| Rectangles (filled) | 355032 | 354789 | 0.07 |
| Circles (filled) | 140210 | 106118 | 24.31 |
| Circles (outline) | 154955 | 120331 | 22.34 |
| Triangles (outline) | 57983 | 40706 | 29.80 |
| Triangles (filled) | 164206 | 146417 | 10.83 |
| Rounded rects (outline) | 54528 | 42923 | 21.28 |
| Rounded rects (filled) | 414590 | 403744 | 2.62 |
Text, Lines, Circles and Rounded rects are all considerably faster, with “Triangles (outline)” being almost 30% faster.
I think the guys using the PigOScope (and derivatives) may find this speed increase quite useful, assuming it doesn't break anything else.
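For anyone wanting to check the figures: the improvement column appears to be the relative reduction in the measured time, (Os - O3LTO) / Os x 100. A minimal sketch of that calculation follows; the function name is made up for illustration and is not part of the test sketch.

```cpp
// Relative improvement of `optimised` over `baseline`, in percent.
// Both values must be in the same unit (e.g. the timings from the table above).
// A negative result means the second value is larger, i.e. a regression.
// Hypothetical helper for illustration only.
double improvementPercent(double baseline, double optimised) {
    return (baseline - optimised) / baseline * 100.0;
}
```

With the Text row, improvementPercent(39905, 31275) comes out at about 21.63. Feeding it the two flash sizes (30292 and 32720 bytes) gives about -8.0, which matches the "around 8%" growth in code size.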
The only problem I see with this PR is that it changes the default optimisation to “Faster”, which is
.menu.opt.o2std=Faster
.menu.opt.o2std.build.flags.optimize=-O2
I’d like to add this, but to stop it potentially breaking existing code the optimisation needs to be set to -Os by default.
I think it would also be better if the amount of optimisation increased going down the menu options.
So the order would probably be:
Smallest
Smallest + LTO
Fast
Fast+LTO
Faster
Faster+LTO
Fastest
Fastest+LTO
and then have Debug as the last option. I'm not entirely sure who would use it, or whether we should include it at all: the IDE does not have any debugging capabilities, so this option would only be useful to people using the repo in another IDE, which may not support the menu options at all.
Unfortunately I don’t have time, at the moment, to go through and change the order of all of these entries in boards.txt
https://github.com/mtiutiu/Arduino_STM3 … boards.txt
I will see if the OP is willing to change this, or perhaps someone else with time on their hands could do it?
PS. I guess it could be done by taking one section into a separate editor window.
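As an alternative to reordering by hand, the entries could be generated. The sketch below is only an illustration of that idea: it emits the same keys, labels and flags as the PR, but in the proposed order (smallest first, Debug last). The struct and function names are made up, and it assumes the IDE treats the first listed menu entry as the default, which is what the "default becomes Faster" behaviour of the PR suggests.

```cpp
#include <string>
#include <vector>

// One optimisation level: key prefix, menu label and GCC flag.
// Labels and flags are copied from the PR; the ordering is the proposed one.
struct OptLevel {
    const char* key;    // "os" -> osstd / oslto
    const char* label;  // text shown in the IDE menu
    const char* flag;   // GCC optimisation flag
};

// Builds the boards.txt optimisation menu block for one board name.
// Hypothetical helper, not part of the core.
std::string optimisationMenu(const std::string& board) {
    const std::vector<OptLevel> levels = {
        {"os", "Smallest Code", "-Os"},
        {"o1", "Fast", "-O1"},
        {"o2", "Faster", "-O2"},
        {"o3", "Fastest", "-O3"},
        {"og", "Debug", "-Og"},
    };
    std::string out;
    for (const OptLevel& lv : levels) {
        for (bool lto : {false, true}) {
            const std::string id =
                board + ".menu.opt." + lv.key + (lto ? "lto" : "std");
            out += id + "=" + lv.label + (lto ? " with LTO" : "") + "\n";
            out += id + ".build.flags.optimize=" + lv.flag + (lto ? " -flto" : "") + "\n";
            out += id + ".build.flags.ldspecs=" + (lto ? "-flto" : "") + "\n";
        }
    }
    return out;
}
```

Calling optimisationMenu("mapleMini") returns 30 lines starting with the osstd entry, so -Os would be listed first.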
Edit.
I’ve committed changes for the F1 but I don’t have time to do the F3 and F4 at the moment.
Is that correct? Is the -Os always faster?
-Os is always slower in the graphics test
Just FYI,
I’m updating the pulseIn() function for the Arduino_Core_STM32 in order to get better precision.
I saw you have hard-coded the number of cycles per iteration (to 16), but with this new menu that number is not always the same: it depends on the optimization selected, so the duration value is not correct for all -Ox levels.
https://github.com/rogerclarkmelbourne/ … f1.cpp#L42
I’m trying to find a generic way to compute this value (asm, DWT_CYCCNT, …) as I have to deal with all STM32 series.
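To illustrate why the hard-coded value matters, here is a minimal sketch of the loop-count to microseconds conversion. It is not the actual pulseIn() code: the function name is hypothetical, and the 72 MHz clock and the "12 cycles" figure below are assumptions chosen for the example; only the 16 comes from the post above.

```cpp
#include <cstdint>

// Converts a busy-wait loop count into microseconds.
// cyclesPerIteration is the value that changes with the optimisation level.
// Hypothetical helper for illustration only.
uint32_t loopsToMicros(uint32_t loops, uint32_t cyclesPerIteration, uint32_t cpuHz) {
    // 64-bit intermediate so that loops * cycles * 1e6 cannot overflow.
    const uint64_t cycles = static_cast<uint64_t>(loops) * cyclesPerIteration;
    return static_cast<uint32_t>(cycles * 1000000ULL / cpuHz);
}
```

At an assumed 72 MHz, 4500 iterations at 16 cycles each are reported as 1000 us. If another -O level compiled the loop to, say, 12 cycles per iteration, the same 4500 iterations would really have taken 750 us, so the result would be off by a third.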
I’d noticed that software I2C was also affected by the optimisation setting, but because LibMaple now uses hardware I2C for “Wire” it's not so much of a problem, and will only impact people specifically choosing to use software I2C.
Because the delay loops in software I2C are so short, I don’t think there is a way to make it work correctly for all optimisation settings.
__asm__("")
That may help make those timing loops always compile to the same code.
Frederic, you mention DWT_CYCCNT, which has come up several times in the forum. Is there any ill effect from using it? I believe the concerns were about the effects of debugging, but I don't remember whether it was because it could be stopped or reset during debugging, or for some other reason.
About the DWT CYCCNT, I’m just starting the investigation and I think I will not use it, as it is not supported by the M0 family. And you’re probably right about the debugging effect.
Can you give more details of the “optimize” attribute?
I investigated something similar when trying to get the original OV7670 camera code to run with our default optimization settings, i.e. switch to -O2 for some code, but the pragma for this is not supported by the ARM version of GCC.
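A minimal sketch of that idea, assuming GCC: the empty asm statement is treated as having side effects, so the compiler cannot delete the otherwise empty loop. The function name is made up, and this only guarantees the loop is kept; the number of cycles per iteration can still differ between -O levels.

```cpp
#include <cstdint>

// Busy-wait for `iterations` passes of the loop and return how many ran.
// The empty asm statement stops the optimiser from removing the loop body.
// Hypothetical helper for illustration only.
uint32_t delayLoop(uint32_t iterations) {
    uint32_t done = 0;
    for (uint32_t i = 0; i < iterations; ++i) {
        __asm__ volatile("");  // no instructions emitted, but not removable
        ++done;
    }
    return done;
}
```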
#pragma GCC push_options
#pragma GCC optimize ("Os")
// your function
#pragma GCC pop_options
[victor_pv – Mon Sep 18, 2017 8:36 pm] –
I actually had hoped that you would say it’s ok to use it.
I tried to use the DWT CYCCNT for the F103 but I did not succeed in writing the DWT CTRL register properly. I think it is locked by default (probably because I used STLink), so it needs to be unlocked.
I found this nice example and think I’m in this case:
https://github.com/PetteriAimonen/STM32 … mple.c#L35
but I have not checked.
uint32_t __attribute__((optimize("Os"))) pulseIn( uint32_t pin, uint32_t state, uint32_t timeout )
Without the fix, using -O1 gives a wrong value, while with the fix it’s OK.
uint32_t __attribute__((optimize("Os"))) pulseIn( uint32_t pin, uint32_t state, uint32_t timeout )
That's interesting.
Thanks
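For reference, a small self-contained sketch of the same per-function attribute on a generic polling loop. This is not the pulseIn() implementation: the function, its parameters and the PIN_OS macro are invented for the example, and the attribute is GCC-specific, so the macro expands to nothing on other compilers.

```cpp
#include <cstdint>

// GCC-only: compile the annotated function at -Os regardless of the
// optimisation level chosen for the rest of the build.
#if defined(__GNUC__) && !defined(__clang__)
#define PIN_OS __attribute__((optimize("Os")))
#else
#define PIN_OS
#endif

// Counts loop passes while (*reg & mask) is non-zero, up to maxLoops.
// Hypothetical timing-critical helper for illustration only.
PIN_OS uint32_t countWhileHigh(const volatile uint32_t* reg, uint32_t mask,
                               uint32_t maxLoops) {
    uint32_t loops = 0;
    while ((*reg & mask) != 0 && loops < maxLoops) {
        ++loops;
    }
    return loops;
}
```

Because the function is always built at the same level, a fixed cycles-per-iteration constant has a better chance of staying valid whichever menu entry is selected.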
https://github.com/stm32duino/Arduino_C … 2/pull/110
Very useful to avoid editing platform.txt.
Just one piece of information for those using arm gcc 6-2017-q1-update: to use the LTO feature, it is required to use arm gcc 6-2017-q2-update,
as the previous one has a bug causing a segmentation fault during the build.
Re:LTO
I have not noticed any problems in gcc 4.x, except LTO does not always result in smaller or faster code.
Also, in LibMaple I have noticed that enabling LTO shows some warnings during compilation, but they don't seem to affect the operation of the code.


