PR to add optimisation menu (to all boards individually)

RogerClark
Wed Jul 19, 2017 11:09 pm
There is a PR to add an optimisation menu

https://github.com/rogerclarkmelbourne/ … 2/pull/313

e.g. it adds this to every board

mapleMini.menu.opt.o2std=Faster
mapleMini.menu.opt.o2std.build.flags.optimize=-O2
mapleMini.menu.opt.o2std.build.flags.ldspecs=
mapleMini.menu.opt.o2lto=Faster with LTO
mapleMini.menu.opt.o2lto.build.flags.optimize=-O2 -flto
mapleMini.menu.opt.o2lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o1std=Fast
mapleMini.menu.opt.o1std.build.flags.optimize=-O1
mapleMini.menu.opt.o1std.build.flags.ldspecs=
mapleMini.menu.opt.o1lto=Fast with LTO
mapleMini.menu.opt.o1lto.build.flags.optimize=-O1 -flto
mapleMini.menu.opt.o1lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o3std=Fastest
mapleMini.menu.opt.o3std.build.flags.optimize=-O3
mapleMini.menu.opt.o3std.build.flags.ldspecs=
mapleMini.menu.opt.o3lto=Fastest with LTO
mapleMini.menu.opt.o3lto.build.flags.optimize=-O3 -flto
mapleMini.menu.opt.o3lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.ogstd=Debug
mapleMini.menu.opt.ogstd.build.flags.optimize=-Og
mapleMini.menu.opt.ogstd.build.flags.ldspecs=
mapleMini.menu.opt.oglto=Debug with LTO
mapleMini.menu.opt.oglto.build.flags.optimize=-Og -flto
mapleMini.menu.opt.oglto.build.flags.ldspecs=-flto
mapleMini.menu.opt.osstd=Smallest Code
mapleMini.menu.opt.osstd.build.flags.optimize=-Os
mapleMini.menu.opt.osstd.build.flags.ldspecs=
mapleMini.menu.opt.oslto=Smallest Code with LTO
mapleMini.menu.opt.oslto.build.flags.optimize=-Os -flto
mapleMini.menu.opt.oslto.build.flags.ldspecs=-flto


stevestrong
Thu Jul 20, 2017 9:29 am
I think this bloat is not going to solve any problem.

If -flto generally improves code size without affecting execution speed, then I think it would be wise to integrate it into platform.txt (assuming that nothing else breaks).
If execution speed suffers, then I don’t welcome it. The reason many people use this chip is its speed, which must be kept.
And there is enough room in flash for most user applications; a saving of 5% at most will not bring much, except some time saved at upload.

Regarding speed optimization, I think -Os gives good overall performance for most apps, unless one has special cases (liveOV7670).
Where necessary, the speed can be increased by using a special coding style, which would bring the same advantage as -O2 or -O3 while keeping the general setting of -Os in place.
An alternative solution is to change the optimization level locally:
#pragma GCC push_options
#pragma GCC optimize ("O0")

your code

#pragma GCC pop_options
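
As a rough sketch of how that could look in practice, the block below wraps one hot routine in the push/optimize/pop pragmas and raises it to -O2 while the rest of the file keeps the command-line setting. The function names and the pixel-summing task are made up for illustration, and the pragmas are GCC-specific, so other compilers may simply ignore them.

```cpp
#include <cstddef>
#include <cstdint>

// Everything between push_options and pop_options is compiled with the
// options named in the optimize pragma (GCC only).
#pragma GCC push_options
#pragma GCC optimize ("O2")

// Hypothetical hot routine: add up a buffer of pixel values.
uint32_t sumPixels(const uint8_t *buf, size_t len) {
    uint32_t total = 0;
    for (size_t i = 0; i < len; ++i) {
        total += buf[i];
    }
    return total;
}

#pragma GCC pop_options

// From here on the command-line setting (e.g. -Os) applies again.
uint32_t averagePixel(const uint8_t *buf, size_t len) {
    return len ? static_cast<uint32_t>(sumPixels(buf, len) / len) : 0;
}
```

Only sumPixels() is affected; whether this actually changes the generated code depends on the compiler version, so it is worth checking the disassembly or timing.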


RogerClark
Thu Jul 20, 2017 10:50 am
Thanks Steve

I think -O2 may be faster than -Os, as that works for LiveOV7670, but I already posted about using -O2 and the consensus was that we should stay with -Os


zoomx
Mon Jul 24, 2017 1:34 pm
https://gcc.gnu.org/onlinedocs/gcc/Func … agmas.html
#pragma GCC optimize ("string"...)
This pragma allows you to set global optimization options for functions defined later in the source file. One or more strings can be specified. Each function that is defined after this point is as if __attribute__((optimize("STRING"))) was specified for that function. The parenthesis around the options is optional. See Function Attributes, for more information about the optimize attribute and the attribute syntax.

RogerClark
Mon Jul 24, 2017 10:21 pm
#pragma GCC optimize looks really interesting

We already have at least one example that needs -O2 , which at the moment requires changes to platform.txt

The problem with the menu system is that if someone opened the “Ov7670 live” demo, which needs -O2, it would not automatically select -O2, but the pragma would fix this situation


RogerClark
Sun Jul 30, 2017 12:17 am
Unfortunately, looking at the gcc reference page

https://gcc.gnu.org/onlinedocs/gcc/Func … agmas.html

It says


The #pragma GCC target pragma is presently implemented for x86, PowerPC, and Nios II targets only.

which implies it’s not available for ARM :-(


RogerClark
Sun Jul 30, 2017 12:44 am
I tested that #pragma on the LiveOV7670 code, which only runs correctly with -O2 optimisation, and it didn’t seem to work.

I tried adding the pragma to various headers and also directly into the core code which needs this optimisation, and it made no difference.

So at the moment using #pragma GCC is not an option.


RogerClark
Sun Jul 30, 2017 1:08 am
Back to the original purpose of this thread

I’ve tried the PR and I actually quite like it… However it will need some changes

I ran the graphics test, which draws lines, fills, text etc. onto the ILI9341 display, and compared our current optimisation of -Os (roughly the -O2 optimisations, minus those that increase code size) with -O3 plus LTO

-Os code and RAM sizes were
Sketch uses 30292 bytes (46%) of program storage space. Maximum is 65536 bytes.
Global variables use 3728 bytes (18%) of dynamic memory

-O3 & LTO code and RAM sizes were
Sketch uses 32720 bytes (49%) of program storage space. Maximum is 65536 bytes.
Global variables use 3704 bytes (18%) of dynamic memory

Which is what would be expected: the code size increased by about 2.4k (around 8%)

The speed test was also interesting and does what would be expected, with noticeable gains in some places

Operation                 -Os       -O3 & LTO   Improvement %
ScreenFill                170789    170739       0.03
Text                      39905     31275       21.63
Lines                     228371    169259      25.88
Horiz/Vert Lines          15736     15031        4.48
Rectangles (outline)      11469     10496        8.48
Rectangles (filled)       355032    354789       0.07
Circles (filled)          140210    106118      24.31
Circles (outline)         154955    120331      22.34
Triangles (outline)       57983     40706       29.80
Triangles (filled)        164206    146417      10.83
Rounded rects (outline)   54528     42923       21.28
Rounded rects (filled)    414590    403744       2.62

Text, Lines, Circles and rounded rects (outline) are all considerably faster, with “Triangles (outline)” being almost 30% faster
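
For anyone wanting to reproduce the improvement column, it appears to be the relative drop in the measured value, (before - after) / before, expressed as a percentage. A minimal sketch, with a function name chosen here purely for illustration:

```cpp
#include <cstdint>

// Percentage by which `after` improves on `before`.
// Both values are timings from the test, so lower is better;
// a negative result means `after` is worse (larger).
double improvementPercent(uint32_t before, uint32_t after) {
    return 100.0 * (static_cast<double>(before) - static_cast<double>(after))
           / static_cast<double>(before);
}
```

Feeding in the Text row (39905 and 31275) gives about 21.6, and the same formula applied to the flash sizes (30292 and 32720) gives about -8, i.e. the 8% growth in code size.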

I think the guys using the PigOScope (and derivatives) may find this speed increase quite useful, assuming it doesn’t break anything else.

The only problem I see with this PR is that it changes the default optimisation to “Faster”, which is

.menu.opt.o2std=Faster
.menu.opt.o2std.build.flags.optimize=-O2

I’d like to add this, but to stop it potentially breaking existing code the optimisation needs to be set to -Os by default.

I think it would also be better if the amount of optimisation increased as you go down the menu options

So the order would probably be:

Smallest
Smallest + LTO
Fast
Fast+LTO
Faster
Faster+LTO
Fastest
Fastest+LTO

and then have Debug as the last option. I’m not entirely sure who would use it, or whether we should include it at all: the IDE does not have any debugging capabilities, so this option would only be useful to people using the repo in another IDE, which may not support the menu options at all.

Unfortunately I don’t have time, at the moment, to go through and change the order of all of these entries in boards.txt

https://github.com/mtiutiu/Arduino_STM3 … boards.txt

I will see if the OP is willing to change this, or perhaps someone else with time on their hands could do it?

PS. I guess it could be done by taking one section into a separate editor window

#-- Optimizations
mapleMini.menu.opt.o2std=Faster
mapleMini.menu.opt.o2std.build.flags.optimize=-O2
mapleMini.menu.opt.o2std.build.flags.ldspecs=
mapleMini.menu.opt.o2lto=Faster with LTO
mapleMini.menu.opt.o2lto.build.flags.optimize=-O2 -flto
mapleMini.menu.opt.o2lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o1std=Fast
mapleMini.menu.opt.o1std.build.flags.optimize=-O1
mapleMini.menu.opt.o1std.build.flags.ldspecs=
mapleMini.menu.opt.o1lto=Fast with LTO
mapleMini.menu.opt.o1lto.build.flags.optimize=-O1 -flto
mapleMini.menu.opt.o1lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o3std=Fastest
mapleMini.menu.opt.o3std.build.flags.optimize=-O3
mapleMini.menu.opt.o3std.build.flags.ldspecs=
mapleMini.menu.opt.o3lto=Fastest with LTO
mapleMini.menu.opt.o3lto.build.flags.optimize=-O3 -flto
mapleMini.menu.opt.o3lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.ogstd=Debug
mapleMini.menu.opt.ogstd.build.flags.optimize=-Og
mapleMini.menu.opt.ogstd.build.flags.ldspecs=
mapleMini.menu.opt.oglto=Debug with LTO
mapleMini.menu.opt.oglto.build.flags.optimize=-Og -flto
mapleMini.menu.opt.oglto.build.flags.ldspecs=-flto
mapleMini.menu.opt.osstd=Smallest Code
mapleMini.menu.opt.osstd.build.flags.optimize=-Os
mapleMini.menu.opt.osstd.build.flags.ldspecs=
mapleMini.menu.opt.oslto=Smallest Code with LTO
mapleMini.menu.opt.oslto.build.flags.optimize=-Os -flto
mapleMini.menu.opt.oslto.build.flags.ldspecs=-flto
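
One way the reordering could be scripted is to regenerate the block per board instead of moving lines by hand. The sketch below is only an illustration, not the script that was actually used: it emits the same keys and labels as the PR, but with -Os first (assuming the IDE picks the first menu entry as the default), then increasing optimisation, and Debug last. The struct and function names are invented for this example.

```cpp
#include <string>
#include <vector>

struct OptLevel {
    std::string id;     // key fragment used in the PR, e.g. "os"
    std::string flag;   // compiler flag
    std::string label;  // text shown in the menu
};

// Returns the optimisation menu lines for one board, ordered
// Smallest, Fast, Faster, Fastest, Debug, each followed by its LTO variant.
std::vector<std::string> optimisationMenu(const std::string &board) {
    static const std::vector<OptLevel> levels = {
        {"os", "-Os", "Smallest Code"},
        {"o1", "-O1", "Fast"},
        {"o2", "-O2", "Faster"},
        {"o3", "-O3", "Fastest"},
        {"og", "-Og", "Debug"},
    };
    std::vector<std::string> lines;
    for (const OptLevel &lvl : levels) {
        for (bool lto : {false, true}) {
            const std::string key =
                board + ".menu.opt." + lvl.id + (lto ? "lto" : "std");
            lines.push_back(key + "=" + lvl.label + (lto ? " with LTO" : ""));
            lines.push_back(key + ".build.flags.optimize=" + lvl.flag +
                            (lto ? " -flto" : ""));
            lines.push_back(key + ".build.flags.ldspecs=" + (lto ? "-flto" : ""));
        }
    }
    return lines;
}
```

Calling optimisationMenu("mapleMini") yields 30 lines starting with the osstd entry; the same call with another board ID would produce that board's block.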


RogerClark
Sun Jul 30, 2017 5:53 am
I’ll see if I can script this change myself…

Edit.

I’ve committed changes for the F1 but I don’t have time to do the F3 and F4 at the moment.


Pito
Sun Jul 30, 2017 11:04 am
Operation -Os optimisation -O3 & LTO optimisation __Improvement %
Is that correct? Is the -Os always faster?

RogerClark
Sun Jul 30, 2017 12:08 pm
The other way around

-Os is always slower in the graphics test


fpiSTM
Mon Sep 18, 2017 8:24 am
@RogerClark,

just FYI,

I’m updating the pulseIn() function for the Arduino_Core_STM32 in order to get better precision.
I saw you have hard-coded the number of cycles per iteration (to 16), but with this new menu the number depends on the optimization selected, so the duration value is not correct for every -Ox level.

https://github.com/rogerclarkmelbourne/ … f1.cpp#L42

I’m trying to find a generic way to compute this value (asm, DWT_CYCCNT,…) as I have to deal with all STM32 series.
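
To illustrate why the hard-coded constant matters, here is a small host-side sketch of the conversion from loop iterations to microseconds. The function name, the 72 MHz clock and the 12-cycle figure are assumptions for the example only; they are not taken from the core.

```cpp
#include <cstdint>

// Convert a busy-loop iteration count into microseconds, given how many
// CPU cycles one iteration is assumed to take.
uint32_t loopCountToMicros(uint32_t iterations,
                           uint32_t cyclesPerIteration,
                           uint32_t cpuHz) {
    const uint64_t cycles =
        static_cast<uint64_t>(iterations) * cyclesPerIteration;
    return static_cast<uint32_t>(cycles * 1000000ULL / cpuHz);
}
```

At 72 MHz, 4500 iterations of a 16-cycle loop come out as 1000 us. If another optimisation level compiled the loop down to, say, 12 cycles, the same 1000 us pulse would count 6000 iterations, and with the constant still at 16 the result would read 1333 us.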


RogerClark
Mon Sep 18, 2017 8:28 am
Thanks

I’d noticed that software I2C was also affected by the optimisation setting, but because LibMaple now uses hardware I2C for “Wire” it’s not so much of a problem, and will only impact people specifically choosing to use software I2C

Because the delay loops in software I2C are so short, I don’t think there is a way to make it work correctly for all optimisation settings.


fpiSTM
Mon Sep 18, 2017 8:52 am
You could try adding this in the impacted function:

__asm__("")
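
A minimal sketch of what that might look like inside a delay loop, assuming GCC-style inline assembly and a made-up function name. The empty asm statement emits no instructions, but because it is volatile the compiler has to keep one per iteration, so the loop cannot be removed or collapsed. It does not by itself guarantee the same number of cycles per iteration at every optimisation level.

```cpp
#include <cstdint>

// Busy-wait for `count` iterations and report how many were executed.
uint32_t spinDelay(uint32_t count) {
    uint32_t done = 0;
    for (uint32_t i = 0; i < count; ++i) {
        __asm__ volatile("");  // no instructions emitted, but must be kept
        ++done;
    }
    return done;
}
```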


victor_pv
Mon Sep 18, 2017 1:49 pm
What about the “optimize” attribute, to force a function to always be optimized with the same level no matter what is used for the rest of the code?
That may help make those timing loops always result in the same code.
Frederic, you mention DWT_CYCCNT, which has come up several times in the forum. Is there any ill effect from using it? I believe the concerns were about the effects of debugging, but I don’t remember whether it was because it could be stopped or reset during debugging, or for some other reason.

fpiSTM
Mon Sep 18, 2017 3:39 pm
Thanks Victor, I will check the optimize attribute. :)
About the DWT CYCCNT, I’m just starting the investigation and I think I will not use it, as it is not supported by the M0 family. And you’re probably right about the debugging effect.

victor_pv
Mon Sep 18, 2017 8:36 pm
I actually had hoped that you would say it’s ok to use it :(

RogerClark
Mon Sep 18, 2017 9:42 pm
Victor

can you give more details of the “optimize” attribute..

I investigated something similar when trying to get the original OV7670 camera code to run with our default optimization settings, i.e switch to -O2 for some code, but the pragma for this is not supported by the ARM version of GCC


fpiSTM
Tue Sep 19, 2017 4:12 am
I tried quickly yesterday and it seems to work, but I used arm gcc v6; I did not test with v4.8
#pragma GCC push_options
#pragma GCC optimize ("Os")
your function

#pragma GCC pop_options


fpiSTM
Tue Sep 19, 2017 4:22 am
[victor_pv – Mon Sep 18, 2017 8:36 pm] –
I actually had hoped that you would say it’s ok to use it :(

I tried to use the DWT CYCCNT on the F103 but I did not manage to write the DWT CTRL register properly. I think it is locked by default (probably because I used ST-Link), so it needs to be unlocked.

I found this nice example and think this is my case:
https://github.com/PetteriAimonen/STM32 … mple.c#L35
but I have not checked.


fpiSTM
Tue Sep 19, 2017 2:40 pm
To conclude, by adding:
uint32_t __attribute__((optimize("Os"))) pulseIn( uint32_t pin, uint32_t state, uint32_t timeout )

RogerClark
Tue Sep 19, 2017 8:51 pm
Has this been tested with gcc 4.x ?

fpiSTM
Wed Sep 20, 2017 8:19 am
I’ve tested on the BP with Arduino_STM32 repo.
Without the fix, using -O1 gives a wrong value, while with the fix it’s OK.
uint32_t __attribute__((optimize("Os"))) pulseIn( uint32_t pin, uint32_t state, uint32_t timeout )
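
The same attribute can be tried on any timing-sensitive loop. The sketch below uses a hypothetical stand-in for a pulseIn()-style counting loop (the function, its parameters and the FIXED_OPT_OS macro are invented for this example) and guards the attribute so that compilers which do not know it skip it.

```cpp
#include <cstdint>

// GCC understands the optimize attribute; other compilers get an empty macro.
#if defined(__GNUC__) && !defined(__clang__)
#define FIXED_OPT_OS __attribute__((optimize("Os")))
#else
#define FIXED_OPT_OS
#endif

// Count loop iterations while *port stays equal to `state`, up to maxLoops.
// The attribute asks GCC to build this one function with -Os regardless of
// the level chosen for the rest of the sketch.
FIXED_OPT_OS uint32_t countWhileEqual(const volatile uint8_t *port,
                                      uint8_t state,
                                      uint32_t maxLoops) {
    uint32_t loops = 0;
    while (*port == state && loops < maxLoops) {
        ++loops;
    }
    return loops;
}
```

With the loop pinned to one level, a hard-coded cycles-per-iteration constant stays meaningful whatever the menu selection is, at least for a given compiler version.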

RogerClark
Wed Sep 20, 2017 8:27 am
OK.

That’s interesting.

Thanks


fpiSTM
Thu Sep 21, 2017 12:49 pm
You’re welcome. I also have to thank you: I’ve added this PR to the Arduino_Core_STM32.
https://github.com/stm32duino/Arduino_C … 2/pull/110
Very useful, as it avoids editing platform.txt :)

One note for those using arm gcc 6-2017-q1-update: to use the LTO feature you need arm gcc 6-2017-q2-update,
as the previous release has a bug that causes a segmentation fault during the build.


RogerClark
Thu Sep 21, 2017 8:48 pm
No worries…

Re:LTO

I have not noticed any problems in gcc 4.x, except that LTO does not always result in smaller or faster code.

Also, in LibMaple I have noticed that enabling LTO shows some warnings during compilation, but they don’t seem to affect the operation of the code.


fpiSTM
Fri Sep 22, 2017 5:30 am
Yes in fact it’s a regression of the v6. so v4.8 is not impacted.
