Should we still use optimisation level -Os (Small)?

RogerClark
Sun Jun 25, 2017 12:40 am
Guys

I have been experimenting with the OV7670 camera using code from github, but the code requires that optimisation be changed from -Os to -O2

Changing the flag seemed to have a significant impact on speed (I didn't look at code size)

I know this is a big upheaval and may break some code, but I wonder whether using -Os is the best option.

Really, size is not normally an issue, as in reality all F103s have at least 128k of flash

Looking at the gcc docs

-Os

Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.

-Os disables the following optimization flags:

-falign-functions -falign-jumps -falign-loops
-falign-labels -freorder-blocks -freorder-blocks-algorithm=stc
-freorder-blocks-and-partition -fprefetch-loop-arrays

I checked the Arduino SAM (Due) and they use -Os, so I guess we should, but I do wonder how much performance we are losing because of this vs how much additional size the sketches would be, and whether the increase in sketch size would really be a problem.

e.g. the OV7670 example compiled with -O2 was 24360 bytes
and with -Os was 22732 bytes

Personally I’m not worried about 2k
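
For scale, the two build sizes quoted above can be compared directly. This is only a sketch of the arithmetic: the 128 KiB flash figure is the one mentioned earlier in this post, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Sizes reported for the OV7670 example in this post. */
enum {
    SIZE_O2_BYTES = 24360,      /* built with -O2 */
    SIZE_OS_BYTES = 22732,      /* built with -Os */
    FLASH_BYTES   = 128 * 1024  /* assumed flash size (128 KiB) */
};

/* Extra flash used by the -O2 build, in bytes. */
uint32_t size_delta_bytes(void)
{
    return (uint32_t)(SIZE_O2_BYTES - SIZE_OS_BYTES);
}

/* Growth relative to the -Os build, in hundredths of a percent. */
uint32_t growth_vs_os_basis_points(void)
{
    return size_delta_bytes() * 10000u / (uint32_t)SIZE_OS_BYTES;
}

/* Extra flash as a share of the whole device, in hundredths of a percent. */
uint32_t growth_vs_flash_basis_points(void)
{
    return size_delta_bytes() * 10000u / (uint32_t)FLASH_BYTES;
}
```

So the -O2 build is 1628 bytes larger, roughly 7% more than the -Os build and a little over 1% of a 128 KiB part.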

Anyway. It's probably too late now for LibMaple, but perhaps worth considering for @danieleff


danieleff
Sun Jun 25, 2017 6:25 am
One more thing you can check: GNU ARM embedded 6-2017-q1
It can give you speed boost without changing to -O2

RogerClark
Sun Jun 25, 2017 7:14 am
[danieleff – Sun Jun 25, 2017 6:25 am] –
One more thing you can check: GNU ARM embedded 6-2017-q1
It can give you speed boost without changing to -O2

Thanks.
That's interesting.


stevestrong
Sun Jun 25, 2017 8:33 am
-Os does a good job in general, so I think we should leave it.
However, as far as I know, different optimization level flags can still be used selectively for some modules, independently of the default one, if necessary.
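
One way this selective approach might look with GCC is the function-level `optimize` attribute (a file-wide `#pragma GCC optimize` also exists). This is a sketch, not something taken from the OV7670 library: the function names are hypothetical, and the GCC manual itself describes this attribute as intended for debugging rather than production code, so setting per-file flags in the build system may be the safer route.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hot function: GCC compiles only this one at -O2,
 * regardless of the -O level passed on the command line. */
__attribute__((optimize("O2")))
uint32_t sum_bytes_fast(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Same logic, compiled with whatever the build default is (e.g. -Os). */
uint32_t sum_bytes_default(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}
```

Both functions return the same result; only the generated machine code differs.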

OTOH, the achieved speed also depends on the individual coding style, so it should be possible to reach high speed even with -Os, when you keep in mind the desired speed performance. So I bet the OV7670 lib (without having a look at it) could be speed optimized only by revising the coding style.
Typical example:
for (int i = 0; i<256; i++) {
[do_something];
}


ag123
Sun Jun 25, 2017 8:43 am
i’d live with -Os (small) optimizations on maple mini/ blue pill, the -O2 optimizations seem to help in particular on the F4
nevertheless -Os (small) is still preferred for F4 as a default mode
i think the main reasons -O2 is faster on F4 are due to ‘indiscriminate’ optimizations by gcc/g++ (the compiler is known to *remove codes*)
and that F4 has the ART memory accelerator (accelerating flash memory access to 0 wait conditionally) which possibly makes codes run *much* faster
http://www.stm32duino.com/viewtopic.php … &start=160
among some of those -O2, -O3 optimizations is to *unroll loops*, this won’t help if code is running directly from flash, it would simply be a bottleneck and occupy more flash/memory. Hence, -Os (small) is probably better for stm32 f103
just 2 cents
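
To make the loop unrolling trade-off concrete, here is the transformation written out by hand. It is an illustration only: the function names are invented, and whether GCC actually unrolls a given loop at -O2 or -O3 depends on the compiler version and the flags in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Rolled loop: small code, one compare-and-branch per element. */
uint32_t sum_rolled(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Unrolled by four: fewer branches per element, but the loop body is
 * roughly four times larger, so more instructions are fetched from flash. */
uint32_t sum_unrolled4(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        sum += buf[i];
        sum += buf[i + 1];
        sum += buf[i + 2];
        sum += buf[i + 3];
    }
    for (; i < len; i++) {  /* leftover 0..3 elements */
        sum += buf[i];
    }
    return sum;
}
```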

RogerClark
Sun Jun 25, 2017 10:39 am
OK

I’ll leave it as -Os, but I’ll need to work out why the camera code runs slow and won’t work in -Os optimisation :-(


Pito
Sun Jun 25, 2017 2:29 pm
Do overclock – 100MHz might solve your problems :)

ag123
Sun Jun 25, 2017 2:57 pm
*single precision* floating point on f4 is *very fast* in part due to hardware fpu x2 (there is 2 fpu in there) + ART memory accelerator
http://www.stm32duino.com/viewtopic.php?f=39&t=2001
hence if there is a lot of *single precision* floating point calcs f4 may be the ‘right platform’ to run it
it’d probably benefit from -O2 and -O3 optimisations
with whetstone benchmarks of 500mflops (close to it) at 240 mhz (overclocked) we’ve pretty much ‘reach the limit’ of what may be possible on f4 (2 fpu instructions per clock)
http://www.stm32duino.com/viewtopic.php … &start=160
i’m thinking f4 may literally be able to ‘compress on the fly’ floating point jpeg etc

RogerClark
Sun Jun 25, 2017 8:56 pm
[Pito – Sun Jun 25, 2017 2:29 pm] –
Do overclock – 100MHz might solve your problems :)

That's an interesting option. Interrupts already need to be disabled in this code so that the pixels get clocked in correctly, so I could overclock.

Also, if I can push the ILI9341 SPI faster it would help.

I think QVGA should be possible at 7 FPS at 72MHz but would need double buffering of the pixel data from the camera.
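
A rough budget for that target can be worked out as below. The sketch assumes RGB565 (2 bytes per pixel), which is not stated in this thread, and the names are illustrative only.

```c
#include <stdint.h>

enum {
    QVGA_WIDTH      = 320,
    QVGA_HEIGHT     = 240,
    BYTES_PER_PIXEL = 2,   /* assumption: RGB565 */
    TARGET_FPS      = 7
};

#define CPU_HZ 72000000u

/* 320 * 240 * 2 = 153600 bytes per frame. */
uint32_t bytes_per_frame(void)
{
    return (uint32_t)QVGA_WIDTH * QVGA_HEIGHT * BYTES_PER_PIXEL;
}

/* 153600 * 7 = 1075200 bytes per second. */
uint32_t bytes_per_second(void)
{
    return bytes_per_frame() * TARGET_FPS;
}

/* Minimum SPI payload rate to the display: 8601600 bit/s. */
uint32_t spi_bits_per_second(void)
{
    return bytes_per_second() * 8u;
}

/* CPU cycles available per byte at 72 MHz (rounded down): 66. */
uint32_t cpu_cycles_per_byte(void)
{
    return CPU_HZ / bytes_per_second();
}
```

Under those assumptions there are about 66 CPU cycles per byte to read the camera and push the data to the display, and the SPI link needs to carry at least about 8.6 Mbit/s of pixel data.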

I am not sure what max speed SPI to the ILI9341 would run at.


Pito
Sun Jun 25, 2017 9:13 pm
40MHz+

stevestrong
Sun Jun 25, 2017 9:14 pm
Roger, do you also have all those NOPs in your code?
It seems that they serve precise timing.
What I don't understand is at which frequency the pixel clock is working. It should be PLLCLK/2.
Is the CPU running at twice this? If so, I don't get the point of those NOPs, other than undersampling.

RogerClark
Sun Jun 25, 2017 9:33 pm
There are still NOPs when it samples the pixels from the camera, but not when sending to the display.

AFAIK, the clock to the camera is normally 8 MHz (the spec says the minimum is 10 MHz, but I think everyone uses 8 MHz as it's easier)

I am not sure if the clock from the camera changes depending on frame rate. But the sampling code seems to have different numbers of NOPs depending on frame rate, which suggests the pixel output clock from the camera may change with frame rate

I presume the NOPs in the code that samples the pixels are there to time the GPIO read (of PB8 to PB15) so that it happens when the data is valid.
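
In outline, that pattern might look like the sketch below. It is not the library's actual code: the helper names are hypothetical, and the mapping of PB8..PB15 to pixel bits D0..D7 is an assumption. The raw port value is passed in as a parameter so the bit extraction can be checked off-target.

```c
#include <stdint.h>

/* Extract the 8 pixel data bits from a raw 16-bit port input value,
 * assuming D0..D7 are wired to PB8..PB15 (upper byte of the port). */
static inline uint8_t pixel_byte_from_port(uint32_t idr)
{
    return (uint8_t)((idr >> 8) & 0xFFu);
}

/* Burn time with NOPs. The real delay also includes loop overhead and
 * changes with the optimisation level and compiler version. */
static inline void delay_nops(unsigned n)
{
    while (n--) {
        __asm__ volatile ("nop");
    }
}
```

On the target, the sampling loop would wait with something like `delay_nops()` and then pass the port B input register value to `pixel_byte_from_port()`.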

Although the code works, I don't like the way it uses NOPs and various other things, which cause code execution timing to vary depending on compiler optimisation settings and possibly the compiler version.

I thought that DMA could be clocked using an external trigger using a Hardware Timer, and this would be a better solution and allow more throughput and also probably not need to disable interrupts all the time.


zoomx
Mon Jun 26, 2017 6:29 am
Is it not possible to change the optimization in the Tools menu?

RogerClark
Mon Jun 26, 2017 7:07 am
[zoomx – Mon Jun 26, 2017 6:29 am] –
Is it not possible to change the optimization in the Tools menu?

I think it should be possible to do that, but it means you have to manually set this for each sketch in case it needs to be small or needs to be fast.


zmemw16
Mon Jun 26, 2017 10:29 am
can’t we just default it to -Os ? as now in effect, let user then change it if reqd.
srp

RogerClark
Mon Jun 26, 2017 10:04 pm
[zmemw16 – Mon Jun 26, 2017 10:29 am] –
can’t we just default it to -Os ? as now in effect, let user then change it if reqd.
srp

Yes.

It could be done
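
As a rough idea of how a Tools menu entry defaulting to -Os could be wired up, the Arduino boards.txt custom menu mechanism might be used along these lines. This is a sketch only: the board ID, option IDs, menu labels and the `build.flags.optimize` property name are assumptions for illustration, not the core's actual configuration.

```
# boards.txt (hypothetical entries)
menu.opt=Optimize

# The first option listed is normally the one selected by default.
genericSTM32F103C.menu.opt.osstd=Smallest (default)
genericSTM32F103C.menu.opt.osstd.build.flags.optimize=-Os
genericSTM32F103C.menu.opt.o2std=Fast (-O2)
genericSTM32F103C.menu.opt.o2std.build.flags.optimize=-O2

# platform.txt: the compile recipes would then use {build.flags.optimize}
# in place of a hard-coded -Os.
```

The selection would still be a per-board IDE setting, so a sketch that needs -O2 would have to say so in its documentation.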

