Newbie question about 'short' and 'uint16_t' data types

KenLaszlo

Wed Nov 22, 2017 6:41 pm

I’m trying to make my code as fast as possible on my newly purchased Blue Pill.
I was wondering . . .

Since the CPU is a 32-bit device, is there any particular speed advantage (or other possible advantage) of using the ‘short’ and ‘uint16_t’ data types that use two bytes instead of four, please? Would Maths with these smaller data types be any faster on these 32-bit devices?

Thank you
Ken

P.S. I did a Google search and also searched this forum, but couldn’t find an answer.

RogerClark

Wed Nov 22, 2017 7:40 pm

The compiler does all sorts of tricks to improve code speed or size, so the only way to really know if changing to 16 bit would be faster would be to change it and do some timing tests.

My guess is that you would not speed things up, but you may save some RAM.

There is an Optimise menu in my Arduino_STM32 core, which would have more effect.
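A rough sketch of such a timing test is below. It is only an illustration: work_u16, work_u32, elapsed_us and g_sink are made-up names, and it uses std::chrono so that it runs on a desktop. On the Blue Pill you would take the two timestamps with micros() instead. Both loops do the same arithmetic, but the 16-bit version has to be cut back to 16 bits on every pass, which is one reason narrower types are not automatically faster on a 32-bit core.

```cpp
#include <stdint.h>
#include <chrono>

// Sink for results so the compiler cannot discard the work being timed.
volatile uint32_t g_sink = 0;

// Hypothetical workload using a 16-bit accumulator.
uint32_t work_u16(uint32_t n) {
    uint16_t acc = 0;
    for (uint32_t i = 0; i < n; ++i) {
        acc = (uint16_t)(acc * 3u + i);  // truncated back to 16 bits each pass
    }
    return acc;
}

// Same workload using a native 32-bit accumulator.
uint32_t work_u32(uint32_t n) {
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; ++i) {
        acc = acc * 3u + i;
    }
    return acc;
}

// Time one call of f(n) in microseconds.
// On an Arduino core, calling micros() before and after does the same job.
template <typename F>
double elapsed_us(F f, uint32_t n) {
    auto start = std::chrono::steady_clock::now();
    g_sink = f(n);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Calling elapsed_us(work_u16, 100000) and elapsed_us(work_u32, 100000) a few times each and comparing the numbers gives a first impression; the result will depend on the optimisation setting in use.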

The default optimisation is for “size” (-Os), but this is probably the slowest setting.
Try the -O2 or -O3 settings, with or without LTO (link time optimisation), and you will probably find that -O3 is the fastest.
I find LTO can sometimes make things worse.

Note that we get warnings when using -O3, but the core normally still works OK.

KenLaszlo

Wed Nov 22, 2017 7:54 pm

Hi Roger – I appreciate your reply.

I just ran a few tests, changing some ‘shorts’ to ‘ints’.
I found no speed advantage to using the smaller data types.
In fact, my code ran at pretty much identical speed.

So no advantage at all (in my case).

I did, however, shave a good ten milliseconds off my code by changing floats to ints, and rejigging the maths as essentially ‘two decimal place accuracy’.

Where would I be able to apply these optimisations that you speak of, please?
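A minimal sketch of what that kind of rejigging can look like, assuming it means keeping two decimal places by storing every value as an integer number of hundredths. The names fix2_t, FIX2_SCALE, fix2_mul and fix2_div are made up for illustration and are not from the actual sketch.

```cpp
#include <stdint.h>

// A value is stored as an integer count of hundredths: 12.34 is held as 1234.
typedef int32_t fix2_t;
static const int32_t FIX2_SCALE = 100;

// Multiply two scaled values. The raw product carries the scale twice,
// so divide once by the scale. A 64-bit intermediate avoids overflow.
fix2_t fix2_mul(fix2_t a, fix2_t b) {
    return (fix2_t)(((int64_t)a * b) / FIX2_SCALE);
}

// Divide two scaled values. Pre-multiply by the scale so the result
// keeps its two decimal places. The result truncates toward zero.
fix2_t fix2_div(fix2_t a, fix2_t b) {
    return (fix2_t)(((int64_t)a * FIX2_SCALE) / b);
}
```

Addition and subtraction need no helper, since two values with the same scale can be added directly. With this layout, fix2_mul(1234, 200) works out 12.34 × 2.00 and gives 2468, i.e. 24.68.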
Are these in the Arduino IDE somewhere? (I assume not, since I looked and didn’t see anything applicable)

Ken

RogerClark

Wed Nov 22, 2017 8:02 pm

On the F1, floats and doubles will be slower as it does not have an FPU.

The F4 has an FPU, so the speed would be almost identical for both int and float (its FPU is single precision only, so doubles are still handled in software).

Try the -O3 optimisation setting, you will probably notice a decent increase in speed

Personally, I would prefer if the default was -O2 but the view of the community is that they prefer the optimise for size option, which is the slowest.

KenLaszlo

Wed Nov 22, 2017 8:13 pm

Hi again Roger

Apparently, I wasn’t using your core (or your latest core).
I downloaded the files and placed them in the right place, restarted the Arduino IDE and now I see ‘Optimise’ in the tools drop-down menu.
Great!

I’ll quickly have a tinker (and a time), and report back.

Ken

KenLaszlo

Wed Nov 22, 2017 8:19 pm

Optimising my little sketch (which now uses no floats at all).

Smallest (standard) – 30.2 microseconds
-O1 setting – 31.4 microseconds (slower!)
-O2 setting – 29.5 microseconds
-O3 setting – 29.6 microseconds
-O3 with LTO – 25.8 microseconds (Sweet!!!!)

Thanks for this – It may not seem much of a difference but I may be adding some more maths in there, and like Tesco’s motto ‘every little helps’.

Ken

RogerClark

Wed Nov 22, 2017 8:22 pm

Some things are a lot faster with higher optimisation settings, but your code could potentially already be quite well optimised

dave j

Wed Nov 22, 2017 8:37 pm

The F4 (and F3) use the ARM Cortex-M4 core, which has 16-bit SIMD instructions and so can speed up 16-bit arithmetic. I don’t think GCC generates those instructions by itself though, so you have to use them via intrinsics or assembler. That doesn’t help you though, because the F1’s Cortex-M3 core doesn’t have them.

In terms of using 16 bit data types to speed things up, the easiest thing to do is look for where you copy data – moving 32 bits at a time is faster than 2×16 bits. Look at the peripheral documentation to see if the peripherals have modes that can help (e.g. dual mode for ADC transfers).

Beyond that, look to see if you can process two things at once: <32bit value> & 0xfff0fff0 is faster than two × <16bit value> & 0xfff0.
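As a small illustration of the masking trick, here is a sketch that packs two 16-bit values into one 32-bit word and masks both with a single AND. The helper names pack16x2, low16, high16 and mask_pair are invented for the example.

```cpp
#include <stdint.h>

// Put two 16-bit values side by side in one 32-bit word.
uint32_t pack16x2(uint16_t low, uint16_t high) {
    return (uint32_t)low | ((uint32_t)high << 16);
}

// Pull the two halves back out.
uint16_t low16(uint32_t packed)  { return (uint16_t)(packed & 0xFFFFu); }
uint16_t high16(uint32_t packed) { return (uint16_t)(packed >> 16); }

// Clear the bottom four bits of both halves with one AND,
// instead of masking each 16-bit value with 0xfff0 separately.
uint32_t mask_pair(uint32_t packed) {
    return packed & 0xFFF0FFF0u;
}
```

The gain only shows up if the data is already stored in pairs (for example, adjacent samples in a buffer); packing values just to mask them would cost more than it saves.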

If you know that the range of values you will be using won’t cause problems, you can sometimes get away with treating a 32-bit value as two 16-bit ones. For example, if your input data is guaranteed to be 14 bits unsigned, you can multiply two values by 2 at a time using <32bit value> << 1.
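A sketch of the shift example, with made-up helper names pack_pair and double_pair. It also shows what goes wrong when the range guarantee does not hold: the top bit of the low half spills into the high half.

```cpp
#include <stdint.h>

// Put two 16-bit values side by side in one 32-bit word.
uint32_t pack_pair(uint16_t low, uint16_t high) {
    return (uint32_t)low | ((uint32_t)high << 16);
}

// Double both halves with a single shift.
// Safe only while the low half has its top bit clear, so that nothing
// carries across into the high half. 14-bit inputs satisfy this.
uint32_t double_pair(uint32_t packed) {
    return packed << 1;
}
```

With 14-bit inputs, double_pair(pack_pair(100, 250)) equals pack_pair(200, 500). With a low half of 0x8000 the shifted-out bit lands in the high half, so the result is 0x00010000 rather than two doubled values.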

RogerClark

Wed Nov 22, 2017 8:46 pm

Ken,

You could post your code and ask people for suggestions to speed it up.

BTW, I presume you have already inlined the functions and looked at unrolling loops, branch optimisation, the use of lookup tables, etc.
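For anyone unsure what the lookup table suggestion means in practice, here is a deliberately tiny sketch: a sine table in 45 degree steps, scaled by 1000 so that no floats are needed on an FPU-less part. The names kSineMilli and sine_milli are hypothetical, and a real table would normally have many more entries.

```cpp
#include <stdint.h>

// sin(angle) * 1000 for 0, 45, 90, ... 315 degrees, rounded to integers.
static const int16_t kSineMilli[8] = {
    0, 707, 1000, 707, 0, -707, -1000, -707
};

// Return sin(step * 45 degrees) * 1000.
// The mask wraps the index, so any step value is safe to pass in.
inline int16_t sine_milli(uint32_t step) {
    return kSineMilli[step & 7u];
}
```

The call costs one AND and one table read instead of a software floating point sin(), at the price of coarser angle resolution.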

KenLaszlo

Fri Nov 24, 2017 12:08 am

Post the code? I might just do that . . .

It’s pretty much optimised but it could do with expert eyes.
We’ll see what this weekend brings.

Ken

RogerClark

Fri Nov 24, 2017 4:32 am

No worries

Someone may spot an optimisation…
