Library for driving LEDs with STM32F103

octavio

Tue Sep 12, 2017 2:46 pm

https://github.com/octavio4/parallel-output-stm32
This library is non-blocking: it uses DMA and IRQs to send the data while the main program does other tasks, and it is very flexible about which pins are used.
It supports the WS2812, APA102 and DMX protocols.

RogerClark

Tue Sep 12, 2017 8:56 pm

Thanks for sharing.

Can you explain what this is used for?

Is the WS28 protocol for addressable LEDs, aka NeoPixels? If so, have you compared the performance with just bit banging or other less complicated methods?

Both Rick Kimball and I have also attempted to write libraries that use DMA (in my case SPI DMA) to send data to addressable pixels, but because of the setup time required to build the encoded data buffers, this is only faster than bit banging under some limited circumstances.

victor_pv

Tue Sep 12, 2017 10:22 pm

Looking at the library, I saw it supports up to 4 strings in parallel. I wonder how many LEDs per string and how many updates per second it manages, but if those can be driven at a good rate, that's impressive.

octavio

Tue Sep 12, 2017 10:58 pm

>Is WS28 protocol for addressable leds aka NeoPixels?
Yes, and also the APA102 (SPI) and DMX (250 kbps serial). I have not tried your library, but compared with FastLED:
My library drives 4 strips in parallel. FastLED can handle 8 strips on a Teensy, but on the STM32F103 it supports only one strip at a time, so my lib is 4 times faster.
My library lets the application keep working while the data is being sent and uses (estimated) less than 30% of the CPU; with FastLED your code has to wait until the transfer is complete before doing other tasks. As a reference, I have a program that reads data from an SD card and does some image processing (mixing samples, color changes, etc.) at 60 fps with 4*512 WS2812 pixels, and it is a bit faster with the APA102 LEDs.
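A rough sanity check of the 60 fps figure, assuming the nominal 800 kbps WS2812 bit rate and a 50 us latch gap (both are assumptions here, and newer WS2812 variants want a longer gap). The function names are made up for illustration and are not part of the library:

```c
#include <assert.h>
#include <stdint.h>

#define WS2812_BIT_RATE_HZ 800000u /* nominal WS2812 data rate (assumed) */
#define WS2812_RESET_US    50u     /* assumed latch gap after each frame */

/* Time to clock out one strip, in microseconds, including the latch gap. */
static uint32_t frame_time_us(uint32_t pixels, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    uint64_t bits = (uint64_t)pixels * bits_per_pixel;
    return (uint32_t)(bits * 1000000u / WS2812_BIT_RATE_HZ) + WS2812_RESET_US;
}

/* Upper bound on the refresh rate. Strips driven in parallel share the
 * same frame time, so the bound does not depend on the strip count. */
static uint32_t max_fps(uint32_t pixels, uint32_t bits_per_pixel)
{
    return 1000000u / frame_time_us(pixels, bits_per_pixel);
}
```

With 512 pixels of 24 bits each, one frame takes about 15.4 ms, so the wire itself caps the rate at roughly 64 fps. That leaves 60 fps with 4*512 pixels plausible only if the four strips really are sent in parallel.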

octavio

Tue Sep 12, 2017 11:11 pm

> I wonder how many led per string.
There is no limit. In the example there is a function that reads data from memory, but you can write a routine (see get_bits) that uses no memory buffer and generates pixel data on demand, or simply sets all pixels to the same color. Note that a DMX universe is limited by definition to 512/3 pixels (and its fps is also limited), but the library itself has no limit, and the clock settings can be changed to support new LED types.
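To illustrate the idea of generating pixels on demand instead of reading them from a buffer, here is a minimal sketch. The callback type and all function names are hypothetical; the real get_bits routine in the library may have a quite different signature:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMX_SLOTS      512u
#define DMX_RGB_PIXELS (DMX_SLOTS / 3u) /* 170 pixels, 2 slots unused */

/* Hypothetical callback type: returns the 24-bit colour of pixel `index`. */
typedef uint32_t (*pixel_source_fn)(uint32_t index);

/* Every pixel gets the same colour, so no frame buffer is needed. */
static uint32_t solid_colour(uint32_t index)
{
    (void)index;
    return 0x102030u;
}

/* Colour computed from the position: a grey ramp repeating every 256 pixels. */
static uint32_t ramp(uint32_t index)
{
    uint32_t level = index & 0xFFu;
    return (level << 16) | (level << 8) | level;
}

/* Stand-in for the sending side: asks the source for one pixel at a time. */
static uint32_t checksum_pixels(pixel_source_fn source, uint32_t count)
{
    assert(source != NULL);
    uint32_t sum = 0;
    for (uint32_t i = 0; i < count; i++)
        sum += source(i);
    return sum;
}
```

The point is only that the sender never needs the whole frame in RAM; it asks for each pixel when it is about to be transmitted. The DMX limit of 512/3 works out to 170 RGB pixels per universe.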

RogerClark

Wed Sep 13, 2017 1:58 am

Could you briefly explain how you build and send data for the WS28 protocol?

I know the protocol sends each data bit as a High pulse followed by a Low pulse.

The length of the High pulse determines whether the bit is 1 or 0.

A long High and short Low = 1
A short High and long Low = 0

Do you encode a buffer with 3 bits per pixel bit?

e.g.

110 = Data “1” bit
100 = Data “0” bit

???

Or does your code use an interrupt called 3 times per pixel data bit?

Or some other method?

octavio

Wed Sep 13, 2017 12:06 pm

For WS2812 the IRQ is called once per pixel (24 bits per strip). The handler then fills a DMA buffer with 24*3 words: the first word sets the pin to 1, the second word is the data, and the last sets the pin to 0. Words 1 and 3 are only written on the first 2 and last 2 IRQs.
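A minimal sketch of that buffer layout for a single pin, assuming the words are written to the STM32F1 BSRR register (lower 16 bits set pins, upper 16 bits reset them, a zero bit leaves the pin alone). The function name is hypothetical, and for simplicity it rewrites all three words every time, unlike the library, which skips the constant ones:

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_PIXEL 24
#define SLOTS_PER_BIT  3

/* Fill 24*3 BSRR-style words for one pixel on one pin.
 * pin_mask has one bit set in 0..15; colour holds 24 bits, sent MSB first. */
static void fill_pixel_words(uint32_t colour, uint16_t pin_mask,
                             uint32_t words[BITS_PER_PIXEL * SLOTS_PER_BIT])
{
    assert(pin_mask != 0);
    const uint32_t set_word   = pin_mask;                 /* drive pin high */
    const uint32_t reset_word = (uint32_t)pin_mask << 16; /* drive pin low  */

    for (int bit = 0; bit < BITS_PER_PIXEL; bit++) {
        int is_one = (int)((colour >> (BITS_PER_PIXEL - 1 - bit)) & 1u);
        uint32_t *w = &words[bit * SLOTS_PER_BIT];
        w[0] = set_word;                 /* slot 1: always high            */
        w[1] = is_one ? 0u : reset_word; /* slot 2: stay high or drop early */
        w[2] = reset_word;               /* slot 3: always low             */
    }
}
```

Because slots 1 and 3 never depend on the pixel data, a circular DMA buffer only needs them written once, which would explain why the handler touches them on so few IRQs.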

octavio

Wed Sep 13, 2017 12:11 pm

110 = Data “1” bit
100 = Data “0” bit
There is also the possibility to encode like this:
npk=4
1100 = Data “1” bit
1000 = Data “0” bit
npk=5
11000 = Data “1” bit
10000 = Data “0” bit
to reduce the transfer speed if needed.
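The npk patterns above can be generated by one small routine. This is only an illustration of the encoding, with hypothetical function names, and it assumes the slot clock stays fixed so that a larger npk stretches each data bit:

```c
#include <assert.h>
#include <stdint.h>

/* Expand one data bit into npk time slots, written as '1'/'0' characters
 * for readability. Slot 0 is always high, slot 1 carries the data bit,
 * and every remaining slot is low. `out` must hold npk + 1 chars. */
static void encode_bit(int bit, int npk, char *out)
{
    assert(npk >= 3);
    out[0] = '1';
    out[1] = bit ? '1' : '0';
    for (int i = 2; i < npk; i++)
        out[i] = '0';
    out[npk] = '\0';
}

/* Duration of one data bit when each slot lasts slot_ns nanoseconds. */
static uint32_t bit_time_ns(uint32_t slot_ns, int npk)
{
    assert(npk >= 3);
    return slot_ns * (uint32_t)npk;
}
```

For example, encode_bit(1, 4, buf) gives "1100" and encode_bit(0, 5, buf) gives "10000", matching the patterns listed above.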

RogerClark

Wed Sep 13, 2017 9:49 pm

Thanks for the explanation

@racemaniac also wrote a system for WS28 which uses 32 bits per pixel.

I used 24 bits because it uses less RAM, and I build the whole buffer before sending it via DMA SPI
(but my code can only do one string).

I can see that with your code, which handles multiple strings, it would take too much RAM to build a buffer for all of them, hence you have to build each pixel in the ISR.

I used LUTs to encode the pixel data into 24 bits, but I am not sure if that would speed up your code.

I found it very hard to make a real system which ran much faster than simply bit banging the data.

Did you analyse what percentage of time your code spent inside the ISR?

Did you analyse how long it took your code to send 1 string vs. how long it took via bit-banging? I.e. what was the delay between each pixel while you encoded? Or does your ISR fill the DMA buffer as fast as it's being sent?

BTW, I wanted to make my library compatible with the Adafruit NeoPixel library, but had problems sending asynchronously, and had to double buffer the encoded data to prevent it being overwritten with new data while the old data was still being sent.

octavio

Wed Sep 13, 2017 11:27 pm

The ISR fills the DMA buffer faster than it is being sent, so the speed is limited by the 800 kbps (it can also work at 1 Mbps) used by the WS2812. Writing 4 strips at a time speeds things up 4x.
If you want to know how many CPU clocks it takes to process a pixel, you can look at the assembly output and count (I can't do this with ARM assembly).
Another way is to use APA102 LEDs, increasing the speed until something stops working. The code for APA102 and WS2812 is very similar and has the same performance. "setup_apa(48);" means that a bit is sent every 48 CPU clocks, so in this case the clock is 1.5 MHz. Try reducing this value until something fails; then you will know precisely the performance of all the ISR code ( dma_apa() + get_rgb44() + transpose() ). In my program (not published) I do more things in the ISR, so I know that the speed can be increased if less processing is done.
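The 1.5 MHz figure follows from the CPU clock, assuming the usual 72 MHz for the STM32F103 (the post does not state the clock, so treat it as an assumption). The helper names below are made up for this sketch:

```c
#include <assert.h>
#include <stdint.h>

#define CPU_HZ 72000000u /* assumption: STM32F103 running at 72 MHz */

/* Output bit clock for a given setup_apa()-style divider. */
static uint32_t bit_clock_hz(uint32_t clocks_per_bit)
{
    assert(clocks_per_bit > 0);
    return CPU_HZ / clocks_per_bit;
}

/* CPU cycles that elapse while `bits` bits are shifted out. */
static uint32_t cycle_budget(uint32_t clocks_per_bit, uint32_t bits)
{
    return clocks_per_bit * bits;
}
```

With 48 clocks per bit the output runs at 1.5 MHz, and a 32-bit APA102 LED frame takes 1536 CPU cycles to shift out. Lowering the divider until the output breaks shows how many of those cycles the ISR code actually needs.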
>I used LUTs to encode the pixel data into 24 bits, but I am not sure if that would speed up your code.
No, it works differently. Each DMA transfer equals 1 bit of output: the DMA writes 32 bits to GPIOA->regs->BSRR, where the upper 16 bits set the corresponding pins to 0 and the lower 16 bits set the pins to 1; a 0 bit leaves the pin unchanged. The "output_table[]" is used to translate 4 bits of data into the 32-bit word that sets the selected pins. The code can be changed to work with more LED strips, and this would be faster, but for my project I only need 4.
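A sketch of how such a lookup table could be built, together with the bit transposition that produces the 4-bit inputs. The pin assignment, the names bsrr_table and transpose4, and the choice to drive a pin low for every 0 bit are assumptions for illustration; the library's own output_table[] and transpose() may differ:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STRIPS 4

/* Hypothetical wiring: strip i is on port bit strip_pin[i] (0..15). */
static const uint8_t strip_pin[NUM_STRIPS] = { 0, 1, 2, 3 };

static uint32_t bsrr_table[1u << NUM_STRIPS];

/* For each 4-bit value (one bit per strip), precompute the BSRR word:
 * lower half sets the pins whose bit is 1, upper half clears the rest. */
static void build_bsrr_table(void)
{
    for (uint32_t nibble = 0; nibble < (1u << NUM_STRIPS); nibble++) {
        uint32_t word = 0;
        for (uint32_t strip = 0; strip < NUM_STRIPS; strip++) {
            assert(strip_pin[strip] < 16);
            uint32_t pin_bit = 1u << strip_pin[strip];
            if (nibble & (1u << strip))
                word |= pin_bit;       /* set pin high */
            else
                word |= pin_bit << 16; /* set pin low  */
        }
        bsrr_table[nibble] = word;
    }
}

/* Turn one byte per strip into 8 nibbles, MSB first: out[k] holds
 * bit (7 - k) of every strip, with strip i in nibble bit i. */
static void transpose4(const uint8_t in[NUM_STRIPS], uint8_t out[8])
{
    for (int bit = 0; bit < 8; bit++) {
        uint8_t nibble = 0;
        for (int strip = 0; strip < NUM_STRIPS; strip++)
            nibble |= (uint8_t)(((in[strip] >> (7 - bit)) & 1u) << strip);
        out[bit] = nibble;
    }
}
```

Each nibble from transpose4 indexes bsrr_table to get the word the DMA writes for that output bit, so all four strips change state in a single 32-bit store. Moving a strip to another pin only means editing strip_pin.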

RogerClark

Thu Sep 14, 2017 12:05 am

Thanks…

Sounds like a good system.

Encoding on the fly means that you don’t need to hold a large “encoded” buffer, which is extra important if you need to double buffer the whole “pixel” string RGB data.

For multiple strings the ISR latency is something worth putting up with. There have been several interesting threads about external trigger ISR latency, which varies depending on whether the ISR vector is shared across multiple pins or not.
I suspect in your case the ISR vector will not be shared, otherwise the code would be less efficient.
In fact, I'm not sure about the latency of ISRs when triggered by a timer, whether this is as slow as external GPIO triggering, or how much call overhead there is.

We briefly investigated trying to optimise the shared ISR latency, but could only shave about 10% off the time (but this won't apply to your usage).

octavio

Thu Sep 14, 2017 1:49 pm

To avoid latency problems, the IRQ priority is set higher than the default: nvic_irq_set_priority(NVIC_DMA_CH1,14);

hackstage

Tue Feb 05, 2019 4:08 pm

I have been trying to modify the library to run 8 parallel outputs for RGBW leds for a couple of weeks now – but little luck so far. Can you give some advice on how it can be done? The previous comments on this forum have been very helpful!
