This library is non-blocking: it uses DMA and IRQs to send the data while the main program does other tasks, and it is very flexible about which pins are used.
It supports the WS2812, APA102 and DMX protocols.
Can you explain what this is used for?
Is the WS28 protocol for addressable LEDs, aka NeoPixels? If so, have you compared the performance with plain bit banging or less complicated methods?
Both Rick Kimball and I have also attempted to write libraries that use DMA (in my case SPI DMA) to send data to addressable pixels, but because of the setup time required to prepare the encoded data buffers, this is only faster than bit banging under some limited circumstances.
Yes, and also the APA102 (SPI) and DMX (250 kbps serial). I have not tried your library, but compared with FastLED:
It drives 4 strips in parallel. FastLED can handle 8 strips on a Teensy, but on the STM32F103 only one strip at a time is supported, so my library is 4 times faster.
My library allows the application to continue working while the data is being sent, and it uses (estimated) less than 30% of the CPU; with FastLED your code has to wait until the transfer is complete before doing other tasks. As a reference, I have a program that reads data from an SD card and does some image processing (mixing samples, color changes, etc.) at 60 fps with 4*512 WS2812 pixels, and it is a bit faster with APA102 LEDs.
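A rough sanity check of the 60 fps figure and the "4 times faster" claim, as a sketch only: it assumes the nominal 800 kbit/s WS2812 rate (1.25 us per data bit, 24 bits per pixel) and ignores the reset gap and any ISR overhead. The function names are illustrative, not part of the library.

```c
#include <stdint.h>

/* Nominal WS2812 timing (assumption): 800 kbit/s = 1250 ns per data bit. */
#define WS2812_BIT_NS  1250u
#define BITS_PER_PIXEL 24u   /* 8 bits each for G, R, B */

/* Transfer time in microseconds for one frame of `pixels_per_strip` pixels on
   `strips` strips, when `parallel` strips can be driven at the same time. */
static uint32_t frame_time_us(uint32_t pixels_per_strip, uint32_t strips,
                              uint32_t parallel)
{
    uint32_t passes = (strips + parallel - 1u) / parallel; /* sequential passes */
    uint64_t ns = (uint64_t)pixels_per_strip * BITS_PER_PIXEL * WS2812_BIT_NS * passes;
    return (uint32_t)(ns / 1000u);
}

/* Upper bound on the frame rate if nothing but the transfer limited it. */
static uint32_t max_fps(uint32_t frame_us)
{
    return 1000000u / frame_us;
}
```

With 4 strips of 512 pixels driven in parallel, one frame takes 15360 us (about 65 fps at most); sending the same 4 strips one after another takes 61440 us (about 16 fps), which is where the factor of 4 comes from.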
There is no limit on the number of pixels. The example has a function that reads data from memory, but you can write a routine (see get_bits) that uses no memory buffer and generates pixel data on demand, or simply sets all pixels to the same color. Note that a DMX universe is limited by definition to 512/3 (about 170) RGB pixels, and the fps is also limited, but the library itself has no limit, and the clock settings can be changed to support new LED types.
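To illustrate the idea of generating pixels on demand instead of reading them from a frame buffer, here is a minimal sketch. The callback type `pixel_source_fn` and the helper functions are hypothetical and are NOT the signature of the library's get_bits(); they only show that the encoder can pull one pixel at a time from a function.

```c
#include <stdint.h>

/* Hypothetical callback: returns the 24-bit RGB value of pixel `index` on
   strip `strip`. Not the library's get_bits() signature. */
typedef uint32_t (*pixel_source_fn)(uint32_t strip, uint32_t index);

/* Every pixel the same color, no buffer needed. */
static uint32_t solid_red(uint32_t strip, uint32_t index)
{
    (void)strip;
    (void)index;
    return 0xFF0000u; /* R = 0xFF, G = 0, B = 0 */
}

/* A simple blue gradient computed from the pixel index, again no buffer. */
static uint32_t blue_ramp(uint32_t strip, uint32_t index)
{
    (void)strip;
    return index & 0xFFu; /* blue channel = low byte of the index */
}

/* Consumer side: pulls `count` pixels one at a time, as an encoder would,
   and counts the non-black ones (just to exercise the callback). */
static uint32_t count_lit(pixel_source_fn src, uint32_t strip, uint32_t count)
{
    uint32_t lit = 0;
    for (uint32_t i = 0; i < count; ++i)
        if (src(strip, i) != 0u)
            ++lit;
    return lit;
}

/* A DMX universe has 512 channels and an RGB pixel uses 3 of them. */
#define DMX_CHANNELS_PER_UNIVERSE   512u
#define DMX_RGB_PIXELS_PER_UNIVERSE (DMX_CHANNELS_PER_UNIVERSE / 3u) /* 170 */
```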
I know the protocol sends data as a High pulse followed by a Low pulse for each data bit.
The length of the High pulse determines whether the bit is 1 or 0.
A long High and short Low = 1
A short High and long Low = 0
Do you encode a buffer with 3 bits per pixel bit?
e.g.
110 = Data “1” bit
100 = Data “0” bit
???
Or does your code use an interrupt that is called 3 times per pixel data bit?
Or some other method?
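The buffer-encoding approach asked about above (3 output bits per data bit, 110 for "1" and 100 for "0") can be sketched like this for one color byte. It assumes WS2812 data is sent MSB first; the function name is hypothetical.

```c
#include <stdint.h>

/* Encodes one color byte into 24 output bits (3 per data bit), MSB first:
   data "1" -> 110, data "0" -> 100. Result is in the low 24 bits. */
static uint32_t encode_byte_3slot(uint8_t value)
{
    uint32_t out = 0;
    for (int bit = 7; bit >= 0; --bit) {
        out <<= 3;
        out |= ((value >> bit) & 1u) ? 0x6u : 0x4u; /* 0b110 or 0b100 */
    }
    return out;
}
```

For example, 0xFF becomes 0xDB6DB6 (110 repeated eight times) and 0x00 becomes 0x924924 (100 repeated eight times).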
There is also the possibility to encode like this:
npk=4
1100 = Data “1” bit
1000 = Data “0” bit
npk=5
11000 = Data “1” bit
10000 = Data “0” bit
which can be used to reduce the transfer speed if needed.
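The npk patterns listed above follow one rule: a "1" bit has two high slots, a "0" bit has one, and the remaining slots are low. A minimal sketch of that rule, producing one array entry per output slot; the function names are hypothetical and this is not the library's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the npk output slots for one data bit (1 = high, 0 = low).
   "1" -> two high slots, "0" -> one high slot. Requires npk >= 3. */
static size_t encode_bit_npk(int bit, unsigned npk, uint8_t *slots)
{
    unsigned high = bit ? 2u : 1u;
    for (unsigned i = 0; i < npk; ++i)
        slots[i] = (i < high) ? 1u : 0u;
    return npk;
}

/* Encodes a whole byte MSB first; `slots` must hold 8 * npk entries. */
static size_t encode_byte_npk(uint8_t value, unsigned npk, uint8_t *slots)
{
    size_t n = 0;
    for (int bit = 7; bit >= 0; --bit)
        n += encode_bit_npk((value >> bit) & 1, npk, slots + n);
    return n;
}
```

With npk=3 this reproduces 110/100, with npk=4 it gives 1100/1000, and with npk=5 it gives 11000/10000.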
@racemaniac also wrote a system for WS28 which uses 32 bits per pixel.
I used 24 bits because it uses less RAM, and I build the whole buffer before sending it via SPI DMA
(but my code can only drive one string).
I can see that with your code, which handles multiple strings, it would take too much RAM to build a buffer for all of them, hence you have to build each pixel in the ISR.
I used LUTs to encode the pixel data into 24 bits, but I am not sure if that would speed up your code.
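A sketch of the LUT approach for a single SPI-driven string, assuming the 3-slot code (110/100), WS2812 GRB color order and MSB-first SPI shifting. The table and function names are hypothetical; the point is that the per-bit loop runs once at init, and encoding a pixel at run time is just three table lookups.

```c
#include <stdint.h>

static uint32_t ws_lut[256]; /* 24-bit SPI code for every possible byte value */

/* Builds the table once: each data bit becomes 110 ("1") or 100 ("0"). */
static void ws_lut_init(void)
{
    for (unsigned v = 0; v < 256u; ++v) {
        uint32_t code = 0;
        for (int bit = 7; bit >= 0; --bit)
            code = (code << 3) | (((v >> bit) & 1u) ? 0x6u : 0x4u);
        ws_lut[v] = code;
    }
}

/* Encodes one pixel into 9 SPI bytes (3 colors x 24 bits), in GRB order,
   most significant byte first. */
static void ws_encode_pixel(uint8_t g, uint8_t r, uint8_t b, uint8_t out[9])
{
    const uint8_t colors[3] = { g, r, b };
    for (int c = 0; c < 3; ++c) {
        uint32_t code = ws_lut[colors[c]];
        out[c * 3 + 0] = (uint8_t)(code >> 16);
        out[c * 3 + 1] = (uint8_t)(code >> 8);
        out[c * 3 + 2] = (uint8_t)code;
    }
}
```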
I found it very hard to make a real system which ran much faster than simply bit banging the data.
Did you analyse what percentage of time your code spent inside the ISR?
Did you analyse how long it took your code to send 1 string vs. how long it took via bit banging? i.e. what was the delay between each pixel while you encoded? Or does your ISR fill the DMA buffer as fast as it's being sent?
BTW, I wanted to make my library compatible with the Adafruit NeoPixel library, but had problems sending asynchronously, and had to double buffer the encoded data to prevent it from being overwritten with new data while the old data was still being sent.
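The double buffering described above can be sketched as two buffers plus a "new frame ready" flag: the main loop only writes the buffer the DMA is not reading, and the transfer-complete interrupt swaps buffers only when a complete new frame is waiting. This is a hypothetical host-side sketch with no real DMA calls; it assumes a single-core MCU where byte-sized flag accesses are atomic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_BYTES 9u /* hypothetical size: one encoded pixel, for illustration */

static uint8_t frame_buf[2][FRAME_BYTES];
static volatile uint8_t front = 0;       /* index of the buffer the DMA reads */
static volatile bool back_ready = false; /* back buffer holds a finished frame */

/* Main loop: returns the buffer that is safe to write, or NULL while the
   previously published frame has not been picked up by the DMA side yet. */
static uint8_t *acquire_back_buffer(void)
{
    return back_ready ? NULL : frame_buf[front ^ 1u];
}

/* Main loop: marks the back buffer as a complete frame; do not write it again
   until acquire_back_buffer() returns a buffer. */
static void publish_back_buffer(void)
{
    back_ready = true;
}

/* Hypothetical DMA transfer-complete handler: swap only if a new frame is
   waiting, then return the buffer to send next (the old one is resent otherwise). */
static const uint8_t *dma_complete_next_buffer(void)
{
    if (back_ready) {
        front ^= 1u;
        back_ready = false;
    }
    return frame_buf[front];
}
```

The cost is twice the encoded-buffer RAM, which is why encoding on the fly in the ISR is attractive for multiple strings.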
If you want to know how many CPU clocks it takes to process a pixel, you can look at the assembly output and count (I can’t do this with ARM assembly).
Another way is to use APA102 LEDs and increase the speed until something stops working. The code for APA102 and WS2812 is very similar and has the same performance. "setup_apa(48);" means that a bit is sent every 48 CPU clocks, so in this case the clock is 1.5 MHz (72 MHz / 48). Try reducing this value until something fails; then you will know precisely the performance of all the ISR code (dma_apa() + get_rgb44() + transpose()). In my program (not published) I do more things in the ISR, so I know the speed can be increased if less processing is done.
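The arithmetic and the "reduce until it fails" procedure can be sketched as follows. It assumes a 72 MHz CPU clock (which is what 48 clocks per bit at 1.5 MHz implies); `find_min_clocks` and the `works_fn` predicate are hypothetical, and `simulated_isr_keeps_up` is a stand-in for actually watching the LEDs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Output bit clock for a given number of CPU clocks per bit. */
static uint32_t bit_clock_hz(uint32_t cpu_hz, uint32_t clocks_per_bit)
{
    return cpu_hz / clocks_per_bit;
}

/* Hypothetical test: returns true if the strip still works at this setting. */
typedef bool (*works_fn)(uint32_t clocks_per_bit);

/* Lowers the clocks-per-bit value while the next step still works and
   returns the smallest working value. */
static uint32_t find_min_clocks(uint32_t start, works_fn works)
{
    uint32_t c = start;
    while (c > 1u && works(c - 1u))
        --c;
    return c;
}

/* Simulation only: pretends the ISR code needs 30 CPU clocks per bit. */
static bool simulated_isr_keeps_up(uint32_t clocks_per_bit)
{
    return clocks_per_bit >= 30u;
}
```

In this simulated case the search stops at 30 clocks per bit, i.e. a 2.4 MHz bit clock at 72 MHz, and that figure is the ISR cost per bit.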
>I used LUTs to encode the pixel data into 24 bits, but I am not sure if that would speed up your code.
No, it works differently. Each DMA transfer produces 1 bit of output: the DMA writes 32 bits to GPIOA->regs->BSRR, where the upper 16 bits set the corresponding pins to 0 and the lower 16 bits set them to 1; a bit value of 0 leaves the pin unchanged. The output_table[] is used to translate 4 bits of data into the 32-bit word that sets the selected pins. The code can be changed to work with more LED strips, and this would be faster, but for my project I only need 4.
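A host-side sketch of how such a table can be built, assuming hypothetical wiring with strip i on pin PA(i) for i = 0..3. For each 4-bit value, strips whose bit is 1 get their BSRR "set" bit (low half) and strips whose bit is 0 get their "reset" bit (high half). `gather_nibble` is a simplified stand-in for the library's transpose(), not its actual code; on the MCU, the DMA would write the looked-up words to GPIOA->regs->BSRR.

```c
#include <stdint.h>

/* Hypothetical wiring: strip s is on GPIOA pin STRIP_PIN[s]. */
static const uint8_t STRIP_PIN[4] = { 0, 1, 2, 3 };

static uint32_t output_table[16];

/* BSRR: bits 0-15 drive the pin high, bits 16-31 drive it low,
   and a 0 bit leaves the pin unchanged. */
static void build_output_table(void)
{
    for (uint32_t nibble = 0; nibble < 16u; ++nibble) {
        uint32_t word = 0;
        for (int s = 0; s < 4; ++s) {
            uint32_t pin = STRIP_PIN[s];
            if (nibble & (1u << s))
                word |= 1u << pin;          /* strip s high */
            else
                word |= 1u << (pin + 16u);  /* strip s low */
        }
        output_table[nibble] = word;
    }
}

/* Collects bit `bit` (7 = MSB, sent first) of each strip's current byte into
   one nibble: bit s of the result belongs to strip s. */
static uint32_t gather_nibble(const uint8_t bytes[4], int bit)
{
    uint32_t nibble = 0;
    for (int s = 0; s < 4; ++s)
        nibble |= (uint32_t)((bytes[s] >> bit) & 1u) << s;
    return nibble;
}
```

For example, nibble 0x5 (strips 0 and 2 high, strips 1 and 3 low) maps to the BSRR word 0x000A0005.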
Sounds like a good system.
Encoding on the fly means that you don’t need to hold a large “encoded” buffer, which is extra important if you need to double buffer the whole “pixel” string RGB data.
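Some rough RAM arithmetic shows why. The sketch compares a plain RGB buffer, a pre-encoded single-strip SPI buffer, and a hypothetical pre-encoded parallel buffer that stores one 32-bit BSRR word per output slot (an assumption about what a fully pre-encoded version would need, not how the library works).

```c
#include <stdint.h>

/* Plain RGB frame buffer: 3 bytes per pixel. */
static uint32_t raw_bytes(uint32_t pixels)
{
    return pixels * 3u;
}

/* Pre-encoded single-strip SPI buffer: 24 data bits per pixel,
   npk output bits per data bit, 8 output bits per byte. */
static uint32_t spi_encoded_bytes(uint32_t pixels, uint32_t npk)
{
    return pixels * 24u * npk / 8u;
}

/* Hypothetical pre-encoded parallel buffer: one 32-bit BSRR word per output
   slot (24 data bits x npk slots per pixel), shared by all strips. */
static uint32_t bsrr_frame_bytes(uint32_t pixels_per_strip, uint32_t npk)
{
    return pixels_per_strip * 24u * npk * 4u;
}
```

For 4 strips of 512 pixels, the raw RGB data is 6144 bytes, while a pre-encoded BSRR frame with npk=3 would be 147456 bytes (twice that if double buffered), far beyond the 20 KB of SRAM on a part such as the STM32F103C8.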
For multiple strings the ISR latency is worth putting up with. There have been several interesting threads about external-trigger ISR latency, which varies depending on whether the ISR vector is shared across multiple pins or not.
I suspect in your case, the ISR vector will not be shared, otherwise the code would be less efficient.
In fact, I’m not sure about the latency of ISRs triggered by a timer, whether it is as slow as external GPIO triggering, or how much call overhead there is.
We briefly investigated optimising the shared ISR latency, but could only shave about 10% off it (though this won’t apply to your usage).
nvic_irq_set_priority(NVIC_DMA_CH1, 14);
