WS2812B (Neopixel) library for Libmaple using SPI DMA

RogerClark
Sun Jun 11, 2017 4:24 am
I’ve written a library to drive a WS2812B (aka Neopixel) strip using SPI and DMA.

https://github.com/rogerclarkmelbourne/ … 2_Libmaple

The API is based on the Adafruit library, but most of the code has been written from scratch, as the method of operation is completely different.

Notes.

Will probably only work with WS2812B and not the older WS2812, because of the timing constraints caused by using SPI to generate the pulse-train.

The library does not have all the functions of the Adafruit version, e.g. getPixelColor, because the pixel data is not easily accessible. In the longer term getPixelColor could be written to read the encoded pulse-train data and extract the pixel colour, but I don’t have time to do this at the moment, and none of the standard examples use it.

The Color(r,g,b,w) function does not make use of the w channel. At the moment I’m not sure how the w channel is used, and I’ll need to reverse engineer the Adafruit library to understand what that channel does.

The normal limitations also apply to this library. For example, in theory Data In requires a logic high voltage of 0.7 x Vdd, so if Vdd = 5V, Vhi needs to be 3.5V, which is more than the logic high on a 3.3V system. However, in practice this seems to work OK for most people.

References

https://cdn-shop.adafruit.com/datasheets/WS2812.pdf
https://cdn-shop.adafruit.com/datasheets/WS2812B.pdf
http://rgb-123.com/files/WS2812B_VS_WS2812.pdf


racemaniac
Sun Jun 11, 2017 9:55 am
great work!
nicer coded than what i had so far :).

Just 2 small remarks after going through the code:
– you still have the original comments from the Neopixel library above your code, mentioning AVRs and the people who wrote that :p
– is it fixed to one SPI port? (if you want to support multiple ports, do mind that the divider is different, since SPI2 runs at half the speed of SPI1 for the same divider)


RogerClark
Sun Jun 11, 2017 10:44 am
I had considered whether to make it work on SPI1 and SPI2, but I only had time to get it working on SPI1 today.

It should be easy to make it work on SPI2, I just need to work out the best way to allow users to select that, perhaps have the SPI channel as the second argument in the constructor and default it to SPI1.

I’d forgotten about the speed limitation on SPI2, but it won’t be a problem: it’s running at DIV32 on SPI1, so if SPI2’s clock is half that of SPI1, it would need to be DIV16.
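To make the divider arithmetic concrete, here is a small sketch. It assumes an STM32F103 running at 72 MHz, with SPI1 clocked from APB2 (72 MHz) and SPI2 from APB1 (36 MHz), and assumes each WS2812B bit is sent as three SPI bits. The constant and function names are illustrative only, not part of the library.

```cpp
#include <cstdint>

// Assumed peripheral clocks for a 72 MHz STM32F103.
constexpr uint32_t kApb2Hz = 72000000;  // SPI1 sits on APB2
constexpr uint32_t kApb1Hz = 36000000;  // SPI2 sits on APB1

// SPI bit clock produced by a given prescaler (2, 4, ..., 256).
constexpr uint32_t spiClockHz(uint32_t busHz, uint32_t divider) {
    return busHz / divider;
}

// Length of one WS2812B data bit in nanoseconds, assuming it is
// encoded as three SPI bits (e.g. 100 for a 0 and 110 for a 1).
constexpr uint32_t ws2812BitNs(uint32_t spiHz) {
    return static_cast<uint32_t>(3ULL * 1000000000ULL / spiHz);
}
```

With these assumptions, SPI1 at DIV32 and SPI2 at DIV16 both give 2.25 MHz, so one WS2812B bit lasts about 1333 ns, which sits inside the 1.25 µs ± 600 ns bit period given in the WS2812B datasheet.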

Re: References to AVR
Somehow the original file header got back in. (arrggh) I’d not spotted that. I’ll remove all that junk.

So I’ll go through and remove references to AVR

A few more things…

getPixelColor has not been implemented yet, as I didn’t have time to write the code to read the encoded bit pattern from the buffer and convert it back to RGB. It should be fairly straightforward to do, simply by reading the middle bit of each of the triplets.

e.g. something like

take the 24-bit encodedValue
clear decoded (uint8_t)
for (i = 0; i < 8; i++)
{
decoded |= ((encodedValue >> (3 * i + 1)) & 0x01) << i;
}
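A runnable version of that idea, as a sketch: it assumes each data bit is encoded MSB first as the triplet 110 (for a 1) or 100 (for a 0), so the data bit is the middle bit of each triplet. `encodeByte` and `decodeByte` are hypothetical helper names, and the encoder is included only so the round trip can be checked.

```cpp
#include <cstdint>

// Encode one colour byte as 24 SPI bits, MSB first, assuming each
// data bit becomes the triplet 1, <data bit>, 0.
uint32_t encodeByte(uint8_t value) {
    uint32_t encoded = 0;
    for (int i = 7; i >= 0; --i) {
        uint32_t bit = (value >> i) & 0x01;
        encoded = (encoded << 3) | 0x4 | (bit << 1);
    }
    return encoded;
}

// Recover the colour byte by reading the middle bit of each triplet.
// Triplet i (counting from the least significant end) occupies bits
// 3*i .. 3*i+2, so its middle bit is at position 3*i+1 and holds data bit i.
uint8_t decodeByte(uint32_t encoded) {
    uint8_t decoded = 0;
    for (int i = 0; i < 8; ++i) {
        decoded |= static_cast<uint8_t>(((encoded >> (3 * i + 1)) & 0x01) << i);
    }
    return decoded;
}
```

Under this encoding, 0xFF becomes 0xDB6DB6 and 0x00 becomes 0x924924, and decoding any encoded byte gives back the original value.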

Also, I’m not sure what the w parameter in Color(r,g,b,w) does. Perhaps it’s for a different sort of LED.


zoomx
Sun Jun 11, 2017 1:15 pm
+1 !!

RogerClark
Sun Jun 11, 2017 9:59 pm
I think I will add this lib as part of the libmaple core libraries.

The reason for this is that it uses SPI.dmaSend, which is a special function not available in other cores.

I will also change it to use the new function SPI.dmaSendAsync(), because at the moment there is no benefit to using SPI DMA, apart from keeping USB working, since I do not need to disable interrupts while the data is being sent.

This will require double buffering, which will take 18 bytes per LED; however, there will be enough RAM to drive at least 600 LEDs.
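The 18 bytes per LED follows from the encoding, as the sketch below shows. It assumes 3 colour bytes per LED, 3 SPI bits per data bit, and two buffers, and it ignores extra bytes such as any zero padding sent before the data. The names are illustrative.

```cpp
#include <cstddef>

// 3 colour bytes * 8 bits * 3 SPI bits per data bit = 72 SPI bits,
// i.e. 9 bytes per LED in each encoded buffer.
constexpr std::size_t kBytesPerLedPerBuffer = 3 * 8 * 3 / 8;

// Encoded-buffer RAM for a strip, given how many buffers are kept.
constexpr std::size_t encodedRamBytes(std::size_t numLeds, std::size_t numBuffers) {
    return numLeds * kBytesPerLedPerBuffer * numBuffers;
}
```

For 600 LEDs with double buffering this comes to 10,800 bytes, which fits in the 20 KB of SRAM on an STM32F103C8 (an assumed target here) with room left for the sketch itself.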


racemaniac
Mon Jun 12, 2017 6:13 am


RogerClark
Mon Jun 12, 2017 7:07 am
I’ve spent some time on it today and double buffering with DMA is working well.

But it does not use a completion interrupt, as we don’t have any code for that at the moment, because there are issues with callbacks to functions in C++ classes. @victor_pv has been investigating callbacks, and I’ve PM’ed him about this; I’m hoping he can help.
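The underlying problem with C++ classes is that a non-static member function needs a `this` pointer, so it cannot be stored in a plain C function pointer of the kind an interrupt or DMA layer usually keeps. One common workaround is a static "trampoline" function plus a stored instance pointer. In the sketch below, `setCompletionCallback`, `simulateDmaComplete` and `PixelStrip` are hypothetical names standing in for the real DMA layer and library class.

```cpp
// A C-style completion callback: a plain function pointer with no `this`.
typedef void (*CompletionCallback)(void);

// Hypothetical stand-in for the DMA layer's callback registration.
static CompletionCallback g_onComplete = nullptr;
void setCompletionCallback(CompletionCallback cb) { g_onComplete = cb; }
void simulateDmaComplete() { if (g_onComplete) g_onComplete(); }

class PixelStrip {
public:
    void begin() {
        instance_ = this;  // remember which object should receive callbacks
        setCompletionCallback(&PixelStrip::onDmaCompleteStatic);
    }
    int completedTransfers() const { return completed_; }

private:
    // &PixelStrip::onDmaComplete cannot convert to CompletionCallback,
    // but a static member function can, and it forwards to the instance.
    static void onDmaCompleteStatic() {
        if (instance_) instance_->onDmaComplete();
    }
    void onDmaComplete() { ++completed_; }

    static PixelStrip* instance_;
    int completed_ = 0;  // on real hardware, data shared with an ISR would need volatile
};

PixelStrip* PixelStrip::instance_ = nullptr;
```

The trade-off is that a single static instance pointer supports only one object per callback; supporting several strips would need one pointer per DMA channel.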

I added a new function to SPI called dmaSendAsync(), which returns immediately after the transfer has been started.
I also added a static flag to indicate whether a transfer has already been started, plus code at the start of the function that waits for a transfer in progress (as determined by the static flag) to complete.
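The pattern described above can be modelled on a PC as follows. This is a sketch, not the actual SPI library code: `dmaBusy()` and `dmaStart()` are hypothetical stand-ins for polling and starting the real DMA channel, and here the fake "hardware" simply reports busy for three polls after each start.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fake hardware: busy for 3 polls after each start.
static int g_fakeBusyPolls = 0;
bool dmaBusy() { return g_fakeBusyPolls-- > 0; }
void dmaStart(const uint8_t*, std::size_t) { g_fakeBusyPolls = 3; }

static bool s_transferInProgress = false;  // the static flag
static int s_waitPolls = 0;                // polls spent waiting, for illustration

// Returns as soon as the transfer has been started. If an earlier
// transfer may still be running, wait for it first so two never overlap.
void sendAsync(const uint8_t* data, std::size_t len) {
    if (s_transferInProgress) {
        while (dmaBusy()) { ++s_waitPolls; }
    }
    dmaStart(data, len);
    s_transferInProgress = true;
}
```

The first call returns without waiting; a second call made while the fake transfer is still running spins until it finishes. Note that completion is only discovered at the next call, which is why the end of a transfer cannot be timed this way.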

The downside of this approach is that I don’t know when a transfer has finished, so I can’t time the 50uS reset period.
However, as other people have already pointed out, the reset time that is actually needed is only around 6uS, and the function call overhead, DMA tidy-up etc., plus the 8 bytes of zeros that are sent before the real data, mean that I get around 12uS of unavoidable dead time between each transfer. So there is no need to add code to check that enough reset time has been allowed.

Actually, it’s taken several hours to get this working, because double buffering caused all sorts of strange effects when running standard examples like colorWipe().

I initially thought that I’d messed up the buffer pointers, or perhaps that I was writing to the same buffer that was being sent, etc.

However, after hours of trying to send all zeros or all ones etc. to confirm that none of the above was happening…

I realised that LED effects like colorWipe() do not build the data for the entire strip of LEDs (30 in my case); they often just modify 1 or 2 LEDs, and because I had 2 separate buffers, the effects were building on the data from the show() call 2 frames ago.

E.g. colorWipe() of Red (O = off) looks like

OOOOOOOOOO
ROOOOOOOOO
RROOOOOOOO
RRROOOOOOO
RRRROOOOOO

Etc

But because I have 2 buffers I ended up with

OOOOOOOOOO (buffers 1 and 2 initially all off)
ROOOOOOOOO (buffer 2)
OROOOOOOOO (buffer 1 plus R in 2nd LED)
ROROOOOOOO (buffer 2 plus R in 3rd LED)
OROROOOOOO (buffer 1 plus R in 4th LED)
etc
So I ended up with alternately flashing red LEDs.

Anyway, the solution is to copy the buffer each time it’s sent via SPI. This does waste a bit of time, but I do the buffer copy after dmaSendAsync has been called, so that it’s done in parallel with the data being sent to the LEDs.
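The effect and the fix can be reproduced with a small model. In this sketch, `DoubleBuffer` and `colorWipeFrames` are hypothetical names; characters ('O' = off, 'R' = red) stand in for encoded pixel data, and the line marked as the fix plays the role of the buffer copy done after the send has been started.

```cpp
#include <array>
#include <string>

constexpr int kLeds = 10;
using Frame = std::array<char, kLeds>;

// show() "sends" the buffer being drawn into, then drawing moves to the
// other buffer. With copyOnShow, the frame just sent is copied into the
// other buffer first, so the next frame builds on the latest data.
struct DoubleBuffer {
    Frame buf[2];
    int drawIdx = 0;
    bool copyOnShow;

    explicit DoubleBuffer(bool copy) : copyOnShow(copy) {
        buf[0].fill('O');
        buf[1].fill('O');
    }
    void setPixel(int i, char c) { buf[drawIdx][i] = c; }
    std::string show() {
        const Frame& sent = buf[drawIdx];  // this buffer would go to DMA
        std::string shown(sent.begin(), sent.end());
        int next = 1 - drawIdx;
        if (copyOnShow) buf[next] = sent;  // the fix: copy after starting the send
        drawIdx = next;
        return shown;
    }
};

// colorWipe-style effect: turn one more LED red per frame.
std::string colorWipeFrames(bool copyOnShow, int frames) {
    DoubleBuffer db(copyOnShow);
    std::string last;
    for (int f = 0; f < frames; ++f) {
        db.setPixel(f, 'R');
        last = db.show();
    }
    return last;
}
```

After 4 frames, the version without the copy shows `OROROOOOOO`, the alternating pattern described above, while the version with the copy shows `RRRROOOOOO`.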

I could probably do the memory copy using DMA, but for small numbers of bytes, e.g. 30 LEDs = 272 bytes, I think the setup time would probably not be worth it, and it might be better to code it in inline assembler.

BTW, I also made some speed improvements by using one LUT rather than 3 (R, G, B) LUTs, as this allows pointer incrementing both when reading the encoded values and for the output position.
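A sketch of what a single shared LUT might look like, assuming a 256-entry table that maps each byte value to its 3 encoded bytes (MSB first, 110 = 1 and 100 = 0). The table layout, `buildLut` and `encodePixel` are assumptions for illustration, not the library’s actual code. Because all three colours share one table, encoding a pixel is just walking a table pointer and an output pointer.

```cpp
#include <cstdint>

// One table for all colours: 3 encoded bytes per possible byte value.
static uint8_t g_lut[256 * 3];

void buildLut() {
    for (int v = 0; v < 256; ++v) {
        uint32_t enc = 0;
        for (int b = 7; b >= 0; --b) {
            enc = (enc << 3) | 0x4 | (((v >> b) & 0x01) << 1);
        }
        g_lut[v * 3 + 0] = static_cast<uint8_t>(enc >> 16);
        g_lut[v * 3 + 1] = static_cast<uint8_t>(enc >> 8);
        g_lut[v * 3 + 2] = static_cast<uint8_t>(enc);
    }
}

// Encode one pixel in G, R, B order (the WS2812B wire order) into 9 bytes
// and return the next free output position.
uint8_t* encodePixel(uint8_t* out, uint8_t r, uint8_t g, uint8_t b) {
    const uint8_t channels[3] = { g, r, b };
    for (uint8_t c : channels) {
        const uint8_t* src = &g_lut[c * 3];
        *out++ = *src++;
        *out++ = *src++;
        *out++ = *src++;
    }
    return out;
}
```

The table costs 768 bytes of flash or RAM, in exchange for replacing per-bit work with three byte copies per colour.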

I’ve updated my github repo, and I’ve also added it to the Libmaple F1 core libraries, so if you want to try it, please download the latest version of the core from github


racemaniac
Wed Jun 14, 2017 2:55 pm
I just realised i had set up a different double buffering system for my projects:
I had the buffer you write into, which just contains the real values, not the neopixel signals. Triggering a send would then convert this buffer to a bigger neopixel signal buffer that would then get sent via DMA.
This is of course also a slower approach since before sending you still have all the conversion effort, but there are no synchronization issues/difficulties :).
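The convert-on-send approach might be sketched like this, under the same assumed 3-bit encoding (110 = 1, 100 = 0). `RawBufferStrip` and its methods are hypothetical; the names only mimic the Adafruit API. Users write plain colours, and nothing is encoded until show(), which also makes getPixelColor trivial.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode one byte into 3 output bytes (assumed 110 = 1, 100 = 0, MSB first).
static void encodeByteInto(uint8_t v, uint8_t* out) {
    uint32_t enc = 0;
    for (int b = 7; b >= 0; --b) enc = (enc << 3) | 0x4 | (((v >> b) & 0x01) << 1);
    out[0] = static_cast<uint8_t>(enc >> 16);
    out[1] = static_cast<uint8_t>(enc >> 8);
    out[2] = static_cast<uint8_t>(enc);
}

class RawBufferStrip {
public:
    explicit RawBufferStrip(std::size_t n) : pixels_(n, 0), signal_(n * 9, 0) {}

    void setPixelColor(std::size_t i, uint8_t r, uint8_t g, uint8_t b) {
        pixels_[i] = (uint32_t(r) << 16) | (uint32_t(g) << 8) | b;
    }
    uint32_t getPixelColor(std::size_t i) const { return pixels_[i]; }

    // Encode the whole strip in GRB order; a real driver would then
    // start a DMA transfer from the returned buffer.
    const std::vector<uint8_t>& show() {
        for (std::size_t i = 0; i < pixels_.size(); ++i) {
            uint32_t p = pixels_[i];
            uint8_t* out = &signal_[i * 9];
            encodeByteInto(uint8_t(p >> 8), out);       // G
            encodeByteInto(uint8_t(p >> 16), out + 3);  // R
            encodeByteInto(uint8_t(p), out + 6);        // B
        }
        return signal_;
    }

private:
    std::vector<uint32_t> pixels_;  // plain RGB values written by the user
    std::vector<uint8_t> signal_;   // encoded pulse-train data
};
```

Since show() rebuilds the signal buffer from scratch each time, there is nothing to keep in sync between frames; the cost is encoding every pixel on every send.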

RogerClark
Wed Jun 14, 2017 10:11 pm
@racemaniac

I had the same thought about only doing the conversion, just before sending, but I have a feeling that would be slower.

My latest code takes 990ns to set an encoded pixel and about 660ns to set a non-encoded pixel.

I have not timed how long it takes to copy the encoded buffer, but it will be less than the time to encode all the pixels.

The only way I can see to speed this up would be to keep track of which pixels in the RGB (unencoded) buffer had changed, and then only encode those pixels prior to transmission.

You could easily do this using a flag in the top byte of the RGB value, as it’s a uint32.

But this would only be faster for when just a few pixels need to be updated.
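A sketch of that change-tracking idea, assuming bit 24 of each uint32 pixel is used as the "changed since last show()" flag. `DirtyTrackingStrip` and `kDirty` are hypothetical names, and the actual encoding step is left as a comment so only the bookkeeping is shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit 24, in the unused top byte, marks a pixel as changed.
constexpr uint32_t kDirty = 0x01000000u;

class DirtyTrackingStrip {
public:
    explicit DirtyTrackingStrip(std::size_t n) : pixels_(n, kDirty) {}  // all dirty at start

    // Only mark the pixel if its colour actually changes.
    void setPixelColor(std::size_t i, uint32_t rgb) {
        rgb &= 0x00FFFFFFu;
        if ((pixels_[i] & 0x00FFFFFFu) != rgb) pixels_[i] = rgb | kDirty;
    }

    // Re-encode only flagged pixels; returns how many were encoded.
    std::size_t show() {
        std::size_t encoded = 0;
        for (uint32_t& p : pixels_) {
            if (p & kDirty) {
                // encode this pixel's 9 bytes into the output buffer here
                p &= ~kDirty;
                ++encoded;
            }
        }
        return encoded;
    }

private:
    std::vector<uint32_t> pixels_;
};
```

The scan over all pixels still happens on every show(), so, as noted above, this only pays off when just a few pixels change per frame.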

I think Rick has a good point about the STM32 not being very good for this application. The other MCUs I have that may be good for this are the Nordic nRF51 and nRF52, as they have a separate hardware peripheral called the PPI (Programmable Peripheral Interconnect), which can do all sorts of fancy things based on timers and DMA.
But I have not developed using the PPI, and I don’t have time at the moment to learn about it :-(


Nutsy
Wed Jun 21, 2017 10:10 am
Thought I’d add my two tidbits. In my speedo project I used the ported version of the library extensively, and even there the get pixel colour function wasn’t perfect, because it returned data stored in an array after the brightness changes had been applied…

A simple fix for that, and a way to provide get pixel colour here, was to create a primary array that you place all your colour values into, and later run a loop to transfer that data into the LED set colour function.

The only downside to this was that, in essence, you doubled your LED RAM usage…
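The primary-array approach might look like the sketch below. `ShadowedStrip` is a hypothetical class, and brightness is applied with a simple `(value * (brightness + 1)) >> 8` scaling, which is an assumption here. The user’s colours are kept unscaled, so getPixelColor returns exactly what was set, at the cost of a second array.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class ShadowedStrip {
public:
    explicit ShadowedStrip(std::size_t n) : user_(n, 0), out_(n, 0) {}

    void setBrightness(uint8_t b) { brightness_ = b; }
    void setPixelColor(std::size_t i, uint32_t rgb) { user_[i] = rgb & 0x00FFFFFFu; }
    uint32_t getPixelColor(std::size_t i) const { return user_[i]; }  // unscaled

    // Copy the primary array into the output array, scaling each channel.
    void show() {
        for (std::size_t i = 0; i < user_.size(); ++i) {
            uint32_t c = user_[i];
            out_[i] = (scale(uint8_t(c >> 16)) << 16) |
                      (scale(uint8_t(c >> 8)) << 8) |
                      scale(uint8_t(c));
        }
    }
    uint32_t outputColor(std::size_t i) const { return out_[i]; }

private:
    uint32_t scale(uint8_t v) const { return (uint32_t(v) * (brightness_ + 1u)) >> 8; }

    std::vector<uint32_t> user_;  // colours exactly as set by the user
    std::vector<uint32_t> out_;   // brightness-scaled copy sent to the LEDs
    uint8_t brightness_ = 255;
};
```

With brightness 127, a pixel set to 0xFF8000 is sent as 0x7F4000, while getPixelColor still returns 0xFF8000.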


RogerClark
Wed Jun 21, 2017 11:19 am
@nutsy

Thanks.

I don’t think there is an easy or perfect solution for using these devices.

I’m not convinced my method is that good, as whether the performance is better or worse than doing it the bit-banged way depends on how many pixels you change before calling show().


racemaniac
Wed Jun 21, 2017 11:31 am
The main problem with these kinds of things is that everyone has their own requirements/priorities. For my implementation I gave up a bit of transmission speed & memory in order to make building the buffer as fast as possible. As seen in the tests I did on DMA, even if the sending is now 33% slower, it won’t matter for all the other things I’m doing while the DMA is running, and it is still plenty fast for the number of LEDs I’ll be controlling :). So for me that was the optimal trade-off. But of course, for each user those trade-offs will be different.

Luckily, for most users who don’t want to get every bit of power from the microcontroller, whichever implementation you choose won’t matter, all they care about is having the neopixels up & running in 10 minutes XD.


RogerClark
Wed Jun 21, 2017 10:51 pm
[racemaniac – Wed Jun 21, 2017 11:31 am] –

Luckily, for most users who don’t want to get every bit of power from the microcontroller, whichever implementation you choose won’t matter, all they care about is having the neopixels up & running in 10 minutes XD.

That’s true.

The only reason I started to look at using SPI DMA was because the bit-banged library that someone else had tried to port to STM32 did not work for me.
So I had to fix that first, and then it got me thinking, because it loops in assembler for timing (which didn’t seem to give consistent timings).
That’s why I thought SPI, which has a fixed clock frequency, could be better.

If I get a chance, I will try to do some timings on my SPI version, because at the moment I don’t know how long the memcpy takes, so I don’t know the efficiency of my system.


racemaniac
Thu Jun 22, 2017 5:41 am
Indeed, I also think the SPI DMA version is going to be the easiest to use. It doesn’t depend on critical timings, disabling interrupts, etc., and that’s going to make the library a lot easier to use.
