https://github.com/rogerclarkmelbourne/ … 2_Libmaple
The API is based on the Adafruit library, but most of the code has been written from scratch, as the method of operation is completely different.
Notes.
Will probably only work with WS2812B and not the older WS2812, because of the timing constraints caused by using SPI to generate the pulse-train.
The library does not have all the functions of the Adafruit version, e.g. getPixelColor, because the pixel data is not easily accessible. In the longer term getPixelColor could be written to read the encoded pulse-train data and extract the pixel colour, but I don’t have time to do this at the moment, and none of the standard examples use it.
Color(r,g,b,w) does not make use of the w channel. At the moment I’m not sure how to use the w channel, and I’ll need to reverse engineer the Adafruit library to understand what that channel does.
The normal limitations also apply to this library. E.g. theoretically, Data In requires a logic high voltage of 0.7 x Vdd, so if Vdd = 5V, Vhi needs to be 3.5V, which is more than the logic high on a 3.3V system. However, in practice this seems to work OK for most people.
References
https://cdn-shop.adafruit.com/datasheets/WS2812.pdf
https://cdn-shop.adafruit.com/datasheets/WS2812B.pdf
http://rgb-123.com/files/WS2812B_VS_WS2812.pdf
Nicer coded than what I had so far.

Just 2 small remarks after going through the code:
– you still have the original comments from the neopixel library above your code, mentioning AVR’s and people who wrote that :p
– it’s fixed on which SPI port it uses? (if you want to support multiple ports, do mind that the divider is different since spi2 runs at half the speed of spi1 for the same divider)
It should be easy to make it work on SPI2, I just need to work out the best way to allow users to select that, perhaps have the SPI channel as the second argument in the constructor and default it to SPI1.
I’d forgotten about the speed limitation on SPI2, but it won’t be a problem: it’s running on DIV32 on SPI1, so if SPI2’s clock is half of SPI1’s, then it would need to be DIV16.
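The divider arithmetic can be checked with a minimal sketch. The bus clocks used in the test (72 MHz feeding SPI1, 36 MHz feeding SPI2) are assumptions based on a typical STM32F103 clock setup and are not stated in the thread; spiBitRateHz is a hypothetical helper, not part of the library.

```cpp
#include <cstdint>

// Hypothetical helper: SPI bit rate for a given peripheral bus clock and divider.
constexpr uint32_t spiBitRateHz(uint32_t busClockHz, uint32_t divider)
{
    return busClockHz / divider;
}
```

Under those assumed clocks, DIV32 on SPI1 and DIV16 on SPI2 give the same bit rate, so the pulse-train timing would be unchanged.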
Re: References to AVR
Somehow the original file header got back in. (arrggh) I’d not spotted that. I’ll remove all that junk.
So I’ll go through and remove references to AVR
A few more things…
getPixelColor has not been implemented yet, as I didn’t have time to write the code to read the encoded bit pattern from the buffer and convert it back to RGB. It should be fairly straightforward to do, simply by reading the middle bit of each of the triplets
e.g. something like
take the 24 bit encodedValue
clear the decoded (uint8_t)
for(0 to 7)
{
decoded >>= 1
decoded |= ((encodedValue >> 1) & 0x01) << 7
encodedValue >>= 3
}
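As a runnable version of that idea, here is a minimal sketch. It assumes each data bit is stored as a 3-bit SPI triplet with the data in the middle bit (1 -> 110, 0 -> 100) and the most significant data bit in the top triplet; the thread only confirms the "middle bit of each triplet" part. decodeByte is a hypothetical name, not a function in the library.

```cpp
#include <cstdint>

// Recover one colour byte from its 24-bit encoded form.
// Assumed layout: data bit n lives in encoded bits 3n+2..3n,
// and the middle bit of that triplet (3n+1) carries the data.
uint8_t decodeByte(uint32_t encodedValue)
{
    uint8_t decoded = 0;
    for (int bit = 0; bit < 8; ++bit) {
        const uint32_t middle = (encodedValue >> (3 * bit + 1)) & 0x01u;
        decoded |= static_cast<uint8_t>(middle << bit);
    }
    return decoded;
}
```

A getPixelColor built on this would call it three times per LED, once for each encoded colour byte.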
Also, I’m not sure what the w parameter in Color(r,g,b,w) does. Perhaps it’s for a different sort of LED.
The reason for this is that it uses SPI.dmaSend, which is a special function not available in other cores.
I will also change to use the new function SPI.dmaSendAsync(), because at the moment the only benefit of using the SPI DMA is that USB keeps working, since I do not need to disable the interrupts while the data is being sent.
This will require double buffering, which will take 18 bytes per LED; however, there will be enough RAM to drive at least 600 LEDs.
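The 18 bytes per LED figure follows from 3 colour bytes per LED, each data bit expanded to 3 SPI bits, times 2 buffers. A minimal sketch of that arithmetic (the names are hypothetical, and any leading or trailing padding bytes are ignored):

```cpp
#include <cstddef>

// 3 colour bytes per LED, each data bit sent as 3 SPI bits -> 9 encoded bytes per LED.
constexpr std::size_t kColourBytesPerLed = 3;
constexpr std::size_t kSpiBitsPerDataBit = 3;
constexpr std::size_t kEncodedBytesPerLed = kColourBytesPerLed * kSpiBitsPerDataBit;

// Total RAM needed for the encoded data, excluding any padding.
constexpr std::size_t bufferBytes(std::size_t leds, std::size_t buffers)
{
    return leds * kEncodedBytesPerLed * buffers;
}
```

For 600 LEDs with double buffering this comes to 10800 bytes.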
But it does not use a completion interrupt, as we don’t have any code for that at the moment because there are issues with callbacks to functions in C++ classes. @victor_pv has been investigating callbacks, though; I’ve PM’ed him about this and I’m hoping he can help.
I added a new function to SPI called dmaSendAsync(), which returns immediately after the transfer has been started.
But I also added a static flag to indicate if a transfer had already been started, and added code to the start of the function which waits for completion of a transfer if one is in progress (as determined by the static flag).
The downside of this approach is that I don’t know when a transfer has finished, so I can’t time the 50µs Reset time.
However, as other people have already pointed out, the Reset time that is actually needed is only around 6µs, and the function call overhead, DMA tidy-up etc., plus the 8 bytes of zeros that are sent prior to the start of the real data, mean that I get around 12µs of unavoidable dead time between each transfer. So there is no need to add code to determine if enough Reset time has been allowed.
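The "start and return, but wait first if a previous transfer is still running" pattern can be sketched as below. This is not the library's code: FakeDma is a hypothetical stand-in for the DMA hardware so the sketch runs on a PC, and AsyncSender uses a member flag in place of the static flag described above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the DMA hardware; each poll advances simulated time.
struct FakeDma {
    int ticksLeft = 0;         // simulated time until the transfer finishes
    int transfersStarted = 0;  // how many transfers have been kicked off
    int polls = 0;             // how often completion was polled

    void start(const uint8_t* /*data*/, std::size_t len)
    {
        ticksLeft = static_cast<int>(len);
        ++transfersStarted;
    }

    bool complete()
    {
        ++polls;
        if (ticksLeft > 0) {
            --ticksLeft;
        }
        return ticksLeft == 0;
    }
};

class AsyncSender {
public:
    explicit AsyncSender(FakeDma& dma) : dma_(dma) {}

    // Returns as soon as the new transfer has been started.
    void dmaSendAsync(const uint8_t* data, std::size_t len)
    {
        if (transferStarted_) {
            // A previous transfer may still be running: wait for it first.
            while (!dma_.complete()) {
            }
        }
        dma_.start(data, len);
        transferStarted_ = true;
    }

private:
    FakeDma& dma_;
    bool transferStarted_ = false;  // plays the role of the static flag
};
```

The first call never waits; every later call blocks only for whatever is left of the previous transfer, which is why the caller cannot tell exactly when a transfer finished.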
Actually, it’s taken several hours to get this working, because double buffering caused all sorts of strange effects when running the standard examples like colorWipe()
I initially thought that I’d messed up the buffer pointers, or perhaps I was writing to the same buffer that was being sent, etc.
However, after hours of trying to send all zeros or all ones etc. to confirm that none of the above was happening…
I realised that the LED effects like colorWipe() do not build the data for the entire strip of LEDs (30 in my case); they often just modify 1 or 2 LEDs, and because I had 2 separate buffers, the effects were building on the data from the show() from 2 frames ago
E.g. colorWipe() of Red (0=off) looks like
OOOOOOOOOO
ROOOOOOOOO
RROOOOOOOO
RRROOOOOOO
RRRROOOOOO
Etc
But because I have 2 buffers I ended up with
OOOOOOOOOO (buffer 1 and 2 initially all off)
ROOOOOOOOO (buffer 2)
OROOOOOOOO (buffer 1 plus R in 2nd LED)
ROROOOOOOO (buffer 2 plus R in 3rd LED)
OROROOOOOO (buffer 1 plus R in 4th LED)
etc
So I ended up with alternately flashing Red LEDs
Anyway, the solution is to copy the buffer each time it’s sent via SPI. This does waste a bit of time, but I do the buffer copy after dmaSendAsync has been called, so that it’s done in parallel with the data being sent to the LEDs
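The effect and the fix can be reproduced on a PC with a small model. This is only a sketch: DoubleBufferedStrip and wipe are hypothetical names, pixels are single characters ('O' = off, 'R' = red), and show() returns the frame that would be sent rather than starting a DMA transfer.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a double-buffered strip.
class DoubleBufferedStrip {
public:
    DoubleBufferedStrip(std::size_t leds, bool copyAfterSend)
        : buffers_{{std::string(leds, 'O'), std::string(leds, 'O')}},
          copyAfterSend_(copyAfterSend) {}

    void setPixel(std::size_t i, char colour) { buffers_[draw_][i] = colour; }

    std::string show()
    {
        const std::size_t sending = draw_;  // this buffer goes out over SPI
        draw_ = 1 - draw_;                  // effects now draw into the other one
        if (copyAfterSend_) {
            // The fix: the next frame starts from the frame just sent.
            buffers_[draw_] = buffers_[sending];
        }
        return buffers_[sending];
    }

private:
    std::array<std::string, 2> buffers_;
    std::size_t draw_ = 0;
    bool copyAfterSend_;
};

// colorWipe-style effect: light one more LED per frame and record each frame.
std::vector<std::string> wipe(DoubleBufferedStrip& strip, std::size_t leds)
{
    std::vector<std::string> frames;
    for (std::size_t i = 0; i < leds; ++i) {
        strip.setPixel(i, 'R');
        frames.push_back(strip.show());
    }
    return frames;
}
```

Without the copy, a 4-LED wipe produces ROOO, OROO, RORO, OROR; with the copy it produces ROOO, RROO, RRRO, RRRR.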
I could probably do the memory copy using DMA, but I think for small numbers of bytes, e.g. 30 LEDs = 272 bytes, the setup time would probably not be worth it. It could be better to code it in inline assembler.
BTW, I also made some speed improvements by making one LUT rather than 3 (R, G, B) LUTs, as this allows for pointer incrementing when reading the encoded values as well as for the output position
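A single shared LUT could look roughly like the sketch below. The 110/100 triplet encoding, the MSB-first order and the function names buildEncodeLut and encodePixel are assumptions for illustration; the library's actual table layout may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build one 256-entry table mapping a colour byte to its 24-bit SPI pattern.
// Assumed encoding: data bit 1 -> 110, data bit 0 -> 100, MSB first.
std::array<uint32_t, 256> buildEncodeLut()
{
    std::array<uint32_t, 256> lut{};
    for (uint32_t value = 0; value < 256; ++value) {
        uint32_t encoded = 0;
        for (int bit = 7; bit >= 0; --bit) {
            encoded <<= 3;
            encoded |= ((value >> bit) & 1u) ? 0x6u : 0x4u;
        }
        lut[value] = encoded;
    }
    return lut;
}

// Encode one pixel (three colour bytes, in whatever order the strip expects)
// into 9 output bytes. Both pointers simply increment; returns the new output position.
uint8_t* encodePixel(const std::array<uint32_t, 256>& lut,
                     const uint8_t* colours, uint8_t* out)
{
    for (int i = 0; i < 3; ++i) {
        const uint32_t e = lut[*colours++];
        *out++ = static_cast<uint8_t>(e >> 16);
        *out++ = static_cast<uint8_t>(e >> 8);
        *out++ = static_cast<uint8_t>(e);
    }
    return out;
}
```

Because all three channels share the same table, the same lookup serves R, G and B, and no per-channel table selection is needed inside the loop.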
I’ve updated my GitHub repo, and I’ve also added it to the Libmaple F1 core libraries, so if you want to try it, please download the latest version of the core from GitHub
I had the buffer you write into, which just contains the real values, not the neopixel signals. Triggering a send would then convert this buffer to a bigger neopixel signal buffer that would then get sent via DMA.
This is of course also a slower approach since before sending you still have all the conversion effort, but there are no synchronization issues/difficulties

I had the same thought about only doing the conversion just before sending, but I have a feeling that would be slower.
My latest code takes 990ns to set an encoded pixel and about 660ns to set a non-encoded pixel.
I have not timed how long it takes to copy the encoded buffer, but it will be less than the time to encode all the pixels.
The only way I can see to speed this up would be to keep track of which pixels in the RGB (unencoded) buffer had changed, and then only encode those pixels prior to transmission.
You could easily do this using a flag in the top byte of the RGB value, as it’s a uint32
But this would only be faster when just a few pixels need to be updated.
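A minimal sketch of that dirty-flag idea is below. DirtyPixelBuffer, kDirtyFlag and the choice of bit 24 are hypothetical, and the encode step is passed in as a callback rather than tied to any real library function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag in the unused top byte of each 32-bit RGB value.
constexpr uint32_t kDirtyFlag = 0x01000000u;
constexpr uint32_t kRgbMask = 0x00FFFFFFu;

class DirtyPixelBuffer {
public:
    explicit DirtyPixelBuffer(std::size_t leds) : pixels_(leds, 0) {}

    // Store the colour and mark the pixel dirty only if it actually changed.
    void setPixel(std::size_t i, uint32_t rgb)
    {
        rgb &= kRgbMask;
        if ((pixels_[i] & kRgbMask) != rgb) {
            pixels_[i] = rgb | kDirtyFlag;
        }
    }

    // Call encode(index, rgb) for changed pixels only, clearing their flags.
    // Returns the number of pixels that were encoded.
    template <typename Encode>
    std::size_t flush(Encode encode)
    {
        std::size_t count = 0;
        for (std::size_t i = 0; i < pixels_.size(); ++i) {
            if (pixels_[i] & kDirtyFlag) {
                pixels_[i] &= ~kDirtyFlag;
                encode(i, pixels_[i]);
                ++count;
            }
        }
        return count;
    }

private:
    std::vector<uint32_t> pixels_;
};
```

The loop still visits every pixel to test the flag, so the saving is only the encode work for unchanged pixels, which fits the point that this helps mainly when few pixels change per frame.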
I think Rick has a good point about the STM32 not being very good for this application. The other MCUs I have which may be good for this are the Nordic nRF51 and nRF52, as they have a separate hardware section called the PPI, which can do all sorts of fancy things based on timers and DMA.
But I have not developed using the PPI, and I don’t have time at the moment to learn about it
A simple fix for that, and for making a get pixel colour function, was to create a primary array that you place all your colour values into, and later run a loop to transfer that data into the LED set colour function.
Only downside to this was that in essence you doubled your LED RAM usage…
Thanks.
I don’t think there is an easy or perfect solution to use these devices
I’m not convinced my method is that good, as whether the performance is better or worse than doing it the bit-banged way depends on how many pixels you change before calling show()

Luckily, for most users who don’t want to get every bit of power from the microcontroller, whichever implementation you choose won’t matter, all they care about is having the neopixels up & running in 10 minutes XD.
[racemaniac – Wed Jun 21, 2017 11:31 am] –
Luckily, for most users who don’t want to get every bit of power from the microcontroller, whichever implementation you choose won’t matter, all they care about is having the neopixels up & running in 10 minutes XD.
That’s true.
The only reason I started to look at using SPI DMA was because the bit-banged library that someone else had tried to port to STM32 did not work for me.
So I had to fix that first, and then it got me thinking, because it loops in assembler for timing (which didn’t seem to give consistent timings)
Hence why I thought SPI, which has a fixed clock frequency, could be better.
If I get the chance, I will try to do some timings on my SPI version, because at the moment I don’t know how long the memcpy takes, so I don’t know the efficiency of my system