Working code is in this post, and an explanation of how to modify sdFat to use it is in the post after that:
http://www.stm32duino.com/viewtopic.php … =60#p30306
======================================================
This is something Roger has suggested several times, and it makes sense.
Currently, the DMA TX/RX functions we added to the SPI library block until the DMA transfer has completed or timed out.
This change would add two functions to the SPI library, similar to the official Arduino I2S library:
onTransmit(handler);
onReceive(handler);
It's probably worth PM'ing @stevstrong if he does not see this thread, as he has done a lot of work on SPI recently.
I think I have one pending PR from Steve, but it seems to slow SPI down, which is why I have not actioned it yet. It may be necessary, though, as it contains bug fixes.
I think there is a general problem with callbacks into C++ classes: only the addresses of static functions can be used as plain function pointers.
So it would need to be a shared ISR for all instances.
Other APIs I work with let you pass a pointer to the callback function into the transfer function, which seems logical to me.
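A common workaround for the static-only limitation is one static trampoline per SPI port that forwards to the owning instance, which fits the one-ISR-per-channel idea. The sketch below is host-side C++ only: `SpiPort`, `portTable`, `dmaIsrPort1`/`dmaIsrPort2` and `TransferHandler` are hypothetical names, not the SPI library's API, and the interrupt is simulated by calling the trampoline directly.

```cpp
#include <cassert>

// Hypothetical user callback type: a plain function pointer, which is what
// a C-level interrupt hook can store.
typedef void (*TransferHandler)(void);

class SpiPort {
public:
    // Register a user handler for "transfer complete" on this port.
    void onTransmit(TransferHandler h) { txHandler_ = h; }

    // Called from the static trampoline; free to touch member state.
    void handleDmaComplete() {
        ++completed_;
        if (txHandler_) txHandler_();
    }

    int completed() const { return completed_; }

private:
    int completed_ = 0;
    TransferHandler txHandler_ = nullptr;
};

// One instance pointer per SPI port (hypothetical: two ports).
static SpiPort* portTable[2] = {nullptr, nullptr};

// Static trampolines have ordinary function addresses, so they can be
// attached as the DMA interrupt handler of each port.
static void dmaIsrPort1() { if (portTable[0]) portTable[0]->handleDmaComplete(); }
static void dmaIsrPort2() { if (portTable[1]) portTable[1]->handleDmaComplete(); }

// Example user handler.
static int userCalls = 0;
static void userTxDone() { ++userCalls; }
```

On the target, the trampoline address is what would be attached to the DMA channel interrupt, and the instance pointer would be filled in when the port is initialised.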
I have not tested @stevstrong's latest PR, but the previous one was slower than what we had before.
Re: one ISR per SPI channel
Sounds OK to me.
thanks
roger
Unfortunately, DMA slows the CPU down considerably, so in the end, including the overhead of initializing the DMA, no time was saved compared to the blocking non-DMA version. I tested it with the Adafruit graphics test; you can follow the results in that thread.
My version was designed to reserve the respective DMA channel (I think channel 3) for SPI only. Still, I was not happy with the result.
The display lib was also partially writing larger blocks, but no overall speed gain was achieved.
Once again, the CPU is slowed down considerably by the DMA running in the background.
Don't get me wrong, I see the theoretical benefit; that is why I tested it. Still, the results did not convince me.
As I don't have any other application where saved time would matter more than in the display lib, I have given up on pushing it into the repo.
But feel free to get it running. If you think it could help, I could share my local version.
You forgot the overhead of setting up the DMA before each transaction. And if you implement a callback at job end, it will also take time and completely block the CPU from doing other tasks.
Thus, depending on the SPI clock speed, the setup overhead together with the post-processing can take as long as transferring, let's say, 25 bytes.
So if you transfer 20 bytes, doing it without DMA is faster than doing it with DMA.
Hence, again, choosing the appropriate strategy strongly depends on the application.
If you always write blocks of 256 bytes or more and have plenty of other work to do between consecutive block writes (not just waiting for the previous SPI job to finish), then DMA is clearly a good approach. Otherwise it can be slower than the non-DMA version.
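The break-even argument above can be put into numbers. This is a rough model, assuming 8-bit frames, a 72 MHz CPU, SPI at 18 MHz (DIV4 on SPI1) and a fixed DMA setup plus completion overhead equal to the 25-byte estimate given above. It ignores the bus-contention slowdown and the small gaps a polled loop leaves between bytes, and the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// CPU cycles needed to shift one 8-bit frame out at the given SPI clock.
uint32_t cpuCyclesPerByte(uint32_t cpuHz, uint32_t spiHz) {
    return static_cast<uint32_t>(8ULL * cpuHz / spiHz);
}

// A polled transfer keeps the CPU busy for the whole transfer.
uint32_t polledCpuCycles(uint32_t bytes, uint32_t cyclesPerByte) {
    return bytes * cyclesPerByte;
}

// DMA frees CPU time only if its fixed overhead (setup + completion handling)
// is smaller than the time the CPU would otherwise spend polling.
bool dmaSavesCpu(uint32_t bytes, uint32_t cyclesPerByte, uint32_t overheadCycles) {
    return polledCpuCycles(bytes, cyclesPerByte) > overheadCycles;
}

// Smallest transfer size for which DMA saves CPU time.
uint32_t breakEvenBytes(uint32_t cyclesPerByte, uint32_t overheadCycles) {
    return overheadCycles / cyclesPerByte + 1;
}
```

With these numbers DMA only starts to free CPU time above 25 bytes per transfer, which is why it pays off for 256-byte blocks but not for the short command writes a display driver issues.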
13.3 DMA functional description
The DMA controller performs direct memory transfer by sharing the system bus with the Cortex®-M3 core. The DMA request may stop the CPU access to the system bus for some bus cycles, when the CPU and DMA are targeting the same destination (memory or peripheral). The bus matrix implements round-robin scheduling, thus ensuring at least half of the system bus bandwidth (both to memory and peripheral) for the CPU.
So the CPU may be slowed down to half of its speed. The higher the SPI clock, the worse the situation for the CPU if it performs a lot of memory accesses.
indeed
And I'm still wondering about the performance hit for whatever is running simultaneously. I'm really going to give that a try when I have some time; it's very good to know.
Any suggestions on what to benchmark it with? Just a Dhrystone benchmark or so?
I will post my version here in the evening.
There indeed is no silver bullet or “best” solution >_<. I'd still like the cheap setup functions for when DMA is used directly, but for use in frameworks, the full setup is indeed the safer choice.
I am attaching my local version of SPI.cpp, where I started to implement a dual-buffered DMA transfer: while one buffer is being sent, the other one gets filled. When the first has been sent, the ISR checks whether the other buffer contains data to send. If so, the DMA is set up again automatically. That is the plan so far; the ISR part is not yet coded.
The interesting functions start from line 479, conditioned by the SPI_USE_DMA define.
Hopefully you can get some ideas from it.
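The dual-buffer scheme described above (fill one buffer while the other is sent, and let the transfer-complete ISR restart the DMA if the other buffer holds data) can be modelled on the host like this. `PingPongSender` and its members are hypothetical; `startDma()` only records the transfer where real code would load the memory address and count registers and enable the channel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side model of a two-buffer (ping-pong) DMA sender. The DMA itself is
// simulated: startDma() records what would be sent, and dmaCompleteIsr()
// stands in for the transfer-complete interrupt.
struct PingPongSender {
    static constexpr size_t kBufSize = 64;
    uint8_t buf[2][kBufSize];
    size_t len[2] = {0, 0};   // bytes waiting in each buffer (0 = free)
    int fill = 0;             // buffer the application is filling
    int sending = -1;         // buffer the DMA is sending, -1 if idle
    size_t bytesSent = 0;     // total handed to the simulated DMA

    void startDma(int which) {
        sending = which;
        bytesSent += len[which];   // real code: set address + count, enable channel
    }

    // Application side: queue the current fill buffer (caller checks canFill()).
    void commit(size_t n) {
        len[fill] = n;
        if (sending < 0) startDma(fill);   // DMA idle: start right away
        fill ^= 1;                         // continue in the other buffer
    }

    // Transfer-complete ISR: free the sent buffer, restart on the other one
    // if it already holds data, otherwise go idle.
    void dmaCompleteIsr() {
        len[sending] = 0;
        int other = sending ^ 1;
        if (len[other] > 0) startDma(other);
        else sending = -1;
    }

    bool canFill() const { return len[fill] == 0; }
};
```

On the target, `commit()` and the ISR share `len[]` and `sending`, so the check-and-start in `commit()` would have to run with the DMA interrupt masked.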
This is based on the principle that SPI transfer requests are non-blocking, i.e. the requests are put into a queue. When a request is taken from the queue, the following actions are performed:
– the device address is sent using polling
– the register number is sent using polling
– one or two DMAs are set up, depending on whether it is a read, a write, or a simultaneous read and write
– an interrupt is enabled for DMA completion
– once the interrupt is received, the “callback” is executed and the next transfer is started
This code demonstrates how the speed and other SPI settings can be changed for different devices and how the SPI configuration registers are modified only when required.
This code demonstrates how the LCD driver command/data line is driven synchronized with the SPI transfers.
Cheers, Ollie
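A minimal model of the queued, non-blocking request flow listed above, including the "modify the configuration registers only when required" rule. This is a host-side sketch, not Ollie's code: `SpiRequest`, `SpiQueue` and the single `clockHz` setting are hypothetical, the polled address/register bytes are only indicated in a comment, and the DMA interrupt is simulated by calling `dmaCompleteIsr()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical request record following the steps listed above.
struct SpiRequest {
    uint32_t clockHz;                 // per-device SPI speed
    uint8_t  reg;                     // register number, sent by polling
    size_t   length;                  // DMA payload length
    void (*done)(const SpiRequest&);  // completion callback, may be null
};

// Fixed-size ring buffer of pending requests.
class SpiQueue {
public:
    bool submit(const SpiRequest& r) {
        if (count_ == kSize) return false;   // queue full
        q_[(head_ + count_) % kSize] = r;
        ++count_;
        if (!busy_) startNext();             // idle: start immediately
        return true;
    }

    // Stand-in for the DMA transfer-complete interrupt.
    void dmaCompleteIsr() {
        if (count_ == 0) return;
        SpiRequest finished = q_[head_];
        head_ = (head_ + 1) % kSize;
        --count_;
        busy_ = false;
        if (finished.done) finished.done(finished);
        if (count_ > 0) startNext();         // chain the next transfer
    }

    int reconfigurations() const { return reconfigs_; }
    bool busy() const { return busy_; }

private:
    static constexpr size_t kSize = 8;

    void startNext() {
        const SpiRequest& r = q_[head_];
        // Touch the SPI configuration only if this device needs a different speed.
        if (r.clockHz != currentHz_) { currentHz_ = r.clockHz; ++reconfigs_; }
        // Real code: poll out the device address and r.reg, then set up the
        // TX and/or RX DMA channels for r.length and enable the TC interrupt.
        busy_ = true;
    }

    SpiRequest q_[kSize];
    size_t head_ = 0, count_ = 0;
    bool busy_ = false;
    uint32_t currentHz_ = 0;
    int reconfigs_ = 0;
};

static int completions = 0;
static void onDone(const SpiRequest&) { ++completions; }
```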
Anyone looked into what Paul Stoffregen is doing around Teensy 3.1/3.2 ?
There is another library, DmaSpi, which is the one that has DMA capabilities:
https://github.com/crteensy/DmaSpi
That one works similarly to what Steve was working on: it queues transfers with a set of properties (buffer pointer, size, and a pin object to control CS), then services those transfers in order.
I'm not sure all that is worth the effort: as Steve's tests showed, small transfers don't get faster because of all the overhead, and the libraries using it would have to be heavily modified.
It's not worth adding that level of complexity, as hardly anyone will use it.
Basically, I just split the code that was previously in dmaSend into two parts. dmaSendSet configures the transfer, except for the length and enabling the channel. Next you call dmaSendRepeat (I don't really like that name, but can't think of another one I like) with the length of the transfer; it sets the DMA CNDTR register, which must be reloaded for every new transfer, and enables the channel. If a callback function has been set previously, it doesn't block and returns 0 for success. If no callback function was set, it blocks.
SPI.dmaSend has exactly the same functionality as before, so any code using it should not break in any way, but it now uses the other two functions to do the work.
This way, if you want to send a number of bytes from the same buffer pointer, you only call SPI.dmaSendRepeat with one parameter, the number of bytes to send. Everything else stays the same as in the last transfer, so you only need to fill the buffer and fire the transfer.
What do you guys think? This still doesn't implement the internal buffer Steven had been testing, but code that repeats transmissions from the same buffer should see less overhead.
EDIT: I realize it may be good to add checks to confirm whether a transmission is already in progress before changing settings. It depends on whether we accept that little overhead for safety.
uint8 SPIClass::dmaSendRepeat(uint16 length)
{
    if (length == 0) return 1;
    // CNDTR is not kept after a transfer, so it is reloaded every time.
    dma_set_num_transfers(_currentSetting->spiDmaDev, _currentSetting->spiTxDmaChannel, length);
    spi_tx_dma_enable(_currentSetting->spi_d); // re-arm SPI TX DMA requests; the blocking path below disables them
    dma_enable(_currentSetting->spiDmaDev, _currentSetting->spiTxDmaChannel); // enable transmit
    if (_currentSetting->TXcallback) {
        // Non-blocking: completion is reported through the callback. The channel must be
        // disabled again (in the ISR path) before the next repeat, as CNDTR is only writable then.
        return 0;
    }
    uint32_t m = millis();
    uint8 b = 0;
    // Avoid interrupts and just loop waiting for the transfer-complete flag to be set.
    while ((dma_get_isr_bits(_currentSetting->spiDmaDev, _currentSetting->spiTxDmaChannel) & DMA_ISR_TCIF1) == 0) {
        if ((millis() - m) > DMA_TIMEOUT) { b = 2; break; } // 2 = timed out
    }
    dma_clear_isr_bits(_currentSetting->spiDmaDev, _currentSetting->spiTxDmaChannel);
    while (spi_is_tx_empty(_currentSetting->spi_d) == 0); // "5. Wait until TXE=1 ..."
    while (spi_is_busy(_currentSetting->spi_d) != 0); // "... and then wait until BSY=0 before disabling the SPI."
    dma_disable(_currentSetting->spiDmaDev, _currentSetting->spiTxDmaChannel);
    spi_tx_dma_disable(_currentSetting->spi_d);
    //uint16 x = spi_rx_reg(_currentSetting->spi_d); // dummy read, needed, don't remove!
    return b;
}
void SPIClass::dmaSendSet(void * transmitBuf, bool minc)
{
    // Configure everything except the transfer length, which dmaSendRepeat() loads.
    uint32 flags = ( (DMA_MINC_MODE*minc) | DMA_FROM_MEM | DMA_TRNS_CMPLT);
    dma_init(_currentSetting->spiDmaDev);
    // TX
    spi_tx_dma_enable(_currentSetting->spi_d);
    dma_xfer_size dma_bit_size = (_currentSetting->dataSize==SPI_DATA_SIZE_16BIT) ? DMA_SIZE_16BITS : DMA_SIZE_8BITS;
    dma_setup_transfer(_currentSetting->spiDmaDev, _currentSetting->spiTxDmaChannel, &_currentSetting->spi_d->regs->DR, dma_bit_size,
                       transmitBuf, dma_bit_size, flags); // Transmit buffer DMA
}
uint8 SPIClass::dmaSend(void * transmitBuf, uint16 length, bool minc)
{
    dmaSendSet(transmitBuf, minc);
    return dmaSendRepeat(length);
}
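The EDIT about checking for a transmission in progress before changing settings could look roughly like this. It is a host model of one TX channel with the same set/repeat split; `TxDmaModel`, `SpiDmaStatus` and the busy code 3 are hypothetical, and `pending` stands in for the channel's CNDTR count (on hardware the check would cost one register read).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status codes; 1 and 2 mirror the existing "zero length" and
// "timeout" returns, 3 is a new "busy" code.
enum SpiDmaStatus : uint8_t { SPI_DMA_OK = 0, SPI_DMA_BAD_ARG = 1, SPI_DMA_BUSY = 3 };

// Host-side model of one TX DMA channel.
struct TxDmaModel {
    const void* buffer = nullptr;
    bool minc = true;
    uint16_t pending = 0;   // models CNDTR: counts down to 0 during a transfer

    bool inProgress() const { return pending != 0; }

    // Like dmaSendSet(): refuse to reconfigure a running channel.
    SpiDmaStatus set(const void* buf, bool memInc) {
        if (inProgress()) return SPI_DMA_BUSY;
        buffer = buf;
        minc = memInc;
        return SPI_DMA_OK;
    }

    // Like dmaSendRepeat(): only the length is reloaded.
    SpiDmaStatus repeat(uint16_t length) {
        if (length == 0) return SPI_DMA_BAD_ARG;
        if (inProgress()) return SPI_DMA_BUSY;
        pending = length;   // real code: set the count, then enable the channel
        return SPI_DMA_OK;
    }

    // Simulated hardware progress: one frame shifted out.
    void clockOneFrame() { if (pending) --pending; }
};
```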
Selecting blocking/non-blocking mode by whether the callback argument is non-null is what Nordic Semi do in their SDK/API, so if it's good enough for them, I think it's good enough for us.
Also, dmaSendRepeat() seems a perfectly good name to me, as it's descriptive and concise.
It should be non-blocking if a non-null CB function is passed as parameter.
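The "non-null callback means non-blocking" rule can be illustrated with a simulated channel. `FakeDma`, `dmaSendSketch` and `markDone` are hypothetical; `poll()` advances the fake transfer by one frame per call so the blocking loop terminates on the host, whereas the real code would also keep its `DMA_TIMEOUT` check.

```cpp
#include <cassert>
#include <cstdint>

typedef void (*DmaCallback)(void);

// Simulated DMA channel: each poll() advances the transfer by one frame.
struct FakeDma {
    uint16_t pending = 0;
    DmaCallback onComplete = nullptr;

    void start(uint16_t n) { pending = n; }
    // Returns true once the transfer-complete flag would be set.
    bool poll() { if (pending) --pending; return pending == 0; }
    // Simulated transfer-complete interrupt for the non-blocking case.
    void irq() { pending = 0; if (onComplete) onComplete(); }
};

// Blocking when cb is null, non-blocking otherwise.
// Returns 0 on success, 1 for a zero length.
uint8_t dmaSendSketch(FakeDma& dma, uint16_t length, DmaCallback cb) {
    if (length == 0) return 1;
    dma.onComplete = cb;
    dma.start(length);
    if (cb) return 0;          // caller is notified from the ISR
    while (!dma.poll()) { }    // blocking: spin on the completion flag
    return 0;
}

static volatile bool txDone = false;
static void markDone() { txDone = true; }
```

Without an RTOS, a callback as small as `markDone()` is enough: the main loop can keep working and check the flag when it needs the bus again.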
I would really tend to reserve the DMA channels for SPI, if configured by the user, in order to remove the overhead of calling dmaSend/dmaSendSet each time when only the buffer pointer and its length have changed. This way one could quickly switch between two buffers, one being filled while the other is being sent.
I think it is very unlikely that the same DMA channel is used for other purposes when one uses DMA for SPI, which implies the SPI is working constantly, as in display-related applications.
We either need to have on demand DMA stream request/release or static allocations with conflict detection.
As far as I know, the core currently doesn't keep any record of which channels are enabled and for which peripheral. I guess that would require an additional library that keeps that data and manages allocating and releasing channels.
About the F4 port, I think the first thing we should try is switching to the “tubes” API that LeafLabs added to manage the F4 DMA streams. I haven't used it on the F1, but I have seen code using it, so it probably works.
Now, I am of the opinion that we should change things one at a time when they are already working, so as not to break anything.
Personally, I will finish what I started: adding the part that manages a callback, and repeated transfers. I will add another function to change the source address and length at once, as Steve suggested; that will help with double-buffering.
If someone can start working on code that manages DMA channel reservations, we can work in parallel. Once I've finished the extra functions and they work, I'll see if I can get them working with tubes so it works on the F4.
EDIT: One way could be to build a table like the PIN_MAP one, mapping the DMA channels to the peripherals that can use them, with a bool indicating whether that peripheral's DMA requests have been enabled on that channel.
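A PIN_MAP-style reservation table could be as small as this. It is a sketch: `DmaUser`, `dmaReserve` and `dmaRelease` are hypothetical names, and the channel numbers (SPI1_RX 2, SPI1_TX 3, SPI2_RX 4, SPI2_TX 5, USART3_RX 3) are my reading of the STM32F1 DMA1 request mapping in RM0008, so check them against the reference manual before relying on them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical IDs for a few peripherals that can request DMA1 on an F103.
enum DmaUser : uint8_t { DMA_FREE = 0, SPI1_RX, SPI1_TX, SPI2_RX, SPI2_TX, USART3_RX };

// Fixed request-to-channel wiring for DMA1 (channels 1..7), assumed from RM0008.
static int dma1ChannelFor(DmaUser u) {
    switch (u) {
        case SPI1_RX:   return 2;
        case SPI1_TX:   return 3;
        case SPI2_RX:   return 4;
        case SPI2_TX:   return 5;
        case USART3_RX: return 3;   // shares channel 3 with SPI1_TX
        default:        return -1;  // not modelled here
    }
}

// Ownership table indexed by channel number (index 0 unused).
static DmaUser dma1Owner[8] = {};

// Reserve the channel wired to 'u'; fails if another user already owns it.
bool dmaReserve(DmaUser u) {
    int ch = dma1ChannelFor(u);
    if (ch < 0) return false;
    if (dma1Owner[ch] != DMA_FREE && dma1Owner[ch] != u) return false;  // conflict
    dma1Owner[ch] = u;
    return true;
}

// Release only if 'u' is the current owner.
void dmaRelease(DmaUser u) {
    int ch = dma1ChannelFor(u);
    if (ch >= 0 && dma1Owner[ch] == u) dma1Owner[ch] = DMA_FREE;
}
```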
Is there a way to just read the information back from the hardware? From what I recall, it was possible to parse the DMA registers to figure out quite a lot about their setup.
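Reading the hardware back is possible to a degree: on the F1, each channel's CCR register tells whether the channel is enabled, its direction, data sizes and priority. The decoder below assumes the RM0008 CCR bit layout; how the raw value is read (libmaple register map or CMSIS struct) is left out, and the names are hypothetical. It cannot tell which of the peripherals wired to that channel issues the requests; that is decided by the peripheral's own DMA enable bits (for SPI, TXDMAEN/RXDMAEN in CR2).

```cpp
#include <cassert>
#include <cstdint>

// Decoded fields of an STM32F1 DMA_CCRx value.
struct CcrInfo {
    bool enabled, tcInterrupt, fromMemory, circular, memIncrement;
    unsigned peripheralBits, memoryBits;  // 8, 16 or 32
    unsigned priority;                    // PL: 0 = low .. 3 = very high
};

// PSIZE/MSIZE encoding: 00 = 8 bit, 01 = 16 bit, 10 = 32 bit.
static unsigned sizeBits(uint32_t field) { return 8u << field; }

CcrInfo decodeCcr(uint32_t ccr) {
    CcrInfo i;
    i.enabled        = ccr & (1u << 0);            // EN
    i.tcInterrupt    = ccr & (1u << 1);            // TCIE
    i.fromMemory     = ccr & (1u << 4);            // DIR: 1 = read from memory
    i.circular       = ccr & (1u << 5);            // CIRC
    i.memIncrement   = ccr & (1u << 7);            // MINC
    i.peripheralBits = sizeBits((ccr >> 8) & 3u);  // PSIZE
    i.memoryBits     = sizeBits((ccr >> 10) & 3u); // MSIZE
    i.priority       = (ccr >> 12) & 3u;           // PL
    return i;
}
```

The same PL field is where the "very high" (11) and "low" (00) priorities discussed later in the thread end up.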
I propose that we use STM terminology for these concepts. In that sense we need to recognize that there are only a few DMA channels and each of them has multiple streams. Streams in different channels can be active at the same time, but within a channel only one stream can be active at any point in time.
1) DMA Controllers have multiple streams
2) Streams have multiple channels
3) The stream channels are hard-wired to peripheral devices
– stream configuration selects one of the devices
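In this terminology, the channel choice is a field in the stream's own configuration register, which is why only one channel per stream can be active at a time. This sketch assumes the RM0090 layout of the F4 DMA_SxCR register (CHSEL in bits 27:25, EN in bit 0); the helper names are hypothetical, and on hardware CHSEL may only be written while EN is 0.

```cpp
#include <cassert>
#include <cstdint>

// DMA_SxCR.CHSEL selects which of the 8 hard-wired request channels drives the stream.
const uint32_t SXCR_CHSEL_POS  = 25;
const uint32_t SXCR_CHSEL_MASK = 7u << SXCR_CHSEL_POS;

// Return 'sxcr' with the channel selection replaced, other bits untouched.
uint32_t withChannel(uint32_t sxcr, unsigned channel) {
    return (sxcr & ~SXCR_CHSEL_MASK) | ((channel & 7u) << SXCR_CHSEL_POS);
}

// Extract the currently selected channel.
unsigned channelOf(uint32_t sxcr) {
    return (sxcr & SXCR_CHSEL_MASK) >> SXCR_CHSEL_POS;
}
```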
- Set callback functions that will be called when a DMA transfer completes. If the callbacks are set, dmaSend and dmaTransfer are non-blocking.
- Allow setting all the DMA-related settings with one function (enable the DMA controller, set the transfer address, destination, data size, etc.), then use a second function to reload the DMA transfer size, which must be reloaded before enabling the channel again, since the value is not kept at the end of a transmission. So if the buffer address and data size are reused, only the second function needs to be called repeatedly.
I have tested the callbacks with the sdfat library and with an ILI SPI display. Now the weird thing:
When using sdfat at SPI DIV/2 speed with callbacks, sometimes the RX DMA never completes and leaves either 1 or 2 bytes pending.
So let's say I want to receive 512 bytes. For that, the RX DMA is set to 512 bytes, DR is read if RXNE is set, RX DMA is enabled, and then the TX DMA is set up and enabled for 512 bytes.
After each byte goes out, one comes in, and the DMA controller reads it from DR, stores it in the RX buffer, and decrements the count of pending RX DMA requests.
All works fine if I do not use callbacks and just block until RX has completed. It also works fine if I set the port to 18 MHz (DIV/4) while using callbacks.
But if I use callbacks and set the port to DIV/2 speed, then sometimes RX never completes. The whole TX buffer is sent, and 1 or 2 bytes are still pending in RX. Since TX has completed, no more clock is produced, and RX will never get the last bytes in.
I have run it through the debugger. Sometimes it completes several transmissions correctly before one fails, but it is a different number of transmissions each time: sometimes it goes longer, sometimes shorter.
I have tried setting the RX DMA priority to very high and the TX to medium, in case the DMA controller was servicing a TX while an RX was pending, which would overwrite the DR register and lose the RX byte, but that did not help.
Other than setting a callback for the transfer-complete event, the DMA setup is exactly the same whether blocking or not, so I can't figure out what is happening. Perhaps even when blocking, the transfers are not always completing, but since there is a timeout check, sometimes the transfer is just terminated on timeout and not because RX actually completed.
I need to test that theory by removing the timeout. But has anyone experienced any issue when receiving data with the sdfat library at max SPI port speed using DMA, or noticed any corruption in the data read?
Do you wait in the TX-end callback function until TXE is set and BSY is cleared?
You could try leaving only the RX part in DMA mode and the TX part in “normal” mode, to see whether RX bytes are still lost.
When overlapping figures 246 and 247, the one-bit (2 APB1 clocks) gap between RXNE and TXE should give priority to the RX channel if it is set to the highest priority (11); please double-check that you have set the priorities correctly. The TX DMA can be set to the lowest priority (00).
Furthermore, instead of a timeout, you could check the DMA_CNDTRx register value to determine whether there are still bytes to be received when TX has finished.
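The CNDTR-based check suggested above can replace the timeout as a diagnosis. A host-side sketch: `checkDuplex` and `DuplexResult` are hypothetical, the two counts stand for the TX and RX channels' CNDTR values, and `spiBusy` for the SPI BSY flag. TX reaching zero alone is not enough, because the last frames are still being shifted in, so starvation is only declared once BSY has cleared.

```cpp
#include <cassert>
#include <cstdint>

enum DuplexResult { DUPLEX_DONE, DUPLEX_RUNNING, DUPLEX_RX_STARVED };

// Judge a full-duplex DMA transfer from the remaining counts of both channels.
DuplexResult checkDuplex(uint16_t txRemaining, uint16_t rxRemaining, bool spiBusy) {
    if (rxRemaining == 0) return DUPLEX_DONE;
    // Frames are still queued or shifting: RX can still catch up.
    if (txRemaining > 0 || spiBusy) return DUPLEX_RUNNING;
    // Nothing left to send and the bus is idle: no more clocks will be
    // generated, so the missing RX bytes can never arrive.
    return DUPLEX_RX_STARVED;
}
```

On hardware, allow the RX channel a few cycles to service the final RXNE after BSY clears before treating a non-zero RX count as lost bytes.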
Does it fail when using only SPI_1, with both TX and RX DMA at a 36 MHz clock (SPI2 idle)?
If not, then it is clearly a race condition/limitation of the hardware (bus matrix, AHB system bus and the two bridges to the APB1 and APB2 peripheral buses, as shown in figure 2), which cannot handle so many data transfer requests (DMA and CPU<->RAM) within the short time of one byte transfer at 36 MHz, as SPI_1 is served on APB2 and SPI_2 on APB1.
Alternatively you could check what happens when setting the flash wait states down to 1 (CPU at 72MHz) or up to 3.
In addition, you could monitor the MODF and OVR error flags of SPI (enable these IRQs?) and/or TEIF of DMA.
EDIT
Can you please give more details about exactly how you test? Do you read 512-byte blocks on SPI 1 from the SD card and read blocks (how many bytes?) from the ILI display repeatedly? Do you use the older SdFat lib or the newer one, SdFat beta? I could maybe test in parallel if you share the testing code.
Still, you haven't tried running TX and RX DMA on SPI 1 at 36 MHz with SPI 2 disabled/inactive. This would give us more information.
Because, as far as I know, SdFat currently uses SPI transfers in a blocking way.
I connected MISO to MOSI on SPI1, and repeatedly sent and received with DMA to and from two buffers, then compared the contents.
All was going well when I was only using SPI1, with or without callback.
Then I started using SPI2 as well (SPI2 without DMA), and the problems started. Some bits are not received correctly: a 0 changes to a 1 in the last bit of some transferred bytes.
The sketch runs transfers in a loop and compares the results at the end. If there is an error, it stops and waits for user input, then can repeat. If there is no error, it repeats the loop without stopping. When errors happen, they happen every few passes, but not on every pass. The errors also happen on different bytes, not always the same ones.
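For a loopback test like this one, recording which bit positions flip (not just whether the buffers differ) makes the "last bit changes from 0 to 1" pattern visible at a glance. `fillPattern`, `compareLoopback` and `LoopbackReport` are hypothetical helpers, not the test sketch from this post; the pattern changes on every pass so stale data from a previous pass cannot pass as a correct receive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Result of comparing a loopback receive buffer with what was sent.
struct LoopbackReport {
    size_t mismatches;    // number of differing bytes
    size_t firstBad;      // index of the first differing byte (== len if none)
    uint8_t flippedBits;  // OR of all XOR differences: which bit positions fail
};

// Pattern that differs from pass to pass.
void fillPattern(uint8_t* buf, size_t len, uint8_t pass) {
    for (size_t i = 0; i < len; ++i) buf[i] = static_cast<uint8_t>(i * 7 + pass);
}

LoopbackReport compareLoopback(const uint8_t* tx, const uint8_t* rx, size_t len) {
    LoopbackReport r = {0, len, 0};
    for (size_t i = 0; i < len; ++i) {
        uint8_t diff = tx[i] ^ rx[i];
        if (diff) {
            if (r.mismatches == 0) r.firstBad = i;
            ++r.mismatches;
            r.flippedBits |= diff;
        }
    }
    return r;
}
```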
EDIT: I have repeated the same test using SPI2 with DMA, and also with SPI1 at DIV4 speed. These are the results:
DIV2 speed on SPI1 (36 MHz):
- SPI1 alone, without callback: OK
- SPI1 alone, with callback: OK
- SPI1 with callback, SPI2 in parallel without DMA: errors
- SPI1 with callback, SPI2 in parallel with DMA: errors
DIV4 speed on SPI1 (18 MHz):
- SPI1 with callback, SPI2 in parallel without DMA: OK
- SPI1 with callback, SPI2 in parallel with DMA: OK
So far it seems that errors only happen when SPI1 and SPI2 are working at the same time and SPI1 is operating at 36 MHz (over spec).
I'll try to post all my code to GitHub. I'll stop these tests here: the issues started when I began writing the callback code, and I didn't know whether the problem was there. I'm now convinced the callback code doesn't have any problem; the only problem was using both SPI ports completely in parallel, which was only possible by using callbacks to signal the end of transfer.
Conclusion for me: SPI1 is not reliable when operating at 36 MHz, especially when another SPI port is in use at the same time. It is probably OK for only sending data, when used alone, or for non-critical reception.
I tested the functions extensively, and the only problem I ever found was when using SPI1 at 36 Mbit/s with DMA, as posted above. I verified the problem happens whether or not a callback is used.
Now, the version I am posting is not the one I tested. I had to redo it to add the latest changes from Roger and need to retest it, but I am posting it so more people can test it.
This version does not use the DMA tubes. I have another version that does, which I wrote for the F4 support, but Steve is the one working most on the F4 and his core doesn't use the DMA tubes, so I see no point in using them.
I'll try to test it as soon as I can and post back; if anyone finds a problem, let me know.
- SPI.zip
First try to build:
C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.cpp:433:26: error: 'class SPISettings' has no member named 'receiveCallback'
if (_currentSetting->receiveCallback){
^
C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.cpp: At global scope:
C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.cpp:463:7: error: prototype for 'uint8 SPIClass::dmaTransfer(uint8*, uint8*, uint16)' does not match any in class 'SPIClass'
uint8 SPIClass::dmaTransfer(uint8 *transmitBuf, uint8 *receiveBuf, uint16 length) {
^
In file included from C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.cpp:32:0:
C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.h:306:8: error: candidate is: uint8 SPIClass::dmaTransfer(void*, void*, uint16)
uint8 dmaTransfer(void * transmitBuf, void * receiveBuf, uint16 length);
^
C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.cpp: In member function 'uint8 SPIClass::dmaSendRepeat(uint16)':
C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.cpp:499:26: error: 'class SPISettings' has no member named 'transmitCallback'
if (_currentSetting->transmitCallback)
^
C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.cpp: At global scope:
C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.cpp:563:2: error: expected unqualified-id before '/' token
*/
^
C:\Users\S\Documents\Arduino\hardware\Arduino_STM32\STM32F1\libraries\SPI\src\SPI.cpp:563:2: error: expected constructor, destructor, or type conversion before '/' token
[stevestrong – Fri Jun 23, 2017 8:50 pm] –
Made some necessary changes to be able to build it.
I did not test the new functions, only the old (original) DMA ones, they seem to work.
Thanks Steve, I just finished compiling it and found similar errors and corrected them too.
I'm uploading the new version. It has pretty much the same changes you had to make, except that in TransferSet and SendSet I changed the parameters to void * rather than the uint8 * I was using before (my original code was based on an older version of the core).
I also did not use a typedef for the function pointers, but I see you did. I imagine you added it to make the code clearer to read, but do you think it is necessary, and doesn't it actually make the code harder to read by forcing you to look up what that type is? Or did you use it for some other reason?
Here is the new version that compiles correctly. Like I said, it is pretty close to Steve's except for those differences. I still need to run my previous test sketch with it, which uses the callbacks. I don't have any code to test the new async function; I hope I didn't break it.
- SPI.zip
For my test I used my old WAV player test code, which uses FreeRTOS 9.0.0. DMA serves 3 peripherals: SPI1, SPI2, and a timer producing the PWM output.
I modified sdFat so it sets a callback function, and after dmaSend or dmaTransfer it blocks the task until released by the ISR, so the RTOS switches to the next task and keeps the CPU busy until the SPI DMA transfer is over. The ISR causes a new context switch on exit, and the RTOS returns to the task that was reading from the SD card.
I have tested it with the display too, but it just introduces jitter: the display task only writes a few bytes at a time, and all the context switching wastes CPU time that could be used for anything else. So with the display it works but doesn't provide any performance gain.
These are my changes to the SdSpiSTM32F1.cpp file in the sdfat library in case anyone is interested in testing:
First include the RTOS (9.0 in my case, should work with 8.2.1 too):
#include <MapleFreeRTOS900.h>
[danieleff – Tue Jun 27, 2017 4:42 am] –
Instead of all of this, wouldn’t it be enough to add `yield()` to the while loop that waits for the DMA to finish? That will do a context switch for you while waiting.
Daniel, I'm not sure I understand your suggestion, so correct me if I'm wrong, but are you suggesting using FreeRTOS taskYIELD()?
That wouldn't work, for two reasons:
1. It only switches to another ready task of equal or higher priority. If the task doing the sdfat access is the highest-priority task ready to run, it will not yield and will continue running, so lower-priority tasks never get to run while the DMA is ongoing.
2. If we just yielded, we would return to this task at a moment not synchronized with the DMA transfer; we still need to know whether the DMA has completed before continuing with the sdfat code. Either you use some sort of semaphore and check it, or you poll the DMA controller to see if it's done. The first case is exactly what's implemented, only with task notifications, which according to the FreeRTOS docs cost fewer cycles and less RAM than a semaphore. If we did the second, polling the DMA controller, we would be wasting cycles just polling; we already have that in the library in blocking mode.
By using the task notification we achieve two things. First, the task yields to the next one to run, even if the ones available are lower priority. Second, as soon as the callback function is called because the DMA transfer is over, it requests a yield from the RTOS. If the task executing has the same or lower priority, the RTOS switches to the sdfat task that was blocked. If the task currently running has higher priority, the sdfat task is marked ready and will run the next time it can, according to priorities etc.
If by yield() you meant something else, please let me know. I know there is a yield() function in the sdfat library, but I understood that it puts the CPU in sleep mode until the next interrupt, in which case it will not execute another task.
Finally, this is just a usage example because I had this FreeRTOS sketch; there is no need to use any RTOS. Since the callback is declared by the user code, it could be used to set a variable, change a pin level, etc.
Victor, the SPI zip seems to work OK for me, but I’m not using any callbacks.
I used the code with my OV7670 camera testbed and it was fine.
But it doesn’t include the dmaSendAsync.
I can manually merge that new function into your code and then check my local copy back into GitHub, but that would not be as clean as you pulling the latest repo into your GitHub repo and then submitting a PR based on the changes.
So if you want to do that, I’ll merge the PR ASAP, e.g. tomorrow.
Thanks
Roger
PS. As you can see from my PM, I’ve merged your other PRs.
[RogerClark – Mon Jul 03, 2017 5:53 am] –
But it doesn’t include the dmaSendAsync
My version was redone taking your latest as the base, so it has sendAsync too.
I’ll send a PR with those 2 files.
https://github.com/rogerclarkmelbourne/ … I.cpp#L399
I would change it like this:
uint32_t flags = (DMA_MINC_MODE | DMA_FROM_MEM);
if (!transmitBuf) {
    transmitBuf = &ff;
    flags &= ~DMA_MINC_MODE;
}
dma_setup_transfer(_currentSetting->spiDmaDev, _currentSetting->spiTxDmaChannel, &_currentSetting->spi_d->regs->DR, dma_bit_size,
                   transmitBuf, dma_bit_size, flags);
[stevestrong – Mon Jul 31, 2017 11:37 am] –
https://github.com/rogerclarkmelbourne/ … I.cpp#L399
[victor_pv – Mon Jul 31, 2017 12:31 pm] –
Could you point to the lines you are referring to?
These two lines move to line 420.
It makes sense to clear them before any further request is launched.
And it would be nice to have them readable after a transfer to check on an upper software level whether any error flags have been set or not.
[stevestrong – Mon Jul 31, 2017 12:36 pm] –
These two lines move to line 420.
It makes sense to clear them before any further request is launched.
The problem would be a situation like this:
You run a dmaTransfer (the ISR bits are not cleared).
You then do something else with that DMA channel and enable IRQ requests. Since the interrupt flag is still set, I think an interrupt will trigger right away.
I believe that’s how they work, from what I remember of the reference manual:
Note: Before setting an Enable control bit to ‘1’, the corresponding event flag should be cleared,
otherwise an interrupt is immediately generated.
I’m not sure why I moved them there and can’t remember a specific reason, so we can test and confirm everything is fine, in case I ran into a problem that made me move them.
If you are making the change as part of the code clean-up, let me know once you have the PR and I will run a test with my sketch that uses them with callbacks.
EDIT:
Just a note: when using DMA interrupts, the core DMA IRQ handler clears the flags right after calling the user DMA handler:
https://github.com/rogerclarkmelbourne/ … vate.h#L45
So when using callbacks the flags will be cleared at the end of the transfer, but if the user handler wants to check them it can, since they are not cleared until it returns.
While you are looking at the SPI DMA stuff: I see that the OV7670 camera sketch, which uses the ILI9341 as high-speed RAM to store an image,
would benefit from a function that does a DMA read but does not care what is transmitted.
At the moment the best way to do this would be to use dmaTransfer and perhaps point the TX buffer at some arbitrary piece of flash (in this case I don’t know whether it matters what is sent, though I’d need to double check).
Ideally, though, there would be something like dmaSend that only receives, and perhaps sends a specific value passed to the function.
I’m not sure how complex it would be to write something like this, but I presume it would be a modified copy of dmaSend.
1.- We add a dmaRead function that just sends 0xFF repeatedly and reads into a buffer.
2.- We modify dmaTransfer to detect a NULL send buffer and, if so, send 0xFF repeatedly while reading into the receive buffer.
3.- We add a MINC variable like we did for dmaSend. If MINC is 0, the TX channel’s buffer address is not incremented, so the first byte is sent repeatedly.
The advantage of option 3 is that it allows sending an arbitrary value. Option 2 just adds a check to dmaTransfer, so it should keep the code a bit smaller when both the normal transfer and the transfer with a NULL buffer are used in the same sketch, but the check has a small performance cost. That should not matter much for a dmaTransfer, though.
Option 1 has the advantage of closely resembling the current SPI.read(&buf, n) function, taking the same parameters and working the same way except for using DMA, but it is not as flexible as option 3.
We could also write a completely new function that works differently from the current DMA ones and read(), taking both a value to send repeatedly and a receive buffer to read the data into. Something like:
uint8 dmaRead(uint8 tx_val, uint8 *rx_buf, uint16 length);
I think we would need a way to repeatedly send arbitrary data, as I’m not sure whether devices like the ILI9341 need 0x00 or 0xFF when data is being read.
I suspect the ILI9341 doesn’t care what is being sent when data is being read, but I’d need to confirm that, and it’s safer to be able to specify the byte that is sent when reading.
So the option that uses MINC looks like it may be the best solution.
current dmaTransfer:
uint8 dmaTransfer(void * transmitBuf, void * receiveBuf, uint16 length);
[RogerClark – Mon Jul 31, 2017 10:30 pm] –
Would benefit from a function that does a dma Read, but does not care what is transmitted.
Roger, see my post above. dmaTransfer can already do that if transmitBuf is passed as 0: in that case it sends 0xFF while receiving into the RX buffer.
We will add the MINC feature so the value sent can be selected by the user with a variable.
I will need to test that on an ILI9341 display when I have time.

