SPI on F4

ag123
Fri May 05, 2017 3:12 pm
out of sheer curiosity i took a peek at the SPI section in the RM0090 reference manual for F4
http://www.st.com/content/ccc/resource/ … 031020.pdf
vs RM0008 reference manual for F1*
http://www.st.com/content/ccc/resource/ … 171190.pdf

there are a lot of parts that look literally ‘identical’; how i wish pdf viewers could do a ‘diff’ and just tell me what the differences are :D

i noted the following as one of the differences:
STM32F1 RM0008 p710 25.3.3 Configuring the SPI in master mode
25.3.3 Configuring the SPI in master mode
Procedure
1. Select the BR[2:0] bits to define the serial clock baud rate (see SPI_CR1 register).
2. Select the CPOL and CPHA bits to define one of the four relationships between the data transfer and the serial clock (see Figure 239).
3. Set the DFF bit to define 8- or 16-bit data frame format
4. Configure the LSBFIRST bit in the SPI_CR1 register to define the frame format.
5. If the NSS pin is required in input mode, in hardware mode, connect the NSS pin to a high-level signal during the complete byte transmit sequence. In NSS software mode, set the SSM and SSI bits in the SPI_CR1 register. If the NSS pin is required in output mode, the SSOE bit only should be set.
6. The MSTR and SPE bits must be set (they remain set only if the NSS pin is connected to a high-level signal). In this configuration the MOSI pin is a data output and the MISO pin is a data input.

STM32F4 RM0090 p887 28.3.3. Configuring the SPI in master mode

28.3.3. Configuring the SPI in master mode
In the master configuration, the serial clock is generated on the SCK pin.
Procedure
1. Select the BR[2:0] bits to define the serial clock baud rate (see SPI_CR1 register).
2. Select the CPOL and CPHA bits to define one of the four relationships between the data transfer and the serial clock (see Figure 248). This step is not required when the TI mode is selected.
3. Set the DFF bit to define 8- or 16-bit data frame format
4. Configure the LSBFIRST bit in the SPI_CR1 register to define the frame format. This step is not required when the TI mode is selected.
5. If the NSS pin is required in input mode, in hardware mode, connect the NSS pin to a high-level signal during the complete byte transmit sequence. In NSS software mode, set the SSM and SSI bits in the SPI_CR1 register. If the NSS pin is required in output mode, the SSOE bit only should be set. This step is not required when the TI mode is selected.
6. Set the FRF bit in SPI_CR2 to select the TI protocol for serial communications.
7. The MSTR and SPE bits must be set (they remain set only if the NSS pin is connected to a high-level signal). In this configuration the MOSI pin is a data output and the MISO pin is a data input.
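
The master-mode steps quoted above can be sketched in C roughly as follows. This is only a host-side illustration: it builds the SPI_CR1 / SPI_CR2 values in plain variables instead of touching hardware. The bit positions follow the SPI_CR1 and SPI_CR2 register descriptions in the reference manuals; the macro names and the helpers `spi_master_cr1()` and `spi_master_cr2()` are made up for this sketch and are not taken from libmaple or any ST header.

```c
#include <stdint.h>

/* SPI_CR1 bit positions (same layout on F1 and F4) */
#define CR1_CPHA      (1u << 0)
#define CR1_CPOL      (1u << 1)
#define CR1_MSTR      (1u << 2)
#define CR1_BR_SHIFT  3           /* BR[2:0]: SCK = PCLK / 2^(BR+1) */
#define CR1_SPE       (1u << 6)
#define CR1_LSBFIRST  (1u << 7)
#define CR1_SSI       (1u << 8)
#define CR1_SSM       (1u << 9)
#define CR1_DFF       (1u << 11)

/* SPI_CR2 bit: frame format, present on F4 only (0 = Motorola, 1 = TI) */
#define CR2_FRF       (1u << 4)

/* Hypothetical helper: compose a CR1 value for master mode with software NSS. */
uint16_t spi_master_cr1(unsigned br, int cpol, int cpha, int bits16, int lsb_first)
{
    uint16_t cr1 = 0;
    cr1 |= (uint16_t)((br & 7u) << CR1_BR_SHIFT); /* baud rate prescaler     */
    if (cpol)      cr1 |= CR1_CPOL;               /* clock polarity          */
    if (cpha)      cr1 |= CR1_CPHA;               /* clock phase             */
    if (bits16)    cr1 |= CR1_DFF;                /* 8- or 16-bit frames     */
    if (lsb_first) cr1 |= CR1_LSBFIRST;           /* bit order               */
    cr1 |= CR1_SSM | CR1_SSI;                     /* software NSS, held high */
    cr1 |= CR1_MSTR | CR1_SPE;                    /* master mode, SPI enable */
    return cr1;
}

/* Hypothetical helper: CR2 value selecting Motorola (0) or TI (1) frame format. */
uint16_t spi_master_cr2(int ti_mode)
{
    return ti_mode ? (uint16_t)CR2_FRF : 0;
}
```

For SPI mode 0, prescaler /4, 8-bit frames, MSB first, `spi_master_cr1(1, 0, 0, 0, 0)` gives 0x034C, and `spi_master_cr2(0)` leaves FRF at its reset value of 0 (Motorola mode). On real hardware one would normally write the configuration bits first and set SPE last.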

while this is unlikely to be the only difference, what seems rather distinct is the addition of the ‘TI mode’
a timing diagram is given on p887 of RM0090 (SPI TI protocol in master mode)
apparently the ‘TI mode’ has things like a ‘triggering edge’ and a ‘sampling edge’, and each transfer seems to start with a single high pulse on NSS
the other ‘strange’ thing about this timing diagram is that it seems to label MOSI as ‘input’ and MISO as ‘output’; this doesn’t match how we’d usually do SPI, since the stm32 is the ‘master’ (it looks more like an error in the diagram)
my thought is that in step (6) above, we’d need to *switch off* TI mode to get the ‘regular’ SPI we’d expect,
in particular if we use the NSS pin as a custom /CS pin for the SPI devices (i always do that so as not to ‘waste’ pins) :lol:
but then the default for SPI_CR2 FRF at reset seems to be 0, motorola mode (rather than 1, TI mode)

just 2 cents


ag123
Fri May 05, 2017 3:37 pm
nope, this doesn’t seem like a meaningful difference since TI mode is off by default

the F4 also uses the same boundary addresses and register addresses as the F1 for SPI1, SPI2 and SPI3
0x4001 3000 – 0x4001 33FF SPI1
0x4000 3800 – 0x4000 3BFF SPI2/I2S
0x4000 3C00 – 0x4000 3FFF SPI3 / I2S3


stevestrong
Fri May 05, 2017 3:38 pm
ag123 wrote:the default for SPI_CR2 FRF at reset seem to be 0 motorola mode (rather than 1 TI mode)

ag123
Fri May 05, 2017 3:44 pm
i’m starting to suspect it may be something related to the AHB clocks / prescalers or APB clocks / prescalers
after all, the F4 runs at 168 mhz while an F1 runs at 72 mhz, so there are lots of things to hunt down to figure out the differences, esp if things don’t work

the TI mode, it seems, is that additional use of the NSS pin, which according to the RM is off by default, with motorola mode used instead

another thing we may need to check is whether a particular peripheral bus is clocked.

found a web page that seems useful on the topic of clocks
https://stm32f4-discovery.net/2015/01/p … x-devices/
the settings seem to match those used on steve’s F4 black branch; the PLL multipliers look similar
https://github.com/stevstrong/Arduino_S … cF4.c#L431
void SetupClock168MHz() {
...
// save bus clock values
rcc_dev_clk_speed_table[RCC_AHB1] = (SystemCoreClock/1);
rcc_dev_clk_speed_table[RCC_APB2] = (SystemCoreClock/2);
rcc_dev_clk_speed_table[RCC_APB1] = (SystemCoreClock/4);
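
To make the link between these bus dividers and the SPI clock explicit, here is a small host-side sketch. It assumes only what is quoted in this thread: APB2 = core/2 and APB1 = core/4 on the F4 at 168 MHz, APB2 = 72 MHz on the F1, and the BR[2:0] prescaler from SPI_CR1 (SCK = PCLK / 2^(BR+1)). The function names `apb_clock_hz()` and `spi_sck_hz()` are hypothetical and not part of libmaple.

```c
#include <stdint.h>

/* Peripheral bus clock for a given core clock and APB divider (1, 2, 4, ...). */
uint32_t apb_clock_hz(uint32_t core_hz, uint32_t apb_div)
{
    return core_hz / apb_div;
}

/* SCK frequency for a BR[2:0] setting: PCLK / 2^(br + 1), so br = 0 means PCLK/2. */
uint32_t spi_sck_hz(uint32_t pclk_hz, unsigned br)
{
    return pclk_hz >> ((br & 7u) + 1u);
}
```

With these, `spi_sck_hz(apb_clock_hz(168000000, 2), 0)` is 42 MHz (fastest setting on the F4 APB2 bus), `spi_sck_hz(apb_clock_hz(168000000, 4), 0)` is 21 MHz (F4 APB1), and `spi_sck_hz(apb_clock_hz(72000000, 1), 0)` is 36 MHz (F1 APB2).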


stevestrong
Fri May 05, 2017 6:11 pm
Good news, it started to work with libmaple code, the scope shows the SPI_CLK pulses (SPI3), and it does not freeze anymore.
Now need to determine which of my changes made it.
More to come.

michael_l
Fri May 05, 2017 8:09 pm
stevestrong wrote:Good news, it started to work with libmaple code, the scope shows the SPI_CLK pulses (SPI3), and it does not freeze anymore.
Now need to determine which of my changes made it.
More to come.

stevestrong
Sun May 07, 2017 3:22 pm
The actual bench values using SdFat beta, SPI3 21MHz, no DMA, card: 16GB SanDisk ultra class 10:
Type any character to start
FreeStack: 117044
Type is FAT32
Card size: 15.93 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SL16G
Version: 8.0
Serial number: 0X72F2CA43
Manufacturing date: 8/2015

File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
375.32,122150,759,1363
387.66,18954,1021,1319

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
945.83,1574,529,540
946.01,1071,529,540

Done


ag123
Sun May 07, 2017 4:41 pm
+5 good achievement :D

writing speed seems to have reached the sd card’s native limit; for read speed, my thought is that if a bigger buffer is used, e.g. 1k, it may narrow the difference so much that DMA would not make much difference


stevestrong
Sun May 07, 2017 5:39 pm
It seems that a buffer size of 4096 or 8192 is the best option if RAM is tight (see table of the next post).

stevestrong
Sun May 07, 2017 10:20 pm
I have completed the table with bench results using SPI = 42MHz, all without DMA.

stm32_sd_performance.jpg

ag123
Mon May 08, 2017 8:46 am
the improvement in write speeds is really dramatic going up to 4k, 8k :D

stevestrong
Mon May 08, 2017 8:50 am
Yes, if you reserve 32k buffer for it…
But even with 8k buffer it gives a nice speed.

Pito
Mon May 08, 2017 8:56 am
SdBench with F4 should give you ~4.5MB/sec rd/wr @42MHz SPI clock with SdFatEX, a 512-byte buffer and DMA on..
SdBench with F1 should give you ~3.5MB/sec rd/wr @36MHz SPI clock with SdFatEX, a 512-byte buffer and DMA on..

stevestrong
Mon May 08, 2017 9:02 am
Pito, the Ex didn’t bring any improvement.

I don’t see any way to increase the rd/wr speed 10 times…
Can you hint to any solution?


Pito
Mon May 08, 2017 9:05 am
You must switch the SdFatEX on in SdFatConfig.h.
Again, the above results are real..

stevestrong
Mon May 08, 2017 9:07 am
Can you please post your test sketch + used core + SdFat lib?

Oh, and send me a similar card you are using :)
Btw, which color is your Sammy card? red or yellow/orange?


Pito
Mon May 08, 2017 9:23 am
SdBench from latest SdFat is the sketch.
The cards I have used are:
1. Samsung EVO 8GB, CL10 UHS-I (white/orange) – works 21, 36, 42MHz
2. Sandisk Ultra 16GB, CL10 (red/grey) – works 21, 36MHz
If you do not get such results (plus minus 0.2MB/sec) then something is wrong with your setup/settings/SPI_DMA driver..
The same results I get with Daniel’s core, and with libmaple.
With SdFatEX use 512bytes large buffer.

stevestrong
Mon May 08, 2017 9:24 am
That’s why I’m asking you to post your setup, to be able to reproduce your results.
You can also send me BIN files for blue pill and/or maple mini and/or black F4, and indicate the SPI and CS pins used for each board.

Pito
Mon May 08, 2017 9:45 am
Try – MapleMini, cs=PB12, SPI1, 36MHz, USB serial, bootloader20, SdFatEX, DMA, 512bytes large buffer

You should get something like:
File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
3533.34,25083,136,143
3594.74,15522,136,141

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
3257.12,1548,155,156
3258.89,1359,155,156

Done
Type any character to start


stevestrong
Mon May 08, 2017 11:33 am
What is the reason not to use F407 black SPI1 with 42MHz?

Pito
Mon May 08, 2017 11:45 am
Try with 21, and when ok you will get 42 :)
FYI – the min Latency is the time you need to transfer 512bytes to the Sdcard. From above results:
136us/512/8 = 33.2ns -> 30MHz
207us/512/8 = 50.5ns -> 19.8MHz
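
The same arithmetic as a tiny C helper, in case someone wants to plug in their own bench numbers. It assumes, as stated above, that the min latency is the time to move one 512-byte block; the function name `effective_spi_mhz()` is made up for this sketch.

```c
/* Effective SPI bit clock (in MHz) implied by the time, in microseconds,
 * needed to transfer one 512-byte block. */
double effective_spi_mhz(double block_time_us)
{
    const double bits_per_block = 512.0 * 8.0;  /* 4096 bits */
    return bits_per_block / block_time_us;      /* bits per microsecond == MHz */
}
```

`effective_spi_mhz(136.0)` comes out at roughly 30 MHz and `effective_spi_mhz(207.0)` at roughly 19.8 MHz, so the protocol overhead per block eats a few MHz of the nominal 36 MHz and 21 MHz clocks.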

stevestrong
Mon May 08, 2017 4:07 pm
Pito, I flashed the F4 bench bin to my black F407VE board with the STLink utility at 0x08000000, but get no reaction on the serial 1 interface (PA9/10).

I also observed that the F1 bench BIN is 38kB large, while the F4 BIN is 28kB large (smaller than the F1 bin). Is this really how it should be?

EDIT
Flashing the MM over DFU:
>maple_upload.bat COM3 2 1EAF:0003 sdbench_f1.bin
maple_loader v0.1
Resetting to bootloader via DTR pulse
Reset via USB Serial Failed! Did you select the right serial port?
Assuming the board is in perpetual bootloader mode and continuing to attempt dfu programming...

Searching for DFU device [1EAF:0003]...
Found it!

Opening USB Device 0x1eaf:0x0003...
Found Runtime: [0x1eaf:0x0003] devnum=1, cfg=0, intf=0, alt=2, name="STM32duino bootloader v1.0 Upload to Flash 0x8002000"
Setting Configuration 1...
Claiming USB DFU Interface...
Setting Alternate Setting ...
Determining device status: state = dfuIDLE, status = 0
dfuIDLE, continuing
Transfer Size = 0x0400
bytes_per_hash=761
Starting download: [##################################################] finished!
state(8) = dfuMANIFEST-WAIT-RESET, status(0) = No error condition is present
error resetting after download: usb_reset: could not reset device, win error: The system cannot find the file specified.
Done!

Resetting USB to switch back to runtime mode


victor_pv
Mon May 08, 2017 4:55 pm
stevestrong wrote:
I also observed that the F1 bench BIN is 38kB large, while the F4 BIN is 28kB large (smaller than the F1 bin). Is this really how it should be?

Pito
Mon May 08, 2017 4:56 pm
The F4 bin I see here is 28kB. The Serial1 is set, and I can see the output as above on PA9/PA10, 115k2 8N1..
I recompiled it again with
File: SDBench.bin
CRC-32: eb38f0d4
MD4: 4d43be8aa4f0f8ad05d059f6e981fad2
MD5: 5aec1c4acfb959f8c81e480981836786
SHA-1: e7b93691e776f2df4012b1713380df174948b589

SDBench.rar

stevestrong
Mon May 08, 2017 5:00 pm
Hm, same failure with the new bin, too:
nothing on serial 1…

Sh…t, it was a loose wire, sorry.

Which one is the CS pin? PB6?
FreeStack: 128032
Can't access SD card. Do not reformat.
No card, wrong chip select pin, or SPI problem?
SD errorCode: 0X20,0XFF


stevestrong
Mon May 08, 2017 5:09 pm
On PB6 i get a pulse train (clock signal) with ~330kHz…

Pito
Mon May 08, 2017 5:16 pm
PB6 is the card’s CS. The SPI is PB3,4,5.

stevestrong
Mon May 08, 2017 5:19 pm
PB3,4,5 is the alternative SPI1 pin group.
On PB6, as said, i get the clock signal..

Where is PB4 on the black board?


Pito
Mon May 08, 2017 5:22 pm
You are asking me? I do not have your Black VE board, my board is ZET.

Pito
Mon May 08, 2017 5:50 pm
This is for F1 (here 103ZET), SPI on PA5,6,7 and CS=PC4, 36MHz, serial 115k2 on PA9,10, maplebootloader20. The previous version was built with libmaple, this one is built with Daniel’s core. It does not wait for a character, it runs in a loop.

SDBench_F1.rar

stevestrong
Mon May 08, 2017 6:22 pm
OK, back to F4, I can confirm the results, successful with the first F4 bin from Pito:
File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2397.93,19155,207,212
2399.08,16532,207,212
2412.97,15940,207,211

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2446.03,1590,208,208
2447.22,972,208,208
2447.22,972,208,208

Done


stevestrong
Mon May 08, 2017 10:15 pm
It seems that I have been using the SdFatEX functionality wrongly until now – sorry Pito for the misleading information from my side :oops:
I realized that it is not enough to set the following in SdFatConfig.h:
#define ENABLE_EXTENDED_TRANSFER_CLASS 1

stevestrong
Wed May 10, 2017 5:15 pm
Hot news :!:

Bench for Black F4 @ 168MHz, SPI1 @ 21MHZ, with DMA, SdFatEX, Sandisk Ultra 16GB, CL10 (red/grey):
File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2438.87,13975,204,208
2446.03,14013,204,208

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2499.84,1448,203,204
2501.09,909,203,204


Pito
Fri May 12, 2017 5:44 pm
FYI – Blue F103ZET, Sandisk Ultra 16GB red/gray, SPI @44MHz (88MHz fcpu clock), the latest SPI from victor, SdFatEX, short wires <10cm.
File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4393.39,7474,112,114
4328.73,10052,112,116
4378.00,7742,112,115

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
3887.78,975,129,130
3887.78,836,129,130
3890.80,835,129,130


stevestrong
Fri May 12, 2017 6:10 pm
Which “latest SPI from Victor” do you mean? The generic version?

Your F1 results look very close to proportional to mine, which were done at 36MHz:
44MHz/36MHz = 1.222
WR speed: (my result) 3.7 * 1.22 = 4.5, your result is ~4.4
RD speed: (my result) 3.38 * 1.22 = 4.1, your result is ~3.9

Similar proportionality is achieved for F4 @ 42MHz WR speed: ~4.6.
I find it interesting that the F4 RD result is no longer proportional: at ~4.8 it is ~20% higher than what the F1 achieves with the same card, although the SPI code for F4 is very similar to the F1 code.
I can only think that the difference is made by the CPU clock.
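
For anyone who wants to repeat the proportionality check with other clocks, the scaling used above is just a linear rescale by the SPI clock ratio. A minimal sketch, with a made-up function name `scale_by_clock()`:

```c
/* Predict the throughput at another SPI clock, assuming it scales
 * linearly with the clock (the assumption being tested above). */
double scale_by_clock(double speed_mb_s, double from_mhz, double to_mhz)
{
    return speed_mb_s * to_mhz / from_mhz;
}
```

`scale_by_clock(3.7, 36.0, 44.0)` gives about 4.5 and `scale_by_clock(3.38, 36.0, 44.0)` about 4.1, the same figures as in the WR/RD lines above.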


Pito
Fri May 12, 2017 6:13 pm
Victor’s STM32Generic one. Still thinking why the reads with F1 are so slow..

stevestrong
Fri May 12, 2017 6:20 pm
The RD speed is also higher than the WR speed @ 21MHz for F4 (which, btw, looks logical to me)…
So it seems to be a general difference/issue somewhere.

Pito
Fri May 12, 2017 6:35 pm
Interestingly I cannot get the Samsung Evo working on the F407 at 42MHz with the latest generic SPI from Victor. In the past it worked with his generic core at 42MHz, as I did a lot of testing. Now both the Samsung and the Sandisk throw errors at 42MHz.. :?

stevestrong
Fri May 12, 2017 6:53 pm
You could also try the BIN file I attached in one of the previous posts (SPI1: PA4..7, USB serial).

stevestrong
Sat May 13, 2017 12:19 pm
Pito wrote:Still thinking why the reads with F1 are so slow..

ag123
Sat May 13, 2017 1:23 pm
on the f1, APB2 runs at 72 mhz, hence SPI would deliver 36 mhz if that’s the case, while on the f4 it is 42 mhz or 21 mhz on the spi bus. that would seem a little ‘strange’, in the sense that it implies sd cards would run faster on f1. the ‘secret’ may be the ‘ART’ accelerator, which possibly makes some cycles zero wait state on F4 :lol:
