PR #184 – STM32F1 USART Improvements

edogaldo
Sun Jul 10, 2016 10:49 am
Dear all,
I’d like to draw attention to subject PR (https://github.com/rogerclarkmelbourne/ … 2/pull/184) which provides following USART improvements in the F1 library:

  • Buffered interrupt-based TX: as most of you know F1 library is using a polling TX approach which has very slow performance (even worse than in the UNO platform), buffered interrupt-based TX significantly increase the TX performance
  • Runtime level RX/TX dynamic buffer allocation: generally Arduino based libraries use a static buffer allocation approach, this implies allocating buffers for all devices even if they are not used; runtime level dynamic buffer allocation allows instead allocating buffer for the only devices effectively used and also allows users to easily change the buffers size as preferred
  • Introduction of non-blocking TX: this feature, taken from the F4 USBSerial library, allow users to specify they don’t need blocking TX; difference between blocking TX and non-blocking TX is like difference between TCP and UDP protocols: in the first case the system will engage to guarantee that the full message is sent, in the second case the system will send as much of the message as possible (this depends on the TX buffer size) and will discard the remaining part

Well, since this PR is introducing somewhat significant changes in the repository, its inclosure will probably depend on how much consensum it will get so this thread is to:

  • invite users to actively test the changes and provide feedbacks
  • evaluate the forum members interest on the improvements proposed

Thanks in advance and best regards, E.

PS: just want also to highlight that this PR doesn’t aim to remove any of the existing functionalities (like RX buffering) and that effort has been spent to guarantee backward compatibility


edogaldo
Mon Jul 11, 2016 9:24 pm
Quoting @mrburnette from: viewtopic.php?f=3&t=1189&p=15811#p15808
How thoroughly is your serial code tested? Any test cases that fail as the F1xx is currently constructed that pass flawlessly with your changes? In lieu of test code, how would you propose your changes to be tested? Are there any dependencies introduced in your proposal (limitations, reserved resources, known bombs)?
I’m using this test sketch:
//#define Serial Serial1 // TX A2

#include <libmaple/scb.h>
byte nodeID = 10;
byte packetNum = 11;
float C = 42.611;
float F = 42.611;
float H = 42.6;
float vBatt = 3.14;
int packetLoss = 10;
int packetTotal = 10000;

#define tx_buf_start 1
#define tx_buf_stop 300
uint16 tx_buf_size=tx_buf_start;

void printData() {
Serial.print(nodeID);
Serial.print(",");
Serial.print(packetNum);
Serial.print(",");
Serial.print(C);
Serial.print(",");
Serial.print(F);
Serial.print(",");
Serial.print(H);
Serial.print(",");
Serial.print(vBatt,3);
Serial.print(",");
Serial.print(packetLoss);
Serial.print(",");
Serial.println(packetTotal);
//Serial.println(SCB_BASE->VTOR,HEX);
//Serial.println("abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789");
}

void setup() {
//Serial.begin(57600);
}

void loop() {
//Serial.beginNew(9600,tx_buf_size);
//Serial.beginNew(57600,tx_buf_size);
Serial.beginNew(115200,tx_buf_size);
//Serial.disableBlockingTx();
unsigned long start = micros();
printData();
unsigned long end = micros();
Serial.print("tx_buf_size[");Serial.print(tx_buf_size);Serial.print("]: ");Serial.print(end - start);Serial.println(" us");
Serial.println();
tx_buf_size *= 2;
if(tx_buf_size > tx_buf_stop){
tx_buf_size = tx_buf_start;
//Serial.println("=================");
Serial.println("@@@@@@@@@@@@@@@@@@@");
Serial.println();
}
delay(1000);
}


Rick Kimball
Mon Jul 11, 2016 10:00 pm
If you have an interest in the implementation of the usart code, please look the pull request and comment on github.

It seems like with such significant changes, there needs to be a test sketch that thoroughly exercises all the edge cases for all the SerialX ports and does some kind of automated verification maybe using the serial ports wired to each other.

I personally would prefer that the ring_buffer implementation was left unchanged.

-rick


edogaldo
Mon Jul 11, 2016 10:10 pm
Rick Kimball wrote:(…)
I personally would prefer that the ring_buffer implementation was left unchanged.

edogaldo
Mon Jul 11, 2016 10:26 pm
It seems like with such significant changes, there needs to be a test sketch that thoroughly exercises all the edge cases for all the SerialX ports and does some kind of automated verification maybe using the serial ports wired to each other.
You are right but I don’t think it can be thoroughly tested with just one sketch, this is why I asked for support in testing and providing feedbacks.
The more people tests it, the more issues can be discovered.

Anyway referred to issues, just want to say the tests should first of all focus to regressions, that mean things that in the original implementation worked fine and in the new implementation don’t work anymore.
It’s not aim of this PR to fix possible existing issues (at least not now).


mrburnette
Mon Jul 11, 2016 11:31 pm
Assuming users want to define a small buffer, 1 byte can matter: i.e. on an overall 8 bytes buffer, 1 byte is 12,5%..

Ok, this is simply uncalled for, IMO, and is borderline contempt of our collective intelligence. We can do the math.

Ray

Edited:
It’s not aim of this PR to fix possible existing issues (at least not now).

I come from a rather strict work ethic (supervision of 10 Java coders in a $600M software capital investment in a message queing environment) and I can state that (significant) issues found in unit testing based on prior code were corrected prior to an enhancement. Non-significant issues caught in unit testing were incorporated into the enhancement. Non-significant here is meant to imply no budget or implemention date slippage. Non-significant issues caught in unit testing which did cause budget/date slippage were prioritized as hot-fixes to be applied with a change request window from the BU customer.

You make it sound like you only want your stuff injected into the core and anything else is someone’s elses problem. Having had a few discussions with you, I really do not think this is what you mean.

If we find something in testing, we will democratically decide how and when it should be fixed. So far, there has not been a shortage of talent to work issues… although Roger often over volunteers himself, IMO.


Rick Kimball
Mon Jul 11, 2016 11:42 pm
I would prefer to see this pull request split into 2 pull requests. One that deals with the failing of our current implementation to use interrupts and a buffered tx method. The second to address the ability to dynamically change the the size of the rx and tx buffer. I can imagine I would have no objections to the first one and lots of problems with the second.

-rick


mrburnette
Mon Jul 11, 2016 11:55 pm
Rick Kimball wrote:I would prefer to see this pull request split into 2 pull requests. One that deals with the failing of our current implementation to use interrupts and a buffered tx method. The second to address the ability to dynamically change the the size of the rx and tx buffer. I can imagine I would have no objections to the first one and lots of problems with the second.

-rick


mrburnette
Tue Jul 12, 2016 12:03 am
Rick Kimball wrote:
<…> all the edge cases for all the SerialX ports and does some kind of automated verification maybe using the serial ports wired to each other.
<…>
-rick

edogaldo
Tue Jul 12, 2016 7:40 am
mrburnette wrote:
You make it sound like you only want your stuff injected into the core and anything else is someone’s elses problem. Having had a few discussions with you, I really do not think this is what you mean.

If we find something in testing, we will democratically decide how and when it should be fixed. So far, there has not been a shortage of talent to work issues… although Roger often over volunteers himself, IMO.


mrburnette
Tue Jul 12, 2016 4:33 pm
FYI – Only: Code bloat
Sketch is a modified sketch rewritten to use USB serial to display all the Maple Mini’s Ax values.

Original STM32F1 code:
Sketch uses 14,148 bytes (11%) of program storage space. Maximum is 122,880 bytes.
Global variables use 2,560 bytes of dynamic memory.

Overlaying the contents of Buffered_InterruptBased_USART_TX__OK.ZIP into my existing STM32F1 directory and recompiling.
New STM32F1 serial interrupt changes:
Sketch uses 14,604 bytes (11%) of program storage space. Maximum is 122,880 bytes.
Global variables use 2,808 bytes of dynamic memory.

Observation:
Other than code-bloat for a USB-only sketch, the output of the Maple Mini over the USB-serial worked well. No issues, errors, warnings.

Sketch:
/*
hacked by Ray Burnette to loop through all analog values
Board view: http://letsmakerobots.com/files/maplemini2.jpg
Arduino 1.7.8 on Linux Mint 17.3 fully patched
Sketch uses 13,948 bytes (12%) of program storage space. Maximum is 110,592 bytes.
Global variables use 2,560 bytes of dynamic memory.
*/

int aValue ;
int settlingmS = 5000;

void setup() {
// Configure the ADC pins
for (int x =3; x < 12; x++) {
pinMode(x, INPUT_ANALOG);
}
delay(settlingmS);
}

void loop() {
for ( int analogInPin = 3; analogInPin < 12; analogInPin++)
{
// read the analog in value:
aValue = analogRead(analogInPin);
// print the results to the serial monitor:
Serial.print("Analog pin#");
Serial.print(analogInPin);
Serial.print("\t = ");
Serial.println(aValue);
}
delay(settlingmS);
}


edogaldo
Tue Jul 12, 2016 4:57 pm
Overlaying the contents of Buffered_InterruptBased_USART_TX__OK.ZIP into my existing STM32F1 directory and recompiling
Well, this version uses static TX buffer implementation (same as RX buffer), I guess this explains the memory bloating..

[EDIT] Then there is interrupt handler extension and few more code lines to init the new buffer and use it which would explain the code bloating.


Rick Kimball
Tue Jul 12, 2016 9:41 pm
edogaldo wrote:mrburnette wrote:
Also about dynamic buffering, as already said in another thread, I started working on a minimal solution for buffered interrupt-based TX which I posted here: viewtopic.php?f=3&t=1189&p=15297#p15297
(note this implementation probably still needs some change)

edogaldo
Wed Jul 13, 2016 4:53 am
Rick Kimball wrote:
I like the idea of this one as a first step. I gave it a whirl and a few things seem to need work. I created a sketch to run the uart port at 2400 baud and would have expected lower time for doing a println of 80 characters than I got. Did you do any performance testing on that version? To see how much better it actually is?

-rick


Rick Kimball
Thu Jul 14, 2016 3:03 pm
* correct link with the missing paren ‘)’ …

https://github.com/edogaldo/Arduino_STM … TX-buffer)

-rick


Rick Kimball
Thu Jul 14, 2016 3:12 pm
That branch seems to have minimal changes. I like it. However, this version also has the problem with Hardware::end() and not flushing the buffer.
https://github.com/edogaldo/Arduino_STM … sart.c#L69

I’ve not extensively tested this code. However, just doing a visual inspection it seems like there might be race conditions. For example, this code looks like their might be a race between the the irq handler and setting of the interrupt enable flag:
https://github.com/edogaldo/Arduino_STM … sart.c#L69

Have you thought about using atomic operations to safeguard against this?

-rick


edogaldo
Thu Jul 14, 2016 5:28 pm
First of all thanks for fixing the link, I don’t know why, the last “)” was left outside the link in my post (I tried to include it but unsuccessfully..)
Rick Kimball wrote:That branch seems to have minimal changes. I like it. However, this version also has the problem with Hardware::end() and not flushing the buffer.
https://github.com/edogaldo/Arduino_STM … sart.c#L69

danSTM
Tue Aug 09, 2016 7:44 pm
@edogaldo

Thanks for the work on this! I originally posted here: Blue Pill Serial.print slow, UNO faster? -> viewtopic.php?f=3&t=1189

I put my pills aside because of this, now I can use them. I wonder how many people using serial STM Arduino projects think they’re getting the most out of the processor until they hit the serialTX speed bump?

I tried the SerialTXBuffer branch at https://github.com/rogerclarkmelbourne/Arduino_STM32 In my tests, I’m now getting the expected ~4.5x TX improvement over the UNO. :D

Certainly something like this needs to be added to the core! …thanks again.


RogerClark
Tue Aug 09, 2016 9:49 pm
guys

I will merge this at the weekend, as I think enough people have had time to test and comment on it


Leave a Reply

Your email address will not be published. Required fields are marked *