I can’t seem to figure out why this code takes 6617us on the blue pill and takes 2472us on an UNO. I’ve tried both Serial and Serial1 and get the same results.
I’m using serial upload with the built-in loader. IDE is 1.6.9 and core is up to date.
Code generally runs 7x faster, until I hit “Serial.prints”.
Thanks for any feedback!
//#define Serial Serial1 // TX A2
byte nodeID = 10;
byte packetNum = 11;
float C = 42.611;
float F = 42.611;
float H = 42.6;
float vBatt = 3.14;
int packetLoss = 10;
int packetTotal = 10000;
void printData() {
Serial.print(nodeID);
Serial.print(",");
Serial.print(packetNum);
Serial.print(",");
Serial.print(C);
Serial.print(",");
Serial.print(F);
Serial.print(",");
Serial.print(H);
Serial.print(",");
Serial.print(vBatt,3);
Serial.print(",");
Serial.print(packetLoss);
Serial.print(",");
Serial.println(packetTotal);
}
void setup() {
Serial.begin(57600);
delay(3000);
unsigned long start = micros();
printData();
unsigned long end = micros();
Serial.println(end - start);
}
void loop() {
}
As the core does not support interrupt on transmit, MCU must wait for every byte before sending another (polling transmit).
UNO has interrupt based transmit with ring buffer, the Serial.print function simply pushes the bytes to ring buffer and the interrupt handler sends the bytes.
PS:I hope that this will change in the near future…
#define SerialOUT Serial1 // TX A2
byte nodeID = 10;
byte packetNum = 11;
float C = 42.611;
float F = 42.611;
float H = 42.6;
float vBatt = 3.14;
int packetLoss = 10;
int packetTotal = 10000;
void printData() {
SerialOUT.print(nodeID);
SerialOUT.print(",");
SerialOUT.print(packetNum);
SerialOUT.print(",");
SerialOUT.print(C);
SerialOUT.print(",");
SerialOUT.print(F);
SerialOUT.print(",");
SerialOUT.print(H);
SerialOUT.print(",");
SerialOUT.print(vBatt,3);
SerialOUT.print(",");
SerialOUT.print(packetLoss);
SerialOUT.print(",");
SerialOUT.println(packetTotal);
}
void setup() {
Serial.begin(57600);
//SerialOUT.begin(300);
}
void loop() {
unsigned int Baudrates[]= {300,1200,2400,4800,9600,19200,38400,57600,115200};
unsigned long start;
unsigned long end;
for ( int ind =0 ; ind < 9; ind++ ){
Serial.print("AT Baudrate it Tates: -");
Serial.print(Baudrates[ind]);
Serial.print(" , ");
SerialOUT.begin(Baudrates[ind]);
start = micros();
printData();
unsigned long end = micros();
Serial.print(end - start);
Serial.print("uS. To send 1 char it should take uS :");
Serial.println((10000000/Baudrates[ind]));
delay(3000);
SerialOUT.end();
}
}
The lack of the interrupt and buffer is certainly a problem for me and probably for others.
I use RF receivers that use a timer interrupt and I don’t begin polling that interrupt again until I process the incoming data and output it as csv.
I thought I’d give the STM a try. The faster I leave the process data function, the sooner I can get back to polling the ISR. The gain in processing the data is negated by the blocking prints.
+1 for the buffer.
Roger mentions it here http://www.rogerclark.net/updates-to-ar … tm32-code/
As the core does not support interrupt on transmit, MCU must wait for every byte before sending another (polling transmit).
UNO has interrupt based transmit with ring buffer, the Serial.print function simply pushes the bytes to ring buffer and the interrupt handler sends the bytes.
PS:I hope that this will change in the near future…
..I’ve looked at https://github.com/arduino/Arduino/tree … es/arduino
..and https://github.com/esp8266/Arduino/tree … es/esp8266
Thanks!
viewtopic.php?f=3&t=1012&start=60#p15108
I think the other small problem is that some people would not want to loose any RAM that would need to be allocated to all the hardware serial devices.
i.e The STM32F103C has 3 hardware serial devices. All of which would allocate the buffer.
I know its only 192 bytes, but I’m not sure if everyone wants to trade that for Serial performance.
Though I suppose we can have a Poll on the forum to find out.
I know its only 192 bytes, but I’m not sure if everyone wants to trade that for Serial performance.
Though I suppose we can have a Poll on the forum to find out.
Its not a good idea to allocate a fixed 3*64byte buffer as has been said, especially when you may only be using 1 port.
I did some experiments and found that if you send one character the serial port does not block unless its already sending. So it should be possible to create a Virtual Serial Port with a ring buffer with a timer or even better an “interrupt on transmission finished” that feeds the real com Port. A #define for the buffer size would be good then you just define as many buffers as you need of the size you need (Probably a couple of bytes bigger than the largest message you intend to send).
If we keep each HardwareSerialX class separately it is possible to allocate the buffers only for Serial ports that are used in the project. I think that 64 bytes for transmit buffer is rather big, especially for fast speeds, but it depends on the application. We can set the default buffer size to 16 bytes (like arduino UNO) and if someone wants to change this, a simple DEFINE before include “HardwareSerialX.h” can do the job…. without breaking arduino API.
void USART1_IRQHandler( void )
{
long xHigherPriorityTaskWoken = pdFALSE;
char cChar;
if( USART_GetITStatus( USART1, USART_IT_TXE ) == SET )
{
/* The interrupt was caused by the THR becoming empty. Are there any
more characters to transmit? */
if( xQueueReceiveFromISR( xCharsForTx[ 0 ], &cChar, &xHigherPriorityTaskWoken ) ) // take the char to be sent from the FreeRTOS queue
{
/* A character was retrieved from the buffer so can be sent to the
THR now. */
USART_SendData( USART1, cChar );
}
else
{
USART_ITConfig( USART1, USART_IT_TXE, DISABLE );
}
}
if( USART_GetITStatus( USART1, USART_IT_RXNE ) == SET )
{
cChar = USART_ReceiveData( USART1 );
xQueueSendFromISR( xRxedChars[ 0 ], &cChar, &xHigherPriorityTaskWoken );
}
portEND_SWITCHING_ISR( xHigherPriorityTaskWoken );
}
- Buffered_InterruptBased_USART_TX.zip
- (22.12 KiB) Downloaded 32 times
Here the modified files:
– Arduino_STM32-master\STM32F1\system\libmaple\include\libmaple\usart.h: added ring_buffer wb in struct “usart_dev”
– Arduino_STM32-master\STM32F1\system\libmaple\usart_private.h: modified interrupt handling function to handle tx
– Arduino_STM32-master\STM32F1\cores\maple\libmaple\usart.c: modified usart_tx() to use wb if TXE not ready
– Arduino_STM32-master\STM32F1\cores\maple\libmaple\usart_f1.c: modified USARTs definition to add new tx buffer “wb”
I attached also the corresponding original versions I had as I don’t know if they have been modified since when I got them (more or less 1 month ago).
Hope this helps..
PS: could not test it on a real device but at least it compiles successfully..
Best, E.
I think this is definitely something that should be put into the core.
I will test it at the weekend and do a separate post / poll to see what the community things.
Thanks again
Roger
I am sure everyone will be very appreciative, as we all want the core to work as well as possible
If we keep each HardwareSerialX class separately it is possible to allocate the buffers only for Serial ports that are used in the project. I think that 64 bytes for transmit buffer is rather big, especially for fast speeds, but it depends on the application. We can set the default buffer size to 16 bytes (like arduino UNO) and if someone wants to change this, a simple DEFINE before include “HardwareSerialX.h” can do the job…. without breaking arduino API.
I recall trying to do something similar to prevent the SPI class getting instantiated at all, but it didnt work, because those definitions would need to be in a header that is read in my the core.
The only way to control it would either be to add something into the compiler directives e.g. -DUSE_SERIAL_TX_BUFFER=1 and then have an option menu in boards.txt that set flags that were picked up by platform.txt
Or alternatively, overload Serial.begin to add an additional argument to enable TX buffering.
Or the user could just edit one of the core headers e.g. Arduino.h and add the define in there if they wanted to have a TX buffer, in which case we should probably allow the size to be set, e.g.
#define SERIAL_TX_BUF_SIZE 128
and use
#ifdef SERIAL_TX_BUF_SIZE
around the new code
However I suspect the majority of people will prefer to have this even if it means they loose some RAM
I’ve not looked at you code, but what happens if they print more than a buffers worth? I presume it blocks until just 1 buffer’s worth is left, and then continues ?? (Sorry not had chance to review the code yet)
Thanks
Roger
We are talking actually for an improvement/enhancement of Arduino API…
I think that 64 bytes is a very good value, for TX/RX buffers, nobody will miss the lost bytes…. we are in ARM world… (even AVR arduino allocates 64 bytes for TX/RX for Atmega168/328 devices). The most important is to allocate the buffers only for the Serial ports that are used in project.
Although we could pass the buffer as an additional param to Serial.begin
I think its easier to just have the buffer size defined somewhere that people can get to it if they really need bigger buffers e.g. in one of the top level core headers.
And we should probably set it to the same value that the AVR uses. (as I presume some thought went into that value)
NB. I’ve not looked at the buffer value on the Due or Zero etc, to see if the buffer size is bigger, or whether its implemented at all.
– variant.cpp
– RingBuffer.h
Some more notes about the STM32F1 implementation:
– the buffer size is not only used in rb_init() but particularly in the “usart_dev” struct definition so I’m not surewe can allocate it in a dynamic way keeping the same approach:
/** USART device type */
typedef struct usart_dev {
usart_reg_map *regs; /**< Register map */
ring_buffer *rb; /**< RX ring buffer */
ring_buffer *wb; /**< TX ring buffer */
uint32 max_baud; /**< @brief Deprecated.
* Maximum baud rate. */
uint8 rx_buf[USART_RX_BUF_SIZE]; /**< @brief Deprecated.
* Actual RX buffer used by rb.
* This field will be removed in
* a future release. */
uint8 tx_buf[USART_RX_BUF_SIZE];
rcc_clk_id clk_id; /**< RCC clock information */
nvic_irq_num irq_num; /**< USART NVIC interrupt */
} usart_dev;
I based my implementation on the standard Arduino implementation which seems to me quite well optimized, the “write()” code works like this:
if [ready to send] then send
else
while [buffer full]
if not ["ready to send" interrupt enabled] then
if [ready to send] then send first buffered char
else do nothing (wait for interrupt to free the buffer)
put input char in buffer
enable "ready to send" interrupt
so the buffer full is a supported scenario.
Best, E.
<…>
– asking users to modify library files (i.e. asking them to change the value of USART_RX_BUF_SIZE/USART_TX_BUF_SIZE) I think is against the Arduino “MCUs for dummies” spirit
<…>
I must admit I had thought the TX serial was buffered untill I saw this thread but I spotted and understood the problem and am capable of doing a crood fix. I have not done many projects with STM32 but I now undertand why at least one was not being as responsive as I thought it should be compared to the Arduino Pro Micro.
For me TX Buffering is something that should be available in stm32duino or optionally by passing a *buf and a size of a virtual library which feeds the real com port. I can see that some projects may break if the serial is suddenly faster (non blocking) but the library would be better for having buffered serial. Even in the 90s we had buffered uarts 16550A made using RS232 much better and more reliable.
If you implemented both the old serial and say serialbuf would serialbuf be linked if it wasn’t used in the sketches code ? If so the user could just use serialbuf or serial. For new projects most people aren’t going to want serial to be blocking unless of course they are just going to wait for a reply to what they have sent.
The only thing is that USART_TX_BUF_SIZE/USART_RX_BUF_SIZE are hard coded to 256 characters.
It will be easy to add the new setTxBufSize() and setRxBufSize() functions to make them dynamically allocated.
For the blocking/non-blocking feature, it would be easy to do the same things that is already done in usb_serial, where fonctions enableBlockingTx() and disableBlockingTx() are present.
So, there is no needs to create a new class…
So, there is no needs to create a new class…
Recently, I’d measured the throughput of usb_serial where the source was a file located on SdCard and discovered that non-blocking was defaulted.
viewtopic.php?f=3&t=1159#p14759
I think this piece of code on F4 is not even present under F1. (I think F4 is more advanced than F1 in those area)
Recently, I’d measured the throughput of usb_serial where the source was a file located on SdCard and discovered that non-blocking was defaulted.
viewtopic.php?f=3&t=1159#p14759
I think this piece of code on F4 is not even present under F1. (I think F4 is more advanced than F1 in those area)
so I’m not that sure it’s a very well tested feature…
The call is NOT “this->usbEnableBlockingTx()”, so it not calling itself …
Don’t forget that the code is calling the C variant usbEnableBlockingTx() located in usbF4/usb.h, and therefore in usbF4/usb.c, it is calling the VCP_SetUSBTxBlocking() from usbF4/usb.c. (I know, it looks confusing, but it is LeafLab code …
)
Yes, it is only in USBSerial, I was just saying that it should be implemented also in HardwareSerial too so we can get both behaviour options.
And under F1, is not implemented too, not even on USBSerial, which means that it has a non-blocking behaviour, preventing send large amount of data without losing some bytes.
The call is NOT “this->usbEnableBlockingTx()”, so it not calling itself …
Don’t forget that the code is calling the C variant usbEnableBlockingTx() located in usbF4/usb.h, and therefore in usbF4/usb.c, it is calling the VCP_SetUSBTxBlocking() from usbF4/usb.c. (I know, it looks confusing, but it is LeafLab code …
)
I will send a PR to Roger …
EDIT : PR sent …
Thanks to @edogaldo for having find this untested bug !
BTW, I just started learning something about using github and prs, next time I’ll try opening a pr myself.
Best, E.
Though in this case I’d need to branch then merge the PR
- added buffered-interrupt based TX
- buffers can be dynamically allocated via HardwareSerial.begin()
- min RX buffer size: 16
- default RX buffer size: SERIAL_RX_BUFFER_SIZE
- min TX buffer size: 1
- default TX buffer size: SERIAL_TX_BUFFER_SIZE
- added HardwareSerial.enableBlockingTx()/disableBlockingTx() (blocking enabled by default)
- improved ring_buffer to support the full buffer size instead of buffer size – 1
I’d like to send you the changes via PR but I’m a complete nebie to Github, could you please explain me the steps to submit a PR for your project?
Thanks and best, E.
First, you will need a github account and login into it.
Second, from Roger’s github, press “fork” link in the upper-right corner, it will create a fork from Roger’s repo to your account.
On your PC, do a “git clone <the_forked_repo_in_your_account>”
Edit/merge your changes there.
Commit them and push them into github, it will then appear into your account github web page with all commits history.
Now press “compare” in your github account fork, it will shows the diffs against Roger’s repo.
If you are satisfy with them, press “Create Pull Request” button, fill out the message to associate with the PR and submit.
Roger will then have to do the merge …
Best, E.
I’ve taken a look at the PR, but I have a worry about the change you made to Serial::begin() and a few other issues.
with
void HardwareSerial::begin(uint32 baud, uint16 tx_buf_size, uint16 rx_buf_size, uint8_t config)
I’m a concerned about the way the buffers are dynamically allocated using malloc.
I’m sure the core doesnt use new(), as if I use new() in a sketch the binary gets much much bigger as loads of extra libs have to be linked in.
These buffers would need to statically allocated
#define TX_BUF_SIZE 64
#define RX_BUF_SIZE 64
char rxBuf[RX_BUF_SIZE];
char txBuf[TX_BUF_SIZE];
Please do not edit anything else apart from the change you are making
Do not tidy up any other code including removing any whitespaces or newlines, as Git treats all these changes as important and clutters up the difference displays that I have to wade though when looking at a PR
(Note I”m not the only one who has an issue with whitespace removal in PR’s Paul (Teensy) has posted several times about this to various places)
These buffers would need to statically allocated
These buffers would need to statically allocated
- about white spaces, tabs, etc..: I changed them for consistency anyway ok, got it, should I revert space changes?
- about dynamic buffer size: I was for static buffer size (as in the standard Arduino UNO & DUE implementations) but I got from this forum it would have been a plus to have dynamic buffers and this implies dynamic allocation; my tests are fine on this
- about the begin method:
- arguments 2, 3 and 4 are optional: if you don’t specify them they are defaulted with the specified values
- I left “config” as argument 4 (last) due to comment:
/*
* Roger Clark.
* Note. The config parameter is not currently used. This is a work in progress.
* Code needs to be written to set the config of the hardware serial control register in question.
*
*/
I still think a PR is the best approach
I can pull to a new branch if you like, at the moment I don’t feel comfortable with pulling to either master or development
I think the consensus from another thread as that people wanted me to pull new enhancements to a separate branch first, so I think I better start doing this, even though it ends up with loads of branches
Re: Dynamic allocation.
I thought that people wanted to dynamically allocate buffers themselves and pass into the function
Re: comment about the config
Thats a really old comment, I can’t remember if it has been superseded. Either way, even if its not currently supported, I’d still prefer to add new optional parameters at the end of the function signature, so that when config is finally implemented, that the API will be as close as possible to the Arduino.cc API e.g. on AVR boards
Of course if you think not worth the PR approach then let’s cancel it.
Re: comment about the config
Thats a really old comment, I can’t remember if it has been superseded. Either way, even if its not currently supported, I’d still prefer to add new optional parameters at the end of the function signature, so that when config is finally implemented, that the API will be as close as possible to the Arduino.cc API e.g. on AVR boards
On this purpose let me list my concerns respect to the solution you have in mind for dynamic buffer allocation (letting the users allocate custom buffers and pass them to the library):
- this solution allows a compile-time level dynamic buffer allocation, not a run-time level dynamic allocation so, from a functional point of view it’s almost the same as defining the buffers sizes at pre-processor level and letting the user to change these defines
- letting the user to allocate custom buffers does not prevent the need for the system to allocate default buffers for each device (even those not used) and respect to the pre-processor based solution is sub-optimal because leads to a situation in which, when the user allocates some custom buffer then the application will end-up having 2 buffers allocated for the same purpose but only one used (either the default one or the custom one) thus resulting in wasting memory
All said, do you think we can find a trade off solution to release at least buffered interrupt-based TX functionality?
What would you suggest?
Thanks and bye, E.
I think this resolves the problems of using malloc (which I think a lot of people will have an issue with)
I think this resolves the problems of using malloc (which I think a lot of people will have an issue with)
The IDE silently includes Arduino.h in the first line of the program (and this includes all other includes), so it is impossible to define the buffers in the program.
The other solution is to define the buffers with defines in command line arguments, but this requires changes in arduino IDE build system, or to have defined other variants with different buffer sizes.
The IDE silently includes Arduino.h in the first line of the program (and this includes all other includes), so it is impossible to define the buffers in the program.
The other solution is to define the buffers with defines in command line arguments, but this requires changes in arduino IDE build system, or to have defined other variants with different buffer sizes.
Yes slammer, you are right, I was trying to find a way to make it easy set defines but no success..
In the “old” days, one would write a separate GUI “conf utility” to fine tune the flatfiles used to actually specify the compile environment. Today, such a utility could exist in the /tools directory. In this way, it would be easy to set #defines and other configuration parameters.
Ray
But just a few more things to throw into the mix
Unless I’m much mistaken we already have a RX buffer (ring buffer) in the uart class ???? (there is definitely code for this)
So, whatever the final solution is. We can’t change the existing default state of having a 64 byte Serial RX buffer
From usart.h
#ifndef USART_RX_BUF_SIZE
#define USART_RX_BUF_SIZE 64
#endif
/** USART device type */
typedef struct usart_dev {
usart_reg_map *regs; /**< Register map */
ring_buffer *rb; /**< RX ring buffer */
uint32 max_baud; /**< @brief Deprecated.
* Maximum baud rate. */
uint8 rx_buf[USART_RX_BUF_SIZE]; /**< @brief Deprecated.
* Actual RX buffer used by rb.
* This field will be removed in
* a future release. */
rcc_clk_id clk_id; /**< RCC clock information */
nvic_irq_num irq_num; /**< USART NVIC interrupt */
} usart_dev;
<…>
Other ideas worth considering, is adding very small TX buffering by default e.g. 8 or 16 bytes, and the let the user edit the #defines ??
<…>
Ray
It would work perfectly if the TX speed was the same as the RX which I understand it is. you would just need to check that the RX buffer count + the TX buffer count was less than the Buffer size – 1. So you have 30 characters in the TX queue and 32 characters in the RX queue lets say another character is received you then have 33 chars in the RX queue. As they operate at the same speed the TX queue will drop by one char before the next RX character is received. Thus we need only one 64 char buffer for both TX and receive. The only thing that would be blocking is adding a TX char to the queue if the the total characters in both queues was 63.
This way we use no more memory than is already used in the current system and face it most people send a load of chars and then wait for a reply.
I may of not explained how this works very well I will try to put it in code to explain.
<…>
This way we use no more memory than is already used in the current system and face it most people send a load of chars and then wait for a reply.<…>
This does not mean that a new core cannot be developed radically departing from Arduino purity, but the original thinking of the old members was for “as-close-as-possible….”
@Ray, the main point of this thread is that Maple library does not provide buffered interrupt-based TX for the USART devices, which is ootb for Arduino boards (both UNO and DUE) and which makes USART TX for STM32F1 boards very slow (much more slow than in UNO boards) so, this improvement would be, imo, just the coverage of a lack in the libmaple implementation respect to the Arduino standard features.
Best, E.
Yea, I understand that.
But we have another responsibility to the real Maple folks since we completely inherited the Leaflabs codebase. We now carry that torch by default…
You can still read the docs. The forums will remain active until August 2016, but at that point they will be converted to a static archive. Consider checking out the resources and community at http://www.stm32duino.com instead.
Honestly, I do not know how many of our members own real Maple hardware and may be using STM32duino… but I see from time to time a large number of users that are not bots and are not logged-in. Anyway, it is something to think about. Hell, I don’t have any issue with changing F2, F3, or F4. It’s a small thing, but IMO worthy of some serious neuron activity.
Additionally, and this is a personal statement of fact, in 3 years I have not had a serial issue. Maybe I am just lucky or perhaps my code does not stress the uC significantly. I’m sure I could write something to force the issue to appear, but nothing I have done normally in 3 years has caused me concern.
All the above is why I abstained for making a big forum issue out of what I think, architecturally. Those that have been in this forum for a few years know that I’m not a shy member. I just think that members that have experienced issues or expect to experience issues have a voice. As for me, I just do anything I want anyway.
Ray
<…>
Yes, you carry the torch, but if you don’t go ahead with that torch, what do you carry it on for?
Why the F0, F2, F3, and F4 libraries can grow but the F1 cannot?
Come on, the name of this site is “Arduino for STM32” not “Libmaple forever” (by the way the original Libmaple is already protected in the libmaple repository available for posterity).
<…>
I opened a dedicated thread for my improvements: viewtopic.php?f=17&t=1235
For anybody interested I’d suggest to discuss details there.
Best, E.
ps: I replied to your questions in the other thread.
Pps: let me also add that original Maple boards are at the end F1 boards so they’d benefit for the improvements too..

