I am tired of writing Serial.print() commands again and again just to display a few values. Unfortunately the typical printf function is very big: the libc version can take almost half, or even more, of the flash memory of an F103C8. It is not only the function itself, but the number of functions that the linker pulls in when this library is used (floating point, helpers, stdio support, etc.).
I remember that years ago there were very optimised versions of printf() for the 8051 taking no more than 2 KB. So I opened my old notebooks, found some versions of these functions and tried to adapt them to the STM32 environment, inside the Print class. I tested 6-8 functions and ended up with the version from the sdcc compiler. I removed all the MCU-specific code and, after some changes, it runs on STM32.
What I have so far:
– A printf function that supports integer (long, short, byte), string, character and pointer variables.
– ‘ ‘, -, +, b, l, c, s, p, d, i, o, u, and x format specifiers
– No static variables, no dynamic allocation, no globals, just plain functions running with local variables.
– Print.printf member function
– Very small size: 1080 bytes (for the 3 core functions) and 64 bytes for the Print.printf() wrapper.
– So far I can't get rid of the temporary buffer declared inside Print.printf(), but I am sure it is not needed (WIP…). Because of this, the output of the function is limited by the size of this buffer, currently 64 bytes (and there is no overflow check!!!).
– No floating point support yet; it could be added, but I first have to examine its impact.
I think the code needs some cleanup, but it is a good start.
The code is in a single file, print_format.c, which contains the main function and some helpers.
Two of our core files must be changed.
First, Print.h, for the declaration of the function. Roger already has the function there within header guards, so I added a new line:
int printf(const char * format, ...);
– removed dedicated pointer printing (pointers can be printed as unsigned long integers with the integer specifiers)
– direct calls back to the Print.write functions without buffering (very important for Serial or LCD printing)
I am also studying some algorithms to convert a floating-point number to a string without floating-point functions. This is essential for supporting floats without the overhead of the floating-point libraries. Of course there are limitations: these stripped-down functions do not support scientific notation, have a fixed precision, etc.
My best so far costs about 1500 bytes more… but this is WIP….
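One integer-only approach is sketched below under stated assumptions; it is not necessarily the algorithm referred to above. It reads the float's IEEE-754 bit pattern, builds a 32.32 fixed-point value from the mantissa and exponent using shifts, and emits decimals by repeatedly multiplying the fraction by 10. `formatFloatFixed` is a hypothetical name. The sketch truncates instead of rounding, handles only |x| < 2^31, and ignores NaN/Inf.

```cpp
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not part of the posted printf): formats a float as
// "[-]integer.fraction" with a fixed number of decimals using only integer
// operations, so the linker does not need the soft-float routines.
// Assumptions: |x| < 2^31, no NaN/Inf, fraction truncated (not rounded).
// out must hold at least decimals + 13 bytes.
size_t formatFloatFixed(float x, unsigned char decimals, char *out) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);            // reinterpret the bits, no float math
  uint32_t expField = (bits >> 23) & 0xFF;
  uint32_t mantissa = bits & 0x7FFFFFu;
  int shift;                                 // |x| = mantissa * 2^shift
  if (expField == 0) {
    shift = -126 - 23;                       // zero or subnormal: no implicit 1
  } else {
    mantissa |= 0x800000u;                   // normal number: implicit leading 1
    shift = (int)expField - 127 - 23;
  }
  // Convert to 32.32 fixed point: fixed = |x| * 2^32.
  int s = shift + 32;
  uint64_t fixed;
  if (s >= 0) {
    fixed = (uint64_t)mantissa << s;         // s <= 39 because |x| < 2^31
  } else if (s > -64) {
    fixed = (uint64_t)mantissa >> -s;        // bits below 2^-32 are dropped
  } else {
    fixed = 0;
  }
  uint32_t intPart = (uint32_t)(fixed >> 32);
  uint32_t frac = (uint32_t)fixed;           // fraction in units of 2^-32

  size_t n = 0;
  if (bits >> 31) out[n++] = '-';
  char rev[10];                              // a uint32_t has at most 10 digits
  int r = 0;
  do {
    rev[r++] = (char)('0' + intPart % 10);
    intPart /= 10;
  } while (intPart != 0);
  while (r > 0) out[n++] = rev[--r];
  if (decimals > 0) {
    out[n++] = '.';
    for (unsigned char d = 0; d < decimals; d++) {
      uint64_t f = (uint64_t)frac * 10;      // next digit moves into bits 32..35
      out[n++] = (char)('0' + (f >> 32));
      frac = (uint32_t)f;                    // keep the remaining fraction
    }
  }
  out[n] = '\0';
  return n;
}
```

For example, `formatFloatFixed(-2.25f, 2, buf)` writes "-2.25". Rounding, a wider range or NaN/Inf handling would each cost extra code, which is exactly the size trade-off being weighed here.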
But when SRAM and flash are very tight (maybe on a UNO or Nano…), Mikal Hart's Streaming macro is a zero-overhead way of handling print output formatting: http://arduiniana.org/libraries/streaming/
One can even pull off simple “logic” within the stream:
Example by Rob Tillaart:
#include <Streaming.h>
// .....
int h = 14;
int m = 6;
Serial << ((h<10)?"0":"") << h << ":" << ((m<10)?"0":"") << m << endl;
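Why this costs (almost) nothing: as far as I know, the core of Streaming.h is a templated `operator<<` that forwards each operand to the matching `print()` overload and returns the stream, so the chain compiles into ordinary inline print() calls. The host-side sketch below mimics that idea; `MockPrint` is a hypothetical stand-in for Arduino's Print class that appends to a string instead of a UART.

```cpp
#include <string>

// Hypothetical host-side stand-in for Arduino's Print (illustration only):
// print() just appends to a string instead of sending bytes to a UART.
struct MockPrint {
  std::string out;
  void print(const char *s) { out += s; }
  void print(int v) { out += std::to_string(v); }
};

// A templated operator<< forwards each operand to the matching print()
// overload and returns the stream, so a chain of << becomes plain inline
// print() calls: no format string, no buffer, no parser at run time.
template <class T>
inline MockPrint &operator<<(MockPrint &p, T arg) {
  p.print(arg);
  return p;
}
```

With `int h = 14, m = 6;`, the expression from the example (minus `endl`) leaves "14:06" in `out`.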
Furthermore, if only Serial.printf is used in the program, without the various Serial.print()/Serial.println() versions, the total program size may even be smaller.
(On ESP it has been there for a while, but it was much simpler to implement there, since there was enough space to use vsnprintf() …)
Over on forum.43oh.com we had long discussions about printf. One user there, Opossum, had a unique approach that resulted in small code that used no buffering. See this post: http://forum.43oh.com/topic/1289-tiny-printf-c-version/
I evaluated 6-8 different implementations of small printf() functions, with additional modifications.
I wanted something with a small footprint (about 1K is OK), no static variables (we need reentrancy), long int support, the 0 and (space) flags, full support for all integer types, optional float support without linking the floating-point libraries (OK, I can give 1-1.5 KB more for that), and integration into the Print class.
Instead of trying to encapsulate the C function and the callback inside the Print class (which does not actually implement write(); it is only virtual), it is better to write printf() as a native C++ member function of Print (Arduino style). The other Print functions use the same approach anyway.
Now there is no C code: all printf functionality is in a member function of the Print class, so there is no need for a callback to write something; it simply calls Print's virtual write(). I also need a small internal function to calculate the digits of numerical values.
The total size of these 2 functions is 0x22 + 0x318 = 0x33A = 826 bytes (34 + 792; no buffers, no static variables).
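A minimal host-side model of this design, assuming nothing beyond standard C++: the formatter is a member of the base class and pushes every character straight into the virtual write(), so any derived "device" gets printf without a buffer or a C callback. `PrintBase`, `StringPrint` and `tinyPrintf` are hypothetical names, and the formatter only handles %s, %c and %%.

```cpp
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string>

// Hypothetical model of Print: formatting lives in the base class and every
// output character goes through the virtual write(), with no intermediate buffer.
class PrintBase {
public:
  virtual ~PrintBase() {}
  virtual size_t write(uint8_t c) = 0;       // implemented by Serial, LCD, ...
  size_t tinyPrintf(const char *fmt, ...);   // supports only %s, %c and %%
};

size_t PrintBase::tinyPrintf(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  size_t n = 0;
  for (char c; (c = *fmt++) != '\0';) {
    if (c != '%') { n += write((uint8_t)c); continue; }
    c = *fmt++;
    if (c == 's') {
      for (const char *s = va_arg(ap, const char *); *s; s++) n += write((uint8_t)*s);
    } else if (c == 'c') {
      n += write((uint8_t)va_arg(ap, int));  // a char argument arrives promoted to int
    } else if (c == '\0') {
      break;                                 // lone '%' at the end of the format
    } else {
      n += write((uint8_t)c);                // "%%" and unknown specifiers
    }
  }
  va_end(ap);
  return n;
}

// A derived "device" that captures output in a string instead of a UART.
class StringPrint : public PrintBase {
public:
  std::string out;
  size_t write(uint8_t c) override { out += (char)c; return 1; }
};
```

Calling `tinyPrintf("%s=%c%%", "x", 'y')` on a StringPrint returns 4 and leaves "x=y%" in `out`. A Serial or LCD class only has to override write().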
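To use it, add this code at the end of Print.cpp; printf() and the helper printDigit() must also be declared as members of the Print class in Print.h:
//------------------------------------------------
#include <stdarg.h> // va_list, va_start, va_arg, va_end
#include <string.h> // strlen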
#ifdef toupper
#undef toupper
#endif
#ifdef tolower
#undef tolower
#endif
#ifdef islower
#undef islower
#endif
#ifdef isdigit
#undef isdigit
#endif
#define toupper(c) ((c)&=0xDF)
#define tolower(c) ((c)|=0x20)
#define islower(c) ((unsigned char)c >= (unsigned char)'a' && (unsigned char)c <= (unsigned char)'z')
#define isdigit(c) ((unsigned char)c >= (unsigned char)'0' && (unsigned char)c <= (unsigned char)'9')
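// The first 4 bytes overlay the 32-bit long; byte[4] receives the division remainder
// (this layout assumes a 32-bit long, as on STM32)
typedef union {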
unsigned char byte[5];
long l;
unsigned long ul;
float f;
const char *ptr;
} value_t;
size_t Print::printDigit(unsigned char n, bool lower_case)
{
unsigned char c = n + (unsigned char)'0'; // 'register' dropped: deprecated, and removed in C++17
if (c > (unsigned char)'9') {
c += (unsigned char)('A' - '0' - 10);
if (lower_case)
c += (unsigned char)('a' - 'A');
}
return write(c);
}
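// Divides value->ul by radix in place using bit-serial shift-and-subtract
// division (no divide instruction); the remainder (next digit) ends up in byte[4].
static void calculateDigit (value_t* value, unsigned char radix)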
{
unsigned long ul = value->ul;
unsigned char* pb4 = &value->byte[4];
unsigned char i = 32;
do {
*pb4 = (*pb4 << 1) | ((ul >> 31) & 0x01);
ul <<= 1;
if (radix <= *pb4 ) {
*pb4 -= radix;
ul |= 1;
}
} while (--i);
value->ul = ul;
}
size_t Print::printf(const char *format, ...)
{
va_list ap;
bool left_justify;
bool zero_padding;
bool prefix_sign;
bool prefix_space;
bool signed_argument;
bool char_argument;
bool long_argument;
bool lower_case;
value_t value;
int charsOutputted;
bool lsd;
unsigned char radix;
unsigned char width;
signed char decimals;
unsigned char length;
char c;
// reset output chars
charsOutputted = 0;
va_start(ap, format);
while( (c=*format++) != '\0' ) {
if ( c=='%' ) {
left_justify = 0;
zero_padding = 0;
prefix_sign = 0;
prefix_space = 0;
signed_argument = 0;
char_argument = 0;
long_argument = 0;
radix = 0;
width = 0;
decimals = -1;
get_conversion_spec:
c = *format++;
if (c=='%') {
charsOutputted+=write(c);
continue;
}
if (isdigit(c)) {
if (decimals==-1) {
width = 10*width + c - '0';
if (width == 0) {
zero_padding = 1;
}
} else {
decimals = 10*decimals + c - '0';
}
goto get_conversion_spec;
}
if (c=='.') {
if (decimals==-1)
decimals=0;
else
; // duplicate, ignore
goto get_conversion_spec;
}
if (islower(c)) {
c = toupper(c);
lower_case = 1;
} else
lower_case = 0;
switch( c ) {
case '-':
left_justify = 1;
goto get_conversion_spec;
case '+':
prefix_sign = 1;
goto get_conversion_spec;
case ' ':
prefix_space = 1;
goto get_conversion_spec;
case 'B': /* byte */
char_argument = 1;
goto get_conversion_spec;
// case '#': /* not supported */
case 'H': /* short */
case 'J': /* intmax_t */
case 'T': /* ptrdiff_t */
case 'Z': /* size_t */
goto get_conversion_spec;
case 'L': /* long */
long_argument = 1;
goto get_conversion_spec;
case 'C':
// char is promoted to int when passed through "...", so va_arg(ap, char)
// is undefined (GCC warns and makes it abort if reached)
c = (char)va_arg(ap,int);
charsOutputted+=write(c);
break;
case 'S':
value.ptr = va_arg(ap,const char *);
length = strlen(value.ptr);
// a precision (e.g. "%.3s") truncates the string, so pad to the printed length
if ( (decimals != -1) && ((unsigned char)decimals < length) ) {
length = decimals;
}
if ( ( !left_justify ) && (length < width) ) {
width -= length;
while( width-- != 0 ) {
charsOutputted+=write(' ');
}
}
for ( unsigned char n = length; n != 0; n-- ) {
charsOutputted+=write(*value.ptr++);
}
if ( left_justify && (length < width)) {
width -= length;
while( width-- != 0 ) {
charsOutputted+=write(' ');
}
}
break;
case 'D':
case 'I':
signed_argument = 1;
radix = 10;
break;
case 'O':
radix = 8;
break;
case 'U':
radix = 10;
break;
case 'X':
radix = 16;
break;
default:
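// nothing special, just output the character as written
// (lowercase letters were converted by toupper() above, so restore bit 0x20)
if (lower_case) c |= 0x20;
charsOutputted+=write(c);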
break;
}
if (radix != 0) {
unsigned char store[6];
unsigned char *pstore = &store[5];
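if (char_argument) {
// a char argument arrives promoted to int through "..."
if (signed_argument) {
value.l = (signed char)va_arg(ap, int);
} else {
value.l = (unsigned char)va_arg(ap, int);
}
} else if (long_argument) {
value.l = va_arg(ap, long);
} else if (signed_argument) { // must be int
value.l = va_arg(ap, int);
} else {
// int is 32 bits on STM32, so the 16-bit mask (& 0xFFFF) inherited
// from the 8051 version would truncate unsigned values
value.ul = va_arg(ap, unsigned int);
}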
if ( signed_argument ) {
if (value.l < 0)
value.ul = 0UL - value.ul; // negate without signed overflow (also correct for LONG_MIN)
else
signed_argument = 0;
}
length=0;
lsd = 1;
do {
value.byte[4] = 0;
calculateDigit(&value, radix);
if (!lsd) {
*pstore = (value.byte[4] << 4) | (value.byte[4] >> 4) | *pstore;
pstore--;
} else {
*pstore = value.byte[4];
}
length++;
lsd = !lsd;
} while( value.ul );
if (width == 0) {
// default width. We set it to 1 to output
// at least one character in case the value itself
// is zero (i.e. length==0)
width = 1;
}
/* prepend spaces if needed */
if (!zero_padding && !left_justify) {
while ( width > (unsigned char) (length+1) ) {
charsOutputted+=write(' ');
width--;
}
}
if (signed_argument) { // this now means the original value was negative
charsOutputted+=write('-');
// adjust width to compensate for this character
width--;
} else if (length != 0) {
// value > 0
if (prefix_sign) {
charsOutputted+=write('+');
// adjust width to compensate for this character
width--;
} else if (prefix_space) {
charsOutputted+=write(' ');
// adjust width to compensate for this character
width--;
}
}
/* prepend zeroes/spaces if needed */
if (!left_justify) {
while ( width-- > length ) {
charsOutputted+=write( zero_padding ? '0' : ' ');
}
} else {
/* spaces are appended after the digits */
if (width > length)
width -= length;
else
width = 0;
}
/* output the digits */
while( length-- ) {
lsd = !lsd;
if (!lsd) {
pstore++;
value.byte[4] = *pstore >> 4;
} else {
value.byte[4] = *pstore & 0x0F;
}
charsOutputted+=printDigit(value.byte[4], lower_case);
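}
/* append spaces after the digits when left-justified */
if (left_justify) {
while( width-- != 0 ) {
charsOutputted+=write(' ');
}
}
}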
} else {
charsOutputted+=write(c);
}
}
va_end(ap);
return (size_t)charsOutputted;
}
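The integer path relies on calculateDigit(): a bit-serial (shift-and-subtract) long division of the 32-bit value by the radix, which yields one digit per call without a divide routine. printf() then packs two digits per byte so that store[6] can hold the 11 octal digits of a 32-bit value. Because value_t assumes a 32-bit long (true on STM32, but not on a 64-bit PC, where byte[4] would overlap the long), the sketch below is a host-side rewrite with separate variables, and it stores one digit per char instead of packing nibbles. `divideByRadix` and `toDigits` are hypothetical names.

```cpp
#include <stdint.h>

// Same algorithm as calculateDigit(): divides *value by radix in place using
// only shifts, compares and subtractions, and returns the remainder.
static uint8_t divideByRadix(uint32_t *value, uint8_t radix) {
  uint32_t q = *value;
  uint8_t rem = 0;                              // plays the role of value.byte[4]
  for (int i = 0; i < 32; i++) {
    rem = (uint8_t)((rem << 1) | (q >> 31));    // bring down the next dividend bit
    q <<= 1;                                    // make room for the next quotient bit
    if (rem >= radix) {                         // radix fits: subtract, set quotient bit
      rem -= radix;
      q |= 1;
    }
  }
  *value = q;                                   // quotient
  return rem;                                   // remainder = least significant digit
}

// Converts v to text in radix 2..16, most significant digit first.
// Returns the number of digits; out must hold at least 33 bytes.
static int toDigits(uint32_t v, uint8_t radix, bool lowerCase, char *out) {
  char rev[32];                                 // digits in reverse order
  int n = 0;
  do {
    uint8_t d = divideByRadix(&v, radix);
    rev[n++] = (char)(d < 10 ? '0' + d : (lowerCase ? 'a' : 'A') + d - 10);
  } while (v != 0);
  for (int i = 0; i < n; i++) out[i] = rev[n - 1 - i];
  out[n] = '\0';
  return n;
}
```

For instance, dividing 1234567 by 10 leaves 123456 and returns 7, and `toDigits(255, 16, true, buf)` produces "ff".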
<…>
No other implementation I tried is this small (and I tried almost everything): there are smaller implementations, but they don't support all integer types or width specifiers, or they use static variables….
<…>
Now, how fast is it?
Ray
Our implementation of write to serial (usart_putc) is neither interrupt-driven nor buffered. As a result, while writing a character to the serial port the MCU just waits for the transmission to end (I don't know the STM32 internals, whether it waits for the current character or the previous one to be pushed out, but the result is almost the same if you want to push multiple bytes to the UART).
In a typical application a ring buffer is used: usart_putc is normally the entry point to the ring buffer and does not block. The TX interrupt then triggers the sending of the next character until the buffer is empty.
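A minimal sketch of such a buffer, assuming a single producer (the non-blocking putc replacement) and a single consumer (the USART TX-empty interrupt). `TxRingBuffer` is a hypothetical class, not part of the core; a real driver would also enable the TX interrupt after put() and disable it when get() finds the buffer empty. `volatile` indices are the usual approach on a single-core Cortex-M.

```cpp
#include <stdint.h>

// Hypothetical single-producer/single-consumer TX ring buffer.
// put() is called by the writer; get() would be called from the TX interrupt.
template <uint16_t N>
class TxRingBuffer {
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
public:
  bool put(uint8_t c) {
    if ((uint16_t)(head - tail) == N) return false;  // full: caller retries or drops
    buf[head & (N - 1)] = c;
    head = head + 1;                                 // publish after the data is stored
    return true;
  }
  bool get(uint8_t &c) {
    if (head == tail) return false;                  // empty: ISR stops the TX interrupt
    c = buf[tail & (N - 1)];
    tail = tail + 1;
    return true;
  }
  uint16_t count() const { return (uint16_t)(head - tail); }
private:
  uint8_t buf[N];
  volatile uint16_t head = 0;  // written only by the producer
  volatile uint16_t tail = 0;  // written only by the consumer (ISR)
};
```

The indices run freely and wrap at 65536; because N is a power of two, `index & (N - 1)` stays consistent across the wrap, and `head - tail` is always the number of queued bytes.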
The time a character needs to leave the UART is not small. At 115200 baud a character (10 bits, including start and stop bits) needs about 1/11520 s ≈ 87 µs, a really long time for a 72 MHz MCU (at 9600 baud it is an eternity….)
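The arithmetic behind these numbers, as a small sketch assuming 8N1 framing (1 start + 8 data + 1 stop = 10 bits per character); `usPerChar` and `cyclesPerChar` are hypothetical helper names:

```cpp
#include <stdint.h>

// Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit.
const uint32_t kBitsPerChar8N1 = 10;

// Microseconds a blocking putc spends per character, rounded to nearest.
uint32_t usPerChar(uint32_t baud, uint32_t bitsPerChar = kBitsPerChar8N1) {
  return (uint32_t)(((uint64_t)bitsPerChar * 1000000u + baud / 2) / baud);
}

// CPU cycles that elapse while one character is shifted out.
uint32_t cyclesPerChar(uint32_t cpuHz, uint32_t baud,
                       uint32_t bitsPerChar = kBitsPerChar8N1) {
  return (uint32_t)((uint64_t)cpuHz * bitsPerChar / baud);
}
```

At 115200 baud this gives 87 µs per character, i.e. 6250 cycles of a 72 MHz CPU spent waiting for every byte; at 9600 baud it is about 1042 µs.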
On the ESP8266, program memory is so large compared with typical MCU applications that a full 50K-60K version of printf is almost nothing…. That's why Print.printf() is included in the esp8266 core.
On the other hand, the MCUs I have used most in the last 25 years of my professional life are the ATmega8 and the 89C52… On these machines even the Print class is a luxury… you have to live with basic itoa and ltoa….
Anyway, it may be more useful for our community to try to improve some core functions, like buffered transmit on the UART… (lol, I want something to keep my nights busy….)
PS: I am afraid that measuring the transmit time with Serial.write is more complicated. My previous post about the UART is technically correct, but the Serial.XXX functions do not use a real UART; they use an emulated device over USB that acts as a UART. Timing this device is not an easy task…
The concept is the same because of the blocking nature of uart_putc but the timing is unknown.
<…>
unsigned long uScount; // micros() returns unsigned long, so the subtraction survives wraparound
digitalWrite(BOARD_LED_PIN, HIGH); delay(500); // LED_on + half-second
uScount = micros();
Serial1.println("This is my big fat text.... 50 characters long....");
uScount = micros() - uScount;
Serial.print("\t\t\t\t printf Serial1 : uS="); Serial.println( uScount);
uScount = micros();
Serial.println("This is my big fat text.... 50 characters long....");
uScount = micros() - uScount;
Serial.print("\t\t\t\t printf USB : uS="); Serial.println( uScount);
digitalWrite(BOARD_LED_PIN, LOW); delay(500); // LED_off + half-second
Yes, agree.
Ray
Now, about printf in the Print class: I understand Ray's point that adding 1 KB of code here and 1 KB there eventually adds up to a good amount, and people may not need it. But I thought that if a sketch doesn't use printf, the compiler/linker would not include it, so unless the sketch actually uses it, there is no difference whether the function is present in the Print class or not.
Am I missing something?
Besides that, another observation: in a test sketch for SWO, I used sprintf to format a message into a string so that I could then print it with println, and it increased the sketch by about 15 KB of flash and 1.5 KB of RAM. By comparison, Slammer's printf increases the size by only about 1 KB.
Given that, if printf doesn’t take space when not used, and adding it to the core may save people from having to use sprintf, my vote would go to include it in the core.
EDIT: I compiled my SWO test sketch once using println and once using printf, in both cases with printf as part of the SWO class. It really seems it doesn't add to the code size unless used.
Then I went one step further and included it in the Print class instead, with a similar result: the code size does not grow when it is not used. So I see no harm in adding it to the core. I am not sure why Ray's test showed an increased size when printf was not used.
printf in SWO class but not used in the sketch (only println):
Sketch uses 15,940 bytes (3%) of program storage space. Maximum is 524,288 bytes.
Global variables use 2,952 bytes of dynamic memory.
printf in SWO class and used:
Sketch uses 16,900 bytes (3%) of program storage space. Maximum is 524,288 bytes.
Global variables use 2,952 bytes of dynamic memory.
printf in print class and used:
Sketch uses 16,908 bytes (3%) of program storage space. Maximum is 524,288 bytes.
Global variables use 2,952 bytes of dynamic memory.
printf in print class and not used:
Sketch uses 15,940 bytes (3%) of program storage space. Maximum is 524,288 bytes.
Global variables use 2,952 bytes of dynamic memory.

