RTOS question

sheepdoll
Sun Mar 10, 2019 6:15 pm
Not sure where to put this.

I found a wonderful YouTube channel https://www.youtube.com/channel/UC-CuJ6 … K3Q/videos which has some really well done tutorials.
One of these tutorials shows how to set up FreeRTOS and three simple threads using the USART.

Now the thread only shows writing to the serial port. I am looking to move from my home-baked microkernel written in AVR assembly. I could port this to STM32, but why reinvent the wheel?

The pipe organ controller morphed out of a floppy disk MIDI player. (which used a really butchered ASM port of the posix floppy driver and requires scheduling.)

Anyway the system is set up with back channel commands. So the main thread waits for user input. There is another high priority thread which scans the keyboard inputs. A third thread queues the outputs. Only the key-scanner thread is synchronous, and on the PC version it is tied to the vertical refresh. On the microcontroller version it is tied to the MIDI clock.

So my question is about doing this with an RTOS. Namely, in idle the default thread polls the serial port and queues the input. If there is an EOL (CR) it processes the command and can run or spawn other threads, which can queue into an action list which the timer queues. The high priority thread polls the keyboards, then sleeps till the next timer tick. If there are actions to process they are deferred too and will run in list order.

Anyway, in the main idle user thread that watches the serial back channel, is there a standard way of handling the serial input? Most of the time this is buffered into a command line; there are however escape codes (literally) that have to be processed with a higher priority.
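
One way to picture the split described above, as a minimal host-runnable sketch in plain C: each received byte is fed to a small line assembler that buffers ordinary characters into a command line and reports ESC at once so it can be handled out of band. All names here (line_feed, line_t, LINE_ESCAPE and so on) are hypothetical and not part of FreeRTOS; in a FreeRTOS build the bytes would presumably arrive through whatever queue the receive interrupt fills.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_MAX_LEN 64

typedef enum {
    LINE_PENDING,   /* byte consumed, command not complete yet */
    LINE_READY,     /* CR or LF seen, buf holds a NUL-terminated command */
    LINE_ESCAPE     /* ESC seen, caller should act on it right away */
} line_event_t;

typedef struct {
    char   buf[LINE_MAX_LEN];
    size_t len;     /* number of characters buffered so far */
} line_t;

/* Feed one received byte into the assembler (hypothetical helper). */
line_event_t line_feed(line_t *l, uint8_t c)
{
    assert(l != NULL);
    if (c == 0x1B) {                  /* ESC: report now, keep the partial line */
        return LINE_ESCAPE;
    }
    if (c == '\r' || c == '\n') {     /* end of line: terminate and hand over */
        l->buf[l->len] = '\0';
        l->len = 0;                   /* next byte starts a new command */
        return LINE_READY;
    }
    if (l->len < LINE_MAX_LEN - 1) {  /* characters past the limit are dropped */
        l->buf[l->len++] = (char)c;
    }
    return LINE_PENDING;
}
```

The caller starts from a zeroed line_t and switches on the return value: LINE_READY goes to the command parser, LINE_ESCAPE goes to whatever needs the faster response. Whether the escape handler runs in the same task or signals a higher priority one is a design choice this sketch leaves open.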

I suppose I can figure this out. I suspect it is more of a syntactical thing between my home baked use of polling timers and event loops, which do not have formal thread declarations. I do make use of semaphores, and lists of pointers which are stored in arrays and called in order and are designed to execute in fixed time windows. On the AVR I let the timers run free in reload mode and poll on them. Only serial can interrupt and that only to fill the ring buffer.

I guess the question is how much of getting buffered serial is handled in the FreeRTOS syntax, and what functions get called. I am not sure one would use a mutex for input the way it is used in the above tutorial for output. Commands spawn threads which have a setup and their own polling loop. These threads are deconstructed and their resources released when complete, probably to save memory space, which is filled with dynamic tables. Some threads are more permanent than others. Most sit in their idle loops or simply consist of a return instruction.

I actually implemented some of the user interface part of this in Postscript (which is stack/ key value based.) Now I am attempting to convert the self un-documenting postscript to C with the Original 68K Assembly as reference.

I think I have an answer here looking for a question but what is the question I am wanting to ask simply?

The problem is that I get bogged down with the example code (I always seem to have the wrong board/processor). The above tutorial is good but it remains cut and paste, which probably most modern programming is. Anyway, if anyone has advice on how to slog through some of this (or write shorter, more focused questions) I am open to suggestion.


mrburnette
Mon Mar 11, 2019 1:14 pm
Julie,

I found the resources of FreeRTOS invaluable in answering my questions regarding its implementation with the Arduino port of the ESP32.
https://www.freertos.org/Documentation/RTOS_book.html

Ray


sheepdoll
Tue Mar 12, 2019 7:51 pm
Thanks for the link.

My searches keep returning me to that site, which is a wonderful set of answers looking for questions.

There is a lot of information to digest.

The main issue with these tutorials, and the reason I like the https://www.youtube.com/channel/UC-CuJ6 … K3Q/videos examples, is that they are not built as HAL CubeMX projects, whereas the video examples take advantage of the user code flags. This is a big downside to the STM examples too: the lack of a CubeMX project.

I think what I am looking for is fairly straightforward. I have working assembly code to convert to C. This implements an RTOS. So a lot of the process is mapping the low level ASM functions (where one sees the data structures) to the high level FreeRTOS calls and callbacks.

The base serial input is a ring buffer. There are ways of calling different serial input routines, so the actual call is a pointer in an array, which is called from an ISR. As noted there is a high priority scanning task which runs at a fixed interval. The serial task checks for data in. If there is no data in, it swaps to the scanning task. The scanning task checks if it is time to scan; if it is too soon, it returns to the serial in task.

The actual serial receive is handled in the ISR. All this does is look at a fixed location (driver array) for the pointer to the serial input routine.
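
The arrangement described above can be sketched in plain C roughly as follows: a single-producer, single-consumer ring buffer, plus a driver array holding the pointer that the receive ISR calls. This is only an illustration under assumptions, not the original ASM design; ring_put, ring_get, rx_driver and uart_rx_isr are hypothetical names, and uart_rx_isr stands in for a real interrupt handler by taking the received byte as an argument. FreeRTOS also provides stream buffers (xStreamBufferSendFromISR and xStreamBufferReceive) that could fill the same role.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64u                  /* must be a power of two */

typedef struct {
    volatile uint8_t  data[RING_SIZE];
    volatile uint16_t head;            /* written only by the ISR side */
    volatile uint16_t tail;            /* written only by the task side */
} ring_t;

/* Producer (ISR side): returns false and drops the byte when full. */
bool ring_put(ring_t *r, uint8_t c)
{
    assert(r != NULL);
    uint16_t next = (uint16_t)((r->head + 1u) & (RING_SIZE - 1u));
    if (next == r->tail) {
        return false;                  /* one slot is always left empty */
    }
    r->data[r->head] = c;
    r->head = next;
    return true;
}

/* Consumer (task side): returns false when there is nothing to read. */
bool ring_get(ring_t *r, uint8_t *out)
{
    assert(r != NULL && out != NULL);
    if (r->head == r->tail) {
        return false;
    }
    *out = r->data[r->tail];
    r->tail = (uint16_t)((r->tail + 1u) & (RING_SIZE - 1u));
    return true;
}

/* Driver array: the ISR calls whichever input routine is installed. */
typedef void (*rx_handler_t)(uint8_t c);

ring_t console_ring;

void console_rx(uint8_t c) { (void)ring_put(&console_ring, c); }

rx_handler_t rx_driver[1] = { console_rx };

/* Stand-in for the UART receive interrupt, given the byte just read. */
void uart_rx_isr(uint8_t received)
{
    rx_driver[0](received);
}
```

Because only the ISR writes head and only the task writes tail, no lock is needed on a single core as long as each index update is a single store. Swapping the input routine is then just a matter of writing a different function pointer into rx_driver[0].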

I found some examples via the freeRTOS.org and links to github repositories, however unlike the MYaqoobEmbedded videos, they get real complex real fast and have lots of answers for questions I am not asking.

In the ASM code the call is simply called swap(), which swaps out the current task. Then the ASM code looks to see if there is new data in the ring. This is pretty much the idle loop. When a return is found the command parser parses the command string (which may spawn other tasks which control window functions, and may respond to keyboard control characters like the function keys, arrow keys and paging keys). Each task stores the window GUI (in ANSI terminal escape codes) as well as the current input cursor location on the screen. Most of the parsed functions are blocking in the edit mode. There are also run modes which block some of the edit tasks.

I suppose I will have to trial and error this, as there does not seem to be a standard way of implementing an RTOS receive ring buffer.


ag123
Tue Mar 12, 2019 8:25 pm
this is just my 2 cents, i tried using rtos on stm32f103, i stumbled into a couple of problems
1) running out of memory, if you have a couple of vtasks and have queues / ring buffers passing things between them, memory runs out pretty fast
this is in part because memory is highly fragmented, as each vtask needs to reserve some stack space and in fact i think a block of it is reserved
and we only have that scarce 20k for everything on stm32f103
2) we often casually call out to ‘non rtos’ functions like Serial.println("hello world");
the catch is those ‘non rtos’ functions may not be thread safe, and unlikely to be thread safe
i imagine that rtos gives each vtask a 1ms timeslice and (context) switches between the vtasks, this could create race conditions
where you are writing to some (shared) memory and the next vtask that come alive write into the same memory
especially so for ‘non rtos’ functions

in the end i developed this rather elaborate event loop !
viewtopic.php?f=18&t=4299
as event loop does cooperative multitasking (there is no context switch), each function is a normal c function or c++ method.
it runs to the end and returns. if your function or method don’t return, all the other tasks get held up,
so you need to explicitly keep state and yield so that other tasks gets a chance to run

the simpler form of that event loop is what i called a round robin scheduler that is nothing more than calling each of the functions in loop()
viewtopic.php?f=18&t=2117

and one of those things is i hooked the systick event to set a flag, you could take a look at that round robin scheduler
this is so that i call the functions only when systick interrupt fired (i.e. every 1 ms as that is what is there in libmaple)
if it is some other interrupt i simply go to asm("wfi") and wait for the next (systick) interrupt, i.e. sleep, hence saving power as well
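
a minimal sketch of the tick-gated round robin idea described above, written as plain C so it can be tried on a host. it is not the code from the linked topic; scheduler_poll, systick_isr and the two counting tasks are hypothetical names, and the place where a real Cortex-M build would execute WFI is only marked with a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*task_fn_t)(void);

static volatile uint8_t tick_flag;    /* set by the tick interrupt */

/* Stand-in for the SysTick handler: just record that a tick happened. */
void systick_isr(void) { tick_flag = 1; }

/* One pass of the main loop. Returns how many tasks were called:
 * all of them if a tick was pending, otherwise none. */
size_t scheduler_poll(const task_fn_t *tasks, size_t count)
{
    assert(tasks != NULL);
    if (!tick_flag) {
        /* Woken by some other interrupt: a Cortex-M build would
         * execute WFI here and sleep until the next interrupt. */
        return 0;
    }
    tick_flag = 0;
    for (size_t i = 0; i < count; i++) {
        tasks[i]();                   /* each task runs to completion */
    }
    return count;
}

/* Two example tasks that only count their invocations. */
unsigned scan_calls, serial_calls;
void scan_keys(void)   { scan_calls++; }
void poll_serial(void) { serial_calls++; }
```

in use, loop() would call scheduler_poll() with a table such as { scan_keys, poll_serial }. the order of the table is the order of execution, which is what keeps the dependencies sequential.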

and what is the benefit of this ‘co-operative’ multi-tasking that does not do any context switch?
there is no context switch, you plan and execute everything in the correct sequence and keep all the states correctly in each function call
this keeps all the dependencies in check *sequentially* – hence locks/semaphores are unnecessary, and it saves *a lot* of memory as everything is kept on one stack, the stack (and those local variables) gets unrolled automatically each time the function / method returns (c, c++ does it all)

a *sequential* event loop works, but sacrifice performance? yes and no
just look at java and swing
https://docs.oracle.com/javase/tutorial … patch.html
Swing event handling code runs on a special thread known as the event dispatch thread. Most code that invokes Swing methods also runs on this thread. This is necessary because most Swing object methods are not “thread safe”: invoking them from multiple threads risks thread interference or memory consistency errors.

even Swing GUI is not fully multi threaded in this very day and age where we have processors up to 128 high performance superscalar cores perhaps capable of petaflops
only one slowest thread runs on a minuscule thread on that core of all that processing power holding up the whole universe
i think this is even true of those google / microsoft turing test breaking ai deep learning deep neural networks, those neural networks use thousands of gpu cuda cores perhaps harnessing petaflops of power to do that one iteration update billions of the states and weights in the network.
but stochastic gradient descent is done on a *non linear* data landscape and you’d simply need to take one step at a time down hoping to find that ‘bottom’
try to take big steps or parallelise things and you may instead simply descend into chaos and never converge

that is the way it is, but sequential solves the problem of locks and semaphores
and it is more so on stm32f103 with 20k sram to be shared among all ‘tasks’
and i actually used my own event loop mentioned
;)


Squonk42
Tue Mar 12, 2019 9:11 pm
The next step is to have each “process” in the event loop work as a coroutine.

The most efficient method is to implement it in the form of a finite-state automaton that execute the code in the current state on each loop, then yields the CPU back to the main loop. The maximum time you allow for each given state will give you your “soft latency”.
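
As a rough illustration of that idea, here is a minimal finite-state automaton in plain C: one step function that runs the code for the current state and returns to the main loop straight away. The blinker, its field names and the millisecond time argument are all assumptions made for the example; pin_level stands in for an actual GPIO write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { LED_OFF, LED_ON } led_state_t;

typedef struct {
    led_state_t state;      /* current state of the automaton */
    uint32_t    deadline;   /* time (ms) of the next transition */
    int         pin_level;  /* stand-in for the real output pin */
} blinker_t;

/* Run the current state once, then yield back to the caller. */
void blinker_step(blinker_t *b, uint32_t now_ms, uint32_t period_ms)
{
    assert(b != NULL);
    /* Signed difference keeps the comparison valid across timer wrap. */
    if ((int32_t)(now_ms - b->deadline) < 0) {
        return;                          /* not due yet: yield at once */
    }
    switch (b->state) {
    case LED_OFF:
        b->pin_level = 1;
        b->state = LED_ON;
        break;
    case LED_ON:
        b->pin_level = 0;
        b->state = LED_OFF;
        break;
    }
    b->deadline = now_ms + period_ms;    /* schedule the next transition */
}
```

The longest path through any one case is the time the main loop is held up by this process, which is the "soft latency" bound mentioned above.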

If you’d rather write your code in a linear threaded form, you can use protothreads to do the syntactic sugar on top of it.

Also, don’t forget to separate clearly IRQ handling into a separate domain from the main event loop, and convert each interrupt into a queued event (FIFO). The maximum time you will spend consecutively in the IRQ domain will give you your “hard latency”.


sheepdoll
Tue Mar 12, 2019 9:57 pm
On AVR I tend to use event loop processing with polling on the interrupt flags. There is a bit of a state machine in the event loop, so the state determines the order that the pseudo tasks run.

I actually implemented, on an AT90S8515 no less, a full microkernel with message queues and semaphores. It helped that I had 128K of external SRAM. This was for a floppy disk player which used a POSIX type driver. Got sort of unwieldy.

These days I tend to keep everything in the event loop. Mostly for static debug. In C every call generates a stack frame, which on the Mega328 eats memory quick. So yeah I can see where the suggestions are coming from.

I did convert the UI part of the code to run in PostScript (spitting out ANSI terminal codes to the back channel rather than drawing to the rendering page). So I could probably simply manage the thread structures as the 68K ASM code does. It is more that since I am rewriting this from scratch I wanted to use a more modern API.

I also have some code snippets from the latest version of the program for windows. However these are written in C++ and use templates and operator overloading, not to mention the windows stdafx and all the window structure overhead. I really want a more portable C solution.


ag123
Wed Mar 13, 2019 6:24 am
one of the issues with event loop is it makes the handling of states cryptic, as mostly one thinks in terms of state transitions and codes the switch or if-else statement block to handle the state transitions.
to make life simpler (for myself, i hope for others), i created that *event loop*
viewtopic.php?f=18&t=4299
and one of those things i thought about is the AsyncWait() utility class i made as part of that.
so a blinky looks like this
https://github.com/ag88/stm32duino-even … edTask.cpp
bool CLedTask::handleEvent(Event& event) {
    switch(event.event) {
    case EventID::LedTaskLedOn:
        digitalWrite(BOARD_LED_PIN, HIGH);
        event.handle_id = EHandleID::LedTask;
        event.event = EventID::LedTaskLedOff;
        WaitMgr.await(ledduration,event,false);
        break;
    case EventID::LedTaskLedOff:
        digitalWrite(BOARD_LED_PIN, LOW);
        event.handle_id = EHandleID::LedTask;
        event.event = EventID::LedTaskLedOn;
        WaitMgr.await(ledduration,event,false);
        break;
    default:
        break;
    }
    return true;
}

Squonk42
Wed Mar 13, 2019 6:40 am
Look at my protothreads link above and how they really work. They efficiently mask the state machine horror into a classic thread-like programming flow. It only costs 1 word per thread.

ag123
Wed Mar 13, 2019 6:58 am
thanks i’d check out protothreads, it is possibly a better way and may consume less memory

Squonk42
Wed Mar 13, 2019 1:07 pm
BTW, the same Duff’s Device used in the protothreads can be used to implement coroutines in C too.
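
For anyone who has not seen the trick, a minimal sketch of a switch-based coroutine in plain C follows. It shows the general technique only (a case label placed inside a loop body, as in Duff's Device) and these are not the protothreads library's own macros; CO_BEGIN, CO_YIELD, CO_END and counter_next are hypothetical names. The resume point is the single int stored per coroutine, and any variable that must survive a yield has to live in the struct rather than on the stack.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int line;   /* resume point: 0 means "start from the top" */
    int i;      /* loop counter kept here because locals do not survive a yield */
} counter_co_t;

/* Jump to the saved resume point, or fall into case 0 on first entry. */
#define CO_BEGIN(co)      switch ((co)->line) { case 0:
/* Save the resume point, return a value, and continue here next call. */
#define CO_YIELD(co, val) do { (co)->line = __LINE__; return (val); \
                               case __LINE__:; } while (0)
/* Reset so the coroutine can be restarted, and signal completion. */
#define CO_END(co)        } (co)->line = 0; return -1

/* Yields 0, 1, 2 on successive calls, then -1 once finished. */
int counter_next(counter_co_t *co)
{
    assert(co != NULL);
    CO_BEGIN(co);
    for (co->i = 0; co->i < 3; co->i++) {
        CO_YIELD(co, co->i);
    }
    CO_END(co);
}
```

Starting from a zeroed counter_co_t, each call resumes inside the for loop where the previous one returned, so the body reads like ordinary linear code while the function still returns to the event loop on every yield.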

ag123
Wed Mar 13, 2019 2:13 pm
interesting stuff, i’ve not gone back to read those Donald Knuth’s stuff for years, maybe i should read them again, the last time i read a little and fell asleep (it is pretty thick) and later abandoned the effort
donald knuth has been somewhat critical of multi-core processors:
I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

unfortunately this is still very much the state of the art today, and worse, multi-core become thousands (gpu cuda) to millions of cores
A million ARM cores to host brain simulator
https://www.eetimes.com/document.asp?doc_id=1266450
https://www.allaboutcircuits.com/news/s … itched-on/
https://hothardware.com/news/spinnaker- … uman-brain
i’m not sure where this multi core and massive number of cores madness is going to take us in future, unfortunately it seemed that is still the state of the art and people are going to try as many cores as is possible. it seemed supercomputers of the future are the new aluminium smelters using millions of amps and gigawatts but not to melt aluminium but to try to run as many cores as is possible, i’m not sure if the heat is good enough to cause nuclear fusion
:lol:


Squonk42
Wed Mar 13, 2019 2:18 pm
Yes, The Art of Computer Programming is a good bedside book :D

MoDu
Wed Mar 13, 2019 3:22 pm
I believe it may have already started. Hyper-threading is already going the way of the dodo: turns out it’s fundamentally incompatible with CPU security.

sheepdoll
Wed Mar 13, 2019 4:31 pm
My Knuth books went to the Salvation Army when I moved and had to downsize a few things.

Personally I have never been much of a fan of threaded programming, although I did cut my teeth on an HP timeshare system. It does however have its uses even if it is more illusion. Debugging is a bit of work as one never really knows what the other thread is doing, unless one delves into the scheduler itself.

The microkernel I wrote years back, based on a book called C and the 8051, was round robin. The only pre-emptive thing that I have ever really needed is the serial back channel, which I think is the basis of what I am still looking for.

I have found some interesting tutorials on serial with DMA and idle line sensing for receiving async data. I also discovered why I was not getting serial from my Nucleo 401. I had pulled the TX and RX straps from the ST-Link side. I did this with my F103 Nucleo (which I seem to have misplaced) to use the alternative serial port. On the Nucleo F401 there are two additional straps to move the UART2 TX and RX to the Arduino/Morpho pins. So some of the issue was hardware rather than software.

I am leaning toward retaining the round robin scheduler from the 68K asm code as it is tied to the GUI. In it each task has a stack and a window. It does however mean that I will need to learn how to manually move the stack pointer, unless I do retain the RTOS thread mechanism.

There is much to learn and I am grateful for the discussions here while they exist.


ag123
Thu Mar 14, 2019 5:03 am
on the stm32(f103) i found out that with the hardware usart you need to explicitly check framing errors and parity errors in the default setup
viewtopic.php?f=53&p=53734
otherwise bytes with framing errors or parity errors get passed along with valid data
this confuses quite a number of apps as apparently apps assume a clean data channel where only valid data are transmitted in the pipeline
that 2 liner patch i did to check framing errors and parity errors made a visible difference at least with stm32loader
after that patch it works each time every time, prior that i had a lot of erratic errors and i’ve been wondering what’s wrong with my sketch that does a usb-serial dongle job
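
to show the kind of check meant here (this is not the actual patch from the linked topic), a small host-runnable sketch in plain C: given the values read from the usart status and data registers, it only passes the byte on when neither the parity nor the framing error flag is set. the bit positions are those of USART_SR on the stm32f1 family; the names SR_PE, SR_FE, SR_RXNE and usart_accept are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* USART_SR bit positions on the STM32F1 family (local names). */
#define SR_PE    (1u << 0)   /* parity error */
#define SR_FE    (1u << 1)   /* framing error */
#define SR_RXNE  (1u << 5)   /* receive data register not empty */

/* Decide whether a received byte should be passed on.
 * sr and dr are the values read from the status and data registers,
 * in that order; on the F1 that read sequence clears the error flags. */
bool usart_accept(uint32_t sr, uint32_t dr, uint8_t *out)
{
    assert(out != NULL);
    if (!(sr & SR_RXNE)) {
        return false;                 /* nothing was received */
    }
    if (sr & (SR_PE | SR_FE)) {
        return false;                 /* corrupt byte: drop it */
    }
    *out = (uint8_t)(dr & 0xFFu);
    return true;
}
```

in a receive interrupt this would be called with the two register values just read, and the byte would go into the ring buffer only when it returns true. whether to also count or report the dropped bytes is left open.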

that may help if you are meddling with the usart, stm32f103 is a good alternative to ft232 or even ftdi 2232h serial bridge, it has all that hardware if one wants to do usb – usart (x3), spi (x2), i2c (x2), can, adc and generally gpio bridges, all it takes in the middle is that firmware or your sketch. And stm32f103 is (much) more capable than a usb to serial bridge


MoDu
Thu Mar 14, 2019 12:09 pm
[sheepdoll – Wed Mar 13, 2019 4:31 pm] –
Personally I have never been much of a fan of threaded programming. (…)
I am leaning toward retaining the round robin scheduler from the 68K asm code as it is tied to the GUI. In it each task has a stack and a window. It does however mean that I will need to learn how to manually move the stack pointer, unless I do retain the RTOS thread mechanism.

I’ve been having great success with a co-op scheduler. I use the Object Oriented version, which allows me to keep everything separated into classes and libraries, with the only configuration being the declaration (pass the scheduler instance). This has its tight limitations (largest processing time for each task = minimum guaranteed response time), but the overhead is so low I easily use it on ATmega or a Maple Mini.

[sheepdoll – Wed Mar 13, 2019 4:31 pm] –
There is much to learn and I am grateful for the discussions here while they exist.

Wholeheartedly agree.

