I tried today the attachInterrupt function and I saw that this slow approx 5 to 10usec. Is there any chance to speed up to approx 200nsec?
buaus
There is some overhead because the interrupt will not call your function right away, instead it calls a function in the core, that keeps a table of what functions to call for each interrupt, and that’s going to call yours, but but still 10uS are 720 cpu cycles, seems a bit too much.
Take a look at exti.c and ext_interrupts.cpp.
I used to measure the time a routine which was setting directly a Port Pin after receiving the Interrupt.
I saw in meantime also within an other thread this topic, which confirm “weakness” of the attachInterrupt function, so I set up today in IAR workbench a C_Project. Finally I reach now approx 470ns.
buaus
That being said, I’m surprised that the response time is as slow as you mentioned.
You didn’t describe exactly how you tested this
I presume you must have an external signal generator, creating the pulses get the ISR code to set a GPIO pin and connect both the input pin and the GPIO to a logic analyser to measure the time delay
Presumably you have factored in the time to set the GPIO bit by using digitalWrite – which in its self is not that fast, or did you directly set the BSSR register after pre-defining the device and the pin data in your global setup ?
Take a look at
https://github.com/rogerclarkmelbourne/ … #L273-L292
But I presume LeafLabs had a reason to do it that way, rather than simply using a lookup table
[RogerClark – Thu Aug 24, 2017 10:37 pm] –
I looked at how this operates and I’m not surprised its slowTake a look at
https://github.com/rogerclarkmelbourne/ … #L273-L292
But I presume LeafLabs had a reason to do it that way, rather than simply using a lookup table
I believe that’s for the lines that share 1 interrupt vector for several lines.
Yes. Thats the worst case.
There do some to be some possibilities for optimisation, depending if the compiler has already done the optimisation…
If all the handlers were initially set to go to a dummy function, there would be no need to check whether the handler was valid
This would save 1 “if” function, and “if”s seem to be where ARM code gets slowed down
However when EXTIs share an ISR, its much slower at around 1679nS on my Maple Mini
If I modify the code and add a dummyHandler() which is the default for all EXTI’s, the code can be changed to remove one “if” statement and also one variable declaration, and this reduces the time by around 100nS
If I turn up the optimisation to -O3 the time reduces to 958 nS
-O3 with LTO makes things slower
Note. I get warnings when compiling on -O3 which probably need to be investigated – but this is nothing to do with the interrupt latency
I think things could be speeded up a bit more my unwinding the for loop, as the dispatch_extis function is only ever called from 2 places
void __irq_exti9_5(void) {
dispatch_extis(5, 9);
}
void __irq_exti15_10(void) {
dispatch_extis(10, 15);
}
the code could be moved to the calling function, and the bit patterns could be pre-calculated, so that this shift and store could be removed
uint32 eb = (1U << exti);
However, I don’t think we would get a massive speed improvement, so its probably not worth the increase in code size
INTR Latency MIN=3262ns MAX=26205ns AVER=3343ns
INTR Latency MIN=3262ns MAX=26205ns AVER=3343ns
..I wonder what is making it so slow
[RogerClark – Thu Aug 24, 2017 11:56 pm] –
If all the handlers were initially set to go to a dummy function, there would be no need to check whether the handler was validThis would save 1 “if” function, and “if”s seem to be where ARM code gets slowed down
Additionally, the passed pointer to arguments seems to be unused, so it could be removed (exti_channels[exti].arg = not needed).
See here: NULL parameter passed: https://github.com/rogerclarkmelbourne/ … exti.c#L94
So the table would reduce to one column containing only the void function pointers.
This would save RAM, FLASH and run-time.
EDIT
Maybe using a while loop instead of a for loop would bring small speed increase, when using the second parameter the number of extis to check instead of the end exti.
for example: instead of
dispatch_extis(5, 9); // start and end EXTI number
...
/* Dispatch user handlers for pending EXTIs. */
for (exti = start; exti <= stop; exti++) {
uint32 eb = (1U << exti);
if (pr & eb) {
voidArgumentFuncPtr handler = exti_channels[exti].handler;
if (handler) {
handler(exti_channels[exti].arg);
handled_msk |= eb;
}
}
}
I’m having a hard time understanding the improvement from this line:
exti_channels[start].handler(); // void, should be set to dummy if not usedThe handler function is never null, unless you call attach interrupt and pass null.
The change I was testing sets all handles to the dummy handler function, in the global init, and also in the detachInterrpt.
I agree that if the board is getting interrupts on pins which you don’t expect, it will slow down a bit because of the call overhead to call the dummy handler, but even with the handler set to null and the extra if statement to check for that… it would slow the code down a hell of a lot. So I think we would have noticed if interrupts were enabled on pins that we had not expected them to be enabled upon.
Steve.
yes, if the argument that is passed to the handler is always null we may as well remove it
[victor_pv – Mon Aug 28, 2017 3:04 pm] –
We could set it to a function containing only return, but then for every line that has that dummy handler, it would still call that empty function that only returns, and that i believe could bring down the speed as the MCU would have to branch out to that function, then back again.
I would think that has a bigger speed penalty than checking a RAM value !=0, since it would not take the MCU out of that loop., and RAM access is at full speed.
You may be right with that.
So we can keep the check for NULL:
/* Dispatch user handlers for pending EXTIs. */
while (nr_entis--) {
uint32 eb = (1U << start);
if (pr & eb) {
handled_msk |= eb; // moved one line up
register voidFuncPtr handler = exti_channels[exti].handler;
if (handler) {
handler();
}
}
start++;
}
static exti_channel exti_channels[] = {
{ .handler = NULL, .arg = NULL }, // EXTI0
{ .handler = NULL, .arg = NULL }, // EXTI1
{ .handler = NULL, .arg = NULL }, // EXTI2
{ .handler = NULL, .arg = NULL }, // EXTI3
{ .handler = NULL, .arg = NULL }, // EXTI4
{ .handler = NULL, .arg = NULL }, // EXTI5
{ .handler = NULL, .arg = NULL }, // EXTI6
{ .handler = NULL, .arg = NULL }, // EXTI7
{ .handler = NULL, .arg = NULL }, // EXTI8
{ .handler = NULL, .arg = NULL }, // EXTI9
{ .handler = NULL, .arg = NULL }, // EXTI10
{ .handler = NULL, .arg = NULL }, // EXTI11
{ .handler = NULL, .arg = NULL }, // EXTI12
{ .handler = NULL, .arg = NULL }, // EXTI13
{ .handler = NULL, .arg = NULL }, // EXTI14
{ .handler = NULL, .arg = NULL }, // EXTI15
};
So it does not make too much sense to avoid the NULL pointer check by calling a dummy function (and to return from that).
I think the check the function pointer for NULL is safer when done within the ISR, to ensure that only valid functions will be executed. But your version (to check for NULL at registering time) is also feasible.
Anyway, I think we can agree on removing the ARG member from the array.
Perhaps leaflabs wanted to add some functionality and did not finish, or instead they were removing functionality and forgot to take that out?
About the branching to Dummy, I think Steve got my point, that in the F103 due to flash wait states jumping and returning to anything further than a couple of instructions ahead takes much longer than checking 1 RAM variable and then either continue execution with no penalty or do the jumping/returning.
EDIT: I see the arg is actually used. It is passed as parameter to the handler:
https://github.com/rogerclarkmelbourne/ … xti.c#L284
From what I understand, it’s intended so allow for a single user ISR to manage all ESXTI interrupt callbacks, so it gets the arg as a parameter when called back.
So I can register the same function to be called, but for line 1 the arg parameter could be a 1, and for line 2 arg could be a 2, so when my function is called I know which line is causing the call.
Of course I could write 2 different ISRs, but perhaps the function has to do exactly the same process, so writing twice the same function is a waste.
[victor_pv – Tue Aug 29, 2017 3:54 pm] –
So I can register the same function to be called, but for line 1 the arg parameter could be a 1, and for line 2 arg could be a 2, so when my function is called I know which line is causing the call.
Of course I could write 2 different ISRs, but perhaps the function has to do exactly the same process, so writing twice the same function is a waste.
I see here a kind of overkill: once you have different functions for different pins, why do you need extra different arguments?
Of course, it is possible to use same function for different pins, but it makes not much sense. Even though, you could manage that use case by defining Func1(), Func2(), Func3() each of them calling Func0(param) with different parameters, thus having minimal overhead, but speedy execution.
But if you remove it, just make sure we don’t see that being used in any library.
I doubt anyone has used this feature, so I think it would be fairly safe to remove it.
But I still don’t see the problem with removing the null pointer check before calling the handler, as it is far more efficient to put that check in the code that sets up the handler than to check it every time there is an interrupt
I think checking for Null should not cost many cycles, and calling any function, even a dummy one, may cost more, but who knows what the compiler will do when optimizing.
I’ve not tried removing the argument passing
I’ve also not ran the dhrystone test with this change to see if it would impact overall performance, but I can’t see how it could do that, unless there is some other bug in the code where an interrupt is constantly getting triggered on a null handler
[RogerClark – Tue Aug 29, 2017 10:30 pm] –
I checked the official API and the handler does not receive an argument.I doubt anyone has used this feature, so I think it would be fairly safe to remove it.
FYI: here is the official Arduino PR for this feature: https://github.com/arduino/Arduino/pull/4519
The context dependent parameter passing, when needed, can be replaced by implementing the CB function in a different way.
Lets leave this.
The Arduino API is designed for ease of use at the expense of speed and memory etc.
So what we now have works OK, and if someone needs a faster implementation they can play with the vector table directly etc



