[BY DESIGN] Slow response to external ISR when using AttachInterrupt

buaus
Wed Aug 23, 2017 4:40 pm
Hi,
I tried today the attachInterrupt function and I saw that this slow approx 5 to 10usec. Is there any chance to speed up to approx 200nsec?
buaus

Pito
Wed Aug 23, 2017 5:10 pm
How did you measure the 5-10us interrupt latency?

victor_pv
Wed Aug 23, 2017 8:46 pm
Is that the latency you measure between driving a pin to a certain level and your function being called? that seems like a lot.
There is some overhead because the interrupt will not call your function right away, instead it calls a function in the core, that keeps a table of what functions to call for each interrupt, and that’s going to call yours, but but still 10uS are 720 cpu cycles, seems a bit too much.

peekay123
Thu Aug 24, 2017 6:30 pm
Well, kind of. For external interrupts that share a common IRQ (exti9_5 and exti15_10), the firmware will scan each of the interrupt pending bits (of the shared IRQ register) and, if defined, call the ISR pointed to in the vector table. If multiple ISRs fire at the same time or the interrupt pending bit is sampled later in the sampling loop then there will be delays in calling that specific ISR (scanning delay + time to service other ISR). This may or may not explain the 5us to 10us latency in your situation.

Take a look at exti.c and ext_interrupts.cpp.


buaus
Thu Aug 24, 2017 7:26 pm
Mandy thanks for the many helpfull comments.
I used to measure the time a routine which was setting directly a Port Pin after receiving the Interrupt.
I saw in meantime also within an other thread this topic, which confirm “weakness” of the attachInterrupt function, so I set up today in IAR workbench a C_Project. Finally I reach now approx 470ns.��
buaus

RogerClark
Thu Aug 24, 2017 10:07 pm
The Arduino API is not written to be fast, so if you want quick response times, you would need to do more low level programming.

That being said, I’m surprised that the response time is as slow as you mentioned.

You didn’t describe exactly how you tested this

I presume you must have an external signal generator, creating the pulses get the ISR code to set a GPIO pin and connect both the input pin and the GPIO to a logic analyser to measure the time delay

Presumably you have factored in the time to set the GPIO bit by using digitalWrite – which in its self is not that fast, or did you directly set the BSSR register after pre-defining the device and the pin data in your global setup ?


RogerClark
Thu Aug 24, 2017 10:37 pm
I looked at how this operates and I’m not surprised its slow

Take a look at

https://github.com/rogerclarkmelbourne/ … #L273-L292

But I presume LeafLabs had a reason to do it that way, rather than simply using a lookup table


victor_pv
Thu Aug 24, 2017 10:41 pm
[RogerClark – Thu Aug 24, 2017 10:37 pm] –
I looked at how this operates and I’m not surprised its slow

Take a look at

https://github.com/rogerclarkmelbourne/ … #L273-L292

But I presume LeafLabs had a reason to do it that way, rather than simply using a lookup table

I believe that’s for the lines that share 1 interrupt vector for several lines.


RogerClark
Thu Aug 24, 2017 11:56 pm
Victor.

Yes. Thats the worst case.

There do some to be some possibilities for optimisation, depending if the compiler has already done the optimisation…

If all the handlers were initially set to go to a dummy function, there would be no need to check whether the handler was valid

This would save 1 “if” function, and “if”s seem to be where ARM code gets slowed down


Pito
Fri Aug 25, 2017 8:47 am
FYI:
http://www.stm32duino.com/viewtopic.php?f=18&t=2501

RogerClark
Fri Aug 25, 2017 11:55 am
Using Pito’s test program, I tried 2 different sets of pins, and for pins where the EXTIs do not share an IRQ, the response time was 500nS, which is almost good as @buaus gets using IAR

However when EXTIs share an ISR, its much slower at around 1679nS on my Maple Mini

If I modify the code and add a dummyHandler() which is the default for all EXTI’s, the code can be changed to remove one “if” statement and also one variable declaration, and this reduces the time by around 100nS

If I turn up the optimisation to -O3 the time reduces to 958 nS

-O3 with LTO makes things slower

Note. I get warnings when compiling on -O3 which probably need to be investigated – but this is nothing to do with the interrupt latency

I think things could be speeded up a bit more my unwinding the for loop, as the dispatch_extis function is only ever called from 2 places

void __irq_exti9_5(void) {
dispatch_extis(5, 9);
}

void __irq_exti15_10(void) {
dispatch_extis(10, 15);
}

the code could be moved to the calling function, and the bit patterns could be pre-calculated, so that this shift and store could be removed

uint32 eb = (1U << exti);

However, I don’t think we would get a massive speed improvement, so its probably not worth the increase in code size


Pito
Fri Aug 25, 2017 12:42 pm
With Pito’s code and STM32Generic (SerialUSB, default opt, MapleMini) after a minute:
INTR Latency MIN=3262ns MAX=26205ns AVER=3343ns
INTR Latency MIN=3262ns MAX=26205ns AVER=3343ns
..

RogerClark
Fri Aug 25, 2017 12:44 pm
Ummm

I wonder what is making it so slow


stevestrong
Mon Aug 28, 2017 2:17 pm
[RogerClark – Thu Aug 24, 2017 11:56 pm] –
If all the handlers were initially set to go to a dummy function, there would be no need to check whether the handler was valid

This would save 1 “if” function, and “if”s seem to be where ARM code gets slowed down

Additionally, the passed pointer to arguments seems to be unused, so it could be removed (exti_channels[exti].arg = not needed).
See here: NULL parameter passed: https://github.com/rogerclarkmelbourne/ … exti.c#L94
So the table would reduce to one column containing only the void function pointers.
This would save RAM, FLASH and run-time.

EDIT
Maybe using a while loop instead of a for loop would bring small speed increase, when using the second parameter the number of extis to check instead of the end exti.
for example: instead of
dispatch_extis(5, 9); // start and end EXTI number
...
/* Dispatch user handlers for pending EXTIs. */
for (exti = start; exti <= stop; exti++) {
uint32 eb = (1U << exti);
if (pr & eb) {
voidArgumentFuncPtr handler = exti_channels[exti].handler;
if (handler) {
handler(exti_channels[exti].arg);
handled_msk |= eb;
}
}
}


victor_pv
Mon Aug 28, 2017 3:04 pm
Steve,
I’m having a hard time understanding the improvement from this line:
exti_channels[start].handler(); // void, should be set to dummy if not used

RogerClark
Mon Aug 28, 2017 10:31 pm
Victor

The handler function is never null, unless you call attach interrupt and pass null.

The change I was testing sets all handles to the dummy handler function, in the global init, and also in the detachInterrpt.

I agree that if the board is getting interrupts on pins which you don’t expect, it will slow down a bit because of the call overhead to call the dummy handler, but even with the handler set to null and the extra if statement to check for that… it would slow the code down a hell of a lot. So I think we would have noticed if interrupts were enabled on pins that we had not expected them to be enabled upon.

Steve.

yes, if the argument that is passed to the handler is always null we may as well remove it


stevestrong
Tue Aug 29, 2017 8:36 am
[victor_pv – Mon Aug 28, 2017 3:04 pm] –
We could set it to a function containing only return, but then for every line that has that dummy handler, it would still call that empty function that only returns, and that i believe could bring down the speed as the MCU would have to branch out to that function, then back again.
I would think that has a bigger speed penalty than checking a RAM value !=0, since it would not take the MCU out of that loop., and RAM access is at full speed.

You may be right with that.
So we can keep the check for NULL:
/* Dispatch user handlers for pending EXTIs. */
while (nr_entis--) {
uint32 eb = (1U << start);
if (pr & eb) {
handled_msk |= eb; // moved one line up
register voidFuncPtr handler = exti_channels[exti].handler;
if (handler) {
handler();
}
}
start++;
}


RogerClark
Tue Aug 29, 2017 10:35 am
But you can initialise the handlers to dummyHandler() here

static exti_channel exti_channels[] = {
{ .handler = NULL, .arg = NULL }, // EXTI0
{ .handler = NULL, .arg = NULL }, // EXTI1
{ .handler = NULL, .arg = NULL }, // EXTI2
{ .handler = NULL, .arg = NULL }, // EXTI3
{ .handler = NULL, .arg = NULL }, // EXTI4
{ .handler = NULL, .arg = NULL }, // EXTI5
{ .handler = NULL, .arg = NULL }, // EXTI6
{ .handler = NULL, .arg = NULL }, // EXTI7
{ .handler = NULL, .arg = NULL }, // EXTI8
{ .handler = NULL, .arg = NULL }, // EXTI9
{ .handler = NULL, .arg = NULL }, // EXTI10
{ .handler = NULL, .arg = NULL }, // EXTI11
{ .handler = NULL, .arg = NULL }, // EXTI12
{ .handler = NULL, .arg = NULL }, // EXTI13
{ .handler = NULL, .arg = NULL }, // EXTI14
{ .handler = NULL, .arg = NULL }, // EXTI15
};


stevestrong
Tue Aug 29, 2017 11:34 am
Roger, the issue is that to call the dummy function and to return from that takes more time than just checking the function pointer for NULL.
So it does not make too much sense to avoid the NULL pointer check by calling a dummy function (and to return from that).

I think the check the function pointer for NULL is safer when done within the ISR, to ensure that only valid functions will be executed. But your version (to check for NULL at registering time) is also feasible.

Anyway, I think we can agree on removing the ARG member from the array.


victor_pv
Tue Aug 29, 2017 3:54 pm
If the arg argument is not used, then yeah let’s remove it. Any idea why was it there in the first place?
Perhaps leaflabs wanted to add some functionality and did not finish, or instead they were removing functionality and forgot to take that out?

About the branching to Dummy, I think Steve got my point, that in the F103 due to flash wait states jumping and returning to anything further than a couple of instructions ahead takes much longer than checking 1 RAM variable and then either continue execution with no penalty or do the jumping/returning.

EDIT: I see the arg is actually used. It is passed as parameter to the handler:
https://github.com/rogerclarkmelbourne/ … xti.c#L284

From what I understand, it’s intended so allow for a single user ISR to manage all ESXTI interrupt callbacks, so it gets the arg as a parameter when called back.
So I can register the same function to be called, but for line 1 the arg parameter could be a 1, and for line 2 arg could be a 2, so when my function is called I know which line is causing the call.
Of course I could write 2 different ISRs, but perhaps the function has to do exactly the same process, so writing twice the same function is a waste.


stevestrong
Tue Aug 29, 2017 4:19 pm
[victor_pv – Tue Aug 29, 2017 3:54 pm] –
So I can register the same function to be called, but for line 1 the arg parameter could be a 1, and for line 2 arg could be a 2, so when my function is called I know which line is causing the call.
Of course I could write 2 different ISRs, but perhaps the function has to do exactly the same process, so writing twice the same function is a waste.

I see here a kind of overkill: once you have different functions for different pins, why do you need extra different arguments?
Of course, it is possible to use same function for different pins, but it makes not much sense. Even though, you could manage that use case by defining Func1(), Func2(), Func3() each of them calling Func0(param) with different parameters, thus having minimal overhead, but speedy execution.


victor_pv
Tue Aug 29, 2017 7:42 pm
Yeah I guess it could be done that way too. I just think that may have been the intention for leaflabs when they added that argument, or perhaps to pass a class pointer since the argument is a pointer, so each ping could have a pointer to a class, and a pointer to a function within that class.

But if you remove it, just make sure we don’t see that being used in any library.


RogerClark
Tue Aug 29, 2017 10:30 pm
I checked the official API and the handler does not receive an argument.

I doubt anyone has used this feature, so I think it would be fairly safe to remove it.

But I still don’t see the problem with removing the null pointer check before calling the handler, as it is far more efficient to put that check in the code that sets up the handler than to check it every time there is an interrupt


victor_pv
Thu Aug 31, 2017 2:46 pm
Perhaps run a test with both options and compare code size, ram used, and latency?
I think checking for Null should not cost many cycles, and calling any function, even a dummy one, may cost more, but who knows what the compiler will do when optimizing.

RogerClark
Fri Sep 01, 2017 12:24 am
I ran Pitos test and removed the check for the null hander as as expected its a bit faster, by around 75nS I think.

I’ve not tried removing the argument passing

I’ve also not ran the dhrystone test with this change to see if it would impact overall performance, but I can’t see how it could do that, unless there is some other bug in the code where an interrupt is constantly getting triggered on a null handler


danieleff
Fri Sep 01, 2017 9:13 am
[RogerClark – Tue Aug 29, 2017 10:30 pm] –
I checked the official API and the handler does not receive an argument.

I doubt anyone has used this feature, so I think it would be fairly safe to remove it.

FYI: here is the official Arduino PR for this feature: https://github.com/arduino/Arduino/pull/4519


stevestrong
Fri Sep 01, 2017 9:24 am
I am still not convinced that this is really useful.
The context dependent parameter passing, when needed, can be replaced by implementing the CB function in a different way.

RogerClark
Fri Sep 01, 2017 10:12 am
OK

Lets leave this.

The Arduino API is designed for ease of use at the expense of speed and memory etc.

So what we now have works OK, and if someone needs a faster implementation they can play with the vector table directly etc


Leave a Reply

Your email address will not be published. Required fields are marked *