Interrupt Latency Measurement

Pito
Fri Aug 25, 2017 8:37 am
Here is an example with measuring Interrupt Latency.
AttachInterrupt function used.
In the code the PA14 and PA15 are used for INTR_IN and SIG_OUT. You may change that.

// INTERRUPT LATENCY MEASUREMENT DEMONSTRATOR v1.2
// ATTACHINTERRUPT function used
// TESTED with MAPLE MINI and BLACK F407 (Libmaple F1 F4 and STM32GENERIC F1 F4)
// Wire PA14 with PA15 together
// Pito 8/2017
// Provided As-Is, no warranties of any kind are provided

#include "Arduino.h"

#define DWTEn() (*((uint32_t*)0xE000EDFC)) |= (1<<24)
#define CpuTicksEn() (*((uint32_t*)0xE0001000)) = 0x40000001
#define CpuTicksDis() (*((uint32_t*)0xE0001000)) = 0x40000000
#define CpuGetTicks() (*((uint32_t*)0xE0001004))

#define INTR_IN PA14 // INTERRUPT INPUT PIN
#define SIG_OUT PA15 // RANDOM JITTER RISING EDGE GENERATOR

volatile uint32_t start, stop;
uint32_t lat_max = 0;
uint32_t lat_min = 99999999;
uint32_t lat_over = 0;
uint32_t lat_digwr = 0;
uint32_t i = 0;
float lat_sum = 0.0;
float lat_aver = 0.0;

void setup() {
Serial.begin(115200);
DWTEn();
CpuTicksEn();
pinMode(INTR_IN, INPUT);
pinMode(SIG_OUT, OUTPUT);

start = CpuGetTicks(); // MEASURE THE DIGWRITE LATENCY
digitalWrite(SIG_OUT, 0);
lat_digwr = CpuGetTicks() - start;

attachInterrupt(INTR_IN, intrfun, RISING);
randomSeed(2345);
}

void loop() {

digitalWrite(SIG_OUT, 0);

// CREATING "RANDOM JITTER" BEFORE WE TRIGGER THE INTERRUPT
delayMicroseconds(1 + random(2100));

start = CpuGetTicks();
digitalWrite(SIG_OUT, 1); // RISING EDGE TRIGGERS THE INTERRUPT

delay(2); // WAITING 2ms to capture and process the INTERRUPT

uint32_t elapsed = stop - start;

elapsed = elapsed - lat_digwr; // SUBTRACT the digitalWrite() LATENCY

i++;
lat_sum += (float) elapsed;
lat_aver = lat_sum / i;

if (elapsed <= lat_min) lat_min = elapsed;
if (elapsed >= lat_max) lat_max = elapsed;

if ((float)elapsed > lat_aver) lat_over++;

// 1 CLK in ns -> 5.9524ns for F407 and 13.8888ns for F103 at stock clocks
Serial.print("INTR Latency MIN=");
Serial.print(13.88*lat_min,0);
Serial.print("ns MAX=");
Serial.print(13.88*lat_max,0);
Serial.print("ns AVER=");
Serial.print(13.88*lat_aver,0);
Serial.print("ns OVER=");
Serial.println(lat_over);

if (i > 4000000000) while(1);
}

// ISR
void intrfun() {
stop = CpuGetTicks();
}


Pito
Fri Aug 25, 2017 12:45 pm
Here the latencies with MMini, STM32Generic, SerialUSB, after a minute:
INTR Latency MIN=3262ns MAX=26205ns AVER=3343ns
INTR Latency MIN=3262ns MAX=26205ns AVER=3343ns
..

zmemw16
Fri Aug 25, 2017 12:59 pm
i;ve no grounds whatsoever for the question, but i’m always suspicious of no variation results when testing.
i suppose it’s a variation of too good to be true.
i would have expected a change in the last digit at least.

stephen


peekay123
Fri Aug 25, 2017 1:36 pm
The min/max difference is substantial. It would be nice to calculate the standard deviation to get a sense of the distribution. Or count the number of instances where the time is greater than the average.

Pito
Fri Aug 25, 2017 1:45 pm
Black F407ZE, Libmaple F4, SerialUSB
After 5minutes:
INTR Latency MIN=684ns MAX=8205ns AVER=771ns
INTR Latency MIN=684ns MAX=8205ns AVER=771ns
..

Pito
Fri Aug 25, 2017 2:21 pm
UPDATE: v1.2 – added number of overshoots (when the actual latency is bigger than the average latency).

Here with MMini, Libmaple F1, SerialUSB, default opt.
After about 7minutes
INTR Latency MIN=1693ns MAX=29217ns AVER=1743ns OVER=308
INTR Latency MIN=1693ns MAX=29217ns AVER=1743ns OVER=308
..


zmemw16
Fri Aug 25, 2017 3:19 pm
my point is that ALL the columns are the same set of numbers.
srp

victor_pv
Fri Aug 25, 2017 6:39 pm
Thanks Pito I think this is a good reference.
If you still have the F4 setup, can you add this attribute to the ISR declaration to see if running it from RAM makes it any faster?

__attribute__((section (".data")))


Pito
Fri Aug 25, 2017 8:06 pm
Black F407ZE, Libmaple F4, SerialUSB, default opt:
void intrfun() __attribute__((section (".data")));
..
void intrfun() {
stop = CpuGetTicks();
}

dannyf
Fri Aug 25, 2017 9:04 pm
this is the code piece I typically use to benchmark codes.

for (i=0; i<CNT; i++) {
tick0 = ticks(); //time stamp ticks0
//benchmark here
//28 ticks without anything here
//PORTB; //29 ticks -> 1tick execution
_INT0IF = 1; //fire the eint0 isr - 48 ticks
//tick1 = ticks() - tick0; //time elapsed
tmp_ticks[i]=tick1; //save the data
TMR1 = 0; //reset the timer counter, to minimize timer isr's impact
}


RogerClark
Fri Aug 25, 2017 9:45 pm
There are some minor improvements I can make to LibMaple to speed up the ISR response time, but I am not sure if I could achieve more than a 25% speed improvement, and only on the shared ISR code.

It would make the code a bit bigger as I would need to unroll the loops etc


victor_pv
Sat Aug 26, 2017 3:22 am
Thanks for testing Pito, seems like it brings down the average a tiny bit ut at the cost of much higher maximum.
What about rising priority for those interrupts? perhaps that brings the max down. I’m just guessing those that take so long are because some other interrupt, for systick or usb or something else, is interrupting them.

Pito
Sat Aug 26, 2017 8:33 am
[victor_pv – Sat Aug 26, 2017 3:22 am] –
..seems like it brings down the average a tiny bit ut at the cost of much higher maximum..

The MAX depends on for how long the measurement runs, as the probability to catch a larger latency grows with time.
You get a good MAX after an hour or so.. :)
What you get fast is the MIN.


dannyf
Sat Aug 26, 2017 11:22 am
I ran the following:

for (i=0; i<CNT; i++) {
tick0=ticks(); //time stamp time0
//benchmark here
//20 ticks for doing nothing / consecutive read of coreticks()
//LED_PORT; //45 ticks total, 1 tick variation -> 25 ticks net
P1IFG |= (1<<0); //fire the isr. 85 ticks total, 1 tick variation -> 65 ticks net
//tick1=ticks() - tick0; //calculate time elapsed
tmp_ticks[i]=tick1; //save time measurement
}


Pito
Sat Aug 26, 2017 11:29 am
@dannyf: the 8/16bitters like pic24 (and all picXX except pic32) and 8/16bit avrs and msp’s have got easy to predict number of clocks per instruction.
Thus we constructed clock precise loops, delays, toggling the pins, etc. with it.

ARM (CM3, CM4) and MIPS (ie. pic32) are not such easy to predict stuff as they use ie. multistage pipelines, out of order instruction processing, several internal buses/bus_matrix for cpu/sram/flash/periph, DSP/FPU units, etc.

On Interrupt latency in ARM CM3/4:

https://community.arm.com/processors/b/ … processors

http://infocenter.arm.com/help/index.js … 16366.html


Pito
Sat Aug 26, 2017 12:18 pm
This is with BluePill, older Libmaple F1 (>1y), SerialUSB, wired with:
#define INTR_IN PB3 // INTERRUPT INPUT PIN
#define SIG_OUT PA15 // RANDOM RISING EDGE

victor_pv
Sat Aug 26, 2017 12:45 pm
To compare apples to apples you have to use a lower pin for the IN in the newer libmaple too, since only the lines 0 to 4 have their own interrupt vector.
From 5 up, they share vector among several lines, that’s the code Roger posted above and he commented that can be optimized to some degree.
Looks like the first tests you ran on libmaple were on PA14 and 15, so that’s the longer path.
BTW not sure if possible, but I think it is: to set the interrupt on toggle in a pin, and then toggle that same pin to trip the interrupt.

dannyf
Sat Aug 26, 2017 3:00 pm
just to add what I wrote earlier.

the following code:

for (i=0; i<CNT; i++) {
tick0=ticks(); //time stamp
//benchmark here
//nothing here / base line -> 17 ticks
//tmp=LEDG_PORT; //31 ticks
EXTI->SWIER |= (1<<0); //fire the isr - 76 ticks gross -> 59 ticks net
//tick1=ticks() - tick0; //calculate time elapsed
tmp_ticks[i]=tick1;
}


victor_pv
Sat Aug 26, 2017 4:15 pm
[dannyf – Sat Aug 26, 2017 3:00 pm] –
on a stm32f103c8t6, yielded a calculated isr latency of 59 ticks, zero variability (there is nothing else on that chip running).

That’s the difference, there are other things running in the libmaple and Generic cores when PIto run his tests. For one, the USB TX and RX which use interrupts, then there is the systick timer, using interrupts also.
Then the EXTI interrupt handling is done by the core, which reads some things form flash and/or RAM, among others a pointer to the the function that saves the stop time.
So when the interrupt triggers the path the CPU takes is very different than in your code, and then it can get interrupted multiple times for multiple reasons, even between the moment the start time is measured and the moment in which the GPIO register is written to set the pin level and trigger the interrupt.


Pito
Sat Aug 26, 2017 5:04 pm
This is with an “interrupt latencies histogram”, the latency bins are 1000ns wide, 31 bins from 0..31000ns.
After about 150k interrupts (BPill as above, Libmaple, INT_IN PB3) – the majority of latencies fall into first 0..1000ns bin, then into the second bin 1000..2000ns.
There is a latency peak in the bin 4000..5000ns, and in the bin 28000..29000ns.
INTR Latency MIN=472ns MAX=28954ns AVER=492ns OVER=407
150286
193
16
4
58
8
1
0
0
0
0
0
0
0
0
2
0
4
9
6
5
1
3
4
4
3
0
3
55
0
0

victor_pv
Sun Aug 27, 2017 3:16 pm
I bet those are due to the USB port interrupts, the USB code takes a really long path in any of the cores.
Did you try rising the EXTI interrupt priority over most of everything else?

Pito
Sun Aug 27, 2017 3:27 pm
Yep, it seems those larger latencies come from USB. Here is BPill (INT_IN PB3) under STM32GENERIC, while printing results every 1000th interrupt:
INTR Latency MIN=986ns MAX=26486ns AVER=991ns OVER=532
1 263468
2 256
3 0
4 274
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 1
28 0
29 0
30 0
31 0
32 0
33 0
34 0
35 0

victor_pv
Sun Aug 27, 2017 6:51 pm
This libmaple function should do it:
https://github.com/leaflabs/libmaple/bl … nvic.c#L48

The interrupt lines are listed here:
https://github.com/leaflabs/libmaple/bl … nvic.h#L46

You can use the number, but is probably easier to read the code if you use those names. I believe at same priority setting, the one with the lower ID has priority. I.e. By default EXTI0 to 4 should have higher priority than USB, but EXTI9-5 and 15-10 would have a lower priority.

You can test that by using a pin from 0 to 4 in any port as the input.

But still the code before the interrupt is triggered can be interrupted (the digitalWrite part, etc), so the best way to measure without the other interrupts affecting is possibly to disable USB interrupts from before reading the start time, to after the delay. I believe any pending interrupt should be serviced as soon as they are enabled again, but perhaps we have to do something else than disable them, perhaps some masking, but to be honest I don’t remember the details on how the NVIC works.
I remember if you disable interrupts completely, like Steve is doing in the SPI code, they get services as soon as enabled again, but that is a global disable, and if you used it, the pin interrupt would not be serviced either.


Pito
Mon Aug 28, 2017 8:12 pm
For fun I’ve done following experiment: BPill as an external 1mil pulses generator (500us period incl. 100us jitter) and MapleM counting pulses (rising edges) via its interrupt (INT_IN PB3, Libmaple), while measuring the interrupt latency – latency capture via polling the flag
.. while (flag != 1) start = CpuGetTicks();
flag = 0;
..

dannyf
Tue Aug 29, 2017 8:26 pm
that approach under-counts the latency, by the execution of CpuGetTicks().

Leave a Reply

Your email address will not be published. Required fields are marked *