AttachInterrupt function used.
In the code the PA14 and PA15 are used for INTR_IN and SIG_OUT. You may change that.
// INTERRUPT LATENCY MEASUREMENT DEMONSTRATOR v1.2
// ATTACHINTERRUPT function used
// TESTED with MAPLE MINI and BLACK F407 (Libmaple F1 F4 and STM32GENERIC F1 F4)
// Wire PA14 with PA15 together
// Pito 8/2017
// Provided As-Is, no warranties of any kind are provided
#include "Arduino.h"
#define DWTEn() (*((uint32_t*)0xE000EDFC)) |= (1<<24)
#define CpuTicksEn() (*((uint32_t*)0xE0001000)) = 0x40000001
#define CpuTicksDis() (*((uint32_t*)0xE0001000)) = 0x40000000
#define CpuGetTicks() (*((uint32_t*)0xE0001004))
#define INTR_IN PA14 // INTERRUPT INPUT PIN
#define SIG_OUT PA15 // RANDOM JITTER RISING EDGE GENERATOR
volatile uint32_t start, stop;
uint32_t lat_max = 0;
uint32_t lat_min = 99999999;
uint32_t lat_over = 0;
uint32_t lat_digwr = 0;
uint32_t i = 0;
float lat_sum = 0.0;
float lat_aver = 0.0;
void setup() {
Serial.begin(115200);
DWTEn();
CpuTicksEn();
pinMode(INTR_IN, INPUT);
pinMode(SIG_OUT, OUTPUT);
start = CpuGetTicks(); // MEASURE THE DIGWRITE LATENCY
digitalWrite(SIG_OUT, 0);
lat_digwr = CpuGetTicks() - start;
attachInterrupt(INTR_IN, intrfun, RISING);
randomSeed(2345);
}
void loop() {
digitalWrite(SIG_OUT, 0);
// CREATING "RANDOM JITTER" BEFORE WE TRIGGER THE INTERRUPT
delayMicroseconds(1 + random(2100));
start = CpuGetTicks();
digitalWrite(SIG_OUT, 1); // RISING EDGE TRIGGERS THE INTERRUPT
delay(2); // WAITING 2ms to capture and process the INTERRUPT
uint32_t elapsed = stop - start;
elapsed = elapsed - lat_digwr; // SUBTRACT the digitalWrite() LATENCY
i++;
lat_sum += (float) elapsed;
lat_aver = lat_sum / i;
if (elapsed <= lat_min) lat_min = elapsed;
if (elapsed >= lat_max) lat_max = elapsed;
if ((float)elapsed > lat_aver) lat_over++;
// 1 CLK in ns -> 5.9524ns for F407 and 13.8888ns for F103 at stock clocks
Serial.print("INTR Latency MIN=");
Serial.print(13.88*lat_min,0);
Serial.print("ns MAX=");
Serial.print(13.88*lat_max,0);
Serial.print("ns AVER=");
Serial.print(13.88*lat_aver,0);
Serial.print("ns OVER=");
Serial.println(lat_over);
if (i > 4000000000) while(1);
}
// ISR
void intrfun() {
stop = CpuGetTicks();
}
INTR Latency MIN=3262ns MAX=26205ns AVER=3343ns
INTR Latency MIN=3262ns MAX=26205ns AVER=3343ns
..i suppose it’s a variation of too good to be true.
i would have expected a change in the last digit at least.
stephen
After 5minutes:
INTR Latency MIN=684ns MAX=8205ns AVER=771ns
INTR Latency MIN=684ns MAX=8205ns AVER=771ns
..Here with MMini, Libmaple F1, SerialUSB, default opt.
After about 7minutes
INTR Latency MIN=1693ns MAX=29217ns AVER=1743ns OVER=308
INTR Latency MIN=1693ns MAX=29217ns AVER=1743ns OVER=308
..
srp
If you still have the F4 setup, can you add this attribute to the ISR declaration to see if running it from RAM makes it any faster?
__attribute__((section (".data")))
void intrfun() __attribute__((section (".data")));
..
void intrfun() {
stop = CpuGetTicks();
} for (i=0; i<CNT; i++) {
tick0 = ticks(); //time stamp ticks0
//benchmark here
//28 ticks without anything here
//PORTB; //29 ticks -> 1tick execution
_INT0IF = 1; //fire the eint0 isr - 48 ticks
//tick1 = ticks() - tick0; //time elapsed
tmp_ticks[i]=tick1; //save the data
TMR1 = 0; //reset the timer counter, to minimize timer isr's impact
}
It would make the code a bit bigger as I would need to unroll the loops etc
What about rising priority for those interrupts? perhaps that brings the max down. I’m just guessing those that take so long are because some other interrupt, for systick or usb or something else, is interrupting them.
[victor_pv – Sat Aug 26, 2017 3:22 am] –
..seems like it brings down the average a tiny bit ut at the cost of much higher maximum..
The MAX depends on for how long the measurement runs, as the probability to catch a larger latency grows with time.
You get a good MAX after an hour or so..
What you get fast is the MIN.
for (i=0; i<CNT; i++) {
tick0=ticks(); //time stamp time0
//benchmark here
//20 ticks for doing nothing / consecutive read of coreticks()
//LED_PORT; //45 ticks total, 1 tick variation -> 25 ticks net
P1IFG |= (1<<0); //fire the isr. 85 ticks total, 1 tick variation -> 65 ticks net
//tick1=ticks() - tick0; //calculate time elapsed
tmp_ticks[i]=tick1; //save time measurement
}
Thus we constructed clock precise loops, delays, toggling the pins, etc. with it.
ARM (CM3, CM4) and MIPS (ie. pic32) are not such easy to predict stuff as they use ie. multistage pipelines, out of order instruction processing, several internal buses/bus_matrix for cpu/sram/flash/periph, DSP/FPU units, etc.
On Interrupt latency in ARM CM3/4:
#define INTR_IN PB3 // INTERRUPT INPUT PIN
#define SIG_OUT PA15 // RANDOM RISING EDGEFrom 5 up, they share vector among several lines, that’s the code Roger posted above and he commented that can be optimized to some degree.
Looks like the first tests you ran on libmaple were on PA14 and 15, so that’s the longer path.
BTW not sure if possible, but I think it is: to set the interrupt on toggle in a pin, and then toggle that same pin to trip the interrupt.
the following code:
for (i=0; i<CNT; i++) {
tick0=ticks(); //time stamp
//benchmark here
//nothing here / base line -> 17 ticks
//tmp=LEDG_PORT; //31 ticks
EXTI->SWIER |= (1<<0); //fire the isr - 76 ticks gross -> 59 ticks net
//tick1=ticks() - tick0; //calculate time elapsed
tmp_ticks[i]=tick1;
}
[dannyf – Sat Aug 26, 2017 3:00 pm] –
on a stm32f103c8t6, yielded a calculated isr latency of 59 ticks, zero variability (there is nothing else on that chip running).
That’s the difference, there are other things running in the libmaple and Generic cores when PIto run his tests. For one, the USB TX and RX which use interrupts, then there is the systick timer, using interrupts also.
Then the EXTI interrupt handling is done by the core, which reads some things form flash and/or RAM, among others a pointer to the the function that saves the stop time.
So when the interrupt triggers the path the CPU takes is very different than in your code, and then it can get interrupted multiple times for multiple reasons, even between the moment the start time is measured and the moment in which the GPIO register is written to set the pin level and trigger the interrupt.
After about 150k interrupts (BPill as above, Libmaple, INT_IN PB3) – the majority of latencies fall into first 0..1000ns bin, then into the second bin 1000..2000ns.
There is a latency peak in the bin 4000..5000ns, and in the bin 28000..29000ns.
INTR Latency MIN=472ns MAX=28954ns AVER=492ns OVER=407
150286
193
16
4
58
8
1
0
0
0
0
0
0
0
0
2
0
4
9
6
5
1
3
4
4
3
0
3
55
0
0Did you try rising the EXTI interrupt priority over most of everything else?
INTR Latency MIN=986ns MAX=26486ns AVER=991ns OVER=532
1 263468
2 256
3 0
4 274
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 1
28 0
29 0
30 0
31 0
32 0
33 0
34 0
35 0https://github.com/leaflabs/libmaple/bl … nvic.c#L48
The interrupt lines are listed here:
https://github.com/leaflabs/libmaple/bl … nvic.h#L46
You can use the number, but is probably easier to read the code if you use those names. I believe at same priority setting, the one with the lower ID has priority. I.e. By default EXTI0 to 4 should have higher priority than USB, but EXTI9-5 and 15-10 would have a lower priority.
You can test that by using a pin from 0 to 4 in any port as the input.
But still the code before the interrupt is triggered can be interrupted (the digitalWrite part, etc), so the best way to measure without the other interrupts affecting is possibly to disable USB interrupts from before reading the start time, to after the delay. I believe any pending interrupt should be serviced as soon as they are enabled again, but perhaps we have to do something else than disable them, perhaps some masking, but to be honest I don’t remember the details on how the NVIC works.
I remember if you disable interrupts completely, like Steve is doing in the SPI code, they get services as soon as enabled again, but that is a global disable, and if you used it, the pin interrupt would not be serviced either.
.. while (flag != 1) start = CpuGetTicks();
flag = 0;
..

