micros() bug??

Pito
Wed Apr 26, 2017 4:44 pm
Continuation from http://www.stm32duino.com/viewtopic.php … 065#p27066

While running SdFat Bench demo on the Black407ZE, Daniel’s core:

Maybe a bug? While playing with SdFat I get this weird huge numbers from time to time.
The number is always the same 4294966553, it seems.
The source says it measures maxLatency via micros(), it compares and does simple addition. All uints.
Then it prints out the stuff:
cout << s/t <<',' << maxLatency << ',' << minLatency;
cout << ',' << totalLatency/n << endl;


Pito
Wed Apr 26, 2017 5:10 pm
It does mess with SdFat standard too, and small buffer 512b (printed latencies are via Serial.print()):
read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
865.00,2446,476,591
866.50,1355,476,590
867.25,1656,476,589
866.50,4294966815,476,589
866.50,1439,476,590
866.50,1380,476,590
866.50,4294966873,476,589
866.50,1367,476,590
866.50,1410,476,590
866.50,1394,476,590

Done


victor_pv
Wed Apr 26, 2017 5:44 pm
Pito wrote:It does mess with SdFat standard too, and small buffer 512b (printed latencies are via Serial.print()):
read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
865.00,2446,476,591
866.50,1355,476,590
867.25,1656,476,589
866.50,4294966815,476,589
866.50,1439,476,590
866.50,1380,476,590
866.50,4294966873,476,589
866.50,1367,476,590
866.50,1410,476,590
866.50,1394,476,590

Done


Pito
Wed Apr 26, 2017 6:09 pm
With your micros in stm32_clock.h, and printing “t” (in millis) in the first column (instead of s/t), it seems the t is ok.
inline uint32_t micros() {
// return (HAL_GetTick()*1000) + ((SysTick->LOAD - SysTick->VAL) * 1000 / SysTick->LOAD);
//return (HAL_GetTick()*1000) + (((SysTick->LOAD - SysTick->VAL) * 1000) / SysTick->LOAD);
return (HAL_GetTick()*1000) + (SysTick->VAL / (SystemCoreClock / 1000000));
}

danieleff
Wed Apr 26, 2017 6:36 pm
I belive SysTick->VAL is counting down from SysTick->LOAD to 0.
So if you use `return (HAL_GetTick()*1000) + (SysTick->VAL / (SystemCoreClock / 1000000));`
and print
Serial.println(micros());
Serial.println(micros());
delay(3000);

Pito
Wed Apr 26, 2017 6:46 pm
I am returning millis() from above micros() as a test whether the micros numbers are not overwritten somehow, repeated several times:
inline uint32_t micros() {
// return (HAL_GetTick()*1000) + ((SysTick->LOAD - SysTick->VAL) * 1000 / SysTick->LOAD);
// return (HAL_GetTick()*1000) + (((SysTick->LOAD - SysTick->VAL) * 1000) / SysTick->LOAD);
// return (HAL_GetTick()*1000) + (SysTick->VAL / (SystemCoreClock / 1000000));
return millis();
}

victor_pv
Wed Apr 26, 2017 6:51 pm
Pito wrote:With your micros in stm32_clock.h, and printing “t” (in millis) in the first column (instead of s/t), it seems the t is ok.
inline uint32_t micros() {
// return (HAL_GetTick()*1000) + ((SysTick->LOAD - SysTick->VAL) * 1000 / SysTick->LOAD);
//return (HAL_GetTick()*1000) + (((SysTick->LOAD - SysTick->VAL) * 1000) / SysTick->LOAD);
return (HAL_GetTick()*1000) + (SysTick->VAL / (SystemCoreClock / 1000000));
}

Pito
Wed Apr 26, 2017 7:08 pm
With -O0 and
inline uint32_t micros() {
return (HAL_GetTick()*1000) + ((SysTick->LOAD - SysTick->VAL) * 1000 / SysTick->LOAD);
//return (HAL_GetTick()*1000) + (((SysTick->LOAD - SysTick->VAL) * 1000) / SysTick->LOAD);
//return (HAL_GetTick()*1000) + (SysTick->VAL / (SystemCoreClock / 1000000));
}

Pito
Wed Apr 26, 2017 7:25 pm
inline uint32_t micros() {
return (HAL_GetTick()*1000) + ((SysTick->LOAD - SysTick->VAL) * 1000 / SysTick->LOAD);
//return (HAL_GetTick()*1000) + (((SysTick->LOAD - SysTick->VAL) * 1000) / SysTick->LOAD);
//return (HAL_GetTick()*1000) + (SysTick->VAL / (SystemCoreClock / 1000000));
}

Pito
Wed Apr 26, 2017 7:44 pm
Maybe this helps:
return (HAL_GetTick()*1000ULL) + ((SysTick->LOAD - SysTick->VAL) * 1000ULL / SysTick->LOAD);

victor_pv
Wed Apr 26, 2017 7:47 pm
Pito wrote:inline uint32_t micros() {
return (HAL_GetTick()*1000) + ((SysTick->LOAD - SysTick->VAL) * 1000 / SysTick->LOAD);
//return (HAL_GetTick()*1000) + (((SysTick->LOAD - SysTick->VAL) * 1000) / SysTick->LOAD);
//return (HAL_GetTick()*1000) + (SysTick->VAL / (SystemCoreClock / 1000000));
}

victor_pv
Wed Apr 26, 2017 8:02 pm
I think the best may be to break it in separate instructions with intermediate variables, so the compiler is better forced to do what want.

inline uint32_t micros() {
uint32_t tmp, tmp2;
tmp = (SysTick->LOAD - SysTick->VAL) * 1000 / SysTick->LOAD);
tmp2 = (HAL_GetTick()*1000);
return tmp+tmp2;
}


Pito
Wed Apr 26, 2017 8:55 pm
I tried, exactly the same. the splitting (several variants) into (volatile) vars a,b,c – the same.
I start thinking it is not about micros(), but something with stack or what.
When I print “m” from m=micros() I get a number, say 254 to 257. No problems. I can print it. I see it, no anomalies (I do not comment the value itself). EDIT: nope the m is compared within the loop.
Then in maxLatency (where it takes the bigger m) I see BigNumber printed.

And this is the story around maxLatency – this is done in loop, loops is n = filesize/buffersize:
m = micros() - m;
if (maxLatency < m) {
maxLatency = m;
}


Pito
Thu Apr 27, 2017 2:41 am
This worked best last evening :)
inline uint32_t micros() {
// by Pito 4/2017
uint32_t m = HAL_GetTick();
uint32_t u = SysTick->LOAD - SysTick->VAL;
uint32_t m1 = HAL_GetTick();
uint32_t u1 = SysTick->LOAD - SysTick->VAL;

if (m1 > m) {
return ( m1 * 1000 + (u1 * 1000) / SysTick->LOAD);
} else {
return ( m * 1000 + (u * 1000) / SysTick->LOAD);
}
}


Pito
Thu Apr 27, 2017 9:42 am
After a night spent with the issue – the measurements show it is a bug in the original way creating micros().
Original micros():
return (HAL_GetTick()*1000) + ((SysTick->LOAD - SysTick->VAL) * 1000 / SysTick->LOAD);

victor_pv
Thu Apr 27, 2017 1:38 pm
Pito wrote:After a night spent with the issue – the measurements show it is a bug in the original way creating micros().
Original micros():
return (HAL_GetTick()*1000) + ((SysTick->LOAD - SysTick->VAL) * 1000 / SysTick->LOAD);

Pito
Thu Apr 27, 2017 2:16 pm
I do understand my micros() implementation sounds crazy, but it works it seems.
The bug is related to the “moment” as you sample millis vs. the fractional part (metastability problem). In multistage pipeline and situation with interrupts it may happen the millis and the fractional part do not match..
Mind the occurrence of the error is ~1:xxx000 while you reads micros() 3000x per second with an external “cpu clock uncorrelated” device response time (ie an sdcard).
I printed out the fractional part – it is ok. The only source of the bug is the millis vs. fractional part.
You may create your own tests to confirm that, but my current understanding is, yes, there is a “high” probability the cpu will mess the millis vs. fractional part based on zillion factors such you get millis 1 count less as expected.

danieleff
Thu Apr 27, 2017 3:20 pm
Libmaple also does this “check if millis changed” https://github.com/rogerclarkmelbourne/ … time.h#L56

I will use your implementation if no better is found.


victor_pv
Thu Apr 27, 2017 3:33 pm
Daniel I would say at the moment Pito’s is better than anything else, so may want to push it, then if we find something better replace. His may take a bit longer to execute, but as long as it results in the right value, is much better than the risk in the current one.

About libmaple, it checks a register in the systick timer to check if it overflows after being read. (or underflow since it seems to count down)
/**
* @brief Check for underflow.
*
* This function returns 1 if the SysTick timer has counted to 0 since
* the last time it was called. However, any reads of any part of the
* SysTick Control and Status Register SYSTICK_BASE->CSR will
* interfere with this functionality. See the ARM Cortex M3 Technical
* Reference Manual for more details (e.g. Table 8-3 in revision r1p1).
*/
static inline uint32 systick_check_underflow(void) {
return SYSTICK_BASE->CSR & SYSTICK_CSR_COUNTFLAG;
}


Pito
Fri Apr 28, 2017 7:05 pm
There is another option, even sexier.. :)
There is the “special” 32bit clock counter which simply counts (up), for ever.. :)
The resolution of the counter is 5.95ns at 168MHz.
The counter overflows in 25.56secs.
So when you want to mess with nanos() it may work fine. Hopefully.. :P

Pito
Fri Apr 28, 2017 7:06 pm
There is another option, even sexier.. :)
There is the “special” 32bit cpu clock counter which simply counts (up), for ever.. :)
The resolution of the counter is 5.95ns at 168MHz (4.166ns @240MHz).
The counter overflows in 25.56secs.
So when you want to mess with micros() based on that counter, or with nanos(), it may work fine. Hopefully.. :P
http://www.stm32duino.com/viewtopic.php?f=18&t=2004
uint32_t elapsed = CpuGetTicks();
delayMicroseconds(1);
elapsed = CpuGetTicks()- elapsed;

victor_pv
Fri Apr 28, 2017 7:55 pm
That makes sense for delaymicros(), but how would you use it for micros()?

Leave a Reply

Your email address will not be published. Required fields are marked *