Inline Assembly

simonf
Wed Jun 15, 2016 9:05 pm
Has anyone got any examples of inline assembly in STM32duino?

Preferably passing an int and returning an int.


RogerClark
Wed Jun 15, 2016 9:24 pm
I googled and found this

http://www.ethernut.de/en/documents/arm-inline-asm.html

The core has some inline assembly, but it's just nop instructions to act as delays.

There is also an active thread with some FFT code in an assembler file that you could look at, but it's not documented, so it's hard to read and probably far too complicated.

BTW, what are you going to code in assembler?


simonf
Wed Jun 15, 2016 10:34 pm
RogerClark wrote: I googled and found this

http://www.ethernut.de/en/documents/arm-inline-asm.html

The core has some inline assembly, but it's just nop instructions to act as delays.

There is also an active thread with some FFT code in an assembler file that you could look at, but it's not documented, so it's hard to read and probably far too complicated.

BTW, what are you going to code in assembler?


RogerClark
Wed Jun 15, 2016 11:12 pm
I just tried the example from that page and it works well

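/* Example of calling an assembler function from C, based on
 * code examples by Harald Kipp
 * http://www.ethernut.de/en/documents/arm-inline-asm.html
 */
unsigned long ByteSwap(unsigned long val)
{
  asm volatile (
    "eor r3, %1, %1, ror #16\n\t"   // r3 = val ^ (val rotated right by 16)
    "bic r3, r3, #0x00FF0000\n\t"   // clear bits 16-23 of r3
    "mov %0, %1, ror #8\n\t"        // val = val rotated right by 8
    "eor %0, %0, r3, lsr #8"        // val ^= (r3 >> 8), which completes the swap
    : "=r" (val)   // output operand %0
    : "0" (val)    // input operand %1, tied to the same register as %0
    : "r3"         // r3 is used as scratch, so list it as clobbered
  );
  return val;
}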

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
}

void loop() {
  // put your main code here, to run repeatedly:
  Serial.println(ByteSwap((unsigned long)0x12345678), HEX);
  delay(1000);
}
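
For anyone who wants to check what the four instructions compute without a board attached, here is a minimal plain-C model of the same steps. The names ror32 and byte_swap_model are made up for this sketch and are not part of the Ethernut example or the core, and it assumes a 32-bit value.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (valid for 0 < n < 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Hypothetical helper: plain-C model of the four-instruction byte swap. */
static uint32_t byte_swap_model(uint32_t val)
{
    uint32_t tmp = val ^ ror32(val, 16);   /* eor r3, %1, %1, ror #16 */
    tmp &= ~0x00FF0000u;                   /* bic r3, r3, #0x00FF0000 */
    val = ror32(val, 8);                   /* mov %0, %1, ror #8      */
    return val ^ (tmp >> 8);               /* eor %0, %0, r3, lsr #8  */
}
```

With 0x12345678 as input the model returns 0x78563412, so the sketch should print 78563412 once a second.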


RogerClark
Thu Jun 16, 2016 12:17 am
Actually, looking at the code and at how the compiler handles unsigned long and unsigned int, unsigned long is 4 bytes, which is the same as unsigned int.

So I’ve changed the code to use unsigned int, and it still works fine.

/* Example of calling an assembler function from C, based on
 * code examples by Harald Kipp
 * http://www.ethernut.de/en/documents/arm-inline-asm.html
 */
unsigned int ByteSwap(unsigned int val)
{
  asm volatile (
    "eor r3, %1, %1, ror #16\n\t"
    "bic r3, r3, #0x00FF0000\n\t"
    "mov %0, %1, ror #8\n\t"
    "eor %0, %0, r3, lsr #8"
    : "=r" (val)   // output operand %0
    : "0" (val)    // input operand %1, tied to the same register as %0
    : "r3"         // scratch register
  );
  return val;
}

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
}

void loop() {
  // put your main code here, to run repeatedly:
  Serial.println(ByteSwap((unsigned int)0x12345678), HEX);
  delay(1000);
}
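
Since the widths of int and long depend on the target (both are 4 bytes here, but long is often 8 bytes with a 64-bit desktop compiler), a fixed-width type such as uint32_t from <stdint.h> avoids having to check. A small sketch, assuming GCC or Clang: the function name byte_swap_builtin is hypothetical, and __builtin_bswap32 is a compiler builtin, not something provided by the core.

```c
#include <stdint.h>

/* uint32_t is exactly 32 bits wherever it is defined, unlike int and
 * long, whose widths are chosen by the target and toolchain. */
static uint32_t byte_swap_builtin(uint32_t val)
{
    /* GCC/Clang builtin that reverses the byte order of a 32-bit value. */
    return __builtin_bswap32(val);
}
```

On cores that have a byte-reverse instruction the compiler can usually reduce the builtin to that single instruction, so for a plain byte swap the hand-written asm is mostly useful as a learning example.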


zmemw16
Thu Jun 16, 2016 10:44 am
The link in that code snippet goes to a fairly comprehensive examples page, covering multiple variables and register usage.

http://www.ethernut.de/en/documents/arm-inline-asm.html

A link at the bottom of that page points to the GCC suite documentation, including historical versions.
I’ve already pulled 4.8.5. Now about reading it :D

http://gcc.gnu.org/onlinedocs/

stephen


RogerClark
Thu Jun 16, 2016 11:30 am
zmemw16 wrote: in that code snippet, this link is to a fairly comprehensive examples page, including multiple variables and register usage.
http://www.ethernut.de/en/documents/arm-inline-asm.html
stephen

zmemw16
Thu Jun 16, 2016 12:28 pm
I was trying to point out that there are examples further down the page that expand on the specific elements you mentioned in your post, and also to let the OP know that he’d find the page very useful when he went there.

stephen


RogerClark
Thu Jun 16, 2016 9:30 pm
OK

I hoped the OP would read the whole page.

I mainly just wanted to test whether the simple example would indeed work with the STM32 and the Arduino IDE.

I will add the example to the repo, in the dummy library that contains the examples.


simonf
Fri Jun 17, 2016 1:09 am
Well, I got some time to myself, though not for a good reason: an attack of the killer kidney stones.

I got to try some ARM assembly and reproduced something I did 38 years ago in 6502 assembler.

A square root routine:

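// Naked function: the compiler emits no prologue or epilogue, so the
// argument is still in r0 on entry (ARM calling convention) and the
// asm itself returns through "bx lr" with the result in r0.
__attribute__((naked)) unsigned long MYSQROOT(unsigned long val)
{
  asm volatile (
    "       ldr  r3, =0x8000  \n\t"   // r3 = first estimate (top bit of a 16-bit root)
    "       mov  r2, r3       \n\t"   // r2 = step size
    "Loop2: mul  r1, r3, r3   \n\t"   // r1 = estimate squared
    "       cmp  r0, r1       \n\t"   // compare the target with the square
    "       beq  Done2        \n\t"   // exact match, finished
    "       bhi  Cont2        \n\t"   // target is higher (unsigned): keep the step
    "       sub  r3, r3, r2   \n\t"   // estimate too high: take the step back out
    "Cont2: asrs r2, #1       \n\t"   // halve the step
    "       cbz  r2, Done2    \n\t"   // step is zero: finished
    "       add  r3, r3, r2   \n\t"   // add the next smaller step
    "       b    Loop2        \n\t"
    "Done2: mov  r0, r3       \n\t"   // result goes back in r0
    "       bx   lr           \n\t"   // return to the caller
  );
}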
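
To check results from the routine above on a PC, a register-by-register model in plain C can help. This is only a sketch: the name sqrt_model is invented here, and the comments map each line to the instruction it stands for. The comparison is unsigned, which matters once the squared estimate goes above 0x7FFFFFFF.

```c
#include <stdint.h>

/* Hypothetical helper: C model of the successive-approximation loop.
 * Returns the integer square root (rounded down) of a 32-bit value. */
static uint32_t sqrt_model(uint32_t val)
{
    uint32_t est  = 0x8000u;   /* r3: current estimate */
    uint32_t step = est;       /* r2: current step     */

    for (;;) {
        uint32_t sq = est * est;   /* mul r1, r3, r3 (max 0xFFFF squared fits in 32 bits) */
        if (val == sq) {
            break;                 /* beq Done2 */
        }
        if (val < sq) {
            est -= step;           /* estimate too high: sub r3, r3, r2 */
        }
        step >>= 1;                /* asrs r2, #1 */
        if (step == 0) {
            break;                 /* cbz r2, Done2 */
        }
        est += step;               /* add r3, r3, r2 */
    }
    return est;
}
```

Calling sqrt_model(16) gives 4 and sqrt_model(17) also gives 4, since the result is rounded down.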


RogerClark
Fri Jun 17, 2016 3:55 am
Thanks

I think that sqrt function would be useful with the FFT code that is in another thread.

Well, I suppose in that case it would be better if the FFT assembler just returned the Pythagorean (magnitude) result, rather than returning real and imaginary values, but it’s someone else’s project and goes fast enough already ;-)


simonf
Tue Jul 26, 2016 2:06 am
Well, I gave it another crack and managed to get a 32-bit square root down to 2 µs. ARM assembly takes some finessing. Relative branches are much quicker than absolute ones but can only go forward, unless I am reading it wrong.

// Naked function: val arrives in r0 and the result is returned in r0.
__attribute__((naked)) unsigned long MYSQROOT2(unsigned long val)
{
  asm volatile (
    "       clz   r1, r0      \n\t"   // Calculate the number of leading zeros
    "       ldr   r3, =0x0021 \n\t"   // Subtract it from 33
    "       subs  r1, r3, r1  \n\t"   // r1 = 33 - leading zeros (significant bits + 1)
    "       asrs  r1, #1      \n\t"   // Divide by 2
    "       ldr   r2, =0x0001 \n\t"   // Put 1 in r2
    "       lsls  r2, r1      \n\t"   // Shift it left r1 times to set up the start mask * 2
    "       ldr   r3, =0x0000 \n\t"   // Clear r3, the current square root estimate
    "Loop3: asrs  r2, #1      \n\t"   // Shift the mask 1 bit right
    "       cbz   r2, Done3   \n\t"   // If the mask is zero we're done
    "       eors  r3, r3, r2  \n\t"   // Add the mask bit to r3
    "       mul   r1, r3, r3  \n\t"   // Square r3 into r1
    "       cmp   r0, r1      \n\t"   // Compare the target r0 with the squared estimate r1
    "       it    lo          \n\t"   // If lower (unsigned)
    "       eorlo r3, r3, r2  \n\t"   // The estimate was too high, remove the last bit
    "       b     Loop3       \n\t"   // Loop back and do the next mask bit
    "Done3: mov   r0, r3      \n\t"   // Move the result to r0
    "       bx    lr          \n\t"   // Return

    /*
      r0 contains the source number
      r1 contains the square of r3
      r2 contains the add mask
      r3 contains the latest square root estimate
    */
  );
}
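
Coming back to the original question about passing an int in and getting an int out: inline asm can also take its input and hand back its result through operands, so the compiler picks the registers instead of the code assuming r0. Below is a hedged sketch of the same bit-by-bit search written that way, with named operands and numeric local labels. The names isqrt32 and isqrt32_c are invented here. The asm path is only compiled when __thumb2__ is defined (GCC defines it for Cortex-M3 builds) and has not been timed on hardware; everywhere else the plain-C fallback runs.

```c
#include <stdint.h>

/* Plain-C version of the bit-by-bit search: fallback and cross-check. */
static uint32_t isqrt32_c(uint32_t val)
{
    uint32_t root = 0;
    for (uint32_t mask = 0x8000u; mask != 0; mask >>= 1) {
        uint32_t trial = root | mask;   /* tentatively set this bit     */
        if (trial * trial <= val) {     /* unsigned compare, no overflow */
            root = trial;               /* bit fits, keep it            */
        }
    }
    return root;
}

/* Hypothetical wrapper: operand-based inline asm on Thumb-2, C elsewhere. */
static uint32_t isqrt32(uint32_t val)
{
#if defined(__thumb2__)
    uint32_t root = 0, mask = 0x10000u, sq;
    __asm__ volatile (
        "1: lsrs  %[mask], %[mask], #1   \n\t"  /* next lower mask bit         */
        "   beq   2f                     \n\t"  /* mask is zero: finished      */
        "   orr   %[root], %[root], %[mask] \n\t" /* set the trial bit         */
        "   mul   %[sq], %[root], %[root] \n\t" /* square the estimate         */
        "   cmp   %[val], %[sq]          \n\t"  /* target vs squared estimate  */
        "   it    lo                     \n\t"  /* if target is lower...       */
        "   eorlo %[root], %[root], %[mask] \n\t" /* ...remove the trial bit   */
        "   b     1b                     \n\t"
        "2:                              \n\t"
        : [root] "+r" (root), [mask] "+r" (mask), [sq] "=&r" (sq)
        : [val] "r" (val)
        : "cc");                                /* condition flags are changed */
    return root;
#else
    return isqrt32_c(val);
#endif
}
```

The numeric labels (1: and 2:) can be repeated safely if the compiler inlines the function more than once, which named labels such as Loop3 cannot.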

