Question or suggestion, upload to Flash, then copy to RAM at runtime.

victor_pv

Fri May 15, 2015 3:27 pm

I was just thinking that some people seem to need speed for some things, and as we noticed when moving PIN_MAP to flash, running from Flash is apparently quite a bit slower than running from RAM.

So, along those lines: what if you need to run a sketch that fits in RAM (e.g. on a device with 64KB of RAM), you need all the speed you can get for whatever reason, but you need the sketch to stay permanently on the board instead of being uploaded to RAM each time?
I think the solution could come in the form of a bootloader that stores the code in flash, not RAM, then on device boot copies that code to RAM and starts running it there, as if it had been uploaded with the DFU upload-to-RAM option, except that the code is saved permanently in Flash and copied over to RAM on each boot.
That gives the speed advantage while keeping the sketch permanently on the board.

Does anyone need the extra speed? I have already thought of a way to do it without much more code in the bootloader.

Rick Kimball

Fri May 15, 2015 6:24 pm

You want to use the linker and some custom startup code to copy your routines from flash to ram at reset. This post should give you an idea of what needs to happen:

https://forum.sparkfun.com/viewtopic.ph … 61&start=0

-rick

victor_pv

Fri May 15, 2015 6:56 pm

Rick, the copy-to-RAM part could be integrated into the DFU bootloader as just another option, so nothing extra would be needed when compiling the sketch if the bootloader manages the copy process.
My question, I guess, is: does this look like something useful to anyone?
It just occurred to me that it should be fairly easy to add to the Maple DFU bootloader, but I don’t know if anyone here has found themselves needing it.

Rick Kimball

Fri May 15, 2015 6:59 pm

Modify your common.inc file and add a ‘.ramtext’ section in the data area:

$ git diff -w common.inc
diff --git a/STM32F1/variants/generic_stm32f103c/ld/common.inc b/STM32F1/variants/generic_
index 0c2b6a4..ebac91d 100644
--- a/STM32F1/variants/generic_stm32f103c/ld/common.inc
+++ b/STM32F1/variants/generic_stm32f103c/ld/common.inc
@@ -140,6 +140,9 @@ SECTIONS
         *(.got.plt) *(.got)
         *(.data .data.* .gnu.linkonce.d.*)
+        . = ALIGN(4);
+        *(.ramtext)
+        . = ALIGN(8);
         __data_end__ = .;
     } > REGION_DATA AT> REGION_RODATA
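
A function can then be steered into that section with GCC's `section` attribute. The following is only a minimal sketch, assuming a GCC-based ARM toolchain and the `.ramtext` section from the diff above; `RAMFUNC` and `ram_sum` are hypothetical names made up for illustration, and on a non-ARM host the macro expands to nothing so the code still builds:

```c
#include <stdint.h>

/* Hypothetical helper macro (not part of the core): on ARM builds, place
 * the function in the .ramtext input section so the startup code copies
 * it to RAM together with .data. On other hosts it expands to nothing. */
#if defined(__arm__)
#define RAMFUNC __attribute__((section(".ramtext"), noinline))
#else
#define RAMFUNC
#endif

/* Example hot loop that might benefit from zero-wait-state execution. */
RAMFUNC uint32_t ram_sum(const uint32_t *buf, uint32_t n)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; i++) {
        total += buf[i];
    }
    return total;
}
```

Calls from flash into such a function usually go through a linker-generated veneer, so it probably pays off mainly for loops that stay in RAM for a while.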

Rick Kimball

Fri May 15, 2015 7:06 pm

And yes this feature is certainly useful if you have code that you want to run with zero wait states. However, if you are bouncing back and forth between flash and ram, the veneer code overhead might wipe out any performance gains you make.

Compile that code and disassemble it, then look for the ram_delay veneer function to see all the code that is executed before it jumps to the RAM code.

-rick

Rick Kimball

Fri May 15, 2015 7:33 pm

Sorry I think I got your question all wrong.

However, you could modify the ldscript to put only the __start__ code and the interrupt vectors into “rom” and have the rest of the code stored in REGION_DATA; the linker would then resolve all of the execution addresses to RAM. The only veneers in this case would be for the ctors and the call to main. Everything else would be in RAM.

victor_pv

Fri May 15, 2015 9:48 pm

I was thinking of 2 possible ways. One is something like that: modifying the linker script so the code goes to RAM addresses. Do you think it would work, and that the startup code would copy all of that from flash to RAM like it copies the variables area? Or would the assembler code that does the copy need to be modified?

The other way I thought of is leaving the linker script as in the upload-to-RAM option, but adding a check in the bootloader code: if the reset vector (which I believe is the second word of the upload, as the first one is the stack address) is an address in the RAM range (2xxxxxxx), then rather than calling the code right away, first copy it to RAM, then run it. That would eliminate the need for a different linker script, but add a bit of complexity to the bootloader. Still, I think it should not be too complicated; something like this should work:

if (checkUserCode(USER_CODE_FLASH0X8002000 + 4)) {
    copySketchToRam();
    jumpToUser(USER_CODE_RAM);
}
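
To make the idea a little more concrete, here is a rough host-side sketch of the check and the copy. It is not the bootloader's actual code: the function names, the word-wise copy, and the `0x2000xxxx` test for "reset vector points into RAM" are all assumptions for illustration, and plain arrays stand in for the flash and RAM regions.

```c
#include <stddef.h>
#include <stdint.h>

#define RAM_BASE 0x20000000u  /* start of SRAM on STM32F1 */

/* True when the reset vector points into the 0x2000xxxx range, i.e. the
 * image was linked to run from RAM rather than from flash (assumed test). */
static int reset_vector_in_ram(uint32_t reset_vector)
{
    return (reset_vector & 0xFFFF0000u) == RAM_BASE;
}

/* Word-wise copy of the stored image to its run address. */
static void copy_sketch_to_ram(const uint32_t *flash_image,
                               uint32_t *ram_dest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        ram_dest[i] = flash_image[i];
    }
}

/* Returns 1 if the image was copied (the caller would then jump to RAM),
 * 0 if the image is an ordinary flash-linked sketch and nothing was done. */
static int prepare_ram_sketch(const uint32_t *flash_image,
                              uint32_t *ram_dest, size_t words)
{
    /* Word 0 is the initial stack pointer, word 1 is the reset vector. */
    if (words < 2 || !reset_vector_in_ram(flash_image[1])) {
        return 0;
    }
    copy_sketch_to_ram(flash_image, ram_dest, words);
    return 1;
}
```

On the real device the source would be the flash upload address and the destination the RAM upload address, with the jump to the user code following a successful copy.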

victor_pv

Sat May 16, 2015 1:00 am

Well, I just implemented it pretty much as shown above, and it works fine.
I can upload to RAM, upload to flash ID1 at x5000, upload to flash ID2 at x2000, and upload to flash ID2 a sketch linked with the normal RAM script; the bootloader detects that last case correctly, copies the sketch from flash to RAM, and runs it from there.
It takes about 200 bytes more than before, although I am sure that can be reduced, and the bootloader is still just 7.1KB.

RogerClark

Sat May 16, 2015 6:20 am

Victor,

I’m not sure how many people would want to run from RAM. I can see it could make the code run faster than in Flash, but I’m not sure how much faster.

I’ve not looked into this in detail, but the ARM processors have complicated instruction pipelining with prefetch etc.
See http://infocenter.arm.com/help/index.js … GJICF.html

So I suspect the gains you get may not be as large as a plain comparison of flash vs RAM speeds would suggest.

Also, I’m not too sure how you are working this. Are you getting the bootloader to upload to Flash, and then doing a pre-copy from flash into RAM each time it boots up?

How does the bootloader know that the Flash contains a program that has been linked to run in RAM? Or are you saying this only works after upload, i.e. it uploads to flash and then copies to RAM?

The issue I had with the RAM-based upload was that after a RAM upload, for some reason the bootloader didn’t just jump to the RAM start location; it seemed to reset itself, in which case it would have no definite way to know whether the last upload was to RAM or flash.

The only way I could see to do it was to put a magic number into RAM to indicate that the last upload was to RAM, and hope that the sketch would never put the same magic number into the same location in RAM. That is, it could be made to work 99.999% of the time, but I could not make it work 100% of the time because I’d have no way to guarantee what the sketch was going to put into RAM at the magic number location.

zoomx

Mon May 18, 2015 4:18 pm

Maybe this could be done by the program itself, with a routine that copies the rest of the program from flash to RAM and then starts it.

victor_pv

Mon May 18, 2015 7:34 pm

Roger, the latest commit in my bootloader repo already includes the change. It works like this:

- The sketch is compiled in the IDE with the RAM linker script (so it uses RAM addresses), but is uploaded to ID2 (so a normal upload to flash at 8002000).
The bootloader stores the code in flash like any other ID2 upload.
But when the bootloader reboots and needs to decide what to run from flash or RAM, it checks like this:
1.- First it checks whether there is a valid SP address at 20000c00. That would indicate an upload to RAM, like the original bootloader used to do. If the SP is valid, it runs from RAM.
2.- Next it checks the word at 8002004 (the reset vector position for a sketch uploaded to 8002000). If the reset vector points to an address that matches 2000xxxx, it is a reset vector to a RAM address, not to flash. In that case, it copies 64KB of flash starting at 8002000 to RAM starting at 20000c00, and then boots from there as in step 1, when the upload went directly to RAM.
3.- Next it checks whether there is a valid SP at 8002000 (like yours does with ID2), and if so boots from flash at that address.
4.- Finally it checks for a valid SP at 8008000, like the old ID1, and if valid jumps there.
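
The order of those four checks can be written down as a small pure function. This is only an illustrative sketch, not the code from the repo: the enum, the function names, and the fall-through to "stay in the bootloader" are assumptions, and the stack-pointer test uses the `0xFFFE0FFF` mask quoted later in this thread.

```c
#include <stdint.h>

typedef enum {
    BOOT_RAM_DIRECT,   /* step 1: a sketch is already sitting in RAM      */
    BOOT_COPY_TO_RAM,  /* step 2: flash image was linked to run from RAM  */
    BOOT_FLASH_ID2,    /* step 3: ordinary sketch in the ID2 flash slot   */
    BOOT_FLASH_ID1,    /* step 4: ordinary sketch in the old ID1 slot     */
    BOOT_STAY_IN_DFU   /* assumed fallback when nothing looks valid       */
} boot_action_t;

/* Stack pointer plausibility test (mask taken from later in the thread). */
static int sp_looks_valid(uint32_t sp)
{
    return (sp & 0xFFFE0FFFu) == 0x20000000u;
}

/* Reset vector points at a 0x2000xxxx address, i.e. into RAM. */
static int vector_in_ram(uint32_t reset_vector)
{
    return (reset_vector & 0xFFFF0000u) == 0x20000000u;
}

/* ram_sp:    word at the RAM upload address
 * id2_sp:    first word of the ID2 flash slot (initial SP)
 * id2_reset: second word of the ID2 flash slot (reset vector)
 * id1_sp:    first word of the ID1 flash slot */
static boot_action_t choose_boot_action(uint32_t ram_sp, uint32_t id2_sp,
                                        uint32_t id2_reset, uint32_t id1_sp)
{
    if (sp_looks_valid(ram_sp))   return BOOT_RAM_DIRECT;
    if (vector_in_ram(id2_reset)) return BOOT_COPY_TO_RAM;
    if (sp_looks_valid(id2_sp))   return BOOT_FLASH_ID2;
    if (sp_looks_valid(id1_sp))   return BOOT_FLASH_ID1;
    return BOOT_STAY_IN_DFU;
}
```

Because step 2 looks at the reset vector before step 3 looks at the stack pointer, a RAM-linked image stored in the ID2 slot is never started in place from flash.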

If you reboot the board, step 1 finds the code in RAM and runs it from there directly; the copy from flash to RAM is only needed after a complete power-down.

So step 2 above is the newly added feature. I ran a sketch with some calculations and it runs faster, but I think that mostly depends on how much jumping around the code needs to do. Flash is fast enough for sequential code with the prefetcher, but as we saw with PIN_MAP, it can be slow for other things.

I don’t know if this option is needed at all. I just did it for the fun of it. It occurred to me on the way home the other day, I gave it a shot, and it works fine. It took me maybe half an hour to code and test.
The extra code to accomplish step 2 above adds about 200 bytes to the bootloader.

It’s fully functional. The whole idea may come in handy for updating the bootloader or a sketch from itself: first copy the “updating” section to a known RAM address, then execute it from there, and once running in RAM, it can rewrite the whole flash.
I do not see any immediate use right now though; as I said, it was mostly to test whether I could do it.

RogerClark

Mon May 18, 2015 8:36 pm

Thanks victor

I agree it’s interesting to know it’s possible.

As long as we don’t have false positives, e.g. the bootloader thinking there is a program in RAM when there isn’t, I can’t see any harm in retaining it.

The version where it copies from flash back to RAM seems the most foolproof because, I think you are saying, the bootloader determines the start address of the code by looking at the code in flash and then copies it from flash to RAM.

This sounds stable to me if we can guarantee that there can’t be false positives.

I think the thing with the bootloader is that it needs to be a lot more stable than the core, because it is more of a pain to reflash the bootloader than to just recompile the core and upload (using the bootloader).
That is why I dropped the upload straight to RAM.

Perhaps some of the other clever people on the site can let us know if they think there are any ways in which the upload to flash, then copy to and run from RAM, could go wrong.

victor_pv

Mon May 18, 2015 10:06 pm

Roger, I don’t think there can be a false positive in any working code developed without heavily modifying the linker scripts.
In any code linked for flash, the reset vector has to be an 800xxxx address. If the reset vector stored at 8002004 matches the pattern for 2000xxxx, it cannot match the pattern for 800xxxx at the same time.

Regarding the other check, the one that looks for a valid SP address at 20000C00, I made the upload-to-RAM option check for an SP that ends in at least 3 or 4 zeros. That prevents any false positive on the chips with 20KB, but could allow a false positive on a chip with more than 20KB if the word at …C00 contains a pointer to another address in the RAM range above the initial 20KB, and that pointer address ends in 3 or 4 zeros.

To prevent that, we could make the mask a #define in the config.h file, so that it matches the amount of RAM in the MCU. For 20KB it would check that the SP points to the next byte after 20KB, and for high-density devices it could check that the SP points to the address right after the 64KB of RAM on those.
I still think that the chances of the RAM at C00 containing a pointer that fits the mask I am using are pretty slim.
I can check the code when I get home and tell you how many bits I was using.

RogerClark

Mon May 18, 2015 11:05 pm

Hi Victor,

Can we wait for feedback from Ray and Matthias etc, as I know some people have strong views on whether RAM upload is a safe or good idea

madias

Tue May 19, 2015 8:18 am

OK, call me uncreative, but I have been thinking a lot about what benefit we get out of this feature. Is “auto updating” the bootloader the only scenario, or can we get some speed improvements in general?
Anyway, it’s an interesting thing!

RogerClark

Tue May 19, 2015 10:11 am

Hi Matthias,

I’m not even sure the RAM thing helps with updating the bootloader.
We can already update the bootloader via a sketch (Victor already did this), so I’m not really sure how the RAM stuff helps any more with that.

But I guess it’s the art of the possible.

What concerns me is whether we’d get false positives. For some reason, with the old Maple bootloader I seemed to get false positives on my Maple Rev 3.

And I’m generally cautious about this sort of thing, so my choice would be not to include RAM in the normal version, but perhaps to have it in a special version.

victor_pv

Tue May 19, 2015 2:23 pm

Roger, the upload-to-RAM option was giving false positives because of the way LeafLabs was testing for a valid SP address.
Basically, it tested that the SP vector at C00 contained ANY valid RAM address. That means that whenever a sketch put any pointer in that location, it would give a false positive.
They used this:
if ((sp & 0x2FFE0000) == 0x20000000) {

That means that any address from 20000000 to 2001FFFF gives a positive. That covers the whole 64KB of maximum RAM available in any R or C series chip.
So any pointer whatsoever stored at C00 gives a positive. When we started using the whole 20KB of RAM for sketches, some pointers ended up in that area.

Now I test it like this:
if ((sp & 0xFFFE0FFF) == 0x20000000)
That means that only addresses that are a multiple of 4KB match the pattern (4KB, 8KB, 12KB, etc.):
20001000
20002000
20003000
20004000

For a 20KB device only 4 addresses could give a false positive: the ones at 4, 8, 12, and 16KB. The real positive would be at 20KB.

For a 64KB device there are a few more false positives, 15 in total:
20005000 (on the 20KB devices this is the real positive for the SP).
20006000
…

Now, the chances of a pointer being in that location (20000C00) AND that pointer value being one of the handful of possible false positives are, I believe, extremely small. If that ever happens, a power-down and power-up clears the RAM.
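
Those counts can be checked by brute force. The sketch below is just a host-side check, not bootloader code; `count_false_positives` is a made-up helper. It counts every address strictly inside the RAM range that passes a given mask test. The base address 20000000 is left out to mirror the counts above (it also passes the new mask), and the address right after the end of RAM is left out because that is the genuine SP.

```c
#include <stdint.h>

#define RAM_BASE 0x20000000u
#define OLD_MASK 0x2FFE0000u  /* original Maple bootloader test */
#define NEW_MASK 0xFFFE0FFFu  /* tightened test described above */

/* Count addresses in (RAM_BASE, RAM_BASE + ram_bytes) that would be
 * accepted as a "valid SP" even though they are ordinary RAM pointers. */
static uint32_t count_false_positives(uint32_t mask, uint32_t ram_bytes)
{
    uint32_t count = 0;
    for (uint32_t off = 1; off < ram_bytes; off++) {
        if (((RAM_BASE + off) & mask) == RAM_BASE) {
            count++;
        }
    }
    return count;
}
```

With the new mask this gives 4 candidates for 20KB of RAM and 15 for 64KB, while the old mask accepts every address in the range.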

BUT, to prevent even more false positives, as we are now using different targets in the Makefile, we can define that mask in config.h for each target, and test it like this:

if (sp == SP_MASK) {

Where SP_MASK = 0x20005000 for the Maple Mini (the address right after the 20KB of RAM, which is the only SP vector we use in any of the compiler scripts).
And for the 64KB devices:
SP_MASK = 0x20010000 (the byte after 64KB).
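
As a rough sketch of how that could look in a config.h-style header (the `TARGET_RAM_64K` switch and the `ram_sp_is_valid` helper are hypothetical names, not taken from the bootloader source):

```c
#include <stdint.h>

/* Hypothetical per-target selection; a real build would set this from
 * the Makefile target. The values are the ones given above. */
#if defined(TARGET_RAM_64K)
#define SP_MASK 0x20010000u  /* address right after 64KB of RAM */
#else
#define SP_MASK 0x20005000u  /* address right after 20KB of RAM */
#endif

/* Exact comparison: only the one SP value the linker scripts emit passes. */
static int ram_sp_is_valid(uint32_t sp)
{
    return sp == SP_MASK;
}
```

With an exact match, the only remaining false positive is a sketch that happens to store exactly that one value at the checked location.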

EDIT: With all the above said, on the 20KB devices we should probably take the option out altogether. I only put it back in the code I was using for testing the devices with 2KB page size, which have at least 48 or 64KB, but if keeping the RAM upload for all devices is simpler than making it conditional for only some devices, then the method above should solve the issue with false positives.

mrburnette

Tue May 19, 2015 3:06 pm

EDIT: With all the above said, on the 20KB devices we should probably take the option out altogether.

2nd that.

Ray
