I was wondering about fast ways to output parallel data, and gave this a try:
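#include <dma_private.h>

// Alternating BSRR words: set PA0, reset PA0, set PA0, ...
uint32_t toggle[1000];

void setup()
{
    pinMode(11, OUTPUT);  // pin 11 on the Maple Mini is PA0
    pinMode(33, OUTPUT);
    pinMode(32, INPUT);
    for (int i = 0; i < 1000; i++)
        toggle[i] = i % 2 ? 0x00010000 : 0x00000001;  // high half resets bit 0, low half sets it
    while (!digitalRead(32));  // it waits until you press the maple mini button
    dmaTransfer(toggle, (uint32_t*)&GPIOA->regs->BSRR, 1000);
}

void loop()
{
}

// Copies dataLength 32-bit words from `from` to the fixed address `to`.
void dmaTransfer(uint32_t* from, uint32_t* to, uint32_t dataLength)
{
    dma_init(DMA1);
    // The first address is the "peripheral" side in libmaple, so DMA_PINC_MODE
    // increments the source pointer while the destination (BSRR) stays fixed.
    dma_setup_transfer(DMA1, DMA_CH1, from, DMA_SIZE_32BITS, to, DMA_SIZE_32BITS, DMA_PINC_MODE | DMA_MEM_2_MEM);
    dma_set_num_transfers(DMA1, DMA_CH1, dataLength);
    dma_enable(DMA1, DMA_CH1);
}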
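To see what the pin should do with that buffer, here is a rough host-side model of the BSRR register (nothing here runs on the board). The names applyBsrr, fillToggle and countEdges are made up for this illustration. It assumes the usual STM32F1 behaviour: the low half-word sets pins, the high half-word resets them, and set wins if both bits are given for the same pin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of one BSRR write acting on a 16-bit output register.
// Low 16 bits set pins, high 16 bits reset pins, set has priority.
uint16_t applyBsrr(uint16_t odr, uint32_t bsrr) {
    uint16_t setMask = static_cast<uint16_t>(bsrr & 0xFFFFu);
    uint16_t resetMask = static_cast<uint16_t>(bsrr >> 16);
    return static_cast<uint16_t>((odr & ~resetMask) | setMask);
}

// Same pattern as the sketch: even entries set bit 0, odd entries reset it.
void fillToggle(uint32_t* buf, size_t n) {
    assert(buf != nullptr);
    for (size_t i = 0; i < n; i++)
        buf[i] = (i % 2) ? 0x00010000u : 0x00000001u;
}

// Replays the buffer word by word, as the DMA would, and counts
// how many times bit 0 of the output register changes level.
size_t countEdges(const uint32_t* buf, size_t n) {
    uint16_t odr = 0;  // assume the pin starts low
    size_t edges = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t next = applyBsrr(odr, buf[i]);
        if ((next ^ odr) & 1u)
            edges++;
        odr = next;
    }
    return edges;
}
```

With the pin starting low, every one of the 1000 words flips PA0, so the output is a square wave at half the DMA word rate.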
But you can build a 16-bit resistor divider DAC with that speed. The bad thing is that you won't get accurate 16-bit output from a home-built resistor divider network.
The ILI9341 is cheap and it does support a 16-bit parallel interface, but I don't know how to actually drive one, or anything else.
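A back-of-the-envelope check of why accuracy is the problem, as a sketch only. It assumes a simplified model where the MSB contributes half of full scale and its weight is off by the resistor tolerance; lsbFraction and msbErrorInLsb are names invented for this example, and a real R-2R ladder has more error sources than this.

```cpp
#include <cassert>
#include <cstdint>

// Size of one LSB as a fraction of full scale for an n-bit DAC.
double lsbFraction(int bits) {
    assert(bits > 0 && bits < 32);
    return 1.0 / static_cast<double>(uint32_t{1} << bits);
}

// Error, in LSBs, if the MSB weight (half of full scale) is off by
// `tolerance` (0.01 means 1 %). Simplified model, see the note above.
double msbErrorInLsb(int bits, double tolerance) {
    return 0.5 * tolerance / lsbFraction(bits);
}
```

For 16 bits, one LSB is 1/65536 of full scale (about 15 ppm), and msbErrorInLsb(16, 0.01) comes out at roughly 328 LSB, so ordinary 1 % resistors are nowhere near enough under this model.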
Having some DMA for transferring something means that the data comes from a frame buffer:
Let's say the screen is 320×480 with RGB24: that is about 460 KB of RAM.
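The arithmetic behind that figure, as a small sketch. frameBufferBytes and fitsInRam are hypothetical helper names, and the 20 KB used in the note below is my assumption for the SRAM of the STM32F103CB on a Maple Mini, not something stated in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to hold one full frame.
uint32_t frameBufferBytes(uint32_t width, uint32_t height, uint32_t bytesPerPixel) {
    assert(bytesPerPixel > 0);
    return width * height * bytesPerPixel;
}

// True if the frame buffer fits in the given amount of RAM.
bool fitsInRam(uint32_t bufferBytes, uint32_t ramBytes) {
    return bufferBytes <= ramBytes;
}
```

frameBufferBytes(320, 480, 3) is 460800 bytes, and fitsInRam(460800, 20 * 1024) is false by a wide margin, which is the point being made about full frame buffers on a small MCU.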
<…>
I think you missed the point: I was just saying that you don't always need a practical reason; sometimes you do something just for fun, you know.
In the case of an LCD, it is completely useless unless it is attached to some bigger system, such as a Raspberry Pi or similar, where you can easily dedicate a frame buffer of 512 KB or 1 MB.
Of course, in an application such as NeoPixel, DMA is a perfectly good fit: a small buffer that needs to be sent over and over.
This is much the same scenario as what I did 25 years ago: sending a DMX512 stream over RS485 at 250 kbit/s, over and over, using an MC68302 with DMA on the serial port.
Don't try to use DMA simply to make an LED blink …
So it fills one line buffer while the other is being DMA'ed.
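Roughly, the two-buffer scheme looks like this. It is only a host-side sketch with made-up names (renderLine, startTransfer, drawFrame) and tiny sizes; the "DMA" here is a plain synchronous copy, whereas on real hardware the transfer would run in the background and you would wait for its completion flag or callback before reusing a buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kWidth = 8;   // hypothetical line length in pixels
constexpr int kHeight = 4;  // hypothetical number of lines

// Hypothetical renderer: fills one scan line with a recognisable pattern.
void renderLine(int y, uint16_t* line) {
    for (int x = 0; x < kWidth; x++)
        line[x] = static_cast<uint16_t>(y * kWidth + x);
}

// Stand-in for starting a DMA transfer of one line to the display.
// Here it copies immediately; real DMA would return before it is done.
void startTransfer(const uint16_t* line, std::vector<uint16_t>& display) {
    display.insert(display.end(), line, line + kWidth);
}

// Renders a frame line by line, alternating between two line buffers.
std::vector<uint16_t> drawFrame() {
    uint16_t buffers[2][kWidth];
    std::vector<uint16_t> display;
    int fill = 0;
    renderLine(0, buffers[fill]);  // prime the first buffer
    for (int y = 0; y < kHeight; y++) {
        int send = fill;
        fill ^= 1;                                  // swap roles
        startTransfer(buffers[send], display);      // "DMA" sends line y
        if (y + 1 < kHeight)
            renderLine(y + 1, buffers[fill]);       // CPU renders line y+1 meanwhile
        // Real hardware: wait for DMA completion here before buffers[send] is reused.
    }
    return display;
}
```

The gain only shows up when renderLine takes about as long as the transfer; if rendering is much faster, the CPU just ends up waiting at the end of each loop iteration.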
However, this only gives a performance advantage when you are rendering something that you can efficiently do line by line.
I'm not too sure under what circumstances line by line rendering actually gives better performance. Perhaps bitmap text, where the font can be accessed line by line.
And of course filling the screen with the same colour in all pixels, or even gradient fills.
But I didn't think it was worth porting the Teensy code, given the amount of time and effort it was going to take.
I still like the idea of using a DMA callback and think this should be an option in the SPI DMA functions, but perhaps hardly anyone would use it.
For LCD, I'm still convinced it is quite useless, even with a reduced frame buffer area, because if you have to do a drawCircle() that almost fills the screen, the CPU will probably have to wait 10+ times for a DMA transfer to finish before initializing the next one, until the whole drawCircle() has completed …


