The original used a timer to trigger an ISR at the required frequency and loaded 1 value at a time into the DAC.
I modified it so it uses the I2S peripheral, and loads a block of data at once to transfer with DMA.
I need to send a PR with modifications to the SDIO library, but SPI with sdfatEx should be fast enough too.
It currently takes ~11 ms to decode a frame from a stereo 44100 Hz 256 kbps mp3 file. That generates 1152 samples. At that rate the I2S peripheral takes 13 ms to play the same number of samples, so that leaves 2 ms to read data from the card and do anything else you want.
With a 22 kHz 64 kbps file, it takes around 5 ms to decode a frame and 26 ms to play it, so that would leave 80% of the CPU time available for something else.
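As a rough check of the numbers above, here is a small sketch of the arithmetic. It assumes the 1152 figure counts interleaved int16_t values for both channels (that is an assumption, chosen because 576 stereo pairs at 44100 Hz come to about 13 ms). The function names are made up for illustration and are not part of any library.

```cpp
#include <stdint.h>

// Playback time in milliseconds of `interleavedSamples` int16_t values
// (all channels interleaved) at the given sample rate.
// Assumption: the sample count includes every channel.
constexpr double playTimeMs(uint32_t interleavedSamples, uint32_t channels,
                            uint32_t sampleRateHz) {
    return 1000.0 * (interleavedSamples / channels) / sampleRateHz;
}

// Fraction of each playback period left over once decoding is done.
constexpr double cpuFreeFraction(double decodeMs, double playMs) {
    return (playMs - decodeMs) / playMs;
}

// Bytes of mp3 data that must come off the card during one playback period.
constexpr double bytesPerPeriod(uint32_t bitrateBps, double playMs) {
    return bitrateBps / 8.0 * playMs / 1000.0;
}
```

With the measured figures, cpuFreeFraction(11.0, 13.0) is about 0.15 for the 44100 Hz file and cpuFreeFraction(5.0, 26.0) is about 0.81 for the 22 kHz file. At 256 kbps the card only has to supply roughly 416 bytes per 13 ms period, so decoding, not card throughput, is the main cost.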
Also since the SDFat library uses DMA, it is possible to use the yield call to do something else while the DMA transfers complete.
The I2S library has several new functions not in the Arduino API. They allow the sketch to access the buffer directly, and they tell the callback function whether the interrupt was for Half Transfer or Transfer Complete. That way the DMA can work in circular mode with no interruption and just call the callback function in the sketch when the top or bottom half of the buffer has been played, so that half can be reloaded with new data.
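To illustrate the half-buffer idea, here is a minimal sketch that can be compiled on a PC. The event type, buffer and callback names are hypothetical and are not the actual functions of the modified I2S library. The buffer is tiny and the "audio" is just a counter.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical event type: which DMA interrupt fired.
enum class DmaEvent { HalfTransfer, TransferComplete };

constexpr size_t kBufSamples = 8;        // tiny buffer, for illustration only
static int16_t i2sBuffer[kBufSamples];   // stands in for the I2S DMA buffer
static int16_t nextSample = 0;           // dummy data source

// Hypothetical callback: refill the half the DMA has just finished playing.
void onI2sDma(DmaEvent ev) {
    const size_t half = kBufSamples / 2;
    // Half Transfer: the first half was just played, DMA now reads the second.
    // Transfer Complete: the second half was just played, DMA wraps to the first.
    int16_t* dst = (ev == DmaEvent::HalfTransfer) ? i2sBuffer : i2sBuffer + half;
    for (size_t i = 0; i < half; ++i) {
        dst[i] = nextSample++;
    }
}
```

Because the DMA runs in circular mode, it never stops. The callback only has to finish writing one half before the DMA comes back around to it.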
This is how the modified code works now:
=========================
setup() will read the pointer to the I2S buffer and its size (in bytes, so it has to be divided by 2 to get the size in int16_t, which is the sample size used by the mp3 library).
Next it will register a callback function with the I2S library, called when the DMA HT or TC interrupts trigger.
After that it starts the I2S peripheral in circular mode, so all that’s left is to reload the data buffer when the ISR is called.
Then it calls the play function of the MP3 library. That’s modified so it doesn’t use any timer any more. Instead it now checks whether either of the two internal buffers has space for 1 frame (1152 samples), and if so, decodes a frame into that buffer. The MP3 library is called repeatedly in loop() and just checks whether a buffer has space for samples. If not, it returns, and some other user code can run after it.
When the I2S DMA triggers the ISR, the sketch ISR function will check which half of the buffer is empty, and then call a new function within the mp3 library to copy data from the mp3 buffer to the I2S buffer, and update some variables to indicate when a buffer is fully emptied, then returns.
As long as loop() keeps calling the mp3 library often enough, it keeps loading new data from the SD card and decoding it into one of the buffers.
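The flow described above can be sketched roughly like this. It is a PC-testable illustration only: mp3Tick(), mp3FillI2s() and the buffer layout are hypothetical names and not the ones used in the modified Adafruit_MP3 code, and the decoder is replaced by a stub that fills a frame with its frame number.

```cpp
#include <stddef.h>
#include <stdint.h>

constexpr size_t kFrameSamples = 1152;   // samples per decoded frame, per the post

struct FrameBuffer {
    int16_t data[kFrameSamples];
    size_t  readPos = 0;      // next sample to hand to the I2S buffer
    bool    full    = false;  // true while it still holds unplayed samples
};

static FrameBuffer buffers[2];
static int decodeIdx = 0;            // buffer the decoder fills next
static int playIdx   = 0;            // buffer the ISR drains next
static int16_t fakeFrameId = 0;      // stub "decoder" output

// Stand-in for the real frame decode (hypothetical).
static void decodeFrameInto(FrameBuffer& b) {
    for (size_t i = 0; i < kFrameSamples; ++i) b.data[i] = fakeFrameId;
    ++fakeFrameId;
    b.readPos = 0;
    b.full = true;
}

// Called from loop(): decode at most one frame if there is room,
// otherwise return immediately so other user code can run.
bool mp3Tick() {
    FrameBuffer& b = buffers[decodeIdx];
    if (b.full) return false;
    decodeFrameInto(b);
    decodeIdx ^= 1;
    return true;
}

// Called from the I2S callback: copy up to `count` samples into the free
// half of the I2S buffer. Returns how many samples were actually copied.
size_t mp3FillI2s(int16_t* dst, size_t count) {
    size_t copied = 0;
    while (copied < count && buffers[playIdx].full) {
        FrameBuffer& b = buffers[playIdx];
        dst[copied++] = b.data[b.readPos++];
        if (b.readPos == kFrameSamples) {   // buffer fully emptied
            b.full = false;
            playIdx ^= 1;
        }
    }
    return copied;
}
```

On real hardware the flags are shared between loop() and an ISR, so they would need volatile or atomic handling, and a short return value from mp3FillI2s() would mean an underrun that should be padded with silence.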
I added some pieces of code from the teensy mp3 library to increase the decoding speed a good bit.
The code is here:
https://github.com/victorpv/Adafruit_MP3
Further possible optimizations:
- Use only 1 buffer, and the I2S DMA read from the same buffer the MP3 library writes to. This would eliminate some overhead, and some time spent in copying from one buffer to the other, but the copy doesn’t take that long, so it’s not critical. Main advantage would be to save some RAM.
- Add support to use the DACs. Ideally write a library with an API similar to the I2S one, so sketches work on both.
- Further speed optimizations. With 44100Hz stereo files it takes almost all the CPU time decoding.
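
As a rough idea of what the single-buffer option could look like, here is a hedged sketch. It assumes each half of the shared buffer holds exactly one decoded frame, and all names are hypothetical. The ISR only records which half is free, and loop() decodes straight into that half, so there is no intermediate copy.

```cpp
#include <stddef.h>
#include <stdint.h>

constexpr size_t kHalfSamples = 1152;            // one frame per half (assumption)
static int16_t sharedBuffer[2 * kHalfSamples];   // read by DMA, written by the decoder
static volatile int freeHalf = -1;               // -1: nothing to refill, else 0 or 1

// ISR side: just note which half the DMA has finished with.
void onHalfPlayed(int half) { freeHalf = half; }

// Stand-in for a decoder that writes one frame straight to `dst` (hypothetical).
static int16_t frameCounter = 1;
static void decodeFrameTo(int16_t* dst) {
    for (size_t i = 0; i < kHalfSamples; ++i) dst[i] = frameCounter;
    ++frameCounter;
}

// loop() side: decode directly into the free half, if there is one.
bool refillIfNeeded() {
    const int half = freeHalf;
    if (half < 0) return false;
    freeHalf = -1;   // clear first, so an interrupt during decoding is not lost
    decodeFrameTo(sharedBuffer + half * kHalfSamples);
    return true;
}
```

The catch is timing: the decode has to finish before the DMA reaches that half again, which is tight with the 11 ms decode against 13 ms playback quoted above.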

[RogerClark – Sun Mar 25, 2018 9:07 pm] –
I was wondering how to replace my iPod when it finally fails. So building one using an STM32 is now an option.
Joking aside...
I do still use several old iPods, but they are all slowly dying, and since Apple decided to stop making the Nano (with a display) several years ago, there are few decent alternatives.
I did try some cheap MP3 players from eBay but they all had major software, hardware or mechanical issues.
I am actually surprised the F1 could keep up the pace to decode the 44100 Hz file. I need to get hold of a 320 kbps file at 44100 Hz or more and see if I hit the limit. But to be honest, any Cortex-M3 or M4 over 100 MHz seems to be more than enough for a pretty decent mp3 player, using an SD card for storage.
I wonder if any of the cheap mp3 players coming from China have a Cortex MCU and can be hacked for open software… it would make for a nice piece of hardware.
Probably not as they were fairly useless.
But it's probably worth looking again to see if they specifically have an STM32.
[victor_pv – Sun Mar 25, 2018 11:33 pm] –
I wonder if any of the cheap mp3 players coming from China have a Cortex MCU and can be hacked for open software… it would make for a nice piece of hardware.
Sadly: Negative. They are using special chips.
The cheap ones without a display use something like this:
https://www.aliexpress.com/item/New-arr … 49748.html
and the ones with a display mostly use a MediaTek chip (like in smartwatches); older ones use “Rockchip”.
Probably dedicated chips are more cost effective (cheaper and better performance).
The problem I find with those MP3 player boards (modules) is that they are not designed to be controlled by an MCU.
AFAIK, none of them have an API and connectivity via I2C or SPI.
I have some older sound player modules with a simple control API, but they do not support MP3, they use a special file format.
This sounds like an ideal use for the F4 Pill board, as it has SD, and perhaps the F4 has a DAC which may not be too bad…

Victor: Which board do you use for this example? Did you use an external SD card slot? (I think in SDIO mode any cable length might be suboptimal)
I submitted a PR for the SDIO library that adds speed and compatibility with more cards. It seems that to use SDIO on the F1 at full speed we may need pull-up resistors on all lines, but the board I use doesn’t have them, and the solder points are very small to add them.
Update the library from my last PR and it should work better. You are using the hardware SDIO interface, right?
I’ve tested the code in SDfat mode – only stuttering -> as expected.
I have this board on hold:
https://www.aliexpress.com/item/Free-sh … 84907.html
So I’ll wait until it is delivered. Maybe an F4 is even the better solution, as we need I2S for playing MP3 and RET-ZET boards aren’t cheaper than this F407 one.
So, conclusion: MP3 on the STM32F103 might be possible, but you need at least a RET-VET part and an onboard SDIO SD card slot, while the F407 with everything onboard has the same price, if not cheaper….
For SDIO, use the files from this PR:
https://github.com/victorpv/Arduino_STM … SdioF1.cpp
Those are the very latest I am using.
In case of problems, you can modify the library to use 1 bit instead of 4 bits, by commenting out these lines:
if (!cardAcmd(m_rca, ACMD6_XFERTYP, 2)) {
    return sdError(SD_CARD_ERROR_ACMD6);
}
sdio_set_dbus_width(SDIO_CLKCR_WIDBUS_4BIT);

i think we may have better luck with stm32f4 as on top of it being a cortex-m4
first of all the m4 has 2x hardware fpu in the engine
and it has that ART accelerator that ST bragged quite a bit about, which really gives it that 500 mflops horse power

viewtopic.php?f=3&t=76&start=160#p26942
we’d just use single precision (hardware) floating point for it
that probably means the f4 may give high fidelity mp3 playback that rival those of the commercial near top end or top end players out there
[youtube]https://youtu.be/0ETyFmAMFjY[/youtube]
on another note i’m wondering how difficult/easy it would be to optimise code for the m3 pipeline
http://infocenter.arm.com/help/topic/co … GJICF.html
as it seemed if the pipeline could be fully utilised, it may create the illusion of being able to execute one instruction per cycle, that gives >= 72 mips if it is at all possible, but i’d think this is only partially possible given the very limited ram (cache?) and execution from flash
i.e. certain short segments of code may give 72 mips, but more complicated things may be difficult or impossible to optimise
I applied all the optimizations (assembler code mostly) from the teensy version, and teensy is a cortex-m4. Everything works the same.
The ART accelerator, and especially running at 100 MHz or more, would definitely leave spare CPU for user code.
But as it is, I am using DMA for both the SD card and the I2S, and the CPU pretty much just does the decoding, and has no trouble with 44 kHz files at a 256 kbps bitrate.
There are a few more optimizations that I could do to move data with DMA, and even read and write straight to some buffers to avoid moving data, but moving 512 bytes only takes a few us, so it wouldn’t make a ton of difference.
As it is right now, decoding a frame at 44 kHz and 256 kbps takes about 11 ms, and playing the same frame takes 13 ms, so you have 2 ms out of every 13 ms to do anything else you want.
I plan to put it all in tasks for FreeRTOS, then you can leave a low priority task to run anything else without affecting play.