http://hackaday.com/2012/09/12/64-rasbe … rcomputer/
would it be about time stm32duino did basically the same? and to be more ‘hard core’, ethernet is not needed: the i2c bus could chain 1024 stm32duino blue pills together on just 2 wires
And being linux opens that number crunching to all sorts of possible uses…
The blue pill… well… it’s simply not as fast, nor built for that kind of use.
Paralleling the blue pill would be pointless for number crunching. Please correct me if I’m wrong, but I just don’t see any reason to do it with this chip.
It’s an embedded chip for embedded processes…
Paralleling smaller numbers of pills is useful, for example you can use one as a display controller, another as a radio driver, a third as a sensor controller and a fourth to read and write some storage. The tight coding loops you can achieve with individual boards and the modularity this provides are the advantages in this scenario.
http://hackaday.com/2016/01/25/raspberr … s-a-punch/
For microcontrollers, a distributed application architecture presents a serious limitation, since routing worker threads becomes more complicated. Rather, breaking up the roles of the microcontroller to assign specific hardware tasks provides better overall utilization of the investment: peripheral controller, display controller, serial communications controller, main logic orchestrator, etc. In such a scenario, each of the microcontrollers runs a single repetitive task that serves one need, such as SD card logging or collecting sensor data. SPI or high-speed serial can tie multiple uCs together such that the main application controller only has to deal with pre-digested datasets from the tentacle nodes and program logic.
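A minimal sketch of what such a “pre-digested dataset” could look like on the wire between a tentacle node and the orchestrator, assuming a simple sync-byte/length/checksum framing (the layout, sync value, and function names are invented for illustration, not from any post here):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical frame a worker MCU might send to the main orchestrator
   over SPI or a UART link.
   Layout: [0xA5 sync][node id][payload len][payload...][checksum] */
#define SYNC_BYTE 0xA5

static uint8_t checksum8(const uint8_t *p, size_t n)
{
    uint8_t sum = 0;
    while (n--) sum += *p++;
    return (uint8_t)(~sum + 1);   /* two's-complement checksum */
}

/* Worker side: pack a frame into buf; returns total frame length. */
size_t frame_pack(uint8_t *buf, uint8_t node_id,
                  const uint8_t *payload, uint8_t len)
{
    buf[0] = SYNC_BYTE;
    buf[1] = node_id;
    buf[2] = len;
    memcpy(&buf[3], payload, len);
    buf[3 + len] = checksum8(buf, (size_t)(3 + len));
    return (size_t)(4 + len);
}

/* Orchestrator side: verify sync byte and checksum. */
int frame_valid(const uint8_t *buf, size_t n)
{
    if (n < 4 || buf[0] != SYNC_BYTE) return 0;
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += buf[i];
    return sum == 0;   /* the checksum makes the byte total wrap to zero */
}
```

The point is only that the orchestrator never touches raw sensor handshaking; it sees a small, already-validated record per node.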
Ray
PS: I would highly recommend anyone who has not played around with a $5 RPi Zero to get on board. If you need WiFi and Bluetooth, the $10 USD RPi Zero-W is a great play. Essentially your $5 buys you a 1 GHz CPU, HDMI video, 512 MBytes of RAM, an SDIO microSD card interface, and USB OTG. More than 50% of your RAM will be available to you after the X-server starts the GUI. If you go headless, drop the Xserver/VNC, and use only SSH and the CLI, then you will have about 75% of the RAM. These are extraordinarily interesting little Linux boards.
The downside to the SoC is the increased power dissipation… The SoC board can easily consume 750 mA or more. Various sub-systems such as HDMI can be disabled to bring the current requirements under 100 mA (non-W board) based upon some of my testing… putting it just north of the ESP8266’s average 80 mA idle current.
Ray
as such mcu’s excel at io, adc and control; one could split up io tasks that need to control many different devices / elements
an example scenario might be something like this
https://vimeo.com/46857169
using it for compute may be possible for specific niche shared-little or shared-nothing use cases, e.g. simulating artificial neural networks with each mcu being a neuron and the synapses travelling along the interconnecting buses. but that role is probably done much more cheaply and efficiently using present day GPUs with their SIMD vector processing capabilities.
@ahull, i posted this out of an accidental brainwave, i couldn’t really afford the 1024 blue pill experiment personally
and i’d imagine with several blue pills doing adc with stochastic or probabilistic sampling, it may be possible to get a multi-mhz oscilloscope with that ‘parallel’ sampling, but the trouble would be how to co-ordinate them so as to reassemble the original waveform
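One way the co-ordination could work is classic time-interleaved sampling: each board samples at the same rate fs but with a calibrated phase offset of k/(N·fs), and the host simply interleaves the N streams into one record at an effective rate of N·fs. A rough sketch of the host-side reassembly, assuming the trigger-skew calibration problem is already solved (which is the hard part in practice):

```c
#include <stdint.h>
#include <assert.h>

/* Interleave N per-node sample streams into one waveform record.
   streams[k][i] is sample i from node k; node k is assumed to have
   started sampling k/(nodes*fs) seconds after node 0. */
void interleave(const uint16_t *const streams[], int nodes,
                int samples_per_node, uint16_t *out)
{
    for (int i = 0; i < samples_per_node; i++)
        for (int k = 0; k < nodes; k++)
            out[i * nodes + k] = streams[k][i];
}
```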
imho rpi, beaglebone black et al. are of a different ‘class’ in that they are full single board computers (some multi-core, like the rpi 3)
the main stumbling blocks, and the reasons for a transition to *bare metal* duino style development, are normally *power consumption* and *cost*.
for small projects i’d think rpi and beaglebone black are still quite manageable in terms of ‘cost’, in particular if it is useful to the functional scenario.
on rpi and beagle bone one can simply run a full java or python stack that eats hundreds of megs of memory and flash.
the main thing would come down to *power consumption*. i tried powering a beaglebone black on batteries, but in my case a 7 inch tft lcd is connected and powered from the same board. i think the power consumption comes out to a whopping 1 to 2 amps, and a 10,000 mah usb powerbank runs dry in, i’d think, about 2-3 hours. that probably means the setup beaglebone black + tft lcd (more the tft lcd, i’d think) runs between about 3 watts and 10 watts
that’s pretty high power consumption and would not be feasible to do on a sustained basis on batteries. (i’d think removing the tft lcd would last much longer but still at 1ghz and 512m dram, it would take quite a lot of energy management to conserve power)
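As a back-of-envelope check of the runtime figures above, assuming the powerbank’s mAh rating is at the 3.7 V cell voltage and the 5 V boost converter is roughly 85% efficient (both are guesses, not measurements):

```c
#include <assert.h>

/* Rough runtime estimate for a load drawn from a usb powerbank.
   cell_mah/cell_v describe the internal cells; eff is the assumed
   boost-converter efficiency; load_w the load at the 5 V rail. */
double runtime_hours(double cell_mah, double cell_v,
                     double eff, double load_w)
{
    double usable_wh = cell_mah / 1000.0 * cell_v * eff;  /* ~31 Wh here */
    return usable_wh / load_w;
}
```

At a 10 W draw this comes out near 3 hours, which is at least in the same ballpark as the 2-3 hours observed.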
hence, more and more i’d think dedicated-functionality ‘small’ bare metal boards like the stm32duino ones, e.g. blue pill, maple mini, would fill this niche of ‘small apps’ that run on batteries (very feasible); it fits a ‘low power’ profile for a small device and dedicated application
this is off-topic from the ‘massively parallel’ theme, but it’d seem to me that running several stm32duinos or the like would eventually fill a different niche compared to the likes of rpi and beagleboards
one of the reasons for mcu’s like stm32 running on low power i’d think is the use of sram, which unlike dram does not need periodic refresh.
but sram eats quite a lot of expensive chip real estate and is precious, leaving us with 20k on the bluepill / maplemini vs 512mb-1gb say on an rpi or beagleboard
https://en.wikipedia.org/wiki/Transputer
The Raspberry Pi does the heavy lifting; ESP32 nodes build out an elastic cluster.
It is my opinion that a similar approach could work for bluepills. At current prices, the dual-core ESP32 is likely the winner in cost-performance, but if one has a drawer full of STM32F103s already, then the math could shift in favor of the STM device.
Ray
Having said that, an array of say 512 bluepills would give you a VAX MIPS rating of about 24990.72 (would that be 25 VAX GIPS?), so you would have a moderately quick 512-node massively parallel array for around $1000, which ain’t bad.
[zoomx – Wed Jul 25, 2018 11:50 am] –
I believe that it should be compared with a GPU.
True, but I guess it depends on the tasks you are trying to perform.
However, when thinking about such things, the $5 rPi Zero provides both CPU and GPU, Linux, and enough RAM to build a seriously powerful cluster. As the rPi Zero is a “real” Linux machine, traditional clustering techniques apply as does off-the-shelf enabling software.
Attempting to “cluster” a bunch of blueboards would in itself be a cluster, IMO. What you would end up with is a BIG I/O box… maybe we need to coin an acronym to describe such a box: MIIO or M2IO for Massively Integrated Input Output, and we could think of that as a front-end processor to a more conventional Multi-CPU machine. Multiple CPUs would spawn directed I/O requests from the little subordinate boards, and local CPU power would validate and configure the datagram for easy transmission and digestion by the master computer; thus the little I/O CPU would off-load much of the “dirty” handshaking, data validation, and formatting of I/O data.
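A very small sketch of the M2IO idea on the master side, assuming each subordinate is identified by a node id and each already-validated datagram is routed to a role-specific handler (the ids, roles, and names here are invented for illustration):

```c
#include <stdint.h>
#include <assert.h>

/* The master keeps a routing table of subordinate I/O nodes; each
   validated datagram is dispatched by node id to a handler that sees
   only clean, pre-formatted data. */
typedef void (*datagram_handler)(const uint8_t *data, uint8_t len);

static int calls[3];   /* per-role delivery counters, for demonstration */

static void on_sensor(const uint8_t *d, uint8_t n)  { (void)d; (void)n; calls[0]++; }
static void on_storage(const uint8_t *d, uint8_t n) { (void)d; (void)n; calls[1]++; }
static void on_display(const uint8_t *d, uint8_t n) { (void)d; (void)n; calls[2]++; }

static const datagram_handler route[] = { on_sensor, on_storage, on_display };

/* Returns 0 on delivery, -1 for an unknown node id. */
int dispatch(uint8_t node_id, const uint8_t *data, uint8_t len)
{
    if (node_id >= sizeof route / sizeof route[0]) return -1;
    route[node_id](data, len);
    return 0;
}
```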
Ray
PS: I’m not really active in the STM32 space much anymore, but an injury to my dominant hand has temporarily prevented me from touch-typing and coding (quickly) … I do seem to be OK with one-finger poking on my Android tablet, however. The only good outcome from the injury is that I am getting spousal assistance with yard work this summer.
https://en.wikipedia.org/wiki/Connection_Machine
and the forgotten Thinking Machines Corporation
https://en.wikipedia.org/wiki/Thinking_ … orporation
the idea is MIMD (multiple instruction, multiple data)
https://en.wikipedia.org/wiki/MIMD
i’d imagine that to do an FFT (fast fourier transform) you could broadcast the data on the i2c bus, so all the bp/mm read the same signal
or for that matter distribute it to the bp/mm round robin on the usb bus; usb-serial would do
then each bp/mm does its share of work, oh but then you’d still need to collect the results up again
erm, maybe that can be done by the host; perhaps let that be a raspberry pi 3 or beaglebone black
so that collects the results and presents the fft output, say, in a graph
1024 devices on a 7 bit I2C address bus?
There are some extensions to I2C, but the bandwidth would still be too small for such numbers of devices, and you would violate the capacitance limit of the I2C bus.
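The capacitance point can be put in rough numbers: the I2C specification caps total bus capacitance at 400 pF, and with an assumed ~10 pF per attached device pin (ignoring trace capacitance, which only makes it worse) the bus tops out at around 40 devices, nowhere near 1024:

```c
#include <assert.h>

/* Crude upper bound on I2C device count from the bus capacitance
   budget alone. 400 pF is the spec limit; ~10 pF per device pin is an
   assumed figure, and PCB traces would eat into the budget further. */
int max_i2c_devices(double bus_limit_pf, double per_device_pf)
{
    return (int)(bus_limit_pf / per_device_pf);
}
```

Going beyond that needs bus segmentation, a mux/switch, or a different bus altogether.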
STM32 does have quite a lot of higher bandwidth buses available, from USB, I2S, and SPI to the humble USART.
As a “fast computation box” the exercise would be quite pointless, but as a distributed I/O network with per-node computation capabilities it does make sense. And UART plus RS-485 drivers is quite an effective combination for that.
Have a look at the big LED installations that Mike Harrison makes:
https://www.youtube.com/user/mikeselectricstuff/videos
Some of the big distributed LED installations he designed have 100,000 LEDs and a distributed network of processors.
Recently I saw a video of a splitter box with Ethernet input, because he ran out of bandwidth with RS-485.