I have a cluster of 5 miniservers (Raspberry Pi), each with an 8 GB USB drive, just for experimenting with clustering and such.
They are connected to a switch via LAN and are not connected to the internet for now.
What I need is a way to have the same files on each server, and as the title says, the alternatives are:
- Replicate the same data over the 5 servers, giving only ~8 GB of space, 5 times over
- Have a "JBOD" over the network, giving ~40 GB total
Any suggestion for any of the above solutions is appreciated.
The files stored are in no way important, so no reliability/availability needed.
Have a great day.
First ask yourself what kind of distributed computation you plan to run. If you are after data-local computation, as in the popular MapReduce frameworks, you may want to install one of those frameworks. They are built on and tightly coupled with a distributed file system, so you get a higher-level file system that you access through an API. Data you write to such a file system gets split up across the cluster. In the MapReduce processing paradigm, the map phase exploits this data locality: each node processes and loads data from its local chunks only.
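To make the data-locality idea concrete, here is a toy sketch in plain Python. It is not the API of any real MapReduce framework: the node names, the round-robin placement, and the word-count job are all assumptions chosen for illustration. The point is only that each map call sees the chunks stored on its own node.

```python
from collections import Counter

# Hypothetical node names; a real framework tracks chunk locations itself.
NODES = ["pi-0", "pi-1", "pi-2", "pi-3", "pi-4"]


def place_chunks(chunks, nodes):
    """Spread chunks over the nodes round-robin (stand-in for DFS placement)."""
    placement = {node: [] for node in nodes}
    for i, chunk in enumerate(chunks):
        placement[nodes[i % len(nodes)]].append(chunk)
    return placement


def map_phase(local_chunks):
    """Runs on one node and touches only the chunks stored on that node."""
    counts = Counter()
    for chunk in local_chunks:
        counts.update(chunk.split())
    return counts


def reduce_phase(partials):
    """Merge the per-node partial results into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


chunks = ["a b a", "b c", "a c c", "b b", "a", "c a"]
placement = place_chunks(chunks, NODES)
partials = [map_phase(placement[node]) for node in NODES]
word_counts = reduce_phase(partials)
print(dict(word_counts))
```

In a real framework the map step would run in parallel on the five Pis and only the small partial results would cross the network.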
If you are more interested in the HPC/cluster approach, you will probably look into MPI-based systems, where you work at a somewhat lower level. What could work quite well there is a combination of NFS and OverlayFS to make the data available to all nodes. Each Pi shares its USB drive via NFS, and every Pi mounts the shares of all the other Pis. So on Pi-0 you would mount the shares from Pi-1 through Pi-4, and so on. With OverlayFS you can then make the data from the individual shares show up in a single folder.
If any of your MPI workers needs to read a file, they can all read from one well-defined path, and the data is pulled in transparently over the network if necessary.
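The NFS plus OverlayFS layout described above could be sketched roughly as follows. This small Python helper only prints the mount commands one Pi would run (as root); it does not execute them. The hostnames `pi-0` to `pi-4`, the export path `/srv/usb`, and the mount points under `/mnt` are hypothetical names, and the sketch assumes each Pi already exports its USB drive over NFS. An overlay mount made from lower directories only is read-only, so this gives a merged read-only view.

```python
def mount_commands(me, node_count=5, export="/srv/usb",
                   mount_root="/mnt/nodes", merged="/mnt/cluster"):
    """Return the shell commands node `me` would run to get one merged folder.

    All paths and the pi-N hostnames are assumptions for illustration.
    """
    commands = []
    lower_dirs = [export]  # the local USB drive is the first layer
    for i in range(node_count):
        if i == me:
            continue  # no need to mount our own share over NFS
        target = f"{mount_root}/pi-{i}"
        # Mount the other Pi's share read-only over NFS.
        commands.append(f"mount -t nfs -o ro pi-{i}:{export} {target}")
        lower_dirs.append(target)
    # Stack all layers; with no upperdir the overlay is read-only.
    layers = ":".join(lower_dirs)
    commands.append(f"mount -t overlay overlay -o lowerdir={layers} {merged}")
    return commands


for command in mount_commands(0):
    print(command)
```

If two Pis hold a file with the same relative path, the copy in the layer listed first in `lowerdir` is the one that shows up in the merged folder. The mount point directories must exist before the commands are run.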
With NFS having been around for ages, the many performance improvements made to it, and its generally low overhead, this could even be quite a performant solution.
Keep us updated about this exciting project you are planning here!