I have been assigned an 8 GB RAM desktop at work that I can't upgrade. My job involves data manipulation on a group of ~1 GB, ~8M-row tables.
Certain analyses I need to do would be considerably simpler to implement if I could merge all the files, but then R, the tool I'm currently using, wouldn't be able to load the merged file at all.
I've asked around and was told that using Apache Spark or setting up a local SQL server would solve the issue and let me ignore memory limitations during the data processing steps (the expected output always consists of only a handful of total counts). I'd just like to be sure they actually work that way before installing anything.
(As a bonus question: how does software like SPSS manage to load and work on huge datasets without a hitch, and why can't R use a similar approach?)
Both Spark and SQL Server can absolutely handle and process data larger than what fits into RAM.
Installing these tools shouldn't be a big deal. Uninstalling a local Spark installation is just a matter of deleting a single directory.
Spark is intended for use on clusters of computers, but you can run it just as well on a single local workstation.
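Since you already work in R, here is a minimal sketch of what that could look like with the sparklyr package running Spark locally. The file path and the `category` column are placeholders for illustration; the point is that the merge and the counting happen inside Spark, and only the handful of counts is collected back into R:

```r
library(sparklyr)
library(dplyr)

# Connect to a Spark instance running locally on this workstation
sc <- spark_connect(master = "local")

# Read all the CSV files into one Spark table without pulling them into R's memory
# (path and column names below are made up for this example)
merged <- spark_read_csv(sc, name = "merged", path = "data/tables/*.csv")

# The heavy lifting runs in Spark; only the small summary comes back to R
counts <- merged %>%
  group_by(category) %>%
  summarise(n = n()) %>%
  collect()

spark_disconnect(sc)
```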
Spark will also read and write data directly in most flat file formats (CSV, JSON, Parquet, and so on). With SQL Server, you first have to load the data into SQL Server tables.
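For comparison, here is a rough sketch of the SQL Server route from R via DBI and odbc. The driver string, database, table, and column names are all assumptions for illustration; the data has to be bulk-loaded into a table once, after which the aggregation runs inside the server and only the counts reach R:

```r
library(DBI)

# Connect to a local SQL Server instance through ODBC
# (driver, server, and database names are placeholders)
con <- dbConnect(odbc::odbc(),
                 Driver   = "ODBC Driver 17 for SQL Server",
                 Server   = "localhost",
                 Database = "work_db",
                 Trusted_Connection = "Yes")

# Unlike Spark, the flat files must be loaded into a table first,
# e.g. with a BULK INSERT per file
dbExecute(con, "
  BULK INSERT dbo.merged
  FROM 'C:\\data\\table_01.csv'
  WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2);
")

# The aggregation happens inside SQL Server; only the counts come back
counts <- dbGetQuery(con, "
  SELECT category, COUNT(*) AS n
  FROM dbo.merged
  GROUP BY category;
")

dbDisconnect(con)
```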