java csv hashset fill

Filling a file that needs values from 1000 other files - Java


Suppose you have this .csv that we'll name "toComplete":

[Date,stock1, stock2, ...., stockn]
[30-jun-2015,"NA", "NA", ...., "NA"]
....
[30-Jun-1994,"NA","NA",....,"NA"]

with n = 1000 and 5000 rows, one row per date. That's a fairly big file and I'm not used to working with files this size. My goal is to replace each "NA" with a value taken from other .csv files. I have one .csv file per stock, so I have 1000 stock files plus my "toComplete" file.

Here is what each stock file looks like:

[Date, value1, value2]
[27-Jun-2015, v1, v2]
....
[14-Fev-2013,z1,z2]

Each stock file contains fewer dates than "toComplete", and every date in a stock file is guaranteed to be in "toComplete".

My question is: what is the best way to fill my "toComplete" file? I tried reading it line by line, but this is very slow: for every line of "toComplete", I read all 1000 stock files again to fill in that line. I think there are better solutions, but I can't see them.

EDIT: For example, to replace the "NA" in the second row and second column of "toComplete", I need to open the stock1 file and read it line by line to find the value1 entry for the date in the second row of "toComplete". I hope it makes more sense now.

EDIT2: The dates have been edited. For a lot of stocks I won't have values for every date. In this example, we only have dates from 14-Fev-2013 to 27-Jun-2015, which means some "NA" entries will remain at the end (but that's not a problem). I know which file to search because my files are named stock1.csv, stock2.csv, ... I put them in a single directory so I can use the .list() method.


Solution

  • So you have 1000 "price history" CSV files, one per stock, each containing up to 5000 days of price history, and you want to combine their data into one CSV file where each line starts with a date followed by up to 1000 stock prices for that day? A back-of-the-napkin calculation puts an upper bound of about 100 MB on the final file (under 20 bytes per stock price means under 20 KB per line, times 5,000 lines), and the real figure should be lower because each stock file covers fewer dates than "toComplete". There should be enough RAM in a 256/512 MB JVM to read the data you want to keep from those 1000 files into a Map whose keys are the dates and whose value for each key is another Map from stock symbol to stock value, with up to 1000 entries. Then write out your final file by iterating over the Map(s).
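
The Map-of-Maps idea above could be sketched roughly as follows. This is only an illustration under several assumptions, not a definitive implementation. The class and method names (FillToComplete, fill, readStock, key) are made up for this example. It assumes fields never contain commas, that the wanted value is the value1 column (as in the question's EDIT), that each header cell such as stock1 maps to a file stock1.csv in one directory, and that dates should match case-insensitively ("30-jun-2015" vs "30-Jun-2015"). Each stock file is read exactly once.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FillToComplete {

    // Normalise a date cell so "30-jun-2015" and "30-Jun-2015" compare equal (assumption).
    static String key(String cell) {
        return cell.trim().toLowerCase(Locale.ROOT);
    }

    // Read one stock file a single time: date -> value1 (second column). The header row is skipped.
    static Map<String, String> readStock(Path file) throws IOException {
        Map<String, String> values = new HashMap<>();
        List<String> lines = Files.readAllLines(file);
        for (int i = 1; i < lines.size(); i++) {
            String[] cells = lines.get(i).split(",", -1);
            if (cells.length >= 2) {
                values.put(key(cells[0]), cells[1].trim());
            }
        }
        return values;
    }

    // Fill the "NA" cells of toComplete from <stockDir>/<header>.csv and write the result to output.
    static void fill(Path toComplete, Path stockDir, Path output) throws IOException {
        List<String> rows = Files.readAllLines(toComplete);
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("toComplete has no header row");
        }
        String[] header = rows.get(0).split(",", -1);

        // date -> (stock name -> value), built with one pass over each stock file
        Map<String, Map<String, String>> byDate = new HashMap<>();
        for (int col = 1; col < header.length; col++) {
            String stock = header[col].trim();
            Path file = stockDir.resolve(stock + ".csv");
            if (!Files.exists(file)) {
                continue; // no data for this stock: its column keeps "NA"
            }
            for (Map.Entry<String, String> e : readStock(file).entrySet()) {
                byDate.computeIfAbsent(e.getKey(), d -> new HashMap<>()).put(stock, e.getValue());
            }
        }

        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            out.write(rows.get(0));
            out.newLine();
            for (String row : rows.subList(1, rows.size())) {
                String[] cells = row.split(",", -1);
                Map<String, String> day =
                        byDate.getOrDefault(key(cells[0]), Collections.<String, String>emptyMap());
                for (int col = 1; col < cells.length && col < header.length; col++) {
                    String value = day.get(header[col].trim());
                    if (value != null) {
                        cells[col] = value; // cells without a match are left untouched
                    }
                }
                out.write(String.join(",", cells));
                out.newLine();
            }
        }
    }
}
```

A call such as FillToComplete.fill(Paths.get("toComplete.csv"), Paths.get("stocks"), Paths.get("completed.csv")) would then produce the filled file; those paths are placeholders. If the heap turns out to be too small, one alternative is to hold only the "toComplete" rows in memory and fill one column per stock file, so that no second copy of the data is kept.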