Tags: java, csv, caching, currency, bigdecimal

Java: Storing market data price values effectively (BigDecimal)


I have the following CSV:

20120201 000000;1.306600;1.306600;1.306560;1.306560;0

where

Row Fields: DateTime Stamp;Bar OPEN Bid Quote;Bar HIGH Bid Quote;Bar LOW Bid Quote;Bar CLOSE Bid Quote;Volume

DateTime Stamp Format: YYYYMMDD HHMMSS

Legend: YYYY – Year, MM – Month (01 to 12), DD – Day of the Month, HH – Hour of the day (in 24h format), MM – Minute, SS – Second (always 00 in this case)

It's EUR/USD market data (1-minute bars).
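For reference, one row in this format could be parsed into a small value object along these lines (the `Bar` record and its field names are just illustrative, not part of my code):

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative record for one CSV row; field names are assumptions.
record Bar(LocalDateTime timestamp, BigDecimal open, BigDecimal high,
           BigDecimal low, BigDecimal close, long volume) {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd HHmmss");

    // Parse one semicolon-separated line such as
    // "20120201 000000;1.306600;1.306600;1.306560;1.306560;0"
    static Bar parse(String line) {
        String[] f = line.split(";");
        return new Bar(
                LocalDateTime.parse(f[0], STAMP),
                new BigDecimal(f[1]),
                new BigDecimal(f[2]),
                new BigDecimal(f[3]),
                new BigDecimal(f[4]),
                Long.parseLong(f[5]));
    }
}
```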

The problem is: I need to store as much of this data as I can in the memory of the Java program, so I don't have to read it constantly while working with it. Preferably I'd store all of it, since I don't mind how much memory that takes.

I suppose I have to use BigDecimal to keep precision (I will have to do some arithmetic on these prices). I have 3 BigDecimals per row. One file contains 400,000 rows, so that is a lot of objects to create. There might be multiple files, which equates to millions of objects. In addition, BigDecimal carries an overhead.

Question: What would be the best way/data structure/collection to store this data in memory? Cache a fixed number, say 100k rows, at a time? Use something other than BigDecimal (but I need to keep precision)? Or just load everything / as much as I can?

I also don't want to spend a lot of computation time creating a LOT of BigDecimal objects if there is a better way.

My current thinking is to just load as much as I can, but I'm afraid of drawbacks, as well as problems down the line when I have to port this code to C# (a requirement).


Solution

  • A BigDecimal instance takes up roughly 32 bytes in memory. A million BigDecimals would be about 32,000,000 bytes. That is 31,250 kilobytes, or roughly 30.5 megabytes, so 10 million will be around 305 megabytes. Approaching 100 million you'll need about 3 gigabytes. Still reasonable.

    So is your LOT really a lot?

    As for processing the data, I suggest you work through it in chunks, and deal with each chunk before continuing with the next one.

    BufferedReader can really help there: it lets you read the file piece by piece and process as you go (see the sketch at the end of this answer).

    And creating many objects gets optimised by the JVM's JIT compiler, so it might actually go very fast.


    Just as an example: I have a piece of code that generates >400 MB of JSON files. Reading those JSON files back later takes about 30 seconds, while a lot of other processes run simultaneously.

    Those JSON files are much more data/structure intensive to process than CSV files, so I really think you shouldn't worry about the processing overhead.
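    To illustrate the chunked approach, here is a minimal sketch. It assumes the illustrative `Bar.parse` from the question; the `ChunkedCsvLoader` name, the callback shape, and the chunk size of 100,000 rows are my own choices, not anything prescribed:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of a chunked loader: reads the CSV with a BufferedReader and hands
// every CHUNK_SIZE parsed bars to a callback before reading further.
class ChunkedCsvLoader {

    static final int CHUNK_SIZE = 100_000; // assumed chunk size

    static void load(Path csv, Consumer<List<Bar>> onChunk) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(csv)) {
            List<Bar> chunk = new ArrayList<>(CHUNK_SIZE);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(Bar.parse(line));      // Bar.parse as sketched in the question
                if (chunk.size() == CHUNK_SIZE) {
                    onChunk.accept(chunk);       // process the full chunk
                    chunk = new ArrayList<>(CHUNK_SIZE);
                }
            }
            if (!chunk.isEmpty()) {
                onChunk.accept(chunk);           // process the final partial chunk
            }
        }
    }
}
```

    A call like `ChunkedCsvLoader.load(Path.of("EURUSD_1min.csv"), bars -> ...)` (file name is hypothetical) then keeps peak memory bounded to one chunk at a time, while you are still free to accumulate everything in a single list if you decide to load it all.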