I'm writing a program in Python3.5 that reads a data set and does some stuff (it's DICOM data if you are familiar with it). It uses:
Now my program has many different components that will all use the same set of data. My question is what is the best practice of handling this data? Do I:
Which is the best method? I'm not accessing this data once or twice but in the order of 20+ times. Is there a method I don't know about that I should be using?
Thanks in advance, I value your help (and criticism where necessary) to always improve myself as a programmer and, a human being.
Seems like you're actually asking multiple questions here. Let me try to tease them apart:
Can you? Do you have enough memory to do so comfortably? Then do it. Load it once and pass it around, or pass around some interface to the data, as necessary. How you interface with it is your choice (see below). Otherwise you have no choice but to make multiple calls to disk I/O. But I think it's generally a bad decision to read the same data from disk redundantly, since disk I/O tends to be a bottleneck resource.
Python function arguments are passed "by assignment" so to speak. To use C terminology, although not technically precise, it's more like pass-by-reference than pass-by-value. You usually don't see this behavior because 1) lots of things in python are immutable, and 2) assignment statements in python just reassign the name to a different value. Examples where you can see this behavior are mutable objects like a list
, dict
or any kind of object with mutable member properties. Try passing a list to a function and modifying it inside. It will also be modified in the passing context after the function returns.
This depends on several things that I can think of. First, did you decide to store the data in memory or on disk (see first question)? Second, from where do you need to access the data? Third, if you're storing it in memory, do you need the data to persist between runs of the program?
If you can store the data in memory, just need to access it locally, and don't need it to persist, I would go with just some sort of nested python dict
, maybe making one or more custom class
es to simplify the interface to the data.
If you can store the data in memory, but either need to access it over a network or need it to persist between runs of the program, I would use redis or a similar key-value store to manage the data. redis is really easy to learn and there's good python library support.
If you can't store the data in memory, but don't want to parse it over and over again, you should at least index it before you write it back to disk. You might come up with your own indexing scheme just using the file system if you only need local access. If this becomes too complicated or you need network access, you should probably use a database system.