Tags: java, android, arrays, kotlin, coding-style

Which is better and more memory-efficient: a big array or multiple arrays?


The app I'm working on lets users simulate tests and answer them offline. I have software that takes the data from my database (the questions, alternatives, question type, etc.) and turns it into an array.

I don't know which option is the most memory-efficient: creating one object with a big array holding all the questions, creating separate objects (one per subject, for example) each with its own array, or creating multiple arrays in the same object. Is it OK to create an array with about 1000 arrays inside it, or is it better to split it into, say, 10 arrays with 100 arrays each?

P.S.: During a test I will only use 30 items, so I'll take the entries from the big array (or from the multiple arrays) and add them to a small 30-entry array that is created according to the user's input.

What I would like to use

I would like a big array, because for me it would be easier to sort and to create random tests. Some people are saying 1000 entries aren't too much, so I think I'll stick to a big array. What would be too big? 10k? 100k?


Solution

  • There are three kinds of efficiency you need to consider:

    • Memory efficiency; i.e. minimizing RAM utilization
    • CPU efficiency
    • Programmer efficiency; i.e. minimizing the amount of your valuable time spent writing code, writing test cases, debugging, and maintaining the code.

    Note that the above criteria work against each other.

    Memory Efficiency

    The memory size in bytes of an array of N references in Java is given by

      N * reference_size + array_header_size + padding
    

    where:

    • reference_size is the size of a reference in bytes (typically 4 or 8)
    • array_header_size is typically 12 bytes
    • padding is greater or equal to zero, and less than the heap node size granularity.

    The array itself also has a unique reference which must be held in memory somewhere.

    So, if you split a large array into M smaller arrays, you will be using at least (M - 1) * 16 extra bytes of RAM, and possibly more. On the other hand, we are talking about bytes here, not kilobytes or megabytes. So this is hardly significant.
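    To put rough numbers on that (a sketch of mine, assuming 4-byte references, a 12-byte array header, and 8-byte alignment, which are typical but JVM-dependent values):

      // Rough arithmetic for the formula above; the constants are typical
      // HotSpot values with compressed references, not guarantees.
      public class ArrayOverhead {
          static final int REFERENCE_SIZE = 4;  // assumed: compressed references
          static final int ARRAY_HEADER = 12;   // assumed: typical array header size
          static final int ALIGNMENT = 8;       // assumed: 8-byte object alignment

          // N * reference_size + array_header_size + padding
          static long arrayBytes(int n) {
              long raw = (long) n * REFERENCE_SIZE + ARRAY_HEADER;
              long padding = (ALIGNMENT - raw % ALIGNMENT) % ALIGNMENT;
              return raw + padding;
          }

          public static void main(String[] args) {
              long one = arrayBytes(1000);      // one array of 1000 references: 4016 bytes
              long ten = 10 * arrayBytes(100);  // ten arrays of 100 references: 4160 bytes
              System.out.println("extra overhead: " + (ten - one) + " bytes"); // 144
          }
      }

    With these assumed constants, the difference works out to 144 bytes, i.e. exactly the (M - 1) * 16 figure above, which is negligible next to the size of the question objects themselves.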

    CPU Efficiency

    This is harder to predict. The CPU utilization effects will depend largely on what you do with the arrays, and how you do it.

    If you are simply subscripting (indexing) an array, that operation doesn't depend on the array size. But if you have multiple arrays (e.g. an array of arrays) then there will be additional overhead in determining which array to subscript.

    If you are searching for something in an array, then the larger the array you have to search, the longer it will take (on average). But if you split a large array into smaller arrays, that doesn't necessarily help ... unless you know beforehand which of the smaller arrays to search.
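    As a toy illustration (mine, not part of the original answer): picking question i out of one flat array is a single subscript, while a split layout needs an extra division and a second subscript.

      // Illustrative only: the String payload and the fixed chunk size
      // are assumptions made up for this example.
      public class Lookup {
          static String pickFlat(String[] all, int i) {
              return all[i];                      // one bounds check, one load
          }

          static String pickSplit(String[][] chunks, int i, int chunkSize) {
              int outer = i / chunkSize;          // which small array
              int inner = i % chunkSize;          // position inside it
              return chunks[outer][inner];        // two bounds checks, two loads
          }
      }

    The extra arithmetic is tiny in practice; the point is only that the split layout does a little more work per access, not less.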

    Programmer Efficiency

    It will probably make your code more complicated if you use multiple arrays rather than one. More complicated code means more programmer effort in all phases of the application's development and maintenance lifecycle. It is hard to quantify how much extra effort is involved. However, programmer effort means cost (salaries) and time (deadlines, time to market, etc.), and this is likely to outweigh any small savings in memory and CPU.

    Scalability

    You said:

    Some people are saying 1000 entries aren't too much, so I think I'll stick to a big array. What would be too big? 10k, 100k?

    Once again, it depends on the context. In reality, the memory used for an array of 100K instances of X depends largely on the average size of X. You will most likely run out of memory for the X instances themselves before the array becomes a problem.

    So, if you want your application to scale up indefinitely, you should probably change the architecture so that it fetches the questions / answers from the database on demand rather than loading them all into memory on start up.
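    A minimal sketch of what "on demand" could look like on Android, using its SQLite API (the table and column names here are placeholders I've invented, not details from the question):

      // Sketch only: assumes an SQLite table "questions" with a "text" column.
      import android.database.Cursor;
      import android.database.sqlite.SQLiteDatabase;
      import java.util.ArrayList;
      import java.util.List;

      public class QuestionRepository {
          private final SQLiteDatabase db;

          public QuestionRepository(SQLiteDatabase db) {
              this.db = db;
          }

          // Fetches only the questions needed for one test, instead of
          // loading the whole question bank into memory at start-up.
          public List<String> randomQuestions(int count) {
              List<String> result = new ArrayList<>(count);
              String sql = "SELECT text FROM questions ORDER BY RANDOM() LIMIT " + count;
              try (Cursor cursor = db.rawQuery(sql, null)) {
                  while (cursor.moveToNext()) {
                      result.add(cursor.getString(0));
                  }
              }
              return result;
          }
      }

    With something like this in place, the 30-question test described in the question never needs the 1000-entry (or 100K-entry) array in memory at all.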

    Premature Optimization

    Donald Knuth is often (mis-)quoted [1] as saying:

    "Premature optimization is the root of all evil."

    What he is pointing out is that programmers are inclined to optimize things that don't really need optimizing, or spend their effort optimizing the wrong areas of their code based on incorrect intuitions.

    My advice on this is the following:

    • Don't do fine-grained optimization too early. (This doesn't mean that you should ignore efficiency concerns in the design and coding stages, but my advice would be to concentrate on the major issues; e.g. complexity of algorithms, granularity of APIs and database queries, and so on. Especially things that would be a lot of effort to fix later.)

    • If and when you do your optimization, do it scientifically:

      • Use a benchmark to measure performance (a bare-bones sketch follows this list).
      • Use a profiler to find performance hotspots and focus your efforts on those.
      • Use the benchmark to see if the optimization has improved things, and abandon optimizations that don't help.
    • Set some realistic goals (or time limits) for your optimization and stop when you reach them.
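    As a bare-bones example of "measure first" (my sketch, not part of the original answer; a proper harness such as JMH handles warm-up and repetition far better than a single timed run):

      // Times one linear search over the full question array. A single,
      // un-warmed run like this is noisy, so treat the number as a hint only.
      public class SearchTiming {
          public static void main(String[] args) {
              String[] questions = new String[1000];
              for (int i = 0; i < questions.length; i++) {
                  questions[i] = "question-" + i;
              }
              String target = "question-999";   // worst case: last element

              long start = System.nanoTime();
              boolean found = false;
              for (String q : questions) {
                  if (q.equals(target)) {
                      found = true;
                      break;
                  }
              }
              long elapsedNanos = System.nanoTime() - start;
              System.out.println("found=" + found + " in " + elapsedNanos + " ns");
          }
      }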

    [1] - The full quotation is more nuanced. Look it up. And in fact, Knuth is himself quoting Tony Hoare. For a deeper exploration of this, see https://ubiquity.acm.org/article.cfm?id=1513451