arrays, language-design, offset

Array Design Principles


There are hundreds of array topics in all the programming forums, maybe even thousands, daily. But none of them touch upon this aspect.

Even though arrays are indexed from 0, I often still access my data starting at one. The reason is that the zero fields usually work well for holding totals of columns or rows, as well as status flags, instead of a plethora of named variables. It also reads better, because the fourth entry is visible as 4, not 3. Some people call the ground floor of a building ‘1’ and the next storey up ‘2’, or second floor. Others call the first floor above ground 1, and so on. People get lost in buildings because of this. In the demanding world of CAD, anything that interrupts a train of thought, no matter how minor, is a disruption equal to a loss of income. Therefore I like my arrays to be as close to the natural conceptual model as I can get them.

So the first question is: is this acceptable practice? And if not, why not? Even though I’m the only one reading my own code (I rarely share it), I think there is value in not straying too far from convention. At least if I go my own bloody-minded way, I’m clear on the reason why.

There are a few more questions, and while they could deserve their own threads, they all relate to each other in some way, so I’ll lay them out as well.

I also find that parts of arrays are often redundant, because in the real world for which the software is to be used those cases don’t arise. So question two: rather than have one large array, is it acceptable practice to rethink the data usage into smaller like-for-like arrays where these dead areas are minimized? It relates to the question above in that this would create yet another set of zero-zero fields, which may become superfluous in themselves. I know there is a sentiment of “hey, there’s plenty of memory”, but I take the view that that’s just sloppy thinking.

And my last question: occasionally I’m adding an offset to my indices to cater for extra dimensions** dealing with user history:

printf("   %s \t %d \n", foo[i],bar[i][c]);
printf("   %s \t %d \n", foo[i],bar[i][c+1]);
printf("   %s \t %d \n", foo[i],bar[i][c+2]);

So I’m worried about a couple of things here. Is doing a calculation at each access a speed hit? In the world of mobile apps, does this represent some battery usage that could have been avoided?

** For the benefit of any beginner reading this, arrays of four dimensions and up are easy to comprehend if you think of a three-dimensional array as a Rubik’s cube. Put another cube or two alongside it within the room you are seated in and you have four. The building full of rooms is the fifth, and so on: city blocks of buildings are six, states of cities seven, then countries, and up.
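
The cube-in-a-room picture can be sketched in C roughly as follows. The names `CUBES`, `SIDE`, `room_t` and `flat_index` are made up for illustration and are not from any real code; the sketch only shows how a fourth index sits in front of the three cube indices.

```c
#include <assert.h>
#include <stddef.h>

#define CUBES 2   /* hypothetical: cubes in the room, the fourth dimension */
#define SIDE  3   /* a Rubik's cube is 3 x 3 x 3 */

/* One room, indexed as room[cube][layer][row][col]. */
typedef int room_t[CUBES][SIDE][SIDE][SIDE];

/* Position of a cell in the underlying row-major storage. */
size_t flat_index(size_t cube, size_t layer, size_t row, size_t col)
{
    assert(cube < CUBES && layer < SIDE && row < SIDE && col < SIDE);
    return ((cube * SIDE + layer) * SIDE + row) * SIDE + col;
}
```

A fifth dimension (the building full of rooms) would simply be one more index in front, for example an array of `room_t`.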


Solution

  • A good jumping-off point for understanding the tradeoffs is to know the name of the thing you are talking about. This is called "zero-based numbering", and you can read about it on Wikipedia:

    http://en.wikipedia.org/wiki/Zero-based_numbering

    The idea that zero is counter-intuitive has a long history, and it's debated to this day whether the term "natural number" should include it. I find it natural, and would prefer the floor of a building which contains its main entrance to be labeled 0 (and the subterranean parking levels to be -1, -2, -3...)

    (Then again, I was espousing the idea that our circular constant should be twice the current value of pi, years before that trend ever made it big on the internet.)

    Dijkstra made an argument that zero-based indexing makes sense. His rationale is that programmers want to be able to talk about the number zero in a range without having to invoke -1 as a bound:

    http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html
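
    As a rough illustration of that convention (my own sketch, not code from Dijkstra's note), here are ranges written with an inclusive lower bound and an exclusive upper bound. The function names `range_length` and `sum_range` are hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Number of elements in the half-open range [lo, hi).
       With zero-based, half-open bounds this is just hi - lo,
       and an empty range is lo == hi; no -1 bound is needed. */
    size_t range_length(size_t lo, size_t hi)
    {
        assert(lo <= hi);
        return hi - lo;
    }

    /* Sum a[lo] .. a[hi - 1]. */
    long sum_range(const int *a, size_t lo, size_t hi)
    {
        long total = 0;
        assert(lo <= hi);
        for (size_t i = lo; i < hi; i++)
            total += a[i];
        return total;
    }
    ```

    Note that adjacent ranges share a bound: [0, 2) followed by [2, 5) covers exactly the same elements as [0, 5), with nothing skipped and nothing counted twice.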

    Among modern languages that have bucked the trend and use one-based numbering are Rebol by fiat and Lua by convention. I think Rebol is really well thought-out but this is one of the decisions I disagree with. Generally speaking, I feel that any advantage given by the one-based system is outweighed by having a standard...and processor architectures are simply not going to change on this point anytime soon.

    Now as for this bit:

    "I often still access my data starting at one. The reason is that the zero fields usually work well for holding totals of columns or rows, as well as status flags, instead of a plethora of named variables."

    That sounds like a very dangerous practice. If you abuse your arrays in this fashion you are asking for trouble, because you are subverting whatever bounds-checking features the language has. Taken to an extreme, this manner of programming would have you eliminate separate variables altogether. Structures and type systems exist for a reason.
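
    To make that concrete, here is a minimal sketch of what I mean by using a structure instead. Everything in it (`struct row`, `ROW_LEN`, `row_set`, `row_total`) is a hypothetical name, and I'm assuming a row of integers with one total and one status flag; your data will differ.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define ROW_LEN 4   /* hypothetical row width */

    /* Hypothetical record: data lives in values[0..ROW_LEN-1],
       while the total and the status flag get their own named
       fields instead of being stored in element 0. */
    struct row {
        int  values[ROW_LEN];
        long total;
        bool dirty;   /* true when total is out of date */
    };

    void row_set(struct row *r, size_t i, int v)
    {
        assert(i < ROW_LEN);   /* every index in range is real data */
        r->values[i] = v;
        r->dirty = true;
    }

    long row_total(struct row *r)
    {
        if (r->dirty) {        /* recompute only when something changed */
            r->total = 0;
            for (size_t i = 0; i < ROW_LEN; i++)
                r->total += r->values[i];
            r->dirty = false;
        }
        return r->total;
    }
    ```

    The point of the design is that a loop over the array can never pick up a total or a flag by accident, and the names `total` and `dirty` say what they hold.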

    As for this:

    "Occasionally I’m adding an offset to my indices to cater for extra dimensions (...) so I’m worried about a couple of things here. Is doing a calculation at each access a speed hit? In the world of mobile apps, does this represent some battery usage that could have been avoided?"

    I assure you that the time those additions take is the least of your concerns compared to the bugs you'll introduce by not using the correct array bounds.
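
    If you do keep the offsets, one way to make them safer is to name them and check the bounds in one place. This is only a sketch under assumptions: I'm guessing that `bar` is a two-dimensional array of `int` and that the offsets 0, 1 and 2 select history entries; `ROWS`, `WIDTH`, `history_slot` and `bar_at` are hypothetical names.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define ROWS  2   /* hypothetical sizes */
    #define WIDTH 6

    /* Named offsets replace the bare c, c+1, c+2. */
    enum history_slot { CURRENT = 0, PREVIOUS = 1, OLDEST = 2 };

    /* Bounds-checked read of bar[i][c + slot]. The addition is a
       single integer add, which compilers usually fold into the
       address computation when the offset is a constant. */
    int bar_at(int bar[ROWS][WIDTH], size_t i, size_t c,
               enum history_slot slot)
    {
        size_t col = c + (size_t)slot;
        assert(i < ROWS);
        assert(col < WIDTH);
        return bar[i][col];
    }
    ```

    A call such as `bar_at(bar, i, c, PREVIOUS)` then reads the same element as `bar[i][c+1]`, but an out-of-range index trips the assertion instead of silently reading a neighbouring row.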