Tags: computer-science, cpu-architecture

Why do even big-endian computers read memory from lower to higher addresses? For big-endian, the opposite could be more optimal


I've read about endianness on Wikipedia and tried to search for my question. I found the post Does the endianness affect how structure members are stored into the memory, where it is explained that endianness does not affect the order of structure members in memory (from lower to higher addresses) in C.

Also from the Wikipedia article:

The little-endian system has the property that the same value can be read from memory at different lengths without using different addresses

but that holds only if we read from smaller to larger addresses.
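
Here's a minimal C sketch of what I understand that property to mean (my own illustration; it assumes the interesting case of a little-endian host):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        uint32_t value = 0x42;            /* small value: the high bytes are zero */
        unsigned char mem[4];
        memcpy(mem, &value, sizeof mem);  /* capture the in-memory representation */

        uint8_t  b; memcpy(&b, mem, 1);   /* read 1 byte  at the same address */
        uint16_t h; memcpy(&h, mem, 2);   /* read 2 bytes at the same address */
        uint32_t w; memcpy(&w, mem, 4);   /* read 4 bytes at the same address */

        /* On a little-endian machine all three prints show 0x42: the same
           address yields the same (small) value at every access width.
           On a big-endian machine the narrower reads would see the high,
           zero bytes instead. */
        printf("%#x %#x %#x\n", (unsigned)b, (unsigned)h, (unsigned)w);
    }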

I'm wondering which architectures / languages use memory in order from higher (to be clear: larger) addresses to lower. On such systems, big-endianness would have the same beneficial property mentioned in the Wikipedia quote above.

Also, for example, it could mean that in a language similar to C, malloc would return the largest address and the program would fill memory by doing received_address-- instead of ++ (I hope I made myself clear).

I could not find by web search why computer development did not go that route (reading memory from larger to smaller addresses), because if that phrase from Wikipedia is correct, it indeed did not.


Solution

  • Normally there's zero connection between endianness within a word and what order you access words in. The reasoning / benefit / etc. that motivates choices for endianness within a word doesn't apply at all to how you index arrays.

    e.g. Intel invented (or at least used) little-endian to make the 8008 more like a CPU with a bit-serial ALU and shift-register storage that it wanted to be compatible with. (See Why is x86 little endian? and also https://retrocomputing.stackexchange.com/questions/2008/were-people-building-cpus-out-of-ttl-logic-prior-to-the-4004-8080-and-the-6800/8664#8664 . Apparently Datapoint had wanted Intel to build a bit-serial machine, and storing the jump target LSB-first was partly to keep them happy even though the CPU ended up not being bit-serial.)

    This obviously has no relevance when doing separate accesses to separate words.

    The "advantage" cited by wikipedia is more like a "fun fact", not something that's really worth anything. Bending an ISA out of shape to get it makes no sense when it makes anything else worse or more expensive, or even just harder for humans to work with. Only if you're building a CPU that decodes instructions a byte at a time or something, and can overlap fetch with decode if decode was going to be multi-cycle anyway (because carry propagates from low bits to high bits).

    Although you could have made the same argument about building the first little-endian CPU in the first place, when people considered big-endian to be "natural" at the time.


    Your proposed design would make the address of a word be the address of its least-significant byte. (I think).

    That's more like little-endian with everything about memory addressing reversed/flipped/negated.
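
    A concrete layout sketch (my own illustration, addresses made up): store the 32-bit value 0x11223344 big-endian in bytes 100..103:

        address:  100   101   102   103
        byte:    0x11  0x22  0x33  0x44

    On a normal big-endian machine the word is named by address 100 (its most-significant byte). In the proposed high-to-low scheme the word is named by address 103, and wider accesses extend toward lower addresses, so 1-, 2- and 4-byte reads "at 103" give 0x44, 0x3344 and 0x11223344: the little-endian property, just mirrored.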

    Otherwise it's just a software convention to return a pointer one past the end of an allocation, which is obviously less convenient because it requires an offset to use. But if you return a pointer to the last word of an allocation, how do you know the caller wanted to treat the memory as words instead of bytes? malloc returns a void*. And if you return a pointer to the last byte of an allocation, the caller has to do math to get a pointer to the last word.

    So unless you do reversed-little-endian, returning anything other than a pointer to the first (or only) byte/word/doubleword/float/whatever of the allocated buffer is obviously worse, especially given an allocator like malloc that doesn't know the element size its caller is going to use to access the memory.
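
    Just as a sketch of how awkward that gets (the name malloc_from_end is invented for this example; it's not a real API):

        #include <stdint.h>
        #include <stdlib.h>

        /* Hypothetical allocator for this sketch only: it hands back a
           pointer to the LAST byte of the block instead of the first. */
        static void *malloc_from_end(size_t n) {
            unsigned char *base = malloc(n);
            return base ? base + n - 1 : NULL;
        }

        int main(void) {
            size_t count = 16;

            /* The allocator only knows bytes, so it can only return the last
               BYTE.  A caller that wants to use the block as uint32_t words
               has to do element-size math that ordinary malloc never asks for: */
            unsigned char *last_byte = malloc_from_end(count * sizeof(uint32_t));
            if (!last_byte) return 1;

            uint32_t *last_word = (uint32_t *)(last_byte - (sizeof(uint32_t) - 1));
            *last_word = 0xdeadbeef;   /* store into the final word */

            /* ...and freeing needs the original base recomputed, too: */
            free(last_byte - (count * sizeof(uint32_t) - 1));
        }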


    C's machine model is barely compatible with a reversed-little-endian system, I think. You'd want arr[i] to mean *(arr - i) instead of *(arr + i), and indexed addressing modes would probably support - instead of +. Then arr[i] could work transparently with a malloc that returns a pointer to the end. But C defines x[y] in terms of *(x + y), and there is code that would notice the difference and break.
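
    For instance, a small sketch of code that leans on that identity (and on "next element = higher address"):

        #include <stdio.h>
        #include <string.h>

        int main(void) {
            const char *s = "endian";

            /* C defines x[y] as *(x + y), so these all name the same byte,
               including the deliberately silly 2[s]: */
            printf("%c %c %c\n", s[2], *(s + 2), 2[s]);      /* d d d */

            /* Plenty of real code also assumes the next element lives at a
               higher address, e.g. walking a string by bumping the pointer: */
            size_t len = 0;
            for (const char *p = s; *p != '\0'; p++)
                len++;
            printf("%zu %zu\n", len, strlen(s));             /* 6 6 */
        }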

    Or else you'd want to count a negative index up towards zero to loop from low to high addresses, if addressing still worked like normal?

    If your "normal" use case was for(i=0; i<n ; i++) and accessing arr[-i], that could work sort of the same as on a normal machine. But then you need to modify your C source to make this work on such a machine.

    Or if you wanted to write loops like for(i=0 ; i > -n ; i--), then your largest-magnitude index becomes negative while your size is still positive. This just seems much more confusing.
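
    Both styles can be imitated in ordinary C today by handing out a pointer to the last element of an array (my own sketch):

        #include <stdio.h>

        int main(void) {
            int buf[5] = {10, 20, 30, 40, 50};
            int n = 5;
            int *arr = &buf[n - 1];   /* "the allocation", named by its highest element */

            /* Familiar loop shape, negated index: */
            for (int i = 0; i < n; i++)
                printf("%d ", arr[-i]);          /* 50 40 30 20 10 */
            printf("\n");

            /* Counting the index itself downward: the largest-magnitude index
               is now negative while the size stays positive. */
            for (int i = 0; i > -n; i--)
                printf("%d ", arr[i]);           /* 50 40 30 20 10 */
            printf("\n");
        }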

    (@Alexei Martianov's answer raises a good point: the CPU would probably need a binary subtractor inside address-generation units and other places where normal CPUs use an adder. I think a subtractor typically requires slightly more hardware than an adder. This is outside the main ALU, which of course has to be able to do both to support efficient integer math.)
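
    A tiny C check of the two's-complement identity behind that point (my own sketch): a subtractor is essentially an adder with the second input inverted and carry-in forced to 1.

        #include <assert.h>
        #include <stdint.h>
        #include <stdio.h>

        int main(void) {
            /* a - b == a + ~b + 1 in two's complement, so subtraction costs
               an adder plus inverters on one input and a forced carry-in. */
            uint32_t tests[][2] = { {1000, 42}, {0, 1}, {0xffffffffu, 0x80000000u} };
            for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
                uint32_t a = tests[i][0], b = tests[i][1];
                assert((uint32_t)(a - b) == (uint32_t)(a + ~b + 1u));
            }
            printf("identity holds\n");
        }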