How to define memory pointers when programming AVR chips?

Preamble: after working a couple of years as application developer, the world of the software engineering became more obscure than it was before. The reason is that the real stuff is hidden under zillions layers of abstractions: OS, frameworks, etc. The young generation is deprived of the pleasure of working with PDP-like machines where all programming was done via electrical switch toggling. Another problem is the ephemeral nature of modern programming languages. Once there was Python 2.x, now it is deprecated and there is Python 3.x which in its turn will be deprecated in a couple of months. Idem for other languages. ANSI C looks like the Pyramid of Cheops: it was there in 70's and I don't doubt it will be there after the Sun will become a red dwarf.

It seems that now the only way to understand the interaction between the hardware and the software is to play with embedded development. From the pedagogical point of view physical chips are very handy because they allow to tackle the most difficult part of C language, namely pointers. When coding in OS environment, */& notation is still very confusing because it refers to some location somewhere inside of the virtual memory. And before you will got the understanding of what is the virtual memory, you have to read a couple of monographs about OS development, etc. You may find it stupid but I do really want to know which transistor is holding my bit right now. At least, I can wire physical pin voltage to programming abstractions.

Currently I am working with Atmel chips and WinAVR package because of numerous textbooks and accessible hardware. Though all books promise to teach AVR coding using plain C, the reality is that all pointers are hidden behind macros like PORTA, DDRB, etc. All code examples include header file 'io.h' which in its turn refers to other header files specific for a given chip like 'iomx8.h'. So far, I cannot find any macros definition in these headers. The code to increase the voltage on the physical pin 14 on Atmega168 looks like

DDRB = 0x01;
PORTB = 0x01;

Fortunately, Microchip site provides some basic documents where it is stated, for example, that if I want to rise the voltage on the physical pin 14, I need to follow these steps:

unsigned char *ddrB;
ddrB = (unsigned char*)0x24; // the address of ddrB is 0x24
*ddrB |= 0x01; // set up low impedance/ high current state for the transistor 0 

unsigned char *portB;
portB = (unsigned char*)0x25;
*portB |= 0x01; // voltage on
*portB &= ~(0x01); // voltage off

Unfortunately, this is the only info I got after one week of the lurking. Now I am going through USART programming and the things become more complicated with all these UBRR0H, UCSR0C. Since provided header files don't contain macros definitions for any register, where else can I find it?

A similar question was asked several years ago: accessing AVR registers with C?. However, provided answers were somewhat useless, besides the clue that GCC itself can map some mythical PORTB to real physical locations. Could someone describe the mechanism behind the mapping?

Solution

From a memory-mapping standpoint: The general purpose registers, special function+I/O registers, and SRAM share non-overlapping ranges a single address space, as described in datasheets for various processors in the AVR series. All of your pointers will reference this memory space, unless annotated as pointers to PROGMEM (which will cause different instructions to be emitted). The reference will be made without any sort of virtual memory mapping.

For example, the ATtiny 25/45/85 has the following map shown on page 18:

Your linker is aware of this memory map and will place variables accordingly. For example, a global variable declared in one of your compilation units will end up in an address above 0x0060 in the example device described above, so that it ends up in the SRAM.

From an instruction encoding standpoint: Although there is one address space, there is special functionality reserved for certain important regions. For example, IN and OUT instructions have six bits in their instruction encoding which can be used to directly refer to one of the 64 addresses within [0x20, 0x5F).

The IN and OUT instructions are unique in their ability to load and store to a fixed address encoded directly in the instruction, since the normal load and store instructions require an indirect load with the 'Z' register being loaded first.

As a result, when the compiler sees memory operations to a fixed I/O register, it may generate these more efficient instructions. However, a normal load/store via a pointer will have the same effect (although with different numbers of clock cycles required). For extended I/O registers that didn't fit into the first 64 (e.g. OSCCAL on an atmega328p), normal load/store instructions will always be generated.