
Typed vs typeless language


I am learning C and assembly language together. I noticed that assembly is a typeless language, whereas C requires the declaration of a data type before data can be processed. But I also learned that code written in C is first compiled into assembly and then assembled into object code. Does that mean the data type declarations we use in C or any high-level language are only meant to help the C compiler, and have no special bearing on the object code? Is that correct?

What I gathered is that a type declaration tells the compiler which operations can be performed on the data, the size of the data (required to store it in the data segment), and the maximum and minimum values that can be stored. Am I right to say so?

Are there any other benefits of type declarations?


Solution

In C, we have a simple type system; it isn't particularly sophisticated and can be (and frequently is, in practice) circumvented with completely unchecked casts, etc. (For these reasons, C is often called "weakly typed", which is ill-defined and usually used for bashing a language, but at least signals that types aren't that important there.) The size, layout and alignment of types isn't fixed, although it will generally be consistent with the same compiler (version) on the same platform. An int may be 16 bits on one platform and 64 bits on another; such things aren't guaranteed (except that the C standard requires minimum ranges and some ordering among the basic integer types, e.g. short must not be larger than int).
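To make both points concrete, here is a minimal sketch; the sizes it prints are just what your compiler and platform happen to choose, and the cast at the end is accepted without any check (it is, strictly speaking, undefined behavior):

```c
#include <stdio.h>

int main(void) {
    /* Implementation-defined sizes; a typical 64-bit platform prints 2, 4, 8. */
    printf("short: %zu bytes\n", sizeof(short));
    printf("int:   %zu bytes\n", sizeof(int));
    printf("long:  %zu bytes\n", sizeof(long));

    /* A completely unchecked cast: we tell the compiler these bytes
       are a float, even though we stored an int there. */
    int n = 1065353216;                 /* bit pattern of 1.0f on IEEE 754 */
    float *f = (float *)&n;             /* compiles without complaint */
    printf("reinterpreted: %f\n", *f);  /* formally undefined behavior */
    return 0;
}
```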

The programmer doesn't have to know the details, but the compiler does, and in fact needs to. For example, the exact code generated for foo.y, given struct Foo { int x; short y; }; and struct Foo foo;, depends e.g. on the exact sizes of int and short and on the padding of struct Foo, as it compiles down to "take the address of foo, add the offset of y in struct Foo, and use that". Even the declaration struct Foo foo; requires exact knowledge of struct Foo (and, recursively, of the types it is composed of): the generated code must know the exact sizeof(struct Foo) to reserve the right number of bytes on the stack. Similarly, type declarations are needed to know which opcodes to use for arithmetic (an integer add or a floating-point add? must one of the operands be extended, and to what size?), for comparisons, and for the step size when doing pointer arithmetic (p + n actually adds n * sizeof(*p)), etc. This also prevents access to nonexistent members (and, by extension, passing values to functions that would then run into this problem, i.e. a type mismatch), but that's more of a convenient side effect: the compiler considers it an error because it wouldn't know what code to emit, not because it believes programmers are like children who must be watched and kept in order.
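A small illustration of the layout knowledge the compiler bakes into the generated code; the exact numbers depend on your platform's sizes and padding:

```c
#include <stdio.h>
#include <stddef.h>

struct Foo { int x; short y; };

int main(void) {
    struct Foo foo = { 1, 2 };

    /* foo.y compiles down to: address of foo + offset of y. */
    printf("offsetof(struct Foo, y) = %zu\n", offsetof(struct Foo, y));
    printf("foo.y                   = %d\n", foo.y);

    /* sizeof(struct Foo) includes any padding; it is the number of
       bytes reserved on the stack for foo. */
    printf("sizeof(struct Foo)      = %zu\n", sizeof(struct Foo));

    /* Pointer arithmetic steps by sizeof(*p): p + 1 advances by
       sizeof(struct Foo) bytes, not by one byte. */
    struct Foo arr[2];
    struct Foo *p = arr;
    printf("bytes from p to p + 1   = %td\n", (char *)(p + 1) - (char *)p);
    return 0;
}
```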

In assembly language (usually; just yesterday I read about a project at Microsoft Research that develops a typed, verifiable assembly language for an OS that's safe against certain bugs by construction), you don't really have types. You have bytes. You take N bytes from some location, do some stuff to them, and store them at some location. Yes, registers have a fixed word size and some may be intended for special kinds of values (e.g. dedicated floating-point registers with 80 or more bits), but basically you can store anything you like anywhere. Nobody stops you from storing 8 bytes somewhere, later reading back only the last 4 of them, and adding them to your loop counter to form an address to store the return value at.
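The same "it's all just bytes" behavior can be reproduced from C by going through raw memory. A sketch (memcpy sidesteps the type system much the way untyped loads and stores do; which 4 bytes you get back depends on the machine's byte order):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    /* Store 8 bytes somewhere... */
    uint64_t stored = 0x1122334455667788u;

    /* ...and later read back only 4 of them. On a little-endian
       machine this prints 0x55667788. */
    uint32_t four;
    memcpy(&four, &stored, sizeof four);
    printf("4 of the 8 bytes: 0x%08x\n", (unsigned)four);

    /* Nothing ties the bytes to a type: the very same 8 bytes can
       just as well be treated as a double. */
    double d;
    memcpy(&d, &stored, sizeof d);
    printf("same bytes as a double: %g\n", d);
    return 0;
}
```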

In other languages, the type system is much stronger while allowing a huge range of extensions that enable higher-level programming, such as abstracting away the exact types (and hence their layout) and accepting any type that fulfills a certain contract. It allows type signatures such as [a] -> a, which describes a function taking a list containing any kind of value (as long as it's homogeneous, e.g. a list of integers, a list of strings, a list of lists of chars, etc.) and returning one of its elements, without "erasing" the type (e.g. by casting to void *). (Depending on the implementation, it may actually generate several specializations, each for a single type with a known layout, for performance reasons; but that doesn't leak through to the programmer.)
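For contrast, the closest standard C gets to [a] -> a is exactly the type-erasing approach mentioned above. A hypothetical sketch (the function nth and its parameters are made up for illustration): the element type is erased to void *, so the element size must be passed by hand and the cast back to the real type is entirely on the caller's honor:

```c
#include <stdio.h>
#include <stddef.h>

/* elem_size stands in for the type information the compiler would
   otherwise use as the pointer-arithmetic step size. */
const void *nth(const void *list, size_t elem_size, size_t index) {
    return (const char *)list + index * elem_size;
}

int main(void) {
    int xs[] = { 10, 20, 30 };
    const char *ss[] = { "foo", "bar" };

    printf("%d\n", *(const int *)nth(xs, sizeof xs[0], 2));   /* 30  */
    printf("%s\n", *(const char **)nth(ss, sizeof ss[0], 1)); /* bar */

    /* A typed [a] -> a would reject a wrong cast at compile time;
       this version silently produces garbage at run time. */
    return 0;
}
```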