Understanding low-level abstraction

I have started programming in Java this year. I understand the high level concepts and feel comfortable programming.

However, I keep asking myself: how does all of this work internally? I understand that Java is a high-level language made specifically to keep the programmer away from low-level details and so ease development.

In essence I would like to know more about how exactly high-level languages function internally (e.g. object-oriented programming). It's clear to me why they are used, but not how everything works internally (memory allocation etc.), or how objects are represented internally.

Can someone point me in the right direction with some keywords, or preferably refer me to some material? Would learning a low-level language like C or C++ help this learning process?

Solution

Based on the wording of your question, your low-level is still very high level.

Object oriented has nothing to do with the highness or lowness of the language; it just means oriented on objects. You can have object-oriented assembly. It is not a language thing: basically any language can be used in an object-oriented way.
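To make that concrete, here is a rough sketch of object orientation in plain C, a language with no classes. All the names (Shape, Rect, Square, shape_area) are made up for illustration. Many C++ compilers implement virtual methods along broadly similar lines, with a hidden pointer to a table of functions; how a particular JVM lays out its objects is implementation specific, so take this as the general idea rather than what Java actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "class" with one virtual method, area(). */
struct Shape;

/* The "vtable": function pointers shared by every object of one class. */
struct ShapeVTable {
    double (*area)(const struct Shape *self);
};

/* Every object starts with a pointer to its class's vtable. */
struct Shape {
    const struct ShapeVTable *vtable;
};

/* "Subclasses": the base struct comes first, so a pointer to a Rect
   can also be used as a pointer to a Shape. */
struct Rect {
    struct Shape base;
    double width, height;
};

struct Square {
    struct Shape base;
    double side;
};

static double rect_area(const struct Shape *self)
{
    const struct Rect *r = (const struct Rect *)self;
    return r->width * r->height;
}

static double square_area(const struct Shape *self)
{
    const struct Square *s = (const struct Square *)self;
    return s->side * s->side;
}

static const struct ShapeVTable rect_vtable = { rect_area };
static const struct ShapeVTable square_vtable = { square_area };

/* "Constructors": fill in the fields and attach the right vtable. */
struct Rect rect_make(double w, double h)
{
    struct Rect r = { { &rect_vtable }, w, h };
    return r;
}

struct Square square_make(double side)
{
    struct Square s = { { &square_vtable }, side };
    return s;
}

/* Dynamic dispatch: the caller does not know which kind of shape it has. */
double shape_area(const struct Shape *s)
{
    assert(s != NULL && s->vtable != NULL);
    return s->vtable->area(s);
}
```

Calling shape_area(&r.base) on a Rect made with rect_make(2.0, 3.0) ends up in rect_area, while the same call on a Square ends up in square_area. The "object" is just a block of memory holding the fields plus one pointer that says which functions go with it.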

Memory allocation is specific to the operating system and/or whoever is managing the memory. Nothing complicated there really, at a high level. I have a pizza and 3 people. I can cut that pizza up into 3 slices or 4 or 8 or whatever; each person can allocate one slice, and if there are some left over they can come back and allocate more. Now, freeing that pizza allocation after consumption is not something we want to visualize. But the idea is the same: you have some memory you want to allow a program to borrow/take. You divide it up, and it doesn't have to be all even sizes. You might offer various sizes, 1K, 2K, 4K, 8K ... 1 Meg units, etc., and multiples of those. You create a table/chart of who has consumed what and what is left free. When they give it back you mark those units free. Old-school linear thinking can make this hard, but MMUs (memory management units) make this easy. And that is low, or lower, level thinking. They are address translators, along with protection features to prevent programs from accessing memory that isn't theirs.
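The table/chart idea can be sketched in a few lines of C. This is a toy, not how any real operating system or malloc does it: the pool size, the single fixed slice size, and the function names (block_alloc, block_free, blocks_free) are all assumptions chosen to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE  1024u  /* assumed unit: one 1K "slice" */
#define BLOCK_COUNT 8u     /* assumed pool: 8 slices in total */

static unsigned char pool[BLOCK_SIZE * BLOCK_COUNT]; /* the whole pizza */
static int owner[BLOCK_COUNT]; /* the chart: 0 = free, otherwise owner id */

/* Hand the first free block to `who` (a nonzero id); NULL if none is left. */
void *block_alloc(int who)
{
    assert(who != 0); /* 0 is reserved to mean "free" */
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        if (owner[i] == 0) {
            owner[i] = who;
            return &pool[i * BLOCK_SIZE];
        }
    }
    return NULL;
}

/* Give a block back: just mark its entry in the chart as free again. */
void block_free(void *p)
{
    if (p == NULL)
        return;
    size_t i = (size_t)((unsigned char *)p - pool) / BLOCK_SIZE;
    if (i < BLOCK_COUNT)
        owner[i] = 0;
}

/* Count how many blocks nobody currently holds. */
size_t blocks_free(void)
{
    size_t n = 0;
    for (size_t i = 0; i < BLOCK_COUNT; i++)
        if (owner[i] == 0)
            n++;
    return n;
}
```

If owner 1 allocates a block and frees it again, the next call to block_alloc hands out that very same block, because all that changed was one entry in the chart.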

An easy way to see what an MMU does for us from a memory allocation perspective is to think of all the free-to-borrow/take memory as being in units of 0x1000 bytes. Say starting at address 0x10000, so 0x10000, 0x11000, 0x12000 and so on. That is the physical address, the actual memory side. But we can have a virtual address space as well. I may ask for 0x3000 bytes, and may be given a pointer 0x20000000. When I access between 0x20000000 and 0x20000FFF the MMU may translate that virtual address into physical address 0x00007000 to 0x00007FFF. But 0x20001000 to 0x20001FFF may translate to physical 0x00004000 to 0x00004FFF. And naturally 0x20002000 translates to some other physical address. So if someone allocates 10 blocks and another allocates 3, the software that manages that allocation can give the first 10 physical blocks to the first program, and the next 3 after that to the next. If the first frees its blocks and then someone allocates 7, the first 7 physical blocks can be given to that new someone, giving us a map of first 7 used, 3 free, and 3 used in a physical linear view. If someone now allocates 4, we can actually give them the 3 free ones and another one at the end, because we can map them in virtual space so they feel like they are accessing them linearly.
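Here is a software sketch of that translation, just to show the arithmetic. A real MMU does this in hardware with tables laid out the way the processor dictates. The first two physical addresses are the ones from the paragraph above, the third (0x00009000) is made up, and the names translate and frame_of_page are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u      /* allocation unit */
#define VIRT_BASE 0x20000000u  /* the pointer the program was given */
#define NUM_PAGES 3u           /* a 0x3000-byte allocation */

/* One entry per virtual page: where that page really lives.
   Deliberately out of order; the third value is invented. */
static const uint32_t frame_of_page[NUM_PAGES] = {
    0x00007000u, 0x00004000u, 0x00009000u
};

/* Translate a virtual address to a physical one.
   Returns false for addresses outside the allocation, which is where
   a real MMU would raise a fault (the protection feature). */
bool translate(uint32_t vaddr, uint32_t *paddr)
{
    assert(paddr != NULL);
    if (vaddr < VIRT_BASE)
        return false;
    uint32_t offset = vaddr - VIRT_BASE;   /* distance into the allocation */
    uint32_t page = offset / PAGE_SIZE;    /* which 0x1000 unit */
    if (page >= NUM_PAGES)
        return false;
    *paddr = frame_of_page[page] + offset % PAGE_SIZE;
    return true;
}
```

The program walks from 0x20000000 to 0x20002FFF as if it were one linear chunk, while the physical side jumps around between three unrelated blocks.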

If I have a list of students listed alphabetically, that doesn't mean that their dorm room numbers match linearly. Student number 1 on the alphabetical list doesn't have to live in dorm room number 1. I have a table that maps their name to their dorm room. If we add a student in the middle of the list alphabetically, that doesn't mean we have to shuffle all the dorm room numbers; we just need a table. So someone can be given 5 names out of the alphabetical list to work on a project, and that doesn't mean they are in 5 adjacent dorm rooms. When needing to talk to each of those five students we can use a table of name to dorm room to find them. The virtual address is the alphabetical list; the physical address is the dorm room those folks live in. Manage the tables and a program can access what it thinks is a linear memory space, but is really just fragments spread about. You don't have to "defrag" memory as it is allocated and freed. Without an MMU, it gets very messy.

Low-level stuff that a high-level language avoids is the nuances of the processor. I can go through the drive-through and order a burger, or I can go buy buns, meat, pickles, tomatoes, lettuce, ketchup, etc. and then cook and assemble a burger myself. a = b + c in a high-level language can end up being a number of memory and/or register accesses: save one or more registers to the stack so you can free up registers, gather up those values from where they are stored in memory (if not already in said registers), perform the operation, then now or later save the result to memory as needed. System calls like printing or file access or network or video, etc. are tons of code doing small individual tasks to make the whole. Think of all the bricks and boards and nails and cement and such that it takes to make a building. Like the burger, I can just buy a house that someone (the compiler) built, or I can buy five zillion tools and materials and construct that house, shaping and combining those materials in the right order.

The high level language gives you abstraction as well. This is C but I bet you can understand it.

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b+7);
}

I can compile it into its pickles and lettuce and bun ingredients along with the knives and frying pans that put it all together:

00000000 <fun>:
   0:   e52db004    push    {fp}        ; (str fp, [sp, #-4]!)
   4:   e28db000    add fp, sp, #0
   8:   e24dd00c    sub sp, sp, #12
   c:   e50b0008    str r0, [fp, #-8]
  10:   e50b100c    str r1, [fp, #-12]
  14:   e51b2008    ldr r2, [fp, #-8]
  18:   e51b300c    ldr r3, [fp, #-12]
  1c:   e0823003    add r3, r2, r3
  20:   e2833007    add r3, r3, #7
  24:   e1a00003    mov r0, r3
  28:   e24bd000    sub sp, fp, #0
  2c:   e49db004    pop {fp}        ; (ldr fp, [sp], #4)
  30:   e12fff1e    bx  lr

I can be a lot more efficient (the same function, with the compiler optimizing), McDonald's instead of a greasy spoon diner:

00000000 <fun>:
   0:   e2811007    add r1, r1, #7
   4:   e0810000    add r0, r1, r0
   8:   e12fff1e    bx  lr

Or I can use the same code on a completely different computer:

00000000 <_fun>:
   0:   1166            mov r5, -(sp)
   2:   1185            mov sp, r5
   4:   1d40 0006       mov 6(r5), r0
   8:   65c0 0007       add $7, r0
   c:   6d40 0004       add 4(r5), r0
  10:   1585            mov (sp)+, r5
  12:   0087            rts pc

And yes, with the right tools (GNU works just fine) you can easily take C/C++ and start to see the above and try to understand it, what the language is doing for you. When it comes to system services like printing (printf) or file access, etc., the application calls library functions, which are other code linked in, and those eventually ask the operating system to go do that task (using your credit card to buy that burger rather than cash: the cashier now has to swipe the card in a box, and the box talks to banks somewhere in the world, please do this transaction for me, rather than the cashier opening a drawer and taking care of it). Adding a couple of numbers usually doesn't involve the operating system, but to access a controlled or complicated or shared resource like video or disk, etc., you have to ask the operating system to do that for you, and that is language, compiler and operating system specific.
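As a small illustration of the library versus operating system split, here are two ways to print a string in C. This assumes a POSIX-style system (Linux, macOS, the BSDs), where write() from unistd.h is typically a thin wrapper around the system call of the same name; on other platforms the bottom layer looks different. The function names say_via_library and say_via_syscall are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h> /* POSIX only: write(), STDOUT_FILENO */

/* Library route: fputs() copies the text into a buffer owned by the
   C library, which asks the operating system to output it later. */
int say_via_library(const char *msg)
{
    assert(msg != NULL);
    int rc = fputs(msg, stdout);
    fflush(stdout); /* make the library hand its buffer to the OS now */
    return rc;      /* non-negative on success, EOF on error */
}

/* Direct route: ask the operating system ourselves.
   Returns the number of bytes written, or -1 on error. */
long say_via_syscall(const char *msg)
{
    assert(msg != NULL);
    return (long)write(STDOUT_FILENO, msg, strlen(msg));
}
```

Both end up at the same place. The library version just adds convenience (formatting, buffering) on top, the way the drive-through adds convenience on top of the kitchen.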

Java and Python (early Pascal, etc.) abstract that by compiling to a machine code (bytecode) that is generally not implemented directly in hardware. Then you have a platform- and operating-system-specific virtual machine (written in some other language like C) that reads those Java bytecodes and performs each task, some of the tasks being push b, push c, add (a), and some being go read a file. It is possible to disassemble and see what Java is producing at the bytecode level (javap -c does this), but it is easier to do with compiled languages.
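To give a feel for the push b, push c, add idea, here is a toy stack-based virtual machine in C. The three opcodes and the function vm_run are invented for this sketch; this is not real Java bytecode and it leaves out nearly everything a real JVM does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bytecode, invented for this sketch. */
enum { OP_PUSH, OP_ADD, OP_HALT };

#define STACK_MAX 16

/* Interpret `len` ints of bytecode.
   Returns 0 with the top of stack in *result, or -1 if the program is bad. */
int vm_run(const int *code, size_t len, int *result)
{
    assert(code != NULL && result != NULL);
    int stack[STACK_MAX];
    size_t sp = 0; /* number of values currently on the stack */
    size_t pc = 0; /* index of the next instruction */

    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH: /* the operand follows the opcode */
            if (pc >= len || sp >= STACK_MAX)
                return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:  /* pop two values, push their sum */
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT: /* stop; the answer is on top of the stack */
            if (sp < 1)
                return -1;
            *result = stack[sp - 1];
            return 0;
        default:      /* unknown opcode */
            return -1;
        }
    }
    return -1; /* ran off the end without OP_HALT */
}
```

The program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 7, OP_ADD, OP_HALT } is the a + b + 7 function from earlier with a = 2 and b = 3, and vm_run leaves 12 in *result. The same bytecode would run unchanged on any machine that has a vm_run compiled for it, which is the whole point of the approach.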

javiergarval's answer points to the Tanenbaum book(s); those, or ones like them, may cover what you are after, initially the middle layer, the operating system. But depending on how low you want to go, it gets down into assembly language, then further down into logic and busses.

You might also consider the book Code: The Hidden Language of Computer Hardware and Software by Petzold, to come from the other direction.