Search code examples
cmemory-managementcross-platformlow-levelvm-implementation

C memory management for Cross-platform VM


I asked a question about C-type sizes which I get a pretty good answer but I realized that I may not formulate the question very well to be useful for my purpose.

My background was from Computer Engineer before moves to Software Engineer so I like computer architectures and always thinking about making VM. I've just finished an interesting project making VM on Java which I am quite proud with. But there is some legal problems I can't open-source it now and I am currently have some free time. So I want to see if I can make another VM on C (with better speed) just for fun and educational.

The thing is I am not a C program the last time I wrote a non-trivia C problem was more than 10 years ago. I was Pascal, Delphi, and now Java and PHP programmer.

There are number of obstacles I can foresee and I am trying to tackle one and that is accessing existing library (in Java, reflection solves this problem).

I plan to solve this by having a buffer of data (similar to stack). The client of my VM can program to put data into these stack before giving me to pointer to native function.

int main(void) {
    // Prepare stack
    int   aStackSize = 1024*4;
    char *aStackData = malloc(aStackSize);

    // Initialise stack
    VMStack aStack;
    VMStack_Initialize(&aStack, (char *)aStackData, aStackSize);

    // Push in the parameters
    char *Params = VMStack_CurrentPointer(&aStack);
    VMStack_Push_int   (&aStack, 10  ); // Push an int
    VMStack_Push_double(&aStack, 15.3); // Push a double

    // Prepare space for the expected return
    char *Result = VMStack_CurrentPointer(&aStack);
    VMStack_Push_double(&aStack, 0.0); // Push an empty double for result

    // Execute
    void (*NativeFunction)(char*, char*) = &Plus;
    NativeFunction(Params, Result); // Call the function

    // Show the result
    double ResultValue = VMStack_Pull_double(&aStack); // Get the result
    printf("Result:  %5.2f\n", ResultValue);               // Print the result

    // Remove the previous parameters
    VMStack_Pull_double(&aStack); // Pull to clear space of the parameter
    VMStack_Pull_int   (&aStack); // Pull to clear space of the parameter

    // Just to be sure, print out the pointer and see if it is `0`
    printf("Pointer: %d\n", aStack.Pointer);

    free(aStackData);
    return EXIT_SUCCESS;
}

The push, pull and invocation of native function can be triggered by a byte code (that is how VM will later be made).

For the sake of completeness (so that you can try it on you machine), here is the code for Stack:

typedef struct {
    int  Pointer;
    int  Size;
    char *Data;
} VMStack;

inline void   VMStack_Initialize(VMStack *pStack, char *pData, int pSize) __attribute__((always_inline));
inline char   *VMStack_CurrentPointer(VMStack *pStack)                    __attribute__((always_inline));
inline void   VMStack_Push_int(VMStack *pStack, int pData)                __attribute__((always_inline));
inline void   VMStack_Push_double(VMStack *pStack, double pData)          __attribute__((always_inline));
inline int    VMStack_Pull_int(VMStack *pStack)                           __attribute__((always_inline));
inline double VMStack_Pull_double(VMStack *pStack)                        __attribute__((always_inline));

inline void VMStack_Initialize(VMStack *pStack, char *pData, int pSize) {
    pStack->Pointer = 0;
    pStack->Data    = pData;
    pStack->Size    = pSize;
}

inline char *VMStack_CurrentPointer(VMStack *pStack) {
    return (char *)(pStack->Pointer + pStack->Data);
}

inline void VMStack_Push_int(VMStack *pStack, int pData) {
    *(int *)(pStack->Data + pStack->Pointer) = pData;
    pStack->Pointer += sizeof pData; // Should check the overflow
}
inline void VMStack_Push_double(VMStack *pStack, double pData) {
    *(double *)(pStack->Data + pStack->Pointer) = pData;
    pStack->Pointer += sizeof pData; // Should check the overflow
}

inline int VMStack_Pull_int(VMStack *pStack) {
    pStack->Pointer -= sizeof(int);// Should check the underflow
    return *((int *)(pStack->Data + pStack->Pointer));
}
inline double VMStack_Pull_double(VMStack *pStack) {
    pStack->Pointer -= sizeof(double);// Should check the underflow
    return *((double *)(pStack->Data + pStack->Pointer));
}

On the native function side, I created the following for testing purpose:

// These two structures are there so that Plus will not need to access its parameter using
//    arithmetic-pointer operation (to reduce mistake and hopefully for better speed).
typedef struct {
    int    A;
    double B;
} Data;
typedef struct {
    double D;
} DDouble;

// Here is a helper function for displaying void PrintData(Data *pData, DDouble *pResult) { printf("%5.2f + %5.2f = %5.2f\n", pData->A*1.0, pData->B, pResult->D); }

// Some native function void Plus(char* pParams, char* pResult) { Data *D = (Data *)pParams; // Access data without arithmetic-pointer operation DDouble *DD = (DDouble *)pResult; // Same for return DD->D = D->A + D->B; PrintData(D, DD); }

When executed, the above code returns:

10.00 + 15.30 = 25.30
Result:  25.30
Pointer: 0

This work well on my machine (Linux x86 32bits GCC-C99). It will be very nice if this works on other OS/Architecture too. But there are AT LEAST three memory-related issures we have to be aware of.

1). Data size - It seems like if I compile both VM and native functions using the same compiler on the same architecture, the size types should be the same.

2). Endianness - Same with Data size.

3). Memory alignment - Which is the problem as padding-bytes may be added in struct but it is hard to synchronize it when prepare parameter stack as (there is no way to know how padding is added except for hard coding).

My questions are:

1). If I know the size of the types, is there a way to modify push and pull function to exactly synchronize with struct padding? (modify to let compiler takes care of it like Datasize and Endians problems).

2). If I pack structure by one (using #pragma pack(1)); (2.1) Will the performance penalty be acceptable? and (2.2) Will the program stability be at risk?

3). How about padding by 2,4, or 8? Which should be good for general 32 or 64 bits system?

4). Can you guide me to a documentation for an exact padding algorithm let say for GCC on x86?

5). Is there is a better way?

NOTE: Cross-platform it is not my ultimate goal but I can't resist. Also, performance is not my target as soon as it is not so ugly. All these are for fun and learning.

Sorry for my English and the very long post.

Thanks everyone in advance.


Solution

  • Tangential Comments

    These first items are tangential to the questions you asked, but...

    // Execute
    void (*NativeFunction)(char*, char*) = &Plus;
    NativeFunction(Params, Result); // Call the function
    

    I think you should probably be using 'void *' instead of 'char *' here. I would also have a typedef for the function pointer type:

    typedef void (*Operator)(void *params, void *result);
    

    Then you can write:

    Operator NativeFunction = Plus;
    

    The actual function would be modified too - but only very slightly:

    void Plus(void *pParams, void *pResult)
    

    Also, you have a minor naming problem - this function is 'IntPlusDoubleGivesDouble()', rather than a general purpose 'add any two types' function.


    Direct answers to the questions

    1). If I know the size of the types, is there a way to modify push and pull function to exactly synchronize with struct padding? (modify to let compiler takes care of it like Datasize and Endians problems).

    There isn't an easy way to do that. For example, consider:

    struct Type1
    {
         unsigned char byte;
         int           number;
    };
    struct Type2
    {
         unsigned char byte;
         double        number;
    };
    

    On some architectures (32-bit or 64-bit SPARC, for example), the Type1 structure will have 'number' aligned at a 4-byte boundary, but the Type2 structure will have 'number' aligned on an 8-byte boundary (and might have a 'long double' on a 16-byte boundary). Your 'push individual elements' strategy would bump the stack pointer by 1 after pushing the 'byte' value - so you would want to move the stack pointer by 3 or 7 before pushing the 'number', if the stack pointer is not already appropriately aligned. Part of your VM description will be the required alignments for any given type; the corresponding push code will need to ensure the correct alignment before pushing.

    2). If I pack structure by one (using #pragma pack(1)); (2.1) Will the performance penalty be acceptable? and (2.2) Will the program stability be at risk?

    On x86 and x86_64 machines, if you pack the data, you will incur a performance penalty for the misaligned data access. On machines such as SPARC or PowerPC(per mecki), you will get a bus error or something similar instead - you must access the data at its proper alignment. You might save some memory space - at a cost in performance. You'd do better to ensure performance (which here includes 'performing correctly instead of crashing') at the marginal cost in space.

    3). How about padding by 2,4, or 8? Which should be good for general 32 or 64 bits system?

    On SPARC, you need to pad an N-byte basic type to an N-byte boundary. On x86, you will get best performance if you do the same.

    4). Can you guide me to a documentation for an exact padding algorithm let's say for GCC on x86?

    You would have to read the manual.

    5). Is there is a better way?

    Note that the 'Type1' trick with a single character followed by a type gives you the alignment requirement - possibly using the 'offsetof()' macro from <stddef.h>:

    offsetof(struct Type1, number)
    

    Well, I would not pack the data on the stack - I would work with the native alignment because that is set to give the best performance. The compiler writer does not idly add padding to a structure; they put it there because it works 'best' for the architecture. If you decide you know better, you can expect the usual consequences - slower programs that sometimes fail and are not as portable.

    I am also not convinced that I would write the code in the operator functions to assume that the stack contained a structure. I would pull the values off the stack via the Params argument, knowing what the correct offsets and types were. If I pushed an integer and a double, then I'd pull an integer and a double (or, maybe, in reverse order - I'd pull a double and an int). Unless you are planning an unusual VM, few functions will have many arguments.