php c++ code-translation

Translate this C function to PHP


I am trying to translate the following C++ code into PHP. It converts an arbitrary integer value into a character from a pool of characters:

#include <cstdint>
#include <cstring>
#include <iostream>

uint8_t GetCharacter(uint32_t value) {
    static const char* valid_characters = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    static const size_t valid_characters_l = strlen(valid_characters);
    uint8_t c = valid_characters[value % valid_characters_l];
    return valid_characters[(value << c) % valid_characters_l];
}

int main() {
    uint32_t array[] = {176, 52, 608, 855};
    for (size_t i=0; i < 4; i++) {
        uint8_t c = GetCharacter(array[i]);
        std::cout << array[i] << ": " << (uint32_t) c << "\n";
    }
    return 0;
}

Which yields

176: 109
52: 114
608: 85
855: 65

The PHP code I've been able to come up with however yields the following:

176: 109
52: 114
608: 85
855: 104   // << Here's the problem

I am very sure I translated it exactly and I am unable to find the issue.

<?php

function getCharacter($index) {
    $chars = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    $c = ord(substr($chars, $index % strlen($chars)));
    return ord(substr($chars, ($index << $c) % strlen($chars)));
}

function main() {
    $array = array(176, 52, 608, 855);
    foreach ($array as $value) {
        echo "$value: " . getCharacter($value) . "\n";
    }
}

main();

Could someone point me in the right direction to solve this problem?


Solution

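  • I believe the problem is that ($index << $c) comes out as 3,586,129,920 for the input 855, which is larger than 2,147,483,647 and so cannot be represented by a signed 32-bit integer. PHP has no unsigned integer type, and the width of its integers depends on the platform (see PHP_INT_SIZE). If you are running a 32-bit build of PHP, the value wraps around to -708,837,376. From there, -708,837,376 % 55 is -16 in PHP, and substr() with a negative offset counts from the end of the string, which lands on 'h' (104) instead of 'A' (65).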
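To see where the 104 could come from, here is a minimal C++ sketch that emulates signed 32-bit arithmetic. It assumes the shift count is reduced to its low 5 bits (as x86 shift instructions do) and that a negative offset is counted from the end of the string, as PHP's substr() does. EmulateSigned32 and kChars are names made up for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Character pool copied from the question.
static const char* kChars = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";

// Hypothetical helper: what the PHP code would compute if its integers
// were signed 32-bit values.
uint8_t EmulateSigned32(uint32_t value) {
    const int64_t len = static_cast<int64_t>(std::strlen(kChars));  // 55
    const unsigned c = static_cast<unsigned char>(kChars[value % len]);

    // Assumption: only the low 5 bits of the shift count are used.
    const uint32_t bits = value << (c & 31u);

    // Reinterpret the 32-bit pattern as a signed value (two's complement).
    int64_t wrapped = bits;
    if (wrapped >= (int64_t{1} << 31)) {
        wrapped -= (int64_t{1} << 32);
    }

    // In both C++ and PHP the sign of % follows the dividend.
    int64_t index = wrapped % len;
    if (index < 0) {
        index += len;  // substr() counts a negative offset from the end
    }
    return static_cast<uint8_t>(kChars[index]);
}
```

Under these assumptions EmulateSigned32(855) returns 104, while 176, 52 and 608 still give 109, 114 and 85, which matches the PHP output in the question.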

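    Actually it is surprising that things work at all: the shift count c is the ASCII code of a character from the pool, so it is always at least 48, and shifting a 32-bit number by 32 or more bits is undefined behavior in C++. On x86 the hardware typically uses only the low 5 bits of the count, which would explain why 855 << 86 came out the same as 855 << 22. You might want to rethink the underlying math, and in particular decide explicitly how your code should behave on overflow.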
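One way to make the behavior explicit is to mask the shift count yourself. The sketch below is a variant of the question's function, not a drop-in replacement: GetCharacterMasked and kPool are hypothetical names, and the & 31 assumes you want the 5-bit masking that x86 happens to apply.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Character pool copied from the question.
static const char* kPool = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";

// Variant of GetCharacter whose shift is always well defined.
uint8_t GetCharacterMasked(uint32_t value) {
    static const std::size_t len = std::strlen(kPool);  // 55
    const uint8_t c = static_cast<uint8_t>(kPool[value % len]);

    // Assumption: keep only the low 5 bits of the count, so it stays below 32.
    // Unsigned arithmetic then wraps modulo 2^32 instead of being undefined.
    const uint32_t shifted = value << (c & 31u);

    return static_cast<uint8_t>(kPool[shifted % len]);
}
```

With the mask in place the four sample inputs give 109, 114, 85 and 65. The same idea should carry over to a 64-bit PHP build as (($index << ($c & 31)) & 0xFFFFFFFF) % strlen($chars), where the extra & 0xFFFFFFFF truncates the result to 32 bits.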

    As a potential solution, notice that as long as the shifted value does not overflow 32 bits, the output depends only on the input modulo 55, so you could create a direct lookup table. I believe I did this correctly (using the C++ version of your code with some modifications); it surprised me a little that it didn't result in a 1:1 mapping. The lookup string becomes:

    $lookupString = "6RQtrpp07TU4AP1IDKmjl8QD7WjitmwUAcjT3AT9MuAu3PUKJtIb5vS";
    

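    And your PHP code can be reduced to

    $value = ord($lookupString[($input + 7) % 55]);
    

    Where 55 is the length of the lookupString. The + 7 is needed because the table was generated for inputs starting at '0' (48), so entry i holds the result for input i + 48, and -48 is congruent to 7 modulo 55.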

    Interesting observation: a number of characters appear more than once; other characters are never used. This means that this is not a very "good" encoding scheme (if that is what it is trying to be).
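As a quick sanity check of the table, the sketch below looks up the four sample inputs and counts how many distinct characters the table contains. It assumes, from reading the generator code, that entry i corresponds to input i + 48. LookupCharacter, DistinctCharacters and kLookup are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Lookup string copied from the answer.
static const std::string kLookup =
    "6RQtrpp07TU4AP1IDKmjl8QD7WjitmwUAcjT3AT9MuAu3PUKJtIb5vS";

// Entry i was generated from input i + 48, and -48 is congruent to 7 mod 55.
uint8_t LookupCharacter(uint32_t value) {
    return static_cast<uint8_t>(kLookup[(value % 55 + 7) % 55]);
}

// Number of different characters that ever appear as an output.
std::size_t DistinctCharacters() {
    return std::set<char>(kLookup.begin(), kLookup.end()).size();
}
```

LookupCharacter reproduces 109, 114, 85 and 65 for the inputs 176, 52, 608 and 855, and DistinctCharacters() reports 34 different characters among the 55 entries. Inputs large enough to overflow 32 bits after the shift are not covered by the table.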

    For reference, this is the code I used to determine the lookup string:

    #include <cstdint>
    #include <cstring>
    #include <iostream>
    
    static const char* valid_characters = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    
    // Unchanged from the question: c is always >= 48, so the shift below
    // is undefined behavior and the output depends on the compiler and CPU.
    uint8_t GetCharacter(uint32_t value) {
        static const size_t valid_characters_l = strlen(valid_characters);
        uint8_t c = valid_characters[value % valid_characters_l];
        return valid_characters[(value << c) % valid_characters_l];
    }
    
    int main() {
        // Print the result for the 55 consecutive inputs '0' (48) through 102.
        for (size_t i = 0; i < 55; i++) {
            uint8_t c = GetCharacter(i + '0');
            std::cout << char(c);
        }
        std::cout << "\n";
        return 0;
    }