I am trying to translate the following C++ code, which converts an arbitrary integer value into a character from a pool of characters, into PHP:
#include <cstdint>
#include <cstring>
#include <iostream>

uint8_t GetCharacter(uint32_t value) {
    static const char* valid_characters = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    static const size_t valid_characters_l = strlen(valid_characters);
    uint8_t c = valid_characters[value % valid_characters_l];
    return valid_characters[(value << c) % valid_characters_l];
}

int main() {
    uint32_t array[] = {176, 52, 608, 855};
    for (size_t i=0; i < 4; i++) {
        uint8_t c = GetCharacter(array[i]);
        std::cout << array[i] << ": " << (uint32_t) c << "\n";
    }
    return 0;
}
Which yields
176: 109
52: 114
608: 85
855: 65
The PHP code I've been able to come up with, however, yields the following:
176: 109
52: 114
608: 85
855: 104 // << Here's the problem
I am very sure I translated it exactly and I am unable to find the issue.
<?php
function getCharacter($index) {
    $chars = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    $c = ord(substr($chars, $index % strlen($chars)));
    return ord(substr($chars, ($index << $c) % strlen($chars)));
}

function main() {
    $array = array(176, 52, 608, 855);
    foreach ($array as $value) {
        echo "$value: " . getCharacter($value) . "\n";
    }
}

main();
Could someone point me in the right direction to solve this problem?
I believe the problem is that, for the input 855, the number ($index << $c) is 3,586,129,920, which is greater than 2,147,483,647 and therefore cannot be represented by a signed 32-bit integer. PHP does not let you declare the integer type of $value, and the width of its integers depends on the platform, so I think the arithmetic ends up being implementation dependent.
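To illustrate where the 104 could come from, here is a minimal C++ sketch that mimics what a 32-bit PHP build would presumably do. It rests on three assumptions that are not confirmed by the question: PHP integers are 32-bit signed on that machine, the shift count is effectively reduced modulo 32, and `substr` with a negative offset counts from the end of the string. The helper name `phpLikeCharacter32` is made up for this example.

```cpp
#include <cstdint>
#include <string>

// Same character pool as in the question.
static const std::string kChars =
    "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";

// Hypothetical helper: emulates the PHP function under the ASSUMPTION of
// 32-bit signed integers and a shift count reduced modulo 32.
int phpLikeCharacter32(uint32_t value) {
    const int64_t len = static_cast<int64_t>(kChars.size());  // 55
    const uint8_t c = static_cast<uint8_t>(kChars[value % 55u]);

    // Well-defined in C++ because the count is now below 32.
    const uint32_t shifted = value << (c & 31u);

    // Reinterpret the 32-bit pattern as a signed value (two's complement).
    int64_t as_signed = shifted;
    if (as_signed > INT32_MAX) {
        as_signed -= (int64_t{1} << 32);
    }

    // Like PHP's %, C++'s % gives the remainder the sign of the dividend.
    int64_t offset = as_signed % len;

    // Assumed substr() behavior: a negative offset counts from the end.
    if (offset < 0) {
        offset += len;
    }
    return static_cast<unsigned char>(kChars[static_cast<std::size_t>(offset)]);
}
```

Under these assumptions, `phpLikeCharacter32(855)` returns 104: the shifted value becomes -708,837,376, the remainder is -16, and 16 characters from the end of the pool is 'h'. The other three inputs never exceed the signed range, so they still give 109, 114 and 85.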
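Actually, it is surprising that things work at all: the character codes you use as shift counts range from 48 ('0') to 119 ('w'), so you are always shifting a 32-bit number by 32 or more, which is undefined behavior in C++, I think. On x86 the hardware typically uses only the low 5 bits of the count, which would turn the shift by 86 into a shift by 22 and explain the number above. You might want to rethink the underlying math, and in particular consider the underflow and overflow behavior of your code.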
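If the goal is just to pin down the behavior the C++ program happened to show, one option is to make the masking explicit. The sketch below assumes that the original build effectively used only the low 5 bits of the shift count (as x86 shift instructions do); the name `GetCharacterDefined` is invented for this example and is not from the question.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical, well-defined variant of GetCharacter from the question.
uint8_t GetCharacterDefined(uint32_t value) {
    static const char* valid_characters =
        "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";
    static const uint32_t n =
        static_cast<uint32_t>(std::strlen(valid_characters));  // 55

    const uint8_t c = static_cast<uint8_t>(valid_characters[value % n]);

    // ASSUMPTION: only the low 5 bits of the count matter. With the mask the
    // count is always below 32, so the shift is no longer undefined behavior,
    // and unsigned arithmetic wraps modulo 2^32 by definition.
    const uint32_t shifted = value << (c & 31u);

    return static_cast<uint8_t>(valid_characters[shifted % n]);
}
```

For the four inputs from the question this returns 109, 114, 85 and 65. The same idea should carry over to a 64-bit PHP build by masking the count with `& 31` and the shifted value with `& 0xFFFFFFFF` before taking the remainder.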
As a potential solution, you might notice that you have a finite number of possible inputs and corresponding outputs - you could actually create a direct lookup table. I believe I did this correctly (using the C++ version of your code with some modifications) - it surprised me a little bit that it didn't result in a 1:1 mapping. The lookup string becomes:
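$lookupString = "6RQtrpp07TU4AP1IDKmjl8QD7WjitmwUAcjT3AT9MuAu3PUKJtIb5vS";
And your PHP code can be reduced to
$value = ord(substr($lookupString, ($input % 55 + 7) % 55));
where 55 is the length of $lookupString. The offset of 7 is needed because the table was generated for inputs starting at '0' (48), and -48 is congruent to 7 modulo 55. This reproduces the four outputs of the C++ program above, but for larger inputs, where the shifted value no longer fits in 32 bits, the C++ result stops repeating every 55 values, so the table is not guaranteed to match there.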
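As a quick sanity check of the table, here is a small C++ sketch that looks up a value and can be compared against the four sample outputs from the question. It assumes that the first table entry corresponds to the input 48 ('0'), which is how the generator code below fills it; the function name `lookupCharacter` is made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Lookup table from the answer; entry i was generated for the input i + 48.
static const std::string kLookup =
    "6RQtrpp07TU4AP1IDKmjl8QD7WjitmwUAcjT3AT9MuAu3PUKJtIb5vS";

// Hypothetical helper: returns the character code for `value` via the table.
int lookupCharacter(uint32_t value) {
    const uint32_t n = static_cast<uint32_t>(kLookup.size());  // 55

    // Subtracting 48 is the same as adding 7 modulo 55; adding keeps the
    // index non-negative even for inputs smaller than 48.
    const uint32_t index = (value % n + 7u) % n;

    return static_cast<unsigned char>(kLookup[index]);
}
```

With this indexing, 176, 52, 608 and 855 map to 109, 114, 85 and 65, matching the C++ output in the question.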
Interesting observation: a number of characters appear more than once; other characters are never used. This means that this is not a very "good" encoding scheme (if that is what it is trying to be).
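The observation about repeated and unused characters can be checked mechanically. The following sketch counts how many distinct characters of the pool actually occur in the lookup string; the function names are invented for this example.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Number of distinct characters in a string.
std::size_t countDistinct(const std::string& s) {
    return std::set<char>(s.begin(), s.end()).size();
}

// Number of pool characters that never occur in the table.
std::size_t countUnused(const std::string& pool, const std::string& table) {
    const std::set<char> used(table.begin(), table.end());
    std::size_t unused = 0;
    for (char ch : pool) {
        if (used.count(ch) == 0) {
            ++unused;
        }
    }
    return unused;
}
```

Applied to the pool and the lookup string above, `countDistinct` reports 34 distinct characters in the table, and `countUnused` reports that 21 of the 55 pool characters are never produced.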
For reference, this is the code I used to determine the lookup string:
#include <cstdint>
#include <cstring>
#include <iostream>

static const char* valid_characters = "0123456789ABCDEFGHIJKLMOPQRSTUVWabcdefghijklmnopqrstuvw";

uint8_t GetCharacter(uint32_t value) {
    static const size_t valid_characters_l = strlen(valid_characters);
    uint8_t c = valid_characters[value % valid_characters_l];
    // Note: c is always >= 48 here, so this shift is undefined behavior in C++;
    // the table reflects whatever the compiler and CPU happened to do.
    return valid_characters[(value << c) % valid_characters_l];
}

int main() {
    // Entry i of the table corresponds to the input value i + '0' (i.e. i + 48).
    for (size_t i = 0; i < 55; i++) {
        uint8_t c = GetCharacter(i + '0');
        std::cout << char(c);
    }
    std::cout << "\n";
    return 0;
}