
How to convert Hindi numbers (२०७४) to a numeric value in C#?


I have a large set of Hindi numbers that I want to convert into numeric values, but I don't know how to convert them. Please suggest an appropriate way to achieve this. Note: please don't suggest the Replace method.

E.g. convert the number २०७४ to its equivalent, 2074.


Solution

  • I believe this is what you're after, but be aware that this code was written by someone who doesn't speak, read, or otherwise know Hindi.

    I found the digits on the Wikipedia page, but I have absolutely no idea what I'm doing.

    The Google page (which I found by just googling the individual digits from the original string in the question) seems to indicate the following:

    • The digits for 0-9 are ०१२३४५६७८९
      • I clicked on a link and used the last character of the url as the digit
      • Note that 4 had to be taken from the second digit of 14, and there seems to be a disambiguation suffix on that link as well
    • They have Unicode code points ranging from 2406 through 2415 (U+0966 through U+096F), in that order
    • The double-digit numbers follow the system to a tee, so it seems to be just a base-10 positional system using different code points
      • But note that there are far too few examples for me to be absolutely certain this holds true for all numbers
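    The code point claim above can be checked from C# itself. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes that .NET's `CharUnicodeInfo.GetDecimalDigitValue` reports the decimal value of each of these characters (it returns -1 for characters that are not decimal digits).

```csharp
using System;
using System.Globalization;

public static class DevanagariDigitCheck
{
    // Hypothetical helper: describes one character as
    // "<char> U+<hex> (<decimal>) = <decimal digit value>".
    public static string Describe(char c)
    {
        int codePoint = c;
        int digitValue = CharUnicodeInfo.GetDecimalDigitValue(c); // -1 if not a decimal digit
        return $"{c} U+{codePoint:X4} ({codePoint}) = {digitValue}";
    }

    public static void Main()
    {
        // The ten digits exactly as listed above.
        const string digits = "०१२३४५६७८९";
        foreach (char c in digits)
        {
            Console.WriteLine(Describe(c));
        }
    }
}
```

    If the list above is right, the first line printed should describe U+0966 (2406) with value 0 and the last one U+096F (2415) with value 9.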

    If anyone pokes a hole in this answer I will take it down, but feel free to grab all the code from it first if you think it can be improved.

    Also bear in mind that the OP explicitly asked for a non-replace method. The whole thing could probably be written as a one-liner with Replace, but since that doesn't seem to be an acceptable answer, here we are.

    With all that said, here's a non-string-replace version that mimics basic numeric parsing using different symbols:

    Note: There's about 7 tons of error-handling that isn't present here, such as empty strings, etc.

    public static bool TryParseHindiToInt32(string text, out int value)
    {
        const int codePointForZero = 2406;                 // U+0966
        const int codePointForNine = codePointForZero + 9; // U+096F

        int sign = +1;

        int index = 0;
        if (index < text.Length && text[index] == '-') // todo: hindi minus?
        {
            index++;
            sign = -1;
        }

        // Accumulate in a long so the range check below cannot itself overflow.
        long result = 0;
        // Negative numbers get one extra unit of magnitude (int.MinValue).
        long limit = sign < 0 ? -(long)int.MinValue : int.MaxValue;

        value = 0;
        while (index < text.Length)
        {
            char c = text[index];
            if (c < codePointForZero || c > codePointForNine)
                return false; // not one of the ten digits

            result = result * 10 + (c - codePointForZero);
            if (result > limit)
                return false; // does not fit in an Int32

            index++;
        }

        value = (int)(sign * result);
        return true;
    }
    

    Test:

    string digits = "२०७४";
    TryParseHindiToInt32(digits, out int i);
    Console.WriteLine(i);
    

    Outputs:

    2074
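    As an alternative that still avoids string replacement, each character can be mapped to its ASCII digit and the result handed to `int.TryParse`, which then takes care of empty input, a lone sign, and overflow. This is only a sketch under assumptions: the class and method names are hypothetical, and because it relies on `CharUnicodeInfo.GetDecimalDigitValue` it accepts decimal digits from any script (and mixtures of scripts), not just the ones in the question.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DevanagariNumberSketch
{
    // Hypothetical helper, not part of the answer above.
    public static bool TryParseAnyDigitsToInt32(string text, out int value)
    {
        value = 0;
        if (string.IsNullOrEmpty(text))
            return false;

        var ascii = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // Assumption: a plain ASCII minus in first position marks a negative number.
            if (i == 0 && c == '-')
            {
                ascii.Append(c);
                continue;
            }

            int digit = CharUnicodeInfo.GetDecimalDigitValue(c); // -1 if not a decimal digit
            if (digit < 0)
                return false;

            ascii.Append((char)('0' + digit));
        }

        // int.TryParse rejects "-" on its own and anything outside the Int32 range.
        return int.TryParse(ascii.ToString(), NumberStyles.AllowLeadingSign,
                            CultureInfo.InvariantCulture, out value);
    }

    public static void Main()
    {
        bool ok = TryParseAnyDigitsToInt32("२०७४", out int i);
        Console.WriteLine($"{ok}: {i}");
    }
}
```

    The trade-off is an extra string allocation per call in exchange for reusing the framework's own range and sign handling.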