
Find least significant digit in a double in Python


I have a lot of financial data stored as floating point doubles and I'm trying to find the least significant digit so that I can convert the data to integers with an exponent.

All the values have a finite number of decimal digits, e.g. 1234.23 or 0.0001234, but because they're stored as doubles they can come out as 123.23000000001 or 0.00012339999999, etc.

Is there an easy or proper approach to this or will I just have to botch it?


Solution

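  • You have a couple of options.

    Firstly and most preferably, use the stdlib Decimal, not the builtin float.

    This avoids most float rounding errors, but only when the Decimal values are created from strings or integers. Created from floats, as in the example below, they inherit the binary representation error, so the infamous 0.1 + 0.2 still does not come out as exactly 0.3: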

    from decimal import Decimal
    
    print(0.1 + 0.2)  # 0.30000000000000004
    print(Decimal(0.1) + Decimal(0.2))  # 0.3000000000000000166533453694
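
    If the goal is an integer plus an exponent, Decimal can also report its digits directly. Here is a minimal sketch, assuming each value is first converted with repr() (the shortest string that round-trips to the same double) and that all values are finite. The names to_int_and_exponent and max_places are hypothetical, chosen for illustration, and are not part of the decimal module:

```python
from decimal import Decimal
from typing import Optional, Tuple


def to_int_and_exponent(value: float, max_places: Optional[int] = None) -> Tuple[int, int]:
    """Return (mantissa, exponent) with value ~= mantissa * 10**exponent.

    Assumes value is finite (no inf or nan).
    """
    # Build the Decimal from the string form, not from the float itself,
    # so the binary representation error is not carried over.
    dec = Decimal(repr(value))

    # Optionally round away noise left over from earlier float arithmetic.
    if max_places is not None:
        dec = dec.quantize(Decimal(1).scaleb(-max_places))

    # normalize() strips trailing zeros, leaving the least significant digit last.
    sign, digits, exponent = dec.normalize().as_tuple()
    mantissa = int("".join(map(str, digits)))
    return (-mantissa if sign else mantissa), exponent


print(Decimal("0.1") + Decimal("0.2"))                # 0.3
print(to_int_and_exponent(1234.23))                   # (123423, -2)
print(to_int_and_exponent(0.0001234))                 # (1234, -7)
print(to_int_and_exponent(0.1 + 0.2, max_places=8))   # (3, -1)
```

    max_places should be at least as large as the number of decimal places your real data can have. It is only there to cut off noise picked up from float arithmetic before the values reached you.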
    
    

    An alternative, if that isn't possible, is to set a tolerance for the number of repeated digits after the decimal point.

    For example:

    import re
    
    repeated_digit_tolerance = 8  # Change to an appropriate value for your dataset
    repeated_digit_pattern = re.compile(r"(\d)\1{2,}")  # runs of 3+ identical digits
    
    def longest_repeated_digit_re(s: str):
        # Return (start, length) of the longest run of repeated digits, or None
        spans = [m.span() for m in repeated_digit_pattern.finditer(s)]
        if not spans:
            return None
        start, end = max(spans, key=lambda span: span[1] - span[0])
        return start, end - start
    
    def fix_rounding(num: float) -> float:
        num_str = str(num)
        if "." not in num_str or "e" in num_str:
            return num  # inf, nan or scientific notation: leave unchanged
        post_dp = num_str[num_str.index(".") + 1:]
    
        run = longest_repeated_digit_re(post_dp)
        if run is None or run[1] <= repeated_digit_tolerance:
            return num  # no suspiciously long run, nothing to fix
    
        # Round where the run starts, so trailing 9s carry upwards correctly
        return round(num, run[0])
    
    print(0.1 + 0.2) # 0.30000000000000004
    print(0.2 + 0.4) # 0.6000000000000001
    
    print(fix_rounding(0.1 + 0.2))  # 0.3
    print(fix_rounding(0.2 + 0.4))  # 0.6
    

    It works for the examples above, but Decimal is practically always the better option of the two, even though it won't give exactly 0.3 for 0.1 + 0.2 when its operands are created from floats.