Search code examples
pythonpython-3.xpandasdataframenumber-formatting

(Python) How to fix numerical representation error in dataframe column values


Just a (somewhat) quick question - if I have a dataframe with a column consisting of numbers of the form 1.305.000, 4.65, 99.9, 443.111.34000, how can I convert them to the 'correct' format: 1305.000, 4.65, 99.9, 443111.34000?

If it helps, the the values were obtained from a .csv file, from one of its columns, say 'Total Net Revenue':

In code block form:

Day Service Total Net Revenue
0   1   te  1.305.000
1   1   as  4.65
2   2   qw  99.9
3   3   al  443.111.34000
4   6   al  443.111.34000
5   6   te  1.305.000
6   7   pp  200
7   7   te  1.305.000
8   7   al  443.111.34000
9   7   te  1.305.000

And another form based on feedback:

[{'Day': 1, 'Service': 'te', 'Total Net Revenue': '1.305.000'},
 {'Day': 1, 'Service': 'as', 'Total Net Revenue': '4.65'},
 {'Day': 2, 'Service': 'qw', 'Total Net Revenue': '99.9'},
 {'Day': 3, 'Service': 'al', 'Total Net Revenue': '443.111.34000'},
 {'Day': 6, 'Service': 'al', 'Total Net Revenue': '443.111.34000'},
 {'Day': 6, 'Service': 'te', 'Total Net Revenue': '1.305.000'},
 {'Day': 7, 'Service': 'pp', 'Total Net Revenue': '200'},
 {'Day': 7, 'Service': 'te', 'Total Net Revenue': '1.305.000'},
 {'Day': 7, 'Service': 'al', 'Total Net Revenue': '443.111.34000'},
 {'Day': 7, 'Service': 'te', 'Total Net Revenue': '1.305.000'}]

I can't seem to find any reference on this, and some insight will be deeply appreciated. Thanks!


Solution

  • This isn't quite a pandas question, it's really asking about turning odd-looking strings into numbers (tag: number-formatting).

    The following function will turn those strings into the desired numbers:

    import unittest
    
    
    def cleanup(s: str) -> float:
        parts = s.split('.')
        if len(parts) > 1:
            s = ''.join(parts[:-1]) + '.' + parts[-1]
        return float(s)
    
    
    class TestCleanup(unittest.TestCase):
    
        def test_cleanup(self):
            self.assertEqual(200, cleanup('200'))
            self.assertEqual(4.65, cleanup('4.65'))
            self.assertEqual(1305, cleanup('1.305.000'))
            self.assertEqual(443111.34, cleanup('443.111.34000'))
    

    You might consider using Decimal if those are currency figures, which motivates a "scaled integer" approach.

    It's a simple matter to .apply() the cleanup() function to an existing dataframe:

    df['numeric_revenue'] = df['total_net_revenue'].apply(cleanup)