I have a column in a data frame called MARKET_VALUE that I need to pass to a downstream system in a defined format. MARKET_VALUE, a float, needs to be passed as two integer columns (a significand with no trailing zeros, and an exp) as follows:
MARKET_VALUE  SIGNIFICAND  EXP
6.898806e+09  6898806        3
6.898806e+05  6898806       -1
6.898806e+03  6898806       -3
I contemplated using formatted strings but am convinced there must be a smarter solution. The data frame is large, containing millions of rows, so a solution that doesn't depend on apply would be preferable.
I used a DataFrame consisting of 1e5 rows (you could try more rows to find the bottleneck):
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.random((100000, 2))**10, columns=['random1', 'random2'])
Here I use standard Python string formatting.
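# get the significand digits: mantissa of the scientific notation with the decimal point removed
# (splitting on 'E' also works when the exponent has three digits)
df.random1.apply(lambda x: f'{x:.8E}'.split('E')[0].replace('.', ''))
# get the exp from the same '.8E' string, so rounding cannot make it disagree with the mantissa
df.random1.apply(lambda x: int(f'{x:.8E}'.split('E')[1]))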
On my laptop it took less than 100ms.
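Note that the formatted strings above still carry trailing zeros, and the exponent is the scientific-notation one, whereas the question asks for an integer significand without trailing zeros and an exponent shifted to match. A minimal sketch of a per-value helper that does this conversion follows. The name `split_value` and the `decimals` parameter are made up for illustration, and the sketch assumes finite inputs (no NaN or inf).

```python
def split_value(x, decimals=8):
    """Return (significand, exp) such that x is approximately significand * 10**exp.

    Assumes x is a finite float; `decimals` is the number of digits kept
    after the decimal point in scientific notation (8 matches '.8E').
    """
    mantissa, sci_exp = f'{x:.{decimals}E}'.split('E')
    # partition also copes with decimals=0, where there is no '.'
    int_part, _, frac_part = mantissa.partition('.')
    # drop trailing zeros from the fractional digits
    frac_part = frac_part.rstrip('0')
    significand = int(int_part + frac_part)
    # each fractional digit kept shifts the decimal exponent down by one
    exp = int(sci_exp) - len(frac_part)
    return significand, exp


print(split_value(6.898806e+09))
print(split_value(6.898806e+05))
print(split_value(6.898806e+03))
```

It can be used with `df.random1.map(split_value)`, which yields one `(significand, exp)` tuple per row. This still goes through Python-level string formatting, so it is not expected to be faster than the `apply` calls above.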
I am still thinking about a faster (vectorized) solution, but for now I hope this helps.
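One possible direction for a vectorized version, sketched with NumPy only and not benchmarked here: scale every value to a fixed number of significant digits, round to an integer, then strip trailing zeros with a short loop over digits (not over rows). The function name `split_significand_exp` and the `sig_digits` parameter are hypothetical, and the sketch assumes finite values whose magnitude is well inside the normal float64 range.

```python
import numpy as np


def split_significand_exp(values, sig_digits=9):
    """Vectorized split into integer (significand, exp) arrays.

    Assumes finite float64 input; values are rounded to `sig_digits`
    significant digits, so x is approximately significand * 10**exp.
    """
    x = np.asarray(values, dtype=np.float64)
    # replace zeros by 1 so that log10 is defined everywhere
    safe = np.where(x == 0, 1.0, np.abs(x))
    sci_exp = np.floor(np.log10(safe)).astype(np.int64)
    # shift so that the significand holds sig_digits digits
    exp = sci_exp - (sig_digits - 1)
    significand = np.rint(x / np.power(10.0, exp)).astype(np.int64)
    # strip trailing zeros: at most sig_digits + 1 passes over the whole array
    for _ in range(sig_digits + 1):
        strip = (significand != 0) & (significand % 10 == 0)
        if not strip.any():
            break
        significand = np.where(strip, significand // 10, significand)
        exp = np.where(strip, exp + 1, exp)
    # by convention in this sketch, a zero value gets exp 0
    exp = np.where(significand == 0, 0, exp)
    return significand, exp


sig, exp = split_significand_exp([6.898806e+09, 6.898806e+05, 6.898806e+03])
print(sig, exp)
```

With a data frame this could be used as `df['SIGNIFICAND'], df['EXP'] = split_significand_exp(df['MARKET_VALUE'])`. The choice of 9 significant digits mirrors the `.8E` format used above; values that need more digits would be rounded.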