I'm looking for help in splitting my data. My data has spaces as thousand separators but there's also spaces between my timestamps.
This is an example of what the data looks like (this is currently 1 column):
Date/Time Var1 Var2 Var3 Var4 Var5 Var6
17/04/2020 00:00:00 133 579.20 31 978.90 377 952.81 179 412.41 203 595.20 70 447.40
17/04/2020 01:00:00 133 583.70 31 980.40 377 963.41 179 412.41 203 595.20 70 448.20
I would need it to look like this:
Date/Time Var1 Var2 Var3 Var4 Var5 Var6
17/04/2020 00:00:00 133579.20 31978.90 377952.81 179412.41 203595.20 70447.40
17/04/2020 01:00:00 133583.70 31980.40 377963.41 179412.41 203595.20 70448.20
I'm trying to go around this an odd and over-complicated way I believe. In one instance, I removed all whitespaces and then to split the dates I did:
df.iloc[:,3] = df.iloc[:,3].str.replace('/2020', '/2020 ').str.replace(':00:00', ':00:00 ')
And for then splitting all the numbers, I was trying to do something such as reading every character in a string and once it found a dot, add a whitespace 2 strings ahead but I didn't manage to get that to work.
for i in range(len(df)):
for j in df.iloc[i,:]:
for k in j:
if k == '.':
#to be continued
Surely there's a much quicker way to get around this. Could anyone give me a hand?
Assuming df
is your present dataframe and it has one column named simply 'D'
(if it is not 'D'
, change accordingly):
tmplist = df['D'].str.findall(r'(.+?[:.]\S+\s+)').to_list()
tmplist = [ [ e.replace(' ','') if i>0 else e.rstrip() for i, e in enumerate(row) ] for row in tmplist ]
col = ['Date/Time'] + [ 'Var{}'.format(i) for i in range(1,len(tmplist[0])) ]
df = pandas.DataFrame(tmplist, columns=col)
The first line converts the dataframe into a list of lists, splitting your strings as needed. A certain trick is used: a dot or colon followed by digits is the last of space-separated items that belong to one column (colon for timestamp, dot for floating-point numbers).
The second one removes all spaces from all columns except the first (timestamps), where it only removes trailing spaces.
Next one creates columns names according to your wish.
The last one rebuilds the dataframe from the list.