Search code examples
pythonexcelpandasxlsx

Manipulate excel with python pandas


I'm having issues with an excel with a weird format excel format, I was looking to put them in a suitable format with python pandas since right now they are separated by days, and it should be all followed like this suitable format. When I read it with pandas using read_excel, I want to unify them and remove the first title-date, from this:

Unnamed: 1
NaN NaN
04Oct2020 (Sunday)  NaN
date & time         cars
04/10/2020 00:00:00 1
04/10/2020 00:01:00 2

to the suitable form something like this:

date & time         cars
04/10/2020 00:00:00 1
04/10/2020 00:01:00 2
.
.
05/10/2020 00:00:00 1

(dots are to show that the days are unified). How can I do it? I have not succeeded, any help is appreciated!


Solution

  • A very hacky way that should work for your dataset.

    import pandas as pd
    
    exclude = ["Mean", "STDEV", "Median", "Min", "Max", "date & time"]
    df = pd.read_excel("test.xls", names = ["date_time", "cars"])
    df = df[~df.date_time.isin(exclude)].dropna()
    df.to_excel("testoutput.xls", index=False)
    

    Which will write this output to a new Excel file, dropping the row indices.

                  date_time cars
    1   2020-10-04 00:00:00    1
    2   2020-10-04 00:01:00    2
    3   2020-10-04 00:02:00    3
    4   2020-10-04 00:03:00    4
    5   2020-10-04 00:04:00    5
    6   2020-10-04 00:05:00    6
    7   2020-10-04 00:06:00    7
    17  2020-10-05 00:00:00    1
    18  2020-10-05 00:01:00    2
    19  2020-10-05 00:02:00    3
    20  2020-10-05 00:03:00    4
    21  2020-10-05 00:04:00    5
    22  2020-10-05 00:05:00    6
    23  2020-10-05 00:06:00    7
    24  2020-10-05 00:07:00    8
    25  2020-10-05 00:08:00    9
    26  2020-10-05 00:09:00   10