Search code examples
pythonexcelpandasxlsxxlwt

Reading and editing excel files with pandas


So, I'm reading xlsx files and I need to select 3 specific columns, then rename them, and transfer the first 100 rows of each column to a new xlsx file, deleting them from the original one.

My code as of now is reading from a manually edited xlsx file, where I rename the columns, select the first 100 of them (except for the header), etc:

df = pandas.read_excel('test.xlsx', names=['date', 'value', 'source'])
df['date'] = pandas.to_datetime(df['date'], format='%b %d %Y.%f').astype(str)

print(df.head(5))

Which returns, as expected:

         date    value               source

0  2018-05-01      1                xxxxxxx
1  2018-05-01      1      xxxxxxxxxxxxxxxxx
2  2018-05-02      1      xxxxxxxxxxxxxxxxx
3  2018-05-02      1      xxxxxxxxxxxxxxxxx
4  2018-05-03      1     xxxxxxxxxxxxxxxxxx

Is there a way to automatically read and edit the xlsx file with pandas? Other libs? I'm new to python and I'm really lost here :(


Solution

  • Seems like you need 2 files output. One with the first hundred rows, another with the remainder. This is straightforward with pandas.DataFrame.iloc:

    import pandas as pd
    
    df = pd.read_excel('test.xlsx', names=['date', 'value', 'source'])
    
    df.iloc[:100].to_excel('out1.xlsx', index=False)
    df.iloc[100:].to_excel('out2.xlsx', index=False)