So, I'm reading xlsx files and I need to select 3 specific columns, then rename them, and transfer the first 100 rows of each column to a new xlsx file, deleting them from the original one.
My code as of now is reading from a manually edited xlsx file, where I rename the columns, select the first 100 of them (except for the header), etc:
df = pandas.read_excel('test.xlsx', names=['date', 'value', 'source'])
df['date'] = pandas.to_datetime(df['date'], format='%b %d %Y.%f').astype(str)
print(df.head(5))
Which returns, as expected:
date value source
0 2018-05-01 1 xxxxxxx
1 2018-05-01 1 xxxxxxxxxxxxxxxxx
2 2018-05-02 1 xxxxxxxxxxxxxxxxx
3 2018-05-02 1 xxxxxxxxxxxxxxxxx
4 2018-05-03 1 xxxxxxxxxxxxxxxxxx
Is there a way to automatically read and edit the xlsx file with pandas? Other libs? I'm new to python and I'm really lost here :(
Seems like you need 2 files output. One with the first hundred rows, another with the remainder. This is straightforward with pandas.DataFrame.iloc
:
import pandas as pd
df = pd.read_excel('test.xlsx', names=['date', 'value', 'source'])
df.iloc[:100].to_excel('out1.xlsx', index=False)
df.iloc[100:].to_excel('out2.xlsx', index=False)