I want to build a for loop that selects only rows 5, 10 and 14 in pandas.
The actual file includes thousands of rows in a similar format. Please show me a function that can go over the entire file.
Many thanks!
Here is my current progress:
```python
import pandas as pd

df = pd.read_csv('C:/Users/ymx19/Desktop/EHS/Location/results/Batch3_enterprise_with_missing_level/HOU.csv', header=0)
df = df.dropna(axis='columns', how='all')  # drop columns that are entirely empty
headers_list = [x for x in df.columns]
count = len(headers_list)
k = headers_list[-1]
# rows filled all the way to the last level column
maxlevel = df[df[k].notna()].drop_duplicates(subset=headers_list, keep="last")
while count > 3:
    k = headers_list[-1]
    headers_list.pop()
    z = headers_list[-1]
    lower_level = df.drop_duplicates(subset=headers_list, keep="last")
    # rows whose deepest filled level is z
    lower_level = lower_level[lower_level[z].notna() & lower_level[k].isna()]
    maxlevel.append(lower_level)
    count -= 1
maxlevel.to_csv('C:/Users/ymx19/Desktop/EHS/Location/results/test/HOU.csv', index=False)
```
Question: the final CSV written from maxlevel doesn't include any of the values appended inside the while loop. Why?
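In general, you can slice by position with df.iloc[start_row:end_row, start_column:end_column] (the end positions are excluded),
or you can select specific rows with df.iloc[[4, 9, 13]]. Positions are zero-based, so this returns the 5th, 10th and 14th rows.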
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html
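A small sketch of both forms of iloc, using a made-up 20-row frame in place of the CSV, shows that no loop is needed to pick those rows:

```python
import pandas as pd

# Made-up toy frame standing in for the CSV: 20 rows, one column.
df = pd.DataFrame({"value": range(100, 120)})

# Positional slicing: positions 0..4 (end position 5 is excluded), all columns.
first_five = df.iloc[0:5, :]

# A list of positions selects exactly those rows. Positions are zero-based,
# so 4, 9 and 13 are the 5th, 10th and 14th rows.
picked = df.iloc[[4, 9, 13]]

print(first_five["value"].tolist())  # [100, 101, 102, 103, 104]
print(picked["value"].tolist())      # [104, 109, 113]
```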
If you want to remove duplicates, you can use:
df.drop_duplicates(subset=["Customer", "Level1", "Level2"], keep="last")
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html
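As for why nothing shows up in the output: DataFrame.append never modified the frame in place. It returned a new DataFrame, so `maxlevel.append(lower_level)` builds a result and immediately discards it. DataFrame.append was also deprecated in pandas 1.4 and removed in 2.0. A minimal sketch of the same loop that collects each level in a list and combines them once with pd.concat follows. It assumes the columns are ordered Customer, Level1, Level2, ... as in the question. `deepest_levels` is a hypothetical helper name, `min_cols=3` mirrors the `count > 3` condition, and the data is made up:

```python
import pandas as pd


def deepest_levels(df: pd.DataFrame, min_cols: int = 3) -> pd.DataFrame:
    """Hypothetical helper: keep each row at its deepest filled level column."""
    df = df.dropna(axis="columns", how="all")  # drop entirely empty columns
    headers = list(df.columns)
    last = headers[-1]
    # Rows filled all the way to the last column, de-duplicated.
    parts = [df[df[last].notna()].drop_duplicates(subset=headers, keep="last")]
    while len(headers) > min_cols:
        k = headers.pop()  # column being dropped from the subset
        z = headers[-1]    # new deepest column
        lower = df.drop_duplicates(subset=headers, keep="last")
        # Rows whose deepest filled level is z.
        parts.append(lower[lower[z].notna() & lower[k].isna()])
    # concat returns a new frame, so its result must be returned or assigned.
    return pd.concat(parts, ignore_index=True)


# Made-up example data (Level3 is invented for illustration).
data = pd.DataFrame({
    "Customer": ["A", "A", "B", "C"],
    "Level1": ["x", "x", "z", "w"],
    "Level2": ["p", "p", "q", None],
    "Level3": ["r", "r", None, None],
})

result = deepest_levels(data)
print(result["Customer"].tolist())  # ['A', 'B']
```

For the real file, read it with pd.read_csv as in the question, pass the frame to the helper, and write the returned frame with to_csv(..., index=False). Collecting the pieces in a list and concatenating once also avoids copying the growing frame on every iteration.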