Search code examples
pythonpandasdataframedrop-duplicates

pandas DataFrame select specific data


I want to build a for loop to only select row 5, row 10 and row 14 in pandas.

enter image description here

The actual file include thousands of rows in similar format. Please teach me a function that can go over the entire file.

Many Thanks !!!

Attached is my current progress:

df = pd.read_csv('C:/Users/ymx19/Desktop/EHS/Location/results/Batch3_enterprise_with_missing_level/HOU.csv',header = 0)
df = df.dropna(axis ='columns',how ='all')
headers_list = [x for x in df.columns]
count = len(headers_list)
k = headers_list[-1]
maxlevel = df[df[k].notna()].drop_duplicates(subset= headers_list, keep="last")
while count > 3:
    k = headers_list[-1]
    headers_list.pop()
    z = headers_list[-1]
    lower_level = df.drop_duplicates(subset=headers_list, keep="last")
    lower_level = lower_level[lower_level[z].notna() & lower_level[k].isna()]
    maxlevel.append(lower_level)
    count -= 1 
maxlevel.to_csv('C:\\Users\\ymx19\\Desktop/EHS\\Location\\results\\test\\HOU.csv', index = False)

Question: The the final maxlevel.csv didn't include any appended values from the for loop


Solution

  • In general, you can use slicing for this with df.iloc[start_row:end_row, start_column:end_column] or you can select specific rows with df.iloc[[4,9,13]].

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html

    If you want to remove duplicates you can use:

    df.drop_duplicates(subset=["Customer", "Level1", "Level2"], keep="last).

    https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html