Search code examples
pythonpandasxlsx

Python parsing xlsx based on a column name and a value


I am trying to parse an xlsx following file: enter image description here.

import pandas as pd
import numpy as np


data = pd.read_excel ("test.xlsx") 

ID = pd.DataFrame(data, columns= ['RUT23001E014'])
item_names = pd.DataFrame(data, columns= ['Riv'])
print("ID dataframe=",ID)
print("item names dataframe=",item_names)
print(ID.loc["1"])

What I am trying to do here, is to only return me rows where number "1" is found for a column named "RUT23001E014"

The answer that I am looking for in this case is something like that:

item 1 1
item 5 1

Since only these two items are assigned a value "1". The others who have assigned "0" I don't care about.

I have been looking at dataframe.loc function but I cannot fully figure out how do I use it to locate a particular value inside a column

UPDATE********

MORE DETAILED TABLE

So the RUTXXXXXXX are the serial numbers. Each serial number is assigned a different combination of items. Depending on a operation that I am doing, I need to know what items and quantities a specific Serial number is attached to


Solution

  • Change your item_names line to this and try:

    item_names = data[data['RUT23001E014'] == 1]
    print(items_df)
    

    And you can remove the line where you created ID - it is not useful