I am reading in data from a .xlsx spreadsheet using Pandas 1.3.4 and Python 3.9.6.
I want data for each item based on a code
in the 'Code' column so I am first isolating a row in the dataframe (df
) using:
row = df.loc[df['Code'] == code]
To then get individual values in the row I am accessing them using indexing and their headers as follows:
barcode = row['Barcode'].values.tolist()
I am using .values
as I otherwise get a cumbersome line number and datatype output. I am using .tolist()
so that I can effectively access the actual barcode rather than a scientific notation interpretation. Note that this means barcode
should be a list with a single value.
Here is my problem, when I print barcode
I get a sensible output, e.g.: [72934728491]
(note this is not an actual barcode, but an example 11 digits)
Update: len(barcode)
returns 1 and type(barcode)
returns <class 'numpy.ndarray'>
or <class 'list'>
if I've used .tolist()
However, if I try to access the value inside the list using barcode[0]
or barcode.pop()
I get errors saying that the index is out of range or that .pop()
cannot be used on an empty list.
Bizarrely, if I write a for loop to get the value out of the list it works just fine:
for item in barcode:
print(item)
Please can someone explain why this is happening and how to actually fix it (rather than having to write a for loop for every single column value I need to access on the row).
I have tried to construct a Minumum Viable Example to help, but am struggling to replicate the problem in other code for now, this is the section where I dissect the dataframe, unfortunately I cannot show the section where I read in the df as it contains private info, but the spreadsheet reads into the df 'normally':
for code in new_items:
row = df.loc[df['Code'] == code] # row for a single item code
product_display_name = row['Product Display Name'].values
product_type = row['Product Type'].values
description = row['Description'].values
artist = row['Artist'].values
barcode = row['Barcode'].values.tolist()
finish = row['Finish'].values
unit = row['Unit'].values
country_of_origin = row['Country of Origin'].values
job_number = row['Job No.'].values
samples_date = row['Samples Date'].values
print(barcode.pop()) # this does not work
for item in barcode:
print(item) # this does work
The problem turned out to be to do with pandas' handling of empty cell NaN
values.
While dataframes display NaN
, somewhere in the .values
and .tolist()
conversion I was losing that info and having empty arrays. This was further complicated by some strange spreadsheet formatting that meant the dataframe had a 0th row of all NaN
values so my loop never even hit a valid value before failing.