Tags: python, pandas, list, dataframe, series

Data type assigned inside nested for loop isn't as expected


I get the error:

AttributeError: 'float' object has no attribute 'lower'

When running this triple-nested for loop:

for row_data in df_row_list:
    for row_item_data in row_data:
        for param in search_params:
            if row_item_data.lower() == param.lower():
                row_index = df_row_list.index(row_data)

df_row_list is a list of 18 series. I am trying to iterate through it and comb through the data. How do I make sure row_item_data is a str so that I can call .lower() on it?
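
For what it's worth, the error itself comes from the NaN cells: pandas stores them as float, so .lower() fails on them. A minimal sketch of a type guard around the posted loop (title_row_index is just an illustrative name) could be:

for row_index, row_data in enumerate(df_row_list):
    for row_item_data in row_data:
        #NaN cells come through as float, so only compare actual strings
        if not isinstance(row_item_data, str):
            continue
        for param in search_params:
            if row_item_data.lower() == param.lower():
                #enumerate() gives the position directly and avoids
                #df_row_list.index(row_data), which can raise a ValueError
                #when comparing pandas Series
                title_row_index = row_index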

This is roughly what the data I am working with looks like:

0         NaN        NaN  ...             NaN              NaN
1      REV. :         NC  ...             NaN              NaN
2      OP.# :  0200-00-0  ...             NaN              NaN
3         NaN        NaN  ...             NaN              NaN
4    WI ASM #   HOLDER #  ...  TOOL STICK OUT  TOOL LIFE (MIN)
5         NaN        NaN  ...            0.55              120
6         NaN        NaN  ...            0.55              120
7         NaN        NaN  ...            0.55              120
8         NaN        NaN  ...            0.55              240
9         NaN        NaN  ...            0.55              300

The search parameters are looking for series containing words such as HOLDER DESCRIPTION, CUTTER #, Operation, and TOOL DESCRIPTION. I created a spreadsheet that has hundreds of options stored in it that I'm going to compare against.

I would expect it to spit out the index of the series from df_row_list (a list with a number of series in it) so that I know where the row of data I want to use as a "Title Row" is.

Or is this not even the best way to comb through a list of series for specific keywords? I'm relatively new to Python and I'm open to any help.
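
As to whether there is a better way: one sketch (not from the original post; it assumes search_params is a flat list of strings and df_row_list is the list of Series described above) lets pandas do the per-row string handling and stops at the first matching row:

#lower-case the search terms once
search_lower = {p.lower() for p in search_params}

title_row_index = None
for i, row_data in enumerate(df_row_list):
    #drop NaN cells, then compare everything else as lower-case strings
    values_lower = row_data.dropna().astype(str).str.lower()
    if values_lower.isin(search_lower).any():
        title_row_index = i
        break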


Solution

  • Just posting in case anyone has a similar issue and is looking for a different solution.

    This is how I found a solution:

    import os
    import pandas as pd
    
    #the file path I want to pull from
    in_path = r'W:\R1_Manufacturing\Parts List Project\Tool_scraping\Excel'
    #the file path where row search items are stored
    search_parameters = r'W:\R1_Manufacturing\Parts List Project\search_params.xlsx'
    #the file I will write the dataframes to
    outfile_path = r'W:\R1_Manufacturing\Parts List Project\xlsx_reader.xlsx'
    
    #establishing my list that I will store looped data into
    file_list = []
    main_header_list = []
    
    #open the file path to store the directory in files
    files = os.listdir(in_path)
    
    #database with terms that I want to track
    search = pd.read_excel(search_parameters)
    length_search = search.index 
    
    #turn search dataframe into a string (note: search_string isn't actually used below)
    search_string = search.to_string(header = False, index = False)
    
    #function for case-insensitive string compare
    def insensitive_compare(x1, y1):
        if x1.lower() == y1.lower():
            return True  
    
    #function to iterate through current_file for strings and compare to 
    #search_parameters to grab data column headers
    def filter_data(columns, rows): #I need to fix this to stop getting that A
        for name in columns:
            for number in rows:
                cell = df.at[number, name]
                if cell == '':
                    continue
                for place in length_search:
                    #main compare, if str and matches search params, then do...
                    if isinstance(cell, str) and insensitive_compare(search.at[place, 'Parameters'], cell) == True:
                        #this is to prevent repeats in the header list
                        if cell in header_list:
                            continue
                        else:
                            header_list.append(cell) #store data headers
                            row_list.append(number)  #store row number where it is in that data frame
                            column_list.append(name) #store the column label where it was found in that dataframe
                    else:
                        continue
    
    #searching only for files that end with .xlsx
    for file in files:
        if file.endswith('.xlsx'):
            file_list.append(in_path + '/' + file)
            
            
    #read the files into a dataframe; main loop where the files will be manipulated
    for current_file in file_list:
        df = pd.read_excel(current_file)
        
        header_list = []
        
        #get columns headers and a range for total rows
        columns = df.columns
        total_rows = df.index
        
        #adding to store where headers are stored in DF
        row_list = []
        column_list = []
        storage_list = []
        
        #add the file name to the header file so it can be separated by file
        #header_list.append(current_file)
        main_header_list.append(header_list)
        
        #running function to grab header names
        filter_data(columns, total_rows)
    

    So now when I run it and output the data I get:

    WI ASM #
    HOLDER #
    HOLDER DESCRIPTION
    A.63.140.1/8z
    A.63.140.1/8z
    A.63.140.1/8z
    A.63.140.1/8z
    A.63.140.1/8z
    CUTTER #
    Harvey 980215
    Harvey 980215
    Harvey 28178
    Harvey 28178
    Harvey 74362-C3
    OPERATION
    GROOVE
    ROUGHING
    SEMI-FINISH
    FINISH
    DEBURR & BLEND
    TOOL DESCRIPTION
    CREM_.125_.015R_1
    CREM_.125_.015R_2
    CREM_.0781_.015R_1
    CREM_.0781_.015R_2
    BEM_.0625
    Starting Radial Wear
    -
    -
    -
    -0.0002
    -
    TOOL STICK OUT
    0.55
    0.55
    0.55
    0.55
    0.55
    TOOL LIFE (MIN)
    120
    120
    120
    240
    300
    

    Which is cleaned up and in the order I was looking for.
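
    The script defines outfile_path, but the write step isn't shown above. If the goal is simply to dump the collected headers per file, one possible (purely illustrative) last step would be:

    #hypothetical write-out: one row of collected headers per scraped file
    #(pd is already imported above; the exact layout is an assumption)
    pd.DataFrame(main_header_list).to_excel(outfile_path, index=False, header=False)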