
How can I use Pandas on data loaded by my own class?


I have a small problem working with Pandas. I created a file containing a class that reads and cleans data from a .csv file. I import this module to load the data, and then I want to use the resulting Pandas DataFrame for other operations. But for some reason, I can't.

Here is the class I created for loading/reading the file:

import pandas as pd

class Load_Data:
    def __init__(self, filename):
        self.__filename = filename

    def load(self): 
        df = pd.read_csv(self.__filename)
        del df["Remarks"]
        df = df.dropna()

        return df

In another file, I import this module for the data-processing step and then try to work on the result as a Pandas DataFrame.

from Load_Data import Load_Data
import pandas as pd

test_df = Load_Data("Final_file.csv")
test_df.load()

There is no problem printing the table of content from my file. But when I try to use it (test_df) as a Pandas DataFrame, for example to group by some of the attributes

test_df.groupby(['width','length'])

it ends up showing:

'Load_Data' object has no attribute 'groupby'

which seems to mean that if I want to use the groupby function, I have to write it myself in my own class. But I don't want to do that. I just want to convert my class to a Pandas DataFrame and use the package directly for some complex operations.

I would really appreciate any help.


Solution

  • Can you share the next line or two that throw the error? Are you referencing the data returned by load(), or the class instance itself?

    I.e.

    df2 = test_df.load()
    df2.groupby()
    

    Or

    test_df.groupby()
    

    Are you trying to create a new DataFrame class built on pandas? If so, you'd need something like this (might work):

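    import pandas as pd

    class LoadDF(pd.DataFrame):
        def __init__(self, filename):
            # Read and clean the CSV first.
            df = pd.read_csv(filename)
            del df["Remarks"]
            df = df.dropna()
            # Initialise this DataFrame from the cleaned data; assigning
            # "self = df" would only rebind a local name.
            super().__init__(df)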
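
If subclassing is more than you need, the first option above (keeping the value returned by load()) is probably enough. Below is a minimal sketch of that approach, reusing the Load_Data class from the question. The sample rows, the "price" column, and the temporary file are hypothetical stand-ins for Final_file.csv, which is not shown in the question.

```python
import os
import tempfile

import pandas as pd


class Load_Data:
    """Loader from the question: read a CSV, drop "Remarks", remove NaN rows."""

    def __init__(self, filename):
        self.__filename = filename

    def load(self):
        df = pd.read_csv(self.__filename)
        del df["Remarks"]
        return df.dropna()


# Hypothetical sample data standing in for Final_file.csv.
csv_text = (
    "width,length,price,Remarks\n"
    "1,2,10.0,ok\n"
    "1,2,20.0,ok\n"
    "3,4,5.0,ok\n"
    "3,4,,missing price\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Final_file.csv")
    with open(path, "w") as f:
        f.write(csv_text)

    loader = Load_Data(path)  # the loader object, not a DataFrame
    test_df = loader.load()   # the DataFrame returned by load()

# test_df is a plain pandas DataFrame, so groupby is available on it.
totals = test_df.groupby(["width", "length"])["price"].sum()
print(totals)
```

The point is that Load_Data("Final_file.csv") gives you the loader, while load() gives you the DataFrame. Call groupby on the latter.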