I have a small problem working with pandas. I created a file containing a class that reads and cleans data from a .csv file. I import this module to load the data, and then I want to use the resulting pandas DataFrame for other operations, but for some reason I can't.
Here is the class I created for loading/reading the file:
import pandas as pd

class Load_Data:
    def __init__(self, filename):
        self.__filename = filename

    def load(self):
        df = pd.read_csv(self.__filename)
        del df["Remarks"]
        df = df.dropna()
        return df
In another file, I import this module for the data processing step and then try to work with the result as a pandas DataFrame:
from Load_Data import Load_Data
import pandas as pd
test_df = Load_Data("Final_file.csv")
test_df.load()
Printing the contents of the table from my file works without problems. But when I try to use test_df as a pandas DataFrame, for example to group by some of the attributes:
test_df.groupby(['width','length'])
it ends up showing:
'Load_Data' object has no attribute 'groupby'
which means that if I want to use groupby, I would have to implement it myself in my own class, and I don't want to do that. I just want to convert my class to a pandas DataFrame and use pandas directly for more complex operations.
I would really appreciate any help.
Can you share the next line or two that throw the error? Are you referencing the returned data, or the class instance?
I.e.
df2= test_df.load()
df2.groupby()
Or
test_df.groupby()
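If it is the second form, the likely fix is to keep the DataFrame that load() returns instead of discarding it. Here is a minimal sketch of that, assuming the class from the question; the sample CSV contents and the temporary file path are made up for illustration and stand in for Final_file.csv.

```python
import os
import tempfile

import pandas as pd


class Load_Data:
    def __init__(self, filename):
        self.__filename = filename

    def load(self):
        df = pd.read_csv(self.__filename)
        del df["Remarks"]
        return df.dropna()


# Hypothetical sample data standing in for Final_file.csv.
csv_text = "width,length,Remarks\n1,2,a\n1,2,b\n3,4,c\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w") as f:
        f.write(csv_text)

    loader = Load_Data(path)  # the loader object, not a DataFrame
    test_df = loader.load()   # the DataFrame returned by load()

# groupby is available because test_df is a real pandas DataFrame.
counts = test_df.groupby(["width", "length"]).size()
print(counts)
```

The only change from the question's usage is that the return value of load() is assigned to a name, so the pandas methods are called on the DataFrame rather than on the Load_Data instance.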
Are you trying to create a new DataFrame class built on pandas? If so, you'd need something like this (might work):
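class LoadDF(pd.DataFrame):
    def __init__(self, filename):
        # Read and clean the file first, then initialise the DataFrame
        # base class with the cleaned data. Rebinding `self` inside a
        # method would only change a local name, not the object itself.
        df = pd.read_csv(filename)
        del df["Remarks"]
        df = df.dropna()
        super().__init__(df)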
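If you would rather not subclass, another option is to keep the cleaned DataFrame inside your own class and forward unknown attribute lookups to it. This is only a sketch under assumptions: the class name LoadedData, the `_df` attribute, and the in-memory sample data are hypothetical and not from the question.

```python
import io

import pandas as pd


class LoadedData:
    """Hypothetical wrapper that forwards unknown attributes to a DataFrame."""

    def __init__(self, source):
        df = pd.read_csv(source)
        del df["Remarks"]
        self._df = df.dropna()

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails.
        # Guard against infinite recursion if _df is not set yet.
        if name == "_df":
            raise AttributeError(name)
        return getattr(self._df, name)


# In-memory sample standing in for a real CSV file.
sample = io.StringIO("width,length,Remarks\n1,2,a\n1,2,b\n3,4,c\n")
data = LoadedData(sample)

# groupby is not defined on LoadedData, so it is looked up on the DataFrame.
sizes = data.groupby(["width", "length"]).size()
print(sizes)
```

Note that this forwards only ordinary attribute access. Special methods such as indexing with data["width"] or len(data) are looked up on the class, not through __getattr__, so you would have to define those yourself or use the wrapped DataFrame directly.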