Tags: python, pandas, dataframe, target

How to Create Target (y) and X Variables from a CSV File


I am reading a CSV file and, for modeling purposes, I need to create a target (y) variable and X variables. I'm not sure how to set that up. I am new to coding and need some guidance that I can't seem to get from the pandas docs. I would like the target to be 'Bad Indicator' and X to be all the other columns.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import pandas as pd
project = pd.read_csv('c:/users/Brandon Thomas/Project.csv')
project=pd.DataFrame(project)
df = pd.DataFrame(project.data, columns = project.feature_names)
df["Bad Indicator"] = x.target
X = df.drop("Bad Indicator",axis=1)   #Feature Matrix
y = df["Bad Indicator"]          #Target Variable
df.head()

AttributeError                            Traceback (most recent call last)
      1 # Build dataframe
----> 2 df = pd.DataFrame(project.data, columns = project.feature_names)
      3 df["Bad Indicator"] = x.target
      4 X = df.drop("Bad Indicator",axis=1) #Feature Matrix
      5 y = df["Bad Indicator"] #Target Variable

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068 
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'data'

Solution

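  • In your code above you create a DataFrame three separate times: once with pd.read_csv, once with project = pd.DataFrame(project), and once more with df = pd.DataFrame(...). pd.read_csv already returns a DataFrame, so the other two calls are unnecessary. The last call is also what raises the error: .data and .feature_names (like .target) are attributes of the Bunch objects returned by scikit-learn's built-in dataset loaders such as load_iris(), not of a pandas DataFrame. Your code also uses x.target, but x is never defined.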

    I have removed the imports you don't currently need (numpy, scipy, and matplotlib); you can add them back if you need them later. To set up X and y, all you need is:

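    import pandas as pd
    
    # If your csv has a header row, read_csv uses it to name the columns automatically
    df = pd.read_csv('c:/users/Brandon Thomas/Project.csv')
    
    # If your csv does NOT have a header row, read it like this instead. header=None stops
    # pandas from using the first data row as column names; add one name per column:
    # df = pd.read_csv('c:/users/Brandon Thomas/Project.csv', header=None,
    #                  names=['Bad Indicator', 'ColumnName1', 'ColumnName2'])
    
    X = df.drop("Bad Indicator", axis=1)   # Feature matrix: every column except the target
    y = df["Bad Indicator"]                # Target variable
    
    df.head()
    

    If your csv has headers, the first read_csv call is all you need; if it doesn't, use the commented-out version in its place.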
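If you want to check the header/no-header behaviour and the X/y split without the real file, here is a small self-contained sketch. It assumes a tiny made-up dataset read through io.StringIO in place of Project.csv; the columns Income and Age and the helper split_features_target are hypothetical, chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for Project.csv. Only "Bad Indicator"
# comes from the question; "Income" and "Age" are made-up column names.
CSV_WITH_HEADER = """Bad Indicator,Income,Age
0,52000,34
1,18000,22
0,61000,45
"""

CSV_WITHOUT_HEADER = """0,52000,34
1,18000,22
0,61000,45
"""


def split_features_target(df, target="Bad Indicator"):
    """Hypothetical helper: return (X, y), i.e. all columns except `target`, and `target`."""
    X = df.drop(columns=target)  # equivalent to df.drop(target, axis=1)
    y = df[target]
    return X, y


# Case 1: the file has a header row, so read_csv names the columns itself.
df_header = pd.read_csv(io.StringIO(CSV_WITH_HEADER))

# Case 2: no header row. header=None keeps the first line as data,
# and names= supplies the column names instead.
df_no_header = pd.read_csv(
    io.StringIO(CSV_WITHOUT_HEADER),
    header=None,
    names=["Bad Indicator", "Income", "Age"],
)

X, y = split_features_target(df_header)
X2, y2 = split_features_target(df_no_header)

print(X.columns.tolist())  # ['Income', 'Age']
print(y.tolist())          # [0, 1, 0]

# Reading a header-less file WITHOUT header=None loses the first row,
# because pandas uses it as the column names.
print(len(pd.read_csv(io.StringIO(CSV_WITHOUT_HEADER))))  # 2

# read_csv already returns a DataFrame, and a DataFrame has no .data attribute.
print(hasattr(df_header, "data"))  # False
```

The length check is the reason header=None matters: without it, one row of a header-less file silently becomes the column names instead of data.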