Need to exclude or remove columns from a pandas DataFrame


I have a list that looks like this:

    categorical_features = \
    ['FireplaceQu', 'BsmtQual', 'BsmtCond', 'GarageQual', 'GarageCond', 
     'ExterQual', 'ExterCond','HeatingQC', 'PoolQC', 'KitchenQual', 'BsmtFinType1', 
     'BsmtFinType2', 'Functional', 'Fence', 'BsmtExposure', 'GarageFinish', 'LandSlope',
     'LotShape', 'PavedDrive', 'Street', 'Alley', 'CentralAir', 'MSSubClass', 'OverallQual',
     'OverallCond', 'YrSold', 'MoSold']

I need to remove these columns from the dataset. I tried this:

    all_data = all_data.loc[:, categorical_features]

Unfortunately, this step only selects these columns. How would I reverse the process by excluding them instead?


Solution

  • You can use DataFrame.drop to exclude those columns; axis = 1 tells pandas that the labels are column names rather than row labels:

    all_data = all_data.drop(categorical_features, axis = 1)
    

    The following small example demonstrates it:

    import pandas as pd
    import numpy as np
    
    dates = pd.date_range('20130101', periods=6)
    
    df = pd.DataFrame(np.random.randn(6, 4), index = dates, columns = list('ABCD'))
    
    print(df)
    
    features = ['B', 'D']
    df = df.drop(features, axis = 1)
    
    print(df)
    

    The output (the values are random, so yours will differ):

                       A         B         C         D
    2013-01-01  1.365473 -0.445448  0.244377  0.416889
    2013-01-02 -0.307532  0.095569  1.356229 -0.306618
    2013-01-03  0.971216  1.100189  0.932189  0.808151
    2013-01-04 -0.030160 -0.796742 -0.383336 -0.409233
    2013-01-05  0.006601  0.093678 -1.013768  1.439921
    2013-01-06  0.560771 -0.452491  1.050500 -1.545958
                       A         C
    2013-01-01  1.365473  0.244377
    2013-01-02 -0.307532  1.356229
    2013-01-03  0.971216  0.932189
    2013-01-04 -0.030160 -0.383336
    2013-01-05  0.006601 -1.013768
    2013-01-06  0.560771  1.050500
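
Since the question asks how to reverse the .loc selection itself, here is a minimal sketch of that route alongside two variations of drop. The frame, the `features` list, and the extra column name 'Z' are made up for illustration; they are not from the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for all_data.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))
features = ["B", "D"]

# Inverse of df.loc[:, features]: a boolean mask that keeps every
# column whose name is NOT in `features`.
kept_by_mask = df.loc[:, ~df.columns.isin(features)]

# Same result with drop, using the `columns=` keyword instead of axis=1.
kept_by_drop = df.drop(columns=features)

# errors="ignore" skips labels that are absent ('Z' here) instead of
# raising a KeyError.
kept_lenient = df.drop(columns=features + ["Z"], errors="ignore")

print(list(kept_by_mask.columns))  # ['A', 'C']
```

The mask form is handy when the exclusion list may contain names that are not in the frame, because isin never raises on missing labels. Plain drop raises a KeyError for them unless errors="ignore" is passed.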