python pandas multiple-columns xlsx

How to use pandas to select certain columns in a CSV file


I only just started my coding journey and have watched a bunch of tutorials on YouTube. Now I am trying to 'import' a dataset from SPSS into Python using Jupyter.

So far I've managed to convert the .sav file into a .csv file and read it using the code below. I want to select certain columns in my data and store them in a new CSV file so I can do some analysis on them and try to build a script that predicts certain things and characteristics. The problem is that I have hundreds of columns and only want 3 or 4 to start with.

I tried using the data.drop() function, but soon realized there must be a better way to do this. What is it?

I apologize in advance if this isn't explained very well; this is my very first post here.

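import pandas as pd

# Load the CSV converted from SPSS (every column is read)
df = pd.read_csv('csvfile.csv')
df  # in Jupyter, a bare name on the last line displays the DataFrame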

Solution

  • Use the usecols parameter of read_csv so that only the columns you need are loaded:

    import pandas as pd
    # usecols restricts parsing to the listed columns
    df = pd.read_csv('csvfile.csv', usecols=['col1', 'col2'])
    df
    

    Replace 'col1' and 'col2' with your actual column names. Then, to write the selected columns to another CSV file, do this:

    # index=False stops pandas from writing the row index as an extra column
    df.to_csv('csv_file_1.csv', index=False)
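A fuller, self-contained sketch of the same workflow follows. Since the real SPSS export isn't available, it first writes a small hypothetical CSV to a temporary directory; the column names id, age, income, region and unused are made up purely for illustration. It then shows two ways to keep only some columns: usecols at read time, and plain list selection (df[cols]) on a DataFrame that is already loaded, which is usually simpler than dropping everything else with drop().

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the converted SPSS file: a small CSV with
# more columns than we need (the real file would have hundreds).
tmp_dir = tempfile.mkdtemp()
source_path = os.path.join(tmp_dir, "csvfile.csv")
pd.DataFrame(
    {
        "id": [1, 2, 3],
        "age": [34, 28, 45],
        "income": [52000, 61000, 47000],
        "region": ["north", "south", "east"],
        "unused": [0, 0, 0],
    }
).to_csv(source_path, index=False)

# Hypothetical names of the few columns we actually want
wanted = ["id", "age", "income"]

# Option 1: only parse the wanted columns while reading the file
subset = pd.read_csv(source_path, usecols=wanted)

# Option 2: the whole file is already loaded, so select the wanted
# columns by list instead of dropping all the others
full = pd.read_csv(source_path)
print(full.columns.tolist())  # inspect the available column names first
subset_from_full = full[wanted]

# Write the subset to a new CSV without the row index column
out_path = os.path.join(tmp_dir, "csv_file_1.csv")
subset.to_csv(out_path, index=False)

# Read it back to check what was written
result = pd.read_csv(out_path)
print(result.columns.tolist())
```

Note that usecols keeps the columns in the order they appear in the file, not the order of the list you pass; if the order matters, index the result again with subset[wanted].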