python, directory, subdirectory

How to import folders in Python?


I have 3 folders of Excel data and was asked to create a machine learning model using that data. The problem is that the data does not have headers.

How can I import all those folders of data in Python?


Solution

  • Python won't tell you the names of the columns. What it can do is help you import and/or concatenate all of the Excel files easily.

    To import them in bulk:

    import os
    import pandas as pd

    # Set the source directory (placeholder path)
    source_directory = "xx"

    # List the files in that folder
    print(os.listdir(source_directory))

    # Read every file into a dict keyed by "df_" + file name without extension
    dataframes = {}
    for file in os.listdir(source_directory):
        file_name = os.path.splitext(file)[0]
        # header=None because the files have no header row
        dataframes["df_" + file_name] = pd.read_excel(os.path.join(source_directory, file), header=None)
    

    You could also wrap this in another loop to read the data from every directory you need.
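
A minimal sketch of that outer loop, assuming the folders only need a flat scan (no nested subfolders). The helper name `find_excel_files` and the suffix set are illustrative choices, not something from the question:

```python
from pathlib import Path

# Assumption: only these extensions count as Excel files.
EXCEL_SUFFIXES = {".xlsx", ".xls"}


def find_excel_files(folders):
    """Yield (folder name, file path) for every Excel file directly inside each folder."""
    for folder in folders:
        folder = Path(folder)
        # sorted() makes the order reproducible; iterdir() order is arbitrary.
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.suffix.lower() in EXCEL_SUFFIXES:
                yield folder.name, path
```

Each yielded path can then be passed to `pd.read_excel`, and the folder name is handy as part of the dictionary key or as an extra column if the three folders mean different things.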

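    If you need to concatenate all of the Excel files, and supposing they have the same structure, you can collect them in a list and use pd.concat. (DataFrame.append returned a new frame rather than modifying df in place, and it was removed in pandas 2.0.) It would be something like this:

    frames = []
    for file in os.listdir(source_directory):
        # header=None because the files have no header row
        frames.append(pd.read_excel(os.path.join(source_directory, file), header=None))
    df = pd.concat(frames, ignore_index=True)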
    

    Regarding how to add a header row on the files, here is an answer
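
For the header row, one option is to assign the names after loading. The sketch below uses an in-memory frame as a stand-in for a headerless file; the column names are hypothetical placeholders, since only whoever produced the data knows what each column means:

```python
import pandas as pd

# Hypothetical column names; replace with the real meaning of each column.
COLUMN_NAMES = ["feature_a", "feature_b", "target"]


def name_columns(df, column_names):
    """Return a copy of a headerless frame with the given column names."""
    if len(column_names) != df.shape[1]:
        raise ValueError(f"expected {df.shape[1]} names, got {len(column_names)}")
    named = df.copy()
    named.columns = list(column_names)
    return named


# Stand-in for a file read with header=None, which labels columns 0, 1, 2, ...
raw = pd.DataFrame([[1.0, 2.0, 0], [3.0, 4.0, 1]])
labelled = name_columns(raw, COLUMN_NAMES)
print(labelled.columns.tolist())
```

`pd.read_excel` also accepts `header=None` together with `names=[...]`, which sets the names at read time instead of afterwards.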