python · pandas · merge · append · concatenation

How to combine a large number of dataframes?


I have many .txt files in a folder. For example, each .txt file looks like the ones below.

import pandas as pd

FileA = pd.DataFrame({'Id': ["a", "b", "c"], 'Id2': ["a", "b", "z"], 'Amount': [10, 30, 50]})
FileB = pd.DataFrame({'Id': ["d", "e", "f", "z"], 'Id2': ["g", "h", "i", "j"], 'Amount': [10, 30, 50, 100]})
FileC = pd.DataFrame({'Id': ["r", "e"], 'Id2': ["o", "i"], 'Amount': [6, 33]})
FileD...

I want to extract the first row of each dataframe in the folder and then combine all of them. Here is what I did.

To make a list of the .txt files, I did the following.

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)
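
(Side note: glob.glob already returns a list, so the loop can be collapsed to a one-liner; the sorted call is an optional addition that makes the file order deterministic.)

txtfiles = sorted(glob.glob("*.txt"))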

To extract the first row of each file and combine them all, I did the following.

pd.read_table(txtfiles[0])[:1].append([pd.read_table(txtfiles[1])[:1], pd.read_table(txtfiles[2])[:1]], pd.read_table.......)

If the number of .txt files is small, I can do it this way, but when there are many .txt files I need an automated method. Does anyone know how to automate this? Thanks for your help!


Solution

  • Based on Sid's answer to this post:

    import glob
    import os
    import pandas as pd

    input_path = r"insert/your/path"  # use the path where you stored the txt files
    all_files = glob.glob(os.path.join(input_path, "*.txt"))
    df_from_each_file = (pd.read_csv(f, nrows=1) for f in all_files)
    concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
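
    Here nrows=1 reads only the header plus the first data row of each file, the generator expression feeds pd.concat without building a list of all the frames in memory first, and ignore_index=True renumbers the combined rows from 0.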
    

    Update: pd.read_csv was not ingesting the files properly. Replacing read_csv with read_table (which expects tab-separated values by default) should give the expected results:

    input_path = r"insert/your/path"  # use the path where you stored the txt files
    all_files = glob.glob(os.path.join(input_path, "*.txt"))
    df_from_each_file = (pd.read_table(f, nrows=1) for f in all_files)
    concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
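
    For reference, here is a minimal end-to-end sketch built from the sample frames in the question. It assumes the .txt files are tab-separated (the format read_table expects by default) and uses a hypothetical scratch folder named demo_txt_files:

    import glob
    import os
    import pandas as pd

    # Recreate the sample frames from the question and write them out
    # as tab-separated .txt files in a scratch folder.
    FileA = pd.DataFrame({'Id': ["a", "b", "c"], 'Id2': ["a", "b", "z"], 'Amount': [10, 30, 50]})
    FileB = pd.DataFrame({'Id': ["d", "e", "f", "z"], 'Id2': ["g", "h", "i", "j"], 'Amount': [10, 30, 50, 100]})
    FileC = pd.DataFrame({'Id': ["r", "e"], 'Id2': ["o", "i"], 'Amount': [6, 33]})

    input_path = "demo_txt_files"  # assumed folder name for this sketch
    os.makedirs(input_path, exist_ok=True)
    for name, df in [("FileA", FileA), ("FileB", FileB), ("FileC", FileC)]:
        df.to_csv(os.path.join(input_path, name + ".txt"), sep="\t", index=False)

    # Read only the first data row of each file and stack the rows.
    all_files = sorted(glob.glob(os.path.join(input_path, "*.txt")))
    df_from_each_file = (pd.read_table(f, nrows=1) for f in all_files)
    concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
    print(concatenated_df)
    #   Id Id2  Amount
    # 0  a   a      10
    # 1  d   g      10
    # 2  r   o       6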