I have more than 100 Parquet files in a folder. I am not sure whether all the files have the same feature names (column names). I want to write some Python code, using pandas, that reads every file in the directory and returns the column names with the file name as a prefix.
I tried a for loop, but I am not sure how to structure it. Being a beginner, I could not write the looped script.
import glob
import pandas as pd

path = r'C:\Users\NewFOlder1\NewFOlder\Folder'
all_files = glob.glob(path + r'\*.gzip')
col = []
for paths in all_files:
    df = pd.read_parquet(paths)
    col.append(df.columns)
print(col)
IIUC, use pandas.concat with pandas.DataFrame.columns:
import glob
import pandas as pd

path = r'C:\Users\NewFOlder1\NewFOlder\Folder'
all_files = glob.glob(path + r'\*.gzip')

# Read every file into its own DataFrame
list_dfs = []
for paths in all_files:
    df = pd.read_parquet(paths)
    list_dfs.append(df)

# Union of the column names across all files
col_names = pd.concat(list_dfs).columns.tolist()
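
Note that concatenating gives a single combined list of column names, so it does not show which file each column came from. If the per-file names with the file name as a prefix are what you need, something along these lines might work. This is only a sketch: the helper names (columns_by_file, prefixed_columns, same_columns) and the underscore separator are my own assumptions, and it loads each file fully just to read its columns.

```python
import glob
import os

import pandas as pd


def columns_by_file(pattern):
    """Map each matching file path to its list of column names."""
    result = {}
    for file_path in sorted(glob.glob(pattern)):
        # Assumption: every match is a Parquet file pandas can read
        result[file_path] = list(pd.read_parquet(file_path).columns)
    return result


def prefixed_columns(file_path, columns):
    """Prefix each column name with the file name (extension removed)."""
    stem = os.path.splitext(os.path.basename(file_path))[0]
    return [f"{stem}_{name}" for name in columns]


def same_columns(columns_per_file):
    """Return True if every file has the same set of column names."""
    distinct = {frozenset(cols) for cols in columns_per_file.values()}
    return len(distinct) <= 1


if __name__ == "__main__":
    path = r'C:\Users\NewFOlder1\NewFOlder\Folder'
    per_file = columns_by_file(os.path.join(path, '*.gzip'))
    for file_path, cols in per_file.items():
        print(prefixed_columns(file_path, cols))
    print('All files share the same columns:', same_columns(per_file))
```

same_columns compares sets, so it ignores column order; compare the lists directly if order matters to you.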