Search code examples
pythonpython-3.xpandasglob

Search column names in many dfs


Hi how could I search many CSVs and get a list of filenames that have Apple in the column? I need the search for Apple to have an option (1) to search case insensitive or case sensitively, and (2) search for whole words, so "appletree" can be rejected if needed.

Some CSV have column names like: "Apple Price", "Apple Discount", "apple", "appletree"

I tried

filenames = "C:\Documents\*.csv"
for file in glob.glob(filenames):
   df = pd.read_csv(file)
   if "Apple" in df.columns:
           print(file)


Solution

  • You could loop through each column name an see if Apple is part of that column, like this:

    filenames = "C:\Documents\*.csv"
    for file in glob.glob(filenames):
       df = pd.read_csv(file)
       if any("Apple".lower() in col.lower().split() for col in df.columns):
               print(file)