I am using this code to pull in two CSVs of a similar naming convention, place their filenames in a "File" column and concatenate the dataframes into one dataframe called NatHrs.
import glob
from pathlib import Path
path = r'C:\Users\ThisUser\Desktop\AC Mbr Analysis'
all_files = glob.glob(path + '\\Natl_hours_YTD_OC_*.csv')
Nat_dfs = []
for file in all_files:
df = pd.read_csv(file, index_col=None, encoding='windows-1252', header=1 )
df['File'] = file
Nat_dfs.append(df)
NatHrs = pd.concat(Nat_dfs)
Now, I want to take the "File" column, which returns a filename object with entries looking like "C:\Users\ThisUser\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2018-2019", extract ONLY the end of the filename--in this case "2018-2019"--and place those characters into a new colum "Program Year", reflecting the entry "2018-2019". I am not having success with manipulating strings or series--should I be using path.replace? I am lost. When I describe the column I'm looking to parse...
NatHrs['File'].describe
...I get this:
Name: File, dtype: object>
This ultimately ended up working. Thank you much for helping me along!
globbed_files = glob.glob("Natl_hours_YTD_OC_*.csv")
globbed_files
data = []
for csv in globbed_files:
frame = pd.read_csv(csv, encoding='windows-1252', header=1)
frame['filename'] = os.path.basename(csv)
file = os.path.basename(csv)
#create a new column to store the portion of the file name that denotes the Program Year to which the data belongs
frame['Program Year'] = re.search('\d+[-]*\d*',file).group()
data.append(frame)
NatHrs = pd.concat(data, ignore_index=True) #dont want pandas to try an align row indexes
NatHrs.copy().head()