I have a folder containing images (.jpg), and I need to extract the file names to CSV, split them using '_'
into multiple columns (with headers), and strip out multiple characters.
I have partially completed this using the following:
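import os, csv

# Write every file name found under 'dirpath' to a one-column CSV
with open('filepath.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for path, dirs, files in os.walk('dirpath'):
        for item in files:
            writer.writerow([item])

# Re-read that CSV (read mode, not 'w') and turn each '_' into a column separator
with open('filepath.csv', 'r') as inf:
    with open('outfile.csv', 'w') as outf:
        for line in inf:
            outf.write(','.join(line.split('_')))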
Example file name:
firstname_lastname_uniqueid_date_latUKN_longUKN_club.jpg
The result of my code above returns firstname, lastname, uniqueid, date, latUKN, longUKN, and club.jpg.
This is the schema I'm looking for, but I'd also like to strip the 'lat' and 'long' prefixes from latUKN and longUKN, as well as remove the .jpg at the end of the string. I need to remove the strings 'lat' and 'long' because some file names contain an actual latitude/longitude, and the 'lat' and 'long' prefixes are carried along in the parsing (e.g. lat12.34, long54.67).
How can I remove/strip out these other characters and add headers? If there is no latitude or longitude, how can I leave this part empty instead of populating the strings 'latUKN' and 'longUKN'? Is it possible to run this over a whole directory and output a single CSV?
Sample Data
John_Doe_2259153_20171102_latUKN_longUKN_club1.jpg
John_Doe_2259153_20171031_lat123.00_long456.00_club1.jpg
Jane_Doe_5964264_20171101_latUKN_longUKN_club2.jpg
Jane_Doe_5964264_20171029_lat789.00_long012.00_club2.jpg
Joe_Smith_1234564_20171001_lat345.00_long678.00_club3.jpg
How data looks with current code:
John|Doe|2259153|20171102|latUKN|longUKN|club1.jpg
John|Doe|2259153|20171031|lat123.00|long456.00|club1.jpg
Jane|Doe|5964264|20171101|latUKN|longUKN|club2.jpg
Jane|Doe|5964264|20171029|lat789.00|long012.00|club2.jpg
Joe|Smith|1234564|20171001|lat345.00|long678.00|club3.jpg
How I want the data to look:
John|Doe|2259153|20171102|UKN|UKN|club1
John|Doe|2259153|20171031|123.00|456.00|club1
Jane|Doe|5964264|20171101|UKN|UKN|club2
Jane|Doe|5964264|20171029|789.00|012.00|club2
Joe|Smith|1234564|20171001|345.00|678.00|club3
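For reference, the transformation from a sample file name to the desired row can be sketched per file name rather than with a global find/replace. This is only a minimal sketch: the function name parse_filename, the HEADERS list, and the blank_unknown flag are my own illustrative names, and it assumes every file name has exactly the seven '_'-separated fields shown in the sample data.

```python
import os

# Hypothetical column names, taken from the example file name in the question
HEADERS = ["firstname", "lastname", "uniqueid", "date", "lat", "long", "club"]

def parse_filename(filename, blank_unknown=False):
    """Split one image file name into a dict keyed by HEADERS."""
    # Drop the extension (.jpg) before splitting on '_'
    stem, _ext = os.path.splitext(filename)
    parts = stem.split("_")
    if len(parts) != len(HEADERS):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    row = dict(zip(HEADERS, parts))
    for key in ("lat", "long"):
        value = row[key]
        # Strip the literal 'lat' / 'long' prefix from its own column only
        if value.startswith(key):
            value = value[len(key):]
        # Optionally leave the cell empty when no coordinate was recorded
        if blank_unknown and value == "UKN":
            value = ""
        row[key] = value
    return row
```

Calling parse_filename("John_Doe_2259153_20171102_latUKN_longUKN_club1.jpg", blank_unknown=True) leaves the lat and long values empty instead of 'UKN'; with the default blank_unknown=False the 'UKN' marker is kept, as in the desired output above.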
Since both answers revolved around using find/replace and did not fully resolve the problem, I used the following to complete the task:
import csv

findlist = ['lat', 'long', '.jpg']
replacelist = ["", "", ""]

# 'path' is a placeholder: read the split CSV first, then write the cleaned text
with open('path', 'r') as infile:
    s = infile.read()

# Remove each unwanted substring everywhere it occurs in the text
for item, replacement in zip(findlist, replacelist):
    s = s.replace(item, replacement)

with open('path', 'w') as outfile:
    outfile.write(s)
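One caveat with a blanket replace like the one above: it removes 'lat' anywhere in the text, so a surname such as "Platt" would also be changed. A possible alternative, sketched below under my own assumptions, walks the directory once, cleans only the lat and long columns, and writes a single CSV with a header row. The names images_to_csv, clean_row, and HEADERS are hypothetical, and the sketch assumes the seven-field naming scheme from the sample data.

```python
import csv
import os

# Hypothetical column names, taken from the example file name in the question
HEADERS = ["firstname", "lastname", "uniqueid", "date", "lat", "long", "club"]

def clean_row(filename):
    """Turn one file name into a list of cleaned column values."""
    stem = os.path.splitext(filename)[0]  # drops the .jpg extension
    cleaned = []
    # zip pairs each field with its column name; fields beyond the
    # seventh would be dropped silently, so this assumes seven fields
    for field, header in zip(stem.split("_"), HEADERS):
        if header in ("lat", "long") and field.startswith(header):
            field = field[len(header):]  # strip the prefix in its own column only
        cleaned.append(field)
    return cleaned

def images_to_csv(dirpath, outpath):
    """Write one CSV row per .jpg found anywhere under dirpath."""
    with open(outpath, "w", newline="") as outf:
        writer = csv.writer(outf)
        writer.writerow(HEADERS)  # header row
        for _path, _dirs, files in os.walk(dirpath):
            for name in sorted(files):
                if name.lower().endswith(".jpg"):
                    writer.writerow(clean_row(name))
```

Something like images_to_csv('dirpath', 'outfile.csv') would then replace both steps of the original script. To get the '|' delimiter shown in the sample output, csv.writer accepts delimiter='|'.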