I am looking to delete files from multiple folders using a .csv file. The CSV file contains a list of file names that need to be deleted (example: Box4, 60012-01). The data is stored in multiple folders, and the file names carry an additional suffix and extension (example: /tiles_20X_299/Box20/660491-3_mag20_xpos5980_ypos6279.jpg). Is there a way to delete these files? Help would be really appreciated. This is what I have so far, but I'm not sure if I'm going in the right direction. [sample of the csv file to delete][1]

```python
fin = open('files_to_delete.csv', 'r')
fin.readline()
print(fin)
file_to_delete = set()
while True:
    line = fin.readline().strip()
    #print(line)
    if not line:
        break
    array = line.split(',')
    file_to_delete.add("Box" + array[0] + "/" + array[1])
fin.close()
print(file_to_delete)
#
for path in glob.glob('/home/sshah/Tiles/tiles_20X_299/*'):
    for f in file_to_delete:
        print(f)
        os.chdir(path)
        #print(path)
        if os.path.exists(f):
            print('delete')
            #os.remove(f)
```

[1]: https://i.sstatic.net/dFCxk.png
You're definitely going in the right direction.
Assuming you're running at least version 3.5 of Python, you can use `glob.iglob()` with `recursive=True` to recursively iterate over every file in every subdirectory.
I've tweaked your code to make it a bit more pythonic.
Some specific changes:

- Renamed the `file_to_delete` set to `files_to_delete` because it contains multiple files and should be plural.
- Used a `with` statement with the file object's context manager to avoid worrying about exceptions and explicitly calling `.close()`.
- Looped over `fin` to get each line without explicitly calling `.readline()`.
- Used `os.path.sep` instead of hardcoding `/`.
- Removed both unnecessary `os.chdir(path)` and `os.path.exists(f)` calls.
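
As a variation on the reading step, the standard `csv` module can do the splitting, which also copes with a space after the comma. This is only a sketch: the helper name `read_targets`, the header names in the sample, and the two-column layout (box number, then file-name prefix) are assumptions based on the question's code, not something the CSV screenshot confirms.

```python
import csv
import io
import os


def read_targets(lines):
    """Build 'Box<N><sep><prefix>' strings from CSV rows, skipping the header."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # consume the header row, if there is one
    targets = set()
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete rows
        targets.add('Box' + row[0].strip() + os.path.sep + row[1].strip())
    return targets


# In-memory sample standing in for files_to_delete.csv (made-up contents)
sample = io.StringIO("box,file\n4, 60012-01\n20,660491-3\n\n")
print(sorted(read_targets(sample)))
```

With a real file, the same function can be given the object returned by `open('files_to_delete.csv', newline='')`, which is how the `csv` documentation recommends opening CSV files.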
It works by iterating over every file in every subdirectory (which gives us the full filepath as a `str`), then iterating over the `files_to_delete` set to check whether any `file_to_delete` is a substring of the `filepath`. If one is, we delete the file and `break` out of that inner loop to continue with the next filepath.
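
The substring check described above can be tried on plain strings before any file is touched. A minimal sketch, where `first_match` is a hypothetical helper, the first path is the example from the question, and the second path is made up:

```python
def first_match(filepath, targets):
    """Return the first target that is a substring of filepath, else None."""
    for target in targets:
        if target in filepath:
            return target  # plays the role of the `break` in the nested loop
    return None


targets = {'Box20/660491-3', 'Box4/60012-01'}
path = '/tiles_20X_299/Box20/660491-3_mag20_xpos5980_ypos6279.jpg'

print(first_match(path, targets))                                  # Box20/660491-3
print(first_match('/tiles_20X_299/Box21/123_mag20.jpg', targets))  # None
```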
If you know that no other filenames share the same base, you can uncomment this line: `files_to_delete.remove(file_to_delete)`. For example, that is safe if you have a file called:

`/tiles_20X_299/Box20/660491-3_mag20_xpos5980_ypos6279.jpg`

but not another one called:

`/tiles_20X_299/Box20/660491-3_mag10_xpos2000_ypos4000.jpg`

If both existed, the second file would be left behind, because its entry would already have been removed from the set. To be safe, leave it commented out.

```python
import glob, os

files_to_delete = set()
with open('files_to_delete.csv', 'r') as fin:
    fin.readline()  # Consume header
    for line in fin:
        line = line.strip()
        if line:
            files_to_delete.add('Box' + line.replace(',', os.path.sep))  # Assume none of the files contain a comma

print(files_to_delete)

for filepath in glob.iglob(r'/home/sshah/Tiles/tiles_20X_299/**/*', recursive=True):
    for file_to_delete in files_to_delete:
        if file_to_delete in filepath:
            print('Delete:', filepath)
            #os.remove(filepath)
            #files_to_delete.remove(file_to_delete)
            break
```
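
Because the check is a plain substring test, a prefix such as `660491-3` would also match a hypothetical file named `660491-30_mag20.jpg`. If that matters for your data, one stricter variation is to compare the folder name exactly and to require the file name's first `_`-separated piece to equal the prefix. This is a sketch under assumptions: the `_` separator is inferred from the single example path in the question, and `find_matches`, `delete_matches`, and the `dry_run` flag are names made up for illustration.

```python
from pathlib import Path


def find_matches(root, targets):
    """Yield files under root whose (folder name, name prefix) is in targets.

    targets is a set of tuples such as ('Box20', '660491-3').
    """
    for path in Path(root).rglob('*'):
        if not path.is_file():
            continue
        # Assumption: the prefix ends at the first underscore in the file name
        prefix = path.name.split('_', 1)[0]
        if (path.parent.name, prefix) in targets:
            yield path


def delete_matches(root, targets, dry_run=True):
    """Print every match; delete it only when dry_run is False."""
    matches = list(find_matches(root, targets))  # collect before deleting
    for path in matches:
        print('Delete:', path)
        if not dry_run:
            path.unlink()
    return matches
```

Calling `delete_matches('/home/sshah/Tiles/tiles_20X_299', targets)` with the default `dry_run=True` only prints what would be removed, which mirrors keeping `os.remove` commented out until the output looks right.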