python, google-colaboratory, shutil

for loop with shutil.copytree wrongly copying ALL files for EVERY single folder


In this for loop, it is copying ALL the files for every sub-folder, so it takes extremely long and is not what I want. For example, for the folder “SPRING2158” it copies everything from “SPRING00001” to “SPRING09999”, but it should copy only the images whose names start with “SPRING2158”.

import os
import shutil 

path = os.path.expanduser('/content/images/FemaleImages')   
for file_name in os.listdir(path):
  if file_name[:10] in nameTrainF:
    shutil.copytree(path+'/', '/content/OutTrainF/'+file_name[:10]+'/')   
  if file_name[:10] in nameValF:
    shutil.copytree(path+'/', '/content/OutValF/'+file_name[:10]+'/')   
  if file_name[:10] in nameTestF:
    shutil.copytree(path+'/', '/content/OutTestF/'+file_name[:10]+'/')   

To give you an idea of the data structure, this is what the output looks like when the data is extracted from the RAR archive:

 !unrar e "/content/drive/My Drive/femaleset.rar" "/content/images/FemaleImages/" 

Extracting  /content/images/FemaleImages/SPRING4796-D7-V40-H50.png    OK 
Extracting  /content/images/FemaleImages/SPRING4796-D7-V40-H60.png    OK 
Extracting  /content/images/FemaleImages/SPRING4796-D7-V40-H70.png    OK 
Extracting  /content/images/FemaleImages/SPRING4796-D7-V40-H80.png    OK 
Extracting  /content/images/FemaleImages/SPRING4796-D7-V40-H90.png    OK 
Extracting  /content/images/FemaleImages/SPRING4798-D7-V0-H0.png      OK 
Extracting  /content/images/FemaleImages/SPRING4798-D7-V0-H10.png     OK 
Extracting  /content/images/FemaleImages/SPRING4798-D7-V0-H100.png    OK 
Extracting  /content/images/FemaleImages/SPRING4798-D7-V0-H110.png    OK 
Extracting  /content/images/FemaleImages/SPRING4798-D7-V0-H120.png    OK 
Extracting  /content/images/FemaleImages/SPRING4798-D7-V0-H130.png    OK 
Extracting  /content/images/FemaleImages/SPRING4798-D7-V0-H140.png    OK 
Extracting  /content/images/FemaleImages/SPRING4798-D7-V0-H150.png    OK 
Extracting  /content/images/FemaleImages/SPRING4798-D7-V0-H160.png    OK 

screenshot of output folder in Colab


I have added the updated code below in response to a follow-up question from the comments:

path = os.path.expanduser('/content/images/FemaleImages')  
for file_name in os.listdir(path):
  if file_name[:10] in nameTrainF:
    shutil.copy2(os.path.join(path, file_name), os.path.join('/content/OutTrainF/', file_name[:10],'/'))   
  if file_name[:10] in nameValF:
    shutil.copy2(os.path.join(path, file_name), os.path.join('/content/OutValF/', file_name[:10],'/')) 
  if file_name[:10] in nameTestF:
    shutil.copy2(os.path.join(path, file_name), os.path.join('/content/OutTestF/', file_name[:10],'/'))

Solution

  • You're not copying just the current file of the iteration; you're copying everything in path each time, because shutil.copytree() copies an entire directory tree. Use shutil.copy2() to copy a single file instead.

    Also, you should use os.path.join() to combine directories and filenames, rather than concatenation.

    for file_name in os.listdir(path):
      prefix = file_name[:10]  # first 10 characters, e.g. 'SPRING4796'
      if prefix in nameTrainF:
        # the trailing '' makes the joined path end with a separator
        target = os.path.join('/content/OutTrainF/', prefix, '')
        os.makedirs(target, exist_ok=True)  # copy2 needs the folder to exist
        shutil.copy2(os.path.join(path, file_name), target)
      if prefix in nameValF:
        target = os.path.join('/content/OutValF/', prefix, '')
        os.makedirs(target, exist_ok=True)
        shutil.copy2(os.path.join(path, file_name), target)
      if prefix in nameTestF:
        target = os.path.join('/content/OutTestF/', prefix, '')
        os.makedirs(target, exist_ok=True)
        shutil.copy2(os.path.join(path, file_name), target)
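
    The three branches above differ only in the output folder and the name list, so they could be folded into a single loop. The following is only a sketch of that idea: the helper copy_by_prefix, its parameters, the splits mapping, and the demo function are hypothetical names that are not part of the question or the answer, and the demo uses a temporary directory with two empty files named after the extraction log rather than the real /content paths.

```python
import os
import shutil
import tempfile


def copy_by_prefix(src_dir, splits, prefix_len=10):
    """Copy each file in src_dir into <out_root>/<prefix>/ for every entry of
    splits (a hypothetical {out_root: set_of_prefixes} mapping) whose set
    contains the file's prefix. Returns the number of copies made."""
    copied = 0
    for file_name in os.listdir(src_dir):
        src = os.path.join(src_dir, file_name)
        if not os.path.isfile(src):
            continue  # skip sub-directories; copy2 only handles files
        prefix = file_name[:prefix_len]
        for out_root, names in splits.items():
            if prefix in names:
                target = os.path.join(out_root, prefix)
                os.makedirs(target, exist_ok=True)
                shutil.copy2(src, target)  # target is an existing directory
                copied += 1
    return copied


def demo():
    """Run the helper on two empty placeholder files in a temp directory."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "FemaleImages")
        os.makedirs(src)
        for name in ("SPRING4796-D7-V40-H50.png", "SPRING4798-D7-V0-H0.png"):
            open(os.path.join(src, name), "w").close()
        splits = {
            os.path.join(root, "OutTrainF"): {"SPRING4796"},
            os.path.join(root, "OutValF"): {"SPRING4798"},
        }
        n = copy_by_prefix(src, splits)
        train_files = os.listdir(os.path.join(root, "OutTrainF", "SPRING4796"))
        return n, train_files
```

    With the real data, the call would presumably look like copy_by_prefix(path, {'/content/OutTrainF': nameTrainF, '/content/OutValF': nameValF, '/content/OutTestF': nameTestF}). Note that the final component passed to os.path.join() must not be '/' as in the updated code in the question: on POSIX systems an absolute component discards everything before it, so the joined result is just '/'.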