How do I split a folder of video files into train and test folders, based on dataframe columns that tell me which videos belong in the train folder and which in the test folder (in Python 3)? The videos are located in separate category folders.
Each video can be found in a category directory such as:
C:\Users\Me\Videos\a
C:\Users\Me\Videos\b
Which means that for every category I need a "train" and "test" folder like:
C:\Users\Me\Videos\a\train
C:\Users\Me\Videos\a\test
I have a CSV file (EDIT) containing the following information. I don't want my train/test split to be random; it should be based on the binary flags in my sheet.
videoname  | test | train | category
-----------|------|-------|---------
video1.mp4 | 1    | 0     | a
video2.mp4 | 1    | 0     | b
video3.mp4 | 1    | 0     | c
video4.mp4 | 0    | 1     | c
Can anyone point me in the direction of how I can use the file to do this for me? Can I somehow put the file in a dataframe which tells Python where to move the files?
EDIT:
import os
import csv
from collections import defaultdict

videoroot = r'H:\Desktop'
transferrable_data = defaultdict(list)
with open(r'H:\Desktop\SVW.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        video_path_source = os.path.join(videoroot, row['Genre'], row['FileName'])
        # csv.DictReader returns strings, so compare with '0' rather than 0
        if row['Train 1?'] == '0':
            split_type = 'test'
        else:
            split_type = 'train'
        video_destination_path = os.path.join(videoroot, row['Genre'], split_type, row['FileName'])
        transferrable_data[video_path_source].append(video_destination_path)
Well, the first thing to do is to read your Excel/CSV sheet and build a mapping from each source file to its destination paths:
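import os
from collections import defaultdict

# Raw string, so the backslashes are not treated as escape sequences
VIDEO_ROOT_FOLDER = r'C:\Users\Me\Videos'

transferrable_data = defaultdict(list)
# excel_iteratable stands for any iterable of row dicts read from your sheet (not defined here)
for row in excel_iteratable:
    video_source_path = os.path.join(VIDEO_ROOT_FOLDER, row['category'], row['videoname'])
    if int(row['test']) == 1:  # int() also handles values that were read as strings
        split_type = 'test'
    else:  # I suppose you can only dispatch to test or train in a row
        split_type = 'train'
    video_destination_path = os.path.join(VIDEO_ROOT_FOLDER, row['category'], split_type, row['videoname'])
    transferrable_data[video_source_path].append(video_destination_path)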
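The loop above leaves `excel_iteratable` undefined. As a minimal sketch, assuming the sheet has been exported to CSV with the columns from the question (`videoname`, `test`, `train`, `category`), `csv.DictReader` can supply the rows. The helper name `build_mapping` and the in-memory sample data are my own, for illustration only:

```python
import csv
import io
import os
from collections import defaultdict


def build_mapping(rows, video_root):
    """Map each source video path to the list of paths it should end up at.

    `rows` is any iterable of dicts with the keys 'videoname', 'test' and
    'category' (for example a csv.DictReader).
    """
    mapping = defaultdict(list)
    for row in rows:
        category = row['category'].strip()
        name = row['videoname'].strip()
        # csv.DictReader yields strings, so compare against '1', not 1.
        split_type = 'test' if row['test'].strip() == '1' else 'train'
        source = os.path.join(video_root, category, name)
        destination = os.path.join(video_root, category, split_type, name)
        mapping[source].append(destination)
    return mapping


# Hypothetical in-memory CSV standing in for the real sheet.
sample_csv = io.StringIO(
    "videoname,test,train,category\n"
    "video1.mp4,1,0,a\n"
    "video4.mp4,0,1,c\n"
)
mapping = build_mapping(csv.DictReader(sample_csv), 'Videos')
for source, destinations in mapping.items():
    print(source, '->', destinations)
```

With a real file you would pass `csv.DictReader(open(path, newline=''))` instead of the `io.StringIO` object, and your actual root folder instead of `'Videos'`.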
Then you can write a script that moves your files to the correct paths, using one of the following two methods:
import os
# The destination is the full path of the moved file, including its name
os.rename("path/to/current/video", "path/to/destination/folder/video")
or, if you need to copy (because you don't want to alter your video folder):
from shutil import copyfile
# copyfile expects a destination file path, not a directory
copyfile("path/to/current/video", "path/to/destination/folder/video")
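Note that neither call creates the destination folder, and both need the destination to be a full file path. Here is a small sketch of a helper that takes care of this; the name `transfer` and its `move` flag are hypothetical, and I use `shutil.move` and `shutil.copy2` rather than the calls above because they also work across drives and keep file timestamps:

```python
import os
import shutil


def transfer(source, destination, move=False):
    """Copy (or move) `source` to the full file path `destination`."""
    folder = os.path.dirname(destination)
    if folder:
        # Create e.g. ...\a\train if it does not exist yet.
        os.makedirs(folder, exist_ok=True)
    if move:
        shutil.move(source, destination)
    else:
        shutil.copy2(source, destination)  # copy2 also preserves timestamps
```

Copying first and deleting the originals only once you have checked the result is probably the safer order when the videos are hard to replace.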
Let's say, for example, that your mapping is:
transferrable_data = {r'C:\Users\Me\Videos\a\video1.mp4': [r'C:\Users\Me\Videos\a\train\video1.mp4'], r'C:\Users\Me\Videos\a\video2.mp4': [r'C:\Users\Me\Videos\b\test\video2.mp4', r'C:\Users\Me\Videos\c\test\video2.mp4']}
Then you can do something like:
import os
from shutil import copyfile
transferrable_data = {r'C:\Users\Me\Videos\a\video1.mp4': [r'C:\Users\Me\Videos\a\train\video1.mp4'], r'C:\Users\Me\Videos\a\video2.mp4': [r'C:\Users\Me\Videos\b\test\video2.mp4', r'C:\Users\Me\Videos\c\test\video2.mp4']}
for src, destination_list in transferrable_data.items():
    for dest in destination_list:
        os.makedirs(os.path.dirname(dest), exist_ok=True)  # create the train/test folder if needed
        copyfile(src, dest)
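Before running the copy loop on a large collection, it may be worth checking that every source file named in the sheet actually exists, since a typo in a file name would otherwise stop the loop halfway. A minimal sketch, where `find_missing_sources` and the example paths are hypothetical:

```python
import os


def find_missing_sources(transferrable_data):
    """Return the source paths in the mapping that do not exist on disk."""
    return [src for src in transferrable_data if not os.path.isfile(src)]


# Hypothetical mapping; these paths are assumed not to exist on this machine.
example = {
    os.path.join('no_such_folder', 'video1.mp4'): [
        os.path.join('no_such_folder', 'train', 'video1.mp4'),
    ],
}
missing = find_missing_sources(example)
if missing:
    print('Missing source files:', missing)
```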