I have a folder full of videos. Some of them have audio and others are mute (literally no audio stream). My goal with the following small program I've made is to move the videos without audio to a folder named gifs.
My question is: how can I optimize it?
Here is the program:
from subprocess import check_output
from os.path import join,splitext
from os import rename,listdir
from shutil import move

def noAudio(path):
    cmd =("ffprobe -i {0} -show_streams -select_streams a -loglevel error".format(path))
    output = check_output(cmd,shell=True)
    boolean = (output == b'')
    return boolean

def del_space(file):
    rename(join(src,file),join(src,file.replace(' ','')))
    Newf = file.replace(' ','')
    return Newf

def StoreNoAudio(src,dist):
    target = [".mp4",".MP4",".gif"]
    GifMoved = 0
    print("processing...")
    for file in listdir(src):
        direction,extension = splitext(file)
        try:
            if extension in target:
                #find space related errors and correct them
                if ' ' in file:
                    file = del_space(file)
                path = join(src,file)
                distination = join(dist,file)
                #move file without audio streams
                if(extension == '.gif' or noAudio(path) == True):
                    move(path,distination)
                    GifMoved += 1
        except Exception as e:
            print(e)
    print('Mute videos moved:',GifMoved)
    print('finished!')

dist = (r"C:\Users\user\AppData\Roaming\Phyto\G\Gif")
src = (r"C:\Users\user\AppData\Roaming\Phyto\G\T")
StoreNoAudio(src,dist)
*I'm new to Stack Overflow, so feel free to tell me if I'm doing something wrong.
If I understand correctly, your program works correctly already and you are looking for ways to reduce running time.
You could use the multiprocessing package to parallelize your program into per-file subprocesses.
To do so, put the code in your for loop into a function (let's call it process_file), and then:
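import multiprocessing

if __name__ == '__main__':  # this guard is needed on Windows, where each child process re-imports the script
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    pool.map(process_file, listdir(src))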
This will create as many worker processes as there are CPU cores and distribute the files among them. This should significantly reduce the running time, depending of course on the number of cores available on your machine.
Keeping track of the number of moved files does not directly work anymore, however, because the variable GifMoved is not accessible to the child processes. Your function could return a 1 if the file was moved and a 0 if not, then you could sum up the results of all calls to process_file like this:
GifMoved = sum(pool.map(process_file, listdir(src))) # instead of last line above
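Putting the pieces together, a minimal sketch of the whole thing might look like the following. It is only one possible arrangement, not a tested drop-in replacement. The names has_no_audio and store_no_audio are mine, and process_file takes src and dist as extra arguments (bound with functools.partial) instead of reading globals, which is an assumption on my part. I also pass the ffprobe command as a list rather than a shell string, so file names with spaces should not need to be renamed first. The command-line usage in the last lines is hypothetical.

```python
import os
import shutil
import subprocess
import sys
from functools import partial
from multiprocessing import Pool

TARGET = {".mp4", ".MP4", ".gif"}  # same extensions as in the question


def has_no_audio(path):
    # Same ffprobe flags as in the question, but passed as a list (no shell=True),
    # which avoids quoting problems with spaces in file names.
    cmd = ["ffprobe", "-i", path, "-show_streams",
           "-select_streams", "a", "-loglevel", "error"]
    return subprocess.check_output(cmd) == b""


def process_file(src, dist, name):
    """Move one file if it is a .gif or has no audio stream; return 1 if moved, else 0."""
    extension = os.path.splitext(name)[1]
    if extension not in TARGET:
        return 0
    path = os.path.join(src, name)
    try:
        if extension == ".gif" or has_no_audio(path):
            shutil.move(path, os.path.join(dist, name))
            return 1
    except Exception as e:  # print the error and carry on, as the question does
        print(e)
    return 0


def store_no_audio(src, dist):
    # Pool() with no argument starts one worker per available CPU.
    with Pool() as pool:
        results = pool.map(partial(process_file, src, dist), os.listdir(src))
    return sum(results)


if __name__ == "__main__":  # needed on Windows, see above
    if len(sys.argv) == 3:  # hypothetical usage: python script.py SRC DIST
        print("Mute videos moved:", store_no_audio(sys.argv[1], sys.argv[2]))
```

Since most of the time is probably spent waiting for ffprobe rather than in Python itself, a thread pool would likely work about as well here, but I have not measured that.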