
Python multiple tasks / sub-process


At the moment, I am struggling to understand how to make a Python script perform multiple tasks alongside each other.

For this case, I set myself a target:

make a script that takes a URL passed via HTTP GET, downloads the video behind the URL, converts it to an mp3 file, and performs some "post-download" things like setting mp3 tags. The challenge is to accept new download requests while another download/convert/post-download process is still active.

Whether this usage makes sense or not should not be the point of this question (I know there is already software available for video-to-mp3 downloads). I am just trying to understand how to use Python to serve a certain service (httpd) while performing other tasks (download/convert/post-download).

To start, I tried to keep things as basic as possible, so I decided to use BaseHTTPServer and youtube-dl. BaseHTTPServer lets me serve and handle HTTP inside my script, and youtube-dl manages the download/convert. Handling the post-download actions is my problem.

At the moment I am able to accept multiple download requests and start multiple child processes, but how can I start the post-download things (like setting mp3 tags) after one download/convert has finished? I have no clue how to find out that the download/convert of a specific file has finished successfully.

Here is my code so far:

#!/usr/bin/env python
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
import subprocess

class S(BaseHTTPRequestHandler):
    def _set_headers(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        # set youtube-dl command and arguments
        args = ['youtube-dl', '--extract-audio', '--audio-format', 'mp3', '--output', '%(title)s.%(ext)s', '--no-playlist', '--quiet']

        # building HTTP Header and extract path from it
        self._set_headers()
        passed = self.path      # catch the passed url
        url = passed[1:]     # cutoff leading /

        if url:
            # append the url as the last argument to args
            args.append(url)
            # download
            subprocess.Popen(args)
        else:
            print('empty request')

def run(server_class=HTTPServer, handler_class=S, port=8000):
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    print('Starting httpd...')
    httpd.serve_forever()

if __name__ == "__main__":
    from sys import argv

    if len(argv) == 2:
        run(port=int(argv[1]))
    else:
        run()

This enables me to download a video and store it as mp3 while accepting another download request, but I do not know how to perform further operations on a file after it has been downloaded/converted while still accepting and starting new downloads/converts.

Using subprocess.call() and waiting until youtube-dl has finished would break the ability to accept another download in parallel with the current one.

And writing a second script, started via .Popen(), that handles download/convert/post-download together does not seem to be the right way ^^

Right now this is a chicken-and-egg situation for me... Hope you can enlighten me...


Solution

  • The tip from Sanket Sudake did the trick for me!

    I use Celery as a task manager, with defined tasks that I chain asynchronously so that my steps (download & convert & post-download stuff) run one after another as dedicated tasks.

    Works pretty well!

    And Celery has SO MANY MORE features and techniques to experiment with! But for my self-defined case, the async chain of tasks works fine and is the solution for me here!

    I added a systemd config to run this as a daemonized service manageable with systemd.