
Execute "git submodule foreach" in parallel


Is there any way to execute a git submodule foreach command in parallel, similar to how the --jobs 8 parameter works with git submodule update?

For example, one of the projects we work on involves almost 200 sub-components (submodules) and we heavily use the foreach command to operate on them. I'd like to speed them up.

PS: In case the solution involves a script, I work on Windows, mostly in git-bash.


Solution

  • I propose a solution based on a cross-platform interpreted language such as Python.


    Process Launcher


    First of all, you need to define a class that manages the process which launches the command.

    import os
    import subprocess


    class PFSProcess(object):
        """Runs one shell command inside one submodule and prints its output."""

        def __init__(self, submodule, path, cmd):
            self.__submodule = submodule
            self.__path = path
            self.__cmd = cmd
            self.__output = None
            self.__p = None

        def run(self):
            self.__output = "\n\n" + self.__submodule + "\n"
            self.__p = subprocess.Popen(self.__cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True,
                                 cwd=os.path.join(self.__path, self.__submodule))
            # communicate() must be called only once: it waits for the process
            # to finish and returns stdout and stderr together.
            out, err = self.__p.communicate()
            self.__output += out.decode('utf-8')
            if err:
                self.__output += err.decode('utf-8')
            print(self.__output)
    


    Multithreading


    The next step is to run the commands on multiple threads. Python's standard library includes the threading module for working with threads. You can use it by importing it:

    import threading
    

    Before creating the threads, you need a worker, a function that each thread will call:

    def worker(submodule_list, path, command):
        for submodule in submodule_list:
            PFSProcess(submodule, path, command).run()
    

    As you can see, the worker receives a list of submodules. Building that list is out of scope here, but I recommend taking a look at the .gitmodules file, from which you can generate the list of your submodules.


    💡 < Tip >

    As a starting point, each submodule entry in .gitmodules contains a line like this:

    path = relative_path/project
    

    For that purpose you can use this regular expression:

    'path ?= ?([A-Za-z0-9_-]+)(\/[A-Za-z0-9_-]+)*([A-Za-z0-9_-])'
    

    If the regular expression matches, you can extract the relative path by applying the following one to the same line:

    ' ([A-Za-z0-9_-]+)(\/[A-Za-z0-9_-]+)*([A-Za-z0-9_-])'
    

    Note that this last regular expression returns the relative path with a leading space character, so strip it before use.

    💡 < / Tip>
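
    As a minimal sketch of the idea in the tip, the snippet below collects every path value from the text of a .gitmodules file. The names PATH_LINE, parse_submodule_paths and read_submodule_paths are hypothetical, and the pattern is an assumption of this sketch: it is more permissive than the expressions above (it accepts any characters in the path, such as dots) and it does not return a leading space.

```python
import re

# Matches lines such as "    path = relative_path/project" and captures the value.
# Assumption of this sketch: anything after "path =" up to the end of the line
# (minus surrounding spaces or tabs) is the submodule path.
PATH_LINE = re.compile(r"^[ \t]*path[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def parse_submodule_paths(gitmodules_text):
    """Return the submodule paths found in the text of a .gitmodules file."""
    return PATH_LINE.findall(gitmodules_text)


def read_submodule_paths(gitmodules_file=".gitmodules"):
    """Read a .gitmodules file from disk and return its submodule paths."""
    with open(gitmodules_file, encoding="utf-8") as handle:
        return parse_submodule_paths(handle.read())


if __name__ == "__main__":
    # Made-up sample content, only for illustration.
    sample = (
        '[submodule "libs/core"]\n'
        "\tpath = libs/core\n"
        "\turl = https://example.com/core.git\n"
        '[submodule "tools"]\n'
        "\tpath = tools\n"
        "\turl = https://example.com/tools.git\n"
    )
    print(parse_submodule_paths(sample))
```

    The returned list can be used as the submodules sequence in the steps that follow.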


    Then split the submodule list into as many chunks as the number of jobs you want:

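    num_jobs = 8

    # One empty chunk per job; submodules are dealt out round-robin.
    submodule_list = [[] for _ in range(num_jobs)]
    for i, submodule in enumerate(submodules):
        submodule_list[i % num_jobs].append(submodule)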
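
    The round-robin split can also be wrapped in a small function, which makes it easy to check on a short list. This is only a sketch; split_round_robin is a hypothetical name that does not come from the PFS project.

```python
def split_round_robin(items, num_jobs):
    """Deal items into num_jobs lists, one item at a time."""
    chunks = [[] for _ in range(num_jobs)]
    for index, item in enumerate(items):
        chunks[index % num_jobs].append(item)
    return chunks


if __name__ == "__main__":
    names = ["sub%d" % n for n in range(5)]
    print(split_round_robin(names, 2))
    # -> [['sub0', 'sub2', 'sub4'], ['sub1', 'sub3']]
```

    When there are fewer submodules than jobs, some chunks stay empty and their threads simply finish immediately.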
    

    Finally, dispatch each chunk (job) to its own thread and wait until all threads finish:

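    # Fragment taken from inside a class: self.args and self.__threads
    # (assumed to start as an empty list) are defined elsewhere in that class.
    for i in range(num_jobs):
        t = threading.Thread(target=worker, args=(submodule_list[i], self.args.path, self.args.command))
        self.__threads.append(t)
        t.start()

    for t in self.__threads:
        t.join()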
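
    For readers who want something runnable outside a class, here is a self-contained sketch of the same idea. It is not the PFS implementation: it swaps the manual chunking and threading.Thread objects for concurrent.futures.ThreadPoolExecutor from the standard library, which hands submodules to idle threads on its own. The names run_in_submodule and parallel_foreach, and the example submodule paths, are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_submodule(root, submodule, command):
    """Run command inside root/submodule; return (submodule, exit code, output)."""
    result = subprocess.run(
        command,
        shell=isinstance(command, str),  # strings go through the shell, lists do not
        cwd=os.path.join(root, submodule),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
    )
    return submodule, result.returncode, result.stdout.decode("utf-8", errors="replace")


def parallel_foreach(root, submodules, command, num_jobs=8):
    """Run command in every submodule using at most num_jobs threads."""
    with ThreadPoolExecutor(max_workers=num_jobs) as pool:
        futures = [pool.submit(run_in_submodule, root, name, command) for name in submodules]
        # Collect results in submission order so the report is stable.
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Hypothetical submodule paths; replace them with the ones read from .gitmodules.
    for name, code, output in parallel_foreach(".", ["libs/core", "tools"], "git status --short"):
        print("\n\n%s (exit code %d)\n%s" % (name, code, output))
```

    Returning the output instead of printing it from each thread keeps the text of different submodules from interleaving. Note that on Windows a string command run with shell=True goes through cmd.exe rather than bash, even when the script is started from git-bash.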
    


    Obviously, I have only covered the basic concepts here, but you can find the full implementation in the parallel_foreach_submodule (PFS) project on GitHub.