
Databricks subprocess vs os.system

I have the following shell command that I'm trying to run in Databricks:

find /dbfs/mnt/data/ -name somename.tar.tar -exec tar -xvzf {} -C /dbfs/mnt/raw/data \;

When I run it as a shell command, or through os.system as shown below, in a Databricks notebook, it works and extracts the files. Shell:

find /dbfs/mnt/data/ -name somename.tar.tar -exec tar -xvzf {} -C /dbfs/mnt/raw/data \;


cmd = ['find', '/dbfs/mnt/data/', '-name', 'somename.tar.tar', '-exec', 'tar', '-xvzf', '{}', '-C', '/dbfs/mnt/raw/data', '\\;']
cmd_join = " ".join(cmd)

But running it through subprocess does not seem to do anything, even though the cell runs successfully.

Why is this the case?


  • When you run subprocess with a list, you need to peel off any quotes or escapes which were necessary when you ran the command with a shell between you and the command line.

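    Specifically, the backslash before the ; is only needed when a shell is involved, because an unescaped semicolon ends a command in the shell. Without a shell, find receives the two characters \; literally, does not recognize them as the end of the -exec command, and exits with an error. So take the backslash out.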

    cmd = [
        'find', '/dbfs/mnt/data/', '-name', 'somename.tar.tar',
        '-exec', 'tar', '-xvzf', '{}', '-C', '/dbfs/mnt/raw/data', ';']

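    You should probably also add check=True, so that a non-zero exit status from find raises subprocess.CalledProcessError instead of being silently ignored: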

    import subprocess

    s = subprocess.run(cmd, check=True)
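The difference between the two approaches can be reproduced outside Databricks. This is a minimal sketch, assuming a POSIX system with find and echo on the PATH. It uses shlex.split to mimic how /bin/sh tokenizes the joined string that os.system receives, then runs find in a throwaway temporary directory with echo standing in for tar. The temporary file is a hypothetical stand-in for the real archive, and the DBFS paths are only tokenized, never executed.

```python
import os
import shlex
import subprocess
import tempfile

# The question's list: '\\;' is the two-character Python string \;
cmd = ['find', '/dbfs/mnt/data/', '-name', 'somename.tar.tar',
       '-exec', 'tar', '-xvzf', '{}', '-C', '/dbfs/mnt/raw/data', '\\;']

# shlex.split applies POSIX shell quoting rules, like /bin/sh does for
# os.system: the unquoted backslash is consumed and \; becomes a bare ;
shell_argv = shlex.split(" ".join(cmd))
print(repr(cmd[-1]), "->", repr(shell_argv[-1]))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the archive from the question
    open(os.path.join(tmp, "somename.tar.tar"), "w").close()

    base = ["find", tmp, "-name", "somename.tar.tar", "-exec", "echo", "{}"]

    # No shell: find receives the literal \; and never sees the end of -exec
    bad = subprocess.run(base + ["\\;"], capture_output=True, text=True)
    print("escaped:", bad.returncode, bad.stderr.strip())

    # Bare ';' is what find expects when no shell is involved
    good = subprocess.run(base + [";"], capture_output=True, text=True, check=True)
    print("bare:", good.returncode, good.stdout.strip())

    # check=True turns the silent failure into an exception
    raised = False
    try:
        subprocess.run(base + ["\\;"], capture_output=True, check=True)
    except subprocess.CalledProcessError as exc:
        raised = True
        print("check=True raised, exit status", exc.returncode)
```

Without check=True, the escaped version simply returns a non-zero returncode, which is easy to miss when nothing inspects it.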