I have the following shell command that I'm trying to run in Databricks:
find /dbfs/mnt/data/ -name somename.tar.tar -exec tar -xvzf {} -C /dbfs/mnt/raw/data \;
When I run it in a Databricks notebook, either as a shell command or with os.system as shown below, it works and extracts the files. Shell:
%sh
find /dbfs/mnt/data/ -name somename.tar.tar -exec tar -xvzf {} -C /dbfs/mnt/raw/data \;
Python:
import os

cmd = ['find', '/dbfs/mnt/data/', '-name', 'somename.tar.tar', '-exec', 'tar', '-xvzf', '{}', '-C', '/dbfs/mnt/raw/data', '\\;']
cmd_join = " ".join(cmd)
os.system(cmd_join)
But running it with subprocess does not seem to do anything, even though the cell runs successfully.
subprocess.run(cmd)
Why is this the case?
When you run subprocess with a list, you need to peel off any quotes or escapes which were necessary when you ran the command with a shell between you and the command line.
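Specifically, the backslash before the ; is necessary when you have a shell, because the semicolon character by itself is a statement terminator in the shell. The shell strips the backslash and passes a bare ; to find. With a list there is no shell, so find receives the two-character argument \; verbatim and does not recognize it as the terminator of -exec. So, take the backslash out.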
cmd = [
    'find', '/dbfs/mnt/data/', '-name', 'somename.tar.tar',
    '-exec', 'tar', '-xvzf', '{}', '-C', '/dbfs/mnt/raw/data', ';']
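If you are unsure which escapes the shell would have removed, one way to check is to let Python's shlex module split the original shell line. This is only a sketch: shlex.split follows POSIX-like quoting rules, but it does not perform globbing or variable expansion, so treat it as an approximation of what a real shell does. The variable names here are just for illustration.

```python
import shlex

# The command exactly as it was typed for the shell (raw string keeps the backslash).
shell_line = r"find /dbfs/mnt/data/ -name somename.tar.tar -exec tar -xvzf {} -C /dbfs/mnt/raw/data \;"

# Split it the way a POSIX-like shell would tokenize it.
argv = shlex.split(shell_line)

# The last element is a bare ';' with the backslash removed.
print(argv)
```

The resulting list is the kind of argument list that subprocess.run expects when no shell is involved.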
Probably also add check=True to your call, so that a failing command raises an exception instead of passing silently:
s = subprocess.run(cmd, check=True)
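To illustrate the difference check=True makes, here is a small sketch that uses a deliberately failing child process as a stand-in for the broken find command (the stand-in and the variable names are assumptions for the demo, not part of the original command). Without check=True the failure is only visible in the return code; with it, subprocess.run raises CalledProcessError. Passing capture_output=True also makes the error text available in Python.

```python
import subprocess
import sys

# Stand-in for a failing command: exits with status 1 and writes "boom" to stderr.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit('boom')"]

# Without check=True: no exception, the failure has to be inspected by hand.
result = subprocess.run(failing_cmd, capture_output=True, text=True)
print("return code:", result.returncode)
print("stderr:", result.stderr.strip())

# With check=True: a non-zero exit status raises CalledProcessError.
error = None
try:
    subprocess.run(failing_cmd, capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as exc:
    error = exc
    print("raised:", exc.returncode, exc.stderr.strip())
```

This would explain why the original cell appeared to run successfully: nothing in the call asked Python to treat a non-zero exit status as an error.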