python, bash, docker

Storing arbitrary git patches generated in Python in a file inside a Docker container with its Python SDK


The Problem

The patch is generated in Python, and I would like to persist it in changes.patch inside the Docker container so that I can subsequently apply the patch, commit the changes, and so on. The use case is an AI agent that automatically selects hunks and turns them into commits iteratively. Because of this I am, unless I missed something, unable to use git's built-in interactive commands (in this case git add -p), as these block on interactive input in the terminal.

To fill the patch file with the generated diff, I have repeatedly come across here-documents [1][2][3], which is what I am currently trying to use.

The relevant portion of my code looks as follows:

command_to_execute = '/bin/bash -c "{command}"'  # Need to quote command for it to actually be executed in the container
update_patch_file_command = command_to_execute.format(command=f"cat > {file} <<EOL\n{data}\nEOL")

err_code, output = self.container.exec_run(
        update_patch_file_command,
        privileged=False,
        workdir=self.repository_work_dir
)

A minimal example for reproduction is this data:

data = 'tion(\"Could not construct input object: missing field \'\${it.name}\' in \'\${graphQLType.name}\' \")'

with which the code runs and results in the following update_patch_file_command string:

"/bin/bash -c "cat > all_changes.patch <<EOL
tion("Could not construct input object: missing field '\\${it.name}' in '\\${graphQLType.name}' ")
EOL""

However, container.exec_run returns b"not: line 2: warning: here-document at line 1 delimited by end-of-file (wanted EOL')\n" for this command.

As you can see above, EOL has no whitespace in front of it (the string is copied directly from the debugger), which is what this warning would normally hint at.
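One possible explanation, sketched under an assumption: to my knowledge the Docker SDK tokenizes a string passed to exec_run with shlex-style rules before sending it to the container. If that holds, the double quote inside data closes the quote opened by the template, so bash receives only the first fragment as its -c script and the word "not" as $0, which would match the "not: line 2" prefix of the warning. The snippet reproduces that tokenization locally with shlex.split; no container is involved, and the variable name template is mine.

```python
import shlex

# Template and data as in the question (file name taken from the question's output).
template = '/bin/bash -c "{command}"'
file = "all_changes.patch"
data = "tion(\"Could not construct input object: missing field '\\${it.name}' in '\\${graphQLType.name}' \")"

update_patch_file_command = template.format(command=f"cat > {file} <<EOL\n{data}\nEOL")

# Assumption: the SDK splits string commands roughly like shlex.split does.
argv = shlex.split(update_patch_file_command)

script = argv[2]       # what bash would run as its -c script
script_name = argv[3]  # the next word becomes $0, the prefix of bash's messages

print(repr(script))
print(repr(script_name))
print(script.rstrip().endswith("EOL"))  # whether the terminator reaches the script
```

If the assumption is right, the here-document terminator ends up in a later, unrelated argument, so the missing-terminator warning would not be about whitespace at all.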

Things I've tried

I have tried the following other things:

  • echo: doesn't work because my strings are multiline
  • not double-quoting the command in the bash call: doesn't work because then the command isn't executed at all
  • redirecting into a file that does not exist yet
  • using elevated permissions for container.exec_run
  • using >| to force overwriting the file
  • using shlex.quote: same error b"not: line 2: warning: here-document at line 1 delimited by end-of-file (wanted EOL')\n". As far as I know, using it should be safe.
  • using tee instead of cat
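A possible way around the quoting, as a sketch rather than a verified fix: exec_run also accepts the command as a list, in which case no host-side splitting should take place, and quoting the here-document delimiter (<<'EOL') keeps the shell from expanding ${...} inside the patch. build_write_command is a hypothetical helper name, and the sketch assumes that no line of the data equals the delimiter.

```python
import shlex


def build_write_command(path: str, data: str, delimiter: str = "EOL") -> list[str]:
    """Return an argv list that writes `data` to `path` via a quoted here-document.

    Hypothetical helper, not part of the Docker SDK.
    """
    if delimiter in data.splitlines():
        raise ValueError("data contains a line equal to the here-document delimiter")
    # Quoting the delimiter ('EOL') disables parameter expansion in the body.
    script = f"cat > {shlex.quote(path)} <<'{delimiter}'\n{data}\n{delimiter}\n"
    # A list is passed through as-is, so the script stays a single argument.
    return ["/bin/bash", "-c", script]


data = "tion(\"Could not construct input object: missing field '${it.name}'\")"
argv = build_write_command("all_changes.patch", data)

# Usage inside the class from the question (assumed attributes):
# err_code, output = self.container.exec_run(argv, workdir=self.repository_work_dir)
```

Note that a here-document always ends the written content with a newline, which is normally what a patch file needs anyway.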

Other ideas (more related to the overarching goal instead of the problem above, but I decided to add this for extra context on what I'm trying to do):

  • I could copy the file with the Python Docker SDK's put_archive function, but that seems super hacky and probably scales poorly, so I would like to find another solution.
  • I tried setting up the container with a tty and an open stdin, but that didn't let me use the interactive git functionality directly (e.g. in the case described above the equivalent would be git add -p)
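For reference, the put_archive route from the list above does not have to touch the local disk: the archive can be built in memory. This is a minimal sketch; make_tar_bytes is a hypothetical helper name, and the commented call assumes the container and work-dir attributes from the question's class and that the target directory already exists in the container.

```python
import io
import tarfile
import time


def make_tar_bytes(name: str, content: str) -> bytes:
    """Pack one text file into an in-memory tar archive (hypothetical helper)."""
    payload = content.encode("utf-8")
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    info.mtime = int(time.time())
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


archive = make_tar_bytes("changes.patch", "example patch content\n")

# Usage inside the class from the question (assumed attributes):
# ok = self.container.put_archive(self.repository_work_dir, archive)
```

Because the patch travels as bytes inside the archive, no shell quoting is involved at any point.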

EDIT 1: I also just realized that I could perhaps perform the entire hunk extraction in a shell script and avoid interfacing with Docker/Python altogether.

EDIT 2: Changed accompanying example to a simpler case.


Solution

  • I took a step back and realized that I could achieve the desired behavior much more easily by writing the changes to a temporary file locally and copying that file into the Docker container:

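    import subprocess

    with open(file, 'w') as f:  # write the generated patch locally first
        f.write(extracted_data)

    # Copy the file into the container; check=True raises if `docker cp` fails.
    subprocess.run(['docker', 'cp', file, f'{self.container.id}:{self.repository_work_dir}/{file}'], check=True)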