
Get a binary file from GitPython by revision (got a Unicode string, want bytes)


I want to access the content of a binary file in a git repository using GitPython. Unfortunately, repo.git.show returns a Unicode string (str) rather than a bytes object. I tried to convert the string to bytes, but that fails.

#!/usr/bin/env python

from io import BytesIO
import git

# initialize repository
repo = git.Repo('.')
# use git show to get the content of example.jpg in revision 4cb2a02
u = repo.git.show("4cb2a02:example.jpg")

b = BytesIO(u.encode('utf-8'))

Running this raises:

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 0: surrogates not allowed

Which is not a surprise.

How can I convert this Unicode string into bytes? Or better, how do I fetch the content of the file as a bytes object directly?


Solution

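  • Try passing the surrogateescape error handler to encode, so that the lone surrogates in the string are turned back into the original bytes: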

    b = BytesIO(u.encode('utf-8','surrogateescape'))
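To illustrate why this works, here is a minimal sketch that uses only the standard library. It assumes the string was produced by decoding the raw output as UTF-8 with the surrogateescape handler, which the '\udcff' in the error message suggests (byte 0xFF becomes the lone surrogate U+DCFF). The sample bytes in raw are a made-up stand-in for the start of a JPEG, not data from the repository in the question.

```python
from io import BytesIO

# Hypothetical stand-in for the first bytes of a binary file (JPEG-like header).
raw = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# Decoding with surrogateescape maps every undecodable byte 0xNN
# to the lone surrogate U+DCNN instead of raising an error.
text = raw.decode("utf-8", "surrogateescape")

# A strict encode rejects those lone surrogates, as in the question.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Encoding with the same handler reverses the mapping and restores the bytes.
restored = text.encode("utf-8", "surrogateescape")
b = BytesIO(restored)
```

Note that this round trip only recovers what is left in the string; if the output was post-processed before you received it (for example, a trailing newline stripped), those bytes are gone. An alternative that avoids decoding altogether may be to read the blob object itself, along the lines of (repo.commit("4cb2a02").tree / "example.jpg").data_stream.read(), which should return bytes.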