
Read a binary file stored in HDFS with Python


I have some binary files. When I store them as local files, I can read them in binary mode:

with open("binary_file", 'rb') as f:
    print("Binary file:  ", f.read())

Result:

Binary file:   b'Ix\x9d\xdf\xd2\xf6\x83\xe8B\x95.... (a long binary)

However, I want to store them in HDFS and retrieve them from there. When I use the following code:

import os

f = os.popen("hdfs dfs -cat binary_file")
print("Binary file:  ", f.read())

I get an error on the print line:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 2: invalid start byte

I think this command reads the file as text. How can I explicitly read the file as binary?
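The error message lines up with the bytes shown in the local output: the first two bytes (I and x) are valid ASCII, but 0x9d at index 2 is a UTF-8 continuation byte, so it cannot start a character. A minimal sketch, assuming the file read through HDFS has the same content as the local copy and that the pipe decodes with UTF-8 (as the error message suggests), reproduces the same error by decoding those bytes directly:

```python
# First bytes of the file content from the local binary read above
data = b"Ix\x9d\xdf\xd2\xf6\x83\xe8B\x95"

try:
    # Roughly what a text-mode pipe does on read(): decode the raw bytes as UTF-8
    text = data.decode("utf-8")
except UnicodeDecodeError as exc:
    # str(exc) carries the same message as the traceback in the question
    message = str(exc)
    print(message)
```

The failure happens inside f.read() while decoding, before print ever receives a value, which is why the fix is to avoid decoding rather than to change how the result is printed.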


Solution

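  • os.popen() always opens the pipe in text mode: its mode argument accepts only 'r' or 'w' (passing 'rb' raises ValueError), and read() decodes the command's output with the locale's default encoding, here UTF-8.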

    Use subprocess.check_output() instead; it returns the command's output as bytes unless you pass text=True or an encoding:

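    import subprocess
    # A list avoids shell=True; stdout comes back as bytes, with no decoding
    output = subprocess.check_output(["hdfs", "dfs", "-cat", "binary_file"])
    print("Binary file:  ", output)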
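check_output() buffers the whole file in memory, which may be a problem for very large files. A minimal sketch, assuming you want to consume the data in chunks instead: it uses subprocess.Popen without text=True, so proc.stdout stays a binary stream. The helper name stream_command_bytes and the 64 KiB chunk size are hypothetical choices, not part of any HDFS API, and a Python one-liner stands in for hdfs dfs -cat so the sketch runs without a cluster.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_command_bytes(cmd: List[str], chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a command's stdout as raw byte chunks (hypothetical helper)."""
    # No text=True or encoding argument, so proc.stdout is a binary stream
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # b"" signals end of output
                break
            yield chunk
    # Leaving the with-block waits for the process, so returncode is set here
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


# Stand-in for ["hdfs", "dfs", "-cat", "binary_file"] so this runs anywhere
fake_cat = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(bytes(range(256)))"]
data = b"".join(stream_command_bytes(fake_cat))
print(len(data), data[:4])
```

Against a real cluster you would pass ["hdfs", "dfs", "-cat", "binary_file"] and write or decrypt each chunk as it arrives. Like check_output(), the helper raises CalledProcessError if the command exits with a non-zero status, so a missing HDFS path is not silently read as empty data.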