Loading compressed baboon interaction data in Python

I am interested in reading the observational data and wearable sensor data available here into Python. Specifically, I would like to get them into Pandas dataframes, but even getting them into a more familiar form would effectively answer the question.

Both files are *.txt.gz files. I have tried to read them like this:

import gzip

with gzip.open('../data/OBS_data.txt.gz', 'rb') as f:
    file_content = f.read()

But it is clear from printing the file contents that they are in some sort of encoding. I tried, unsuccessfully, to convert them to a UTF-8 string with

print(file_content.decode("utf-8"))

But this gives the error:

UnicodeDecodeError                        Traceback (most recent call last)
Cell In [15], line 4
      1 with gzip.open('../data/OBS_data.txt.gz', 'r') as f:
----> 4 print(file_content.decode("utf-8"))

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I also tried using Pandas directly:

df = pd.read_csv('../data/OBS_data.txt.gz', compression='gzip')

But that gives a similar error, so I may have misunderstood the encoding.

How do I load this data?

Strangely, this works:

df2 = pd.read_csv('', sep='\t')
            t        i         j          DateTime
0  1560396500  ARIELLE      FANA  13/06/2019 05:28
1  1560396500  ARIELLE  VIOLETTE  13/06/2019 05:28
2  1560396520     FANA    HARLEM  13/06/2019 05:28
3  1560396540   FELIPE    ANGELE  13/06/2019 05:29
4  1560396540  ARIELLE      FANA  13/06/2019 05:29

but this doesn't:

df = pd.read_csv('', sep='\t')

My version of pandas is 1.3.5, and here is my OS (a fairly fresh install, from last week):

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

Reinstalling pandas didn't help. I also tried removing all third-party Python packages and then reinstalling pandas with pip, and it still didn't work.


  • You're over-complicating things: pandas.read_csv will read gzip-compressed files directly, without you having to decompress them first.

    df = pd.read_csv('', sep='\t')
    df2 = pd.read_csv('', sep='\t')


                t        i         j          DateTime
    0  1560396500  ARIELLE      FANA  13/06/2019 05:28
    1  1560396500  ARIELLE  VIOLETTE  13/06/2019 05:28
    2  1560396520     FANA    HARLEM  13/06/2019 05:28
    3  1560396540   FELIPE    ANGELE  13/06/2019 05:29
    4  1560396540  ARIELLE      FANA  13/06/2019 05:29
               DateTime  Actor Recipient   Behavior Category  Duration Point
    0  13/06/2019 09:35  EWINE       NaN  Invisible    Other        34    NO
    1  13/06/2019 09:35  EWINE       NaN      Other    Other        21    NO
    2  13/06/2019 09:35  EWINE       NaN  Invisible    Other        42    NO
    3  13/06/2019 09:36  EWINE       NaN      Other    Other         2    NO
    4  13/06/2019 09:36  EWINE       NaN  Invisible    Other        30    NO

    If downloaded already:

    df = pd.read_csv('../data/OBS_data.txt.gz', sep='\t')
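
If the local copy still fails, one thing worth checking is whether the bytes are still gzip-compressed after a single round of decompression. This is a guess on my part, not something the traceback proves: 0x1f 0x8b are the first two bytes of every gzip stream, and the UnicodeDecodeError in the question complains about 0x8b at position 1. Below is a minimal sketch using only the standard library. The helper name decompress_fully and the small in-memory sample are made up for illustration; they are not part of pandas or of the dataset.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decompress_fully(data: bytes, max_layers: int = 5) -> bytes:
    """Decompress repeatedly while the data still starts with the gzip magic bytes.

    Hypothetical helper: it assumes the real payload is plain text, so it
    will not itself begin with 0x1f 0x8b.
    """
    layers = 0
    while data.startswith(GZIP_MAGIC):
        if layers == max_layers:
            raise ValueError("data is still gzip-compressed after max_layers rounds")
        data = gzip.decompress(data)
        layers += 1
    return data


# Illustrative sample only: one header line and one row, compressed twice
# to imitate a file that was gzipped more than once.
raw = "t\ti\tj\n1560396500\tARIELLE\tFANA\n".encode("utf-8")
twice = gzip.compress(gzip.compress(raw))

# A single round of decompression still yields gzip bytes, which is the
# situation where .decode("utf-8") would fail on 0x8b at position 1.
once = gzip.decompress(twice)
still_compressed = once.startswith(GZIP_MAGIC)

text = decompress_fully(twice).decode("utf-8")
```

For the real file, the same helper could be applied to the bytes read with open(path, 'rb'). If the result decodes cleanly, it can be handed to pandas with pd.read_csv(io.StringIO(text), sep='\t').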