python r binary dummy-variable

Python transforms 0 to 0.693147180559945


I created a dataframe in R with a column that holds dummy variables (thus 1 or 0) and saved it to file using

write.table(my_df,"my_df.txt",sep=" ", eol="\r\n", row.names=FALSE)

Then, I read the file into Python using

with open('./my_df.txt', 'r') as myfile:
    my_df = myfile.read().splitlines()

Eventually, I want to do something with the column holding the dummy variable:

header = my_df[0].split(' ')
body = my_df[1:]
for i,j in enumerate(header):
    if j == '"dummy_variable_column"':
        column_index = i
dummies = [row.split(' ')[column_index].replace('"', '') for row in body]

This is an approach I often use. However, in this specific case some values in the variable dummies, which holds the column in question, are 0.693147180559945. I cannot explain this: the variable is supposed to contain only 0s and 1s. Does somebody know what's going on?

*second edit (because of the comments)

This is the output of print(my_df[:20])

"subject" "session" "trial" "age" "gender" "dummy_variable_column"
"s1" 1 2 19 "female" 0
"s1" 1 4 19 "female" 0
"s1" 1 11 19 "female" 0
"s1" 1 14 19 "female" 1
"s1" 1 15 19 "female" 0
"s1" 1 16 19 "female" 0
"s1" 1 17 19 "female" 1
"s1" 1 21 19 "female" 0
"s1" 1 24 19 "female" 0
"s1" 1 26 19 "female" 0
"s1" 1 39 19 "female" 0
"s1" 1 40 19 "female" 0
"s1" 1 41 19 "female" 1
"s1" 1 45 19 "female" 0
"s1" 1 48 19 "female" 0
"s1" 1 49 19 "female" 0
"s1" 1 50 19 "female" 0
"s1" 1 59 19 "female" 1
"s1" 1 61 19 "female" 0

However, print(my_df[37045]) produces

"s20" 1 26 19 "male" 0.693147180559945

Furthermore, I would like to point out that in R after the command unique(my_df$dummy_variable_column) the following output is given: 0 1

*third edit (because of the comments)

This is how I work with my column:

header = my_df[0].split(' ')
body = my_df[1:]
for i,j in enumerate(header):
    if j == '"dummy_variable_column"':
        dummy_index = i
dummies = [item.split(' ')[dummy_index] for item in my_df]

And for instance print(dummies[37044]) outputs 0.693147180559945


Solution

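  • It turned out that one column in the R dataframe contains values such as 're + ba'. Because these values contain spaces, splitting each row on spaces in the list comprehension dummies = [item.split(' ')[dummy_index] for item in my_df] (see the third edit) breaks such a value into several pieces, which shifts all later fields to the right. As a result, dummy_index no longer points at the dummy column in those rows, and a value from a different column is picked up instead.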
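
A minimal sketch of the problem and one possible fix, assuming the file was written by write.table with its default quoting of strings. The column names "condition", "log_value" and "age" and the sample rows are made up for illustration; only 're + ba' and "dummy_variable_column" come from the question. Python's csv module treats a quoted field as a single value even when it contains the delimiter, so the index stays aligned:

```python
import csv
import io

# Hypothetical sample mimicking write.table(..., sep=" ", eol="\r\n") output.
# The "condition", "log_value" and "age" columns are assumptions.
sample = (
    '"subject" "condition" "log_value" "age" "dummy_variable_column"\r\n'
    '"s20" "re + ba" 0.693147180559945 19 0\r\n'
    '"s21" "re" 0 19 1\r\n'
)

# Naive approach: split every line on spaces.
lines = sample.splitlines()
naive_index = lines[0].split(' ').index('"dummy_variable_column"')
# "re + ba" becomes three tokens, so later fields shift by two positions.
naive = [row.split(' ')[naive_index] for row in lines[1:]]

# Quote-aware approach: csv.reader keeps "re + ba" together as one field.
# For a real file, open it with open(path, newline='') instead of StringIO.
rows = list(csv.reader(io.StringIO(sample), delimiter=' ', quotechar='"'))
dummy_index = rows[0].index('dummy_variable_column')
dummies = [row[dummy_index] for row in rows[1:]]

print(naive)    # the first entry is taken from the wrong column
print(dummies)  # only values from the dummy column
```

Note that csv.reader strips the surrounding quotes, so the header is matched as dummy_variable_column without the extra '"' characters. Choosing a separator in write.table that cannot occur in the data (for example sep="\t") would be another way to avoid the misalignment.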