I created a data frame in R with a column that holds a dummy variable (i.e., only 1 or 0) and saved it to a file using
write.table(my_df,"my_df.txt",sep=" ", eol="\r\n", row.names=FALSE)
Then, I read the file into Python using
with open('./my_df.txt', 'r') as myfile:
    my_df = myfile.read().splitlines()
Eventually, I want to do something with the column holding the dummy variable:
header = my_df[0].split(' ')
body = my_df[1:]
for i, j in enumerate(header):
    if j == '"dummy_variable_column"':
        column_index = i
dummies = [row.split(' ')[column_index].replace('"', '') for row in body]
This is an approach I often use. However, in this specific case some values in the variable dummies, which holds the column in question, are 0.693147180559945. I cannot explain this: the column is supposed to contain only 0s and 1s. Does anybody know what's going on?
*second edit (because of the comments)
This is the output of print(my_df[:20])
"subject" "session" "trial" "age" "gender" "dummy_variable_column"
"s1" 1 2 19 "female" 0
"s1" 1 4 19 "female" 0
"s1" 1 11 19 "female" 0
"s1" 1 14 19 "female" 1
"s1" 1 15 19 "female" 0
"s1" 1 16 19 "female" 0
"s1" 1 17 19 "female" 1
"s1" 1 21 19 "female" 0
"s1" 1 24 19 "female" 0
"s1" 1 26 19 "female" 0
"s1" 1 39 19 "female" 0
"s1" 1 40 19 "female" 0
"s1" 1 41 19 "female" 1
"s1" 1 45 19 "female" 0
"s1" 1 48 19 "female" 0
"s1" 1 49 19 "female" 0
"s1" 1 50 19 "female" 0
"s1" 1 59 19 "female" 1
"s1" 1 61 19 "female" 0
However, print(my_df[37045]) produces
"s20" 1 26 19 "male" 0.693147180559945
Furthermore, in R the command unique(my_df$dummy_variable_column) returns only: 0 1
*third edit (because of the comments)
This is how I work with my column:
header = my_df[0].split(' ')
body = my_df[1:]
for i, j in enumerate(header):
    if j == '"dummy_variable_column"':
        dummy_index = i
dummies = [item.split(' ')[dummy_index] for item in my_df]
For instance, print(dummies[37044]) outputs 0.693147180559945
It turned out that one column in the R data frame contains values such as 're + ba'. Because these values contain spaces, splitting each line on spaces in the list comprehension dummies = [item.split(' ')[dummy_index] for item in my_df] (see the third edit) produces extra fields, so for those rows dummy_index no longer points at the correct column.
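One way to avoid this is to let a parser that understands quoting split the fields, rather than splitting on every space. The following is a minimal sketch using Python's standard csv module. The sample text, including the column names "condition", "log_value" and "age", is made up for illustration, and it assumes the file was written with write.table's default quoting of character columns.

```python
import csv
import io

# Hypothetical sample mimicking the write.table output; the "condition",
# "log_value" and "age" columns and their values are invented for this sketch.
sample = (
    '"subject" "condition" "log_value" "age" "dummy_variable_column"\r\n'
    '"s1" "re + ba" 0.693147180559945 19 0\r\n'
    '"s20" "ba" 0 19 1\r\n'
)

# Naive approach: splitting on spaces breaks "re + ba" into three pieces,
# which shifts every later field in that row two positions to the right.
lines = sample.splitlines()
naive_index = lines[0].split(' ').index('"dummy_variable_column"')
naive_dummies = [row.split(' ')[naive_index] for row in lines[1:]]

# csv.reader honours the double quotes, so "re + ba" stays one field.
# newline='' leaves the \r\n line endings for the csv module to handle.
reader = csv.reader(io.StringIO(sample, newline=''), delimiter=' ', quotechar='"')
header = next(reader)  # quotes are already stripped from the header names
column_index = header.index('dummy_variable_column')
dummies = [row[column_index] for row in reader]

print(naive_dummies)  # the first row picks up the wrong column
print(dummies)        # only '0' and '1'
```

For the real file, the same reader could be built from open('./my_df.txt', 'r', newline='') instead of the in-memory sample. Writing the file from R with a separator that cannot occur in the data, such as a tab, would be another option.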