I used 2 ways to import a data (in CSV format) into R. The first method does not have the na.string
argument, while the second one has. I used the second one because some of the strings appeared as ""
instead of NA
after the import, and that I want to standardise all missing values as NA
.
data1<-read.csv("file.csv",stringsAsFactors=FALSE)
data2<-read.csv("file",stringsAsFactors=FALSE,na.string="")
I have 3 variables that are indicators. They give "X"
for yes and ""
/NA
for no. I tried to apply the following function to both data1
and data2
above.
df1<-data1%>%
mutate(Indicator_Institution=ifelse(Indicator_A=="X",1,
ifelse(Indicator_B=="X",2,
ifelse(Indicator_C=="X",3,NA))))
df2<-data2%>%
mutate(Indicator_Institution=ifelse(Indicator_A=="X",1,
ifelse(Indicator_B=="X",2,
ifelse(Indicator_C=="X",3,NA))))
df1
's ifelse
function went through all the conditions, while df2
only run the first condition. Any idea why? What difference does the argument na.string=""
make?
Reproducible example:
> dput(droplevels(head(data1)))
structure(list(Indicator_A = c("X", "X", "X", "X", "", ""),
Indicator_B = c("", "", "", "", "X", "X"), Indicator_C = c("",
"", "", "", "", "")), .Names = c("Indicator_A", "Indicator_B",
"Indicator_C"), row.names = c(NA, 6L), class = "data.frame")
> dput(droplevels(head(data2)))
structure(list(Indicator_A = c("X", "X", "X", "X", NA, NA),
Indicator_B = c(NA, NA, NA, NA, "X", "X"), Indicator_C = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_)), .Names = c("Indicator_A", "Indicator_B",
"Indicator_C"), row.names = c(NA, 6L), class = "data.frame")
The reason is that for the second case instead of blanks, we have NA
. If we use ==
, the NA values will remain as such. To make those values as FALSE, use the &
with !is.na
data2 %>%
mutate(Indicator_Institution = ifelse(Indicator_A == "X" & !is.na(Indicator_A), 1,
ifelse(Indicator_B=="X" & !is.na(Indicator_B), 2,
ifelse(Indicator_C == "X" & !is.na(Indicator_C), 3,
NA))))
Based on the example provided, this can be done easily with which
which(!is.na(data2), arr.ind=TRUE)[,2]