I'm trying to compute the number of rows with NA of the whole df as I'm looking to compute the % of rows with NA over the total number of rows of the df.
I have already have seen this post: Determine the number of rows with NAs but it just shows a specific range of columns.
tl;dr: row wise, you'll want sum(!complete.cases(DF))
, or, equivalently, sum(apply(DF, 1, anyNA))
There are a number of different ways to look at the number, proportion or position of NA
values in a data frame:
Most of these start with the logical data frame with TRUE
for every NA
, and FALSE
everywhere else. For the base dataset airquality
is.na(airquality)
There are 44 NA
values in this data set
sum(is.na(airquality))
# [1] 44
You can look at the total number of NA
values per row or column:
head(rowSums(is.na(airquality)))
# [1] 0 0 0 0 2 1
colSums(is.na(airquality))
# Ozone Solar.R Wind Temp Month Day
37 7 0 0 0 0
You can use anyNA()
in place of is.na()
as well:
# by row
head(apply(airquality, 1, anyNA))
# [1] FALSE FALSE FALSE FALSE TRUE TRUE
sum(apply(airquality, 1, anyNA))
# [1] 42
# by column
head(apply(airquality, 2, anyNA))
# Ozone Solar.R Wind Temp Month Day
# TRUE TRUE FALSE FALSE FALSE FALSE
sum(apply(airquality, 2, anyNA))
# [1] 2
complete.cases()
can be used, but only row-wise:
sum(!complete.cases(airquality))
# [1] 42