I am trying to calculate the average duration (a continuous variable) of UFO sightings for each shape (a categorical variable). Essentially, what is the average sighting length for each UFO shape?
I tried:
a <- aggregate(duration..seconds. ~ shape, data=alien, FUN=mean, na.rm=TRUE)
barplot(a$duration..seconds., names.arg=a$shape)
and got:
no non-missing arguments to min; returning Inf
no non-missing arguments to max; returning -Inf
Error in plot.window(xlim, ylim, log = log, ...) : need finite 'ylim' values
I realize that I need to alter my data somehow. I would like to simply remove every row that is missing a value (i.e., we know the shape but the duration is missing, or vice versa), but I don't quite know how to do this.
Thanks for your help!
PS. The "duration..seconds." name is correct; that is how it came over from the Excel file.
shape duration..seconds.
us changing 3600 NA 4/27/2004 29.8830556
us changing 300 NA 12/16/2005 29.38421
us changing 3600 NA 1/21/2008 53.2
us changing 900 NA 1/17/2004 28.9783333
ca changing 1200 NA 1/22/2004 21.4180556
us changing 3600 NA 4/27/2007 36.595
There are 80000 logs of UFO sightings, which is why I am trying to average it. And there are 29 different shapes.
Data
df <- read.table(text="
country shape duration_seconds dummy1 date dummy2
us changing 3600 NA 4/27/2004 29.8830556
us changing 300 NA 12/16/2005 29.38421
us changing 3600 NA 1/21/2008 53.2
us changing 900 NA 1/17/2004 28.9783333
ca changing 1200 NA 1/22/2004 21.4180556
us changing 3600 NA 4/27/2007 36.595
", header = TRUE, stringsAsFactors = FALSE)
You can fix the column titles with
names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")
Using the dplyr package:
library(dplyr)
df %>%
  group_by(shape) %>%
  # na.rm = TRUE skips missing durations instead of returning NA for the group
  summarize(mean_duration_seconds = mean(duration_seconds, na.rm = TRUE))
# shape mean_duration_seconds
# <chr> <dbl>
# 1 changing 2200.
And using the original code
names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")
a <- aggregate(duration_seconds ~ shape, data=df, FUN=mean, na.rm=TRUE)
barplot(a$duration_seconds, names.arg=a$shape)
a
# shape duration_seconds
# 1 changing 2200
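The sample above has no missing values, so it does not reproduce the error. One possible cause, and this is only a guess since the full data is not shown, is that the duration column was read in as text rather than numbers, so every mean comes back as NA and barplot has nothing finite to draw. Below is a minimal sketch of coercing the column to numeric and dropping rows where either the shape or the duration is missing. The `alien` data frame here is made up for illustration and is not the real data set.

```r
# Hypothetical sample with the problems described in the question:
# a blank duration, a missing shape, and a non-numeric duration entry.
alien <- data.frame(
  shape = c("changing", "changing", "circle", NA, "circle", "disk"),
  duration_seconds = c("3600", "300", "", "900", "1200", "2 min"),
  stringsAsFactors = FALSE
)

# Coerce to numeric; entries that are not plain numbers become NA.
alien$duration_seconds <- suppressWarnings(as.numeric(alien$duration_seconds))

# Keep only rows where both shape and duration are present.
clean <- alien[complete.cases(alien[, c("shape", "duration_seconds")]), ]

# Average duration per shape, then plot (las = 2 rotates the axis labels).
a <- aggregate(duration_seconds ~ shape, data = clean, FUN = mean)
barplot(a$duration_seconds, names.arg = a$shape, las = 2)
```

Running str(alien) on the real data before and after the conversion should show whether the column type was the problem.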