Initially I start out with two vectors (subsets of my data). I run ecdf on both and plot them in the same plot for ease of comparison. All of that works, but what I need to know is how to make the function work for any pair of vectors, so that I can just pass in the vectors and the function will work. For example, if I pass the vector with the larger values second, I want the x-axis of the plot to scale automatically to fit it, regardless of argument order, so that no data is lost.
I have included a setup using the iris data, just in case.
data(iris)
virg<-subset(iris, Species=="virginica"); virg
virg_pl<-virg$Petal.Length; virg_pl
versi<-subset(iris, Species=="versicolor"); versi
versi_pl<-versi$Petal.Length; versi_pl
Here is what I have:
twoecdfsoner<-function(x,y,z){
  ecdf1<-ecdf(x)
  ecdf2<-ecdf(y)
  # xlim below assumes y holds the minimum and x holds the maximum
  plot(ecdf1,xlab=head(z,n=1),
       ylab="cumulative relative frequency",
       lty=1,pch=".",
       main="",
       do.points=FALSE,
       verticals=TRUE,xlim=c(min(y),max(x)))
  plot(ecdf2,verticals=TRUE,
       do.points=FALSE,
       lty=3,pch=".",
       add=TRUE,xlim=c(min(y),max(x)))
  legend("right",
         legend=c(deparse(substitute(x)),
                  deparse(substitute(y))),
         lty=c(1,3),cex=0.8)
}
twoecdfsoner(virg_pl,versi_pl,"inches")
It seems like I could write a conditional statement, but I get these warnings:
Warning messages:
1: In x > y :
longer object length is not a multiple of shorter object length
2: In x > y :
longer object length is not a multiple of shorter object length
3: In x > y :
longer object length is not a multiple of shorter object length
So far I have tried:
xlim=c(min(y),max(x))
xlim=range(c(x),c(y))
xlim=pmax(x,y)
and writing conditional statements.
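A small sketch of why element-wise attempts misbehave, using two made-up vectors of unequal length (the values are hypothetical, not the iris subsets). Functions such as `>` and pmax() return one value per element and recycle the shorter vector, while xlim needs exactly two numbers and if () needs exactly one TRUE/FALSE.

```r
# Hypothetical vectors of unequal length (not the iris subsets)
x <- c(1, 5, 9)
y <- c(2, 3, 4, 12, 0)

# Element-wise: recycles the shorter vector (and warns about it),
# returning one value per element instead of an (xmin, xmax) pair
elementwise <- suppressWarnings(pmax(x, y))
length(elementwise)   # 5

# A single TRUE/FALSE, which is what if () needs
max(x) > max(y)       # FALSE

# Pooled minimum and maximum: two numbers, same result in either order
range(x, y)           # 0 12
range(y, x)           # 0 12
```

Because range() pools all of its arguments, xlim = range(x, y) does not depend on which vector is passed first.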
I would also like the solid line to represent the larger vector. Any suggestions would be greatly appreciated.
@42- After reading up a bit, I thought I could use a conditional statement, and this also seems to work. Do you have any critique of running the code this way?
twoecdfsoner<-function(x,y,z){
  # TRUE when x has the larger maximum, so x gets the solid line
  x_larger <- max(x) >= max(y)
  plot(ecdf(x),
       verticals=TRUE,
       pch=".",
       main="",
       do.points=FALSE,
       lty=if (x_larger) 1 else 3,
       xlab=head(z,n=1),
       ylab="Cumulative relative frequency",
       xlim=range(x,y))
  lines(ecdf(y),
        verticals=TRUE,
        do.points=FALSE,
        lty=if (x_larger) 3 else 1,
        pch=".")
  # list the solid-line (larger) vector first to match lty=c(1,3)
  legend_text <- if (x_larger) {
    c(deparse(substitute(x)), deparse(substitute(y)))
  } else {
    c(deparse(substitute(y)), deparse(substitute(x)))
  }
  legend("right",
         legend=legend_text,
         lty=c(1,3))
}
twoecdfsoner(virg_pl,versi_pl,"inches")
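One way to avoid repeating the condition in three places is to reorder the inputs once and then always draw the first vector solid. This is a sketch under my own assumptions: order_by_max() and twoecdfs_sorted() are hypothetical names, not part of base R or of the code above, and ties go to the first argument.

```r
# Hypothetical helper: put the vector whose maximum is larger first,
# keeping each vector paired with its label
order_by_max <- function(x, y, x_label, y_label) {
  if (max(x) >= max(y)) {
    list(first = x, second = y, labels = c(x_label, y_label))
  } else {
    list(first = y, second = x, labels = c(y_label, x_label))
  }
}

# Hypothetical plotting wrapper: the larger-max vector always gets the
# solid line (lty = 1), the other gets the dotted line (lty = 3)
twoecdfs_sorted <- function(x, y, z) {
  x_label <- deparse(substitute(x))
  y_label <- deparse(substitute(y))
  o <- order_by_max(x, y, x_label, y_label)
  plot(ecdf(o$first), verticals = TRUE, do.points = FALSE, lty = 1,
       main = "", xlab = z, ylab = "Cumulative relative frequency",
       xlim = range(x, y))
  lines(ecdf(o$second), verticals = TRUE, do.points = FALSE, lty = 3)
  legend("right", legend = o$labels, lty = c(1, 3))
}
```

With the question's data, twoecdfs_sorted(virg_pl, versi_pl, "inches") and twoecdfs_sorted(versi_pl, virg_pl, "inches") should draw the same picture, since both the ordering and the xlim are independent of argument order.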
There's a catch with ecdf: it hides the "x" values in the environment of the function it returns, so newbies cannot find them.
> ecdf(versi$Petal.Length)
Empirical CDF
Call: ecdf(versi$Petal.Length)
x[1:19] = 3, 3.3, 3.5, ..., 5, 5.1
> str(ecdf(versi$Petal.Length))
function (v)
- attr(*, "class")= chr [1:3] "ecdf" "stepfun" "function"
- attr(*, "call")= language ecdf(versi$Petal.Length)
At this point it would be instructive to run all the examples in the help page:
?ecdf # and probably also look at ?stepfun
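The ?stepfun page also documents knots(), which returns the jump points of a step function; for an ecdf these are the sorted unique data values, so it is a documented alternative to reaching into environment() directly. A minimal sketch, rebuilding versi_pl with base indexing instead of subset():

```r
# knots() returns the jump points of a step function; for an ecdf
# these are the sorted unique values of the data
versi_pl <- iris$Petal.Length[iris$Species == "versicolor"]
Fn <- ecdf(versi_pl)

jumps <- knots(Fn)
length(jumps)             # 19 distinct values, as in the printout above
range(jumps)              # same as range(versi_pl)
environment(Fn)$nobs      # number of observations used: 50
```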
Notice that the result is a function, so trying to take its min and/or max without giving it an argument will always fail. Once you find out how to access the environment of the ecdf result (see below), you would probably want the min and max of the concatenated values of the two vectors, rather than assuming that one holds the min and the other holds the max. Here's what's in the environments of the ecdf functions:
ls( environment(ecdf(versi_pl)) )
[1] "f" "method" "nobs" "x" "y" "yleft" "yright"
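To see the point about min and max concretely: min() refuses a closure, while calling the ecdf function on chosen values returns cumulative proportions. A minimal sketch, again rebuilding versi_pl with base indexing:

```r
# ecdf() returns a function, so min()/max() cannot be applied to it directly
versi_pl <- iris$Petal.Length[iris$Species == "versicolor"]
Fn <- ecdf(versi_pl)

is.function(Fn)                   # TRUE
res <- try(min(Fn), silent = TRUE)
inherits(res, "try-error")        # TRUE: min() does not accept a closure

# Calling the function gives the proportion of values <= each point
Fn(c(2, 5.1, 7))                  # 0 1 1: nothing below 3, all at or below 5.1
```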
twoecdfsoner<-function(x,y,z){
  ecdf1<-ecdf(x)
  x1 <- environment(ecdf1)$x   # sorted unique values of x
  ecdf2<-ecdf(y)
  x2 <- environment(ecdf2)$x   # sorted unique values of y
  plot(ecdf1,xlab=head(z,n=1),
       ylab="cumulative relative frequency",
       lty=1,pch=".",
       main="",
       do.points=FALSE,
       verticals=TRUE,xlim=c( min( c(x1,x2) ), max( c(x1,x2) ) ) )
  plot(ecdf2,verticals=TRUE,
       do.points=FALSE,
       lty=3,pch=".",
       add=TRUE, xlim=c( min( c(x1,x2) ), max( c(x1,x2) ) ) )
  legend("right",
         legend=c(deparse(substitute(x)),
                  deparse(substitute(y))),
         lty=c(1,3),cex=0.8)
}
twoecdfsoner(versi_pl, virg_pl,"inches")
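As a design note: the pooled ecdf values give the same x-limits as range() applied to the raw vectors, so xlim = range(x, y) is an equivalent shortcut that needs no environment access and does not depend on argument order. A small check of that claim with the question's data, rebuilt with base indexing:

```r
# The x-limits from the pooled ecdf jump points equal range() of the
# raw vectors, in either argument order
virg_pl  <- iris$Petal.Length[iris$Species == "virginica"]
versi_pl <- iris$Petal.Length[iris$Species == "versicolor"]

from_knots <- range(c(knots(ecdf(versi_pl)), knots(ecdf(virg_pl))))
from_data  <- range(versi_pl, virg_pl)
from_data_swapped <- range(virg_pl, versi_pl)

isTRUE(all.equal(from_knots, from_data))   # TRUE
identical(from_data, from_data_swapped)    # TRUE
```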