I have a data frame holding the frequency of each variable (for example, variable 1 appears 1984 times, variable 2 appears 974 times, and so on):
dff <- data.frame(
  Var1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 27, 30, 35, 36, 38, 39, 40, 41),
  Freq = c(1984, 974, 464, 251, 127, 83, 45, 26, 16, 12, 9, 5, 5, 2, 3, 1, 1, 1, 1, 2, 5, 4, 2, 1)
)
plot(dff$Var1,log(dff$Freq))
As the plot shows, the first part of the data follows a linear trend on the log scale.
I just want a method that fits only that linear part, up to where the fitted line intersects the x axis.
I need this so that I can extract the remaining points that the linear fit does not cover (the points after dff$Var1 = 20), that is, the points that are not noise.
What about this:
plot(dff$Var1, log(dff$Freq))
lr <- lm(log(Freq) ~ Var1, data = dff[dff$Var1 < 20, ])
abline(lr)
The cutoff here is 20, but you can vary it depending on what you are doing.
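To pull out the points that the fit leaves aside, one option is to keep the cutoff in a variable and reuse the same logical index for both the fit and the extraction. This is only a sketch: the names `cutoff`, `in_fit` and `leftover` are made up for illustration, and 20 is the assumed threshold from the question.

```r
# Data from the question (1:16 is shorthand for Var1 = 1, 2, ..., 16)
dff <- data.frame(
  Var1 = c(1:16, 27, 30, 35, 36, 38, 39, 40, 41),
  Freq = c(1984, 974, 464, 251, 127, 83, 45, 26, 16, 12, 9, 5,
           5, 2, 3, 1, 1, 1, 1, 2, 5, 4, 2, 1)
)

cutoff <- 20                    # assumed threshold, adjust as needed
in_fit <- dff$Var1 < cutoff     # TRUE for rows used in the linear fit

# Fit log(Freq) against Var1 on the rows below the cutoff only
lr <- lm(log(Freq) ~ Var1, data = dff[in_fit, ])

# Rows that were not used in the fit
leftover <- dff[!in_fit, ]
leftover
```

Changing `cutoff` then updates the fitted subset and the extracted points together.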
If you want to calculate where the linear fit intersects the x axis, get the coefficients:
coef(lr)
(Intercept) Var1
7.4636699 -0.4741615
Then solve the equation 7.4636699 + Var1 * (-0.4741615) = 0 for Var1, which gives Var1 = 7.4636699 / 0.4741615 ≈ 15.74.
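The same root can be computed directly from the fitted model instead of by hand, since the line crosses zero at minus the intercept divided by the slope. A minimal sketch, assuming the data and the cutoff of 20 from above (`b` and `x_intercept` are illustrative names):

```r
# Data from the question (1:16 is shorthand for Var1 = 1, 2, ..., 16)
dff <- data.frame(
  Var1 = c(1:16, 27, 30, 35, 36, 38, 39, 40, 41),
  Freq = c(1984, 974, 464, 251, 127, 83, 45, 26, 16, 12, 9, 5,
           5, 2, 3, 1, 1, 1, 1, 2, 5, 4, 2, 1)
)

# Same fit as in the answer: only rows with Var1 < 20
lr <- lm(log(Freq) ~ Var1, data = dff[dff$Var1 < 20, ])

# Solve intercept + slope * Var1 = 0 for Var1
b <- coef(lr)
x_intercept <- unname(-b["(Intercept)"] / b["Var1"])
x_intercept
```

Note that this is where the fitted log(Freq) reaches 0, i.e. where the fitted frequency drops to 1.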