EDIT: I made a mistake in the code I shared previously: I replaced "bins" with "b" but missed one occurrence.
I also use the correct data.frame now (y instead of the original df.score).
NEW code:
# some data
x <- runif(1000)
x2 <- rnorm(1000)
y <- data.frame(x,x2)
# we want to bin the data frame y according to the values in x into b bins
b = 10
bins=10
# we create breaks in several ways
breaks=unique(quantile(x, probs=seq.int(0,1, by=1/b)))
breaks=unique(quantile(y$x, probs=seq.int(0,1, length.out=b+1)))
# now to the question
# this works
y$b <- with(y, cut(x, breaks=unique(quantile(x, probs=seq.int(0,1, length.out=11))), include.lowest=TRUE))
table(y$b)
# this works too
y$b2 <- with(y, cut(x, breaks=unique(quantile(x, probs=seq.int(0,1, length.out=(bins+1)))), include.lowest=TRUE))
table(y$b2)
# this does not work
y$b3 <- with(y, cut(x, breaks=unique(quantile(x, probs=seq.int(0,1, length.out=(b+1)))), include.lowest=TRUE))
Error in seq.int(0, 1, length.out = (b + 1)) :
'length.out' must be a non-negative number
In addition: Warning message:
In Ops.factor(b, 1) : + not meaningful for factors
Now, if I split the code up, there is no issue:
brks=unique(quantile(x, probs=seq.int(0,1, length.out=(b + 1))))
y$b3 <- with(y, cut(x, breaks=brks, include.lowest=TRUE))
I am lost here...
This is part of more dynamic code, knitted together based on details in the data set.
So I want to create bins on the fly and report on them. The code works now, but I do not understand why it works when I use the name "bins" and fails when I use "b".
OLD from here:
I need to add bins dynamically to a data frame so I can report on them later.
# some data
x <- runif(1000)
x2 <- rnorm(1000)
y <- data.frame(x,x2)
# we want to bin the data frame y according to the values in x into b bins
b = 10
# we create breaks in several ways
breaks=unique(quantile(x, probs=seq.int(0,1, by=1/b)))
breaks=unique(quantile(y$x, probs=seq.int(0,1, length.out=b+1)))
# now to the question
# this works
y$bins <- with(df.score, cut(x, breaks=unique(quantile(Pchurn, probs=seq.int(0,1, length.out=11))), include.lowest=TRUE))
table(y$bins)
But if I do exactly the same using the bins variable directly, it fails:
# this does not work
y$bins <- with(df.score, cut(x, breaks=unique(quantile(Pchurn, probs=seq.int(0,1, length.out=bins+1))), include.lowest=TRUE))
Error in seq.int(0, 1, length.out = (bins + 1)) :
'length.out' must be a non-negative number
In addition: Warning message:
In Ops.factor(bins, 1) : + not meaningful for factors
What am I missing here?
I think you want this (substituting b for bins in the length.out calculation just below "# this does not work"):
y$bins <- with(df.score, cut(x,
                             breaks=unique(quantile(Pchurn,
                                                    probs=seq.int(0,1, length.out=b+1))),
                             include.lowest=TRUE))
Hard to test without a score variable and a more complete description of the goals, but at least the code does not throw an error with this in the workspace:
df.score=data.frame(Pchurn=rnorm(100), x=rnorm(100))
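As for why one name works and the other does not, the warning gives a strong hint: `Ops.factor(b, 1)` means `b` was a factor at the moment `b + 1` was evaluated. `with(y, ...)` looks names up among the columns of `y` first and only afterwards in the calling environment, so as soon as `y$b` exists it masks the global `b`. In the NEW code `bins` keeps working because `y` has no column of that name. Below is a minimal sketch of the effect, reusing the names from the question; the seed and the name `n_bins` are my own additions for illustration.

```r
set.seed(1)                      # assumption: seed added only for reproducibility
x  <- runif(1000)
x2 <- rnorm(1000)
y  <- data.frame(x, x2)
b  <- 10                         # global number of bins

# y has no column called b yet, so with() falls through to the global b.
y$b <- with(y, cut(x,
                   breaks = unique(quantile(x, probs = seq.int(0, 1, length.out = b + 1))),
                   include.lowest = TRUE))

# Now y$b exists, so inside with(y, ...) the name b refers to that factor column.
class(with(y, b))                # "factor"

# Computing the breaks outside with(), under a name that is not a column,
# avoids the clash.
n_bins <- 10                     # hypothetical name, chosen so it cannot collide with a column
brks   <- unique(quantile(y$x, probs = seq.int(0, 1, length.out = n_bins + 1)))
y$b3   <- cut(y$x, breaks = brks, include.lowest = TRUE)
table(y$b3)
```

This would also explain why splitting the call up works: `brks` is computed at top level, where `b` is still the number 10. The OLD code would fail in the same way if `df.score` contained a factor column named `bins`, which I cannot verify from the post.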