# How to find meaningful boundaries between two continuous variables in R

To find the relationship between two columns of the iris dataset, I am running `kruskal.test`, and its p-value suggests a significant relationship between these two columns.

```
data(iris)
kruskal.test(iris$Petal.Length, iris$Sepal.Width)
```

Here are the results:

```
Kruskal-Wallis rank sum test
data: iris$Petal.Length and iris$Sepal.Width
Kruskal-Wallis chi-squared = 41.827, df = 22, p-value = 0.00656
```
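The `df = 22` in this output comes from how `kruskal.test(x, g)` works: its second argument is a grouping factor, so each distinct `Sepal.Width` value is treated as a separate group. The sketch below checks that, and then shows Spearman's rank correlation (`cor.test(..., method = "spearman")`) as one rank-based option that treats both variables as continuous. Spearman is swapped in here as an assumption (it measures a monotone association); it is not part of the original question.

```r
data(iris)

# kruskal.test(x, g) coerces g to a factor, so every distinct
# Sepal.Width value becomes its own group
n_groups <- length(unique(iris$Sepal.Width))
kt <- kruskal.test(iris$Petal.Length, iris$Sepal.Width)
print(n_groups)               # number of groups that were compared
print(unname(kt$parameter))   # degrees of freedom = n_groups - 1

# Assumption: a monotone association is what "relationship" means here.
# Spearman's rank correlation treats both variables as continuous;
# exact = FALSE avoids the warning about tied values.
sp <- cor.test(iris$Petal.Length, iris$Sepal.Width,
               method = "spearman", exact = FALSE)
print(sp$estimate)
print(sp$p.value)
```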

A scatter plot also suggests some sort of relationship:
`plot(iris$Petal.Length, iris$Petal.Width)`

To find meaningful boundaries between these two variables, I ran `pairwise.wilcox.test`, but for this test to work, one of the variables needs to be categorical. If I pass two continuous variables to it, the results are not as expected.

```
pairwise.wilcox.test(x = iris$Petal.Length, g = iris$Petal.Width, p.adjust.method = "BH")
```
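Since `g` in `pairwise.wilcox.test()` is a grouping factor, one way to use it with a continuous variable is to bin that variable first. The sketch below assumes quartile bins of `Petal.Width`; that choice is arbitrary, and other breaks give other groups and other p-values. `exact = FALSE` is passed through to `wilcox.test()` to avoid warnings about ties.

```r
data(iris)

# Assumption: bin the continuous grouping variable into quartiles;
# the choice of breaks is arbitrary and changes the result
breaks <- quantile(iris$Petal.Width, probs = seq(0, 1, by = 0.25))
pw_group <- cut(iris$Petal.Width, breaks = breaks, include.lowest = TRUE)
print(table(pw_group))

# pairwise Wilcoxon tests of Petal.Length between the Petal.Width bins,
# with Benjamini-Hochberg adjustment as in the question
res <- pairwise.wilcox.test(x = iris$Petal.Length, g = pw_group,
                            p.adjust.method = "BH", exact = FALSE)
print(res)
```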

What I need as output is a clear cut-off point: where these two variables have some sort of relationship and where that relationship ends (as shown by the red line in the attached image above).

I am not sure whether there is a statistical test or another programming technique for finding these boundaries.

For example, I can mark the boundaries manually like this:

```
library(data.table)   # provides setDT() and :=
setDT(iris)[, relationship := ifelse(Petal.Length > 3 & Sepal.Width < 3.5, 1, 0)]
```

Is there a programming technique or R package that finds such boundaries automatically?

Note that my actual data is skewed.

Thanks, Saurabh

## Solution

There is no single best split. A split can only be the best with respect to the conditions or criteria you specify.
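To make "best under a criterion" concrete, here is a minimal base-R sketch. It assumes one particular criterion, chosen only for illustration: the number of misclassified flowers when a single threshold on `Petal.Length` is used to separate setosa from the other two species. Swapping in another criterion (a different pair of classes, a weighted error, Gini impurity) can move the cut point.

```r
data(iris)

# Assumption: criterion = number of errors when predicting "setosa"
# from a single Petal.Length threshold
is_setosa <- iris$Species == "setosa"
vals <- sort(unique(iris$Petal.Length))
# candidate thresholds: midpoints between neighbouring observed values
cuts <- (head(vals, -1) + tail(vals, -1)) / 2

# rule: predict "setosa" when Petal.Length < threshold, then count errors
errors <- sapply(cuts, function(threshold) {
  sum((iris$Petal.Length < threshold) != is_setosa)
})

best_cut <- cuts[which.min(errors)]
print(best_cut)
print(min(errors))
```

On iris this picks a cut of 2.45 with zero errors, because every setosa petal (at most 1.9) is shorter than every other petal (at least 3.0).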

I think you were expecting the second plot, but I included the first one as well, where you have one line. The code uses Linear Discriminant Analysis (LDA). However, this is supervised learning, because it relies on the Species column as labels. You might also be interested in k-nearest neighbours (kNN) and its decision boundaries; for that, check https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o. Note that kNN classification is supervised as well; if your real data has no labels, an unsupervised method such as clustering (for example k-means) is the closer fit.

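```
data(iris)
library(MASS)   # provides lda()

# construct the model
mdl <- lda(Species ~ Petal.Length + Petal.Width, data = iris)

# prediction grid over the two predictors
np <- 300
nd.x <- seq(from = min(iris$Petal.Length), to = max(iris$Petal.Length), length.out = np)
nd.y <- seq(from = min(iris$Petal.Width), to = max(iris$Petal.Width), length.out = np)
nd <- expand.grid(Petal.Length = nd.x, Petal.Width = nd.y)
prd <- as.numeric(predict(mdl, newdata = nd)$class)

# plot 1: data, class means and the discrimination lines
plot(iris[, c("Petal.Length", "Petal.Width")], col = iris$Species)
points(mdl$means, pch = "+", cex = 3, col = 1:3)
# classes are coded 1, 2, 3, so the borders lie at 1.5 and 2.5
contour(x = nd.x, y = nd.y, z = matrix(prd, nrow = np, ncol = np),
        levels = c(1.5, 2.5), add = TRUE, drawlabels = FALSE)

# plot 2: the same borders in discriminant (LD1/LD2) space
lds <- predict(mdl)$x   # LD scores of the observations
p.x <- seq(from = min(lds[, 1]), to = max(lds[, 1]), length.out = np)  # LD1 scores
p.y <- seq(from = min(lds[, 2]), to = max(lds[, 2]), length.out = np)  # LD2 scores
ld.grid <- as.matrix(expand.grid(LD1 = p.x, LD2 = p.y))
# map the LD grid back to the original variables (scaling is 2 x 2 here),
# undoing the centring that predict() applies, then classify it
centre <- colSums(mdl$prior * mdl$means)
back <- sweep(ld.grid %*% solve(mdl$scaling), 2, centre, "+")
colnames(back) <- c("Petal.Length", "Petal.Width")
ld.prd <- as.numeric(predict(mdl, newdata = as.data.frame(back))$class)
plot(lds, col = iris$Species)
contour(x = p.x, y = p.y, z = matrix(ld.prd, nrow = np, ncol = np),
        levels = c(1.5, 2.5), add = TRUE, drawlabels = FALSE)
```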

Linked to: How to plot classification borders on an Linear Discrimination Analysis plot in R
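The kNN decision boundaries mentioned above can be drawn with the same grid-and-`contour()` idea as the LDA code. This is a minimal sketch that assumes the `class` package (a recommended package shipped with R) and its `knn()` function. `k = 15`, the grid size and the use of unscaled features are arbitrary choices here, and each of them moves the boundary. Like LDA, this still needs the Species labels.

```r
data(iris)
library(class)   # provides knn()

set.seed(1)      # knn() breaks voting ties at random

# grid covering the two petal measurements
np <- 200
nd.x <- seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = np)
nd.y <- seq(min(iris$Petal.Width), max(iris$Petal.Width), length.out = np)
nd <- expand.grid(Petal.Length = nd.x, Petal.Width = nd.y)

# classify every grid point by its k nearest training flowers
# (assumption: k = 15, features left unscaled)
train <- iris[, c("Petal.Length", "Petal.Width")]
prd <- as.numeric(knn(train = train, test = nd, cl = iris$Species, k = 15))

# class codes are 1, 2, 3, so the borders lie at 1.5 and 2.5
plot(train, col = iris$Species)
contour(x = nd.x, y = nd.y, z = matrix(prd, nrow = np, ncol = np),
        levels = c(1.5, 2.5), add = TRUE, drawlabels = FALSE)
```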
