I have colored a graph with ggplot2 based on a threshold value of 1. Surface scores greater than 1 are colored azure and surface scores less than 1 are colored beige. Here is my sample code.
library(ggplot2)
setwd("F:/SUST_mutation/Graph_input")
d <- read.csv(file = "N.csv", sep = ",", header = TRUE)
# Column name matches the header of the sample data (Wild_Score); alpha must lie in [0, 1]
ggplot(d, aes(x = Position, y = Wild_Score)) + xlab("Positions") + ylab("Scores") +
  geom_ribbon(aes(ymin = pmin(Wild_Score, 1), ymax = 1), fill = "beige", alpha = 1) +
  geom_ribbon(aes(ymin = 1, ymax = pmax(Wild_Score, 1)), fill = "azure", alpha = 1)
My problem is that when the curve passes from the upper surface to the lower surface, I expect the two surfaces to meet along a single line.
As the figure shows, they do not. Around the threshold line, the lower surface does not meet the upper surface but creates some extra surface instead. For convenience, I have marked those portions with red circles.
[Figure: extra surface on the negative portion close to the threshold]
Sample data:
Position Wild_Score
4 1.048
5 1.052
6 1.016
7 0.996
8 0.97
9 0.951
10 0.971
11 1.047
12 1.036
13 1.051
14 1.124
15 1.172
16 1.172
17 1.164
18 1.145
19 1.186
20 1.197
21 1.197
22 1.216
23 1.193
24 1.216
25 1.216
26 1.262
Problem-2: I have a data frame like the following.
Position Score_1 Score_2
4 1.048 1.048
5 1.052 1.052
6 1.016 1.016
7 0.996 1.433
8 0.97 1.432
9 0.951 1.567
10 0.971 1.231
11 1.047 1.055
12 1.036 1.036
13 1.051 1.051
14 1.124 1.124
15 1.172 1.172
16 1.172 1.172
17 1.164 1.164
I plot the surface for Position vs. Score_1 from an interpolated tibble, and on that surface I draw a line graph of the same positions vs. Score_2, like the following.
[Figure: desired graph]
As the line differs only at some points, I subsetted the main dataset (both columns and rows). I get the following error:
"Error: Aesthetics must be either length 1 or the same as the data (13): x"
I guess this is because I used two different data frames for the graphs. Here is my code:
d <- read.csv(file = "E.csv", sep = ",", header = TRUE)
d1 <- tibble::tibble(
x = seq(min(d$Position), max(d$Position), length.out = 1000),
y = approx(d$Position, d$Score_1, xout = x)$y
)
ggplot(d1, aes(x = x, y = y)) + xlab("Positions") + ylab("Scores") +
  geom_ribbon(aes(ymin = pmin(y, 1), ymax = 1), fill = "red", alpha = 1) +
  geom_ribbon(aes(ymin = 1, ymax = pmax(y, 1)), fill = "blue", alpha = 1) +
  geom_line(aes(y = 1)) +
  geom_line(d = d[c(3:10), c(1, 3)], aes(y = Score_2), color = "blue", size = 1)
I want to know what is causing the problem and how I should deal with it.
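It's because the negative surface between, for example, rows 3 and 4 (positions 6 and 7) starts at 1 and goes to 0.996, instead of going from 1.016 to 0.996: pmin(Wild_Score, 1) clamps the 1.016 to 1 before the ribbon is drawn, so the ribbon's edge no longer follows the data line. Relevant discussion and other examples can be found on ggplot2's issue tracker.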
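To make that concrete, here is a small worked example for the segment between positions 6 and 7 of the sample data. It is only a sketch and assumes straight-line interpolation between neighbouring observations, which is how the line itself is drawn; the variable names are made up for illustration.

```r
# Positions 6 and 7 from the sample data straddle the threshold of 1
x <- c(6, 7)
y <- c(1.016, 0.996)
threshold <- 1

# Edge used by the lower ribbon: pmin() clamps 1.016 down to 1,
# so the ribbon edge runs from (6, 1) to (7, 0.996)
lower_edge <- pmin(y, threshold)
lower_edge

# Where the data line really crosses the threshold (linear interpolation)
x_cross <- x[1] + (threshold - y[1]) / (y[2] - y[1]) * (x[2] - x[1])
x_cross
```

The line only drops below 1 at roughly x = 6.8, but the lower ribbon is already being filled from x = 6. The sliver between the ribbon edge and the true line is the extra surface circled in the figure.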
This problem is typically only visible if the number of observations is small-ish, so the typical way people overcome it is to interpolate the data onto a much finer grid, which shrinks the extra surface until it is no longer noticeable. You can find an example of that below (I've omitted your colours because they were hard to see):
library(ggplot2)
# txt <- "your_example_table" # Omitted for brevity
df <- read.table(text = txt, sep = "\t", header = TRUE)
data2 <- tibble::tibble(
x = seq(min(df$Position), max(df$Position), length.out = 1000),
y = approx(df$Position, df$Wild_Score, xout = x)$y
)
ggplot(data2, aes(x= x,y= y)) + xlab("Positions") + ylab("Scores") +
geom_ribbon(aes(ymin=pmin(y,1), ymax=1, fill = "A")) +
geom_ribbon(aes(ymin=1, ymax=pmax(y,1), fill = "B"))
This is great for hiding the problem, but calculating the exact line intersection points is a bit of a pain. I apologise for the self-promotion but I ran into this too and wrapped my solution for finding these line intersection points in a function on the dev version of my package ggh4x, which you might find useful.
library(ggh4x) # devtools::install_github("teunbrand/ggh4x")
ggplot(df, aes(x= Position,y= Wild_Score)) +
stat_difference(aes(ymin = 1, ymax = Wild_Score))
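For readers who prefer not to depend on a development version of a package, the crossing points can also be inserted by hand. The following is a minimal base R sketch, assuming linear interpolation between neighbouring observations; add_crossings() is a hypothetical helper written for this illustration and is not part of ggplot2 or ggh4x. The data are the first ten rows of the sample table.

```r
# Hypothetical helper: add the points where the line crosses `threshold`
add_crossings <- function(x, y, threshold = 1) {
  n <- length(x)
  lo <- seq_len(n - 1)  # index of the left end of each segment
  hi <- lo + 1          # index of the right end of each segment
  # A segment crosses when its two ends lie strictly on opposite sides
  crosses <- (y[lo] - threshold) * (y[hi] - threshold) < 0
  # Fraction of the way along each crossing segment where y == threshold
  t <- (threshold - y[lo][crosses]) / (y[hi][crosses] - y[lo][crosses])
  x_new <- x[lo][crosses] + t * (x[hi][crosses] - x[lo][crosses])
  out <- data.frame(
    x = c(x, x_new),
    y = c(y, rep(threshold, length(x_new)))
  )
  out[order(out$x), ]
}

# First ten rows of the sample data
df <- data.frame(
  Position = 4:13,
  Wild_Score = c(1.048, 1.052, 1.016, 0.996, 0.97,
                 0.951, 0.971, 1.047, 1.036, 1.051)
)
df2 <- add_crossings(df$Position, df$Wild_Score)
df2
```

Feeding df2 (with x and y as the aesthetics) to the two geom_ribbon() layers from the interpolation example should then make both ribbons end exactly at the crossings, because every segment now lies entirely on one side of the threshold.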
Created on 2021-08-15 by the reprex package (v1.0.0)
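On Problem-2, a likely cause (an assumption based on the posted code and error message, not something confirmed in the thread) is that the extra geom_line() layer inherits x = x from the main ggplot() call, while the subsetted data frame has no column named x. ggplot2 then picks up some other object called x whose length does not match the layer's data. A sketch of one way around it is to give that layer its own complete mapping; the data frame below is typed in from the Problem-2 table, and linewidth assumes ggplot2 3.4.0 or later (use size on older versions).

```r
library(ggplot2)

# Problem-2 table, typed in by hand
d <- data.frame(
  Position = 4:17,
  Score_1 = c(1.048, 1.052, 1.016, 0.996, 0.97, 0.951, 0.971,
              1.047, 1.036, 1.051, 1.124, 1.172, 1.172, 1.164),
  Score_2 = c(1.048, 1.052, 1.016, 1.433, 1.432, 1.567, 1.231,
              1.055, 1.036, 1.051, 1.124, 1.172, 1.172, 1.164)
)

# Interpolated surface, as in the question
xs <- seq(min(d$Position), max(d$Position), length.out = 1000)
d1 <- data.frame(x = xs, y = approx(d$Position, d$Score_1, xout = xs)$y)

p <- ggplot(d1, aes(x = x, y = y)) +
  xlab("Positions") + ylab("Scores") +
  geom_ribbon(aes(ymin = pmin(y, 1), ymax = 1), fill = "red") +
  geom_ribbon(aes(ymin = 1, ymax = pmax(y, 1)), fill = "blue") +
  geom_hline(yintercept = 1) +
  # This layer uses a different data frame, so map both x and y explicitly
  # and do not inherit the plot-level aesthetics
  geom_line(
    data = d[3:10, c("Position", "Score_2")],
    aes(x = Position, y = Score_2),
    inherit.aes = FALSE,
    colour = "blue", linewidth = 1
  )
p
```

Using two data frames in one plot is fine in itself; each layer just needs aesthetics that refer to columns present in its own data.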