I have a gene expression dataset covering many genes across several conditions. Focusing on one condition ("DD" for DMSO DMSO), I made a file called "DD_violin_subset" with a table of the enzymes and their expression (TPS), like this:
enzyme | TPS |
---|---|
A4GALT | 0,705748 |
AACS | 39,42209 |
AADAT | 3,619634 |
AAK1 | 16,64294 |
AARS1 | 566,514 |
I would like to make a violin plot of this data, but I keep getting errors.
I already tried several things:
```
> DD_violin_subset <- fread("C:\\Users\\maris\\Desktop\\DD_violin_subset.csv")
> names(DD_violin_subset)
[1] "enzyme" "TPS"
> str(DD_violin_subset)
Classes ‘data.table’ and 'data.frame': 5 obs. of 2 variables:
 $ enzyme: chr "A4GALT" "AACS" "AADAT" "AAK1" ...
 $ TPS   : chr "0,705747614" "39,42209011" "3,619633792" "16,64293604" ...
 - attr(*, ".internal.selfref")=<externalptr>
```
As you can see, it reads TPS as character, even though it should be read as numeric values. I think this is what is causing my errors, so I tried to convert them to numeric values:
```
> ggplot(DD_violin_subset, aes(y= as.numeric(TPM)))+geom_violin()
Error in `geom_violin()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `FUN()`:
! object 'TPM' not found
Run `rlang::last_trace()` to see where the error occurred.
```
So I thought, since TPM is not found, maybe I should specify the dataset again?
```
> ggplot(DD_violin_subset, aes(y= as.numeric(DD_violin_subset$TPM)))+geom_violin()
Error in `geom_violin()`:
Problem while computing aesthetics.
Error occurred in the 1st layer.
Caused by error in `check_aesthetics()`:
Aesthetics must be either length 1 or the same as the data (5)
Fix the following mappings: `y`
Run `rlang::last_trace()` to see where the error occurred.
```
But no. TPS keeps on being "not found" and the values are still read as characters whilst they should be numeric. I fear this might be the reason why I can't get my plots, but I might be totally in the wrong. I'm new to R and would really appreciate your help! Does anyone have any ideas how I could get my violin plots?
Thanks a lot for your help and effort!
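First, you are right that you need to convert TPS to numeric. Because the values use a decimal comma, `as.numeric` alone would return `NA`; you would first have to replace the comma with a period. Preferably, specify the `dec` argument so that `fread` reads commas as the decimal mark.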
You would probably want something like this:
```r
fread(path, dec = ",")
```
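As a minimal sketch of both options: the inline `csv_lines` vector below is a hypothetical stand-in for the real file, and the `;` separator is an assumption (it is common when the decimal mark is a comma, but the question does not confirm it).

```r
library(data.table)

# Hypothetical stand-in for DD_violin_subset.csv; the ";" separator is an assumption.
csv_lines <- c(
  "enzyme;TPS",
  "A4GALT;0,705748",
  "AACS;39,42209",
  "AADAT;3,619634",
  "AAK1;16,64294",
  "AARS1;566,514"
)

# Option 1: tell fread that "," is the decimal mark, so TPS is parsed as numeric.
dd <- fread(text = csv_lines, sep = ";", dec = ",")
str(dd)

# Option 2: the column was already read as character; replace the comma with a
# period before calling as.numeric().
dd_chr <- fread(text = csv_lines, sep = ";",
                colClasses = c("character", "character"))
dd_chr[, TPS := as.numeric(sub(",", ".", TPS, fixed = TRUE))]
```

Either way, `str()` should then report `TPS` as `num` rather than `chr`.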
Second and third, you are not specifying the x axis in your ggplot call, so how should the violin plots be separated? You also don't need the `$` notation inside `aes()`, because ggplot2 uses data masking. Note too that the column is named `TPS`, while your calls refer to `TPM`, which is why the object is not found. So the following should be enough:
```r
ggplot(data = DD_violin_subset, aes(x = enzyme, y = TPS)) + geom_violin()
```
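One caveat, offered as a hedged sketch rather than a definitive fix: a violin is a density estimate, so it needs several observations per group. In the table shown, each enzyme has a single TPS value, so one violin per enzyme may come out empty. If the goal is the distribution of TPS across all enzymes in the DD condition, a constant x value is one possible approach. The data below is typed in by hand from the question's table, and the axis labels are my own choice.

```r
library(data.table)
library(ggplot2)

# The five rows from the question, with TPS already numeric.
dd <- data.table(
  enzyme = c("A4GALT", "AACS", "AADAT", "AAK1", "AARS1"),
  TPS    = c(0.705748, 39.42209, 3.619634, 16.64294, 566.514)
)

# A constant x ("DD") puts all enzymes into a single group, giving one violin
# for the whole condition instead of one per enzyme.
p <- ggplot(dd, aes(x = "DD", y = TPS)) +
  geom_violin() +
  labs(x = "condition", y = "TPS")

print(p)
```

With the full dataset and a column identifying the condition, that column could be mapped to `x` instead to get one violin per condition.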