r ggplot2 character numeric violin-plot

How do I make a violin plot of gene expression data when my y-values need to be converted to numeric values first?


I have a gene expression dataset with many genes across several conditions. Focusing on one condition ("DD", for DMSO DMSO), I made a file called "DD_violin_subset" with a table of the enzymes and their expression (TPS), like this:

enzyme TPS
A4GALT 0,705748
AACS 39,42209
AADAT 3,619634
AAK1 16,64294
AARS1 566,514

I would like to make a violin plot of this data, but I keep getting errors.

I already tried several things:

>DD_violin_subset <- fread("C:\\Users\\maris\\Desktop\\DD_violin_subset.csv")

> names(DD_violin_subset)
[1] "enzyme" "TPS"   

> str(DD_violin_subset)
Classes ‘data.table’ and 'data.frame':  5 obs. of  2 variables:
 $ enzyme: chr  "A4GALT" "AACS" "AADAT" "AAK1" ...
 $ TPS   : chr  "0,705747614" "39,42209011" "3,619633792" "16,64293604" ...
 - attr(*, ".internal.selfref")=<externalptr> 

As you can see, TPS is read as character, even though it should be read as numeric. I think this is what is causing my errors, so I tried to convert the values to numeric:

> ggplot(DD_violin_subset, aes(y= as.numeric(TPM)))+geom_violin()
Error in `geom_violin()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `FUN()`:
! object 'TPM' not found
Run `rlang::last_trace()` to see where the error occurred.

So I thought, since TPM is not found, maybe I should specify the dataset again?


>ggplot(DD_violin_subset, aes(y= as.numeric(DD_violin_subset$TPM)))+geom_violin()
Error in `geom_violin()`:
Problem while computing aesthetics.
Error occurred in the 1st layer.
Caused by error in `check_aesthetics()`:
Aesthetics must be either length 1 or the same
as the data (5)
Fix the following mappings: `y`
Run `rlang::last_trace()` to see where the error occurred.

But no. TPS keeps on being "not found" and the values are still read as characters whilst they should be numeric. I fear this might be the reason why I can't get my plots, but I might be totally wrong. I'm new to R and would really appreciate your help! Does anyone have any ideas on how I could get my violin plots?

Thanks a lot for your help and effort!

Solution

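  • First, you are right that TPS needs to be numeric. Note that as.numeric alone is not enough here: strings with a decimal comma, such as "0,705748", are converted to NA. Preferably you would specify the dec argument when reading the file, so that commas are read as the decimal separator.

    You would probably want something like this: fread(path, dec = ",")

    Second, the "object 'TPM' not found" error comes from a typo: the column is called TPS, not TPM. For the same reason DD_violin_subset$TPM is NULL, which is why your second attempt complains about the length of the y aesthetic. Third, you are not specifying the x axis in your ggplot call, so how should the violin plots be separated? Finally, you don't need the $ notation inside aes(), because ggplot2 uses data masking to look up columns in the data. So the following should be enough.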

    ggplot(data = DD_violin_subset, aes(x = enzyme, y = TPS)) + geom_violin()
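    For completeness, here is a minimal end-to-end sketch using the five rows from the question. Two things in it are my assumptions rather than part of the original post: the data is built inline instead of read from the CSV, and a constant x value ("DD") is used so that all enzymes fall into a single violin. As far as I know, geom_violin() drops groups with fewer than two data points, so with one value per enzyme, x = enzyme would likely draw nothing useful.

```r
library(data.table)
library(ggplot2)

# Toy data copied from the question; TPS is stored as text with decimal commas.
DD_violin_subset <- data.table(
  enzyme = c("A4GALT", "AACS", "AADAT", "AAK1", "AARS1"),
  TPS    = c("0,705748", "39,42209", "3,619634", "16,64294", "566,514")
)

# Option 1: convert in place by swapping the decimal comma for a point first.
# as.numeric("0,705748") on its own would return NA with a warning.
DD_violin_subset[, TPS := as.numeric(sub(",", ".", TPS, fixed = TRUE))]

# Option 2 (when reading from disk): let fread parse the decimal comma directly.
# The file name here is a placeholder, not the path from the question.
# DD_violin_subset <- fread("DD_violin_subset.csv", dec = ",")

# Assumption: one violin for the whole condition. A constant x puts every
# enzyme into the same group, so the violin shows the distribution of TPS.
p <- ggplot(DD_violin_subset, aes(x = "DD", y = TPS)) +
  geom_violin()

print(p)
```

    If you later combine several conditions into one long table, you could map a hypothetical condition column to x instead of the constant, which would give one violin per condition.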