
Finding the trend in disease conditions in R


I have a dataset with several patients, their disease activity status, and the abundance of specific bacteria:

| **Patient** | **DiseaseActivity** | **Bacteria** |
|---|---|---|
| 15 | Severe | 0.6704 |
| 15 | Quiescent | 0.0350 |
| 24 | Quiescent | 0.0137 |
| 24 | Quiescent | 0.0088 |
| 26 | Quiescent | 0.0023 |
| 26 | Severe | 0.0410 |
| 33 | Quiescent | 0.2031 |
| 33 | Quiescent | 0.0893 |
| 37 | Quiescent | 0.0345 |
| 37 | Quiescent | 0.0031 |
| 52 | Quiescent | 0.0601 |
| 52 | Severe | 0.0200 |
| 53 | Severe | 0.0050 |
| 53 | Severe | 0.2724 |
| 69 | Severe | 0.9369 |
| 69 | Quiescent | 0.0008 |
| 2 | Severe | 0.0421 |
| 2 | Quiescent | 0.0120 |
| 12 | Severe | 0.3109 |
| 12 | Severe | 0.0646 |
| 40 | Quiescent | 0.8048 |
| 40 | Severe | 0.9113 |
| 51 | Severe | 0.1918 |
| 51 | Severe | 0.9538 |

Each patient has two samples, taken at two different time points. When I plot the patients one by one, I can see that when disease activity goes from Quiescent to Severe, the abundance of the bacteria increases, and when it goes from Severe to Quiescent, the abundance decreases, even though only 6 patients fall into this category (i.e. change status between the two time points).
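A quick way to check this claim is to compute, for each patient, the status transition and the change in abundance, and then compare the sign of the change with the expected direction. The base-R sketch below does that; like the answer below, it assumes each patient's two rows are listed in time order (time point 1 first), because the table has no time column.

```r
# Data from the question. The table has no time column, so each patient's
# two rows are ASSUMED to be in time order (time point 1, then 2).
status <- c("S", "Q", "Q", "Q", "Q", "S", "Q", "Q", "Q", "Q", "Q", "S",
            "S", "S", "S", "Q", "S", "Q", "S", "S", "Q", "S", "S", "S")
dat <- data.frame(
  Patient = rep(c(15, 24, 26, 33, 37, 52, 53, 69, 2, 12, 40, 51), each = 2),
  DiseaseActivity = ifelse(status == "S", "Severe", "Quiescent"),
  Bacteria = c(0.6704, 0.0350, 0.0137, 0.0088, 0.0023, 0.0410, 0.2031, 0.0893,
               0.0345, 0.0031, 0.0601, 0.0200, 0.0050, 0.2724, 0.9369, 0.0008,
               0.0421, 0.0120, 0.3109, 0.0646, 0.8048, 0.9113, 0.1918, 0.9538)
)

# One row per patient: status at each time point and the change in abundance
per_patient <- do.call(rbind, lapply(split(dat, dat$Patient), function(d) {
  data.frame(Patient = d$Patient[1],
             From = d$DiseaseActivity[1],
             To = d$DiseaseActivity[2],
             Change = d$Bacteria[2] - d$Bacteria[1])
}))

# Keep patients whose status changes and compare with the claimed direction:
# Quiescent -> Severe should increase abundance, Severe -> Quiescent decrease it
changers <- subset(per_patient, From != To)
changers$Expected <- ifelse(changers$To == "Severe", 1, -1)
changers$Consistent <- sign(changers$Change) == changers$Expected
print(changers)
```

For this table, six patients change status and five of them move in the claimed direction; the exception is patient 52 (Quiescent to Severe, with abundance falling from 0.0601 to 0.0200).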

How can I check whether this is really the case, at least for those 6 patients, and what type of test is appropriate for this kind of dataset? And if I want to plot the data, what would be the most accurate way to do it?

Thank you very much in advance.
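The answer below does not recommend a particular test, and the right choice depends on the study design. Purely as an illustration, assuming the two rows per patient are in time order and that only the six patients who change status are of interest, two standard nonparametric tests for paired data are available in base R: an exact sign test (`binom.test`) and a Wilcoxon signed-rank test (`wilcox.test`). Both are applied here to the changes in abundance, signed so that a positive value means the abundance moved in the hypothesised direction.

```r
# Data from the question; the two rows per patient are ASSUMED to be in
# time order (time point 1, then 2) because the table has no time column.
status <- c("S", "Q", "Q", "Q", "Q", "S", "Q", "Q", "Q", "Q", "Q", "S",
            "S", "S", "S", "Q", "S", "Q", "S", "S", "Q", "S", "S", "S")
dat <- data.frame(
  Patient = rep(c(15, 24, 26, 33, 37, 52, 53, 69, 2, 12, 40, 51), each = 2),
  DiseaseActivity = ifelse(status == "S", "Severe", "Quiescent"),
  Bacteria = c(0.6704, 0.0350, 0.0137, 0.0088, 0.0023, 0.0410, 0.2031, 0.0893,
               0.0345, 0.0031, 0.0601, 0.0200, 0.0050, 0.2724, 0.9369, 0.0008,
               0.0421, 0.0120, 0.3109, 0.0646, 0.8048, 0.9113, 0.1918, 0.9538)
)

# Split into time point 1 (odd rows) and time point 2 (even rows)
t1 <- dat[seq(1, nrow(dat), by = 2), ]
t2 <- dat[seq(2, nrow(dat), by = 2), ]
stopifnot(all(t1$Patient == t2$Patient))  # sanity check on the pairing

# Only patients whose disease activity changes between the two samples
changed <- t1$DiseaseActivity != t2$DiseaseActivity

# Sign each change so that positive = moved in the hypothesised direction
# (up for Quiescent -> Severe, down for Severe -> Quiescent)
direction <- ifelse(t2$DiseaseActivity[changed] == "Severe", 1, -1)
aligned <- direction * (t2$Bacteria[changed] - t1$Bacteria[changed])

# Exact sign test: how many changers moved in the expected direction?
sign_test <- binom.test(sum(aligned > 0), length(aligned), p = 0.5)

# Wilcoxon signed-rank test: also uses the size of each change
wsr_test <- wilcox.test(aligned, mu = 0)

print(sign_test)
print(wsr_test)
```

On this table the sign test gives p = 0.21875 (5 of 6 in the expected direction) and the signed-rank test gives p = 0.15625. With only six pairs, the smallest two-sided p-value either test can produce is 2/64 = 0.03125, so the data can say very little either way.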


Solution

  • I don't know about 'most accurate', and I can't tell you which test to use; that depends on your audience as well as your data. But here's one possible plot:

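    library(dplyr)
    library(ggplot2)

    # Assumes data.df holds the table above, each patient's rows in time order
    change.df <- data.df %>%
      group_by(Patient) %>%
      summarize(status.change = paste(DiseaseActivity, collapse = " -> "),
                bacteria.change = Bacteria[2] - Bacteria[1])

    ggplot(change.df, aes(x = bacteria.change, y = status.change, color = status.change)) +
      geom_point(size = 5) +
      theme_bw()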

    This assumes that every patient has exactly two time points and that the rows are always in the order time 1, time 2, which is pretty dangerous! The time point should really be recorded in its own column.
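Following the advice to keep the time point in its own column, here is a base-R sketch (no dplyr or ggplot2 needed) that adds such a column, verifies that every patient has exactly one sample per time point before computing changes, and draws a paired ("slope") plot with one line per patient, coloured by status transition, as an alternative to the dot plot. The `Timepoint` values are hypothetical: they are derived from row order here only for illustration, whereas in real data they should come from the sampling records, at which point the rest of the code no longer depends on row order.

```r
# Data from the question
status <- c("S", "Q", "Q", "Q", "Q", "S", "Q", "Q", "Q", "Q", "Q", "S",
            "S", "S", "S", "Q", "S", "Q", "S", "S", "Q", "S", "S", "S")
dat <- data.frame(
  Patient = rep(c(15, 24, 26, 33, 37, 52, 53, 69, 2, 12, 40, 51), each = 2),
  DiseaseActivity = ifelse(status == "S", "Severe", "Quiescent"),
  Bacteria = c(0.6704, 0.0350, 0.0137, 0.0088, 0.0023, 0.0410, 0.2031, 0.0893,
               0.0345, 0.0031, 0.0601, 0.0200, 0.0050, 0.2724, 0.9369, 0.0008,
               0.0421, 0.0120, 0.3109, 0.0646, 0.8048, 0.9113, 0.1918, 0.9538)
)

# HYPOTHETICAL Timepoint column, derived from row order for illustration only
dat$Timepoint <- ave(seq_len(nrow(dat)), dat$Patient, FUN = seq_along)

# Sort explicitly and check that every patient has one sample per time point
dat <- dat[order(dat$Patient, dat$Timepoint), ]
stopifnot(all(table(dat$Patient, dat$Timepoint) == 1))

# Wide format: one row per patient, one column per variable and time point
wide <- reshape(dat, idvar = "Patient", timevar = "Timepoint", direction = "wide")
wide$Transition <- paste(wide$DiseaseActivity.1, "->", wide$DiseaseActivity.2)
wide$Change <- wide$Bacteria.2 - wide$Bacteria.1

# Paired ("slope") plot: one line per patient from time point 1 to 2,
# coloured by the status transition
cols <- c("Quiescent -> Quiescent" = "grey60", "Quiescent -> Severe" = "red",
          "Severe -> Quiescent" = "blue", "Severe -> Severe" = "black")
plot(c(1, 2), range(dat$Bacteria), type = "n", xaxt = "n",
     xlab = "Time point", ylab = "Bacteria abundance")
axis(1, at = c(1, 2))
segments(x0 = 1, y0 = wide$Bacteria.1, x1 = 2, y1 = wide$Bacteria.2,
         col = cols[wide$Transition], lwd = 2)
legend("top", legend = names(cols), col = cols, lwd = 2, bty = "n", cex = 0.8)
```

A slope plot keeps each patient's two measurements visibly paired, so the direction of change for each transition group can be read directly from the lines.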