r, model, statistics, rpt

R repeatability model (in rptR), unsure about formula. Outcome: zero repeatability for several models, with a boundary (singular) fit warning


I am analyzing repeatability across a variety of cognitive tests (and across repetitions of those tests). I am trying to estimate the individual repeatability of birds using the rptR package in R. However, regardless of the model or the test, the result is always a warning and R = 0. I am trying to understand what causes this.

I currently have a dataframe which includes an ID (each individual appears twice) and, for each row, a score for the test in question. These scores are first log-transformed to approach normality, and then converted to Z-scores so that I can compare tests that measure the same trait on different scales. While R = 0 is technically possible, I find it unlikely for every comparison (I compare both different tests and the same test measured twice). Moreover, every model I run produces the warning 'Boundary (singular) fit: see ?isSingular'. From what I've gathered, this means that the variance in my data might be too small, though I am not entirely certain about this, and I am worried that it might be causing my R = 0.
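For reference, the transformation described above can be sketched in base R as follows. The raw values are hypothetical and only illustrate the two steps; whether the Z-scores are computed over the whole dataset or within each round is an assumption that is not stated here.

```r
# Hypothetical raw scores, for illustration only (not the real dataset).
ttc <- c(28, 14, 48, 31)

# Step 1: log-transform the raw scores.
ttc_log <- log(ttc)

# Step 2: standardise to mean 0 and standard deviation 1 (Z-scores).
ttc_z <- (ttc_log - mean(ttc_log)) / sd(ttc_log)

# scale() does the same centring and scaling in a single call.
ttc_z_alt <- as.numeric(scale(ttc_log))
```

Both versions should agree; the explicit version just makes the two steps visible.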

A snippet of my dataframe looks as follows:

        RNR_ID RoundNR TTC         TTC_Z Test_date
    2        1       1  28  0.0966013973     43423
    114      1       2  14 -0.8138678026     43543
    5        2       1  48  0.8045891472     43425
    122      2       2  31  0.2302959586     43549

Two variations of my models are shown here. Unadjusted R:

Rep1_Assoc_A <- rpt(TTC_Z ~ RoundNR + (1|RNR_ID), grname = "RNR_ID", data = rpt_Assoc_A_df, datatype = "Gaussian", nboot = 10, npermut = 10)

Adjusted R (in which I control for test date, in the hope of accounting for individuals learning between repetitions of the same test):

Rep2_Assoc_A <- rpt(TTC_Z ~ RoundNR + Test_date + (1|RNR_ID), grname = "RNR_ID", data = rpt_Assoc_A_df, datatype = "Gaussian", nboot = 10, npermut = 10)

Note: RNR_ID, RoundNR and TTC_Z are numerical variables. Test_date is stored in Date format, though I am not sure how the model handles this. In this model, RoundNR indicates the "treatment" (that is, whether a score comes from the first or the second time an individual took the test). TTC_Z is the Z-score of an individual's test score.
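A minimal sketch of how the column types could be checked and made explicit before fitting. The data frame `rpt_df` is a hypothetical stand-in built from the snippet values, and reading the five-digit `Test_date` values as Excel serial dates is an assumption, not something the data confirms.

```r
# Hypothetical stand-in for the real data frame, using the snippet values.
rpt_df <- data.frame(
  RNR_ID    = c(1, 1, 2, 2),
  RoundNR   = c(1, 2, 1, 2),
  TTC_Z     = c(0.0966013973, -0.8138678026, 0.8045891472, 0.2302959586),
  Test_date = c(43423, 43543, 43425, 43549)
)

# Inspect how R currently stores each column.
str(rpt_df)

# Make the grouping variable and the round explicitly categorical.
rpt_df$RNR_ID  <- factor(rpt_df$RNR_ID)
rpt_df$RoundNR <- factor(rpt_df$RoundNR)

# ASSUMPTION: the five-digit values are Excel serial dates (Windows origin).
rpt_df$Test_date <- as.Date(rpt_df$Test_date, origin = "1899-12-30")

str(rpt_df)
```

With only two rounds, coding RoundNR as numeric or as a factor should give the same fit, so the conversion mainly documents intent.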

The resulting output for the two models, respectively:

Repeatability estimation using the lmm method 

Repeatability for RNR_ID
R  = 0
SE = 0.107
CI = [0, 0.283]
P  = 1 [LRT]
     1 [Permutation]

Repeatability estimation using the lmm method 

Repeatability for RNR_ID
R  = 0
SE = 0.12
CI = [0, 0.337]
P  = 1 [LRT]
     1 [Permutation]

As stated before, while running this code the console prints several 'boundary (singular) fit: see ?isSingular' messages. I have also tried a fake dataset in which I made the two repetitions of each individual nearly identical, which does result in a high R (around 0.9). While this suggests that my R = 0 might actually be correct, I am still skeptical, because I would expect at least a very low but measurable R. And because I do not fully understand the model, I fear that something else might be going wrong as well.
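The fake-dataset check can be sketched directly with lme4, which the "lmm" method of rptR builds on. Everything here is simulated: the sample size, the effect sizes, and the helper `repeatability()` are hypothetical choices for illustration. The second dataset is deliberately constructed so that every individual has the same mean, which forces the among-individual variance to zero.

```r
library(lme4)

set.seed(1)
n_id <- 30
id <- factor(rep(seq_len(n_id), each = 2))

# Hypothetical helper: among-individual variance divided by total variance.
repeatability <- function(fit) {
  vc <- as.data.frame(VarCorr(fit))
  vc$vcov[vc$grp == "id"] / sum(vc$vcov)
}

# Case A: repeats nearly identical within an individual.
id_effect <- rnorm(n_id, sd = 1)
high <- data.frame(
  id = id,
  y  = rep(id_effect, each = 2) + rnorm(2 * n_id, sd = 0.2)
)
fit_high <- lmer(y ~ 1 + (1 | id), data = high)
R_high <- repeatability(fit_high)

# Case B: the two repeats are mirror images (e, -e), so every individual
# mean is 0 and there is no among-individual variance to estimate.
e <- rnorm(n_id, sd = 1)
zero <- data.frame(id = id, y = as.vector(rbind(e, -e)))
fit_zero <- lmer(y ~ 1 + (1 | id), data = zero)  # expect the singular-fit message
R_zero <- repeatability(fit_zero)

isSingular(fit_high)
isSingular(fit_zero)
```

In case B the variance for `id` is estimated at its lower bound of zero, which is the situation the singular-fit message reports; the repeatability is then zero by construction.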

To summarize, my questions are:

Q1: Are the current formulas for my models correct? And are the variables in the right data types?

Q2: What does the boundary (singular) fit: see ?isSingular mean in this situation, and can I "fix" it?

Q3: What could be causing my R=0? Am I analyzing my data incorrectly, or is my R really just 0?


Solution

  • While not a complete answer quite yet, I at least have partial answers to my questions after talking to some colleagues.

    Q1: Yes and no. The way I set up my formulas is fine. However, I included some fixed effects that were (in my case) unnecessary. I initially added RoundNR to try to correct for learning. However, this doesn't make sense with only two rounds, as I believe all of my variation would then be attributed to this term. Taking the Z-scores was enough. As for Test_date, that might have been interesting if it weren't heavily confounded with the tests themselves. More generally (for other people): the broad outline of the model was fine. Just be careful about which fixed effects you include.

    Q2: I am still not entirely clear on its meaning, so if someone else can offer a clearer explanation, that would be appreciated. However, as I understand it, this is simply a consequence of my data and not an issue with, for example, my model.

    Q3: The quite obvious answer: my own data. A colleague ran a quick analysis using another method (which produces less accurate but quicker repeatability estimates) and also found R = 0 (or at least something very close to it).

    Not a complete answer but I hope it helps others in the future.
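One quick cross-check of this kind (not necessarily the method my colleague used) is an ANOVA-based repeatability estimate, which also hints at what the singular fit means. The sketch below uses base R only; the data are hypothetical, and the design is assumed to be balanced with two measurements per individual.

```r
# Hypothetical balanced data: 4 individuals, 2 measurements each.
df <- data.frame(
  id = factor(rep(1:4, each = 2)),
  y  = c(0.10, -0.81, 0.80, 0.23, -0.40, 0.55, 0.30, -0.62)
)

# Mean squares from a one-way ANOVA: among individuals, then within.
ms <- anova(lm(y ~ id, data = df))[["Mean Sq"]]
ms_among  <- ms[1]
ms_within <- ms[2]
n0 <- 2  # measurements per individual (balanced design assumed)

# Variance components; the among-individual one can come out negative.
var_among  <- (ms_among - ms_within) / n0
var_within <- ms_within

R_anova <- var_among / (var_among + var_within)
R_trunc <- max(0, R_anova)  # a variance cannot be negative, so truncate at 0
```

When individuals differ from each other less than their own repeats differ, the among-individual component is negative here. A mixed model cannot report a negative variance, so it estimates that component at zero, which is the boundary the warning refers to, and R = 0 follows.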