The following happened:
I set up my workspace, read in the .csv file, created some subsets, and ran a few t-tests of the form t.test(HtoC2/C2.dur, s1).
Everything went fine until, a few t-tests later, I suddenly received the following error message:
Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my)))
stop("data are essentially constant") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(y) : argument is not numeric or logical: returning NA
2: In var(y) : NAs introduced by coercion
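A minimal sketch of what these warnings point to, assuming the second argument s1 is the data frame returned by subset() (as the Edit below shows). In t.test(x, y) the second positional argument is the y sample, so the whole data frame is used as y. Because it contains text columns, mean(y) and var(y) return NA, and the if() check in t.test() then fails. The data frame toy is hypothetical; its HtoC2 values are taken from the s1 rows shown below.

```r
# Hypothetical toy data frame: one text column and one numeric column
toy <- data.frame(
  sprecher = c("s1", "s1", "s1", "s1"),
  HtoC2    = c(43.7, 71.4, 55.6, 25.4),
  stringsAsFactors = TRUE
)

# The data frame 'toy' is matched to t.test()'s 'y' argument.
# mean() and var() of its non-numeric contents give NA, so the internal
# if (stderr < ...) test receives NA and stops with an error.
res <- tryCatch(
  suppressWarnings(t.test(toy$HtoC2, toy)),
  error = function(e) conditionMessage(e)
)
print(res)
```

Passing two numeric vectors, or using the formula interface with a data argument, avoids handing a data frame to y.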
Since then, no t-test of this kind works anymore, including those that ran perfectly fine before. I always get the same error message.
I looked at similar problems but found no working solution, so I am asking here. Of course I tried redoing my first steps in case I had run some command by accident, but this did not help either. I also tried a similar data set with identical columns, for which t-tests had also worked before, but I got the same error.
Background information:
A portion of my data:
sprecher ident tier1.label testwort L.zeit H.zeit C1.start C1.ende V1.start V1.ende C2.start C2.ende V2.start V2.ende C1.dur V1.dur C2.dur V2.dur LtoC1 LtoV1 HtoV1 HtoC2
1 s1 1 ma:mi_01 ma:mi 23912.0 24108.2 23827.4 23937.5 23937.5 24064.5 24064.5 24148.0 24148.0 24214.6 110.1 127.0 83.5 66.6 84.6 -25.5 170.7 43.7
2 s1 1 mami_01 mami 26755.0 26958.8 26700.0 26800.2 26800.2 26887.4 26887.4 26957.1 26957.1 27035.5 100.2 87.2 69.7 78.4 55.0 -45.2 158.6 71.4
3 s1 2 ma:mi_02 ma:mi 33237.6 33451.4 33179.6 33282.1 33282.1 33395.8 33395.8 33473.2 33473.2 33562.0 102.5 113.7 77.4 88.8 58.0 -44.5 169.3 55.6
4 s1 3 ma:mi_03 ma:mi 39100.7 39315.5 39057.8 39162.3 39162.3 39290.1 39290.1 39363.1 39363.1 39441.0 104.5 127.8 73.0 77.9 42.9 -61.6 153.2 25.4
5 s1 2 mami_02 mami 41881.7 42099.5 41825.6 41936.8 41936.8 42028.3 42028.3 42101.4 42101.4 42180.1 111.2 91.5 73.1 78.7 56.1 -55.1 162.7 71.2
6 s1 4 ma:mi_04 ma:mi 44801.2 45028.8 44753.5 44860.2 44860.2 44990.9 44990.9 45070.6 45070.6 45131.3 106.7 130.7 79.7 60.7 47.7 -59.0 168.6 37.9
According to sapply(mode) and sapply(length), all columns are numeric, and each "sprecher" (s1 to s5) has 30 rows, for a total of 150.
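One caveat about this check, shown with a small hypothetical data frame d: mode() reports factor columns as "numeric", because factors are stored as integer codes. Text columns read with read.csv() were factors by default before R 4.0.0, so sapply(..., mode) cannot tell them apart from real numbers, while class() can.

```r
# Hypothetical mini data frame mirroring three of the columns shown above
d <- data.frame(
  sprecher = factor(c("s1", "s1")),
  testwort = factor(c("ma:mi", "mami")),
  HtoC2    = c(43.7, 71.4)
)

sapply(d, mode)   # all three columns report "numeric", factors included
sapply(d, class)  # "factor" "factor" "numeric"
```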
Edit 1: I forgot to mention how I defined my subsets:
s1 = subset(daten,sprecher=="s1")
s1.mahmi = subset(s1,testwort=="ma:mi")
s1.mammi = subset(s1,testwort=="mami")
s2 = subset(daten,sprecher=="s2")
s2.mahmi = subset(s2,testwort=="ma:mi")
s2.mammi = subset(s2,testwort=="mami")
s3 = subset(daten,sprecher=="s3")
s3.mahmi = subset(s3,testwort=="ma:mi")
s3.mammi = subset(s3,testwort=="mami")
s4 = subset(daten,sprecher=="s4")
s4.mahmi = subset(s4,testwort=="ma:mi")
s4.mammi = subset(s4,testwort=="mami")
s5 = subset(daten,sprecher=="s5")
s5.mahmi = subset(s5,testwort=="ma:mi")
s5.mammi = subset(s5,testwort=="mami")
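Each of these subsets is a data frame, so comparing the two test words for one speaker with t.test(x, y) needs numeric vectors extracted from them. A minimal sketch, assuming the goal is to compare HtoC2/C2.dur between "ma:mi" and "mami"; it uses only the six s1 rows shown above, so it has far fewer observations than the real 30 per speaker.

```r
# The six s1 rows shown in the question (only the columns needed here)
s1 <- data.frame(
  testwort = c("ma:mi", "mami", "ma:mi", "ma:mi", "mami", "ma:mi"),
  C2.dur   = c(83.5, 69.7, 77.4, 73.0, 73.1, 79.7),
  HtoC2    = c(43.7, 71.4, 55.6, 25.4, 71.2, 37.9)
)
s1.mahmi <- subset(s1, testwort == "ma:mi")
s1.mammi <- subset(s1, testwort == "mami")

# Pass numeric vectors (not whole data frames) as x and y:
# this runs R's default Welch two-sample t-test
res <- t.test(s1.mahmi$HtoC2 / s1.mahmi$C2.dur,
              s1.mammi$HtoC2 / s1.mammi$C2.dur)
res$p.value
```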
This should work for the subset with sprecher == "s1" (assuming you want the default t.test options):
t.test(HtoC2/C2.dur ~ testwort, subset(my_data, sprecher == "s1"))
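A self-contained check of this call, using the six s1 rows from the question under the answer's placeholder name my_data. The formula interface evaluates HtoC2/C2.dur, splits it by the two levels of testwort, and runs the same default Welch two-sample test as passing the two numeric vectors directly.

```r
# The six s1 rows shown in the question; 'my_data' is the answer's placeholder name
my_data <- data.frame(
  sprecher = rep("s1", 6),
  testwort = c("ma:mi", "mami", "ma:mi", "ma:mi", "mami", "ma:mi"),
  C2.dur   = c(83.5, 69.7, 77.4, 73.0, 73.1, 79.7),
  HtoC2    = c(43.7, 71.4, 55.6, 25.4, 71.2, 37.9)
)

# Response HtoC2/C2.dur, grouped by testwort (must have exactly two levels)
res <- t.test(HtoC2/C2.dur ~ testwort, subset(my_data, sprecher == "s1"))
res$p.value
```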
If you just wanted the p value for each subset, you could do:
sapply(levels(factor(my_data$sprecher)), function(lev) {
  t.test(HtoC2/C2.dur ~ testwort, my_data[my_data$sprecher == lev, ])$p.value
})
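A variant of the same loop, sketched with vapply() instead of sapply(): vapply() checks that every call returns exactly one numeric value, so an unexpected return shape raises an error instead of silently producing a list. The result is a numeric vector named by speaker. The data are again only the six s1 rows from the question, so the sketch yields a single p-value.

```r
# The six s1 rows shown in the question; 'my_data' is the answer's placeholder name
my_data <- data.frame(
  sprecher = rep("s1", 6),
  testwort = c("ma:mi", "mami", "ma:mi", "ma:mi", "mami", "ma:mi"),
  C2.dur   = c(83.5, 69.7, 77.4, 73.0, 73.1, 79.7),
  HtoC2    = c(43.7, 71.4, 55.6, 25.4, 71.2, 37.9)
)

# One p-value per speaker; numeric(1) declares the expected return type
p_values <- vapply(levels(factor(my_data$sprecher)), function(lev) {
  t.test(HtoC2/C2.dur ~ testwort, my_data[my_data$sprecher == lev, ])$p.value
}, numeric(1))
p_values
```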