Tags: r, statistics, data-science, t-test

T-tests in R: unable to run them together


I have an airline dataset from Stat Computing that I am trying to analyse.

There are variables DepTime and ArrDelay (departure time and arrival delay). I am trying to analyse how arrival delay varies across chunks of departure time. My objective is to find which time chunks a person should avoid when booking tickets in order to avoid arrival delays.

My understanding: if a one-tailed t-test between arrival delays for DepTime > 1800 and arrival delays for DepTime > 1900 shows high significance, it means that one should avoid flights between 1800 and 1900. (Please correct me if I am wrong.) I want to run such tests for all departure hours.
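One caution about that comparison: every flight with `DepTime > 1900` is also in the `DepTime > 1800` group, so the two samples overlap and a difference between them cannot be attributed cleanly to the 1800-1859 hour. Below is a minimal sketch of a one-tailed two-sample t-test on non-overlapping groups, using simulated delays (made-up numbers, not the airline data; the variable names are hypothetical).

```r
# Hypothetical simulated arrival delays in minutes for two
# non-overlapping groups; real values would come from ArrDelay.
set.seed(1)
delay_1800  <- rnorm(200,  mean = 30, sd = 10)  # departures 1800-1859
delay_other <- rnorm(2000, mean = 5,  sd = 10)  # all other departures

# One-tailed Welch two-sample t-test:
# H0: mean delay for 1800-1859 <= mean delay for other departures
# H1: mean delay for 1800-1859 >  mean delay for other departures
res <- t.test(delay_1800, delay_other, alternative = "greater")

res$estimate  # the two group means
res$p.value   # small p-value: evidence the 1800 hour has higher mean delay
```

A small p-value only says the mean delay is higher, not by how much, so it is worth reading `res$estimate` alongside it.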

I am totally new to programming and data science. Any help would be much appreciated.

The data looks like this. The highlighted columns are the ones I am analysing:

[Screenshot of the data, with the DepTime and ArrDelay columns highlighted]


Solution

  • Sharing an image of the data is not the same as providing the data for us to work with...

    That said, I grabbed one year of data (1995) and worked this up.

    flights <- read.csv("~/Downloads/1995.csv", header = TRUE)
    
    flights <- flights[, c("DepTime", "ArrDelay")]
    # DepTime is in hhmm format (e.g. 1845). Subtracting 30 and rounding to
    # the nearest hundred floors it to the hour: 1845 -> 1800, 1759 -> 1700.
    flights$Dep <- round(flights$DepTime - 30, digits = -2)
    head(flights, n = 25)
    
    # This tests each hour of departures against the entire day.
    # Alternative is set to "less" because we want to know if a given hour
    # has less delay than the day as a whole.
    # $p.value keeps only the p-value instead of the full "htest" object.
    pVsDay <- tapply(flights$ArrDelay, flights$Dep,
                     function(x) t.test(x, flights$ArrDelay, alternative = "less")$p.value)
    
    # This tests each hour of departures against every other hour of the day.
    # Alternative is set to "less" because we want to know if a given hour
    # has less delay than the other hours.
    # pAllvsAll[i, j] is the p-value for "hour i has less mean delay than hour j".
    byHour <- split(flights$ArrDelay, flights$Dep)
    pAllvsAll <- t(sapply(byHour, function(x)
      sapply(byHour, function(z) t.test(x, z, alternative = "less")$p.value)))
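Because the CSV has to be downloaded first, here is a sketch of the same approach on simulated data that runs on its own. The delays are made up (they simply grow later in the day), and the names `byHour` and `pAllvsAll` are illustrative. It also checks that `DepTime %/% 100 * 100` gives the same hour bins as the `round(DepTime - 30, digits = -2)` trick for valid hhmm times, and it keeps only the p-values so the results come back as a plain vector and matrix.

```r
# Hypothetical simulated stand-in for the DepTime / ArrDelay columns.
set.seed(42)
n <- 5000
hours   <- sample(6:22, n, replace = TRUE)
minutes <- sample(0:59, n, replace = TRUE)
flights <- data.frame(
  DepTime  = hours * 100 + minutes,                      # hhmm, e.g. 1845
  ArrDelay = rnorm(n, mean = (hours - 6) * 1.5, sd = 15)  # later = more delay
)

# Integer division floors hhmm to the hour; for valid hhmm times it gives
# the same bins as round(DepTime - 30, digits = -2).
flights$Dep <- flights$DepTime %/% 100 * 100
stopifnot(all(flights$Dep == round(flights$DepTime - 30, digits = -2)))

# One vector of delays per hour bin (names "600", "700", ..., "2200").
byHour <- split(flights$ArrDelay, flights$Dep)

# Each hour vs. the whole day, keeping only the p-values.
pVsDay <- sapply(byHour, function(x)
  t.test(x, flights$ArrDelay, alternative = "less")$p.value)

# Each hour vs. every other hour. pAllvsAll[i, j] is the p-value for
# H1: mean delay of hour i < mean delay of hour j.
pAllvsAll <- t(sapply(byHour, function(x)
  sapply(byHour, function(z) t.test(x, z, alternative = "less")$p.value)))

round(pAllvsAll[1:3, 1:3], 3)
```

The diagonal of `pAllvsAll` is 0.5, since testing an hour against itself gives t = 0.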
    

    I'll let you figure out the multiple hypothesis testing correction and the like.
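For the multiple-testing part, base R's `p.adjust` adjusts a vector of p-values, such as the ones collected from the tests above. A minimal sketch on made-up p-values; which method fits (Holm, Benjamini-Hochberg, or another) is a judgment call for the analysis.

```r
# Made-up p-values from four hypothetical tests
p <- c(0.01, 0.02, 0.03, 0.20)

# Holm controls the family-wise error rate
p_holm <- p.adjust(p, method = "holm")  # -> 0.04 0.06 0.06 0.20

# Benjamini-Hochberg controls the false discovery rate
p_bh <- p.adjust(p, method = "BH")      # -> 0.04 0.04 0.04 0.20
```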

    [Image of results]

    All vs All

    [Image of results]