Tags: r, statistics, aggregate, summarize

R, analyzing a data set with a large parameter space and replicates


I've run experiments in which, for each parameter combination, I collect the average forces and torques (in the x, y, and z directions). I run four replicates of each parameter combination, and there are 432 parameter combinations in total.

The actual data set is too big to include here, so I've made a subset for testing purposes and uploaded it to Dropbox, along with the relevant R script.

Here is a heavily trimmed excerpt:

> data2[1:20,1:8]
# A tibble: 20 x 8
   `Foil Color` `Flow Speed (rpm)` `Frequency (Hz)` StepTime Maxpress Minpress `Minpress Percentage`      FxMean
         <fctr>             <fctr>           <fctr>   <fctr>   <fctr>    <int>            <fctr>       <dbl>
 1        Black                  0             0.25      250       50        0                 0 0.014537062
 2        Black                  0             0.25      250       50        0                 0 0.014870256
 3        Black                  0             0.25      250       50        0                 0 0.013180870
 4        Black                  0             0.25      250       50        0                 0 0.013448804
 5        Black                  0             0.25      250       50        3              0.05 0.012996979
 6        Black                  0             0.25      250       50        3              0.05 0.012115166
 7        Black                  0             0.25      250       50        3              0.05 0.012427347
 8        Black                  0             0.25      250       50        3              0.05 0.012561253
 9        Black                  0             0.25      250       50        5               0.1 0.012480644
10        Black                  0             0.25      250       50        5               0.1 0.011603403
11        Black                  0             0.25      250       50        5               0.1 0.011427116
12        Black                  0             0.25      250       50        5               0.1 0.011545803
13        Black                  0             0.25      250       50       13              0.25 0.009891865
14        Black                  0             0.25      250       50       13              0.25 0.008465604
15        Black                  0             0.25      250       50       13              0.25 0.009089619
16        Black                  0             0.25      250       50       13              0.25 0.008560160
17        Black                  0             0.25      250       75        0                 0 0.025101186
18        Black                  0             0.25      250       75        0                 0 0.023611920
19        Black                  0             0.25      250       75        0                 0 0.026276007
20        Black                  0             0.25      250       75        0                 0 0.026593895

I am trying to group the data by parameter combination and calculate the mean, standard deviation (sd), and standard error (se) of FxMean across the four replicates in each group.
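For one group of n = 4 replicates, the three requested statistics are the sample mean, the sample standard deviation (R's `sd()` uses the n - 1 denominator), and the standard error of the mean. Here they are worked through for the first group in the excerpt (rows 1 to 4, `Minpress Percentage` = 0):

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
         = \frac{0.014537062 + 0.014870256 + 0.013180870 + 0.013448804}{4}
         = 0.014009248 \\
s &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \approx 0.0008206156 \\
\mathrm{SE} &= \frac{s}{\sqrt{n}} = \frac{0.0008206156}{\sqrt{4}} \approx 0.0004103078
\end{aligned}
```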

I have tried to follow tutorials and other examples where people summarize data (example), but it doesn't work for me. It usually returns something that looks nothing like what I need.

For example:

fx_data2 <- ddply(data_csv, c(data_csv$`Frequency (Hz)`,data_csv$`Flow Speed (rpm)`, data_csv$StepTime, data_csv$Maxpress, data_csv$`Minpress Percentage`), summarise,
N    = length(data_csv$FxMean),
mean = mean(data_csv$FxMean),
sd   = sd(data_csv$FxMean),
se   = sd / sqrt(N)

)

fx_data3 <- summaryBy(FxMean ~freq + foilColor+maxP+minPP, data=data_csv, FUN=c(length, mean, sd))

fx_data2 looks just... abysmal.

head(fx_data2)
....
Foil Color.2530 Foil Color.2531
Foil Length.2512 Foil Length.2513 Foil Length.2514 Foil Length.2515
Flow Speed (rpm).2544 Flow Speed (rpm).2545 Flow Speed (rpm).2546 Flow Speed (rpm).2547
Frequency (Hz).800 Frequency (Hz).801 Frequency (Hz).802 Frequency (Hz).803
Foil Color.2532 Foil Color.2533 Foil Color.2534 Foil Color.2535
Foil Length.2516 Foil Length.2517 Foil Length.2518 Foil Length.2519
Flow Speed (rpm).2548 Flow Speed (rpm).2549 Flow Speed (rpm).2550 Flow Speed (rpm).2551
Frequency (Hz).804 Frequency (Hz).805 Frequency (Hz).806 Frequency (Hz).807
Foil Color.2536 Foil Color.2537

I mean, I have no idea what's going on there. The result is 24 x 8724. Just... what.
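Two things appear to go wrong in the ddply() call above. First, `.variables` receives the *contents* of the columns (`data_csv$...` values concatenated with `c()`), whereas ddply() expects the grouping columns by *name*, for example inside `.()` with backticks around non-syntactic names. Second, inside `summarise`, `data_csv$FxMean` refers to the whole column of the full data set rather than to the rows of the current group. Below is a minimal sketch of a corrected call. It assumes the plyr package is installed, and it uses a hypothetical stand-in `data_csv` typed in from a few columns of the first eight rows of the excerpt:

```r
library(plyr)

# Hypothetical stand-in for data_csv: first eight rows of the excerpt above
# (check.names = FALSE keeps the non-syntactic column names intact)
data_csv <- data.frame(
  `Flow Speed (rpm)`    = 0,
  `Frequency (Hz)`      = 0.25,
  StepTime              = 250,
  Maxpress              = 50,
  `Minpress Percentage` = rep(c(0, 0.05), each = 4),
  FxMean = c(0.014537062, 0.014870256, 0.013180870, 0.013448804,
             0.012996979, 0.012115166, 0.012427347, 0.012561253),
  check.names = FALSE
)

# Group by column *names* via .(), and refer to FxMean directly so that
# summarise evaluates it on each group's rows only
fx_data2 <- ddply(
  data_csv,
  .(`Frequency (Hz)`, `Flow Speed (rpm)`, StepTime, Maxpress,
    `Minpress Percentage`),
  summarise,
  N    = length(FxMean),
  mean = mean(FxMean),
  sd   = sd(FxMean),
  se   = sd / sqrt(N)  # plyr's summarise can use columns computed earlier
)
fx_data2
```

With this stand-in, each of the two groups should contain four rows, and the first group's mean should match the first row of the answer's output (0.014009248).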

And fx_data3 looks like this:

> fx_data3
  FxMean.length FxMean.mean  FxMean.sd
 1          1744  0.01379712 0.01423244
> 
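Because `aggregate` is one of the question's tags, a base-R alternative may also help. This is a minimal sketch that needs no extra packages. It uses a hypothetical stand-in data frame typed in from the first eight rows of the excerpt, and `stats_fun` is a hypothetical helper name. When FUN returns a named vector, aggregate() stores the results as a matrix column, so `res$FxMean[, "mean"]` picks out the means:

```r
# Hypothetical stand-in data: first eight rows of the excerpt above
data_csv <- data.frame(
  `Foil Color`          = "Black",
  `Frequency (Hz)`      = 0.25,
  Maxpress              = 50,
  `Minpress Percentage` = rep(c(0, 0.05), each = 4),
  FxMean = c(0.014537062, 0.014870256, 0.013180870, 0.013448804,
             0.012996979, 0.012115166, 0.012427347, 0.012561253),
  check.names = FALSE
)

# Hypothetical helper: count, mean, sd and standard error of one group
stats_fun <- function(x) {
  c(N = length(x), mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))
}

# Split FxMean by the parameter columns (selected by name) and summarise
res <- aggregate(
  data_csv["FxMean"],
  by  = data_csv[c("Frequency (Hz)", "Foil Color", "Maxpress",
                   "Minpress Percentage")],
  FUN = stats_fun
)
res
res$FxMean[, "mean"]  # one mean per parameter combination
```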

Ideally, the result would look like the original data set, but with each parameter combination collapsed to a single row, and with the columns on the far right holding the mean, sd, and se of FxMean, FxStDev, etc. across the four replicates.

I've been struggling with this for a few days. I'd greatly appreciate some help.

Thank you, Zane


Solution

    url <- "https://www.dropbox.com/sh/vhf39uz4pol7sgl/AAAJ9Fr6OTEIgb_ZeSno-X5ea?dl=1"
    download.file(url, destfile = "from-SO-via-dropbox")
    unzip("from-SO-via-dropbox")
    df <- readr::read_csv("Data_subset.csv")
    
    library(dplyr)
    
    df %>% 
      group_by(`Frequency (Hz)`, `Foil Color`, StepTime, Maxpress, `Minpress Percentage`) %>% 
      summarise_at(vars(FxMean), funs(N = length, mean, sd, se = sd(.) / sqrt(N)))
    
    # # A tibble: 13 x 9
    # # Groups:   Frequency (Hz), Foil Color, StepTime, Maxpress [?]
    #    `Frequency (Hz)` `Foil Color` StepTime Maxpress `Minpress Percentage`     N        mean           sd           se
    #               <dbl>        <chr>    <int>    <int>                 <dbl> <int>       <dbl>        <dbl>        <dbl>
    #  1             0.25        Black      250       50                  0.00     4 0.014009248 0.0008206156 0.0004103078
    #  2             0.25        Black      250       50                  0.05     4 0.012525186 0.0003658681 0.0001829340
    #  3             0.25        Black      250       50                  0.10     4 0.011764241 0.0004832082 0.0002416041
    #  4             0.25        Black      250       50                  0.25     4 0.009001812 0.0006538297 0.0003269149
    #  5             0.25        Black      250       75                  0.00     4 0.025395752 0.0013514463 0.0006757231
    #  6             0.25        Black      250       75                  0.05     4 0.020794212 0.0028703242 0.0014351621
    #  7             0.25        Black      250       75                  0.10     4 0.018409500 0.0037305138 0.0018652569
    #  8             0.25        Black      250       75                  0.25     4 0.016193536 0.0016200530 0.0008100265
    #  9             0.25        Black      250      100                  0.00     4 0.035485324 0.0052513208 0.0026256604
    # 10             0.25        Black      250      100                  0.05     4 0.050097709 0.0024123653 0.0012061827
    # 11             0.25        Black      250      100                  0.10     4 0.051378181 0.0049857712 0.0024928856
    # 12             0.25        Black      250      100                  0.25     4 0.039374874 0.0031421884 0.0015710942
    # 13             0.50        Black      250       50                  0.00     2 0.014778494 0.0004683882 0.0003312005
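In current dplyr releases, `funs()` is deprecated and `summarise_at()` is superseded by `across()`. The sketch below produces the same summary in that style. It assumes dplyr 1.0 or later and uses a hypothetical stand-in tibble typed in from the first eight rows of the question's excerpt. It also adds `Flow Speed (rpm)` to `group_by()`, since the question's own ddply() call grouped by it; if flow speed varies in the full data, leaving it out would merge different parameter combinations. `.groups = "drop"` returns an ungrouped tibble, so the leftover `# Groups:` line disappears.

```r
library(dplyr)

# Hypothetical stand-in for the CSV: first eight rows of the question's excerpt
df <- tibble(
  `Frequency (Hz)`      = 0.25,
  `Foil Color`          = "Black",
  `Flow Speed (rpm)`    = 0,
  StepTime              = 250L,
  Maxpress              = 50L,
  `Minpress Percentage` = rep(c(0, 0.05), each = 4),
  FxMean = c(0.014537062, 0.014870256, 0.013180870, 0.013448804,
             0.012996979, 0.012115166, 0.012427347, 0.012561253)
)

fx_summary <- df %>%
  group_by(`Frequency (Hz)`, `Foil Color`, `Flow Speed (rpm)`,
           StepTime, Maxpress, `Minpress Percentage`) %>%
  summarise(
    # To summarise several measured columns at once, select them here,
    # e.g. across(c(FxMean, FxStDev), ...)
    across(FxMean, list(
      N    = length,
      mean = mean,
      sd   = sd,
      se   = ~ sd(.x) / sqrt(length(.x))
    )),
    .groups = "drop"
  )
fx_summary
```

Output columns are named `{column}_{statistic}` by default (`FxMean_N`, `FxMean_mean`, `FxMean_sd`, `FxMean_se`). Selecting FxMean, FxStDev and the other measured columns in `across()` should therefore give the one-row-per-combination layout described in the question.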