Standard Deviation from multiple files in bash

I want to calculate the standard deviation for each of a set of files named "res_NUMBER.cs", which are formatted as CSV. Example data:

1,M,CA,54.9130  
1,M,CA,54.9531  
1,M,CA,54.8845  
1,M,CA,54.7517  
1,M,CA,54.8425  
1,M,CA,55.2648  
1,M,CA,55.0876

I have calculated the mean using

#!/bin/bash


files=`ls res*.cs`  
for f in $files; do 
        echo "$f" 
        echo " " 
        #Count number of lines N 
        lines=`cat $f | wc -l` 
        #Sum Total 
        sum=`cat $f | awk -F "," '{print $4}' | paste -sd+ | bc` 
        #Mean 
        mean=`echo "scale=5 ; $sum / $lines" | bc` 
        echo "$mean" 
        echo " "
done

I would like to calculate the standard deviation across each file. I understand that the standard deviation formula is

S.D=sqrt((1/N)*(sum of (value - mean)^2))

But I am unsure how I would implement this into my script.

Solution

awk can calculate the mean of a single file in one line:

$ awk -F, '{sum+=$4} END{print sum/NR}' file

To add the standard deviation (note that your formula is the population form, not the sample form; the population form is what I replicate here):

 $ awk -F, '{sum+=$4; ss+=$4^2} END{print m=sum/NR,sqrt(ss/NR-m^2)}' file
 54.9567 0.15778

This uses the identity stddev = sqrt(Var(x)) = sqrt( E(x^2) - E(x)^2 ). It is numerically less accurate than summing the squared differences from the mean, because it subtracts two large, nearly equal numbers, but it works fine when the values are modest in size, as they are here.
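If that loss of precision is a concern, a two-pass version that follows the formula from the question literally is one option. The following is only a minimal sketch, not the method used in this answer: the function name two_pass_sd is made up for illustration, and it assumes the input is a regular file (it is read twice, so a pipe will not work).

```bash
#!/bin/bash
# Hypothetical helper: mean and population standard deviation of column 4,
# computed in two passes so that squared differences from the mean are summed.
two_pass_sd() {
    # The file is passed twice: NR == FNR holds only while awk reads the
    # first copy, so pass 1 gets the mean and pass 2 gets the deviations.
    awk -F, '
        NR == FNR { sum += $4; n++; next }      # pass 1: accumulate sum and count
        FNR == 1  { m = sum / n }               # start of pass 2: fix the mean
                  { d = $4 - m; ss += d * d }   # pass 2: add squared deviation
        END       { if (n) print m, sqrt(ss / n) }
    ' "$1" "$1"
}
```

Calling two_pass_sd file prints the mean followed by the standard deviation, and prints nothing for an empty file.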

The simplest approach is then to run this in a for loop over the files:

for f in res*.cs
do 
    awk -F, '{sum+=$4; ss+=$4^2} 
         END {print FILENAME; 
              print "mean:", m=sum/NR, "stddev:", sqrt(ss/NR-m^2)}' "$f"
done
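If the sample standard deviation is wanted rather than the population form replicated above, the only change is dividing the sum of squares by N-1 instead of N. A hedged sketch, with the function name sample_sd chosen purely for illustration:

```bash
#!/bin/bash
# Hypothetical helper: mean and sample standard deviation of column 4.
# Same running sums as the population version; the variance divides by N-1.
sample_sd() {
    awk -F, '{ sum += $4; ss += $4^2 }
         END { if (NR > 1) {                    # undefined for fewer than 2 values
                   m = sum / NR
                   print m, sqrt((ss - NR * m^2) / (NR - 1))
               } }' "$1"
}
```

Inside the same for loop it could be called as sample_sd "$f". It prints nothing when a file has fewer than two lines.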

To process res1.cs .. res37.cs in that order (the res*.cs glob sorts lexically, so res10.cs would come before res2.cs), the easiest fix is to change the for loop:

for f in res{1..37}.cs
# the rest of the code is unchanged

which will expand in the numerical order specified.
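One thing to watch with brace expansion is that it generates the names whether or not the files exist, and in bash the end of the range cannot come from a variable. The sketch below works around both with an arithmetic loop and an existence check. It assumes the files are named res1.cs, res2.cs, and so on, and the function name stats_for_range is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: mean and population stddev for res1.cs .. resN.cs,
# in numeric order, skipping any number in the range that has no file.
stats_for_range() {
    local last=$1 i f
    for ((i = 1; i <= last; i++)); do
        f="res$i.cs"
        [ -f "$f" ] || continue     # names are generated, not matched, so check
        awk -F, '{ sum += $4; ss += $4^2 }
             END { if (NR) { m = sum / NR
                             print FILENAME, "mean:", m, "stddev:", sqrt(ss/NR - m^2) } }' "$f"
    done
}
```

For the files in the question this would be run as stats_for_range 37 from the directory that contains them.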