bash shell variables command-line command-substitution

Command substitution in a shell script with shell variables inside the substitution


Basically, the output file I get has the first three columns pasted in, followed by a column of blank lines; it looks like nothing is being appended to column4.txt.

I suspect I shouldn't be using the variables I created inside the command substitution, but I'm not sure how else I would access the numbers I need.

#!/bin/sh
# the first file in the expression of a bunch of patients to be made into data files that can be put into the graph
awk '{print "hs"$1,"\t",$2,"\t",$3}' $1 > temp1.txt     #important columns saved
numLines=`wc -l $1`     
touch column4.txt       #creates a column for the average of column 6-
for ((s=0;s<$numlines;s++)); do                 
        currentRow=0                            #Will eventually be the average of column 6- for the row of focus
        for ((i=6;i<=106;i++)); do              
                addition=`cut -f $i $1 | head -n $s | tail -n 1`        # cuts out the number at the row and column of focus for this loop
                currentRow=`expr $currentRow + $addition`              # adding the newly extracted number to the total
        done
        currentRow=`expr $currentRow / 101`                            #divides so the number is an average instead of a really big number
        echo $currentRow >> column4.txt                                 #appends this current row into a text file that can be pasted onto the first three columns
done
paste temp1.txt column4.txt
rm temp1.txt column4.txt

If it helps, the input file is very large (about 106 columns and tens of thousands of rows), but here's an example of what it looks like:

Important identifier line grant regis 76 83 02 38 0 38 29 38 48 (.. up to 106 columns)
another important identifier bill susan 98 389 20 29 38 20 94 29 0 (.. same point)

The output would then look like this (assuming we exclude the columns after the ..):

Important identifier line 34.88
another important identifier 79.67

Sorry if something is unclear; I tried my best to explain it. Just ask if there's anything you're wondering about and I will edit or comment.

Thank you


Solution

  • awk to the rescue!

    You can replace the whole script with this awk command; the output below uses the values from the sample input:

    $ awk '{for(i=6;i<=NF;i++) sum+=$i; 
            printf "%s %s %s %.2f\n", $1,$2,$3, sum/(NF-5); 
            sum=0}' file
    
    Important identifier line 39.11
    another important identifier 79.67
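
As a side note on why the original loop left the fourth column blank, a few details look like the likely cause: the loop tests `$numlines` while the variable that was assigned is `numLines` (shell variable names are case-sensitive, so the bound is empty), `wc -l $1` prints the file name after the count, `head -n 0` prints nothing on the first pass, and `for ((...))` is not part of POSIX `sh`. The sketch below shows only those fixes with a portable `while` loop; the helper names `count_lines` and `nth_line` and the demo data are hypothetical, not from the original script.

```shell
#!/bin/sh
# count_lines (hypothetical helper): print only the line count of file $1.
count_lines() {
    # Reading from stdin stops wc from printing the file name after the count;
    # the arithmetic expansion strips padding that some wc versions add.
    echo $(( $(wc -l < "$1") ))
}

# nth_line (hypothetical helper): print line number $2 (1-based) of file $1.
nth_line() {
    head -n "$2" "$1" | tail -n 1
}

# Made-up demo data, only to exercise the loop.
demo=$(mktemp)
printf 'r1 10\nr2 20\nr3 30\n' > "$demo"

numLines=$(count_lines "$demo")

# Same spelling as the assignment: $numlines would be a different, unset variable.
# Rows are numbered from 1 because head -n 0 prints nothing.
s=1
while [ "$s" -le "$numLines" ]; do
    nth_line "$demo" "$s"
    s=$((s + 1))
done

rm -f "$demo"
```

Even with these fixes, calling `cut`, `head` and `tail` once per cell rereads the file for every value, so the single-pass awk command above is still the more practical choice for tens of thousands of rows.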
    

    For the median (with an odd number of values per row) you can do this; note that asort() is a gawk extension:

    $ awk '{for(i=6;i<=NF;i++) a[i-5]=$i; 
            asort(a); 
            mid=(NF-4)/2; print mid, a[mid]; 
            delete a}' file
    
    5 38
    5 29
    

    For an even number of values, the usual approach is to take the average of the two middle values (a distance-weighted average is also possible).
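
A minimal sketch that covers both the odd and the even case in one pass is shown below. It is an illustration under some assumptions rather than the answer's own method: the function name `row_median` is hypothetical, the values are assumed to start at field 6 as in the sample, and a small insertion sort replaces `asort()` so that it also runs on awk implementations other than gawk.

```shell
#!/bin/sh
# row_median (hypothetical helper): for each line of file $1, print the first
# three fields followed by the median of fields 6..NF.
row_median() {
    awk '{
        n = 0
        for (i = 6; i <= NF; i++) {
            # insertion sort: shift larger values right, then place the new one
            v = $i + 0
            j = n
            while (j >= 1 && a[j] > v) { a[j + 1] = a[j]; j-- }
            a[j + 1] = v
            n++
        }
        if (n == 0) next    # no value fields on this line
        if (n % 2) {
            med = a[(n + 1) / 2]                  # odd count: the middle value
        } else {
            med = (a[n / 2] + a[n / 2 + 1]) / 2   # even count: mean of the two middle values
        }
        printf "%s %s %s %.2f\n", $1, $2, $3, med
    }' "$1"
}
```

Called as `row_median file`, it only reads the first `n` array slots for each row, so leftover entries from a longer previous row are ignored.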