I would like to create a new column whose values equal the average of values in other columns. But the number of columns I am taking the average of is dictated by a variable. My data look like this, with 'length' dictating the number of columns x1-x5 that I want to average:
data have;
input ID $ length x1 x2 x3 x4 x5;
datalines;
A 5 8 234 79 36 78
B 4 8 26 589 3 54
C 3 19 892 764 89 43
D 5 72 48 65 4 9
;
run;
I would like to end up with the below where 'avg' is the average of the specified columns.
data want;
input ID $ length avg
datalines;
A 5 87
B 4 156.5
C 3 558.3
D 5 39.6
;
run;
Any suggestions? Thanks! Sorry about the awful title, I did my best.
You have to do a little more work since mean(of x[1]-x[length])
is not valid syntax. Instead, save the values to a temporary array and take the mean of it, then reset it at each row. For example:
tmp1 tmp2 tmp3 tmp4 tmp5
8 234 79 36 78
8 26 589 3 .
19 892 764 . .
72 48 65 4 9
data want;
set have;
array x[*] x:;
array tmp[5] _temporary_;
/* Reset the temp array */
call missing(of tmp[*]);
/* Save each value of x to the temp array */
do i = 1 to length;
tmp[i] = x[i];
end;
/* Get the average of the non-missing values in the temp array */
avg = mean(of tmp[*]);
drop i;
run;