Here's my dataset:
Column 1: Lipid level
Column 2: Age
Column 3: Fat content category
Column 4: Gender (1=male)
0.73 1 1 1
0.67 1 2 1
0.15 1 3 1
0.86 2 1 1
0.67 2 2 1
0.15 2 3 1
0.94 3 1 1
0.81 3 2 1
0.26 3 3 1
0.23 4 1 2
1.40 4 1 1
1.32 4 2 1
0.15 4 3 1
1.62 5 1 1
1.41 5 2 1
0.78 5 3 1
9.78 5 1 1
Here are a few different analyses I'm running with this code, but I'm not sure why SAS isn't compiling it.
Before doing anything else, I set up a permanent library manually.
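libname di 'c:\diet';             /* use straight quotes; curly quotes are not valid SAS syntax */
data di.hw3data;                  /* a data set name cannot contain a space */
  infile 'hw3 data.sas';          /* quote the file name, and give the full path if needed */
  input Lipidlevel Age Fatcontent Gender;
run;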
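To rule out the external file as the problem, one option is to read the rows inline with DATALINES. This is a minimal sketch that assumes the raw file holds only the 17 numeric rows shown above; it writes to WORK.NEWDIET, the name the later steps use, rather than to the DI library.

```sas
/* Read the 17 rows inline instead of from an external file */
data newdiet;
  input Lipidlevel Age Fatcontent Gender;
  datalines;
0.73 1 1 1
0.67 1 2 1
0.15 1 3 1
0.86 2 1 1
0.67 2 2 1
0.15 2 3 1
0.94 3 1 1
0.81 3 2 1
0.26 3 3 1
0.23 4 1 2
1.40 4 1 1
1.32 4 2 1
0.15 4 3 1
1.62 5 1 1
1.41 5 2 1
0.78 5 3 1
9.78 5 1 1
;
run;

/* Quick check: the listing should show 17 observations */
proc print data=newdiet;
run;
```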
Next, I want to use ODS Graphics to plot Lipid Level against Fat Content Category for each Age Group.
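ods graphics on;
proc sgplot data=newdiet;
  /* X = fat content category, Y = lipid level, one color per age group */
  scatter x=Fatcontent y=Lipidlevel / group=Age;
run;                              /* the original step was missing a semicolon and RUN */
ods graphics off;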
To make it clearer, how would I draw one line for each Age Group connecting its 3 data points, give each line a different color, show each data point as a star, place the legend below the X axis, and add a title to the graph? (I thought the title came automatically.)
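For the line-per-age-group version, a sketch along these lines should work: a SERIES plot with GROUP=Age draws one line per age group in its own style color, MARKERS with MARKERATTRS=(SYMBOL=STAR) puts a star at each point, KEYLEGEND places the legend outside the plot below the X axis, and the title comes from a TITLE statement. SERIES connects points in data order, so the data are sorted first. Note that age groups 4 and 5 each have two rows in fat category 1 (0.23/1.40 and 1.62/9.78), so those lines will show a vertical jump at X=1. The title and axis label text are my own wording.

```sas
/* Read the data inline so this example runs on its own */
data newdiet;
  input Lipidlevel Age Fatcontent Gender;
  datalines;
0.73 1 1 1
0.67 1 2 1
0.15 1 3 1
0.86 2 1 1
0.67 2 2 1
0.15 2 3 1
0.94 3 1 1
0.81 3 2 1
0.26 3 3 1
0.23 4 1 2
1.40 4 1 1
1.32 4 2 1
0.15 4 3 1
1.62 5 1 1
1.41 5 2 1
0.78 5 3 1
9.78 5 1 1
;
run;

/* SERIES joins points in data order, so order them within each age group */
proc sort data=newdiet;
  by Age Fatcontent;
run;

title "Lipid Level by Fat Content Category for Each Age Group";

proc sgplot data=newdiet;
  /* one line per age group, each group drawn in its own style color */
  series x=Fatcontent y=Lipidlevel / group=Age
         markers markerattrs=(symbol=star size=10px);
  xaxis integer label="Fat Content Category";
  yaxis label="Lipid Level";
  /* legend outside the plot area, below the X axis */
  keylegend / location=outside position=bottom title="Age Group";
run;

title;   /* clear the title so it does not carry over to later output */
```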
Now I want to use ODS to produce two different sets of summary statistics:
a) one showing the mean, median, sample size, and standard deviation of the Lipid Level for each Age Group.
Proc means data=newdiet;
var Lipidlevel;
run;
b) one reporting the sample size, mean, and standard deviation of the Lipid Level for each Fat Content Category.
Proc means data=newdiet;
var Lipidlevel;
run;
Lastly, can anyone give me some advice on labeling the groups in the data as follows? I want to create labels and formats for both the Age Group and Fat Content Category variables. The age groups are coded 1 to 5 and correspond to 15-24, 25-34, 35-44, 45-54, and 55-64. The fat content categories are coded 1 to 3 and correspond to extremely low, fairly low, and moderately low. I have no idea how to do this; the only way I can think of is to go into the original dataset and recode the values manually.
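You shouldn't need to recode anything by hand: PROC FORMAT can map each code to its text, and LABEL and FORMAT statements attach those descriptions to the variables. A minimal sketch follows; the format names AGEGRP and FATCAT are my own choice. The stored values stay 1-5 and 1-3, and only what prints changes.

```sas
/* Map each numeric code to the text that should print */
proc format;
  value agegrp
    1 = '15-24'
    2 = '25-34'
    3 = '35-44'
    4 = '45-54'
    5 = '55-64';
  value fatcat
    1 = 'extremely low'
    2 = 'fairly low'
    3 = 'moderately low';
run;

data newdiet;
  input Lipidlevel Age Fatcontent Gender;
  /* descriptive variable labels used in table headings and axes */
  label Lipidlevel = 'Lipid Level'
        Age        = 'Age Group'
        Fatcontent = 'Fat Content Category';
  /* display the codes through the formats defined above */
  format Age agegrp. Fatcontent fatcat.;
  datalines;
0.73 1 1 1
0.67 1 2 1
0.15 1 3 1
0.86 2 1 1
0.67 2 2 1
0.15 2 3 1
0.94 3 1 1
0.81 3 2 1
0.26 3 3 1
0.23 4 1 2
1.40 4 1 1
1.32 4 2 1
0.15 4 3 1
1.62 5 1 1
1.41 5 2 1
0.78 5 3 1
9.78 5 1 1
;
run;

/* Frequency tables now show '15-24', 'extremely low', and so on */
proc freq data=newdiet;
  tables Age Fatcontent;
run;
```

Because the formats are attached to the variables, later PROC MEANS and PROC SGPLOT steps on this data set pick up the same text automatically.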
There are a few questions in here.
The plot question is answered in the comments.
For PROC MEANS, use the CLASS statement. You can get both sets of statistics from a single PROC MEANS step:
proc means data=newdiet mean std median;
  class age Fatcontent;   /* grouping variables */
  ways 1;                 /* each CLASS variable on its own, not crossed */
  var Lipidlevel;         /* analysis variable */
run;
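CLASS tells the procedure how to group the data. WAYS 1 says to compute the statistics for each CLASS variable separately (Age on its own, Fatcontent on its own). Without it, PROC MEANS prints only the table that crosses all the CLASS variables, that is, one row per Age by Fatcontent combination.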
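To see what WAYS 1 changes, here is a sketch that requests the same two one-way groupings with an explicit TYPES statement, then repeats the step with neither statement so that only the Age by Fatcontent crossing is printed. The OUTPUT statement also saves the one-way results (the data set name ONEWAY is hypothetical); in it, _TYPE_=1 marks the Fatcontent rows and _TYPE_=2 the Age rows.

```sas
/* Read the data inline so this example runs on its own */
data newdiet;
  input Lipidlevel Age Fatcontent Gender;
  datalines;
0.73 1 1 1
0.67 1 2 1
0.15 1 3 1
0.86 2 1 1
0.67 2 2 1
0.15 2 3 1
0.94 3 1 1
0.81 3 2 1
0.26 3 3 1
0.23 4 1 2
1.40 4 1 1
1.32 4 2 1
0.15 4 3 1
1.62 5 1 1
1.41 5 2 1
0.78 5 3 1
9.78 5 1 1
;
run;

/* TYPES names the groupings explicitly; with two CLASS variables this matches WAYS 1 */
proc means data=newdiet n mean std median;
  class Age Fatcontent;
  types Age Fatcontent;
  var Lipidlevel;
  /* save the one-way results: _TYPE_=1 rows are Fatcontent, _TYPE_=2 rows are Age */
  output out=oneway mean=mean_lipid;
run;

/* Neither WAYS nor TYPES: the printed table is the Age by Fatcontent crossing */
proc means data=newdiet n mean std median;
  class Age Fatcontent;
  var Lipidlevel;
run;
```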
This produces:
Analysis Variable : Lipidlevel

Fatcontent   N Obs   Mean        Std Dev     Median
1            7       2.2228571   3.3628892   0.9400000
2            5       0.9760000   0.3610817   0.8100000
3            5       0.2980000   0.2736238   0.1500000

Analysis Variable : Lipidlevel

Age   N Obs   Mean        Std Dev     Median
1     3       0.5166667   0.3189566   0.6700000
2     3       0.5600000   0.3675595   0.6700000
3     3       0.6700000   0.3609709   0.8100000
4     4       0.7750000   0.6770771   0.7750000
5     4       3.3975000   4.2699444   1.5150000
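If you would rather have each table carry exactly the statistics listed in a) and b), one option is two PROC MEANS steps, each with its own CLASS variable and statistic keywords. Since the question asks for ODS, the ODS OUTPUT statement saves each displayed Summary table as a data set; the names AGE_STATS and FAT_STATS are hypothetical.

```sas
/* Read the data inline so this example runs on its own */
data newdiet;
  input Lipidlevel Age Fatcontent Gender;
  datalines;
0.73 1 1 1
0.67 1 2 1
0.15 1 3 1
0.86 2 1 1
0.67 2 2 1
0.15 2 3 1
0.94 3 1 1
0.81 3 2 1
0.26 3 3 1
0.23 4 1 2
1.40 4 1 1
1.32 4 2 1
0.15 4 3 1
1.62 5 1 1
1.41 5 2 1
0.78 5 3 1
9.78 5 1 1
;
run;

/* a) N, mean, median and std dev of Lipidlevel for each age group */
ods output Summary=age_stats;
proc means data=newdiet n mean median std;
  class Age;
  var Lipidlevel;
run;

/* b) N, mean and std dev of Lipidlevel for each fat content category */
ods output Summary=fat_stats;
proc means data=newdiet n mean std;
  class Fatcontent;
  var Lipidlevel;
run;
```

The single-step WAYS 1 version is shorter; the two-step version just lets each table list a different set of statistics.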