Training Manual for Data Analysis using SAS, Sujai Das [best management books of all time TXT] 📗
- Author: Sujai Das
MODEL yield = a b a*b / DDFM=KR;
/* DDFM specifies the method for computing the denominator degrees of freedom for the tests of fixed effects resulting from the MODEL*/
RANDOM year rep(year) year*a year*rep*a year*a*b;
LSMEANS a b a*b / PDIFF;
STORE spd1;
run;
PROC PLM SOURCE = SPD1;
LSMEANS a b a*b / pdiff lines;
RUN;
Exercise 3.10: An agricultural field experiment was conducted with 9 treatments using 36 plots arranged in 4 complete blocks. A sample of harvested output from each of the 36 plots is to be analysed blockwise by three technicians using three different operations. The data collected is given below:
1. Perform the analysis of the data considering that technicians and operations are crossed with each other and nested in the blocking factor.
2. Perform the analysis by considering the effects of technicians as negligible.
3. Perform the analysis by ignoring the effects of the operations and technicians.
Procedure:
Prepare the data file.
DATA Name;
INPUT BLK TECH OPER TRT OBS;
Cards;
. . . .
. . . .
. . . .
;
Perform analysis of objective 1 using PROC GLM. The statements are as follows:
Proc glm;
Class blk tech oper trt;
Model obs = blk tech(blk) oper(blk) trt / ss2;
Lsmeans trt oper(blk) / pdiff;
Run;
Perform analysis of objective 2 using PROC GLM with the following statements:
Proc glm;
Class blk tech oper trt;
Model obs = blk oper(blk) trt / ss2;
run;
Perform analysis of objective 3 using PROC GLM with the following statements:
Proc glm;
Class blk tech oper trt;
Model obs = blk trt / ss2;
run;
Exercise 3.11: A greenhouse experiment on tobacco mosaic virus was conducted. The experimental unit was a single leaf. Individual plants were found to be contributing significantly to error and hence were taken as one source of heterogeneity in the experimental material. The position of the leaf within plants was also found to be contributing significantly to the error. Therefore, the three positions of the leaves, viz. top, middle and bottom, were identified as levels of the second factor causing heterogeneity. 7 solutions were applied to leaves of 7 plants and the number of lesions produced per leaf was counted. Analyze the data of this experiment.
The figures at the intersections of the plants and leaf position are the solution numbers and the figures in the parenthesis are number of lesions produced per leaf.
Procedure:
Prepare the data file.
DATA Name;
INPUT plant posi $ trt count;
Cards;
. . . .
. . . .
. . . .
;
Perform analysis using PROC GLM. The statements are as follows:
Proc glm;
/* count is the response variable, so it does not belong in the CLASS statement */
Class plant posi trt;
Model count = plant posi trt / ss2;
Lsmeans trt / pdiff;
Run;
Exercise 3.12: The following data was collected through a pilot sample survey on Hybrid Jowar crop on yield and biometrical characters. The biometrical characters were average Plant Population (PP), average Plant Height (PH), average Number of Green Leaves (NGL) and Yield (kg/plot).
1. Obtain correlation coefficient between each pair of the variables PP, PH, NGL and yield.
2. Fit a multiple linear regression equation by taking yield as dependent variable and biometrical characters as explanatory variables. Print the matrices used in the regression computations.
3. Test the significance of the regression coefficients and also equality of regression coefficients of a) PP and PH b) PH and NGL
4. Obtain the predicted values corresponding to each observation in the data set.
5. Identify the outliers in the data set.
6. Check for the linear relationship among the biometrical characters.
7. Fit the model without intercept.
8. Perform principal component analysis.
No.   PP       PH       NGL     Yield
25    88.44    0.9800    5.00   4.080
26    99.55    0.6450    9.60   2.830
27    63.99    0.6350    5.60   2.570
28    101.77   0.2900    8.20   7.420
29    138.66   0.7200    9.90   2.620
30    90.22    0.6300    8.40   2.000
31    76.92    1.2500    7.30   1.990
32    126.22   0.5800    6.90   1.360
33    80.36    0.6050    6.80   0.680
34    150.23   1.1900    8.80   5.360
35    56.50    0.3550    9.70   2.120
36    136.00   0.5900   10.20   4.160
37    144.50   0.6100    9.80   3.120
38    157.33   0.6050    8.80   2.070
39    91.99    0.3800    7.70   1.170
40    121.50   0.5500    7.70   3.620
41    64.50    0.3200    5.70   0.670
42    116.00   0.4550    6.80   3.050
43    77.50    0.7200   11.80   1.700
44    70.43    0.6250   10.00   1.550
45    133.77   0.5350    9.30   3.280
46    89.99    0.4900    9.80   2.690
Procedure:
Prepare a data file.
Data mlr;
Input PP PH NGL Yield;
Cards;
. . . .
. . . .
;
For obtaining correlation coefficients, use PROC CORR:
Proc Corr;
Var PP PH NGL Yield;
run;
For fitting the multiple linear regression equation, use PROC REG:
Proc Reg;
Model Yield = PP PH NGL / p r influence vif collin xpx i;
/* Statement labels must be valid SAS names, so they cannot contain blanks */
Test1: Test PP = 0;
Test2: Test PH = 0;
Test3: Test NGL = 0;
Test4: Test PP - PH = 0;
Test4a: Test PP = 0, PH = 0; /* joint test that both coefficients are zero */
Test5: Test PH - NGL = 0;
Test5a: Test PH = 0, NGL = 0; /* joint test that both coefficients are zero */
Model Yield = PP PH NGL / noint;
run;
Proc reg;
Model Yield = PP PH NGL;
Restrict intercept = 0;
Run;
For diagnostic plots:
Proc Reg plots(unpack)=diagnostics;
Model Yield = PP PH NGL;
run;
For variable selection, one can use the following options in the MODEL statement:
Selection=stepwise sls=0.10;
For performing principal component analysis, use the following:
PROC PRINCOMP;
VAR PP PH NGL YIELD;
run;
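For item 5 of the exercise, one possible way to list candidate outliers is to save the studentized residuals with the OUTPUT statement of PROC REG and print the observations that exceed a cutoff. The sketch below assumes that the data set mlr has been created as described above. The data set names diag and flagged, the new variable names, and the cutoff of 2 are illustrative assumptions, not part of the original exercise.

```sas
/* Sketch only: assumes the data set mlr (PP PH NGL Yield) already exists. */
proc reg data=mlr;
  model Yield = PP PH NGL;
  /* Save predicted values, residuals and common influence measures */
  output out=diag p=pred r=resid rstudent=rstud cookd=cooksd h=leverage;
run;
quit;

/* Keep observations whose studentized residual exceeds 2 in absolute value. */
/* The cutoff 2 is a common rule of thumb and is an assumption here.        */
data flagged;
  set diag;
  if abs(rstud) > 2;
run;

proc print data=flagged;
  var PP PH NGL Yield pred resid rstud cooksd leverage;
run;
```

Observations listed by the last step are only candidates; whether they are genuine outliers should be judged together with the R and INFLUENCE output of the MODEL statement above.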
Example 3.13: An experiment was conducted at the Division of Agricultural Engineering, IARI, New Delhi to study the capacity of a grader in number of hours when used with three different speeds and three processor settings. The experiment was conducted using a factorial completely randomised design in 3 replications. The treatment combinations and the data obtained on the capacity of the grader in hours are given below:
Replication   Speed   Processor setting   Treatment combination   Capacity of grader (hours)
3             2       1                   4                       2265
3             2       2                   5                       2280
3             2       3                   6                       2278
3             3       1                   7                       3040
3             3       2                   8                       3028
3             3       3                   9                       3040
Experimenter was interested in identifying the best combination of speed and processor setting that gives maximum capacity of the grader in hours.
Solution: These data can be analysed as per the procedure of a factorial CRD, and one can use the following SAS steps for performing the analysis:
Data ex1a;
Input rep speed proset cgrader;
/* here rep: replication; proset: processor setting; cgrader: capacity of the grader in hours */
Cards;
1 1 1 1852
1 1 2 1848
1 1 3 1855
. . . .
. . . .
. . . .
3 3 1 3040
3 3 2 3028
3 3 3 3040
;
Proc glm data=ex1a;
Class speed proset;
Model cgrader = speed proset speed*proset;
Lsmeans speed proset speed*proset / pdiff adjust=tukey lines;
Run;
The above analysis tests the significance of the main effects of speed and processor setting and of their interaction. Through this analysis one can also identify the speed level (averaged over processor settings) and the processor setting (averaged over speed levels) at which the capacity of the grader is maximum. The multiple comparisons between the means of the speed and processor setting combinations help in identifying the combination at which the capacity of the grader is maximum.
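As a quick descriptive check alongside the multiple comparisons, the observed means of the speed and processor setting combinations can be ranked from highest to lowest. The sketch below assumes that the data set ex1a has been created as above; the output data set name combmeans and the variable name mean_cgrader are illustrative assumptions.

```sas
/* Sketch only: assumes the data set ex1a (rep speed proset cgrader) exists. */
proc means data=ex1a noprint nway;
  class speed proset;
  var cgrader;
  /* One row per speed x proset combination with its observed mean */
  output out=combmeans mean=mean_cgrader;
run;

/* Largest mean capacity first */
proc sort data=combmeans;
  by descending mean_cgrader;
run;

proc print data=combmeans noobs;
  var speed proset mean_cgrader;
run;
```

The ranking only orders the observed means; whether the top combination differs significantly from the others still has to be read from the Tukey-adjusted comparisons produced by the LSMEANS statement.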
Exercise 3.14: An experiment was conducted with five levels of each of the four fertilizer treatments Nitrogen, Phosphorus, Potassium and Zinc. The levels of each of the four factors and the yield obtained are given below. Fit a second order response surface design using the original data. Test the lack of fit of the model. Compute the ridge of maximum and minimum responses. Obtain the predicted residual sum of squares.
N      P      K      Zn    Yield
40     30     25     20    11.28
40     30     25     60    8.44
40     30     75     20    13.29
40     90     25     20    7.71
120    30     25     20    8.94
40     30     75     60    10.9
40     90     25     60    11.85
120    30     25     60    11.03
120    30     75     20    8.26
120    90     25     20    7.87
40     90     75     20    12.08
40     90     75     60    11.06
120    30     75     60    7.98
120    90     75     60    10.43
120    90     75     20    9.78
120    90     75     60    12.59
160    60     50     40    8.57
0      60     50     40    9.38
80     120    50     40    9.47
80     0      50     40    7.71
80     60     100    40    8.89
80     60     0      40    9.18
80     60     50     80    10.79
80     60     50     0     8.11
80     60     50     40    10.14
80     60     50     40    10.22
80     60     50     40    10.53
80     60     50     40    9.5
80     60     50     40    11.53
80     60     50     40    11.02
Procedure:
Prepare a data file.
/* yield at different levels of several factors */
title 'yield with factors N P K Zn';
data dose;
input n p k Zn y;
label y = "yield";
cards;
. . . . .
. . . . .
. . . . .
;
/* Use PROC RSREG. */
ods graphics on;
proc rsreg data=dose plots(unpack)=surface(3d);
model y = n p k Zn / nocode lackfit press;
run;
ods graphics off;
/* If surface plots are not required, one may use the following. */
proc rsreg data=dose;
model y = n p k Zn / nocode lackfit press;
Ridge min max;
run;
Exercise 3.15: Fit a second order response surface design to the following data. Take replications as covariate.
Procedure:
Prepare a data file.
/* yield at different levels of several factors */
title 'yield with factors x1 x2';
data respcov;
/* the covariate rep used in the MODEL statement below must also be read here */
input fert1 fert2 x1 x2 yield;
cards;
. . . . .
. . . . .
. . . . .
;
/* Use PROC RSREG. */
ODS Graphics on;
proc rsreg plots(unpack)=surface(3d);
model yield = rep fert1 fert2 / covar=1 nocode lackfit;
Ridge min max;
run;
ods graphics off;
Exercise 3.16: The following data relate to the length (in cm) of the ear-head of a wheat variety:
9.3, 18.8, 10.7, 11.5, 8.2, 9.7, 10.3, 8.6, 11.3, 10.7, 11.2, 9.0, 9.8, 9.3, 10.3, 10, 10.1, 9.6, 10.4. Test the hypothesis that the median length of ear-head is 9.9 cm.
Procedure:
This may be tested using any of the three tests for location available in PROC UNIVARIATE, viz. Student's t test, the sign test, and the Wilcoxon signed rank test. All three tests produce a test statistic for the null hypothesis that the mean or median is equal to a given value μ0 against the two-sided alternative that the mean or median is not equal to μ0. By default, PROC UNIVARIATE sets the value of μ0 to zero. You can use the MU0= option in the PROC UNIVARIATE statement to specify the value of μ0. If the data are from a normal population, inference can be based on the t test; otherwise the non-parametric sign test and Wilcoxon signed rank test may be used for drawing inferences.
data npsign;
input length;
cards;
9.3
18.8
10.7
11.5
8.2
9.7
10.3
8.6
11.3
10.7
11.2
9.0
9.8
9.3
10.3
10.0
10.1
9.6
10.4
;
PROC UNIVARIATE DATA=npsign MU0=9.9;
VAR length;
HISTOGRAM / NOPLOT ;
RUN;
QUIT;
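PROC UNIVARIATE prints several tables, of which only the tests for location are needed for this exercise. A minimal sketch that restricts the output to that table with ODS SELECT is given below. It re-enters the 19 observations so that it can be run on its own, and it assumes the ODS table name TestsForLocation used by PROC UNIVARIATE.

```sas
/* Sketch only: same 19 ear-head lengths as above, read several per line. */
data npsign;
  input length @@;
  datalines;
9.3 18.8 10.7 11.5 8.2 9.7 10.3 8.6 11.3 10.7
11.2 9.0 9.8 9.3 10.3 10.0 10.1 9.6 10.4
;

/* Print only the t, sign and signed rank tests of location = 9.9 */
ods select TestsForLocation;
proc univariate data=npsign mu0=9.9;
  var length;
run;
```

The selected table reports the three test statistics and their p-values side by side, so the choice between the t test and the non-parametric tests can be made as discussed above.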
Exercise 3.17: An experiment was conducted with 21 animals to determine if the four different feeds have the same distribution of Weight gains on experimental animals. The feeds 1, 3 and 4 were given to 5 randomly selected animals and feed 2 was given to 6 randomly selected animals. The data obtained is presented in the following table.
Procedure:
data np;
input feed wt;
datalines;
1 3.35
1 3.80
1 3.55
1 3.36
1 3.81
2 3.79
2 4.10
2 4.11
2 3.95
2 4.25
2 4.40
3 4.00
3 4.50
3 4.51
3 4.75
3 5.00
4 3.57
4 3.82
4 4.09
4 3.96
4 3.82
;
PROC NPAR1WAY DATA=np WILCOXON; /* for performing the Kruskal-Wallis test */
VAR wt;
CLASS feed;
RUN;
Example 3.18: Finney (1971) gave data representing the effect of a series of doses of rotenone (an insecticide) when sprayed on Macrosiphoniella sanborni (the chrysanthemum aphid). The table below contains the concentration, the number of insects tested at each dose, the proportion dying and the probit transformation (probit + 5) of each of the observed proportions.
Concentration (mg/l)   No. of insects (n)   No. of affected (r)   % kill (P)   Log concentration (x)   Empirical probit
10.2                   50                   44                    88           1.01                    6.18
7.7                    49                   42                    86           0.89                    6.08
5.1                    46                   24                    52           0.71                    5.05
3.8                    48                   16                    33           0.58                    4.56
2.6                    50                   6                     12           0.41                    3.82
0                      49                   0                     0            -                       -
Perform the probit analysis on the above data.
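The last three columns of the table can be recomputed from the concentration and the counts, which may help in seeing what the empirical probit is. The sketch below is an illustration under stated assumptions: the data set name finney and the variable names p, x and emp_probit are hypothetical, and the control row with zero concentration is dropped because its logarithm is undefined.

```sas
/* Sketch only: recompute % kill, log10 concentration and empirical probit. */
data finney;
  input con n r;
  if con > 0;                    /* drop the zero-dose control row            */
  p = r / n;                     /* observed proportion killed                */
  x = log10(con);                /* log concentration                         */
  emp_probit = probit(p) + 5;    /* normal equivalent deviate shifted by 5    */
  datalines;
10.2 50 44
7.7 49 42
5.1 46 24
3.8 48 16
2.6 50 6
0 49 0
;

proc print data=finney noobs;
  format p x emp_probit 6.2;
run;
```

The printed values may differ from the table in the second decimal place, since the table appears to be based on the rounded percentages.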
Procedure:
data probit;
input con n r;
datalines;
10.2 50 44
7.7 49 42
5.1 46 24
3.8 48 16
2.6 50 6
0 49 0
;
ods html;
Proc Probit log10;
Model r/n = con / lackfit inversecl;
title 'Output of probit analysis';
run;
ods html close;
Model Information
Data Set                  WORK.PROBIT
Events Variable           r
Trials Variable           n
Number of Observations    5
Number of Events          132
Number of Trials          243
Name of Distribution      Normal
Log Likelihood            -120.0516414

Number of Observations Read    6
Number of Observations Used    5
Number of Events               132
Number of Trials               243

Algorithm converged.

Goodness-of-Fit Tests
Statistic             Value     DF    Pr > ChiSq
Pearson Chi-Square    1.7289    3     0.6305
L.R. Chi-Square       1.7390    3     0.6283

Response-Covariate Profile
Response Levels               2
Number of Covariate Values    5

Since the chi-square is small (p > 0.1000), fiducial limits will be calculated using a t value of 1.96

Type III Analysis of Effects
Effect        DF    Wald Chi-Square    Pr > ChiSq
Log10(con)    1     77.5920            <.0001

Analysis of Parameter Estimates
Parameter     DF    Estimate    Standard Error    95% Confidence Limits    Chi-Square    Pr > ChiSq
Intercept     1     -2.8875     0.3501            -3.5737   -2.2012        68.01         <.0001
Log10(con)    1     4.2132      0.4783            3.2757    5.1507         77.59         <.0001

Probit Model in Terms of Tolerance Distribution
MU            SIGMA
0.68533786    0.23734947

Estimated Covariance Matrix for Tolerance Parameters
         MU           SIGMA
MU       0.000488     -0.000063
SIGMA    -0.000063    0.000726
Probit Analysis on Log10(con)
Probability   Log10(con)   95% Fiducial Limits
0.01          0.13318      -0.03783   0.24452
0.02          0.19788      0.04453    0.29830
0.03          0.23893      0.09668    0.33253
0.04          0.26981      0.13584    0.35834
0.05          0.29493      0.16764    0.37940
0.06          0.31631      0.19466    0.39737
0.07          0.33506      0.21832    0.41316
0.08          0.35184      0.23946    0.42733
0.09          0.36711      0.25866    0.44026
0.10          0.38116      0.27631    0.45218
0.15          0.43934      0.34898    0.50192
0.20          0.48558      0.40618    0.54202
0.25          0.52525      0.45467    0.57700
0.30          0.56087      0.49759    0.60904
0.35          0.59388      0.53666    0.63942
0.40          0.62521      0.57295    0.66905
0.45          0.65551      0.60716    0.69861
0.50          0.68534      0.63983    0.72870
0.55          0.71516      0.67142    0.75986
0.60          0.74547      0.70240    0.79265
0.65          0.77679      0.73330    0.82766
0.70          0.80980      0.76480    0.86563
0.75          0.84543      0.79777    0.90761
0.80          0.88510      0.83352    0.95533
0.85          0.93133      0.87427    1.01188
0.90          0.98951      0.92456    1.08401
0.91          1.00357      0.93658    1.10155
0.92          1.01883      0.94960    1.12065
0.93          1.03562      0.96387    1.14170
0.94          1.05436      0.97976    1.16526
0.95          1.07574      0.99783    1.19218
0.96          1.10086      1.01898    1.22388
0.97          1.13174      1.04490    1.26294
0.98          1.17279      1.07924    1.31498
0.99          1.23750      1.13315    1.39721
Probit Analysis on con
Probability   con        95% Fiducial Limits
0.01          1.35888    0.91657    1.75599
0.02          1.57718    1.10799    1.98745
0.03          1.73353    1.24935    2.15043
0.04          1.86129    1.36724    2.28215
0.05          1.97212    1.47110    2.39553
0.06          2.07163    1.56554    2.49671
0.07          2.16302    1.65317    2.58917
0.08          2.24825    1.73565    2.67506
0.09          2.32868    1.81410    2.75586
0.10          2.40526    1.88932    2.83257
0.15          2.75005    2.23349    3.17629
0.20          3.05900    2.54788    3.48353
0.25          3.35157    2.84884    3.77571
0.30          3.63808    3.14478    4.06477
0.35          3.92538    3.44084    4.35935
0.40          4.21897    3.74068    4.66710
0.45          4.52389    4.04724    4.99582
0.50          4.84549    4.36343    5.35423
0.55          5.18995    4.69265    5.75260
0.60          5.56506    5.03963    6.20374
0.65          5.98127    5.41132    6.72450
0.70          6.45363    5.81830    7.33883
0.75          7.00531    6.27722    8.08377
0.80          7.67532    6.81590    9.02252
0.85          8.53758    7.48633    10.27723
0.90          9.76143    8.40534    12.13411
0.91          10.08243   8.64132    12.63428
0.92          10.44313   8.90434    13.20233
0.93          10.85466   9.20181    13.85792
0.94          11.33346   9.54469    14.63036
0.95          11.90537   9.95006    15.56609
0.96          12.61427   10.44674   16.74479
0.97          13.54388   11.08927   18.32046
0.98          14.88655   12.00168   20.65263
0.99          17.27807   13.58779   24.95808
Interpretation: The goodness-of-fit tests (p-values = 0.6305 and 0.6283) suggest that the distribution and the model fit the data adequately. In this case, the fitting is done on the normal equivalent deviate only, without adding 5. Therefore, log LD50 (or log ED50) corresponds to the value of probit = 0. Log LD50 is obtained as 0.685338. Therefore, the stress level at which 50% of the insects will be killed is 10^0.685338 = 4.845 mg/l. Similarly, the stress level at which 65% of the insects will be killed is 10^0.776793 = 5.981 mg/l. Both values are also given in the table above.
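The figures quoted in the interpretation can be traced back to the parameter estimates: MU equals minus the intercept divided by the slope, SIGMA is the reciprocal of the slope, and the log dose for a kill probability p is MU plus SIGMA times the standard normal quantile of p. The sketch below recomputes LD50 and LD65 in a DATA step. The data set and variable names are illustrative assumptions, and because it starts from the rounded estimates printed above, the results agree with the PROC PROBIT tables only to about three decimal places.

```sas
/* Sketch only: back-calculate LD50 and LD65 from the printed estimates. */
data ld;
  b0 = -2.8875;                  /* intercept from the output above         */
  b1 = 4.2132;                   /* slope for Log10(con)                    */
  mu = -b0 / b1;                 /* log10 dose at probit = 0 (log LD50)     */
  sigma = 1 / b1;                /* scale of the tolerance distribution     */
  do p = 0.50, 0.65;
    logdose = mu + probit(p) * sigma;  /* probit() is the inverse normal CDF */
    dose = 10 ** logdose;              /* back-transform to mg/l             */
    output;
  end;
run;

proc print data=ld noobs;
  var p logdose dose;
  format logdose 8.5 dose 8.3;
run;
```

The two printed doses should be close to the 4.845 mg/l and 5.981 mg/l quoted in the interpretation.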
4. Discussion
We have initiated a link “Analysis of Data” at the Design Resources Server (www.iasri.res.in/design) to provide steps of analysis of data generated from designed experiments by using statistical packages like SAS, SPSS, MINITAB, SYSTAT, MS-EXCEL etc. For details and live examples one may refer to the link Analysis of Data at http://www.iasri.res.in/design/Analysis%20of%20data/Analysis%20of%20Data.html.
How to see SAS/STAT Examples?
One can learn from the examples available at http://support.sas.com/rnd/app/examples/STATexamples.html
How to use HELP?
Help > SAS Help and Documentation > Contents > Learning to Use SAS > Sample SAS Programs > SAS/STAT …
5. Strengthening Statistical Computing for NARS
NAIP Consortium on Strengthening Statistical Computing for NARS (www.iasri.res.in/sscnars)
targets