Training Manual for Data Analysis using SAS, Sujai Das
- Author: Sujai Das
The SURVEYREG procedure fits linear models for survey data and computes regression coefficients and their variance-covariance matrix. The procedure allows you to specify classification effects using the same syntax as in the GLM procedure. It also provides hypothesis tests for the model effects, for any specified estimable linear functions of the model parameters, and for custom hypotheses about linear combinations of the regression parameters. In addition, it computes confidence limits for the parameter estimates and for their estimable linear functions.
The SURVEYLOGISTIC procedure investigates the relationship between discrete responses and a set of explanatory variables for survey data. The procedure fits linear logistic regression models for discrete response survey data by the method of maximum likelihood, incorporating the sample design into the analysis. The SURVEYLOGISTIC procedure enables you to use categorical classification variables (also known as CLASS variables) as explanatory variables in a model, using the familiar syntax for main effects and interactions employed in the GLM and LOGISTIC procedures.
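The design-related statements of these two procedures can be sketched as follows. This is only an illustrative sketch: the data set survey, its variables (region, village, sex, age, income, adopt, sampwt) and the simulated values are assumptions made for the example and do not come from the manual.

```sas
/* Hypothetical survey data: 4 strata (region) x 5 clusters (village)
   x 10 respondents. All names and values are illustrative assumptions. */
data survey;
   length sex $ 1;
   call streaminit(2024);
   do region = 1 to 4;
      do village = 1 to 5;
         do person = 1 to 10;
            sex    = ifc(rand('bernoulli', 0.5) = 1, 'F', 'M');
            age    = 20 + floor(40 * rand('uniform'));
            income = 50 + 0.8 * age + rand('normal', 0, 10);
            adopt  = rand('bernoulli', 0.4);   /* 1 = adopter, 0 = non-adopter */
            sampwt = 25 * region;              /* assumed sampling weight */
            output;
         end;
      end;
   end;
run;

/* Linear model for a continuous response, accounting for the design */
proc surveyreg data=survey;
   strata region;
   cluster village;
   weight sampwt;
   class sex;
   model income = age sex / solution clparm;
run;

/* Logistic model for a binary response, same design statements */
proc surveylogistic data=survey;
   strata region;
   cluster village;
   weight sampwt;
   class sex / param=ref;
   model adopt(event='1') = age sex;
run;
```

The STRATA, CLUSTER and WEIGHT statements carry the sample design into the variance estimation; the CLASS and MODEL statements follow the GLM and LOGISTIC conventions described above.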
The SURVEYSELECT procedure provides a variety of methods for selecting probability-based random samples. The procedure can select a simple random sample or a sample according to a complex multistage sample design that includes stratification, clustering, and unequal probabilities of selection. With probability sampling, each unit in the survey population has a known, positive probability of selection. This property of probability sampling avoids selection bias and enables you to use statistical theory to make valid inferences from the sample to the survey population.
PROC SURVEYSELECT provides methods for both equal probability sampling and sampling with probability proportional to size (PPS). In PPS sampling, a unit's selection probability is proportional to its size measure. PPS sampling is often used in cluster sampling, where you select clusters (groups of sampling units) of varying size in the first stage of selection. Available PPS methods include without replacement, with replacement, systematic, and sequential with minimum replacement. The procedure can apply these methods for stratified and replicated sample designs.
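A minimal sketch of both kinds of selection is given below. The sampling frame (data set frame with variables village, stratum and households) is a hypothetical one built only for illustration; the sample sizes and seed are likewise assumptions.

```sas
/* Hypothetical sampling frame: 50 villages in 5 strata, with the number
   of households as the size measure (values are made up for the sketch). */
data frame;
   do village = 1 to 50;
      stratum    = ceil(village / 10);
      households = 50 + 5 * mod(village, 10);
      output;
   end;
run;

/* Equal probability: simple random sample of 10 villages */
proc surveyselect data=frame method=srs n=10 seed=12345 out=srs_sample;
run;

/* PPS without replacement: 2 villages per stratum, with selection
   probability proportional to the number of households */
proc surveyselect data=frame method=pps n=2 seed=12345 out=pps_sample;
   strata stratum;
   size households;
run;
```

The output data sets contain the selected units together with their selection probabilities and sampling weights, which can then be supplied to the WEIGHT statement of the survey analysis procedures.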
3. Exercises
Exercise 3.1: An experiment was conducted to study the hybrid seed production of bottle gourd (Lagenaria siceraria (Mol) Standl) Cv. Pusa hybrid-3 under open field conditions during Kharif-2005 at Indian Agricultural Research Institute, New Delhi. The main aim of the investigation was to compare natural pollination and hand pollination. Data were collected on 10 randomly selected plants from each of natural pollination and hand pollination on number of fruit set over a period of 45 days, fruit weight (kg), seed yield per plant (g) and seedling length (cm). The data obtained are given below:
Group  No. of fruit  Fruit weight (kg)  Seed yield/plant (g)  Seedling length (cm)
1      7.0           1.85               147.70                16.86
1      7.0           1.86               136.86                16.77
1      6.0           1.83               149.97                16.35
1      7.0           1.89               172.33                18.26
1      7.0           1.80               144.46                17.90
1      6.0           1.88               138.30                16.95
1      7.0           1.89               150.58                18.15
1      7.0           1.79               140.99                18.86
1      6.0           1.85               140.57                18.39
1      7.0           1.84               138.33                18.58
2      6.3           2.58               224.26                18.18
2      6.7           2.74               197.50                18.07
2      7.3           2.58               230.34                19.07
2      8.0           2.62               217.05                19.00
2      8.0           2.68               233.84                18.00
2      8.0           2.56               216.52                18.49
2      7.7           2.34               211.93                17.45
2      7.7           2.67               210.37                18.97
2      7.0           2.45               199.87                19.31
2      7.3           2.44               214.30                19.36
{Here 1 denotes natural pollination and 2 denotes the hand pollination}
1. Test whether the mean of the population of Seed yield/plant (g) is 200 or not.
2. Test whether the natural pollination and hand pollination under open field conditions are equally effective or are significantly different.
3. Test whether hand pollination is better alternative in comparison to natural pollination.
Procedure:
For performing analysis, input the data in the following format. {Here Number of fruit (45 days) is termed as nfs45, Fruit weight (kg) is termed as fw, seed yield/plant (g) is termed as syp and Seedling length (cm) is termed as sl. It may, however, be noted that one can retain the same name or can code in any other fashion}.
data ttest1; /*one can enter any other name for data*/
input group nfs45 fw syp sl;
cards;
. . . . .
. . . . .
. . . . .
;
*To answer the question number 1 use the following SAS statements;
proc ttest H0=200;
var syp;
run;
*To answer the question number 2 use the following SAS statements;
proc ttest;
class group;
var nfs45 fw syp sl;
run;
To answer question number 3, one has to perform a one-tailed t-test. The easiest way to convert a two-tailed test into a one-tailed test is to take half of the p-value reported in the two-tailed output, provided the observed difference lies in the direction of the alternative hypothesis. The other way is to use the SIDES= option in the PROC TTEST statement. Here we are interested in testing whether hand pollination is a better alternative than natural pollination. Because PROC TTEST computes the difference as group 1 minus group 2 (natural minus hand), the alternative hypothesis is that this difference is negative, and therefore we use SIDES=L as
proc ttest sides=L;
class group;
var nfs45 fw syp sl;
run;
Similarly, this option can also be used in a one-sample test; for a right-tailed test, SIDES=U is used.
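For reference, the three analyses of this exercise can be put together in one program. The sketch below simply enters the tabulated values as data lines; the data set and variable names follow the coding suggested above, and naming the data set explicitly with DATA= is an added convenience rather than a requirement.

```sas
data ttest1;
   input group nfs45 fw syp sl;
   cards;
1 7.0 1.85 147.70 16.86
1 7.0 1.86 136.86 16.77
1 6.0 1.83 149.97 16.35
1 7.0 1.89 172.33 18.26
1 7.0 1.80 144.46 17.90
1 6.0 1.88 138.30 16.95
1 7.0 1.89 150.58 18.15
1 7.0 1.79 140.99 18.86
1 6.0 1.85 140.57 18.39
1 7.0 1.84 138.33 18.58
2 6.3 2.58 224.26 18.18
2 6.7 2.74 197.50 18.07
2 7.3 2.58 230.34 19.07
2 8.0 2.62 217.05 19.00
2 8.0 2.68 233.84 18.00
2 8.0 2.56 216.52 18.49
2 7.7 2.34 211.93 17.45
2 7.7 2.67 210.37 18.97
2 7.0 2.45 199.87 19.31
2 7.3 2.44 214.30 19.36
;
run;

/* Question 1: one-sample test of mean seed yield/plant = 200 */
proc ttest data=ttest1 h0=200;
   var syp;
run;

/* Question 2: two-sided comparison of the two pollination methods */
proc ttest data=ttest1;
   class group;
   var nfs45 fw syp sl;
run;

/* Question 3: lower-tailed test, i.e. mean(group 1) - mean(group 2) < 0 */
proc ttest data=ttest1 sides=L;
   class group;
   var nfs45 fw syp sl;
run;
```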
Exercise 3.2: A study was undertaken to find out whether the average grain yield of paddy of farmers using laser levelling is higher than that of farmers using traditional land levelling methods. For this study, data on grain yield in tonne/hectare were collected from 59 farmers (33 using traditional land levelling methods and 26 using the new laser leveller) and are given as:
Test whether the traditional land levelling and laser levelling give equivalent yields or are significantly different.
Procedure:
For performing analysis, input the data in the following format. {Here traditional land levelling is termed as TL, laser levelling as LL, method of levelling as MLevel and grain yield in t/ha as gyld. It may, however, be noted that one can retain the same name or can code in any other fashion}.
data ttestL; /*one can enter any other name for data*/
input MLevel $ gyld; /*MLevel holds a character code, hence the $*/
cards;
. . . . .
. . . . .
. . . . .
;
*To answer the question use the following SAS statements;
proc ttest data=ttestL;
class MLevel; /*compares the two levelling methods*/
var gyld;
run;
Exercise 3.3: The observations obtained from 15 experimental units before and after application of the treatment are the following:
1. Test whether the mean score before application of treatment is 65.
2. Test whether the application of treatments has resulted into some change in the score of the experimental units.
3. Test whether application of treatment has improved the scores.
Procedure:
data ttest;
input sn preapp postapp;
cards;
1 80 82
2 73 71
3 70 95
4 60 69
5 88 100
6 84 71
7 65 75
8 37 60
9 91 95
10 98 99
11 52 65
12 78 83
13 40 60
14 79 86
15 59 62
;
*For objective 1, use the following;
PROC TTEST H0=65;
VAR PREAPP;
RUN;
*For objective 2, use the following;
PROC TTEST;
PAIRED PREAPP*POSTAPP;
RUN;
*For objective 3, use the following;
PROC TTEST sides=L;
PAIRED PREAPP*POSTAPP;
RUN;
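The PAIRED statement analyses the differences preapp minus postapp, so the paired test is equivalent to a one-sample t-test on that difference. The sketch below makes this explicit; the data set name ttest_diff and the variable diff are introduced here only for illustration.

```sas
data ttest_diff;
   input sn preapp postapp;
   diff = preapp - postapp;   /* negative when the score improved */
   cards;
1 80 82
2 73 71
3 70 95
4 60 69
5 88 100
6 84 71
7 65 75
8 37 60
9 91 95
10 98 99
11 52 65
12 78 83
13 40 60
14 79 86
15 59 62
;
run;

/* Same hypothesis as objective 3: mean difference < 0 */
proc ttest data=ttest_diff h0=0 sides=L;
   var diff;
run;
```

This also shows why SIDES=L is the appropriate choice for objective 3: an improvement after treatment makes preapp minus postapp negative.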
Exercise 3.4: In the F2 population of a breeding trial on pea, out of a total of 556 seeds, the frequencies of seeds of different shape and colour are: 315 round and yellow, 101 wrinkled and yellow, 108 round and green, and 32 wrinkled and green. Test at 5% level of significance whether the different shapes and colours of seeds are in the proportion 9:3:3:1 respectively.
Procedure:
/*rndyel=round and yellow, rndgrn=round and green, wrnkyel=wrinkled and yellow, wrnkgrn=wrinkled and green*/;
data peas;
input shape_color $ count;
cards;
rndyel 315
rndgrn 108
wrnkyel 101
wrnkgrn 32
;
proc freq data=peas order=data;
weight count ;
tables shape_color / chisq testp=(0.5625 0.1875 0.1875 0.0625);
exact chisq;
run;
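As a hand check on the chi-square value reported by PROC FREQ, the goodness-of-fit statistic can be worked out from the counts in the exercise, assuming the usual Pearson form. The expected counts are 556 times the hypothesised proportions 9/16, 3/16, 3/16 and 1/16, taken in the same order as the data lines (round yellow, round green, wrinkled yellow, wrinkled green).

```latex
\begin{aligned}
E_i &= 556 \, p_i : \quad 312.75,\; 104.25,\; 104.25,\; 34.75 \\
\chi^2 &= \sum_i \frac{(O_i - E_i)^2}{E_i}
        = \frac{(315 - 312.75)^2}{312.75} + \frac{(108 - 104.25)^2}{104.25}
        + \frac{(101 - 104.25)^2}{104.25} + \frac{(32 - 34.75)^2}{34.75} \\
       &\approx 0.016 + 0.135 + 0.101 + 0.218 = 0.470
\end{aligned}
```

With 4 - 1 = 3 degrees of freedom the 5% critical value is 7.815, so the 9:3:3:1 hypothesis is not rejected.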
Exercise 3.5: The educational standard and adoptability of new innovations among 600 farmers are given below:
Draw the inferences whether educational standard has any impact on their adoptability of innovation.
Procedure:
data innovation;
input edu $ adopt $ count;
cards;
Matric adopt 100
Matric Noadopt 50
grad adopt 60
grad Noadopt 20
illit adopt 80
illit Noadopt 290
;
proc freq order=data;
weight count ;
tables edu*adopt / chisq ;
run;
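The chi-square test of independence requested by the CHISQ option can be checked by hand from the six counts, assuming the usual Pearson statistic. The row totals are 150 (Matric), 80 (grad) and 370 (illit), and the column totals are 240 (adopt) and 360 (Noadopt).

```latex
\begin{aligned}
E_{ij} &= \frac{R_i \, C_j}{600} : \quad
  \begin{array}{lcc}
                 & \text{adopt} & \text{Noadopt} \\
  \text{Matric}  & 60  & 90  \\
  \text{grad}    & 32  & 48  \\
  \text{illit}   & 148 & 222
  \end{array} \\
\chi^2 &= \frac{40^2}{60} + \frac{40^2}{90} + \frac{28^2}{32} + \frac{28^2}{48}
        + \frac{68^2}{148} + \frac{68^2}{222} \\
       &\approx 26.67 + 17.78 + 24.50 + 16.33 + 31.24 + 20.83 = 137.35
\end{aligned}
```

With (3 - 1)(2 - 1) = 2 degrees of freedom the 5% critical value is 5.991, so the data indicate a significant association between educational standard and adoption.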
Exercise 3.6: An experiment was conducted using a randomized complete block design with 5 treatments (a, b, c, d and e) and three replications. The data (yield) obtained are given below:
1. Perform the analysis of variance of the data.
2. Test the equality of treatment means.
3. Test H0: 2T1=T2+T3, where T1, T2, T3, T4 and T5 are treatment effects.
Procedure:
Prepare a SAS data file using
DATA Name;
INPUT REP TRT $ yield;
Cards;
. . .
. . .
. . .
;
Print data using PROC PRINT. Perform analysis using PROC ANOVA, obtain the treatment means and obtain pairwise comparisons using the least significant difference (LSD) test, Duncan's new multiple range test and Tukey's honest significant difference test. Make use of the following statements:
PROC Print;
PROC ANOVA;
Class REP TRT;
Model Yield = REP TRT;
Means TRT/lsd;
Means TRT/duncan;
Means TRT/tukey;
Run;
Perform contrast analysis using PROC GLM.
Proc glm;
Class rep trt;
Model yield = rep trt;
Means TRT/lsd;
Means TRT/duncan;
Means TRT/tukey;
Contrast '1 Vs 2&3' trt 2 -1 -1;
Run;
Exercise 3.7: In order to select suitable tree species for Fuel, Fodder and Timber an experiment was conducted in a randomized complete block design with ten different trees and four replications. The plant height was recorded in cm. The details of the experiment are given below: Plant Height (Cms): Place – Kanpur
Analyze the data and draw your conclusions.
Exercise 3.8: An experiment was conducted with 49 crop varieties (TRT) using a simple lattice design. The layout and data obtained (Yld) is as given below:
REPLICATION (REP)-I, REPLICATION (REP)-II (each entry is a treatment number with its yield in parentheses):
1(24) 11(51) 48(121) 37(85) 40(33) 10(30) 42(23) 36(58) 4(39) 41(22) 9(10) 12(48) 31(50) 35(54) 29(97) 39(67) 6(75) 30(65) 33(73) 38(30) 28(54) 15(47) 32(93) 34(44) 44(5) 26(56) 45(103) 7(85)
1. Perform the analysis of variance of the data. Also obtain Type II SS.
2. Obtain adjusted treatment means with their standard errors.
3. Test the equality of all adjusted treatment means.
4. Test whether the sum of the means of treatments 1 to 3 is equal to the sum of the means of treatments 4 to 6.
5. Estimate the difference between the mean of treatment 1 and the average of the means of treatments 2 to 4.
6. Divide the between block sum of squares into between replication sum of squares and between blocks within replications sum of squares.
7. Assuming that the varieties are a random selection from a population, obtain the genotypic variance.
8. Analyze the data using block effects as random.
PROCEDURE
Prepare the DATA file.
DATA Name;
INPUT REP BLK TRT yield;
Cards;
. . . .
. . . .
. . . .
;
Print data using PROC PRINT. Perform the analysis for objectives 1 to 5 using PROC GLM. The statements are as follows:
Proc print;
Proc glm;
Class rep blk trt;
Model yield = blk trt/ss2;
Lsmeans trt/stderr; /*adjusted treatment means with standard errors (objective 2)*/
Contrast 'A' trt 1 1 1 -1 -1 -1;
Estimate 'A' trt 3 -1 -1 -1/divisor=3;
Run;
Objective 6 can be achieved by another model statement.
Proc glm;
Class rep blk trt;
Model yield = rep blk(rep) trt/ss2;
run;
Objective 7 can be achieved by using another PROC statement:
Proc Varcomp Method=type1;
Class blk trt;
Model yield = blk trt/fixed = 1;
Run;
The above obtains the variance components using Henderson's method. The methods of maximum likelihood, restricted maximum likelihood and minimum variance quadratic unbiased estimation can also be used by specifying Method=ML, REML and MIVQUE0 respectively.
Objective 8 can be achieved by using PROC MIXED.
Proc Mixed ratio covtest;
Class blk trt;
Model yield = trt;
Random blk/s;
Lsmeans trt/pdiff;
Store lattice;
Run;
PROC PLM SOURCE = lattice;
LSMEANS trt /pdiff lines;
RUN;
Exercise 3.9: Analyze the data obtained through a split-plot experiment involving the yield of 3 irrigation (IRRIG) treatments applied to main plots and two cultivars (CULT) applied to subplots in three replications (REP). The layout and data (YLD) are given below:
Replication  Irrigation  C1   C2
I            I1          1.6  3.3
I            I2          2.6  5.1
I            I3          4.7  6.8
II           I1          3.4  4.7
II           I2          4.6  1.1
II           I3          5.5  6.6
III          I1          3.2  5.6
III          I2          5.1  6.2
III          I3          5.7  4.5
Perform the analysis of the data. (HINT: Steps are given in text).
Remark 3.9.1: Another way proposed for the analysis of split-plot designs is to treat replication as a random effect and analyse the data using PROC MIXED of SAS. For the above case, the steps for using PROC MIXED are:
PROC MIXED COVTEST;
CLASS rep irrig cult;
MODEL yield = irrig cult irrig*cult / DDFM=KR;
RANDOM rep rep*irrig;
LSMEANS irrig cult irrig*cult / PDIFF;
STORE spd;
run;
/* An item store is a special SAS-defined binary file format used to store and restore information with a hierarchical structure*/
/* The PLM procedure performs post fitting statistical analyses for the contents of a SAS item store that was previously created with the STORE statement in some other SAS/STAT procedure*/
PROC PLM SOURCE = SPD;
LSMEANS irrig cult irrig*cult /pdiff lines; RUN;
Remark 3.9.2: In many experimental situations, split-plot designs are conducted across environments and a pooled analysis is required. One way of analysing data of split-plot designs with two factors A and B conducted across environments is
PROC MIXED COVTEST; CLASS