Training Manual for Data Analysis using SAS, Sujai Das
- Author: Sujai Das
‐ Research guidance in statistical computing and computational statistics, and creating a sound and healthy statistical computing environment
‐ Providing advanced, versatile, innovative and state-of-the-art high-end statistical packages to enable researchers to draw meaningful and valid inferences from their research.
The efforts also involve designing intelligent algorithms for implementing statistical techniques, particularly for analysing massive data sets, simulation, bootstrap, etc.
The objectives of the consortium are:
‐ To strengthen the high end statistical computing environment for the scientists in NARS;
‐ To organize customized training programmes and also to develop training modules and manuals for the trainers at various hubs; and
‐ To sensitize the scientists in NARS to the statistical computing capabilities available for enhancing their computing and research analytics skills.
This consortium has provided the platform for closer interactions among all NARS organizations.
Capacity Building
To build the capacity of researchers in the use of the high-end statistical computing facility and statistical techniques:
‐ 209 trainers have been trained through 30 working days training programmes;
‐ 2166 researchers have been trained through 104 training programmes of one week's duration each.
The capacity building efforts have paved the way for publishing research papers in high impact factor journals.
Indian NARS Statistical Computing Portal
To provide service-oriented computing, the Indian NARS Statistical Computing Portal has been developed and established. It is available to NARS users through IP authentication at http://stat.iasri.res.in/sscnarsportal. Any researcher from Indian NARS may obtain a user name and password from the Nodal Officer of their respective NARS organization; the list of Nodal Officers is available at www.iasri.res.in/sscnars. The portal follows the software-as-a-service computing paradigm, so no statistical package needs to be installed on the client side. The following 24 modules for data analysis are available on the portal, classified into four broad categories:
Basic Statistics
• Descriptive Statistics
• Univariate Distribution Fitting
• Test of Significance based on t-test
• Test of Significance based on Chi-square test
• Correlation Analysis
• Regression Analysis
Designs of Experiments
• Completely randomized designs
• Block Designs (includes both complete and incomplete block designs)
• Combined Block Designs
• Augmented Block Designs
• Resolvable Block Designs
• Nested Block Designs
• Row-Column Designs
• Cross Over Designs
• Split Plot Designs
• Split-Split-Plot Designs
• Split Factorial (main A, sub B C) designs
• Split Factorial (main AB, sub CD) designs
• Strip Plot Designs
• Response Surface Designs
Multivariate Analysis
• Principal Component Analysis
• Linear Discriminant Analysis
Statistical Genetics
• Estimation of Heritability from half-sib data
• Estimation of Variance-Covariance matrix from Block Designs
The above modules can be used by uploading *.xlsx, *.csv and *.txt files, and results can be saved as *.RTF or *.pdf files. This has helped researchers analyze their data efficiently and without loss of time.
Requirements of Excel Files during analysis over Indian NARS Statistical Computing Portal
1. The data file must have one of the extensions .xls, .xlsx, .csv or .txt.
2. The system considers only one sheet of the Excel file: the sheet whose name comes first in lexicographic order. Data lying in the other sheets of the Excel file will not be analyzed.
3. Do not use a period (.) or zero (0) to indicate missing values. These will not be treated as missing. Please leave missing observations as blank cells.
4. If you are getting a wrong analysis, kindly check your Excel file. Go to the first cell of the first column and press Ctrl+Shift+End. This selects all the filled rows and columns. If the selection includes empty rows or columns, kindly delete those rows and columns; otherwise the analysis result will be wrong.
5. Do not use special characters in the variable/column names. Variable names should also not start with spaces.
6. Do not apply any formatting to the Excel sheet, including formats or expressions in the cell values. Cells should contain plain data values.
7. If the cells of the first row have been merged, they will not be detected as column/variable names.
8. If any rows or columns are hidden, they will still be displayed during the analysis.
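Several of the general requirements above can be checked automatically before a file is uploaded. The following Python sketch is only an illustration and is not part of the portal. The function name `check_upload`, the sample data, and the rule used for "special characters" (anything other than letters, digits and underscore) are assumptions. It works on CSV text only, so it cannot detect Excel-specific problems such as merged cells, hidden rows or cell formatting.

```python
import csv
import io
import re

ALLOWED_EXTENSIONS = (".xls", ".xlsx", ".csv", ".txt")
# Assumption: a "clean" column name has only letters, digits and underscores.
VALID_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def check_upload(filename, text):
    """Return a list of warnings for CSV text that is about to be uploaded."""
    problems = []
    if not filename.lower().endswith(ALLOWED_EXTENSIONS):
        problems.append(f"unsupported extension: {filename}")

    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return problems + ["file is empty"]
    header, data = rows[0], rows[1:]

    # Requirement 5: no special characters, no leading spaces in names.
    for name in header:
        if not VALID_NAME.match(name):
            problems.append(f"bad column name: {name!r}")

    for line_no, row in enumerate(data, start=2):
        # Requirement 4: stray empty rows lead to a wrong analysis.
        if all(cell.strip() == "" for cell in row):
            problems.append(f"line {line_no}: entirely blank row")
        # Requirement 3: a period must not stand in for a missing value.
        for name, cell in zip(header, row):
            if cell.strip() == ".":
                problems.append(
                    f"line {line_no}: '.' used for missing value in {name!r}"
                )
    return problems


sample = "trt,yield\nA,12.5\nB,.\n,\n"
for warning in check_upload("data.csv", sample):
    print(warning)
```

A zero entered for a missing value cannot be told apart from a genuine zero observation, so that part of requirement 3 still needs a manual check.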
Basic Statistics
9. Descriptive Statistics: The data file should contain at least one quantitative analysis variable.
10. Univariate Distribution Fitting: The data file should contain at least one quantitative numeric variable.
11. Test of Significance based on t-distribution: The data file should contain at least one quantitative variable and one classificatory variable.
12. Chi-Square Test: The data file should contain at least one categorical variable, and a weight or frequency count variable if frequencies are entered in a separate column. The data may also contain a classificatory variable.
13. Correlation: The data file should contain at least two quantitative variables.
14. Regression Analysis: The data file should contain at least one Dependent and one Independent variable.
Design of Experiments
15. Unblocked Design (Completely Randomized Design): Prepare a data file containing one variable to describe the treatment details and at least one response/dependent variable in the experimental data to be analyzed. The treatment details may be coded or may be actual names (i.e. the values of the treatment variable may be numeric or character). The maximum length of a treatment value is 20 characters. The variables can be entered in any order.
16. Block Design: Prepare a data file containing two variables to describe the block and treatment details. There should be at least one response/dependent variable in the experimental data to be analyzed. The block/treatment details may be coded or may be actual names (i.e. the values of the block and treatment variables may be numeric or character). The maximum length of a treatment value is 20 characters. The variables can be entered in any order. (These conditions also apply to other similar experimental designs.)
17. Combined Block Design: The data file should contain three variables to describe Environment, Block, Treatment variables and at least one Dependent variable.
18. Augmented Block Design: The data file should contain two variables to describe Block and Treatment variables and at least one Dependent variable. At present, the Portal supports only numeric treatment and block variables for augmented designs. An augmented block design involves two sets of treatments, known as check (or control) treatments and test treatments. The treatments should be numbered so that the check or control treatments come first, followed by the test treatments. For example, if there are 4 control treatments and 8 test treatments, then the control treatments are renumbered as 1, 2, 3, 4 and the test treatments as 5, 6, 7, 8, 9, 10, 11, 12.
19. Resolvable Block Design: The data file should contain three variables to describe the Replication, Block, Treatment variables and at least one Dependent/response variable.
20. Nested Block Design: The data file should contain three variables to describe Block, SubBlock, Treatment variables and at least one Dependent variable.
21. Row Column Design: The data file should contain three variables to describe Row, Column, Treatment variables and at least one Dependent variable.
22. Crossover Design: Create a data file with at least 5 variables: one for units, one for periods, one for treatments, one for residual, and one for the dependent or analysis variable. For performing analysis using the portal, please arrange the data as follows: animal numbers as units; periods coded as 1, 2, 3, and so on; treatments as letters or numbers; and the residual effect as residual (coding could be done as follows: in every first period the fixed code 1 is assigned, and in the other periods codes 1 to 3 are given according to the treatment received by the unit in the previous period). It may, however, be noted that one can retain the same names or code in any other fashion. A carry-over or residual term has a special property as a factor, or class variate: it has no level in the first period, because the treatment in the first period is not affected by any residual or carry-over effect of any treatment. In practice, carry-over or residual effects are adjusted for period effects (by default, all effects are adjusted for all others in these analyses). As a consequence, any level can be assigned to the residual variate in the first period, provided the same level is always used. An adjustment for periods then removes this part of the residual term. (For details, reference may be made to Jones, B. and Kenward, M.G. 2003. Design and Analysis of Cross Over Trials. Chapman and Hall/CRC, New York. Pp: 212.)
23. Split Plot Design: The data file should contain three variables to describe Replication, Main Plot, Sub Plot variables and at least one Dependent variable.
24. Split Split Plot Design: The data file should contain four variables to describe Replication, Main Plot, Sub Plot, and Sub-Sub Plot Treatment variables and at least one Dependent variable.
25. Split Factorial (Main A, Sub B×C) Plot Design: The data file should contain four variables to describe Replication, Main Plot, Sub Plot(1) {levels of factor 1 in sub plot} and Sub Plot(2) {levels of factor 2 in sub plot} Treatment variables and at least one Dependent variable.
26. Split Factorial (Main A×B, Sub C×D) Plot Design: Create a data file with at least 6 variables: one for block or replication, one for main plot treatment factor 1, one for main plot treatment factor 2, one for subplot treatment factor 1, one for subplot treatment factor 2, and at least one for the dependent or analysis variable. If data on more than one dependent variable are collected in the same experiment, the data on all variables may be entered in additional columns. One may give the actual levels used for the factors applied as main plot treatment factor 1, main plot treatment factor 2, subplot treatment factor 1 and subplot treatment factor 2. Please remember that there should not be any space within a single data value. The levels of the four treatment factors and the block numbers may be coded as 1, 2, 3 and so on. Character values may also be used.
27. Strip Plot Design: The data file should contain at least 4 variables: Replication, Horizontal Strip and Vertical Strip variables and at least one Dependent variable.
28. Response Surface Design: The data file should contain at least one treatment factor variable and at least one dependent variable.
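The residual coding described for the Crossover Design (item 22) can be derived mechanically from the unit, period and treatment columns. The Python sketch below is a minimal illustration, not the portal's own procedure. The function name `add_residual` and the example data are hypothetical, and it assumes that periods are coded as consecutive integers starting at 1 and that every unit has a record for every period.

```python
def add_residual(records, first_period_level=1):
    """Append a residual (carry-over) code to each record.

    records: list of (unit, period, treatment) tuples, periods coded 1, 2, 3, ...
    Returns a list of (unit, period, treatment, residual) tuples sorted by
    unit and period.
    """
    # Look-up table: treatment received by each unit in each period.
    treatment_at = {(unit, period): trt for unit, period, trt in records}

    rows = []
    for unit, period, trt in sorted(records, key=lambda r: (r[0], r[1])):
        if period == 1:
            # No carry-over exists in the first period, so one fixed level
            # is used throughout; adjusting for periods removes it.
            residual = first_period_level
        else:
            # Carry-over is the treatment the unit received one period earlier.
            residual = treatment_at[(unit, period - 1)]
        rows.append((unit, period, trt, residual))
    return rows


# Two animals, three periods, treatments coded 1 to 3 (made-up sequences).
data = [(1, 1, 1), (1, 2, 2), (1, 3, 3),
        (2, 1, 2), (2, 2, 3), (2, 3, 1)]
for row in add_residual(data):
    print(row)
# (1, 1, 1, 1)
# (1, 2, 2, 1)
# (1, 3, 3, 2)
# (2, 1, 2, 1)
# (2, 2, 3, 2)
# (2, 3, 1, 3)
```

Changing `first_period_level` to any other single value gives an equivalent coding, which matches the remark that any level may be used in the first period as long as it is always the same one.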
Multivariate Analysis
29. Principal Component Analysis: The data file should contain at least one quantitative analysis variable.
30. Discriminant Analysis: The data file should contain at least one quantitative analysis variable and a classificatory variable.
Statistical Genetics
31. Genetic Variance Covariance: Create a data file with at least 4 variables: one blocking variable, one treatment variable and at least two analysis variables.
32. Heritability Estimation from Half-Sib Data: The data file should contain at least one quantitative analysis variable and a classificatory variable.
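For the Augmented Block Design (item 18), treatments must be renumbered so that the check or control treatments come first and the test treatments follow. The Python sketch below shows one way to do this before preparing the data file. It is only an illustration: the function name `renumber_treatments` and the labels in the example are hypothetical, and numbering treatments in order of first appearance within each group is an assumption, since the text only requires that checks precede tests.

```python
def renumber_treatments(labels, checks):
    """Recode treatment labels as 1, 2, ... with check treatments first.

    labels: treatment label of each observation, in data order.
    checks: the labels that are check (control) treatments.
    Returns (coded_labels, code_map).
    """
    check_set = set(checks)
    # Unique labels in order of first appearance.
    seen = list(dict.fromkeys(labels))
    ordered = ([t for t in seen if t in check_set]
               + [t for t in seen if t not in check_set])
    code_map = {label: number for number, label in enumerate(ordered, start=1)}
    return [code_map[t] for t in labels], code_map


labels = ["T1", "C1", "T2", "C2", "C1", "C2"]
coded, code_map = renumber_treatments(labels, checks=["C1", "C2"])
print(code_map)  # {'C1': 1, 'C2': 2, 'T1': 3, 'T2': 4}
print(coded)     # [3, 1, 4, 2, 1, 2]
```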
Other IP Authenticated Services
The following can also be accessed through IP authenticated networks:
Web Report Studio: http://stat.iasri.res.in/sscnarswebreportstudio
BI DashBoard: http://stat.iasri.res.in/sscnarsbidashboard
Web OLAP Viewer: http://sas.iasri.res.in:8080/sscnarswebolapviewer
E-Miner 6.1: http://sas.iasri.res.in:6401/AnalyticsPlatform
E-Miner 7.1: http://stat.iasri.res.in/SASEnterpriseMinerJWS/Status
Accessing SAS E-Miner through URL (IP Authenticated Services)
For accessing E-Miner 6.1 and 7.1 through URLs, the following ports should be open:
Server                              Port
1) Metadata server                  8561
2) Object spawner                   8581
3) Table Server                     2171
4) Remote Server                    5091
5) SAS App. Olap Server             5451
6) SAS Deployment Tester Server     10021
7) Analytics Platform Server        6411
8) Framework Server                 22031
However, if you are accessing only E-Miner 6.1, then the following port need not be opened:
Framework Server 22031
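The port requirements above can be written down as a small look-up, with an optional reachability test from the client machine. The Python sketch below is illustrative only. The names `required_ports` and `port_is_open` are hypothetical, and the host to test is left as a parameter because the text does not state which host serves each port.

```python
import socket

# Ports listed in the table above.
PORTS = {
    "Metadata server": 8561,
    "Object spawner": 8581,
    "Table Server": 2171,
    "Remote Server": 5091,
    "SAS App. Olap Server": 5451,
    "SAS Deployment Tester Server": 10021,
    "Analytics Platform Server": 6411,
    "Framework Server": 22031,
}


def required_ports(include_7_1=True):
    """Return the ports to open; the Framework Server port is needed
    only when E-Miner 7.1 is also accessed."""
    if include_7_1:
        return dict(PORTS)
    return {name: port for name, port in PORTS.items()
            if name != "Framework Server"}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports needed when only E-Miner 6.1 is used.
for name, port in required_ports(include_7_1=False).items():
    print(f"{name}: {port}")
```

To test connectivity, `port_is_open(host, port)` would be called for each entry with the server host name supplied by the Nodal Officer or network administrator.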
Steps for accessing SAS Enterprise Miner 6.1 and SAS Enterprise Miner 7.1 separately
SAS Enterprise Miner 6.1
Pre-requisite:
‐ JRE 1.5 Update 15
‐ If Firewall and proxy has been implemented then