Friday, February 04, 2011

SAS 9.2 New Procedures


In this post, I introduce a couple of new procedures in SAS 9.2 and some new features in existing procedures for statistical analysis. If you want to know more about what's new in SAS 9.2, see the SAS documentation on what's new in SAS 9.2.

1. Setting up a learning environment within SAS

SAS ships with a large collection of sample programs for the data step and for all the procedures. SAS 9.2 also includes the entire online documentation within SAS itself.

2. New procedures for statistical analysis
  • PROC GLIMMIX
You have probably used proc glimmix in SAS 9.1.3 for analyzing multilevel data with non-normal outcome variables, such as count or dichotomous outcomes. In SAS 9.1.3, proc glimmix was an experimental procedure that required a separate download and installation. In SAS 9.2 it is a production procedure. Moreover, it now offers maximum likelihood estimation with adaptive quadrature as well as the Laplace approximation. Like most of the other statistical procedures, it also provides ODS graphics, such as diagnostic plots. It can handle normal, binary, binomial, ordered and count outcome variables.

Here is an example dealing with a binary outcome variable. 

ods graphics on;

proc glimmix data = ats.thaieduc plots =(all) noclprint method=quad;
  class  sex schoolid;
  model repeat (event='1') = sex msesc sex*msesc
                            / solution dist=binary
                              oddsratio (at msesc = .5 unit msesc =.1);
  random intercept /subject = schoolid;
run;

ods graphics off;

The GLIMMIX Procedure

                  Model Information

Data Set                      ATS.THAIEDUC

Response Variable             REPEAT
Response Distribution         Binary
Link Function                 Logit
Variance Function             Default
Variance Matrix Blocked By    SCHOOLID
Estimation Technique          Maximum Likelihood
Likelihood Approximation      Gauss-Hermite Quadrature
Degrees of Freedom Method     Containment
Number of Observations Read        8582
Number of Observations Used        7516

           Response Profile
 Ordered                        Total
   Value    REPEAT          Frequency
       1    0                    6449
       2    1                    1067

The GLIMMIX procedure is modeling the probability that REPEAT='1'.

            Dimensions
G-side Cov. Parameters           1
Columns in X                     6
Columns in Z per Subject         1
Subjects (Blocks in V)         356
Max Obs per Subject             41

           Optimization Information

Optimization Technique        Dual Quasi-Newton
Parameters in Optimization    5
Lower Boundaries              1
Upper Boundaries              0
Fixed Effects                 Not Profiled
Starting From                 GLM estimates
Quadrature Points             7


                                Iteration History

                                           Objective                         Max

Iteration    Restarts    Evaluations        Function          Change    Gradient

        0           0              4    5507.6473045       .            130.4493
        1           0              3    5482.1591394     25.48816512    24.41885
        2           0              3     5479.727173      2.43196632    10.25265
        3           0              3    5478.7888209      0.93835210    5.524192
        4           0              2    5478.7248344      0.06398651    0.968477
        5           0              3    5478.7227711      0.00206335    0.397583
        6           0              3    5478.7223653      0.00040580    0.012755
        7           0              3    5478.7223621      0.00000320    0.002078

         Convergence criterion (GCONV=1E-8) satisfied.

           Fit Statistics

-2 Log Likelihood            5478.72
AIC  (smaller is better)     5488.72
AICC (smaller is better)     5488.73
BIC  (smaller is better)     5508.10
CAIC (smaller is better)     5513.10
HQIC (smaller is better)     5496.43


  Fit Statistics for Conditional Distribution

-2 log L(REPEAT | r. effects)     4754.08
Pearson Chi-Square                5629.08
Pearson Chi-Square / DF              0.75

       Covariance Parameter Estimates

                                     Standard

Cov Parm     Subject     Estimate       Error
Intercept    SCHOOLID      1.7364      0.2143

                        Solutions for Fixed Effects

             pupil                 Standard

Effect       gender    Estimate       Error       DF    t Value    Pr > |t|
Intercept               -1.9866     0.09301      354     -21.36      <.0001
SEX          0          -0.5474     0.07603     7158      -7.20      <.0001
SEX          1                0           .        .        .         .
MSESC                   -0.3250      0.2328     7158      -1.40      0.1626
MSESC*SEX    0          -0.3045      0.1975     7158      -1.54      0.1232
MSESC*SEX    1                0           .        .        .         .


                              Odds Ratio Estimates

pupil               pupil                                       95% Confidence
gender     MSESC    gender    _MSESC    Estimate       DF           Limits
0            0.5    1            0.5       0.497     7158       0.386       0.640
0            0.6    0            0.5       0.939     7158       0.895       0.986
1            0.6    1            0.5       0.968     7158       0.925       1.013


        Type III Tests of Fixed Effects

              Num      Den
Effect         DF       DF    F Value    Pr > F
SEX             1     7158      51.84    <.0001
MSESC           1     7158       4.75    0.0294
MSESC*SEX       1     7158       2.38    0.1232


  • PROC COUNTREG and PROC GENMOD for count models

Proc countreg is part of SAS/ETS, the module for econometrics and time series. It supports the following models for count data: Poisson regression, negative binomial regression, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model. Proc genmod in the SAS/STAT module supports all of these except the ZINB model.
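As a rough illustration of the syntax, here is a minimal sketch that fits a ZIP model with both procedures. It is not part of the original examples: the data set work.zipdemo, its variables and the simulation parameters are all made up for this sketch.

```sas
/* Hypothetical data: Poisson counts with extra (structural) zeros. */
data work.zipdemo;
  call streaminit(2011);
  do id = 1 to 500;
    x = rand('normal');
    /* about 30% structural zeros, otherwise a Poisson count */
    if rand('uniform') < 0.3 then y = 0;
    else y = rand('poisson', exp(0.5 + 0.4*x));
    output;
  end;
run;

/* ZIP model in SAS/ETS: the zeromodel statement models the excess zeros */
proc countreg data=work.zipdemo;
  model y = x / dist=zip;
  zeromodel y ~ x;
run;

/* The same kind of model in SAS/STAT */
proc genmod data=work.zipdemo;
  model y = x / dist=zip;
  zeromodel x;
run;
```

Changing dist=zip to dist=zinb in the proc countreg step would request the ZINB model, which, as noted above, proc genmod does not offer in SAS 9.2.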

  • PROC MCMC

Proc mcmc fits Bayesian models using Markov chain Monte Carlo (MCMC) simulation. It can also be used as a general simulation tool. Here is an example from the SAS documentation that simulates draws from a standard normal distribution.

data x;
  run; 
ods graphics on;
proc mcmc data=x outpost=simout seed=23 nmc=10000 maxtune=0
          nbi=0 statistics=(summary interval) diagnostics=none;
   parm alpha 0;
   prior alpha ~ normal(0, sd=1);
   model general(0);
run;
ods graphics off;


The MCMC Procedure

                               Posterior Summaries

                                      Standard               Percentiles
Parameter           N        Mean    Deviation         25%         50%         75%
alpha           10000     -0.0392       1.0194     -0.7198     -0.0403      0.6351


                       Posterior Intervals
Parameter    Alpha     Equal-Tail Interval        HPD Interval
alpha        0.050     -2.0746      1.9594     -2.2197      1.7869

3. New features in existing procedures

  • PROC FREQ

*testing for specified proportions;
proc freq data=ats.hsb2;
   tables ses / testp=(.33 .4 .27);
run;



The FREQ Procedure

                                   Test     Cumulative    Cumulative
ses    Frequency     Percent     Percent     Frequency      Percent
--------------------------------------------------------------------
  1          47       23.50       33.00            47        23.50
  2          95       47.50       40.00           142        71.00
  3          58       29.00       27.00           200       100.00

     Chi-Square Test
for Specified Proportions
-------------------------
Chi-Square         8.5785
DF                      2
Pr > ChiSq         0.0137
Sample Size = 200
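
To see where the chi-square statistic comes from, the data step below recomputes it by hand from the observed frequencies (47, 95, 58) and the proportions given in testp=. This is only a sketch added for illustration; the variable names are arbitrary. It should agree with the chi-square of 8.5785 and the p-value of 0.0137 in the output above.

```sas
data _null_;
  array obs[3] _temporary_ (47 95 58);       /* observed frequencies of ses */
  array p0[3]  _temporary_ (0.33 0.40 0.27); /* hypothesized proportions    */
  n = 200;
  chisq = 0;
  do i = 1 to 3;
    expected = n * p0[i];
    chisq = chisq + (obs[i] - expected)**2 / expected;
  end;
  df = 2;                                    /* number of categories minus 1 */
  pvalue = 1 - probchi(chisq, df);
  put chisq=8.4 df= pvalue=6.4;
run;
```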

* distribution plot;
ods graphics on;
proc freq data = ats.hsb2;
  tables ses*prog;
run;
ods graphics off;


*binomial proportion test and confidence interval;
proc freq data = ats.hsb2;
  tables prog /binomial (level=2 p=.55 all);
run;

                     type of program
                                 Cumulative    Cumulative
prog    Frequency     Percent     Frequency      Percent
---------------------------------------------------------
   1          45       22.50            45        22.50
   2         105       52.50           150        75.00
   3          50       25.00           200       100.00

 Binomial Proportion
     for prog = 2
----------------------
Proportion      0.5250
ASE             0.0353

Type                     95% Confidence Limits
Wald                          0.4558    0.5942
Wilson                        0.4560    0.5931
Agresti-Coull                 0.4560    0.5931
Jeffreys                      0.4558    0.5934
Clopper-Pearson (Exact)       0.4534    0.5959

 Test of H0: Proportion = 0.55
ASE under H0              0.0352
Z                        -0.7107
One-sided Pr <  Z         0.2386
Two-sided Pr > |Z|        0.4773

Sample Size = 200
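
The Wald limits and the z test in this output can be checked with a few lines of arithmetic. The data step below is a sketch added for illustration (variable names are arbitrary); it uses the count of 105 out of 200 for prog = 2 and the null proportion of 0.55 from the example.

```sas
data _null_;
  n = 200; x = 105; p0 = 0.55;
  phat = x / n;                      /* sample proportion              */
  ase  = sqrt(phat*(1 - phat)/n);    /* asymptotic standard error      */
  z975 = probit(0.975);              /* normal quantile, about 1.96    */
  lower = phat - z975*ase;           /* Wald 95% confidence limits     */
  upper = phat + z975*ase;
  ase0 = sqrt(p0*(1 - p0)/n);        /* standard error under H0        */
  z = (phat - p0)/ase0;
  p_two = 2*probnorm(-abs(z));       /* two-sided p-value              */
  put phat=6.4 ase=6.4 lower=6.4 upper=6.4;
  put ase0=6.4 z=7.4 p_two=6.4;
run;
```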

  • PROC REG

* robust standard error, collinearity and test of heteroscedasticity;
ods graphics on;
proc reg data = ats.hsb2 plots=diagnostics;
  model write = female math read /collin  spec hccmethod=1 white;
run;
quit;
ods graphics off;


The REG Procedure

Model: MODEL1

Dependent Variable: write writing score

Number of Observations Read         200

Number of Observations Used         200


                             Analysis of Variance

                                    Sum of           Mean

Source                   DF        Squares         Square    F Value    Pr > F

Model                     3     9405.34864     3135.11621      72.52    <.0001
Error                   196     8473.52636       43.23228
Corrected Total         199          17879
Root MSE              6.57513    R-Square     0.5261
Dependent Mean       52.77500    Adj R-Sq     0.5188
Coeff Var            12.45879


                              Parameter Estimates

                                   Parameter      Standard

Variable    Label           DF      Estimate         Error   t Value   Pr > |t|
Intercept   Intercept        1      11.89566       2.86285      4.16     <.0001
female                       1       5.44337       0.93500      5.82     <.0001
math        math score       1       0.39748       0.06640      5.99     <.0001
read        reading score    1       0.32524       0.06073      5.36     <.0001



                        Parameter Estimates

                                 ---Heteroscedasticity Consistent--

                                    Standard

Variable    Label           DF         Error    t Value    Pr > |t|
Intercept   Intercept        1       2.58504       4.60      <.0001
female                       1       0.94931       5.73      <.0001
math        math score       1       0.06359       6.25      <.0001
read        reading score    1       0.05874       5.54      <.0001


HCC Approximation Method: HC1
       Collinearity Diagnostics

                             Condition
  Number     Eigenvalue          Index
       1        3.58262        1.00000

       2        0.38760        3.04024

       3        0.01873       13.83149

       4        0.01105       18.00780


                      Collinearity Diagnostics

            -----------------Proportion of Variation----------------

  Number      Intercept         female           math           read

       1        0.00199        0.02429        0.00129        0.00155
       2        0.00333        0.94447        0.00305        0.00402
       3        0.90676        0.03123        0.04497        0.33778
       4        0.08791     0.00000813        0.95069        0.65665


     Test of First and Second

       Moment Specification
    DF    Chi-Square    Pr > ChiSq
     8         20.78        0.0078

  • PROC GLM

The model below has an interaction between a categorical variable and a continuous variable. With ODS graphics turned on, SAS 9.2 creates an analysis of covariance plot automatically.

ods graphics on;
proc glm data = ats.hsb2;
  class female ;
  model write = female math female*math ;
run;
quit;
ods graphics off;


Proc glm in SAS 9.2 provides measures of effect size through the effectsize option on the model statement. Note that this option is still experimental in SAS 9.2.

proc glm data = ats.hsb2;
  class female prog;
  model write = female prog female*prog /ss3 effectsize;
run;
quit;

                                      Sum of

Source                     DF        Squares    Mean Square   F Value   Pr > F
Model                       5     4630.36091      926.07218     13.56   <.0001
Error                     194    13248.51409       68.29131
Corrected Total           199    17878.87500
R-Square     Coeff Var      Root MSE    write Mean

0.258985      15.65866      8.263856      52.77500


           Overall Noncentrality

Min Var Unbiased Estimate    62.104
Low MSE Estimate             61.457
95% Confidence Limits        (33.709,102.7)

   Proportion of Variation Accounted for

Eta-Square                   0.26
Omega-Square                 0.24
95% Confidence Limits        (0.14,0.34)


Source                     DF    Type III SS    Mean Square   F Value   Pr > F

female                      1    1261.853291    1261.853291     18.48   <.0001
prog                        2    3274.350821    1637.175410     23.97   <.0001
female*prog                 2     325.958189     162.979094      2.39   0.0946



                                   Noncentrality Parameter

                           Min Var

                          Unbiased        Low MSE
Source                    Estimate       Estimate    95% Confidence Limits

female                       17.29           17.1        5.23     39.7
prog                         45.45           45.0       22.56     79.8
female*prog                   2.72            2.7        0.00     15.9



                               Total Variation Accounted For

                                       Semipartial
                       Semipartial          Omega-         Conservative
Source                  Eta-Square         Square    95% Confidence Limits

female                      0.0706         0.0665     0.0173  0.1469
prog                        0.1831         0.1748     0.0911  0.2718
female*prog                 0.0182         0.0106     0.0000  0.0637


                              Partial Variation Accounted For

                                          Partial
                           Partial         Omega-

Source                  Eta-Square         Square    95% Confidence Limits

female                      0.0870         0.0804     0.0255  0.1656
prog                        0.1982         0.1868     0.1014  0.2851
female*prog                 0.0240         0.0137     0.0000  0.0735



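  • PROC LOGISTIC

When an interaction term is present, the oddsratio statement computes odds ratios for one variable at specified values of the other and, with ODS graphics on, plots them, as in the example below.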

data hsb2;
  set ats.hsb2;
  hon=(write>60);
run;

ods graphics on;

proc logistic data = hsb2 descending;
   model hon = female math female*math;
   oddsratio female / at(math = 45 50 65);
run;
ods graphics off;


The LOGISTIC Procedure
         Analysis of Maximum Likelihood Estimates

                                 Standard          Wald

Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept       1     -8.7458      2.1291       16.8729        <.0001
female          1     -2.8998      3.0942        0.8783        0.3487
math            1      0.1294      0.0359       12.9994        0.0003
female*math     1      0.0670      0.0535        1.5704        0.2101


         Wald Confidence Interval for Odds Ratios

Label                     Estimate    95% Confidence Limits

female at math=45            1.122       0.245        5.139
female at math=50            1.568       0.517        4.759
female at math=65            4.284       1.386       13.237
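
With the interaction in the model, the odds ratio for female at a given math score is exp(b_female + b_female*math × math). The data step below is a sketch added for illustration that plugs in the rounded estimates from the table above; because the coefficients are rounded, the results should come out close to, but not exactly equal to, the reported 1.122, 1.568 and 4.284.

```sas
data _null_;
  b_female = -2.8998;   /* estimate for female (rounded, from the output) */
  b_inter  =  0.0670;   /* estimate for female*math (rounded)             */
  do math = 45, 50, 65;
    odds_ratio = exp(b_female + b_inter*math);
    put math= odds_ratio=8.3;
  end;
run;
```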



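When the data points show complete or quasi-complete separation, the maximum likelihood estimate may not exist. SAS 9.2 provides Firth's penalized likelihood estimation, requested with the firth option on the model statement, for dealing with this issue. The first run below uses ordinary maximum likelihood and the second adds the firth option.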

data test;
 input Y X freq;
datalines;
0 1     3
0 2     4
0 3     5
0 3     10
1 3     6
1 4     12
1 5     8
1 6     9
1 10 11
1 11 6
;
run;

proc logistic data = test descending;
 freq freq;
  model y = x;
run;


The LOGISTIC Procedure

WARNING: The validity of the model fit is questionable.

        Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square       DF     Pr > ChiSq
Likelihood Ratio        64.9376        1         <.0001

Score                   26.0506        1         <.0001

Wald                     0.0859        1         0.7695



             Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1    -32.8245       108.9        0.0909        0.7630
X             1     10.6361     36.2903        0.0859        0.7695



proc logistic data = test descending;
  freq freq;
  model y = x /firth;
run;

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq
Likelihood Ratio        57.0231        1         <.0001
Score                   25.3902        1         <.0001
Wald                     7.6435        1         0.0057


             Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1    -13.0905      4.5755        8.1851        0.0042
X             1      4.0766      1.4745        7.6435        0.0057


Proc logistic in SAS 9.2 can also plot ROC curves and compare them, using the roc and roccontrast statements.

ods graphics on;
proc logistic data=hsb2 plots=roc(id=prob);
   model hon = female math read;
   roc 'female' female;
   roc 'maths score' math;
   roc 'read' read;   
   roccontrast reference('female') / estimate e;
run;
ods graphics off;


                         ROC Association Statistics
              -------------- Mann-Whitney -------------
                         Standard         95% Wald        Somers' D
ROC Model         Area      Error    Confidence Limits       (Gini)      Gamma
Model           0.8569     0.0288     0.8005     0.9134      0.7139     0.7142
female          0.5716     0.0400     0.4932     0.6499      0.1431     0.2880
maths score     0.8325     0.0329     0.7681     0.8970      0.6651     0.6792
read            0.7979     0.0325     0.7343     0.8616      0.5959     0.6298


ROC Association Statistics
ROC Model        Tau-a
Model           0.2654
female          0.0532
maths score     0.2473
read            0.2216


              ROC Contrast Coefficients

ROC Model            Row1          Row2          Row3
Model                   1             0             0
female                 -1            -1            -1
maths score             0             1             0
read                    0             0             1



              ROC Contrast Test Results
Contrast                DF    Chi-Square    Pr > ChiSq
Reference = female       3      113.0593        <.0001


                ROC Contrast Rows Estimation and Testing Results


                                Standard       95% Wald                     Pr >
Contrast              Estimate     Error   Confidence Limits  Chi-Square   ChiSq
Model - female          0.2854    0.0439    0.1994    0.3714     42.3060  <.0001
maths score - female    0.2610    0.0532    0.1567    0.3652     24.0700  <.0001
read - female           0.2264    0.0543    0.1199    0.3329     17.3547  <.0001



4. New graphics procedures for statistical graphics

proc sgplot data=ats.hsb2;
  dot ses / response=write stat=mean
            limitstat=stddev numstd=1;
run;




proc sgplot data=ats.hsb2;
  scatter x=math y=write;
  ellipse x=math y=write;
  keylegend / location=inside position=bottomright;
run;

title;
filename odsout 'c:\sas\temp\test.htm';
goptions device = java ;
ods listing close;
ods html file=odsout style=styles.ocean;
proc gchart data=ats.hsb2;
  block prog /sumvar= write type=mean;
run;
ods html close;

ods listing;


  • PROC LOGISTIC

  • PROC GLM

  • PROC REG

  • PROC COUNTREG and PROC GENMOD for count models
