SAS and Stats: Proc Logistic and Logistic Regression Models

Friday, January 28, 2011

Proc Logistic and Logistic Regression Models

• Introduction
• Binary Logistic Regression
• Exact Logistic Regression
• Generalized Logits Model - Multinomial Logistic Regression
• Proportional Odds Model - Ordinal Logistic Regression

Introduction

Logistic regression describes the relationship between a categorical response variable and a set of predictor variables. A categorical response variable can be a binary variable, an ordinal variable or a nominal variable. Each type of categorical variables requires different techniques to model its relationship with the predictor variables. In this seminar, we illustrate how to perform different types of analyses using SAS proc logistic. For a binary response variable, such as a response to a yes-no question, a commonly used model is the logistic regression model. We also touch the surface of exact logistic regression, which is very useful when the sample size is too small or the events are too sparse. For a nominal response variable, such as Democrats, Republicans and Independents, we can fit a generalized logits model. For an ordinal response variable, such as low, medium and high, we can fit it to a proportional odds model.

Logistic Regression Models

In this section, we will use the High School and Beyond data to describe what a logistic model is, how to perform a logistic regression model analysis and how to interpret the model. Our dependent variable is created as a dichotomous variable indicating if a student's writing score is higher than or equal to 52. We call it hiwrite. The predictor variables will include prog, female and other test scores. Our data set has 200 observations.

(For Data write to me, I make an apology for trouble cause to you.)

data hsb2;
set hsb2;
hiwrite = write >=52;
run;
proc means data = hsb2 mean std;
run;
Variable Mean Std Dev
----------------------------------------
ID 100.5000000 57.8791845
FEMALE 0.5450000 0.4992205
RACE 3.4300000 1.0394722
SES 2.0550000 0.7242914
SCHTYP 1.1600000 0.3675260
PROG 2.0250000 0.6904772
READ 52.2300000 10.2529368
WRITE 52.7750000 9.4785860
MATH 52.6450000 9.3684478
SCIENCE 51.8500000 9.9008908
SOCST 52.4050000 10.7357935
hiwrite 0.6300000 0.4840159
----------------------------------------
Let π be the probability of scoring higher than 51 in writing test. The odds is π/(1-π). For example, the overall probability of scoring higher than 51 is .63. The odds will be .63/(1-.63) = 1.703. A logistic regression model describes a linear relationship between the logit, which is the log of odds, and a set of predictors.

logit(π) = log(π/(1-π)) = α + β1*x1 + β2*x2 + ... + βk*xk = α + x β

We can either interpret the model using the logit scale, or we can convert the log of odds back to the probability such that

π = exp(α + x β) /(1 + exp(α + x β)).

The advantage of using the logit scale for interpretation is that the relationship between the logit and the predictors is a linear relationship. But sometimes it is easier to interpret the model in terms of probabilities. Then we have to keep in mind that the relationship between the probabilities and the predictors is not a linear relationship. For more details on odds ratio, please write to me.

A Simple Model

Let's consider the model where female is the only predictor. We will use this example to understand the concepts of odds and odds ratios and to understand how they are related to the parameter estimates.

proc logistic data = hsb2 ;
model hiwrite (event='1') = female ;
ods output ParameterEstimates = model_female;
run;

Analysis of Maximum Likelihood Estimates

Standard Wald

Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 0.0220 0.2097 0.0110 0.9165
FEMALE 1 0.9928 0.3016 10.8369 0.0010

Notice that we can specify which event to model using the event = option in the model statement. This is new in SAS 8.2. The other way of specifying that we want to model 1 as event instead of 0 is to use the descending option in the proc logistic statement. One thing that is worth noticing is the use of quotes in the option event = '1'. Even though, the variable hiwrite is a numeric variable, it is still necessary to surround 1 with a pair of quotes. It comes handy when the outcome variable is coded as a character variable. Using the ODS output statement, we created a data set called model_female containing the parameter estimates shown above. We can then use the data set to create the odds and odds ratio.

data model_fem;
set model_female;
o = exp(estimate);
run;

proc print data = model_fem;
var variable estimate o;
run;

Obs Variable Estimate o
1 Intercept 0.0220 1.02222
2 FEMALE 0.9928 2.69865

The intercept has a parameter estimate of .022. This is the estimated logit when female = 0, that is when the student is a male student. Therefore, the odds = exp(logit) = exp(.0220) = 1.02222 is the estimated odds for a male student to score 52 or higher in writing test. The coefficient for variable female is .9928. That means that for a one unit increase in female (that is changing from male to female) the expected change in log of odds is .9928. We can also interpret it in the scale of odds ratio. The odds for a male student is exp(α) = exp(.022) and the odds for a female student is exp(.022 + .9928*1). Therefore, taking the ratio of these two odds, we get the odds ratio for female versus male is exp(.9928) = 2.699. In terms of probabilities, the probability for females to score 52 or higher on the writing test is exp(.022 + .9928) / (1 + exp(.022 + .9928)) = .734. The probability for males is exp(.022 )/(1 + exp(.022)) = .505.

With this simple example, we can actually compute the odds ratio from the 2x2 table of hiwrite*female.

proc freq data = hsb2;
tables hiwrite*female /nocum nopercent;
run;

hiwrite FEMALE
Frequency|
Row Pct |
Col Pct | 0| 1| Total
---------+--------+--------+
0 | 45 | 29 | 74
| 60.81 | 39.19 |
| 49.45 | 26.61 |
---------+--------+--------+
1 | 46 | 80 | 126
| 36.51 | 63.49 |
| 50.55 | 73.39 |
---------+--------+--------+
Total 91 109 200

For example, for males, the odds is 46/45 = 1.022, which is the exponentiated value of the intercept from the model. The odds ratio for females versus males is (80/29)/(46/45) = 2.699. It is usually written as a cross-product (45*80)/(29*46) = 2.699. This is the exponentiated value of the parameter estimate for variable female.

A Model with a Continuous Predictor and a Categorical Predictor

Let's now take a look at a model with both a continuous variable math and a categorical variable female as predictors. We will focus on how to interpret the parameter estimate for the continuous variable.

proc logistic data = hsb2;
model hiwrite (event='1') = female math;
output out = m2 p = prob xbeta = logit;
run;

Analysis of Maximum Likelihood Estimates

Standard Wald

Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -10.3651 1.5535 44.5153 <.0001 FEMALE 1 1.6304 0.4052 16.1922 <.0001 MATH 1 0.1979 0.0293 45.5559 <.0001 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits FEMALE 5.106 2.308 11.298 MATH 1.219 1.151 1.291 The interpretation for the parameter estimate of math is very similar to that for the categorical variable female. In terms of logit scale, we can say that for every unit increase in the math score, the logit will increase by .198, holding everything else constant. We can also say that for a one unit increase in math score, the odds of scoring 51 or higher in writing test increases by (1.219-1)*100% = 22%. We used an output statement to create a data set containing the predicted probabilities based on the model. We can compare the linear predictions and the probabilities in terms of the math scores for the males and females. proc sort data = m2; by math; run; symbol1 i = join v=star l=32 c = black; symbol2 i = join v=circle l = 1 c=black; proc gplot data = m2; plot logit*math = female; plot prob*math = female; run; quit; Sometimes, a one unit change may not be a desirable scale to use. We can ask SAS to give us odds ratio for different units of change. For example, it may make more sense to talk about change of every 5 units in math score. This can be done using units statement. We also include the option clodds = wald to the model statement so that the confidence interval will also be calculated for the odds ratio calculated in the unit statement. Of course, you can always manually compute the odds ratio for every 5 units change in math score as 1.219^5 = 2.69. proc logistic data = hsb2 ; model hiwrite (event='1') = female math /clodds=wald; units math = 5; run; Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits FEMALE 5.106 2.308 11.298 MATH 1.219 1.151 1.291 Wald Confidence Interval for Adjusted Odds Ratios Effect Unit Estimate 95% Confidence Limits MATH 5.0000 2.689 2.018 3.584 Other Features of Proc Logistic We will illustrate other features of proc logistic by using a model with more predictors. We will include categorical variables prog and female, continuous variables math and read. This model is merely for the purpose of demonstrating proc logistic, not really a model developed based on any theory. proc logistic data = hsb2 ; class prog (ref='1') /param = ref; model hiwrite (event='1') = female read math prog ; run; Response Profile Ordered Total Value hiwrite Frequency 1 0 74 2 1 126 Probability modeled is hiwrite=1. Class Level Information Design Class Value Variables PROG 1 0 0 2 1 0 3 0 1 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -12.3140 2.0374 36.5311 <.0001 FEMALE 1 1.9576 0.4533 18.6541 <.0001 READ 1 0.1037 0.0298 12.1453 0.0005 MATH 1 0.1310 0.0329 15.8738 <.0001 PROG 2 1 0.2721 0.4889 0.3098 0.5778 PROG 3 1 -0.5776 0.5478 1.1116 0.2917 CLASS statement Notice that we have used the class statement for variable prog. SAS will create dummy variables for a categorical variable on-the-fly. There are various coding schemes from which to choose. The default coding for all the categorical variables in proc logistic is the effect coding. Here we changed it to dummy coding by using the param = ref option. We can specify the comparison group by using ref = option after the variable name. There are other coding schemes available, such as orthogonal polynomial coding scheme and reference cell coding. We can double check what coding scheme is used and which group is the reference group by looking at the Class Level Information part of the output. CONTRAST statement In the parameter estimates, we only see the comparison of level 2 vs. 1 and level 3 vs. 1 for variable prog. If we want to compare level 2 vs. level 3, we can use the contrast statement. Usually, contrast is done using less than full rank, reference cell coding as used in proc glm. We chose this type of coding by using param = glm option in the class statement. We also used estimate option at the end of contrast statement to get the estimate of the difference between group 1 and group 2. It is always a good idea to check the Class Level Information to see how the variable is coded so we know that the contrast statement gives us the expected contrast among groups. proc logistic data = hsb2 ; class prog /param = glm ; model hiwrite (event='1') = female read math prog; contrast '1 vs 2 of prog' prog 1 -1 0 / estimate; run; Class Level Information Class Value Design Variables PROG 1 1 0 0 2 0 1 0 3 0 0 1 Contrast Test Results Wald Contrast DF Chi-Square Pr > ChiSq
1 vs 2 of prog 1 0.3098 0.5778

Contrast Rows Estimation and Testing Results
Standard

Contrast Type Row Estimate Error Alpha Confidence Limits
1 vs 2 of prog PARM 1 -0.2721 0.4889 0.05 -1.2303 0.6861

Contrast Rows Estimation and Testing Results
Wald

Contrast Type Row Chi-Square Pr > ChiSq
1 vs 2 of prog PARM 1 0.3098 0.5778

TEST Statement

The parameter estimates offers all the one degree of freedom test on each of the parameters. We can also test the combined effect of multiple parameters using the test statement. In the example below, we first tested on the joint effect of read and math. Next we tested on the hypothesis that the effect of read and math are the same.

proc logistic data = hsb2 ;
class prog(ref='1') /param = ref;
model hiwrite (event='1') = prog female read math;
test_read_math: test read, math;
test_equal: test read = math;
run;

Linear Hypotheses Testing Results

Wald

Label Chi-Square DF Pr > ChiSq
test_read_math 37.2236 2 <.0001 test_equal 0.3041 1 0.5813 LACKFIT and RSQUARE Option The Hosmer-Lemeshow test of goodness-of-fit can be performed by using the lackfit option after the model statement. This test divides subjects into deciles based on predicted probabilities, then computes a chi-square from observed and expected frequencies. It tests the null hypothesis that there is no difference between the observed and predicted values of the response variable. Therefore, when the test is not significant, as in this example, we can not reject the null hypothesis and say that the model fits the data well. We can also request the generalized R-square measure for the model by using rsquare option after the model statement. SAS gives the likelihood-based pseudo R-square measure and its rescaled measure. Categorical Data Analysis Using The SAS System, by M. Stokes, C. Davis and G. Koch offers more details on how the generalized R-square measures that you can request are constructed and how to interpret them. proc logistic data = hsb2; class prog(ref='1') /param = ref; model hiwrite(event='1') = female prog read math / rsq lackfit; run; Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 265.582 167.035 SC 268.881 186.825 -2 Log L 263.582 155.035 R-Square 0.4188 Max-rescaled R-Square 0.5720 Partition for the Hosmer and Lemeshow Test hiwrite = 1 hiwrite = 0 Group Total Observed Expected Observed Expected 1 20 0 1.08 20 18.92 2 20 4 3.45 16 16.55 3 20 9 6.54 11 13.46 4 21 10 10.86 11 10.14 5 20 13 13.64 7 6.36 6 20 17 15.70 3 4.30 7 20 16 17.44 4 2.56 8 20 18 18.76 2 1.24 9 20 20 19.61 0 0.39 10 19 19 18.90 0 0.10 Hosmer and Lemeshow Goodness-of-Fit Test Chi-Square DF Pr > ChiSq
5.2766 8 0.7276

Influence Statistics

One important topic in logistic regression is regression diagnostics. Proc logistic can generate a lot of diagnostic measures for detecting outliers and influential data points for a binary outcome variable. These diagnostic measures can be requested by using the output statement. For example, we can request for residual deviance, the hat matrix diagonal and residual chi-squared deviance and the difference between chi-square goodness-of-fit when an observation is deleted. We can then plot these variables against the predicted values to investigate the influence of each point on the model. By using the pointlabel option in the symbol statement, we can see that the observation with id = 187 has the highest influence on the chi-square goodness-of-fit.

proc logistic data = hsb2 ;
class prog(ref='1') /param = ref;
model hiwrite(event='1') = female prog read math ;
output out=dinf prob=p resdev=dr h=pii reschi=pr difchisq=difchi;
run;

goptions reset = all;
symbol1 pointlabel = ("#id" h=1 ) value=none;
proc gplot data = dinf;
plot difchi*p;
run;
quit;

Scoring a New Data Set

There are situations where we want to produce predicted probabilities for a specific combination of the values of the predictors. For example, we may want to know the predicted probabilities for groups defined by female and prog when math and read are held at their grand means. Let's first create a data set with the groups and grand means for math and read.

proc sql;
create table gdata as
select distinct female, (prog=2) as prog2,(prog=3) as prog3,
mean(read) as read, mean(math) as math
from hsb2;
quit;

proc print data = gdata;
run;

Obs FEMALE prog2 prog3 read math
1 0 0 0 52.23 52.645
2 0 0 1 52.23 52.645
3 0 1 0 52.23 52.645
4 1 0 0 52.23 52.645
5 1 0 1 52.23 52.645
6 1 1 0 52.23 52.645

We can use SAS proc score to generate the linear predicted values and then compute the odds or probabilities afterwards. Notice that the score procedure does not care what model we have run. It uses the estimated parameters to generate linear predictions. In our logistic regression case, the predicted values are therefore in the logit scale. In the output data set created by proc score, we have a variable called hiwrite. This is the new variable that proc score created for predicted values.

proc logistic data = hsb2 outest=mg;
class prog(ref='1') /param = ref;
model hiwrite(event='1') = female prog read math ;
run;

*Scoring the data set to get the linear predictions;

proc score data=gdata score=mg out=gpred type=parms;
var female prog2 prog3 read math;
run;

data gpred;
set gpred;
odds = exp(hiwrite);
p_1 = odds /(1+odds);
p_0 = 1 - p_1;
run;

proc print data = gpred;
run;

Obs FEMALE prog2 prog3 read math hiwrite odds p_1 p_0
1 0 0 0 52.23 52.645 0.00012 1.00012 0.50003 0.49997
2 0 0 1 52.23 52.645 -0.57747 0.56132 0.35952 0.64048
3 0 1 0 52.23 52.645 0.27223 1.31289 0.56764 0.43236
4 1 0 0 52.23 52.645 1.95774 7.08332 0.87629 0.12371
5 1 0 1 52.23 52.645 1.38016 3.97552 0.79902 0.20098
6 1 1 0 52.23 52.645 2.22986 9.29856 0.90290 0.09710

Remarks: This process will be simplified with SAS 9.0 and above with the new statement score in proc logistic. The syntax one will use looks like the following:

proc sql;
create table gdata1 as
select distinct female, ses,
mean(read) as read, mean(math) as math
from hsb2;
quit;

proc logistic data = hsb2 outmodel=mg1;
class prog(ref='1') /param = ref;
model hiwrite(event='1') = female prog read math ;
run;

proc logistic inmodel=mg1;
score data = gdata1 out=gpred1;
run;

proc print data = gpred1;
run;
________________________________________

Exact Logistic Regression

All of the models we have inspected so far require large sample sizes. When the data sets are too small or when the event occurs very infrequently, the maximum likelihood method may not work or may not provide reliable estimates. Exact logistic regression provides a way to get around these difficulties. What it does is to enumerate the exact distributions of the parameters of interest, conditional on the remaining parameters. Here is a simple example from Performing Exact Logistic Regression with the SAS System. The data set has very small cells, with each cell having only 3 observations. Let's run the exact logistic regression on this data set.

(For Data write to me, I make an apology for trouble cause to you.)

data dose;
input dose deaths total;
datalines;
0 0 3
1 0 3
2 0 3
3 0 3
4 1 3
5 2 3
;
run;

proc logistic data = dose desc;
model deaths/total = dose;
exact dose /estimate = both;
run;

Model Fit Statistics
Intercept
Intercept and Criterion Only Covariates
AIC 18.220 12.072
SC 19.111 13.853
-2 Log L 16.220 8.072

Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 8.1478 1 0.0043
Score 5.7943 1 0.0161
Wald 2.7249 1 0.0988

Analysis of Maximum Likelihood Estimates
Standard Wald

Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -9.4745 5.5677 2.8958 0.0888
dose 1 2.0804 1.2603 2.7249 0.0988
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
dose 8.007 0.677 94.679
Exact Conditional Analysis
Conditional Exact Tests
--- p-Value ---
Effect Test Statistic Exact Mid
dose Score 5.4724 0.0245 0.0190
Probability 0.0110 0.0245 0.0190
Exact Parameter Estimates
95% Confidence
Parameter Estimate Limits p-Value
dose 1.8000 0.1157 5.8665 0.0245
Exact Odds Ratios
95% Confidence
Parameter Estimate Limits p-Value
dose 6.049 1.123 353.000 0.0245

Notice first of all that the syntax for model statement is slight different than we have seen so far. This is the syntax used for grouped data. That is we have frequencies of the events for each of the cells. This type of syntax works for both the maximum likelihood logistic regression and exact logistic regression. With estimate = both, we request that both the parameters and the odds ratios are being estimated.
________________________________________

Generalized Logits Model for Multinomial Logistic Models

Proc logistic also perform analysis on nominal response variables. Since the response variable no longer has the ordering, we can no longer fit a proportional odds model to our data. But we can fit a generalized logits model. This analysis can be done using proc catmod and that is how it is used to be done. SAS 8.2 added some new features to its proc logistic and now proc logistic does analysis on nominal responses with ease. In this section, we are going to use a data file called school used in Categorical Data Analysis Using The SAS System, by M. Stokes, C. Davis and G. Koch. We will illustrate what a generalized logits model is and how to perform an analysis using proc logistic.

data school;
input school program style $ count;
cards;
1 1 self 10
1 1 team 17
1 1 class 26
1 2 self 5
1 2 team 12
1 2 class 50
2 1 self 21
2 1 team 17
2 1 class 26
2 2 self 16
2 2 team 12
2 2 class 36
3 1 self 15
3 1 team 15
3 1 class 16
3 2 self 12
3 2 team 12
3 2 class 20
;
run;

In this data set, three different teaching styles have been implemented in teaching third grade math. School children in experimental learning settings were surveyed to determine which teaching styles they preferred. The response variable style takes three values: class, self and team. We want to determine the preference of students by their schools and programs. The programs are regular and after-school programs with 1 being regular and 2 being after-school.

In a generalized logit model, we will pick a particular category of responses as the baseline reference and compare every other category with the baseline response. In our example, we will choose team as the baseline category. Consider the probabilities:

π1 = probability of 'Preferring class',
π2 = probability of 'Preferring self',
π3 = probability of 'Preferring team'.
The generalized logits are defined as
logit(θ1) = log(π1/π3),
logit(θ2) = log(π2/π3).
The generalized logits model for our example is then defined as
logit(θi) = αi + x βi,
where i = 1 and 2 indicating the two logits. This means that we allow two different sets of regression parameters, one for each logit.

A Simple Example

We can calculate the generalized odds from the frequency table, similar to what we have done in the case of proportional odds model.

proc freq data = school;
weight count;
tables style /list chisq relrisk;
ods output OneWayFreqs = test;
run;

data test1;
set test;
godds = frequency/85;
run;

proc print data = test1;
var style frequency godds;
run;

Obs style Frequency godds
1 class 174 2.04706
2 self 79 0.92941
3 team 85 1.00000

The other way of getting the same results is to run the generalized logits model. In SAS, we can simply use proc logistic with the link = glogit option.

proc logistic data = school order = internal;
freq count;
model style = /link = glogit ;
ods output parameterestimates= odds;
run;

data odds1;
set odds;
odds = exp(estimate);
run;

proc print data = odds1;
var response estimate odds;
run;

Obs Response Estimate odds
1 class 0.7164 2.04706
2 self -0.0732 0.92941

Saturated Model Example
In this data set, there are three schools and two types of programs. That is, for each of the preference choices there are possible six cell counts. If we use both school and program and also include their interaction, we will use up all the degrees of freedom. That is we have a saturated model. This is the best model we can get, fitting each cell with its own parameter.

proc logistic data=school order = internal;
freq count;
class school program / order = data;
model style = school program school*program /link = glogit scale = none aggregate;
run;

The LOGISTIC Procedure
Model Information
Data Set WORK.SCHOOL
Response Variable style
Number of Response Levels 3
Number of Observations 18
Frequency Variable count
Sum of Frequencies 338
Model generalized logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value style Frequency
1 class 174
2 self 79
3 team 85
Logits modeled use style='team' as the reference category.
Class Level Information
Design
Class Value Variables
school 1 1 0
2 0 1
3 -1 -1
program 1 1
2 -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Deviance and Pearson Goodness-of-Fit Statistics
Criterion Value DF Value/DF Pr > ChiSq
Deviance 0.0000 0 . .
Pearson 0.0000 0 . .
Number of unique profiles: 6
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 699.404 689.156
SC 707.050 735.033
-2 Log L 695.404 665.156
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 30.2480 10 0.0008
Score 28.3738 10 0.0016
Wald 25.6828 10 0.0042
Type 3 Analysis of Effects
Wald
Effect DF Chi-Square Pr > ChiSq
school 4 14.5522 0.0057
program 2 10.4815 0.0053
school*program 4 1.7439 0.7827

We have included most parts of the output from SAS, excluding the parameter estimates. The Deviance and Pearson Goodness-of-Fit Statistics output is new here. They were requested by using option scale = none aggregate. Because our model is saturated, the goodness-of-fit statistics are zero with zero degree of freedom. We also see that the default type of coding scheme, e.g. effect coding, that proc logistic has for categorical variables. We also see that the overall effect of the interaction of school and program is not significant. This leads us to a simpler model with only the main effect.
Model With Only Main Effect

proc logistic data=school order = internal;
freq count;
class school program /order = data;
model style = school program /link = glogit scale = none aggregate;
run;

Odds Ratio Estimates
Point 95% Wald
Effect style Estimate Confidence Limits
school 1 vs 3 class 1.926 0.990 3.747
school 1 vs 3 self 0.517 0.228 1.175
school 2 vs 3 class 1.609 0.820 3.155
school 2 vs 3 self 1.276 0.620 2.626
program 1 vs 2 class 0.476 0.280 0.809
program 1 vs 2 self 1.005 0.538 1.877

We will focus on the interpretation of parameters. For example the odds ratio of class to team for program1 versus program 2 is .476. We can say that the odds for students in program 1 to choose class over team is .476 times the odds for students in program 2. Or we can say that the odds for students in program 1 to choose class over team is .524 times less than the odds for students in program 2. Similarly, we can say that the odds for students in school 1 to choose class over team is 1.926 times the odds for students in school 3. Or we can say that the odds for students in school 1 to choose class over team is .926 times more than the odds for students in school 3. It is oftentimes easier to describe in terms of probabilities. We can use the output statement to generate these probabilities as shown below.

proc logistic data=school order = internal;
freq count;
class school program ;
model style = school program / link = glogit;
output out = smodel p=prob;
run;

proc freq data = smodel;
where school = 1 or school = 2;
format prob 5.4;
tables school*program*_level_*prob /list nopercent nocum;
run;

school program _LEVEL_ prob Frequency
--------------------------------------------------
1 1 class .5371 3
1 1 self .1580 3
1 1 team .3049 3
1 2 class .7095 3
1 2 self .0989 3
1 2 team .1917 3
2 1 class .3924 3
2 1 self .3409 3
2 1 team .2667 3
2 2 class .5764 3
2 2 self .2372 3
2 2 team .1864 3
________________________________________

Proportional Odds Model for Ordinal Logistic Models

The proportional odds model is also referred as the logit version of an ordinal regression model. It extends logistic regression to handle ordinal response variables. In this section, we are going to use SAS data set ordwarm2.sas7bdat to illustrate what a proportional odds model is and how to perform a proportional odds model analysis.

Let's first take a look at the data set. This data set is taken from Regression Models For Categorical Dependent Variables Using Stata by Long and Freese. Each subject in the data set was asked to evaluate the following statement: "A working mother can establish just as warm and secure of a relationship with her child as a mother who does not work". The response is recoded in a variable called warm. It has four levels: 1 = Strongly Disagree (SD), 2 = Disagree (D), 3 = Agree (A) and 4 = Strongly Agree (SA). This will be the response variable in our analysis. Other variables in the data set include age, education level, gender of the subject, and other subject related variables.

options nocenter nodate label;
proc contents data = ordwarm2;
run;

The CONTENTS Procedure
Data Set Name: WORK.ORDWARM2 Observations: 2293
Member Type: DATA Variables: 10

-----Alphabetic List of Variables and Attributes-----
# Variable Type Len Pos Label
------------------------------------------------------------------------------
2 age Num 3 8 Age in years
3 ed Num 3 11 Years of education
5 male Num 3 17 Gender: 1=male 0=female
4 prst Num 3 14 Occupational prestige
1 warm Num 8 0 Mom can have warm relations with child
8 warmlt2 Num 3 26 1=SD; 0=D,A,SA
9 warmlt3 Num 3 29 1=SD,D; 0=A,SA
10 warmlt4 Num 3 32 1=SD,D,A; 0=SA
7 white Num 3 23 Race: 1=white 0=not white
6 yr89 Num 3 20 Survey year: 1=1989 0=1977

We are interested in building up a model to describe the relationship between the response variable warm and some of the explanatory variables, such as the age, level of education and race. Let's consider the probabilities

θ1 = π1, probability of 'Strongly Disagree',
θ2 = π1 + π2, probability of 'Strongly Disagree' or 'Disagree',
θ3 = π1 + π2 + π3, probability of 'Not Strongly Agree',

where
π1 = probability of 'Strongly Disagree',
π2 = probability of 'Disagree',
π3 = probability of 'Agree',
π4 = probability of 'Strongly Agree'.
Then we can construct the cumulative logits:
logit(θ1) = log( θ1/(1 - θ1)) = log(π1/(π2 + π3 + π4)),
logit(θ2) = log( θ2/(1 - θ2)) = log((π1 + π2)/(π3 + π4)),
logit(θ3) = log( θ3/(1 - θ3)) = log((π1 + π2 + π3))/π4).
The proportional odds model is the following:
logit(θi) = αi + xβ.

Thus we allow the intercept to be different for different cumulative logit functions, but the effect of the explanatory variables will be the same across different logit functions. That is, we allow different αs for each of the cumulative odds, but only one set of βs for all the cumulative odds. This is the proportionality assumption and this is why this type model is called proportional odds model. Also notice that although this is a model in terms of cumulative odds, we can always recover the probabilities of each response category as follows.

π1 = θ1
π2 = θ2 - θ1
π3 = θ3 - θ2
π4 = 1 - θ3

A Simple Example

We can calculate the cumulative odds from the frequency table.

proc freq data = ordwarm2;
table warm ;
ods output onewayfreqs = test (keep = warm frequency cumfrequency);
run;

data test1;
set test;
if _n_ <=3 then odds = cumfrequency /(2293-cumfrequency); run; proc print data= test1; run; Cum Obs warm Frequency Frequency odds 1 1 297 297 0.14880 2 2 723 1020 0.80126 3 3 856 1876 4.49880 4 4 417 2293 . The other way of getting the same result is to run a proportional odds model with only the intercept as a predictor. proc logistic data = ordwarm2 ; model warm = /link = clogit; ods output ParameterEstimates = model_based; run; data test2; set model_based; odds = exp(estimate); run; proc print data = test2 noobs; var variable estimate odds; run; Variable Estimate odds Intercept -1.9052 0.14880 Intercept -0.2216 0.80126 Intercept 1.5038 4.49880 An Example With Only Categorical Predictors In SAS, a proportional odds model analysis can be performed using proc logistic with the option link = clogit. Here clogit stands for cumulative logit. In this example, we are going to use only categorical predictors, white (1=white 0=not white) and male (1=male 0=female), and we will focus more on the interpretation of the regression coefficients. Our model can be written as logit(θi) = αi + β1*white + β2*male, i = 1, 2, 3. The formula for the odds is shown in the table below. For example, we can see that the odds ratio for males versus females is exp(β2) and the odds ratio for the whites versus non-whites is exp(β1). Race Gender SD vs. all other choices SD and D vs. all other choices SD, D and A vs. SA White Male exp(α1+ β1+ β2) exp(α2 + β1+ β2) exp(α3 + β1 + β2) White Female exp(α1 + β1) exp(α2 + β1) exp(α3 + β1) Non-White Male exp(α1 + β2) exp(α2 + β2) exp(α3 + β2) None-White Female exp(α1 ) exp(α2) exp(α3) proc logistic data = ordwarm2 ; model warm = white male /link = clogit; run; Response Profile Ordered Total Value warm Frequency 1 1 297 2 2 723 3 3 856 4 4 417 Probabilities modeled are cumulated over the lower Ordered Values. Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 1 -2.5550 0.1277 400.0337 <.0001 Intercept 2 1 -0.8417 0.1159 52.7347 <.0001 Intercept 3 1 0.9326 0.1167 63.8455 <.0001 white 1 0.3422 0.1163 8.6594 0.0033 male 1 0.6450 0.0774 69.5178 <.0001 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits white 1.408 1.121 1.769 male 1.906 1.638 2.218 From the output above, we can say that males have 1.906 times greater odds of somewhat disagreeing with the statement as females, no matter at what level. A Example with a Continuous Predictor In this example, we are going to use a continuous predictor, age and show how to output the predicted values and how to graph them. The output statement below requests that SAS output predicted probabilities and the linear predictions and save them to a data set. Based on the proportionality assumption, we should expect that the lines for the linear predictions will be parallel to each other. proc logistic data = ordwarm2 ; model warm = age /link = clogit; output out = pred p=p xbeta=linp; run; proc sort data = pred; by age; run; goptions reset = all ; symbol i = join w=.4 ; proc gplot data = pred; plot p*age=_level_; plot linp*age=_level_; run; quit; Another Example -- Proportional Odds Assumption Test and Goodness of Fit • Proportionality Assumption In general, we can model the cumulative odds model in such as a way that each of the cumulative odds has its own regression model: logit(θi) = αi + xβi. A proportional odds model simplifies the model so that the effects of the predictors are the same across different levels. This is the main assumption of the model. We need to test this assumption. That is, we need test the hypothesis that β1 = β2 = β3. SAS proc logistic performs a score test to test this hypothesis. Let's look at the model with male and white as predictors again. proc logistic data = ordwarm2 ; model warm = white male /link = clogit; run; Score Test for the Proportional Odds Assumption Chi-Square DF Pr > ChiSq
21.6592 4 0.0002
The p-value is really small, so we have to reject the null hypothesis of proportionality. The degrees of freedom is calculated as k*(r-2), where k is the number of predictors and r is the number of levels of response variables. In our example, we have two predictors and four levels of responses. It is not uncommon for a model not to satisfy the proportionality assumption (which is also called parallel regression assumption). When the test fails, other alternative models should be considered, such as multinomial logistic models.

• A Model with More Predictors

Now let's take a look at a model where we use white, age and ed as our predictors. We also add options scale = none aggregate to get the goodness of fit tests. The deviance and Pearson tests compare the current model with the saturated model. This test being not significant tells us our model fits the data well.

proc logistic data = ordwarm2 ;
model warm = white age ed /link = clogit scale=none aggregate;
run;

Response Profile
Ordered Total
Value warm Frequency
1 1 297
2 2 723
3 3 856
4 4 417
Probabilities modeled are cumulated over the lower Ordered Values.
Score Test for the Proportional Odds Assumption

Chi-Square DF Pr > ChiSq
10.3962 6 0.1089
Deviance and Pearson Goodness-of-Fit Statistics
Criterion DF Value Value/DF Pr > ChiSq
Deviance 2628 2523.3191 0.9602 0.9271
Pearson 2628 2588.2232 0.9849 0.7062
Number of unique profiles: 878
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 5997.541 5841.101
SC 6014.754 5875.526
-2 Log L 5991.541 5829.101
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 162.4403 3 <.0001 Score 157.6156 3 <.0001 Wald 157.7599 3 <.0001 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 1 -2.0393 0.2342 75.8321 <.0001
Intercept 2 1 -0.2698 0.2294 1.3829 0.2396
Intercept 3 1 1.5397 0.2318 44.1185 <.0001
white 1 0.4376 0.1176 13.8470 0.0002
age 1 0.0179 0.00241 54.8182 <.0001
ed 1 -0.0933 0.0129 52.6988 <.0001
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
white 1.549 1.230 1.951
age 1.018 1.013 1.023
ed 0.911 0.888 0.934

28 comments:

AnonymousSeptember 17, 2012 at 10:46 PM
Hοwdy outstanԁing ωеbsіte!
Doeѕ runnіng a blog such as this rеquire a lot of work?

I've absolutely no knowledge of computer programming however I was hoping to start my own blog soon. Anyways, if you have any ideas or techniques for new blog owners please share. I understand this is off subject nevertheless I just needed to ask. Thank you!
Feel free to visit my homepage ; V2 Cig coupon code
ReplyDelete
Replies
AnonymousOctober 25, 2012 at 9:47 AM
I don't drop a lot of remarks, but after reading a great deal of remarks on "Proc Logistic and Logistic Regression Models". I actually do have a couple of questions for you if it's okay.

Is it just me or ԁo a feω of the comments come аcross aѕ if they are wгittеn bу brаіn dead folks?
:-Ρ And, if you are posting at addіtionаl
social sіteѕ, Ι'd like to keep up with anything fresh you have to post. Could you make a list of all of your social networking sites like your linkedin profile, Facebook page or twitter feed?
Feel free to surf my weblog :: v2 cigs coupon
ReplyDelete
Replies
AnonymousOctober 26, 2012 at 8:25 AM
This wеb site гeаlly has all
of the info I wanted abоut this subject and didn't know who to ask.
Have a look at my web-site : v2 discount code
ReplyDelete
Replies
AnonymousOctober 26, 2012 at 8:25 AM
Тhiѕ web site really hаs all of the info
I ωanted about this subject and didn't know who to ask.
my web page :: v2 discount code
ReplyDelete
Replies
AnonymousOctober 27, 2012 at 3:27 AM
This is my fiгst time pау a quіcκ vіsit at
here and i am reаllу hаppy to гead everthing at onе place.
Here is my weblog :: bodylastics p90x
ReplyDelete
Replies
AnonymousNovember 2, 2012 at 5:34 AM
Ӏ loνеd аs much as you'll receive carried out right here. The sketch is tasteful, your authored subject matter stylish. nonetheless, you command get bought an impatience over that you wish be delivering the following. unwell unquestionably come more formerly again as exactly the same nearly a lot often inside case you shield this hike.
Also visit my website ... Prweb.Com
ReplyDelete
Replies
AnonymousNovember 9, 2012 at 8:16 AM
Wrіte more, thats all ӏ hаve to ѕay.

Lіterаllу, it seems aѕ thοugh you геliеd оn thе
νideo to make уour poіnt. Yоu definitelу know ωhаt yοure talking abοut, why throω awау youг intellіgence on just poѕting ѵideos tο your weblog ωhen you
сοulԁ be givіng us somеthing infoгmаtiѵe tо read?
My webpage :: v2 coupons
ReplyDelete
Replies
AnonymousNovember 13, 2012 at 4:54 AM
ӏ'm gone to convey my little brother, that he should also pay a visit this blog on regular basis to obtain updated from hottest reports.
Also see my web page > iconcompanions.com
ReplyDelete
Replies
AnonymousNovember 15, 2012 at 11:01 PM
It's not my first time to pay a quick visit this web site, i am visiting this site dailly and take fastidious facts from here daily.
my website :: prweb.com
ReplyDelete
Replies
AnonymousNovember 18, 2012 at 3:40 AM
I think thiѕ is among the mоst vital infоrmаtiοn fοr mе.
Аnd i'm glad reading your article. But should remark on some general things, The website style is ideal, the articles is really great : D. Good job, cheers
Feel free to visit my site :: how much is the Flex belt
ReplyDelete
Replies
AnonymousNovember 18, 2012 at 6:48 AM
You actually make it appear so easy together with
your presentation however I in finding this matter to be really something that
I think I might by no means understand. It kind of feels too complex and very extensive for
me. I'm looking ahead on your subsequent put up, I'll attempt to get the hold of it!
Feel free to visit my blog chron.Com
ReplyDelete
Replies
AnonymousDecember 17, 2012 at 12:01 PM
Today, I wеnt to the beach front with mу κids.
I found а sea shell аnd gаvе іt to my 4 yeaг old daughter and saiԁ "You can hear the ocean if you put this to your ear." She ρut the shell to her ear and sсrеamed.
Thеrе waѕ a hermit crab inside аnd it pіnched hег ear.
She neveг wantѕ tο gо back!
LоL I know thіs іs completely off tоpic but I had to tell
ѕоmеonе!
Also visit my website : V2 Coupon
ReplyDelete
Replies
AnonymousFebruary 15, 2013 at 10:53 PM
Ѕo what can this all mean?

Аlѕo visit my web-site :: http://www.prnewswire.Com
ReplyDelete
Replies
AnonymousFebruary 18, 2013 at 12:06 PM
The belt cοuld be wοrn ωhereνeг ѕo
you сan get а amаzіng funсtion out
tаkіng a cаt nap оr washіng home.

Αlѕo viѕit mу webρage; http://www.squidoo.com/flex-belt-review-and-discount
ReplyDelete
Replies
AnonymousFebruary 19, 2013 at 1:24 AM
Wow, marvelous weblog format! Hoω lеngthy have you ever been running а blog for?
you maκe runnіng a blоg looκ easy. Τhe
entiгe glanсe of your sitе іѕ fantаstiс, as neatly aѕ the contеnt!

my wеbρаge :: ampmcomputersinc.com
Also see my web site - 2address
ReplyDelete
Replies
AnonymousFebruary 23, 2013 at 8:48 AM
Chrоmium аllows you assіstance glucose metabοliс rаtе.

Heге is my wеblog ... super beta Prostate Review
Also see my page - Super beta prostate reviews
ReplyDelete
Replies
AnonymousFebruary 26, 2013 at 1:56 PM
Make positiѵe your ab beltѕ have а distinctive toning plan.

My homeрagе - http://www.firecrew77.com/
my page - Flex belt coupon Code
ReplyDelete
Replies
AnonymousFebruary 26, 2013 at 2:38 PM
І absolutеly love your blοg.. Exсellent сolors & thеmе.
Dіd yοu create this ωeb site yourself? Pleasе reply back as ӏ'm attempting to create my very own site and would love to know where you got this from or exactly what the theme is named. Many thanks!

Also visit my web blog visit website
Also see my webpage - blog url
ReplyDelete
Replies
AnonymousMarch 20, 2013 at 10:30 AM
Hostas come in a wide variety of colors, shapes and sizes and
can be found in almost every home landscape.
Louis, a new delivery tradition has been created: the "Mulch Man. Sure, you had your occasional exception that stupidly borrowed from the local loan shark, but most learned to live on less.

My blog post :: mulching
ReplyDelete
Replies
AnonymousMarch 22, 2013 at 12:40 PM
Please let me know if you're looking for a writer for your blog. You have some really good posts and I think I would be a good asset. If you ever want to take some of the load off, I'd really
like to write some content for your blog in exchange for a link back to mine.

Please shoot me an e-mail if interested. Kudos!

My webpage - custom squeeze page design
ReplyDelete
Replies
AnonymousMarch 22, 2013 at 1:31 PM
I like the valuable info you provide in your articles.

I will bookmark your weblog and check again here regularly.
I am quite sure I'll learn a lot of new stuff right here! Best of luck for the next!

Take a look at my site: squeeze page
ReplyDelete
Replies
AnonymousMarch 22, 2013 at 1:40 PM
Excellent site you have got here.. It's hard to find quality writing like yours these days. I honestly appreciate individuals like you! Take care!!

Look into my website ... custom squeeze page design
ReplyDelete
Replies
AnonymousMarch 22, 2013 at 1:40 PM
Have you ever thought about writing an e-book or guest
authoring on other sites? I have a blog based on the same information you
discuss and would really like to have you share some stories/information.
I know my visitors would appreciate your work. If
you are even remotely interested, feel free
to shoot me an e-mail.

Feel free to surf to my blog :: rockstar squeezepage
ReplyDelete
Replies
AnonymousApril 8, 2013 at 6:33 AM
The subcontractors bidding work packages directly to the
school district. Having options when it comes to these slot machines for sale and
contractor shows, don't be afraid to get all of your fantasies. Suncoast GC is a certified professional slot machines for sale, a 100% energy star builder, and when they want to do when looking for a contractor to undertake the work. They now enter these awards for their value, design and passion to build.

Here is my blog; used slot machines for sale
ReplyDelete
Replies
AnonymousApril 15, 2013 at 11:51 PM
This can be flight simulator version FS2002, FS2004 or the latest
version of Flight simulator, FSX.
Solution Options: It was decided to use a web services simulator to simulate the exact functions of
a web services backend. In terms of sheer enjoyment, the Rollercoaster Tycoon
series leads the pack. If you are looking for some excitement beyond just flying, the
Jane's series of combat flight simulators is the best place to look. It features its own unique economy in which players can buy and sell things they make and so on, several real world companies are actually getting into the game to offer services, such as H&R Block.
ReplyDelete
Replies
AnonymousMay 15, 2013 at 5:25 PM
So what can this all imply?

my web blog :: Flex belt reviews
ReplyDelete
Replies
AnonymousMay 20, 2013 at 7:59 PM
Anybody who is prepared to get a nicely-toned stomach can use this belt.

Feel free to visit my web-site ... http://www.marsvenusatwork.Com
ReplyDelete
Replies
AnonymousJune 10, 2013 at 10:09 PM
This design is wicked! You certainly know how to keep a reader
entertained. Between your wit and your videos, I was almost moved to start my own blog (well, almost.
..HaHa!) Great job. I really enjoyed what you had to say, and more than that,
how you presented it. Too cool!

Review my homepage; trading ()
ReplyDelete
Replies

Add comment

I love to hear from you! Leave a comment.
If your question is unrelated to this article, please use my Facebook page.

Upcoming Posts

As a programmer, Sometime we ignorant or may be don’t want to give up things which were serving our purpose for a long time (Was well accepted) – I felt same when my company asked us to adopt new data standards – I understand the concept of CDISC standards but I don’t want to start because it costs more time to re-create SAS macros and understand source elements in detail to perform meaning full analysis to interpret safety and efficacy of clinical drug. I left with no option, so I decided to read and understand new data standard guidelines to make sure that I am also align with the industry and When I was underway, it’s started seeding me new ideas to make programmers job ease and effective.

I would like to bring one by one to you in upcoming posts.

I wish you happy New Year and happy blogging.

Venkata Maguluri

Friday, January 28, 2011

Proc Logistic and Logistic Regression Models

28 comments:

About Me

TRANSLATE PAGE