Sunday, June 2, 2013

Logistic Regression in PROC MODEL

PROC MODEL in SAS/ETS provides an open interface for fitting many different types of linear and nonlinear models.  PROC MODEL allows users to simulate from fitted models and allows users to register models with SAS Risk Dimensions (PROC RISK et al).  Risk Dimensions is SAS's market and credit risk analysis package used by many large banks and trading organisations.

Being a user of both ETS and occasional user of Risk Dimensions, I've often wanted to incorporate logistic models into my simulations.  It is possible to fit the logistic regressions through numerous PROCs in SAS/STAT and incorporate the fitted parameters into MODEL or RISK statements.  It would be nice to be able to fit a logistic regression in PROC MODEL to save the macro programming hoops necessary to fit in something else and incorporate in MODEL.

First there are two resources I used for this example.

  1. Wikipedia on Logistic Regression
  2. A well written paper by Scott A. Czepiel
Now, let's generate some binomial data
data test;
a = 5;
b = 10;

do i=1 to 10000;
       rand = rannor(123);
       xb = a + b*rand ;
       p = 1 / (1+ exp(-xb));
       if ranuni(123) < p then
              y = 1;
       else
              y = 0;
       output;
end;
run;
Next, let's fit the model in PROC LOGISTIC
proc logistic data=test descending;
model y = rand;
run;
The interesting part of the output:
                                      The LOGISTIC Procedure

                            Analysis of Maximum Likelihood Estimates

                                              Standard          Wald
               Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

               Intercept     1      4.9155      0.1517     1050.0422        <.0001
               rand          1      9.7196      0.2896     1126.4620        <.0001
The fitted values are within reason of the true values.

Now let's use the eq 9. on page 5 of the Czepiel paper.  Also of note is the documentation on the general likelihood function in PROC MODEL for use in ML estimation.
proc model data=test;
parameters a b;
reg = a + b*rand;
y = 1 / (1 + exp(-reg));

llik = -(y*reg - log(1 + exp(reg)));

errormodel y ~ general(llik);

fit y / fiml ;
quit;
First, we define a function, "reg," which is our regression equation.  Next, we define the equation for y.  Using Eq. 9, we define the likelihood function.  Note, PROC MODEL uses a minimization of the sum of the likelihood function, so we negate the value.

The key is to now define the ERRORMODEL for y as a general likelihood function.  The FIT statement with the "/ fiml" option tell SAS to use a ML estimation.  Because the general likelihood function is defined, this is used for the fitting.

Here is the relevant output from PROC MODEL:
                                       The MODEL Procedure

                         Nonlinear Liklhood Summary of Residual Errors

                       DF       DF                                                        Adj
    Equation        Model    Error         SSE         MSE    Root MSE    R-Square       R-Sq

    y                   2     9998       359.5      0.0360      0.1896      0.8335     0.8335


                              Nonlinear Liklhood Parameter Estimates

                                                 Approx                  Approx
                   Parameter       Estimate     Std Err    t Value     Pr > |t|

                   a               4.915868      0.1517      32.42       <.0001
                   b               9.719905      0.2895      33.57       <.0001


                      Number of Observations       Statistics for System

                      Used             10000    Log Likelihood        -1179
                      Missing              0

2 comments:

  1. Thanks for the helpful post. In the first data step, why are you using a random uniform variable as the cutoff point, why not using 0.5? that is, instead of writing
    if ranuni(123) < p then ....
    why can't we write the following?
    if 0.5 < p then ....

    ReplyDelete
  2. Logostic regression in proc at nice posts thanku for sharing..
    ]
    Hadoop training in hyderabad.All the basic and get the full knowledge of hadoop.
    hadoop training in hyderabad

    ReplyDelete