Building the Logistic Regression model
Statsmodels is a Python module that provides functions for estimating statistical models and performing statistical tests.
- First, we define the dependent (y) and independent (X) variables. If the dependent variable is non-numeric, it is first converted to numeric form using dummy variables. The file used in this example to train the model can be downloaded here.
- Statsmodels provides the Logit class for performing logistic regression. Logit() accepts y and X as parameters and returns a Logit model object, whose fit() method then fits the model to the data.
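The dummy-variable conversion mentioned in the first step can be sketched as follows. This is a minimal illustration, assuming a text-valued `admitted` column with the labels "yes" and "no"; the toy values are made up and are not taken from the article's dataset.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the article, values are invented.
df = pd.DataFrame({
    "gmat": [580, 660, 740, 590, 710, 620],
    "gpa": [2.7, 3.3, 3.7, 2.3, 3.9, 3.0],
    "work_experience": [4, 6, 5, 2, 7, 3],
    "admitted": ["no", "yes", "yes", "no", "yes", "no"],
})

# get_dummies creates one indicator column per category. With drop_first=True
# the first category ("no") is dropped, leaving a single 0/1 column ("yes").
y = pd.get_dummies(df["admitted"], drop_first=True, dtype=int)["yes"]

# The independent variables are already numeric and are used as they are.
X = df[["gmat", "gpa", "work_experience"]]
```

Here `y` is 1 for admitted students and 0 otherwise, which is the numeric form that Logit() expects.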
Python3
# importing libraries
import statsmodels.api as sm
import pandas as pd

# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col=0)

# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]

# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()
Output:
Optimization terminated successfully.
         Current function value: 0.352707
         Iterations 8
In the output, 'Iterations' refers to the number of steps the optimizer took while searching for the parameter values that maximize the likelihood. By default, at most 35 iterations are performed; if the optimizer has not converged by then, it stops and reports that the optimization did not converge.
Logistic Regression using Statsmodels
Prerequisite: Understanding Logistic Regression
Logistic regression is a type of regression analysis used to estimate the probability of a certain event occurring. It is best suited to cases where the dependent variable is categorical and can take only discrete values.
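To make the "probability of an event" concrete: logistic regression passes a linear combination of the inputs through the logistic (sigmoid) function, which maps any real number to a value between 0 and 1. The short sketch below shows only that mapping; the helper name `sigmoid` and the sample inputs are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Strongly negative, neutral, and strongly positive scores.
z = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(z)
print(probs)
```

A score of 0 corresponds to a probability of exactly 0.5, and scores of equal size but opposite sign give probabilities that sum to 1.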
The dataset:
In this article, we will predict whether a student will be admitted to a particular college based on their GMAT score, GPA, and work experience. The dependent variable here is binary: it takes exactly one of two values, admitted or not admitted.