Building the Logistic Regression model

Statsmodels is a Python module that provides functions for estimating a wide range of statistical models and for performing statistical tests.

  • First, we define the dependent variable (y) and the independent variables (X). If the dependent variable is non-numeric, it is first converted to numeric form using dummy variables. The file used in this example to train the model can be downloaded here.
  • Statsmodels provides the Logit() class for logistic regression. Logit() accepts y and X as parameters and returns a Logit model object; calling its fit() method then fits the model to the data and returns the results. Note that Logit() does not add an intercept term on its own, so the model in this example is fitted without one.

# importing libraries
import pandas as pd
import statsmodels.api as sm

# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col=0)

# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]

# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()


Output : 

Optimization terminated successfully.
         Current function value: 0.352707
         Iterations 8

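In the output, ‘Iterations’ refers to the number of iterations the optimizer (Newton-Raphson by default) performs while searching for the coefficients that maximize the likelihood. By default, at most 35 iterations are performed; if the optimizer has not converged by then, it stops and statsmodels issues a convergence warning.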

Logistic Regression using Statsmodels

Prerequisite: Understanding Logistic Regression
Logistic regression is a type of regression analysis used to estimate the probability of a certain event occurring. It is well suited to cases where the dependent variable is categorical, that is, where it can take only a limited set of discrete values.
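To make the idea of an estimated probability concrete, here is a small sketch of the logistic (sigmoid) function, which maps any real-valued linear predictor to a value between 0 and 1. The helper name logistic and the input values in z are chosen purely for illustration.

```python
import numpy as np

def logistic(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear predictor values (e.g. b1*x1 + b2*x2 + ...)
z = np.array([-2.0, 0.0, 2.0])
probs = logistic(z)
print(probs)  # a predictor of 0 corresponds to a probability of 0.5
```

Larger positive predictor values push the probability toward 1 and larger negative values push it toward 0, which is why the output can be read as the chance that the event occurs.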

The dataset : 
In this article, we will predict whether a student will be admitted to a particular college, based on their GMAT score, GPA and work experience. The dependent variable here is binary: it takes exactly one of two values, admitted or not admitted.
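If such a binary outcome is stored as text rather than as numbers, it has to be encoded before fitting. The sketch below assumes a hypothetical column whose labels are the strings "yes" and "no" (the article's own file may already store the outcome numerically) and uses pandas get_dummies to produce a 0/1 column.

```python
import pandas as pd

# Hypothetical raw labels, for illustration only
raw = pd.DataFrame({"admitted": ["yes", "no", "yes", "no"]})

# get_dummies creates one indicator column per label ('no', 'yes');
# drop_first=True drops 'no', leaving a single 0/1 column for 'yes'
dummies = pd.get_dummies(raw["admitted"], drop_first=True, dtype=int)
y = dummies["yes"]
print(y.tolist())
```

Keeping a single indicator column is enough for a two-class outcome, because the second class is implied whenever the indicator is 0.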
