Regression Analysis with Scikit-learn (part 2 - Logistic)

This lesson is the second of two that focus on an indispensable set of data analysis methods: logistic and linear regression. Linear regression represents how one (or more) quantitative measures relate to, or predict, some other quantitative measure. Logistic regression uses a similar approach to represent how one (or more) quantitative measures relate to, or predict, a category. Depending on one’s home discipline, one might use logistic regression to do the following:

  • Explore the historical continuity of three fiction genres by comparing the accuracy of three binary logistic regression models that predict, respectively, horror fiction vs. general fiction; science fiction vs. general fiction; and crime/mystery fiction vs. general fiction
  • Analyze the degree to which the ideological leanings of U.S. Courts of Appeals predict panel decisions

The first of these illustrates how logistic regression classification tends to be used in cultural analytics (in this case, literary history); the second is more typical of how a quantitative historian or political scientist might use it.

Logistic and linear regression are perhaps the most widely used methods in quantitative analysis, including (but not limited to) computational history. They remain popular in part because:

  • They are extremely versatile, as the above examples suggest
  • Their performance can be evaluated with easy-to-understand metrics
  • The underlying mechanics of model predictions are accessible to human interpretation (in contrast to many ‘black box’ models)
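
The interpretability noted in the last point follows from the form of the model. As a sketch in standard textbook notation (the symbols below are illustrative and not taken from the lesson itself), a binary logistic regression with predictors x_1 through x_k can be written as:

```latex
\[
P(y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\qquad\Longleftrightarrow\qquad
\log \frac{P(y = 1)}{1 - P(y = 1)}
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\]
```

Because the log-odds are linear in the predictors, each coefficient \beta_j is the change in log-odds associated with a one-unit increase in x_j when the other predictors are held fixed, and e^{\beta_j} is the corresponding odds ratio.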

Learning Outcomes

After completing this lesson, you will be able to:

  • Run logistic regression algorithms in Python using the Scikit-learn library
  • Validate models and assess their performance
  • Interpret the results given by logistic regression models
  • Know which common pitfalls to avoid when conducting regression analysis
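
To give a sense of what the first three outcomes look like in practice, here is a minimal sketch, not the lesson's own code. It assumes scikit-learn and NumPy are installed, and it uses synthetic data from `make_classification` as a stand-in for a real corpus; the feature names, sample sizes, and `random_state` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration): 500 samples,
# 4 numeric features, and a binary category label.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Hold out a quarter of the data so the model is validated on samples
# it did not see during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a binary logistic regression model on the training portion.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Assess performance with an easy-to-read metric: held-out accuracy.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Held-out accuracy: {accuracy:.3f}")

# Interpret the model: exponentiated coefficients are odds ratios.
odds_ratios = np.exp(model.coef_[0])
for i, ratio in enumerate(odds_ratios):
    print(f"feature_{i}: odds ratio {ratio:.2f}")
```

An odds ratio above 1 means that higher values of that feature are associated with higher odds of the positive category, and a ratio below 1 means the opposite. Accuracy on a single held-out split is only a first check; it can mislead when the categories are imbalanced.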

Interested in learning more?

Check out this lesson on Programming Historian's website

Cite as

Matthew J Lavin, Thomas Jurczyk and Rennie C Mapp (2022). Regression Analysis with Scikit-learn (part 2 - Logistic). Version 1.0.0. Edited by James Baker. ProgHist Ltd. [Training module].

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter.

Full metadata

Regression Analysis with Scikit-learn (part 2 - Logistic)
Matthew J Lavin
Social Sciences and Humanities
Content type:
Training module
CC BY 4.0
Programming Historian
Python, Data visualisation