# Statsmodels

`statsmodels` is a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring statistical data. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.

## Minimal Examples

Since version `0.5.0` of `statsmodels`, you can use R-style formulas together with `pandas` data frames to fit your models. Here is a simple example using ordinary least squares:

``````In [1]: import numpy as np

In [2]: import statsmodels.api as sm

In [3]: import statsmodels.formula.api as smf

In [4]: dat = sm.datasets.get_rdataset("Guerry", "HistData").data

# Fit regression model (using the natural log of one of the regressors)
In [5]: results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()

# Inspect the results
In [6]: print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                Lottery   R-squared:                       0.348
Method:                 Least Squares   F-statistic:                     22.20
Date:                Mon, 14 May 2018   Prob (F-statistic):           1.90e-08
Time:                        21:48:09   Log-Likelihood:                -379.82
No. Observations:                  86   AIC:                             765.6
Df Residuals:                      83   BIC:                             773.0
Df Model:                           2
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept         246.4341     35.233      6.995      0.000     176.358     316.510
Literacy           -0.4889      0.128     -3.832      0.000      -0.743      -0.235
np.log(Pop1831)   -31.3114      5.977     -5.239      0.000     -43.199     -19.424
==============================================================================
Omnibus:                        3.713   Durbin-Watson:                   2.019
Prob(Omnibus):                  0.156   Jarque-Bera (JB):                3.394
Skew:                          -0.487   Prob(JB):                        0.183
Kurtosis:                       3.003   Cond. No.                         702.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
``````
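
The example above downloads the Guerry dataset, so it needs network access. As a minimal offline sketch of the same formula interface, the block below fits a model on a synthetic `pandas` data frame. The column names (`x1`, `x2`, `y`) and the coefficients used to generate the data are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: column names and coefficients are made up for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.uniform(1.0, 10.0, size=200),  # strictly positive so the log is defined
})
df["y"] = 2.0 + 0.5 * df["x1"] - 1.5 * np.log(df["x2"]) + rng.normal(scale=0.1, size=200)

# The formula adds an intercept automatically; np.log is applied to x2 inside the formula.
results = smf.ols("y ~ x1 + np.log(x2)", data=df).fit()
print(results.params)
```

Because the noise is small, the estimated coefficients should land close to the values used to generate `y`.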

You can also use `numpy` arrays instead of formulas:

``````In [7]: import numpy as np

In [8]: import statsmodels.api as sm

# Generate artificial data (2 regressors + constant)
In [9]: nobs = 100

In [10]: X = np.random.random((nobs, 2))

In [11]: X = sm.add_constant(X)

In [12]: beta = [1, .1, .5]

In [13]: e = np.random.random(nobs)

In [14]: y = np.dot(X, beta) + e

# Fit regression model
In [15]: results = sm.OLS(y, X).fit()

# Inspect the results
In [16]: print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.183
Method:                 Least Squares   F-statistic:                     10.83
Date:                Mon, 14 May 2018   Prob (F-statistic):           5.68e-05
Time:                        21:48:10   Log-Likelihood:                -23.528
No. Observations:                 100   AIC:                             53.06
Df Residuals:                      97   BIC:                             60.87
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4355      0.081     17.716      0.000       1.275       1.596
x1             0.2664      0.101      2.650      0.009       0.067       0.466
x2             0.4224      0.116      3.635      0.000       0.192       0.653
==============================================================================
Omnibus:                       75.567   Durbin-Watson:                   2.054
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                7.752
Skew:                           0.065   Prob(JB):                       0.0207
Kurtosis:                       1.642   Cond. No.                         5.32
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
``````

Have a look at `dir(results)` to see the available results. Attributes are described in `results.__doc__`, and result methods have their own docstrings.

## Citation

When using statsmodels in a scientific publication, please consider using the following citation:

Seabold, Skipper, and Josef Perktold. “Statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference. 2010.

BibTeX entry:

``````@inproceedings{seabold2010statsmodels,
title={Statsmodels: Econometric and statistical modeling with python},
author={Seabold, Skipper and Perktold, Josef},
booktitle={9th Python in Science Conference},
year={2010},
}
``````