I have previously been exploring uncertainty measures that we can build into our machine learning models, making it easier to see whether a given prediction can be trusted. This involved confidence intervals for datasets and prediction intervals for models; see the previous posts in this series for a more in-depth treatment of all of these.

Many people have contacted me about implementations of these methods, as it is still somewhat of a hassle to implement them when all we have is a model or a dataset at hand and we just want some quick uncertainty estimates. This led me to develop the Python library doubt, which aims to make this process as easy as possible. In this post I will cover a few common use cases of the library and attempt to convince you that the step from only having point predictions to also having uncertainty bounds does not have to be complicated.

This post is part of my series on quantifying uncertainty:

  1. Confidence intervals
  2. Parametric prediction intervals
  3. Bootstrap prediction intervals
  4. Quantile regression
  5. Quantile regression forests
  6. Doubt

Setting up

Installation is as simple as for most Python libraries. You simply write

pip install doubt

in your favorite terminal, and you are good to go!

Prelude: doubt.datasets

Throughout this post I will be using several real-world datasets, all of which are included in the doubt library and share the same uniform API. We load a dataset as follows, here using the FacebookComments dataset:

>>> from doubt.datasets import FacebookComments
>>> dataset = FacebookComments()
>>> dataset.shape
(199030, 54)

To see more information about a given dataset, simply use the help function:

>>> help(dataset)
class FacebookComments(doubt.datasets._dataset.BaseDataset)
 |  FacebookComments(cache: Union[str, NoneType] = '.dataset_cache')
 |
 |  Instances in this dataset contain features extracted from Facebook posts.
 |  The task associated with the data is to predict how many comments the
 |  post will receive.
 |
 |
 |  Parameters:
 |      cache (str or None, optional):
 |          The name of the cache. It will be saved to `cache` in the
 |          current working directory. If None then no cache will be saved.
 |          Defaults to '.dataset_cache'.
 |
 |  Attributes:
 |      shape (tuple of integers):
 |          Dimensions of the data set
 |      columns (list of strings):
 |          List of column names in the data set
 |
 |  Class attributes:
 |      url (string):
 |          The url where the raw data files can be downloaded
 |      feats (iterable):
 |          The column indices of the feature variables
 |      trgts (iterable):
 |          The column indices of the target variables
 |
 |  Features:
 |      page_popularity (int):
 |          Defines the popularity of support for the source of the document
 |      (...)
 |
 |  Targets:
 |      ncomments (int): The number of comments in the next `h_local` hours
 |
 |  Source:
 |      https://archive.ics.uci.edu/ml/datasets/Facebook+Comment+Volume+Dataset

To split the dataset into a feature matrix and a target vector, use the split method, which also allows splitting into train/test sets:

>>> X, y = dataset.split()
>>> X.shape, y.shape
((199030, 54), (199030,))
>>>
>>> X_train, X_test, y_train, y_test = dataset.split(test_size=0.1)
>>> X_train.shape, X_test.shape, y_train.shape, y_test.shape
((179035, 54), (19995, 54), (179035,), (19995,))
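
Since the API is uniform, the exact same calls work for every other dataset in the library; for instance, the datasets used later in this post can all be loaded and split in the same way:

>>> from doubt.datasets import Concrete, ForestFire, PowerPlant
>>>
>>> for dataset_cls in [PowerPlant, Concrete, ForestFire]:
...     X, y = dataset_cls().split()  # identical API for every dataset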

Uncertainty estimates from an existing model

A common scenario is when you have an existing model, usually an off-the-shelf model from scikit-learn, and you would like to produce good uncertainty bounds around your predictions. This would allow you to be more helpful to your clients, who might be less interested in the point estimate itself than in roughly what range of outcomes they should expect.

The doubt library handles this using bootstrapping methods, as described in my previous post.
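
To get an intuition for what happens under the hood, here is a minimal sketch of the bootstrap idea rather than doubt's exact implementation (which also accounts for residual noise; see the previous post): we refit the model on resampled versions of the dataset and look at the spread of the resulting predictions. All names here are purely illustrative.

>>> import numpy as np
>>> from sklearn.base import clone
>>>
>>> def bootstrap_predictions(model, X, y, x_new, n_boots=100):
...     '''Predictions from copies of `model` fitted on bootstrap resamples.'''
...     rng = np.random.default_rng(4242)
...     preds = []
...     for _ in range(n_boots):
...         idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
...         estimator = clone(model).fit(X[idx], y[idx])
...         preds.append(estimator.predict(x_new.reshape(1, -1))[0])
...     return np.array(preds)  # the spread of these reflects the uncertainty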

Normally our pipeline would look as follows:

>>> from doubt.datasets import PowerPlant
>>> from sklearn.linear_model import LinearRegression
>>>
>>> X, y = PowerPlant().split()
>>> model = LinearRegression()
>>> model.fit(X, y)
>>> model.predict([[10, 30, 1000, 50]])
array([481.92031021])

We only have to change that to the following:

>>> from doubt.datasets import PowerPlant
>>> from sklearn.linear_model import LinearRegression
>>> from doubt import Boot
>>>
>>> X, y = PowerPlant().split()
>>> model = Boot(LinearRegression())
>>> model.fit(X, y)
>>> model.predict([10, 30, 1000, 50], uncertainty=0.05)
(481.9203102126274, array([473.43314309, 490.0313962 ]))

Here the uncertainty parameter denotes how much uncertainty we allow in our estimates: an uncertainty of 0.0 would mean that our prediction interval has to span all possible values, and an uncertainty of 1.0 would give us a point estimate. A common value is 0.05, giving us the traditional 95% prediction interval.
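
To see the effect of this parameter we can ask for intervals at several uncertainty levels; the exact numbers will vary from run to run, but the interval width should shrink as the uncertainty grows:

>>> for uncertainty in [0.01, 0.05, 0.2]:
...     _, interval = model.predict([10, 30, 1000, 50], uncertainty=uncertainty)
...     print(uncertainty, interval[1] - interval[0])  # the interval width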

Note that predictions will take longer, as the Boot wrapper computes many predictions instead of a single one in order to measure the uncertainty. By default it produces $\sqrt{N}$ predictions, where $N$ is the number of samples in the dataset, but this can be set manually through the n_boots parameter of the predict method:

>>> model.predict([10, 30, 1000, 50], uncertainty=0.05, n_boots=3)
(482.09909346090336, array([473.68305016, 490.16338123]))
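
A good sanity check of such intervals is their empirical coverage: with uncertainty=0.05, roughly 95% of the true targets in a held-out test set should fall inside their prediction intervals. Here is a sketch of such a check, assuming that the batch version of predict returns an array of point predictions along with an N x 2 array of bounds, mirroring the single-sample output above:

>>> X_train, X_test, y_train, y_test = PowerPlant().split(test_size=0.1)
>>> model = Boot(LinearRegression())
>>> model.fit(X_train, y_train)
>>> preds, intervals = model.predict(X_test, uncertainty=0.05)
>>> within = (intervals[:, 0] <= y_test) & (y_test <= intervals[:, 1])
>>> within.mean()  # should be close to 0.95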

Uncertainty estimates with random forests

The above bootstrapping method works really well, but for ensemble models like random forests the predictions become prohibitively slow. This is because of the sheer number of predictions that need to be computed: if the forest consists of 100 decision trees and we are producing 100 bootstrapped predictions, we suddenly have to compute 10,000 predictions just to get a single prediction out.

A much faster alternative is to use quantile regression forests. These only need to compute a single prediction for every decision tree in the forest, just as usual; the idea is that the uncertainties are based on the training targets stored in the leaf nodes. To read more about this, see my previous post.
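
To make the leaf node idea concrete, here is a rough sketch of how a prediction interval could be extracted from a single decision tree, using scikit-learn's DecisionTreeRegressor purely for illustration; a quantile regression forest aggregates this information across all the trees in the forest:

>>> import numpy as np
>>> from sklearn.tree import DecisionTreeRegressor
>>>
>>> tree = DecisionTreeRegressor(max_leaf_nodes=8).fit(X, y)
>>> x_new = np.array([10, 30, 1000, 50])
>>> leaf = tree.apply(x_new.reshape(1, -1))[0]  # the leaf that x_new lands in
>>> leaf_targets = y[tree.apply(X) == leaf]     # training targets in that leaf
>>> np.quantile(leaf_targets, [0.125, 0.875])   # central 75% interval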

Note however that this model requires multiple training targets to be present in each leaf node, meaning that to get optimal prediction intervals we need to enforce this by limiting the size of the trees. Here we do that by limiting the number of leaf nodes:

>>> from doubt import QuantileRegressionForest
>>> from doubt.datasets import Concrete
>>> import numpy as np
>>>
>>> X, y = Concrete().split()
>>> model = QuantileRegressionForest(max_leaf_nodes=8)
>>> model.fit(X, y)
>>> model.predict(np.ones(8), uncertainty=0.25)
(16.933590347847982, array([ 8.93456428, 26.0664534 ]))
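
Since the uncertainty is specified at prediction time here, the same fitted forest can produce intervals at any level, without having to be refitted:

>>> model.predict(np.ones(8), uncertainty=0.05)  # wider 95% interval, same model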

Linear quantile regression

More classical quantile regression methods are also available in doubt, wrapping the corresponding models from the excellent statsmodels library. The procedure is the same as above:

>>> from doubt import QuantileLinearRegression
>>> from doubt.datasets import ForestFire
>>>
>>> X, y = ForestFire().split()
>>> model = QuantileLinearRegression(uncertainty=0.05)
>>> model.fit(X, y)
>>> model.predict([7, 5, 2, 4, 80, 20, 90, 5, 8, 50, 7, 0])
(7.342714665649355, array([2.27209153e-08, 6.38229624e+01]))

Note that one main difference with the quantile regression model is that we have to include the uncertainty parameter in the constructor rather than in the predict method. This is simply because the model needs to be fitted to a specific uncertainty, and will need to be re-fitted if the uncertainty changes.
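
For reference, here is a rough sketch of what the equivalent direct statsmodels usage could look like, fitting one quantile regression per interval bound; this illustrates the idea only, not doubt's exact internals:

>>> import numpy as np
>>> import statsmodels.api as sm
>>>
>>> X_const = sm.add_constant(X)  # statsmodels needs an explicit intercept column
>>> lower = sm.QuantReg(y, X_const).fit(q=0.025)  # model for the 2.5% quantile
>>> upper = sm.QuantReg(y, X_const).fit(q=0.975)  # model for the 97.5% quantile
>>> x_new = np.insert([7, 5, 2, 4, 80, 20, 90, 5, 8, 50, 7, 0], 0, 1)
>>> lower.predict(x_new), upper.predict(x_new)  # bounds of the 95% interval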

Future development

There are several features I would like to implement in the doubt library.

Firstly, I would like to implement prediction intervals for classification tasks. This is not as straightforward as simply treating the predicted probabilities as a regression target, since the observed target values are effectively rounded probabilities (zeros and ones), which skews the residuals.

Secondly, I would like to include support for neural networks. The bootstrapped methods still work, but inference is very time-consuming. I have previously covered quantile neural networks, but my implementation of that approach seems quite fragile, and a more robust version would be useful.
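
For reference, the quantile (pinball) loss that such a network minimizes is simple to state; here is a minimal NumPy version for a given quantile q:

>>> import numpy as np
>>>
>>> def pinball_loss(y_true, y_pred, q):
...     '''Loss whose minimizer is the q'th conditional quantile: it weighs
...     under-predictions by q and over-predictions by 1 - q.'''
...     residual = y_true - y_pred
...     return np.mean(np.maximum(q * residual, (q - 1) * residual))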