Analyzing the UncertaintyForest Class by Reproducing Mutual Information Estimates

This set of four tutorials (uncertaintyforest_running_example.ipynb, uncertaintyforest_posteriorestimates.ipynb, uncertaintyforest_conditionalentropyestimates.ipynb, and uncertaintyforest_mutualinformationestimates.ipynb) will explain the UncertaintyForest class. After following these tutorials, you should have the ability to run UncertaintyForest on your own machine and generate Figures 1, 2, and 3 from this paper, which help you to visualize a comparison of the estimated posteriors and conditional entropy values for several different algorithms.

If you haven’t seen it already, take a look at other tutorials to setup and install the ProgLearn package: installation_guide.ipynb.

Goal: Run the UncertaintyForest class to produce a figure that compares estimated normalized mutual information values for the UncertaintyForest, KSG, Mixed KSG, and IRF algorithms, as in Figure 3 fromthis paper

Import Required Packages

import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV

from proglearn.forest import UncertaintyForest
from functions.unc_forest_tutorials_functions import plot_setting, plot_fig3
# Setting figures.
settings = [
        "name": "Spherical Gaussians",
        "kwargs": {},
        "name": "Elliptical Gaussians",
        "kwargs": {"var1": 3},
        "name": "Three Class Gaussians",
        "kwargs": {"three_class": True},
# Plot data.
fig, axes = plt.subplots(1, len(settings), figsize=(18, 4))
for i, setting in enumerate(settings):
    plot_setting(2000, setting, axes[i])
<Figure size 432x288 with 0 Axes>

Specify Parameters

# The following are two sets of parameters.
# The first are those that were actually used to produce Figure 3.
# Below those, you'll find some scaled-down parameters so that you can see the results more quickly.

# Here are the paper reproduction parameters
# n = 6000
# mus = range(5)
# ds = range(1, 16)
# mu = 1
# num_trials = 20
# d = 2
# pis = [0.05 * i for i in range(1, 20)]

# Here are the scaled-down tutorial parameters
n = 400  # number of samples
mus = range(3)  # range of means
ds = range(2, 5)  # range of dimensions
mu = 1  # mean
num_trials = 3  # number of trials to run
d = 1  # dimension
pis = [0.05 * i for i in range(3, 6)]  # prior distribution

Specify Learners

Now, we’ll specify which learners we’ll compare (by label). Figure 3 uses four different learners, which are further specified in the function estimate_mi, which returns estimates of mutual information for a given dataset (X, y) and type of learner.

# Algorithms used to produce Figure 3
algos = [
        "label": "IRF",
        "title": "Isotonic Reg. Forest",
        "color": "#fdae61",
        "label": "KSG",
        "title": "KSG",
        "color": "#1b9e77",
        "label": "Mixed KSG",
        "title": "Mixed KSG",
        "color": "purple",
        "label": "UF",
        "title": "Uncertainty Forest",
        "color": "#F41711",

parallel = False

Plot Figure 3

Finally, we’ll run the code to obtain and plot the spherical, elliptical, and three class Gaussians, as well as estimated mutual information vs. class priors and dimensionality (9 subplots).

plot_fig3(algos, n, d, mu, settings, pis, ds, num_trials, parallel=parallel)