How to Run UncertaintyForest

This set of four tutorials (uncertaintyforest_running_example.ipynb, uncertaintyforest_posteriorestimates.ipynb, uncertaintyforest_conditionalentropyestimates.ipynb, and uncertaintyforest_mutualinformationestimates.ipynb) explains the UncertaintyForest class. After following them, you should be able to run UncertaintyForest on your own machine and generate Figures 1, 2, and 3 from this paper, which compare the estimated posteriors and conditional entropy values of several different algorithms.

If you haven't already, take a look at the tutorial for setting up and installing the ProgLearn package: installation_guide.ipynb.
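If ProgLearn isn't installed yet, here is a minimal install sketch (the PyPI package name proglearn is an assumption here; see installation_guide.ipynb for the authoritative steps):

!pip install proglearn

The leading ! runs the command from a notebook cell; drop it if you're installing from a terminal instead.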

Goal: Train the UncertaintyForest classifier on training data and produce an accuracy metric on held-out test data.

Import required packages and set parameters for the forest

[1]:
from proglearn.forest import UncertaintyForest
from proglearn.sims import generate_gaussian_parity
[2]:
# Real Params.
n_train = 10000  # number of training data points
n_test = 1000  # number of testing data points
num_trials = 10  # number of trials
n_estimators = 100  # number of trees in the forest
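The values above are the "real" parameters. If you just want a quick smoke test, smaller values run in a fraction of the time; the numbers below are arbitrary illustrative choices, not values from the paper:

# Demo params: arbitrary smaller values for a quick run.
n_train = 1000  # number of training data points
n_test = 500  # number of testing data points
num_trials = 1  # number of trials
n_estimators = 30  # number of trees in the forest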

We’ve done a lot. Can we just run it now? Yes!

Create and train our UncertaintyForest

First, generate our data:

[3]:
X, y = generate_gaussian_parity(n_train + n_test)
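generate_gaussian_parity returns two-dimensional points X and binary labels y drawn from a Gaussian XOR ("parity") distribution. If you'd like to see what that looks like, here is a quick plotting sketch (it assumes matplotlib is installed; it isn't needed for the rest of the tutorial):

import matplotlib.pyplot as plt

# Scatter the simulated points, colored by class label.
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=5)
plt.title("Gaussian parity (XOR) data")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()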

Now, split that data into training and testing data. We don't want to accidentally train on our test data.

[4]:
X_train = X[:n_train]  # the first n_train points become the training set
y_train = y[:n_train]  # same split for the labels
X_test = X[n_train:]  # the remaining n_test points become the test set
y_test = y[n_train:]  # same split for the labels
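If you prefer, scikit-learn's train_test_split does the same job and shuffles the data as well. This is just an alternative sketch (it assumes scikit-learn is installed), not part of the tutorial's own code:

from sklearn.model_selection import train_test_split

# Shuffle and split the simulated data into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=n_test, random_state=0
)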

Then, create our forest:

[5]:
UF = UncertaintyForest(n_estimators=n_estimators)

Then fit our learner:

[6]:
UF.fit(X_train, y_train)
[6]:
<proglearn.forest.UncertaintyForest at 0x10ce8ca58>

Well, we're done. Exciting, right?

Produce a metric of accuracy for our learner

We've now created our learner and trained it. To show whether it is actually effective at predicting class labels, we'll generate a fresh batch of test data (drawn from the same distribution as the training data) and see if the learner classifies it correctly. This fresh sample replaces the held-out split we made above.

[7]:
X_test, y_test = generate_gaussian_parity(n_test)  # creates the test data
[8]:
predictions = UF.predict(X_test)  # predict the class labels of the test data
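UncertaintyForest is designed for estimating posteriors, so alongside the hard labels you can also look at the predicted class probabilities. The call below assumes the class exposes a scikit-learn-style predict_proba method, which the posterior-estimation tutorial relies on:

posteriors = UF.predict_proba(X_test)  # estimated P(y | x) for each test point
print(posteriors[:5])  # first five rows: one probability per class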

To see the learner's accuracy, we'll now compare the predictions with the actual test data labels. We'll count the number of correct predictions and divide by the number of test points.

[9]:
accuracy = sum(predictions == y_test) / n_test

And, let’s take a look at our accuracy:

[10]:
print(accuracy)
0.933
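Equivalently, scikit-learn's accuracy_score computes the same fraction of correct predictions; a quick sketch (assuming scikit-learn is installed):

from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, predictions))  # same value as the manual calculation above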

Ta-da. That’s an uncertainty forest at work.

What’s next?

See how UncertaintyForest compares to other algorithms at estimating posteriors and conditional entropy by generating Figures 1 and 2 from this paper.

To do this, check out uncertaintyforest_fig1.ipynb and uncertaintyforest_fig2.ipynb.