How to Run UncertaintyForest¶
This set of four tutorials (including uncertaintyforest_mutualinformationestimates.ipynb) explains the UncertaintyForest class. After following them, you should be able to run UncertaintyForest on your own machine and generate Figures 1, 2, and 3 from this paper, which compare the estimated posteriors and conditional entropy values of several different algorithms.
If you haven’t seen them already, take a look at the other tutorials to set up and install the ProgLearn package.
Goal: Train the UncertaintyForest classifier on some training data and produce a metric of accuracy on some test data
Import required packages and set parameters for the forest¶
from proglearn.forest import UncertaintyForest
from proglearn.sims import generate_gaussian_parity
# Real Params.
n_train = 10000  # number of training data points
n_test = 1000  # number of testing data points
num_trials = 10  # number of trials
n_estimators = 100  # number of estimators
We’ve done a lot. Can we just run it now? Yes!¶
Create and train our UncertaintyForest¶
First, generate our data:
X, y = generate_gaussian_parity(n_train + n_test)
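To get a feel for the kind of data this call returns, here is a minimal NumPy sketch of a Gaussian parity (XOR-style) sampler. The function name `sample_gaussian_parity_sketch`, the cluster centers at (±1, ±1), and the `cluster_std` default are illustrative assumptions, not ProgLearn's actual implementation. Keep using `generate_gaussian_parity` for the tutorial itself.

```python
import numpy as np

def sample_gaussian_parity_sketch(n_samples, cluster_std=0.5, seed=None):
    """Hypothetical stand-in: four Gaussian blobs labeled by the parity of their center."""
    rng = np.random.default_rng(seed)
    # Pick one of the four centers (+-1, +-1) uniformly at random for each point.
    signs = rng.choice([-1.0, 1.0], size=(n_samples, 2))
    # Add isotropic Gaussian noise around the chosen center.
    X = signs + cluster_std * rng.standard_normal((n_samples, 2))
    # Parity label: 0 when both center coordinates share a sign, 1 otherwise.
    y = (signs[:, 0] != signs[:, 1]).astype(int)
    return X, y

X_demo, y_demo = sample_gaussian_parity_sketch(1000, seed=0)
print(X_demo.shape, y_demo.shape)  # (1000, 2) (1000,)
```

The point of the sketch is the shape of the problem: two features, two classes, and a decision boundary that no single axis-aligned cut can capture.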
Now, split that data into training and testing data. We don’t want to accidentally train on our test data.
X_train = X[0:n_train]  # takes the first n_train data points and saves them as X_train
y_train = y[0:n_train]  # same as above for the labels
X_test = X[n_train:]  # takes the remainder of the data (n_test data points) and saves it as X_test
y_test = y[n_train:]  # same as above for the labels
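Slicing off the first `n_train` rows works when the samples are in no particular order, which is presumably the case for freshly simulated data. For data that might be sorted (by class, for instance), a shuffle before the split is safer. The helper below is a hypothetical sketch using plain NumPy; `shuffle_and_split` is not part of ProgLearn.

```python
import numpy as np

def shuffle_and_split(X, y, n_train, seed=None):
    """Hypothetical helper: shuffle the rows, then split into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # random ordering of the row indices
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Toy data: 10 points with 2 features each; labels 0..9 make the rows traceable.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)
X_tr, y_tr, X_te, y_te = shuffle_and_split(X_toy, y_toy, n_train=7, seed=0)
print(X_tr.shape, X_te.shape)  # (7, 2) (3, 2)
```

Because the same index arrays are applied to both `X` and `y`, each feature row stays paired with its label, and no point lands in both sets.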
Then, create our forest:
UF = UncertaintyForest(n_estimators=n_estimators)
Then fit our learner:
UF.fit(X_train, y_train)
<proglearn.forest.UncertaintyForest at 0x10ce8ca58>
Well, we’re done. Exciting, right?
Produce a metric of accuracy for our learner¶
We’ve now created our learner and trained it. To check whether it actually predicts class labels well, we’ll generate some fresh test data (drawn from the same distribution as the training data, and replacing the X_test and y_test from the split above) and see how many points we classify correctly.
X_test, y_test = generate_gaussian_parity(n_test) # creates the test data
predictions = UF.predict(X_test) # predict the class labels of the test data
To see the learner’s accuracy, we’ll now compare the predictions with the actual test labels: count the correct predictions and divide by the number of test points.
accuracy = sum(predictions == y_test) / n_test
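As a small worked example of this calculation, the sketch below uses made-up prediction and label arrays (the `_demo` names and values are purely illustrative, not output from UncertaintyForest). It also shows that taking the mean of the boolean comparison gives the same number without needing `n_test` separately.

```python
import numpy as np

# Made-up predictions and labels, for illustration only.
predictions_demo = np.array([0, 1, 1, 0, 1])
y_test_demo = np.array([0, 1, 0, 0, 1])

# Elementwise comparison gives a boolean array; True counts as 1 when summed.
correct = predictions_demo == y_test_demo
accuracy_sum = correct.sum() / len(y_test_demo)
# The mean of the boolean array is an equivalent shortcut.
accuracy_mean = np.mean(correct)
print(accuracy_sum, accuracy_mean)  # 0.8 0.8
```

Dividing by `len(y_test_demo)` rather than a separately stored count keeps the result correct even if the test set size changes.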
And let’s take a look at our accuracy:
Ta-da. That’s an uncertainty forest at work.
See metrics on the power of UncertaintyForest by generating Figures 1 and 2 from this paper.
To do this, check out