{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How to Run UncertaintyForest\n", "\n", "This set of four tutorials (`uncertaintyforest_running_example.ipynb`, `uncertaintyforest_posteriorestimates.ipynb`, `uncertaintyforest_conditionalentropyestimates.ipynb`, and `uncertaintyforest_mutualinformationestimates.ipynb`) will explain the UncertaintyForest class. After following these tutorials, you should be able to run UncertaintyForest on your own machine and generate Figures 1, 2, and 3 from [this paper](https://arxiv.org/pdf/1907.00325.pdf), which compare the estimated posteriors and conditional entropy values of several different algorithms.\n", "\n", "If you haven't seen it already, take a look at the tutorial on setting up and installing the ProgLearn package: `installation_guide.ipynb`.\n", "\n", "*Goal: Train the UncertaintyForest classifier on training data and produce a metric of accuracy on test data.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import required packages and set parameters for the forest" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from proglearn.forest import UncertaintyForest\n", "from proglearn.sims import generate_gaussian_parity" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "n_train = 10000  # number of training data points\n", "n_test = 1000  # number of testing data points\n", "num_trials = 10  # number of trials (not used in this notebook)\n", "n_estimators = 100  # number of trees (estimators) in the forest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We've done a lot. Can we just run it now? Yes!"
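, "\n", "Before we do, here is a small, self-contained sketch of roughly what `generate_gaussian_parity` produces. This is an approximation for intuition only, not ProgLearn's exact implementation: 2-D Gaussian points whose label is the XOR (\"parity\") of the signs of their coordinates.\n", "\n", "```python\n", "import numpy as np\n", "\n", "# Approximate Gaussian-parity (Gaussian XOR) data: the label is 1 when\n", "# exactly one coordinate is positive, and 0 otherwise.\n", "rng = np.random.default_rng(0)\n", "X_sketch = rng.normal(size=(8, 2))\n", "y_sketch = ((X_sketch[:, 0] > 0) ^ (X_sketch[:, 1] > 0)).astype(int)\n", "print(X_sketch.shape)  # (8, 2)\n", "```\n"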
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create and train our UncertaintyForest\n", "First, generate our data:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "X, y = generate_gaussian_parity(n_train + n_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, split that data into training and testing sets. We don't want to accidentally train on our test data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "X_train = X[:n_train]  # first n_train points form the training set\n", "y_train = y[:n_train]  # corresponding training labels\n", "X_test = X[n_train:]  # remaining n_test points form the test set\n", "y_test = y[n_train:]  # corresponding test labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, create our forest:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "UF = UncertaintyForest(n_estimators=n_estimators)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then fit our learner:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "UF.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, we're done. Exciting, right?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Produce a metric of accuracy for our learner\n", "We've now created our learner and trained it. To show whether it is actually effective at predicting class labels, we'll draw a fresh test set (from the same distribution as the training data, replacing the held-out split from above) and see how often it classifies the points correctly."
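, "\n", "The accuracy computation we're about to do can be sketched on made-up labels (the arrays below are illustrative, not our actual data): accuracy is simply the fraction of predictions that match the true labels, i.e. the mean of the elementwise comparison.\n", "\n", "```python\n", "import numpy as np\n", "\n", "# Toy labels for illustration only.\n", "y_true = np.array([0, 1, 1, 0, 1])\n", "y_pred = np.array([0, 1, 0, 0, 1])\n", "\n", "# Elementwise comparison gives a boolean array; its mean is the accuracy.\n", "accuracy = np.mean(y_pred == y_true)\n", "print(accuracy)  # 0.8\n", "```\n"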
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "X_test, y_test = generate_gaussian_parity(n_test)  # create the test data" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "predictions = UF.predict(X_test)  # predict the class labels of the test data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see the learner's accuracy, we'll now compare the predictions with the actual test labels: the number of correct predictions divided by the number of test points." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "accuracy = sum(predictions == y_test) / n_test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's take a look at our accuracy:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.933\n" ] } ], "source": [ "print(accuracy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ta-da. That's an uncertainty forest at work.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What's next?\n", "\n", "See metrics on the power of UncertaintyForest by generating Figures 1 and 2 from [this paper](https://arxiv.org/pdf/1907.00325.pdf).\n", "\n", "To do this, check out `uncertaintyforest_fig1.ipynb` and `uncertaintyforest_fig2.ipynb`." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }