{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Label Shuffle Experiment\n",
    "\n",
    "The progressive learning package uses representation ensembling algorithms to sequentially learn a representation for each task and to ensemble both old and new representations for all future decisions.\n",
    "\n",
    "Here, a representation ensembling algorithm based on decision forests, Synergistic Forest (SynF), demonstrates forward and backward knowledge transfer between tasks on the CIFAR100 dataset with shuffled labels. The experiment reproduces the adversarial benchmarking experiment run in the paper \"Ensembling Representations for Synergistic Lifelong Learning with Quasilinear Complexity\" by Vogelstein et al. (2020), available at https://arxiv.org/pdf/2004.12908.pdf\n",
    "\n",
    "### Import necessary packages and modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from tensorflow import keras\n",
    "from joblib import Parallel, delayed\n",
    "from itertools import product"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load CIFAR100 data\n",
    "We load the CIFAR100 dataset through Keras (https://keras.io/api/datasets/cifar100/) and concatenate the training and test partitions into a single array, `data_x`.\n",
    "\n",
    "The label shuffle experiment randomly permutes the class labels within each of tasks 2 through 10, which makes each of these tasks adversarial with respect to the first task. This experiment shows that SynF is invariant to class label shuffling and still demonstrates transfer.
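\n",
    "\n",
    "The shuffle itself can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the code in `functions.label_shuffle_functions`: it assumes the labels are ordered so that each of the 10 tasks occupies one contiguous, equally sized block, and that shuffling means randomly permuting the label vector within every task after the first, so that the labels in tasks 2 to 10 no longer match their images. The helper name `shuffle_task_labels` is made up for this example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def shuffle_task_labels(y, n_tasks=10, seed=0):\n",
    "    # Hypothetical helper: assumes task t occupies the contiguous block\n",
    "    # y[t * block:(t + 1) * block] with block = len(y) // n_tasks.\n",
    "    rng = np.random.default_rng(seed)\n",
    "    y_shuffled = y.copy()\n",
    "    block = len(y) // n_tasks\n",
    "    for t in range(1, n_tasks):  # the first task keeps its true labels\n",
    "        start, stop = t * block, (t + 1) * block\n",
    "        # Permute labels within the task, breaking the image-label pairing\n",
    "        y_shuffled[start:stop] = rng.permutation(y_shuffled[start:stop])\n",
    "    return y_shuffled\n",
    "\n",
    "\n",
    "# Toy labels: 10 tasks x 10 classes x 5 samples per class, in task order\n",
    "y = np.repeat(np.arange(100), 5)\n",
    "y_shuffled = shuffle_task_labels(y)\n",
    "print(np.array_equal(y_shuffled[:50], y[:50]))  # prints True: task 1 is untouched\n",
    "```\n",
    "\n",
    "Because each task's labels are only permuted, every task keeps the same set of classes and the same number of samples per class; only the pairing between images and labels changes.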
"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "(X_train, y_train), (X_test, y_test) = keras.datasets.cifar100.load_data()\n",
    "data_x = np.concatenate([X_train, X_test])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define hyperparameters for the model and preprocess data\n",
    "Running the cells below defines the hyperparameters of the experimental setting and preprocesses the data.\n",
    "\n",
    "`num_points_per_task`: the number of training points per task\n",
    "\n",
    "`shifts`: the number of ways to split the data into train and test sets; shifts `1` to `shifts - 1` are run\n",
    "\n",
    "`num_slots`: the number of disjoint slots of `num_points_per_task` points that fit into the 5,000 points per task used for training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_points_per_task = 500\n",
    "\n",
    "shifts = 2\n",
    "\n",
    "num_slots = 5000 // num_points_per_task\n",
    "slot_fold = range(num_slots)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This cell flattens each 32 × 32 × 3 image into a 3,072-dimensional feature vector and concatenates the training and test labels into a 1-D array, `data_y`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Flatten each image into a single feature vector\n",
    "data_x = data_x.reshape(\n",
    "    (data_x.shape[0], data_x.shape[1] * data_x.shape[2] * data_x.shape[3])\n",
    ")\n",
    "# Concatenate the labels and drop the trailing singleton axis\n",
    "data_y = np.concatenate([y_train, y_test])\n",
    "data_y = data_y[:, 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train the model and perform validation\n",
    "\n",
    "#### run_parallel_exp:\n",
    "Wrapper for the `label_shuffle_experiment` function, which declares and trains the model and evaluates it on the test data to compute the model's error for a particular shift and slot.\n",
    "\n",
    "`n_trees`: the numbers of trees to use for SynF; each value is passed to `run_parallel_exp` as `ntree`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from functions.label_shuffle_functions import run_parallel_exp\n",
    "\n",
    "n_trees = [10]  # Number of trees in SynF\n",
    "\n",
    "shift_fold = range(1, shifts, 1)  # Shifts to run: 1, ..., shifts - 1\n",
    "iterable = product(n_trees, shift_fold, slot_fold)\n",
    "\n",
    "# Run one experiment per (ntree, shift, slot) combination in parallel\n",
    "df_list = Parallel(n_jobs=-1, verbose=0)(\n",
    "    delayed(run_parallel_exp)(\n",
    "        data_x, data_y, ntree, num_points_per_task, slot=slot, shift=shift\n",
    "    )\n",
    "    for ntree, shift, slot in iterable\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Function to calculate backward transfer efficiency\n",
    "\n",
    "The backward transfer efficiency of $f_n$ for task $t$ given $n$ samples is\n",
    "$$BTE^t (f_n) := \\mathbb{E} [R^t (f_n^{\\lt t}) / R^t (f_n)],$$\n",
    "where $R^t$ denotes the risk (expected error) on task $t$.\n",
    "\n",
    "We say an algorithm achieves backward transfer for task $t$ if and only if $BTE^t (f_n) > 1$. Intuitively, this means that the progressive learner has used data associated with new tasks to improve performance on previous tasks.\n",
    "\n",
    "#### calc_bte:\n",
    "Computes the BTE for each task, averaged across all shifts and slots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from functions.label_shuffle_functions import calc_bte\n",
    "\n",
    "btes = calc_bte(df_list, num_slots, shifts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plotting the backward transfer efficiency\n",
    "Run the cell below to plot the backward transfer efficiency of the Synergistic Forest algorithm. SynF achieves backward transfer overall, and its BTE increases as more tasks are seen.\n",
    "\n",
    "#### plot_bte:\n",
    "Plots the BTE across tasks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from functions.label_shuffle_functions import plot_bte\n",
    "\n",
    "plot_bte(btes)"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "77d3befdf72f5c1a0d6b4996fdd6befdfb972b784410fca14e27e6ae1841315c"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}