FTE/BTE Experiment for MNIST & Fashion-MNIST

As an extension of the FTE/BTE experiments demonstrated on the CIFAR and Food-101 datasets, we now examine the performance of progressive learning algorithms on the MNIST and Fashion-MNIST datasets.

Due to their similarity in structure, both containing 60,000 training and 10,000 testing samples of 28x28 grayscale images, MNIST and Fashion-MNIST are ideal for studying transfer between two different datasets. We are interested in obtaining benchmarks for how inter-dataset training performs, and we do so using the FTE/BTE experiment.

[3]:
import numpy as np
from tensorflow import keras

import functions.fte_bte_mnist_functions as fn

Note: This notebook tutorial uses functions stored externally in functions/fte_bte_mnist_functions.py to simplify the presentation of code. These functions are imported above, along with the other required libraries.

Benchmark Individual Datasets

Before we compare performance between datasets, we first benchmark each dataset individually so that we have a baseline for relative performance. We run the FTE/BTE experiment on MNIST and Fashion-MNIST separately.

Import Data

First, let’s import the data. Both the MNIST and Fashion-MNIST datasets can be imported via the keras package.

[4]:
(
    (MNIST_x_train, MNIST_y_train),
    (MNIST_x_test, MNIST_y_test),
) = keras.datasets.mnist.load_data()

MNIST_x_data = np.concatenate((MNIST_x_train, MNIST_x_test))
MNIST_y_data = np.concatenate((MNIST_y_train, MNIST_y_test))
[5]:
(
    (FASHION_x_train, FASHION_y_train),
    (FASHION_x_test, FASHION_y_test),
) = keras.datasets.fashion_mnist.load_data()

FASHION_x_data = np.concatenate((FASHION_x_train, FASHION_x_test))
FASHION_y_data = np.concatenate((FASHION_y_train, FASHION_y_test))

Define Hyperparameters

Next, let’s define the hyperparameters to be used for the experiment, which are as follows:

- model: model to be used for the FTE/BTE experiment
- num_tasks: number of tasks
- num_trees: number of trees
- num_points_per_task: number of samples to take from the dataset for each task
- reps: number of repetitions

[6]:
### MAIN HYPERPARAMS ###
model = "uf"
num_tasks = 5
num_trees = 10
num_points_per_task = 500
reps = 100
########################

By default, for the individual datasets, we use a forest of 10 trees. The 10 labels are split into 5 tasks of 2 labels each, and 500 samples are drawn at random for each task. The experiment is repeated 100 times. A sketch of this task construction follows.
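For intuition, here is a minimal sketch of how such tasks could be constructed. The helper make_tasks and its sampling scheme are illustrative assumptions, not the actual implementation inside fn.run_experiment.

[ ]:
def make_tasks(x, y, num_tasks, num_points_per_task, seed=0):
    # Illustrative sketch (assumption): split the labels into
    # num_tasks consecutive groups and draw num_points_per_task
    # random samples per task.
    rng = np.random.default_rng(seed)
    groups = np.array_split(np.unique(y), num_tasks)  # e.g. 5 tasks x 2 digits
    tasks = []
    for group in groups:
        idx = np.where(np.isin(y, group))[0]
        idx = rng.choice(idx, size=num_points_per_task, replace=False)
        tasks.append((x[idx], y[idx]))
    return tasks

# Example: tasks = make_tasks(MNIST_x_data, MNIST_y_data, 5, 500)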

MNIST

First, let’s look at MNIST, which contains images of handwritten digits 0-9. Since we are using 5 tasks, each task contains the data for two digits.

We call the function to run the experiment:

[ ]:
accuracy_all_task = fn.run_experiment(
    MNIST_x_data, MNIST_y_data, num_tasks, num_points_per_task
)

Next, we calculate the accuracy over tasks, as well as the forward transfer efficiency (FTE), backward transfer efficiency (BTE), and overall transfer efficiency (TE). Given these values, we can plot them as follows:

[8]:
acc, bte, fte, te = fn.calculate_results(accuracy_all_task, num_tasks)
fn.plot_results(acc, bte, fte, te)
[Figure ../_images/experiments_fte_bte_mnist_14_0.png: accuracy and transfer efficiency (FTE, BTE, TE) plots for MNIST]
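For reference, transfer efficiency is a ratio of generalization errors, so values above 1 indicate positive transfer and values below 1 indicate negative transfer. The sketch below shows how FTE, BTE, and TE could be computed; the input layout (a per-task single-task error plus a matrix of progressive-learner errors) is an assumption for illustration and may differ from what fn.calculate_results uses internally.

[ ]:
def transfer_efficiencies(err_single, err_multi):
    # Illustrative sketch with an assumed layout:
    #   err_single[t]   : error on task t of a learner trained on task t alone
    #   err_multi[i, t] : error on task t after training through tasks 0..i
    err_single = np.asarray(err_single, dtype=float)
    err_multi = np.asarray(err_multi, dtype=float)
    n = len(err_single)
    fte = np.array([err_single[t] / err_multi[t, t] for t in range(n)])
    bte = np.array([err_multi[t, t] / err_multi[n - 1, t] for t in range(n)])
    te = fte * bte  # elementwise; equals err_single / err_multi[n - 1]
    return fte, bte, te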

Fashion-MNIST

Next, we do the same for Fashion-MNIST, which contains images of clothing items. Each task contains randomly selected images from two clothing classes.

We call the function to run the experiment:

[ ]:
accuracy_all_task = fn.run_experiment(
    FASHION_x_data, FASHION_y_data, num_tasks, num_points_per_task
)

Next, we again calculate the accuracy over tasks, as well as the forward transfer efficiency (FTE), backward transfer efficiency (BTE), and overall transfer efficiency (TE). Given these values, we can plot them as follows:

[12]:
acc, bte, fte, te = fn.calculate_results(accuracy_all_task, num_tasks)
fn.plot_results(acc, bte, fte, te)
[Figure ../_images/experiments_fte_bte_mnist_18_0.png: accuracy and transfer efficiency (FTE, BTE, TE) plots for Fashion-MNIST]

FTE/BTE Between Datasets

Now that the individual datasets’ transfer capabilities have been evaluated, let’s look at how learning transfers between different datasets.

Update Hyperparameters

For this, we want to use the first dataset as the first task and the second dataset as the second task, giving two tasks of 10 labels each. We therefore update the hyperparameters so that num_tasks = 2:

[12]:
### MAIN HYPERPARAMS ###
model = "uf"
num_tasks = 2
num_trees = 10
num_points_per_task = 500
reps = 100
########################

Reformat Data

Since we want to train across the two datasets, we concatenate them into a single dataset, with each original dataset forming one task. The MNIST labels are offset by 10 so that they do not collide with the Fashion-MNIST labels:

[14]:
x_data = np.concatenate((FASHION_x_data, MNIST_x_data))
y_data = np.concatenate((FASHION_y_data, MNIST_y_data + 10))
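As a quick sanity check (illustrative, not part of the original notebook), the offset leaves the two datasets with disjoint label sets: Fashion-MNIST keeps labels 0-9 and MNIST is shifted to 10-19.

[ ]:
# Fashion-MNIST occupies labels 0-9 and MNIST labels 10-19,
# giving 20 non-overlapping classes across the two tasks.
assert set(np.unique(y_data)) == set(range(20))
print(x_data.shape, y_data.shape)  # (140000, 28, 28) (140000,)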

Fashion-MNIST -> MNIST

Now, we run the experiment across datasets, calling the function as follows:

[ ]:
accuracy_all_task = fn.run_experiment(x_data, y_data, num_tasks, num_points_per_task)

Given the resulting accuracies, we calculate the accuracy over tasks and the transfer efficiencies, and then plot the results:

[16]:
acc, bte, fte, te = fn.calculate_results(accuracy_all_task, num_tasks)
fn.plot_results(acc, bte, fte, te)
[Figure ../_images/experiments_fte_bte_mnist_27_0.png: accuracy and transfer efficiency (FTE, BTE, TE) plots for the cross-dataset experiment]