Recruitment Across Datasets

In this notebook, we further examine the capability of ODIF to transfer across datasets, building upon the prior FTE/BTE experiments on MNIST and Fashion-MNIST. Using the datasets found in this repo, we perform a series of experiments to evaluate the transfer efficiency and recruitment capabilities of ODIF across five different datasets. The datasets and their content are as follows:

- Caltech-101: contains images of objects in 101 categories
- CIFAR-10: contains 32x32 color images of objects in 10 classes
- CIFAR-100: contains 32x32 color images of objects in 100 classes
- Food-101: contains images of dishes in 101 categories
- DTD: contains images of describable textures

[1]:
import functions.recruitacrossdatasets_functions as fn

Note: This notebook tutorial uses functions stored externally within functions/recruitacrossdatasets_functions.py to simplify presentation of code. These functions are imported above, along with other libraries.

FTE/BTE Experiment

We begin our examination of ODIF's transfer capabilities across datasets with the FTE/BTE experiment, which provides baseline metrics for the expected performance and helps inform the later recruitment experiment.

Base Experiment

Import and Process Data

Let’s first import the data and perform some preprocessing so that it is in the correct format for feeding to ODIF. The following function does so for us:

[2]:
data, classes = fn.import_data(normalize=False)
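While the details live in functions/recruitacrossdatasets_functions.py, the general idea of the preprocessing is to load each dataset, bring its images to a common size, and flatten them into feature vectors that a forest-based learner can consume. A minimal, hypothetical sketch of the flattening step (the actual steps inside fn.import_data may differ):

import numpy as np

def to_feature_matrix(images):
    # flatten (n_samples, height, width, channels) images into
    # (n_samples, n_features) vectors for a forest-based learner
    images = np.asarray(images, dtype=float)
    return images.reshape(len(images), -1)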

Define Hyperparameters

We then define the hyperparameters to be used for the experiment:

- model: model to be used for the FTE/BTE experiment
- num_tasks: number of tasks
- num_trees: number of trees
- reps: number of repetitions, fewer than used for the actual figures to reduce running time

[3]:
##### MAIN HYPERPARAMS ##################
model = "odif"
num_tasks = 5
num_trees = 10
reps = 4
#########################################

Taking each dataset as a separate task, we have 5 tasks, and we also set a default of 10 trees. The experiment here is run for 4 reps to reduce running time, whereas the actual figures were generated with 30 reps.

Note that, in contrast to previous FTE/BTE experiments, there is no num_points_per_task parameter. Here, the number of samples per label is determined by the label with the fewest samples, so we take 31 samples from each label.
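As a rough illustration, balanced subsampling of this kind could look like the following (a hypothetical helper, not part of the tutorial module):

import numpy as np

def subsample_per_label(X, y, samples_per_label=31, seed=0):
    # keep the same number of samples (here 31, the smallest class size)
    # from every label so that all classes are equally represented
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        keep.extend(rng.choice(idx, size=samples_per_label, replace=False))
    keep = np.asarray(keep)
    return X[keep], y[keep]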

Run Experiment and Plot Results

First, we call the function to run the experiment:

[4]:
accuracy_all_task = fn.ftebte_exp(
    data, classes, model, num_tasks, num_trees, reps, shift=0
)

Using the accuracies over all tasks, we can calculate the error, the forward transfer efficiency (FTE), the backward transfer efficiency (BTE), and the overall transfer efficiency (TE).

[5]:
err, bte, fte, te = fn.get_metrics(accuracy_all_task, num_tasks)
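For reference, in the FTE/BTE framework these quantities are typically defined as ratios of generalization errors: FTE compares the error of a learner trained only on task t with the error after forward transfer from earlier tasks, BTE compares the error after tasks 1..t with the error after seeing all tasks, and TE is their product. A minimal sketch under these assumptions (the exact bookkeeping inside fn.get_metrics may differ):

def forward_transfer_efficiency(single_task_err, err_after_tasks_1_to_t):
    # error when trained on task t alone vs. trained on tasks 1..t
    return single_task_err / err_after_tasks_1_to_t

def backward_transfer_efficiency(err_after_tasks_1_to_t, err_after_all_tasks):
    # error when trained on tasks 1..t vs. trained on all tasks
    return err_after_tasks_1_to_t / err_after_all_tasks

def transfer_efficiency(single_task_err, err_after_all_tasks):
    # overall TE, equal to FTE * BTE
    return single_task_err / err_after_all_tasks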

The computed metrics are then plotted using the plotting function:

[6]:
fn.plot_ftebte(num_tasks, err, bte, fte, te)
../_images/experiments_recruitment_across_datasets_14_0.png

As can be seen above, there is generally positive forward and backward transfer (efficiency above 1) when evaluating transfer across datasets, even though the datasets contain very different content.

Varying the Number of Trees

We were also curious how the number of trees would affect the FTE/BTE results across datasets, so we reran the experiment using 50 trees:

[9]:
##### MAIN HYPERPARAMS ##################
model = "odif"
num_tasks = 5
num_trees = 50
reps = 4
#########################################

Running the experiment, we find the following results:

[10]:
accuracy_all_task = fn.ftebte_exp(
    data, classes, model, num_tasks, num_trees, reps, shift=0
)
err, bte, fte, te = fn.get_metrics(accuracy_all_task, num_tasks)
fn.plot_ftebte(num_tasks, err, bte, fte, te)
../_images/experiments_recruitment_across_datasets_19_0.png

It seems that using more trees leads to lower transfer efficiency.

We use 10 trees for the remainder of the experiments to save on computing power.

Recruitment Experiment

Now that we have roughly assessed the performance of ODIF via the FTE/BTE experiment, we are also interested in which recruitment scheme works the best for this set of data.

Base Experiment

To quickly reiterate some of the background on the recruitment experiment, there are generally two main schemes for developing lifelong learning algorithms: building and reallocating. The former adds new resources as new data comes in, whereas the latter compresses current representations to make room for new ones. We want to examine whether current resources could be better leveraged by testing a range of approaches (a sketch of the recruiting step follows this list):

1. Building (default for Omnidirectional Forest): train num_trees new trees
2. Uncertainty Forest (UF): ignore all prior trees and train on the new task alone
3. Recruiting: select the num_trees trees, out of all trees trained on the previous tasks, that perform best on the newly introduced final task
4. Hybrid: build num_trees/2 new trees AND recruit the num_trees/2 best-performing existing trees

We compare the results of these approaches based on varying training sample sizes, in the range of [1, 5, 10, 25] samples per label.
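To make the recruiting scheme concrete, the following sketch scores every previously trained tree on a held-out portion of the final task's data and keeps the top performers (trees is assumed to be a list of fitted sklearn-style estimators; this is an illustration, not the internals of fn.recruitment_exp):

import numpy as np

def recruit_best_trees(trees, X_val, y_val, num_trees=10):
    # score each existing tree on held-out data from the final task
    scores = [tree.score(X_val, y_val) for tree in trees]
    # keep the indices of the num_trees best-performing trees
    best = np.argsort(scores)[::-1][:num_trees]
    return [trees[i] for i in best]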

Define Hyperparameters

As always, we define the hyperparameters:

- num_tasks: number of tasks
- num_trees: number of trees
- reps: number of repetitions
- estimation_set: proportion of the final task's training data used for estimation (training); the remaining 1 - estimation_set is used for validation, i.e., the selection of the best trees

[11]:
############################
### Main hyperparameters ###
############################
num_tasks = 5
num_trees = 10
reps = 4
estimation_set = 0.63
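As an illustration of how such a split might be made (hypothetical variable names, with dummy data; not the code inside fn.recruitment_exp), a fraction estimation_set of the final task's samples goes to estimation/training and the rest to validation:

import numpy as np
from sklearn.model_selection import train_test_split

# dummy stand-ins for the final task's samples and labels (illustration only)
X_last = np.random.rand(155, 32 * 32 * 3)
y_last = np.repeat(np.arange(5), 31)

X_est, X_val, y_est, y_val = train_test_split(
    X_last, y_last, train_size=0.63, stratify=y_last, random_state=0  # 0.63 = estimation_set
)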

Run Experiment and Plot Results

We call our experiment function and input the main hyperparameters:

[12]:
# run recruitment experiment
means, stds, last_task_sample = fn.recruitment_exp(
    data, classes, num_tasks, num_trees, reps, estimation_set, shift=0
)

And then we plot the results:

[13]:
# plot results
fn.recruitment_plot(means, stds, last_task_sample, num_tasks)
../_images/experiments_recruitment_across_datasets_27_0.png

We see that, although generalization error remains high on the final task, the lifelong learning algorithm still outperforms the other recruitment schemes overall.

Shifting Dataset Order

Since the above experiment fixes DTD as the final dataset, a further experiment shifts the order of the datasets so that a different dataset serves as task 5 each time. This lets us see whether different dataset content significantly impacts the results on the final task.

To do so, we set the shift parameter in our call to the recruitment_exp function. This, in turn, calls the shift_data function, which moves the first shift tasks to the end and thus reorders the sequence of tasks.
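Conceptually, this is just a rotation of the task list (a sketch, assuming the tasks are kept in a Python list; fn.shift_data may be implemented differently):

def shift_tasks(task_names, shift):
    # move the first `shift` tasks to the end of the ordering
    return task_names[shift:] + task_names[:shift]

datasets = ["Caltech-101", "CIFAR-10", "CIFAR-100", "Food-101", "DTD"]
print(shift_tasks(datasets, 1))
# ['CIFAR-10', 'CIFAR-100', 'Food-101', 'DTD', 'Caltech-101']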

More specifically, if we define shift=1, as done below, we would get the following order of datasets:

1. CIFAR-10
2. CIFAR-100
3. Food-101
4. DTD
5. Caltech-101

[14]:
# run recruitment experiment
means, stds, last_task_sample = fn.recruitment_exp(
    data, classes, num_tasks, num_trees, reps, estimation_set, shift=1
)
# plot results
fn.recruitment_plot(means, stds, last_task_sample, num_tasks)
../_images/experiments_recruitment_across_datasets_30_0.png

A shift=2 results in a dataset order of:

1. CIFAR-100
2. Food-101
3. DTD
4. Caltech-101
5. CIFAR-10

[15]:
# run recruitment experiment
means, stds, last_task_sample = fn.recruitment_exp(
    data, classes, num_tasks, num_trees, reps, estimation_set, shift=2
)
# plot results
fn.recruitment_plot(means, stds, last_task_sample, num_tasks)
../_images/experiments_recruitment_across_datasets_32_0.png

shift=3 gives us:

1. Food-101
2. DTD
3. Caltech-101
4. CIFAR-10
5. CIFAR-100

[16]:
# run recruitment experiment
means, stds, last_task_sample = fn.recruitment_exp(
    data, classes, num_tasks, num_trees, reps, estimation_set, shift=3
)
# plot results
fn.recruitment_plot(means, stds, last_task_sample, num_tasks)
../_images/experiments_recruitment_across_datasets_34_0.png

And finally, shift=4 yields:

1. DTD
2. Caltech-101
3. CIFAR-10
4. CIFAR-100
5. Food-101

[17]:
# run recruitment experiment
means, stds, last_task_sample = fn.recruitment_exp(
    data, classes, num_tasks, num_trees, reps, estimation_set, shift=4
)
# plot results
fn.recruitment_plot(means, stds, last_task_sample, num_tasks)
../_images/experiments_recruitment_across_datasets_36_0.png

Throughout all of the above experiments, even though generalization error remains high due to the sheer number of different labels across the datasets, our lifelong learning algorithm still outperforms the other recruitment methods.

Other Experiments

Effect of Normalization

When examining data across different datasets, normalization and standardization are often of interest. However, they can also lead to a loss of information, since all images are placed on the same scale. As a final experiment, we look into the effect of normalization on the FTE/BTE results.

Import and Process Data

The import_data function has a normalize parameter, which specifies whether to leave the data unnormalized, normalize across each dataset, or normalize each image individually. Previously, for the original FTE/BTE experiment, we set normalize=False.

Here, we look at the other two options.

[18]:
# normalize across dataset
data1, classes1 = fn.import_data(normalize="dataset")
[19]:
# normalize across each image
data2, classes2 = fn.import_data(normalize="image")
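The two modes can be thought of as min-max scaling at different granularities, roughly as in the following sketch (an illustration only; the actual preprocessing lives inside fn.import_data, and images is assumed to be a float array of shape (n_samples, height, width, channels)):

import numpy as np

def normalize_across_dataset(images):
    # scale each channel to [0, 1] using the min/max over the whole dataset
    mins = images.min(axis=(0, 1, 2), keepdims=True)
    maxs = images.max(axis=(0, 1, 2), keepdims=True)
    return (images - mins) / (maxs - mins)

def normalize_per_image(images):
    # scale each channel of each image to [0, 1] independently
    mins = images.min(axis=(1, 2), keepdims=True)
    maxs = images.max(axis=(1, 2), keepdims=True)
    return (images - mins) / (maxs - mins)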

Define Hyperparameters

We use the same parameters as before:

[20]:
##### MAIN HYPERPARAMS ##################
model = "odif"
num_tasks = 5
num_trees = 10
reps = 4
#########################################

Run Experiment and Plot Results

We first run the FTE/BTE experiment by normalizing across each dataset, such that the images in each dataset have a range of [0,1] in each channel.

[21]:
accuracy_all_task = fn.ftebte_exp(
    data1, classes1, model, num_tasks, num_trees, reps, shift=0
)
err, bte, fte, te = fn.get_metrics(accuracy_all_task, num_tasks)
fn.plot_ftebte(num_tasks, err, bte, fte, te)
../_images/experiments_recruitment_across_datasets_46_0.png

We then run the FTE/BTE experiment with per-image normalization, so that each channel of each image is scaled to a range of [0,1].

[22]:
accuracy_all_task = fn.ftebte_exp(
    data2, classes2, model, num_tasks, num_trees, reps, shift=0
)
err, bte, fte, te = fn.get_metrics(accuracy_all_task, num_tasks)
fn.plot_ftebte(num_tasks, err, bte, fte, te)
../_images/experiments_recruitment_across_datasets_48_0.png

Normalizing across the dataset and normalizing within each image both yield results similar to not normalizing at all, so we did not pursue further experiments in this direction at the current point in time.