Solving the HAHA challenge

This script runs an instance of AutoML in the HAHA 2019 challenge. The full source code can be found here.

The dataset used is:

| Dataset | URL |
|---|---|
| HAHA 2019 | https://www.fing.edu.uy/inco/grupos/pln/haha/index.html#data |

Experimentation parameters

This experiment was run with the following parameters:

| Parameter | Value |
|---|---|
| Total epochs | 1 |
| Maximum iterations | 10000 |
| Timeout per pipeline | 30 min |
| Global timeout | - |
| Max RAM per pipeline | 20 GB |
| Population size | 50 |
| Selection (k-best) | 10 |
| Early stop | - |

The experiments were run on the following hardware configurations (allocated interchangeably according to available resources):

| Config | CPU | Cache | Memory | HDD |
|---|---|---|---|---|
| A | 12-core Intel Xeon Gold 6126 | 19712 KB | 191927.2 MB | 999.7 GB |
| B | 6-core Intel Xeon E5-1650 v3 | 15360 KB | 32045.5 MB | 2500.5 GB |
| C | Quad-core Intel Core i7-2600 | 8192 KB | 15917.1 MB | 1480.3 GB |

Note

The hardware configuration details were extracted with `inxi -CmD` and summarized.

Relevant imports

Most of this example follows the same logic as the UCI example. First, the necessary imports:

from autogoal.ml import AutoML
from autogoal.datasets import haha
from autogoal.search import (
    PESearch,
    RichLogger,
)
from autogoal.kb import Seq, Sentence, VectorCategorical, Supervised
from autogoal.contrib import find_classes
from sklearn.metrics import f1_score

Next, we parse the command line arguments to configure the experiment.

Parsing arguments

The default values are the ones used for the experimentation reported in the paper.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--iterations", type=int, default=10000)
parser.add_argument("--timeout", type=int, default=60)
parser.add_argument("--memory", type=int, default=2)
parser.add_argument("--popsize", type=int, default=50)
parser.add_argument("--selection", type=int, default=10)
parser.add_argument("--global-timeout", type=int, default=None)
parser.add_argument("--examples", type=int, default=None)
parser.add_argument("--token", default=None)
parser.add_argument("--channel", default=None)

args = parser.parse_args()

print(args)
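As a quick sanity check, the parser above can be exercised with an explicit argument list. The following is a minimal standalone sketch that reproduces just the relevant options, independent of AutoGOAL:

```python
import argparse

# Standalone copy of the relevant options from the parser above
parser = argparse.ArgumentParser()
parser.add_argument("--iterations", type=int, default=10000)
parser.add_argument("--timeout", type=int, default=60)
parser.add_argument("--memory", type=int, default=2)
parser.add_argument("--global-timeout", type=int, default=None)

# Override the per-pipeline timeout and memory limit from the command line
args = parser.parse_args(["--timeout", "1800", "--memory", "20"])
print(args.timeout, args.memory)  # 1800 20
print(args.iterations)            # 10000 (default kept)
```

Note that `--global-timeout` is exposed to the rest of the script as `args.global_timeout`, since `argparse` converts dashes in option names to underscores.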

The next line will print all the algorithms that AutoGOAL found in the contrib library, i.e., anything that could be potentially used to solve an AutoML problem.

for cls in find_classes():
    print("Using: %s" % cls.__name__)

Experimentation

Instantiate the classifier. Note that the input and output types here are defined to match the problem statement, i.e., text classification.

classifier = AutoML(
    search_algorithm=PESearch,
    input=(Seq[Sentence], Supervised[VectorCategorical]),
    output=VectorCategorical,
    search_iterations=args.iterations,
    score_metric=f1_score,
    errors="warn",
    pop_size=args.popsize,
    search_timeout=args.global_timeout,
    evaluation_timeout=args.timeout,
    memory_limit=args.memory * 1024 ** 3,
)
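The `memory_limit` argument is given in bytes, so the expression `args.memory * 1024 ** 3` converts the gigabyte count from the command line. For instance, the 20 GB limit from the experiment table works out as:

```python
memory_gb = 20                        # e.g. the 20 GB limit used in the experiments
memory_limit = memory_gb * 1024 ** 3  # convert gibibytes to bytes
print(memory_limit)                   # 21474836480
```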

loggers = [RichLogger()]

if args.token:
    from autogoal.contrib.telegram import TelegramLogger

    telegram = TelegramLogger(token=args.token, name="HAHA", channel=args.channel)
    loggers.append(telegram)

Finally, we load the HAHA dataset, run the AutoML instance, and print the results.

X_train, y_train, X_test, y_test = haha.load(max_examples=args.examples)

classifier.fit(X_train, y_train, logger=loggers)
score = classifier.score(X_test, y_test)

print(score)
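The reported score is computed with `sklearn.metrics.f1_score`, which by default returns the F1 of the positive class in a binary problem. As a reference for what that number means, here is a pure-Python sketch of the same binary F1 computation:

```python
def binary_f1(y_true, y_pred):
    # Counts over the positive class (label 1), matching the default
    # binary averaging of sklearn.metrics.f1_score.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# One false negative among three positives: precision 1.0, recall 2/3
print(binary_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # ≈ 0.8
```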