TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data
Main Authors: | , , , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 08-11-2021 |
Summary: | Machine learning practitioners often have access to a spectrum of data:
labeled data for the target task (which is often limited), unlabeled data, and
auxiliary data, the many available labeled datasets for other tasks. We
describe TAGLETS, a system built to study techniques for automatically
exploiting all three types of data and creating high-quality, servable
classifiers. The key components of TAGLETS are: (1) auxiliary data organized
according to a knowledge graph, (2) modules encapsulating different methods for
exploiting auxiliary and unlabeled data, and (3) a distillation stage in which
the ensembled modules are combined into a servable model. We compare TAGLETS
with state-of-the-art transfer learning and semi-supervised learning methods on
four image classification tasks. Our study covers a range of settings, varying
the amount of labeled data and the semantic relatedness of the auxiliary data
to the target task. We find that the intelligent incorporation of auxiliary and
unlabeled data into multiple learning techniques enables TAGLETS to match, and
most often significantly surpass, these alternatives. TAGLETS is available as an
open-source system at github.com/BatsResearch/taglets. |
---|---|
DOI: | 10.48550/arxiv.2111.04798 |
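
The summary describes a distillation stage in which the ensembled modules are combined into a single servable model. The sketch below illustrates that general idea on toy data and is not the TAGLETS implementation. Every name in it (`ensemble_soft_labels`, `distill_student`, the three noisy linear "modules") is hypothetical, and averaging the modules' probabilities is an assumed combination rule; the record does not say how TAGLETS ensembles its modules.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_probs(weights, x):
    """Class probabilities of a linear softmax model (one weight vector per class)."""
    return softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in weights])


def ensemble_soft_labels(per_module_probs):
    """Average the modules' probability vectors per example (assumed rule)."""
    n_modules = len(per_module_probs)
    return [
        [sum(col) / n_modules for col in zip(*rows)]
        for rows in zip(*per_module_probs)
    ]


def distill_student(xs, soft_labels, n_classes, lr=0.5, steps=300):
    """Fit a linear softmax student to the soft labels with gradient descent
    on the cross-entropy between ensemble and student predictions."""
    dim = len(xs[0])
    n = len(xs)
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n_classes)]
        for x, target in zip(xs, soft_labels):
            probs = linear_probs(weights, x)
            for c in range(n_classes):
                err = probs[c] - target[c]
                for j in range(dim):
                    grads[c][j] += err * x[j] / n
        for c in range(n_classes):
            for j in range(dim):
                weights[c][j] -= lr * grads[c][j]
    return weights


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


random.seed(0)

# Toy "unlabeled" inputs: 200 points in two dimensions.
xs = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]

# Three hypothetical modules: noisy copies of one linear two-class classifier.
base = [[2.0, -1.0], [-2.0, 1.0]]
modules = [
    [[w + random.gauss(0, 0.3) for w in row] for row in base]
    for _ in range(3)
]

per_module_probs = [[linear_probs(m, x) for x in xs] for m in modules]
soft = ensemble_soft_labels(per_module_probs)
student = distill_student(xs, soft, n_classes=2)

agreement = sum(
    argmax(linear_probs(student, x)) == argmax(t) for x, t in zip(xs, soft)
) / len(xs)
print(f"student/ensemble agreement: {agreement:.2f}")
```

The point of the design is that only the student has to be deployed: the modules are needed at training time to produce soft labels on unlabeled data, while serving requires a single model.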