Search Results - "Shvetsova, Nina"
1
Anomaly Detection in Medical Imaging With Deep Perceptual Autoencoders
Published in IEEE Access (2021)“…Anomaly detection is the problem of recognizing abnormal inputs based on the seen examples of normal data. Despite recent advances of deep learning in…”
Journal Article
2
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval
Published in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2022)“…Multi-modal learning from video data has seen increased attention recently as it allows training of semantically meaningful embeddings without human…”
Conference Proceeding
3
MOOD 2020: A Public Benchmark for Out-of-Distribution Detection and Localization on Medical Images
Published in IEEE Transactions on Medical Imaging (01-10-2022)“…Detecting Out-of-Distribution (OoD) data is one of the greatest challenges in safe and robust deployment of machine learning algorithms in medicine. When the…”
Journal Article
4
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Published in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (04-06-2023)“…Multilingual text-video retrieval methods have improved significantly in recent years, but the performance for languages other than English still lags. We…”
Conference Proceeding
5
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Published 16-09-2023“…Large-scale noisy web image-text datasets have been proven to be efficient for learning robust vision-language models. However, when transferring them to the…”
Journal Article
6
Learning by Sorting: Self-supervised Learning with Group Ordering Constraints
Published 05-01-2023“…Contrastive learning has become an important tool in learning representations from unlabeled data mainly relying on the idea of minimizing distance between…”
Journal Article
7
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
Published 12-09-2022“…Vision-language models trained on large, randomly collected data had significant impact in many areas since they appeared. But as they show great performance…”
Journal Article
8
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Published 07-10-2023“…Instructional videos are a common source for learning text-video or even multimodal representations by leveraging subtitles extracted with automatic speech…”
Journal Article
9
Preserving Modality Structure Improves Multi-Modal Learning
Published 24-08-2023“…Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space…”
Journal Article
10
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (16-06-2024)“…Spatio-temporal grounding describes the task of localizing events in space and time, e.g., in video data, based on verbal descriptions only. Models for this…”
Conference Proceeding
11
Augmentation Learning for Semi-Supervised Classification
Published 03-08-2022“…Recently, a number of new Semi-Supervised Learning methods have emerged. As the accuracy for ImageNet and similar datasets increased over time, the performance…”
Journal Article
12
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Published 29-03-2023“…Spatio-temporal grounding describes the task of localizing events in space and time, e.g., in video data, based on verbal descriptions only. Models for this…”
Journal Article
13
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Published 15-03-2023“…Large scale Vision-Language (VL) models have shown tremendous success in aligning representations between visual and text modalities. This enables remarkable…”
Journal Article
14
Anomaly Detection in Medical Imaging with Deep Perceptual Autoencoders
Published 13-09-2021“…IEEE Access, vol. 9, pp. 118571-118583, 2021 Anomaly detection is the problem of recognizing abnormal inputs based on the seen examples of normal data. Despite…”
Journal Article
15
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Published 08-12-2021“…Multi-modal learning from video data has seen increased attention recently as it allows to train semantically meaningful embeddings without human annotation…”
Journal Article
16
Routing with Self-Attention for Multimodal Capsule Networks
Published 01-12-2021“…The task of multimodal learning has seen a growing interest recently as it allows for training neural architectures based on different modalities such as…”
Journal Article
17
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Published 07-10-2022“…Multilingual text-video retrieval methods have improved significantly in recent years, but the performance for other languages lags behind English. We propose…”
Journal Article