Search Results - "Min, Kyle"
1
TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection
Published in 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (01-10-2019) “…TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: first, the encoder network…”
Conference Proceeding -
2
Hierarchical Novelty Detection for Visual Object Recognition
Published in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (01-06-2018) “…Deep neural networks have achieved impressive success in large-scale visual object recognition tasks with a predefined set of classes. However, recognizing…”
Conference Proceeding -
3
SViTT: Temporal Learning of Sparse Video-Text Transformers
Published in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2023) “…Do video-text transformers learn to model temporal relationships across frames? Despite their immense capacity and the abundance of multimodal training data,…”
Conference Proceeding -
4
Integrating Human Gaze into Attention for Egocentric Activity Recognition
Published in 2021 IEEE Winter Conference on Applications of Computer Vision (WACV) (01-01-2021) “…It is well known that human gaze carries significant information about visual attention. However, there are three main difficulties in incorporating the gaze…”
Conference Proceeding -
5
Unbiased Scene Graph Generation in Videos
Published in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2023) “…The task of dynamic scene graph generation (SGG) from videos is complicated and challenging due to the inherent dynamics of a scene, temporal fluctuation of…”
Conference Proceeding -
6
STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization
Published 18-06-2023 “…This report introduces our novel method named STHG for the Audio-Visual Diarization task of the Ego4D Challenge 2023. Our key innovation is that we model all…”
Journal Article -
7
Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization
Published 14-10-2022 “…This report describes our approach for the Audio-Visual Diarization (AVD) task of the Ego4D Challenge 2022. Specifically, we present multiple technical…”
Journal Article -
8
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video
Published 12-06-2024 “…Pretraining egocentric vision-language models has become essential to improving downstream egocentric video-text tasks. These egocentric foundation models…”
Journal Article -
9
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
Published 25-05-2024 “…In the evolving landscape of text-to-image (T2I) diffusion models, the remarkable capability to generate high-quality images from textual descriptions faces…”
Journal Article -
10
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Published 28-07-2024 “…Video understanding typically requires fine-tuning the large backbone when adapting to new domains. In this paper, we leverage the egocentric video foundation…”
Journal Article -
11
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (16-06-2024) “…The rapid advancement of generative models, facilitating the creation of hyper-realistic images from textual descriptions, has concurrently escalated critical…”
Conference Proceeding -
12
Contrastive Language Video Time Pre-training
Published 03-06-2024 “…We introduce LAVITI, a novel approach to learning language, video, and temporal representations in long-form videos via contrastive learning. Different from…”
Journal Article -
13
Integrating Human Gaze into Attention for Egocentric Activity Recognition
Published 08-11-2020 “…It is well known that human gaze carries significant information about visual attention. However, there are three main difficulties in incorporating the gaze…”
Journal Article -
14
Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization
Published 13-07-2020 “…Temporally localizing activities within untrimmed videos has been extensively studied in recent years. Despite recent advances, existing methods for…”
Journal Article -
15
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Published 07-06-2023 “…The rapid advancement of generative models, facilitating the creation of hyper-realistic images from textual descriptions, has concurrently escalated critical…”
Journal Article -
16
SViTT: Temporal Learning of Sparse Video-Text Transformers
Published 18-04-2023 “…Do video-text transformers learn to model temporal relationships across frames? Despite their immense capacity and the abundance of multimodal training data,…”
Journal Article -
17
Unbiased Scene Graph Generation in Videos
Published 03-04-2023 “…The task of dynamic scene graph generation (SGG) from videos is complicated and challenging due to the inherent dynamics of a scene, temporal fluctuation of…”
Journal Article -
18
TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection
Published 15-08-2019 “…TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: first, the encoder network…”
Journal Article -
19
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (16-06-2024) “…We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos. EASGs extend standard…”
Conference Proceeding -
20
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Published 15-07-2022 “…Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and…”
Journal Article