Search Results - "Chen, Joya"
-
1
Communication-efficient federated learning with stagewise training strategy
Published in Neural networks (01-10-2023)“…The efficiency of communication across workers is a significant factor that affects the performance of federated learning. Though periodic communication…”
Journal Article -
2
Residual objectness for imbalance reduction
Published in Pattern recognition (01-10-2022)“…We discover that the foreground-background imbalance in object detection could be addressed in a learning-based manner, without any hard-crafted resampling…”
Journal Article -
3
Is Heuristic Sampling Necessary in Training Deep Object Detectors?
Published in IEEE transactions on image processing (2021)“…To train accurate deep object detectors under the extreme foreground-background imbalance, heuristic sampling methods are always necessary, which either…”
Journal Article -
4
Affordance Grounding from Demonstration Video to Target Image
Published in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2023)“…Humans excel at learning from expert demonstrations and solving their own problems. To equip intelligent robots and assistants, such as AR glasses, with this…”
Conference Proceeding -
5
Overlap Sampler for Region-Based Object Detection
Published in 2020 IEEE Winter Conference on Applications of Computer Vision (WACV) (01-03-2020)“…The top accuracy of object detection to date is led by region-based approaches, where the per-region stage is responsible for recognizing proposals generated…”
Conference Proceeding -
6
Foreground-Background Imbalance Problem in Deep Object Detectors: A Review
Published in 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) (01-08-2020)“…Recent years have witnessed the remarkable developments made by deep learning techniques for object detection, a fundamentally challenging problem of computer…”
Conference Proceeding -
7
Affordance Grounding from Demonstration Video to Target Image
Published 26-03-2023“…Humans excel at learning from expert demonstrations and solving their own problems. To equip intelligent robots and assistants, such as AR glasses, with this…”
Journal Article -
8
Bootstrapping SparseFormers from Vision Foundation Models
Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (16-06-2024)“…The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual…”
Conference Proceeding -
9
From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
Published 12-06-2024“…People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific…”
Journal Article -
10
Bootstrapping SparseFormers from Vision Foundation Models
Published 04-12-2023“…The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual…”
Journal Article -
11
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
Published 28-02-2022“…A standard hardware bottleneck when training deep neural networks is GPU memory. The bulk of memory is occupied by caching intermediate tensors for gradient…”
Journal Article -
12
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Published 29-09-2024“…We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos…”
Journal Article -
13
Learning Video Context as Interleaved Multimodal Sequences
Published 31-07-2024“…Narrative videos, such as movies, pose significant challenges in video understanding due to their rich contexts (characters, dialogues, storylines) and diverse…”
Journal Article -
14
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Published 14-06-2023“…Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of…”
Journal Article -
15
Capturing Implicit Spatial Cues for Monocular 3D Hand Reconstruction
Published in 2021 IEEE International Conference on Multimedia and Expo (ICME) (05-07-2021)“…With the development of the parameterized hand model (e.g. MANO), it is possible to reconstruct the 3D hand mesh from a single 2D hand image by learning a few…”
Conference Proceeding -
16
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Published 29-08-2024“…A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while increasing the number of vision tokens generally enhances visual…”
Journal Article -
17
UniVTG: Towards Unified Video-Language Temporal Grounding
Published 31-07-2023“…Video Temporal Grounding (VTG), which aims to ground target clips from videos (such as consecutive intervals or disjoint shots) according to custom language…”
Journal Article -
18
VideoLLM-online: Online Video Large Language Model for Streaming Video
Published 17-06-2024“…Recent Large Language Models have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content…”
Journal Article -
19
VideoLLM-online: Online Video Large Language Model for Streaming Video
Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (16-06-2024)“…Recent Large Language Models (LLMs) have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language…”
Conference Proceeding -
20
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
Published 08-03-2022“…A long-standing goal of intelligent assistants such as AR glasses/robots has been to assist users in affordance-centric real-world scenarios, such as "how can…”
Journal Article