Search Results - "Chen, Joya"

1
Communication-efficient federated learning with stagewise training strategy by Cheng, Yifei, Shen, Shuheng, Liang, Xianfeng, Liu, Jingchang, Chen, Joya, Zhang, Tie, Chen, Enhong

Published in Neural networks (01-10-2023)
“…The efficiency of communication across workers is a significant factor that affects the performance of federated learning. Though periodic communication…”

Journal Article

2
Residual objectness for imbalance reduction by Chen, Joya, Liu, Dong, Luo, Bin, Peng, Xuezheng, Xu, Tong, Chen, Enhong

Published in Pattern recognition (01-10-2022)
“…We discover that the foreground-background imbalance in object detection could be addressed in a learning-based manner, without any hard-crafted resampling…”

Journal Article

3
Is Heuristic Sampling Necessary in Training Deep Object Detectors? by Chen, Joya, Liu, Dong, Xu, Tong, Wu, Shiwei, Cheng, Yifei, Chen, Enhong

Published in IEEE transactions on image processing (2021)
“…To train accurate deep object detectors under the extreme foreground-background imbalance, heuristic sampling methods are always necessary, which either…”

Journal Article

4
Affordance Grounding from Demonstration Video to Target Image by Chen, Joya, Gao, Difei, Lin, Kevin Qinghong, Shou, Mike Zheng

Published in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2023)
“…Humans excel at learning from expert demonstrations and solving their own problems. To equip intelligent robots and assistants, such as AR glasses, with this…”

Conference Proceeding

5
Overlap Sampler for Region-Based Object Detection by Chen, Joya, Luo, Bin, Wu, Qi, Chen, Jia, Peng, Xuezheng

Published in 2020 IEEE Winter Conference on Applications of Computer Vision (WACV) (01-03-2020)
“…The top accuracy of object detection to date is led by region-based approaches, where the per-region stage is responsible for recognizing proposals generated…”

Conference Proceeding

6
Foreground-Background Imbalance Problem in Deep Object Detectors: A Review by Chen, Joya, Wu, Qi, Liu, Dong, Xu, Tong

Published in 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) (01-08-2020)
“…Recent years have witnessed the remarkable developments made by deep learning techniques for object detection, a fundamentally challenging problem of computer…”

Conference Proceeding

7
Affordance Grounding from Demonstration Video to Target Image by Chen, Joya, Gao, Difei, Lin, Kevin Qinghong, Shou, Mike Zheng

Published 26-03-2023
“…Humans excel at learning from expert demonstrations and solving their own problems. To equip intelligent robots and assistants, such as AR glasses, with this…”

Journal Article

8
Bootstrapping SparseFormers from Vision Foundation Models by Gao, Ziteng, Tong, Zhan, Lin, Kevin Qinghong, Chen, Joya, Shou, Mike Zheng

Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (16-06-2024)
“…The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual…”

Conference Proceeding

9
From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition by Wu, Shiwei, Zhang, Chao, Chen, Joya, Xu, Tong, Wu, Likang, Hu, Yao, Chen, Enhong

Published 12-06-2024
“…People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific…”

Journal Article

10
Bootstrapping SparseFormers from Vision Foundation Models by Gao, Ziteng, Tong, Zhan, Lin, Kevin Qinghong, Chen, Joya, Shou, Mike Zheng

Published 04-12-2023
“…The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual…”

Journal Article

11
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training by Chen, Joya, Xu, Kai, Wang, Yuhui, Cheng, Yifei, Yao, Angela

Published 28-02-2022
“…A standard hardware bottleneck when training deep neural networks is GPU memory. The bulk of memory is occupied by caching intermediate tensors for gradient…”

Journal Article

12
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos by Bai, Zechen, He, Tong, Mei, Haiyang, Wang, Pichao, Gao, Ziteng, Chen, Joya, Liu, Lei, Zhang, Zheng, Shou, Mike Zheng

Published 29-09-2024
“…We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos…”

Journal Article

13
Learning Video Context as Interleaved Multimodal Sequences by Lin, Kevin Qinghong, Zhang, Pengchuan, Gao, Difei, Xia, Xide, Chen, Joya, Gao, Ziteng, Xie, Jinheng, Xiao, Xuhong, Shou, Mike Zheng

Published 31-07-2024
“…Narrative videos, such as movies, pose significant challenges in video understanding due to their rich contexts (characters, dialogues, storylines) and diverse…”

Journal Article

14
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn by Gao, Difei, Ji, Lei, Zhou, Luowei, Lin, Kevin Qinghong, Chen, Joya, Fan, Zihan, Shou, Mike Zheng

Published 14-06-2023
“…Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of…”

Journal Article

15
Capturing Implicit Spatial Cues for Monocular 3D Hand Reconstruction by Wu, Qi, Chen, Joya, Zhou, Xu, Yao, Zhiming, Yang, Xianjun

Published in 2021 IEEE International Conference on Multimedia and Expo (ICME) (05-07-2021)
“…With the development of the parameterized hand model (e.g. MANO), it is possible to reconstruct the 3D hand mesh from a single 2D hand image by learning a few…”

Conference Proceeding

16
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation by Wu, Shiwei, Chen, Joya, Lin, Kevin Qinghong, Wang, Qimeng, Gao, Yan, Xu, Qianli, Xu, Tong, Hu, Yao, Chen, Enhong, Shou, Mike Zheng

Published 29-08-2024
“…A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while increasing the number of vision tokens generally enhances visual…”

Journal Article

17
UniVTG: Towards Unified Video-Language Temporal Grounding by Lin, Kevin Qinghong, Zhang, Pengchuan, Chen, Joya, Pramanick, Shraman, Gao, Difei, Wang, Alex Jinpeng, Yan, Rui, Shou, Mike Zheng

Published 31-07-2023
“…Video Temporal Grounding (VTG), which aims to ground target clips from videos (such as consecutive intervals or disjoint shots) according to custom language…”

Journal Article

18
VideoLLM-online: Online Video Large Language Model for Streaming Video by Chen, Joya, Lv, Zhaoyang, Wu, Shiwei, Lin, Kevin Qinghong, Song, Chenan, Gao, Difei, Liu, Jia-Wei, Gao, Ziteng, Mao, Dongxing, Shou, Mike Zheng

Published 17-06-2024
“…Recent Large Language Models have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content…”

Journal Article

19
VideoLLM-online: Online Video Large Language Model for Streaming Video by Chen, Joya, Lv, Zhaoyang, Wu, Shiwei, Lin, Kevin Qinghong, Song, Chenan, Gao, Difei, Liu, Jia-Wei, Gao, Ziteng, Mao, Dongxing, Shou, Mike Zheng

Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (16-06-2024)
“…Recent Large Language Models (LLMs) have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language…”

Conference Proceeding

20
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant by Wong, Benita, Chen, Joya, Wu, You, Lei, Stan Weixian, Mao, Dongxing, Gao, Difei, Shou, Mike Zheng

Published 08-03-2022
“…A long-standing goal of intelligent assistants such as AR glasses/robots has been to assist users in affordance-centric real-world scenarios, such as "how can…”

Journal Article
