Search Results - "Chen, Joya"

  1.

    Communication-efficient federated learning with stagewise training strategy by Cheng, Yifei, Shen, Shuheng, Liang, Xianfeng, Liu, Jingchang, Chen, Joya, Zhang, Tie, Chen, Enhong

    Published in Neural networks (01-10-2023)
    “…The efficiency of communication across workers is a significant factor that affects the performance of federated learning. Though periodic communication…”
    Journal Article
  2.

    Residual objectness for imbalance reduction by Chen, Joya, Liu, Dong, Luo, Bin, Peng, Xuezheng, Xu, Tong, Chen, Enhong

    Published in Pattern recognition (01-10-2022)
    “…We discover that the foreground-background imbalance in object detection could be addressed in a learning-based manner, without any hard-crafted resampling…”
    Journal Article
  3.

    Is Heuristic Sampling Necessary in Training Deep Object Detectors? by Chen, Joya, Liu, Dong, Xu, Tong, Wu, Shiwei, Cheng, Yifei, Chen, Enhong

    “…To train accurate deep object detectors under the extreme foreground-background imbalance, heuristic sampling methods are always necessary, which either…”
    Journal Article
  4.

    Affordance Grounding from Demonstration Video to Target Image by Chen, Joya, Gao, Difei, Lin, Kevin Qinghong, Shou, Mike Zheng

    “…Humans excel at learning from expert demonstrations and solving their own problems. To equip intelligent robots and assistants, such as AR glasses, with this…”
    Conference Proceeding
  5.

    Overlap Sampler for Region-Based Object Detection by Chen, Joya, Luo, Bin, Wu, Qi, Chen, Jia, Peng, Xuezheng

    “…The top accuracy of object detection to date is led by region-based approaches, where the per-region stage is responsible for recognizing proposals generated…”
    Conference Proceeding
  6.

    Foreground-Background Imbalance Problem in Deep Object Detectors: A Review by Chen, Joya, Wu, Qi, Liu, Dong, Xu, Tong

    “…Recent years have witnessed the remarkable developments made by deep learning techniques for object detection, a fundamentally challenging problem of computer…”
    Conference Proceeding
  7.

    Affordance Grounding from Demonstration Video to Target Image by Chen, Joya, Gao, Difei, Lin, Kevin Qinghong, Shou, Mike Zheng

    Published 26-03-2023
    “…Humans excel at learning from expert demonstrations and solving their own problems. To equip intelligent robots and assistants, such as AR glasses, with this…”
    Journal Article
  8.

    Bootstrapping SparseFormers from Vision Foundation Models by Gao, Ziteng, Tong, Zhan, Lin, Kevin Qinghong, Chen, Joya, Shou, Mike Zheng

    “…The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual…”
    Conference Proceeding
  9.

    From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition by Wu, Shiwei, Zhang, Chao, Chen, Joya, Xu, Tong, Wu, Likang, Hu, Yao, Chen, Enhong

    Published 12-06-2024
    “…People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific…”
    Journal Article
  10.

    Bootstrapping SparseFormers from Vision Foundation Models by Gao, Ziteng, Tong, Zhan, Lin, Kevin Qinghong, Chen, Joya, Shou, Mike Zheng

    Published 04-12-2023
    “…The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual…”
    Journal Article
  11.

    DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training by Chen, Joya, Xu, Kai, Wang, Yuhui, Cheng, Yifei, Yao, Angela

    Published 28-02-2022
    “…A standard hardware bottleneck when training deep neural networks is GPU memory. The bulk of memory is occupied by caching intermediate tensors for gradient…”
    Journal Article
  12.

    One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos by Bai, Zechen, He, Tong, Mei, Haiyang, Wang, Pichao, Gao, Ziteng, Chen, Joya, Liu, Lei, Zhang, Zheng, Shou, Mike Zheng

    Published 29-09-2024
    “…We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos…”
    Journal Article
  13.

    Learning Video Context as Interleaved Multimodal Sequences by Lin, Kevin Qinghong, Zhang, Pengchuan, Gao, Difei, Xia, Xide, Chen, Joya, Gao, Ziteng, Xie, Jinheng, Xiao, Xuhong, Shou, Mike Zheng

    Published 31-07-2024
    “…Narrative videos, such as movies, pose significant challenges in video understanding due to their rich contexts (characters, dialogues, storylines) and diverse…”
    Journal Article
  14.

    AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn by Gao, Difei, Ji, Lei, Zhou, Luowei, Lin, Kevin Qinghong, Chen, Joya, Fan, Zihan, Shou, Mike Zheng

    Published 14-06-2023
    “…Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of…”
    Journal Article
  15.

    Capturing Implicit Spatial Cues for Monocular 3d Hand Reconstruction by Wu, Qi, Chen, Joya, Zhou, Xu, Yao, Zhiming, Yang, Xianjun

    “…With the development of the parameterized hand model (e.g. MANO), it is possible to reconstruct the 3D hand mesh from a single 2D hand image by learning a few…”
    Conference Proceeding
  16.

    VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation by Wu, Shiwei, Chen, Joya, Lin, Kevin Qinghong, Wang, Qimeng, Gao, Yan, Xu, Qianli, Xu, Tong, Hu, Yao, Chen, Enhong, Shou, Mike Zheng

    Published 29-08-2024
    “…A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while increasing the number of vision tokens generally enhances visual…”
    Journal Article
  17.

    UniVTG: Towards Unified Video-Language Temporal Grounding by Lin, Kevin Qinghong, Zhang, Pengchuan, Chen, Joya, Pramanick, Shraman, Gao, Difei, Wang, Alex Jinpeng, Yan, Rui, Shou, Mike Zheng

    Published 31-07-2023
    “…Video Temporal Grounding (VTG), which aims to ground target clips from videos (such as consecutive intervals or disjoint shots) according to custom language…”
    Journal Article
  18.

    VideoLLM-online: Online Video Large Language Model for Streaming Video by Chen, Joya, Lv, Zhaoyang, Wu, Shiwei, Lin, Kevin Qinghong, Song, Chenan, Gao, Difei, Liu, Jia-Wei, Gao, Ziteng, Mao, Dongxing, Shou, Mike Zheng

    Published 17-06-2024
    “…Recent Large Language Models have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content…”
    Journal Article
  19.

    VideoLLM-online: Online Video Large Language Model for Streaming Video by Chen, Joya, Lv, Zhaoyang, Wu, Shiwei, Lin, Kevin Qinghong, Song, Chenan, Gao, Difei, Liu, Jia-Wei, Gao, Ziteng, Mao, Dongxing, Shou, Mike Zheng

    “…Recent Large Language Models (LLMs) have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language…”
    Conference Proceeding
  20.

    AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant by Wong, Benita, Chen, Joya, Wu, You, Lei, Stan Weixian, Mao, Dongxing, Gao, Difei, Shou, Mike Zheng

    Published 08-03-2022
    “…A long-standing goal of intelligent assistants such as AR glasses/robots has been to assist users in affordance-centric real-world scenarios, such as "how can…”
    Journal Article