Search Results - "2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)"

  1.

    YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors by Wang, Chien-Yao, Bochkovskiy, Alexey, Liao, Hong-Yuan Mark

    “…Real-time object detection is one of the most important research topics in computer vision. As new approaches regarding architecture optimization and training…”
    Conference Proceeding
  2.

    Multi-Concept Customization of Text-to-Image Diffusion by Kumari, Nupur, Zhang, Bingliang, Zhang, Richard, Shechtman, Eli, Zhu, Jun-Yan

    “…While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their…”
    Conference Proceeding
  3.

    Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models by Xu, Jiarui, Liu, Sifei, Vahdat, Arash, Byeon, Wonmin, Wang, Xiaolong, De Mello, Shalini

    “…We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform…”
    Conference Proceeding
  4.

    SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy by Li, Jiafeng, Wen, Ying, He, Lianghua

    “…Convolutional Neural Networks (CNNs) have achieved remarkable performance in various computer vision tasks but this comes at the cost of tremendous…”
    Conference Proceeding
  5.

    RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion by Wang, Tengfei, Zhang, Bo, Zhang, Ting, Gu, Shuyang, Bao, Jianmin, Baltrusaitis, Tadas, Shen, Jingjing, Chen, Dong, Wen, Fang, Chen, Qifeng, Guo, Baining

    “…This paper presents a 3D diffusion model that automatically generates 3D digital avatars represented as neural radiance fields (NeRFs). A significant challenge…”
    Conference Proceeding
  6.

    Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation by Li, Feng, Zhang, Hao, Xu, Huaizhe, Liu, Shilong, Zhang, Lei, Ni, Lionel M., Shum, Heung-Yeung

    “…In this paper we present Mask DINO, a unified object detection and segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes)…”
    Conference Proceeding
  7.

    DynIBaR: Neural Dynamic Image-Based Rendering by Li, Zhengqi, Wang, Qianqian, Cole, Forrester, Tucker, Richard, Snavely, Noah

    “…We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene. State-of-the-art methods based on temporally…”
    Conference Proceeding
  8.

    OneFormer: One Transformer to Rule Universal Image Segmentation by Jain, Jitesh, Li, Jiachen, Chiu, MangTik, Hassani, Ali, Orlov, Nikita, Shi, Humphrey

    “…Universal Image Segmentation is not a new concept. Past attempts to unify image segmentation include scene parsing, panoptic segmentation, and, more recently,…”
    Conference Proceeding
  9.

    Images Speak in Images: A Generalist Painter for In-Context Visual Learning by Wang, Xinlong, Wang, Wen, Cao, Yue, Shen, Chunhua, Huang, Tiejun

    “…In-context learning, as a new paradigm in NLP, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. But in computer…”
    Conference Proceeding
  10.

    Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation by Yang, Lihe, Qi, Lei, Feng, Litong, Zhang, Wayne, Shi, Yinghuan

    “…In this work, we revisit the weak-to-strong consistency framework, popularized by FixMatch from semi-supervised classification, where the prediction of a…”
    Conference Proceeding
  11.

    CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion by Zhao, Zixiang, Bai, Haowen, Zhang, Jiangshe, Zhang, Yulun, Xu, Shuang, Lin, Zudi, Timofte, Radu, Van Gool, Luc

    “…Multi-modality (MM) image fusion aims to render fused images that maintain the merits of different modalities, e.g., functional highlight and detailed…”
    Conference Proceeding
  12.

    DiffRF: Rendering-Guided 3D Radiance Field Diffusion by Müller, Norman, Siddiqui, Yawar, Porzi, Lorenzo, Bulò, Samuel Rota, Kontschieder, Peter, Nießner, Matthias

    “…We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. While existing diffusion-based methods…”
    Conference Proceeding
  13.

    Learning A Sparse Transformer Network for Effective Image Deraining by Chen, Xiang, Li, Hao, Li, Mingqiang, Pan, Jinshan

    “…Transformers-based methods have achieved significant performance in image deraining as they can model the non-local information which is vital for high-quality…”
    Conference Proceeding
  14.

    OpenScene: 3D Scene Understanding with Open Vocabularies by Peng, Songyou, Genova, Kyle, Jiang, Chiyu, Tagliasacchi, Andrea, Pollefeys, Marc, Funkhouser, Thomas

    “…Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an…”
    Conference Proceeding
  15.

    Neighborhood Attention Transformer by Hassani, Ali, Walton, Steven, Li, Jiachen, Li, Shen, Shi, Humphrey

    “…We present Neighborhood Attention (NA), the first efficient and scalable sliding window attention mechanism for vision. NA is a pixel-wise operation,…”
    Conference Proceeding
  16.

    Instant Volumetric Head Avatars by Zielonka, Wojciech, Bolkart, Timo, Thies, Justus

    “…We present Instant Volumetric Head Avatars (INSTA), a novel approach for reconstructing photo-realistic digital avatars instantaneously. INSTA models a dynamic…”
    Conference Proceeding
  17.

    Cut and Learn for Unsupervised Object Detection and Instance Segmentation by Wang, Xudong, Girdhar, Rohit, Yu, Stella X., Misra, Ishan

    “…We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. We leverage the property of…”
    Conference Proceeding
  18.

    Imagic: Text-Based Real Image Editing with Diffusion Models by Kawar, Bahjat, Zada, Shiran, Lang, Oran, Tov, Omer, Chang, Huiwen, Dekel, Tali, Mosseri, Inbar, Irani, Michal

    “…Text-conditioned image editing has recently attracted considerable interest. However, most methods are currently limited to one of the following: specific…”
    Conference Proceeding
  19.

    Learning Video Representations from Large Language Models by Zhao, Yue, Misra, Ishan, Krähenbühl, Philipp, Girdhar, Rohit

    “…We introduce LAVILA, a new approach to learning video-language representations by leveraging Large Language Models (LLMs). We repurpose pre-trained LLMs to be…”
    Conference Proceeding
  20.

    MobileOne: An Improved One millisecond Mobile Backbone by Vasu, Pavan Kumar Anasosalu, Gabriel, James, Zhu, Jeff, Tuzel, Oncel, Ranjan, Anurag

    “…Efficient neural network backbones for mobile devices are often optimized for metrics such as FLOPs or parameter count. However, these metrics may not…”
    Conference Proceeding