Search Results - "2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)"

  1.

    Fine-tuned CLIP Models are Efficient Video Learners by Rasheed, Hanoona, Khattak, Muhammad Uzair, Maaz, Muhammad, Khan, Salman, Khan, Fahad Shahbaz

    “…Large-scale multi-modal training with image-text pairs imparts strong generalization to CLIP model. Since training on a similar scale for videos is infeasible,…”
    Conference Proceeding
  2.

    Person Image Synthesis via Denoising Diffusion Model by Kumar Bhunia, Ankan, Khan, Salman, Cholakkal, Hisham, Anwer, Rao Muhammad, Laaksonen, Jorma, Shah, Mubarak, Khan, Fahad Shahbaz

    “…The pose-guided person image generation task requires synthesizing photorealistic images of humans in arbitrary poses. The existing approaches use generative…”
    Conference Proceeding
  3.

    DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients by Pautrat, Remi, Barath, Daniel, Larsson, Viktor, Oswald, Martin R., Pollefeys, Marc

    “…Line segments are ubiquitous in our human-made world and are increasingly used in vision tasks. They are complementary to feature points thanks to their…”
    Conference Proceeding
  4.

    Burstormer: Burst Image Restoration and Enhancement Transformer by Dudhane, Akshay, Zamir, Syed Waqas, Khan, Salman, Khan, Fahad Shahbaz, Yang, Ming-Hsuan

    “…On a shutter press, modern handheld cameras capture multiple images in rapid succession and merge them to generate a single image. However, individual frames…”
    Conference Proceeding
  5.

    PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery by Zhang, Sheng, Khan, Salman, Shen, Zhiqiang, Naseer, Muzammal, Chen, Guangyi, Khan, Fahad Shahbaz

    “…Although existing semi-supervised learning models achieve remarkable success in learning with unannotated in-distribution data, they mostly fail to learn on…”
    Conference Proceeding
  6.

    Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection by Li, Long, Han, Junwei, Zhang, Ni, Liu, Nian, Khan, Salman, Cholakkal, Hisham, Anwer, Rao Muhammad, Khan, Fahad Shahbaz

    “…Most previous co-salient object detection works mainly focus on extracting co-salient cues via mining the consistency relations across images while ignore…”
    Conference Proceeding
  7.

    Revisiting the P3P Problem by Ding, Yaqing, Yang, Jian, Larsson, Viktor, Olsson, Carl, Astrom, Kalle

    “…One of the classical multi-view geometry problems is the so called P3P problem, where the absolute pose of a calibrated camera is determined from three…”
    Conference Proceeding
  8.

    Revisiting Rotation Averaging: Uncertainties and Robust Losses by Zhang, Ganlin, Larsson, Viktor, Barath, Daniel

    “…In this paper, we revisit the rotation averaging problem applied in global Structure-from-Motion pipelines. We argue that the main problem of current methods…”
    Conference Proceeding
  9.

    Four-view Geometry with Unknown Radial Distortion by Hruby, Petr, Korotynskiy, Viktor, Duff, Timothy, Oeding, Luke, Pollefeys, Marc, Pajdla, Tomas, Larsson, Viktor

    “…We present novel solutions to previously unsolved problems of relative pose estimation from images whose calibration parameters, namely focal lengths and…”
    Conference Proceeding
  10.

    Privacy-Preserving Representations are not Enough: Recovering Scene Content from Camera Poses by Chelani, Kunal, Sattler, Torsten, Kahl, Fredrik, Kukelova, Zuzana

    “…Visual localization is the task of estimating the camera pose from which a given image was taken and is central to several 3D computer vision applications…”
    Conference Proceeding
  11.

    Bayesian Posterior Approximation With Stochastic Ensembles by Balabanov, Oleksandr, Mehlig, Bernhard, Linander, Hampus

    “…We introduce ensembles of stochastic neural networks to approximate the Bayesian posterior, combining stochastic methods such as dropout with deep ensembles…”
    Conference Proceeding
  12.

    YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors by Wang, Chien-Yao, Bochkovskiy, Alexey, Liao, Hong-Yuan Mark

    “…Real-time object detection is one of the most important research topics in computer vision. As new approaches regarding architecture optimization and training…”
    Conference Proceeding
  13.

    DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation by Ruiz, Nataniel, Li, Yuanzhen, Jampani, Varun, Pritch, Yael, Rubinstein, Michael, Aberman, Kfir

    “…Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt…”
    Conference Proceeding
  14.

    Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks by Chen, Jierun, Kao, Shiu-hong, He, Hao, Zhuo, Weipeng, Wen, Song, Lee, Chul-Ho, Chan, S.-H. Gary

    “…To design fast neural networks, many works have been focusing on reducing the number of floating-point operations (FLOPs). We observe that such reduction in…”
    Conference Proceeding
  15.

    ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Woo, Sanghyun, Debnath, Shoubhik, Hu, Ronghang, Chen, Xinlei, Liu, Zhuang, Kweon, In So, Xie, Saining

    “…Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance…”
    Conference Proceeding
  16.

    Multi-Concept Customization of Text-to-Image Diffusion by Kumari, Nupur, Zhang, Bingliang, Zhang, Richard, Shechtman, Eli, Zhu, Jun-Yan

    “…While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their…”
    Conference Proceeding
  17.

    EVA: Exploring the Limits of Masked Visual Representation Learning at Scale by Fang, Yuxin, Wang, Wen, Xie, Binhui, Sun, Quan, Wu, Ledell, Wang, Xinggang, Huang, Tiejun, Wang, Xinlong, Cao, Yue

    “…We launch EVA, a vision-centric foundation model to Explore the limits of Visual representation at scAle using only publicly accessible data. EVA is a vanilla…”
    Conference Proceeding
  18.

    Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures by Metzer, Gal, Richardson, Elad, Patashnik, Or, Giryes, Raja, Cohen-Or, Daniel

    “…Text-guided image generation has progressed rapidly in recent years, inspiring major breakthroughs in text-guided shape generation. Recently, it has been shown…”
    Conference Proceeding
  19.

    Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models by Xu, Jiarui, Liu, Sifei, Vahdat, Arash, Byeon, Wonmin, Wang, Xiaolong, De Mello, Shalini

    “…We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform…”
    Conference Proceeding
  20.

    All are Worth Words: A ViT Backbone for Diffusion Models by Bao, Fan, Nie, Shen, Xue, Kaiwen, Cao, Yue, Li, Chongxuan, Su, Hang, Zhu, Jun

    “…Vision transformers (ViT) have shown promise in various vision tasks while the U-Net based on a convolutional neural network (CNN) remains dominant in…”
    Conference Proceeding