Search Results - "Seo, Paul Hongsuck"

  1.

    MarioQA: Answering Questions by Watching Gameplay Videos by Mun, Jonghwan, Seo, Paul Hongsuck, Jung, Ilchae, Han, Bohyung

    “…We present a framework to analyze various aspects of models for video question answering (VideoQA) using customizable synthetic datasets, which are constructed…”
    Conference Proceeding
  2.

    Look Before you Speak: Visually Contextualized Utterances by Seo, Paul Hongsuck, Nagrani, Arsha, Schmid, Cordelia

    “…While most conversational AI systems focus on textual dialogue only, conditioning utterances on visual context (when it's available) can lead to more realistic…”
    Conference Proceeding
  3.

    End-to-end Generative Pretraining for Multimodal Video Captioning by Seo, Paul Hongsuck, Nagrani, Arsha, Arnab, Anurag, Schmid, Cordelia

    “…Recent video and language pretraining frameworks lack the ability to generate sentences. We present Multimodal Video Generative Pretraining (MV-GPT), a new…”
    Conference Proceeding
  4.

    Learning for Single-Shot Confidence Calibration in Deep Neural Networks Through Stochastic Inferences by Seo, Seonguk, Seo, Paul Hongsuck, Han, Bohyung

    “…We propose a generic framework to calibrate accuracy and confidence of a prediction in deep neural networks through stochastic inferences. We interpret…”
    Conference Proceeding
  5.

    Zero-shot Referring Image Segmentation with Global-Local Context Features by Yu, Seonghoon, Seo, Paul Hongsuck, Son, Jeany

    “…Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled…”
    Conference Proceeding
  6.

    Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Seo, Paul Hongsuck, Han, Bohyung

    “…We tackle image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are…”
    Conference Proceeding
  7.

    Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning by Yang, Antoine, Nagrani, Arsha, Seo, Paul Hongsuck, Miech, Antoine, Pont-Tuset, Jordi, Laptev, Ivan, Sivic, Josef, Schmid, Cordelia

    “…In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning model pretrained on narrated videos which are readily-available at scale…”
    Conference Proceeding
  8.

    AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR by Seo, Paul Hongsuck, Nagrani, Arsha, Schmid, Cordelia

    “…Audiovisual automatic speech recognition (AV-ASR) aims to improve the robustness of a speech recognition system by incorporating visual information. Training…”
    Conference Proceeding
  9.

    IFSeg: Image-free Semantic Segmentation via Vision-Language Model by Yun, Sukmin, Park, Seong Hyeon, Seo, Paul Hongsuck, Shin, Jinwoo

    “…Vision-language (VL) pre-training has recently gained much attention for its transferability and flexibility in novel concepts (e.g., cross-modality transfer)…”
    Conference Proceeding
  10.

    Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation by Yu, Seonghoon, Seo, Paul Hongsuck, Son, Jeany

    Published 10-07-2024
    “…We propose a new framework that automatically generates high-quality segmentation masks with their referring expressions as pseudo supervisions for referring…”
    Journal Article
  11.

    Zero-shot Referring Image Segmentation with Global-Local Context Features by Yu, Seonghoon, Seo, Paul Hongsuck, Son, Jeany

    Published 31-03-2023
    “…Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled…”
    Journal Article
  12.

    AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR by Seo, Paul Hongsuck, Nagrani, Arsha, Schmid, Cordelia

    Published 29-03-2023
    “…Audiovisual automatic speech recognition (AV-ASR) aims to improve the robustness of a speech recognition system by incorporating visual information. Training…”
    Journal Article
  13.

    AVATAR submission to the Ego4D AV Transcription Challenge by Seo, Paul Hongsuck, Nagrani, Arsha, Schmid, Cordelia

    Published 17-11-2022
    “…In this report, we describe our submission to the Ego4D AudioVisual (AV) Speech Transcription Challenge 2022. Our pipeline is based on AVATAR, a state of the…”
    Journal Article
  14.

    Learning Correlation Structures for Vision Transformers by Kim, Manjin, Seo, Paul Hongsuck, Schmid, Cordelia, Cho, Minsu

    “…We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query…”
    Conference Proceeding
  15.

    Learning Correlation Structures for Vision Transformers by Kim, Manjin, Seo, Paul Hongsuck, Schmid, Cordelia, Cho, Minsu

    Published 05-04-2024
    “…We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query…”
    Journal Article
  16.

    IFSeg: Image-free Semantic Segmentation via Vision-Language Model by Yun, Sukmin, Park, Seong Hyeon, Seo, Paul Hongsuck, Shin, Jinwoo

    Published 25-03-2023
    “…Vision-language (VL) pre-training has recently gained much attention for its transferability and flexibility in novel concepts (e.g., cross-modality transfer)…”
    Journal Article
  17.

    Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels by Shin, Heeseong, Kim, Chaehyun, Hong, Sunghwan, Cho, Seokju, Arnab, Anurag, Seo, Paul Hongsuck, Kim, Seungryong

    Published 29-09-2024
    “…Large-scale vision-language models like CLIP have demonstrated impressive open-vocabulary capabilities for image-level tasks, excelling in recognizing what…”
    Journal Article
  18.

    CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation by Cho, Seokju, Shin, Heeseong, Hong, Sunghwan, Arnab, Anurag, Seo, Paul Hongsuck, Kim, Seungryong

    “…Open-vocabulary semantic segmentation presents the challenge of labeling each pixel within an image based on a wide range of text descriptions. In this work,…”
    Conference Proceeding
  19.

    Look Before you Speak: Visually Contextualized Utterances by Seo, Paul Hongsuck, Nagrani, Arsha, Schmid, Cordelia

    Published 10-12-2020
    “…While most conversational AI systems focus on textual dialogue only, conditioning utterances on visual context (when it's available) can lead to more realistic…”
    Journal Article
  20.

    End-to-end Generative Pretraining for Multimodal Video Captioning by Seo, Paul Hongsuck, Nagrani, Arsha, Arnab, Anurag, Schmid, Cordelia

    Published 20-01-2022
    Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR) 2022
    “…Recent video and language pretraining frameworks lack the ability to generate…”
    Journal Article