System Identification of Neural Systems: Going Beyond Images to Modelling Dynamics
Main Authors: | , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 19-02-2024 |
Subjects: | |
Online Access: | Get full text |
Summary: | Extensive literature has drawn comparisons between recordings of biological
neurons in the brain and deep neural networks. This comparative analysis aims
to advance and interpret deep neural networks and to enhance our understanding
of biological neural systems. However, previous works did not consider the
temporal aspect, i.e., how the encoding of video and dynamics in deep networks
relates to biological neural systems, within a large-scale comparison. Towards
this end, we propose the first large-scale study focused on comparing video
understanding models against visual cortex recordings obtained with video
stimuli. The study encompasses more than two million regression fits, examining
image vs. video understanding, convolutional vs. transformer-based, and fully
supervised vs. self-supervised models. Additionally, we propose a novel neural
encoding scheme to better encode biological neural systems. We provide key
insights on how video understanding models predict visual cortex responses:
video understanding models outperform image understanding models; convolutional
models outperform transformer-based ones in early-mid visual cortical regions,
with the exception of multiscale transformers; and two-stream models outperform
single-stream ones. Furthermore, our proposed neural encoding scheme is built
on top of the best performing video understanding models while incorporating
inter- and intra-region connectivity across the visual cortex. It leverages the
dynamics encoded from video stimuli, through two-stream networks and multiscale
transformers, while taking connectivity priors into consideration. Our results
show that merging intra- and inter-region connectivity priors increases
encoding performance over using either prior alone or no connectivity priors,
and that encoding dynamics is necessary to fully benefit from such connectivity
priors. |
DOI: | 10.48550/arxiv.2402.12519 |
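
The abstract centers on regression-based neural encoding: activations from a video understanding model are regressed onto visual cortex responses, and prediction quality on held-out data serves as the comparison metric. The sketch below is only a minimal illustration of that idea, not the authors' pipeline; the array sizes, the synthetic data, the RidgeCV alpha grid, and the correlation-based score are all assumptions for demonstration.

```python
# Minimal, illustrative neural-encoding fit (assumed setup, not the paper's code):
# predict visual-cortex voxel responses from video-model activations via ridge
# regression, then score with held-out correlation.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_clips, n_features, n_voxels = 1000, 512, 300       # hypothetical sizes
X = rng.standard_normal((n_clips, n_features))        # stand-in for per-clip model activations
W_true = 0.05 * rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + rng.standard_normal((n_clips, n_voxels))  # stand-in for recorded responses

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# One ridge fit per model/layer/region combination; sweeping such combinations is
# what produces the "more than two million regression fits" scale mentioned above.
encoder = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, Y_tr)

# Encoding score: per-voxel correlation between predicted and held-out responses.
Y_pred = encoder.predict(X_te)
r = [np.corrcoef(Y_pred[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
print(f"median held-out correlation: {np.median(r):.3f}")
```

With real data, the random arrays would be replaced by extracted features (e.g., from two-stream networks or multiscale transformers) and measured cortical responses; connectivity priors, as described in the abstract, would further constrain how responses from neighbouring regions inform each fit.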