
PhD defense: Yu Liu

Yu Liu has defended his thesis “Lightweight Architectures for Spatiotemporal Action Detection in Real-Time”

Abstract

In the last decade, the explosive growth of video content has driven numerous application demands for automating action detection in space and time. Beyond accurate detection, many real-world sensing scenarios mandate incremental, instantaneous processing of scenes under restricted computational budgets. The main challenge lies in the dependence on heavy 3D Convolutional Neural Networks (CNNs) or explicit motion computation (e.g., optical flow) to extract pertinent spatiotemporal information.

To this end, we propose three lightweight action detection architectures coupling various spatiotemporal modeling schemes with compact 2D CNNs. Our first intuition was to accelerate frame-level action detection by allocating computationally expensive feature extraction to only a sparse set of video frames while approximating the features of the remaining frames. Meanwhile, we accumulated multiple observations over time to efficiently model temporal variations of actions.
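To make this scheme concrete, here is a minimal PyTorch sketch, not the thesis code: the backbone layers, the `keyframe_interval` and `momentum` values, and the exponential moving average used as the temporal accumulator are all illustrative assumptions. A heavy backbone runs only on every k-th frame, a single cheap convolution approximates features in between, and the EMA stands in for accumulating observations over time.

```python
import torch
import torch.nn as nn

class SparseKeyframeDetector(nn.Module):
    """Hypothetical sketch: sparse heavy extraction + cheap approximation."""
    def __init__(self, feat_dim=64, keyframe_interval=8, momentum=0.8):
        super().__init__()
        self.keyframe_interval = keyframe_interval
        self.momentum = momentum
        # Stand-in for a compact 2D CNN backbone (e.g., a MobileNet-like net).
        self.heavy = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Much cheaper layer that approximates features between keyframes.
        self.cheap = nn.Conv2d(feat_dim, feat_dim, 3, padding=1)
        self.last_feat = None   # features from the most recent keyframe
        self.memory = None      # accumulated temporal context (EMA)

    def forward(self, frame, t):
        # Run the expensive backbone only on sparse keyframes.
        if t % self.keyframe_interval == 0 or self.last_feat is None:
            feat = self.heavy(frame)
        else:
            feat = self.cheap(self.last_feat)  # approximate the rest
        self.last_feat = feat
        # Accumulate observations over time to model temporal variation.
        self.memory = feat if self.memory is None else \
            self.momentum * self.memory + (1 - self.momentum) * feat
        return self.memory

# Usage: stream frames one by one, as an online detector would.
det = SparseKeyframeDetector()
for t in range(16):
    out = det(torch.randn(1, 3, 64, 64), t)
print(out.shape)  # torch.Size([1, 64, 16, 16])
```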

Subsequently, we explored processing a series of video frames and predicting the underlying action-specific bounding boxes concurrently (i.e., tubelets). Specifically, the modeling of an action sequence was decoupled into multi-frame feature aggregation and trajectory tracking, which enhance action classification and localization, respectively.
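The sketch below illustrates this decoupling under strong assumptions: mean pooling stands in for the multi-frame aggregation, a per-frame linear box regressor stands in for the trajectory-tracking branch, and names such as `TubeletHead` are hypothetical rather than the thesis design.

```python
import torch
import torch.nn as nn

class TubeletHead(nn.Module):
    """Hypothetical sketch: one class per tubelet, one box per frame."""
    def __init__(self, feat_dim=64, num_classes=24, num_frames=8):
        super().__init__()
        self.num_frames = num_frames
        # Classification consumes features aggregated across all frames.
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Localization regresses one (x, y, w, h) box per frame, a crude
        # stand-in for the trajectory-tracking branch.
        self.regressor = nn.Linear(feat_dim, 4)

    def forward(self, feats):               # feats: (batch, frames, feat_dim)
        pooled = feats.mean(dim=1)          # multi-frame feature aggregation
        class_logits = self.classifier(pooled)  # one action label per tubelet
        boxes = self.regressor(feats)           # (batch, frames, 4)
        return class_logits, boxes

feats = torch.randn(2, 8, 64)               # per-frame features from a 2D CNN
logits, boxes = TubeletHead()(feats)
print(logits.shape, boxes.shape)             # (2, 24) and (2, 8, 4)
```

Keeping classification on the aggregated features while localization stays per-frame is what lets one tubelet carry a single action label yet still follow the actor's trajectory.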

Finally, we devised a flow-like motion representation that can be computed on-the-fly from raw video frames. Our aforementioned tubelet detector was extended into a two-pathway CNN to jointly extract actions’ static visual and dynamic motion cues. We demonstrate that our online action detectors progressively improve and achieve a superior balance of accuracy, efficiency, and speed.
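As a rough illustration only, the sketch below uses plain frame differencing as a stand-in for the flow-like motion representation and fuses the two pathways by channel concatenation; the thesis's actual representation and fusion scheme are not reproduced here, and all names are hypothetical.

```python
import torch
import torch.nn as nn

class TwoPathwayDetector(nn.Module):
    """Hypothetical sketch: appearance + motion pathways, fused features."""
    def __init__(self, feat_dim=32):
        super().__init__()
        # Appearance pathway: sees the raw RGB frame (static visual cues).
        self.appearance = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU())
        # Motion pathway: sees a cheap flow-like cue computed on the fly.
        self.motion = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU())
        # 1x1 convolution fuses the concatenated pathway features.
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, 1)

    def forward(self, prev_frame, frame):
        # Flow-like motion cue: temporal difference between consecutive
        # frames (a stand-in for the representation devised in the thesis).
        motion_cue = frame - prev_frame
        static = self.appearance(frame)
        dynamic = self.motion(motion_cue)
        # Jointly extracted static + dynamic features.
        return self.fuse(torch.cat([static, dynamic], dim=1))

prev, cur = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(TwoPathwayDetector()(prev, cur).shape)  # torch.Size([1, 32, 32, 32])
```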
