Video summarization through reinforcement learning with a 3D spatio-temporal U-Net
File(s)
2106.10528.pdf (3.15 MB)
Accepted version
OA Location
Author(s)
Type
Journal Article
Abstract
Intelligent video summarization algorithms quickly convey the most relevant information in videos by identifying the most essential and explanatory content while removing redundant video frames. In this paper, we introduce the 3DST-UNet-RL framework for video summarization. A 3D spatio-temporal U-Net is used to efficiently encode spatio-temporal information of the input videos for downstream reinforcement learning (RL). An RL agent learns from spatio-temporal latent scores and predicts actions for keeping or rejecting a video frame in a video summary. We investigate whether real/inflated 3D spatio-temporal CNN features are better suited to learning representations from videos than commonly used 2D image features. Our framework can operate in both a fully unsupervised mode and a supervised training mode. We analyse the impact of prescribed summary lengths and show experimental evidence for the effectiveness of 3DST-UNet-RL on two commonly used general video summarization benchmarks. We also applied our method to a medical video summarization task. The proposed video summarization method has the potential to reduce storage costs of ultrasound screening videos as well as to increase efficiency when browsing patient video data during retrospective analysis or audit without losing essential information.
Date Issued
2022-01-24
Date Acceptance
2021-12-31
Citation
IEEE Transactions on Image Processing, 2022, 31, pp.1573-1586
ISSN
1057-7149
Publisher
Institute of Electrical and Electronics Engineers
Start Page
1573
End Page
1586
Journal / Book Title
IEEE Transactions on Image Processing
Volume
31
Copyright Statement
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Identifier
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000750373700004&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
Subjects
Science & Technology
Technology
Computer Science, Artificial Intelligence
Engineering, Electrical & Electronic
Computer Science
Engineering
Three-dimensional displays
Feature extraction
Biomedical imaging
Reinforcement learning
Task analysis
Solid modeling
Training
Video summarization
reinforcement learning
3D convolutions
3D U-Net
medical video processing
ultrasound
Publication Status
Published online
Date Publish Online
2022-01-24