Finding the information you need in large volumes of video: video media analysis

Multimedia Event Detection
Using Segment-based Representation
Sang Phan, Duy-Dinh Le, Shin’ichi Satoh
Research objective
Detecting events in video
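Approach
Existing methods aggregate features into a single feature vector
that represents the entire video. This treats every segment of the
video equally, which leads to errors in video understanding. We are
investigating representations at the level of shorter segments.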
Event: Grooming an animal
- Definition: One or more people groom an animal
- Scene: outdoors, in a yard or corral, indoors in bathroom,
grooming salon
- Objects/people: sink, bathtub, hose, shower, soap,
shampoo, scissors
- Activities: spraying hose, rinsing, cutting fur, clipping nails
Methods
Sum-max video pooling
Segment-based representation
Video-based approach
Our proposed segment-based approach:
the basic idea is to examine shorter
segments instead of using the entire video
- Non-overlapping sampling: uniform segments of 30, 60, 90, 120, 200, or 400 seconds
- Overlapping sampling: uniform segments with 50% overlap
- Segment sampling based on shot boundary detection
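The two uniform sampling schemes above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `uniform_segments` and its parameters are hypothetical, and clipping the final segment to the end of the video is an assumption.

```python
def uniform_segments(duration, length, overlap=0.0):
    """Return (start, end) pairs in seconds that cover [0, duration].

    duration: video length in seconds (assumed > 0)
    length:   segment length in seconds, e.g. 30, 60, 90, 120, 200, 400
    overlap:  fraction shared by consecutive segments (0.0 or 0.5 here)
    """
    if duration <= 0 or length <= 0 or not 0.0 <= overlap < 1.0:
        raise ValueError("need duration > 0, length > 0, 0 <= overlap < 1")
    step = length * (1.0 - overlap)  # distance between segment starts
    segments = []
    start = 0.0
    while True:
        # Assumption: the last segment is clipped to the video end.
        end = min(start + length, duration)
        segments.append((start, end))
        if end >= duration:
            break
        start += step
    return segments


if __name__ == "__main__":
    print(uniform_segments(300, 120))               # non-overlapping
    print(uniform_segments(300, 120, overlap=0.5))  # 50% overlap
```

Shot-boundary-based sampling would replace the fixed `step` with boundaries produced by a shot detector, which is outside the scope of this sketch.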
Our proposed sum-max video pooling: sum pooling is used to keep
sufficient relevant features at the low layer, and max pooling is
used to retrieve the most relevant features at the high layer. It
can therefore discard irrelevant features in the final video
representation.
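One way to read the two layers described above is: sum-pool the feature vectors inside each segment, then take the element-wise maximum across segments. The sketch below follows that reading. The function name `sum_max_pool`, the use of per-frame feature vectors, and the absence of any per-segment normalization are assumptions made for illustration; the published method may differ in these details.

```python
import numpy as np


def sum_max_pool(frame_features, segment_length):
    """Pool per-frame features into one video-level vector.

    frame_features: array-like of shape (num_frames, dim)
    segment_length: number of consecutive frames per segment (assumed)
    """
    feats = np.asarray(frame_features, dtype=float)
    if feats.ndim != 2 or len(feats) == 0 or segment_length <= 0:
        raise ValueError("need a non-empty 2-D array and segment_length > 0")
    # Low layer: sum pooling inside each segment.
    segment_vectors = [
        feats[i:i + segment_length].sum(axis=0)
        for i in range(0, len(feats), segment_length)
    ]
    # High layer: element-wise max pooling across segments.
    return np.max(segment_vectors, axis=0)


if __name__ == "__main__":
    feats = [[1, 0], [1, 0], [0, 3], [1, 1]]
    print(sum_max_pool(feats, 2))      # sum-max over two segments
    print(np.sum(feats, axis=0))       # plain sum pooling, for comparison
```

With a segment length equal to the whole video this reduces to plain sum pooling, and with a segment length of one frame it reduces to plain max pooling.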
Results
Segment-based representation
Results of the segment-based approach with non-overlapping and
overlapping sampling on the MED 2011 dataset
Comparison of different segment-based approaches with
the video-based approach on the MED 2010 dataset
Sum-max video pooling
Results on the MED 2010 dataset using the sum-max pooling technique at different
segment lengths
Performance comparison of different video
pooling strategies on the MED 2010 dataset
The segment-based approach outperforms the video-based approach
Sang Phan, Thanh Duc Ngo, Vu Lam, Son Tran, Duy-Dinh Le, Duc Anh Duong, Shin'ichi
Satoh: Multimedia Event Detection Using Segment-Based Approach for Motion
Feature. Signal Processing Systems 74(1): 19-31 (2014)
Sang Phan, Duy-Dinh Le, Duc Anh Duong, Shin'ichi Satoh: Sum-max Video Pooling for
Complex Event Recognition. ICIP (2014)
Contact: Sang Phan / National Institute of Informatics, Digital Content and Media Sciences Research Division
TEL : 03-4212-2527
FAX : 03-4212-2120
Email : [email protected]
Object‐based Image Retrieval
Tell me about TV commercials of this product
Siriwat Kasamwattanarote, Cai‐Zhi Zhu, Xiaomeng Wu, Shin’ichi Satoh
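Research objective
Retrieve commercial videos using a product image as the query
Also display information related to the retrieved commercials
Related work:
The visual BOW analogy of text retrieval is very efficient for image retrieval.
[Sivic, ICCV’03]
Query‐to‐Class (not Image‐to‐Image) distance is optimal. [Boiman, CVPR’08]
A large vocabulary improves retrieval.
[Nister, CVPR’06]
Method overview
- In addition to image queries, text queries (product names) are also supported
- Matching via average pooling over both the query images and the
  commercial videos
- Other state-of-the-art techniques are used:
  Root SIFT.
  AKM (approximate k-means) based large vocabulary (1M visual words).
  Inverted indexing.
  LO‐RANSAC based spatial verification.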
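Of the techniques listed above, Root SIFT is the simplest to illustrate: each SIFT descriptor is L1-normalized and then square-rooted element-wise, so that Euclidean distance between the transformed descriptors corresponds to the Hellinger kernel on the originals. The sketch below assumes descriptors are already available as a non-negative array; the function name `root_sift` and the `eps` guard are illustrative choices, not part of the poster.

```python
import numpy as np


def root_sift(descriptors, eps=1e-12):
    """Convert SIFT descriptors (rows, non-negative) to RootSIFT.

    eps is an assumed small constant that avoids division by zero
    for all-zero descriptors.
    """
    d = np.asarray(descriptors, dtype=float)
    # Step 1: L1-normalize each descriptor.
    d = d / (np.abs(d).sum(axis=-1, keepdims=True) + eps)
    # Step 2: element-wise square root; rows now have (near) unit L2 norm.
    return np.sqrt(d)


if __name__ == "__main__":
    toy = np.array([[1.0, 3.0], [0.0, 4.0]])  # toy 2-D "descriptors"
    print(root_sift(toy))
```

Real SIFT descriptors are 128-dimensional; the 2-D rows here only keep the example readable.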
Pipeline of the Algorithm
1. Offline DB preparation
2. Online retrieval
PC/Mobile/Text‐to‐image
Online searching pipeline*
Interactive web interface with commercial playback*
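The split between offline DB preparation and online retrieval can be sketched with a toy inverted index over visual words. This is only an illustration under stated assumptions: the function names, the item identifiers, and the histogram-intersection score are hypothetical, and the real system additionally uses a 1M-word vocabulary and spatial verification, neither of which is modeled here.

```python
from collections import Counter, defaultdict


def build_inverted_index(db):
    """Offline step. db maps item_id -> iterable of visual word ids.

    Returns word_id -> {item_id: count of that word in the item}.
    """
    index = defaultdict(dict)
    for item_id, words in db.items():
        for word, count in Counter(words).items():
            index[word][item_id] = count
    return index


def search(index, query_words, top_k=5):
    """Online step. Score only items that share a word with the query.

    Assumption: histogram intersection is used as the similarity.
    """
    scores = Counter()
    for word, q_count in Counter(query_words).items():
        for item_id, count in index.get(word, {}).items():
            scores[item_id] += min(q_count, count)
    return scores.most_common(top_k)


if __name__ == "__main__":
    db = {"cm_a": [1, 2, 2, 7], "cm_b": [3, 4], "cm_c": [2, 7, 7]}
    index = build_inverted_index(db)
    print(search(index, [2, 2, 7]))
```

The benefit of the inverted index is that a query touches only the posting lists of its own visual words, rather than every item in the database.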
Commercial DB Info.
Commercials recorded from 5 Japanese TV channels
over the 3 most recent years (15 channel-years of commercials in total).
4.5 million commercial clips.
Commercial film broadcasting information*
[Figure: Our TRECVID INS 2011 performance. Vertical axis: infAP (0 to 0.6); horizontal axis: Run ID (1 to 37); our run is labeled "Our performance".]
*All the returned commercials shown in the figures above
correspond to a query with a web image of “Asahi beer”.
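For readers unfamiliar with the metric on the vertical axis: infAP (inferred average precision) is an estimate of average precision computed from sampled relevance judgments. The sketch below computes ordinary average precision from complete judgments, not the inferred estimator itself; the function name and arguments are illustrative.

```python
def average_precision(ranked_relevance, num_relevant=None):
    """Average precision of one ranked list.

    ranked_relevance: sequence of 0/1 flags, best-ranked result first
    num_relevant:     total relevant items in the collection; if None,
                      the number of relevant items in the list is used
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit
    denom = num_relevant if num_relevant is not None else hits
    return precision_sum / denom if denom else 0.0


if __name__ == "__main__":
    print(average_precision([1, 0, 1, 0]))
```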
Publications:
• Tell me about TV commercials of this product, Cai-Zhi Zhu, Siriwat Kasamwattanarote,
Xiaomeng Wu, and Shin'ichi Satoh, The 20th Anniversary International Conference on
MultiMedia Modeling (MMM), Dublin, Ireland, pp. 242-253, Jan. 6-10, 2014.
• Connect Commercial Films with Realities, Cai-Zhi Zhu, Siriwat Kasamwattanarote,
Xiaomeng Wu, and Shin'ichi Satoh, 2013 International Conference on Multimedia Retrieval
(ICMR), Dallas, Texas, USA, pp. 323-324, Apr. 16-20, 2013.
Contact: Siriwat Kasamwattanarote / National Institute of Informatics, Digital Content and Media Sciences Research Division
TEL : 03-4212-2527 FAX : 03-4212-2120 Email : [email protected]