Multimedia Event Detection Using Segment-based Representation
Sang Phan, Duy-Dinh Le, Shin'ichi Satoh

Research objective
Detecting events in video.
[Figure: example videos labeled YES and NO]

Approach
Existing methods aggregate features into a single feature that represents the entire video. This treats every segment of the video equally, which causes errors in video understanding. We are investigating representations at the shorter segment level.

Event: Grooming an animal
- Definition: One or more people groom an animal
- Scene: outdoors, in a yard or corral, indoors in bathroom, grooming salon
- Objects/people: sink, bathtub, hose, shower, soap, shampoo, scissors
- Activities: spraying hose, rinsing, cutting fur, clipping nails

Methods

Segment-based representation
[Figure: video-based approach vs. our proposed segment-based approach]
The basic idea is to examine shorter segments instead of using the entire video.
- Non-overlapping sampling: uniform sampling at 30, 60, 90, 120, 200, 400 seconds
- Overlapping sampling: uniform sampling + 50% overlapping
- Segment sampling based on shot boundary detection

Sum-max video pooling
[Figure: video-based approach vs. our proposed sum-max video pooling]
Sum pooling is used to keep sufficient relevant features at the low layer. Max pooling is used to retrieve the most relevant features at the high layer. It can therefore discard irrelevant features in the final video representation.

Results

Segment-based representation
- Results from using the segment-based approach with non-overlapping and overlapping sampling on MED 2011
- Comparison of different segment-based approaches with the video-based approach on the MED 2010 dataset

Sum-max video pooling
- Results on the MED 2010 dataset using the sum-max pooling technique at different segment lengths
- Performance comparison of different video pooling strategies on the MED 2010 dataset

The segment-based approach outperforms the video-based approach.

• Sang Phan, Thanh Duc Ngo, Vu Lam, Son Tran, Duy-Dinh Le, Duc Anh Duong, Shin'ichi Satoh: Multimedia Event Detection Using Segment-Based Approach for Motion Feature.
  Signal Processing Systems 74(1): 19-31 (2014)
• Sang Phan, Duy-Dinh Le, Duc Anh Duong, Shin'ichi Satoh: Sum-max Video Pooling for Complex Event Recognition. ICIP (2014)

Contact: Sang Phan / National Institute of Informatics, Digital Content and Media Sciences Research Division
TEL: 03-4212-2527  FAX: 03-4212-2120  Email: [email protected]

Object-based Image Retrieval: Tell me about TV commercials of this product
Siriwat Kasamwattanarote, Cai-Zhi Zhu, Xiaomeng Wu, Shin'ichi Satoh

Research objective
- Retrieve TV commercial videos using a product image as the query.
- Also display information related to the retrieved commercials.

Method overview
Related work:
- The visual BOW analogy of text retrieval is very efficient for image retrieval. [Sivic, ICCV'03]
- Query-to-Class (not Image-to-Image) distance is optimal. [Boiman, CVPR'08]
- Large vocabulary improves retrieval. [Nister, CVPR'06]
In addition to image-based search, text-based search (by product name) is also supported.
Query images and commercial videos are matched using average pooling.
Other state-of-the-art techniques used:
- Root SIFT
- AKM-based large vocabulary (1M)
- Inverted indexing
- LO-RANSAC-based spatial verification

Pipeline of the Algorithm
1. Offline DB preparation
2. Online retrieval: PC/Mobile/Text-to-image
[Figure: Online searching pipeline*]
[Figure: Interactive web interface with commercial playback*]

Commercial DB Info.
- Commercials recorded from 5 Japanese TV channels over the 3 most recent years, or 15 years of commercials in total.
- 4.5 million commercial clips.
[Figure: Commercial film broadcasting information*]

Our TRECVID INS 2011 performance
[Figure: infAP (y-axis, 0 to 0.6) by Run ID (x-axis, 1 to 37), with "Our performance" marked]

*All the returned commercials shown in the figures above correspond to a query using a web image of "Asahi beer".

Publications:
• Tell me about TV commercials of this product, Cai-Zhi Zhu, Siriwat Kasamwattanarote, Xiaomeng Wu, and Shin'ichi Satoh, The 20th Anniversary International Conference on MultiMedia Modeling (MMM), Dublin, Ireland, pp. 242-253, Jan. 6-10, 2014.
• Connect Commercial Films with Realities, Cai-Zhi Zhu, Siriwat Kasamwattanarote, Xiaomeng Wu, and Shin'ichi Satoh, 2013 International Conference on Multimedia Retrieval (ICMR), Dallas, Texas, USA, pp. 323-324, Apr. 16-20, 2013.

Contact: Siriwat Kasamwattanarote / National Institute of Informatics, Digital Content and Media Sciences Research Division
TEL: 03-4212-2527  FAX: 03-4212-2120  Email: [email protected]
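
Illustrative sketch for the first poster (Multimedia Event Detection Using Segment-based Representation): the uniform segment sampling, with or without 50% overlap, and the sum-max video pooling it describes might look roughly like the Python below. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "low layer" means sum pooling over the frame-level features inside each segment and "high layer" means max pooling across the resulting segment vectors. Segment lengths are given in frames rather than seconds, and the names split_segments, sum_pool, max_pool, and sum_max_pool are hypothetical. Feature extraction, normalization, and shot-boundary-based sampling are left out.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def split_segments(num_frames: int, segment_len: int,
                   overlap: float = 0.0) -> List[Tuple[int, int]]:
    """Uniformly sample (start, end) frame ranges; overlap=0.5 gives 50% overlap."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Step between segment starts; at least one frame so the loop terminates.
    step = max(1, int(round(segment_len * (1.0 - overlap))))
    segments = []
    start = 0
    while start < num_frames:
        end = min(start + segment_len, num_frames)
        segments.append((start, end))
        if end == num_frames:
            break
        start += step
    return segments


def sum_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum of equally sized feature vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    return [sum(column) for column in zip(*vectors)]


def max_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum of equally sized feature vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    return [max(column) for column in zip(*vectors)]


def sum_max_pool(frame_features: Sequence[Sequence[float]],
                 segment_len: int, overlap: float = 0.0) -> Vector:
    """Assumed reading of sum-max pooling: sum within segments, max across them."""
    segments = split_segments(len(frame_features), segment_len, overlap)
    segment_vectors = [sum_pool(frame_features[s:e]) for s, e in segments]
    return max_pool(segment_vectors)


if __name__ == "__main__":
    # Toy frame-level features (6 frames, 2 dimensions), purely illustrative.
    frames = [[1, 0], [2, 0], [0, 5], [0, 1], [3, 3], [1, 1]]
    print("video-based sum pooling:", sum_pool(frames))
    print("sum-max, non-overlapping:", sum_max_pool(frames, segment_len=2))
    print("sum-max, 50% overlap:", sum_max_pool(frames, segment_len=2, overlap=0.5))
```

In this sketch, a feature dimension that responds strongly in only one segment still survives the final max step, whereas plain sum pooling over the whole video mixes it with every other segment. Whether this matches the layer definitions in the authors' paper should be checked against the ICIP 2014 publication.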