Driven by massive volumes of social multimedia data and by mobile visual search applications, techniques for large-scale visual search and recognition are emerging. With the introduction of local invariant visual features, the past decade has witnessed rapid advances in large-scale image search. Current state-of-the-art image search algorithms and systems are inspired by the classic bag-of-visual-words model and by scalable index structures. Generally, an image search system involves several key modules: feature representation, visual codebook construction, feature quantization, indexing strategy, scoring scheme, and post-processing. Feature representation consists of a feature detector and a feature descriptor, and aims to represent an image as a “bag” of local features. To obtain a compact representation, analogous to text words in information retrieval, a visual codebook is trained by clustering a large number of feature samples so that it captures the feature distribution. Each high-dimensional local feature can then be quantized to a visual word, and an image can be represented as a “bag” of visual words. To make retrieval scalable, the inverted file index structure is borrowed from information retrieval. Various scoring schemes can be used to weight visual words so that they discriminate between images during retrieval. Finally, post-processing techniques such as geometric verification, query expansion, and multi-modal fusion can be plugged in to boost retrieval performance.
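The core of this pipeline (codebook training, feature quantization, inverted indexing, and scoring) can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the speaker's method: it assumes plain k-means with a deterministic farthest-point initialization for codebook construction, hard nearest-word quantization, and an unnormalized tf-idf dot product as the scoring scheme. Toy 2-D points stand in for real high-dimensional local descriptors, post-processing is omitted, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy bag-of-visual-words sketch. Assumptions: k-means codebook,
# hard quantization, inverted file, unnormalized tf-idf scoring.


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feature, codebook):
    """Hard-assign a descriptor to the index of its nearest visual word."""
    return min(range(len(codebook)), key=lambda j: sq_dist(feature, codebook[j]))


def train_codebook(samples, k, iters=10):
    """Plain k-means with deterministic farthest-point initialization."""
    codebook = [list(samples[0])]
    while len(codebook) < k:
        # pick the sample farthest from all centroids chosen so far
        far = max(samples, key=lambda s: min(sq_dist(s, c) for c in codebook))
        codebook.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in samples:
            clusters[quantize(s, codebook)].append(s)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook


def build_inverted_index(images, codebook):
    """Map each visual word to a posting list {image_id: term frequency}."""
    index = defaultdict(dict)
    for image_id, features in images.items():
        for f in features:
            word = quantize(f, codebook)
            index[word][image_id] = index[word].get(image_id, 0) + 1
    return index


def search(query_features, index, codebook, n_images):
    """Rank images by an unnormalized tf-idf dot product over shared visual words."""
    query_tf = defaultdict(int)
    for f in query_features:
        query_tf[quantize(f, codebook)] += 1
    scores = defaultdict(float)
    for word, q_tf in query_tf.items():
        postings = index.get(word)
        if not postings:
            continue  # word absent from the database contributes nothing
        idf = math.log(n_images / len(postings))
        for image_id, tf in postings.items():
            # only images in this word's posting list are ever touched
            scores[image_id] += (q_tf * idf) * (tf * idf)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy usage: three "images", each a small bag of 2-D descriptors.
images = {
    "a": [(0.1, 0.2), (9.9, 0.1)],
    "b": [(0.2, 9.8), (10.1, 9.9)],
    "c": [(0.0, -0.1), (9.8, 10.2)],
}
samples = [f for feats in images.values() for f in feats]
codebook = train_codebook(samples, k=4)
index = build_inverted_index(images, codebook)
ranked = search([(0.1, 10.1), (10.0, 10.0)], index, codebook, len(images))
```

Because the search loop visits only the posting lists of words that occur in the query, image "a", which shares no visual words with the query, is never scored at all; this is what lets an inverted file scale to large databases. Image "b" ranks above "c" because it shares the rarer (higher-idf) word as well as the common one.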
In the first part of the talk, I will introduce related work in each of the modules mentioned above. I will then present our own research on large-scale image search, which covers feature representation, codebook learning, feature quantization, and spatial verification; several representative works will be discussed. Finally, I will discuss potential research directions in large-scale image search.
Qi Tian is currently a Full Professor in the Department of Computer Science at the University of Texas at San Antonio (UTSA). During 2008-2009, he took a one-year faculty leave at Microsoft Research Asia (MSRA) in the Media Computing Group. He received his Ph.D. in ECE from the University of Illinois at Urbana-Champaign (UIUC) in 2002, and his B.E. and M.S. degrees from Tsinghua University and Drexel University in 1992 and 1996, respectively, all in electronic engineering. Dr. Tian’s research interests focus on multimedia information retrieval and computer vision, and he has published over 190 refereed journal and conference papers. He received the Best Paper Award at ACM ICIMCS 2012 and MMM 2013, a Top 10% Paper Award at MMSP 2011, and the Best Student Paper Award at ICASSP 2006, and was a co-author of a Best Paper Candidate at PCM 2007. His research projects are funded by NSF, ARO, DHS, Google, FXPAL, NEC, SALSI, CIAS, Akiira Media Systems, HP and UTSA. He received the 2010 ACM Service Award. He has served as a Guest Editor of IEEE Transactions on Multimedia, the Journal of Computer Vision and Image Understanding, etc., is an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), and serves on the Editorial Boards of the Journal of Multimedia (JMM) and the Journal of Machine Vision and Applications (MVA). He is a Guest or Adjunct Professor at Xi’an Jiaotong University, USTC, Zhejiang University, Xidian University, and the Institute of Computing Technology, Chinese Academy of Sciences, and a Chaired Professor at Tsinghua University.