Incremental Learning for Visual Recognition

Job opening for a PhD candidate

Starting: Fall 2016.
Duration: 3 years.
Supervisors: Georges Quénot (CNRS) and Denis Pellerin (Univ. Grenoble).
Location: Laboratory of Informatics in Grenoble, MRIM team;
          Gipsa-lab, AGPIG team.
Funding: DeCoRe project of Persyval Lab.

Requirements:
- MSc in computer science, applied math, or other relevant field;
- excellent programming skills (C, Python, CUDA);
- knowledge of deep learning is a big plus.

Contact: send the following to Georges Quénot and Denis Pellerin:
- CV
- grade transcripts
- reference letter from your MSc thesis advisor

Title: Incremental Learning for Visual Recognition.

This PhD will focus on the detection and localization of visual categories in still images and videos. In particular, it will study the dynamic adaptation of CNN models to newly available training data, to newly required target categories, and/or to new or specific application domains (e.g. medical, satellite, or life-log data). Effective architectures are now very deep (19 layers) [1] or even ultra-deep (152 layers) [2] and require very long training times: up to several weeks, even on very powerful multi-GPU hardware. It is therefore neither practical nor efficient to retrain a complete model for each new set of categories or for each new domain to which already trained categories must be applied. Incremental learning [3] is a way to adapt already trained networks to such needs at a low marginal cost. In addition, various forms of weakly supervised learning and active learning can be used in conjunction to further improve system performance. Localization of target categories [4] is also very important, since knowing where objects are located in images helps build better models, especially in a semi-supervised way.
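
To make the idea of adapting a trained network at low marginal cost more concrete, here is a minimal sketch in plain Python. It is only an illustration under strong assumptions, not the method of [3] and not the approach the thesis will necessarily follow: the deep network is assumed to be frozen and reduced to a fixed feature vector per image, the classifier is a single linear layer, and the names extend_head, scores and train_new_rows are hypothetical. New categories get new output rows, and only those rows are trained.

```python
import math
import random


def extend_head(weights, biases, n_new, rng):
    """Return a copy of a linear classifier head with n_new extra output rows.

    Rows of the already trained categories are copied unchanged; rows of the
    new categories start from small random values (an arbitrary choice).
    """
    feat_dim = len(weights[0])
    new_w = [row[:] for row in weights]
    new_b = biases[:]
    for _ in range(n_new):
        new_w.append([rng.uniform(-0.01, 0.01) for _ in range(feat_dim)])
        new_b.append(0.0)
    return new_w, new_b


def scores(weights, biases, feature):
    """Linear score of every category for one feature vector."""
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, biases)]


def train_new_rows(weights, biases, n_old, samples, lr=0.5, epochs=50):
    """One-vs-rest logistic updates applied only to rows with index >= n_old.

    The rows of the old categories are never touched, so their scores on any
    input stay exactly as they were before the extension.
    """
    for _ in range(epochs):
        for feature, label in samples:
            for k in range(n_old, len(weights)):
                z = sum(w * x for w, x in zip(weights[k], feature)) + biases[k]
                p = 1.0 / (1.0 + math.exp(-z))
                grad = p - (1.0 if label == k else 0.0)
                weights[k] = [w - lr * grad * x
                              for w, x in zip(weights[k], feature)]
                biases[k] -= lr * grad


# Toy example: two trained categories (0 and 1), one new category (2).
old_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
old_b = [0.0, 0.0]
w, b = extend_head(old_w, old_b, n_new=1, rng=random.Random(0))
toy_samples = [([0.0, 0.0, 1.0], 2), ([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1)]
train_new_rows(w, b, n_old=2, samples=toy_samples)
print(scores(w, b, [0.0, 0.0, 1.0]))
```

The cost of this update grows with the number of new categories and new samples only, not with the size of the original training set, which is the "low marginal cost" property mentioned above. A real system would also need to decide when to fine-tune deeper layers, which this sketch ignores.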

Specific topics of interest include:

* Incremental learning and evolving network architectures: new methods will be studied for building networks that operate in a "continuous learning" mode and keep improving themselves. Improvements will come from the continuous inclusion of new target concepts (possibly including the full ImageNet set and even beyond), and from the adaptation of already trained concepts to new target domains (e.g. satellite images or life-logging content). Both incremental learning methods and the evolution of network architectures will be considered.

* Active learning and weakly supervised learning: various forms of these approaches, as well as of semi-supervised learning, have proven very effective and efficient for content-based indexing of images and videos, both at the image or shot level and at the region or even pixel level. They also fit very well with incremental learning. The goal here will be to integrate them efficiently in order to extract as much information as possible from all available annotated, non-annotated, and weakly annotated data, and to gain a large amount of new information from very few new annotations or checks. This will also involve classification using hierarchical sets of categories, and knowledge transfer between categories and between application domains. Data augmentation will also be considered, specifically in the context of active learning.

* Saliency: saliency is a very important prior in object detection. It can be considered from two perspectives, using either user gaze information or the localization of the main categories. In both cases, saliency can be learned using deep networks and later used to improve object detection and localization. We will explore how the extraction and use of saliency can be efficiently combined with incremental and active learning.
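
As an illustration of the active learning topic listed above, the following plain-Python sketch shows one common selection strategy, uncertainty sampling by prediction entropy. This is an assumption chosen for the example; the offer does not prescribe a particular strategy. The function names, the sample identifiers and the probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a list of class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(predictions, budget):
    """Pick the samples whose predictions are the most uncertain.

    predictions maps a sample identifier to the class probabilities produced
    by the current model; budget is the number of annotations we can afford.
    """
    ranked = sorted(predictions,
                    key=lambda sid: entropy(predictions[sid]),
                    reverse=True)
    return ranked[:budget]


# Made-up model outputs for three unlabeled images and three categories.
toy_predictions = {
    "img_a": [0.98, 0.01, 0.01],  # confident prediction
    "img_b": [0.40, 0.35, 0.25],  # very uncertain prediction
    "img_c": [0.70, 0.20, 0.10],  # moderately uncertain prediction
}
print(select_for_annotation(toy_predictions, budget=2))  # ['img_b', 'img_c']
```

In an incremental setting, the selected samples would be annotated and then used to update the model, and the selection would be repeated with the updated predictions.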


[1] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.

[3] Tianjun Xiao, Jiaxing Zhang, Kuiyuan Yang, Yuxin Peng, and Zheng Zhang. Error-driven incremental learning in deep convolutional neural network for large-scale image classification. In Proceedings of the 22nd ACM International Conference on Multimedia, MM '14, pages 177-186, New York, NY, USA, 2014. ACM.

[4] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Is object localization for free? Weakly-supervised learning with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.