
doc. Georgios Tolias, Ph.D.

All publications

Test-time Training for Matching-based Video Object Segmentation

  • Department: Visual Recognition Group
  • Abstract:
    The video object segmentation (VOS) task involves the segmentation of an object over time based on a single initial mask. Current state-of-the-art approaches use a memory of previously processed frames and rely on matching to estimate segmentation masks of subsequent frames. Lacking any adaptation mechanism, such methods are prone to test-time distribution shifts. This work focuses on matching-based VOS under distribution shifts such as video corruptions, stylization, and sim-to-real transfer. We explore test-time training strategies that are agnostic to the specific task as well as strategies that are designed specifically for VOS. This includes a variant based on mask cycle consistency tailored to matching-based VOS methods. The experimental results on common benchmarks demonstrate that the proposed test-time training yields significant improvements in performance. In particular for the sim-to-real scenario and despite using only a single test video, our approach manages to recover a substantial portion of the performance gain achieved through training on real videos. Additionally, we introduce DAVIS-C, an augmented version of the popular DAVIS test set, featuring extreme distribution shifts like image-/video-level corruptions and stylizations. Our results illustrate that test-time training enhances performance even in these challenging cases.
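
    A minimal sketch of the mask cycle-consistency idea is given below. It assumes a hypothetical matching-based VOS model exposing a segment(memory, frame) call and uses an arbitrary optimizer and loss; it illustrates the general test-time training loop rather than the authors' implementation: the first-frame mask is propagated forward through the test video, then back to the first frame, and the model is updated so the round trip reproduces the given mask.

      import torch
      import torch.nn.functional as F

      def cycle_consistency_ttt(model, frames, init_mask, steps=10, lr=1e-5):
          """Hypothetical sketch: adapt a matching-based VOS model on one test
          video by enforcing that masks propagated forward and then backward
          through the video reproduce the given first-frame mask."""
          optim = torch.optim.Adam(model.parameters(), lr=lr)
          for _ in range(steps):
              # forward pass: propagate the initial mask to the last frame
              mask = init_mask
              for t in range(1, len(frames)):
                  mask = model.segment(memory=(frames[t - 1], mask), frame=frames[t])
              # backward pass: propagate the predicted mask back to the first frame
              for t in range(len(frames) - 2, -1, -1):
                  mask = model.segment(memory=(frames[t + 1], mask), frame=frames[t])
              # cycle-consistency loss against the given initial mask
              loss = F.binary_cross_entropy(mask.clamp(1e-6, 1 - 1e-6), init_mask)
              optim.zero_grad()
              loss.backward()
              optim.step()
          return model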

Edge Augmentation for Large-Scale Sketch Recognition without Sketches

  • DOI: 10.1109/ICPR56361.2022.9956233
  • Link: https://doi.org/10.1109/ICPR56361.2022.9956233
  • Department: Visual Recognition Group
  • Abstract:
    This work addresses scaling up the sketch classification task to a large number of categories. Collecting sketches for training is a slow and tedious process that has so far precluded attempts at large-scale sketch recognition. We overcome the lack of training sketch data by exploiting labeled collections of natural images that are easier to obtain. To bridge the domain gap we present a novel augmentation technique tailored to the task of learning sketch recognition from a training set of natural images. Randomization is introduced in the parameters of edge detection and edge selection. Natural images are translated to a pseudo-novel domain called "randomized Binary Thin Edges" (rBTE), which is used as a training domain instead of natural images. The ability to scale up is demonstrated by training a CNN-based sketch classifier on more than 2.5 times as many categories as used previously. For this purpose, a dataset of natural images from 874 categories is constructed by combining a number of popular computer vision datasets. The categories are selected to be suitable for sketch recognition. To estimate the performance, a subset of 393 categories with sketches is also collected.
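
    The randomized edge-map idea can be illustrated with a short augmentation routine. The snippet below is only a hedged stand-in: the actual rBTE pipeline combines several edge detectors with thinning and edge selection, whereas here a single Canny detector with randomized blur, thresholds and edge dropping plays that role.

      import random
      import cv2
      import numpy as np

      def random_edge_map(image_bgr):
          """Rough stand-in for rBTE-style augmentation: randomized edge detection
          turns a natural image into a binary, sketch-like training sample."""
          gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
          gray = cv2.GaussianBlur(gray, (5, 5), random.uniform(0.5, 2.0))
          low = random.randint(30, 100)             # randomized hysteresis thresholds
          high = low + random.randint(50, 150)
          edges = cv2.Canny(gray, low, high)
          # randomly drop a fraction of edge pixels to mimic edge selection
          keep = np.random.rand(*edges.shape) > random.uniform(0.0, 0.3)
          return (edges > 0) & keep                 # binary thin-edge map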

Recall@k Surrogate Loss with Large Batches and Similarity Mixup

  • DOI: 10.1109/CVPR52688.2022.00735
  • Link: https://doi.org/10.1109/CVPR52688.2022.00735
  • Department: Visual Recognition Group
  • Abstract:
    This work focuses on learning deep visual representation models for retrieval by exploring the interplay between a new loss function, the batch size, and a new regularization approach. Direct optimization of an evaluation metric by gradient descent is not possible when the metric is non-differentiable, which is the case for recall in retrieval. A differentiable surrogate loss for recall is proposed in this work. Using an implementation that sidesteps the hardware constraints of the GPU memory, the method trains with a very large batch size, which is essential for metrics computed on the entire retrieval database. It is assisted by an efficient mixup regularization approach that operates on pairwise scalar similarities and virtually increases the batch size further. The suggested method achieves state-of-the-art performance in several image retrieval benchmarks when used for deep metric learning. For instance-level recognition, the method outperforms similar approaches that train using an approximation of average precision.
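
    The general trick behind a differentiable recall surrogate is to replace hard rank indicators with sigmoids. The sketch below is an illustrative simplification (one positive per query, an arbitrary temperature tau), not the exact loss of the paper.

      import torch

      def smooth_recall_at_k(sim_pos, sim_neg, k=1, tau=0.05):
          """sim_pos: (B,) similarity of each query to its positive.
          sim_neg: (B, N) similarities of each query to negatives.
          Approximates recall@k by relaxing rank indicators with sigmoids."""
          # soft count of negatives ranked above the positive
          soft_rank = torch.sigmoid((sim_neg - sim_pos.unsqueeze(1)) / tau).sum(dim=1)
          # soft indicator that the positive lands within the top-k
          soft_hit = torch.sigmoid((k - 1 - soft_rank) / tau)
          return 1.0 - soft_hit.mean()   # loss: one minus smoothed recall@k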

Results and findings of the 2021 Image Similarity Challenge

  • Authors: Papakipos, Z., doc. Georgios Tolias, Ph.D., Ing. Tomáš Jeníček, Pizzi, E., Yokoo, S., Wang, W., Sun, Y., Zhang, W., Yang, Y., Addicam, S., Papadakis, S.M., Ferrer, C.C., prof. Mgr. Ondřej Chum, Ph.D., Douze, M.
  • Published in: Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track. Proceedings of Machine Learning Research, 2022. p. 1-12. vol. 176. ISSN 1938-7228.
  • Year: 2022
  • Department: Visual Recognition Group
  • Abstract:
    The 2021 Image Similarity Challenge introduced a dataset to serve as a benchmark to evaluate image copy detection methods. There were 200 participants in the competition. This paper presents a quantitative and qualitative analysis of the top submissions. It appears that the most difficult image transformations involve either severe image crops or overlaying onto unrelated images, combined with local pixel perturbations. The key algorithmic elements in the winning submissions are: training on strong augmentations, self-supervised learning, score normalization, explicit overlay detection, and global descriptor matching followed by pairwise image comparison.

The Met Dataset: Instance-level Recognition for Artworks

  • Authors: Ing. Nikolaos-Antonios Ypsilantis, Garcia, N., Han, G., Ibrahimi, S., van Noord, N., doc. Georgios Tolias, Ph.D.
  • Published in: NeurIPS Datasets and Benchmarks 2021: The Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks. Neural Information Processing Systems Foundation, Inc., 2022. ISBN 978-1-7138-7109-5.
  • Year: 2022
  • Department: Visual Recognition Group
  • Abstract:
    This work introduces a dataset for large-scale instance-level recognition in the domain of artworks. The proposed benchmark exhibits a number of different challenges such as large inter-class similarity, long-tail distribution, and many classes. We rely on the open-access collection of The Met museum to form a large training set of about 224k classes, where each class corresponds to a museum exhibit with photos taken under studio conditions. Testing is primarily performed on photos taken by museum guests depicting exhibits, which introduces a distribution shift between training and testing. Testing is additionally performed on a set of images not related to Met exhibits, making the task resemble an out-of-distribution detection problem. The proposed benchmark follows the paradigm of other recent datasets for instance-level recognition in different domains to encourage research on domain-independent approaches. A number of suitable approaches are evaluated to offer a testbed for future comparisons. Self-supervised and supervised contrastive learning are effectively combined to train the backbone, which is used for non-parametric classification that is shown to be a promising direction. Dataset webpage: http://cmp.felk.cvut.cz/met/.
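
    The non-parametric classification used in the evaluation is essentially k-nearest-neighbour search in the embedding space. A minimal sketch, assuming precomputed L2-normalized embeddings and simple majority voting (which may differ in detail from the paper's exact protocol):

      import numpy as np

      def knn_classify(query_emb, train_emb, train_labels, k=5):
          """Non-parametric classification: assign the majority label among the
          k nearest training embeddings (cosine similarity on normalized vectors)."""
          sims = train_emb @ query_emb                 # (N,) cosine similarities
          nn_idx = np.argpartition(-sims, k)[:k]       # indices of the k most similar
          votes = train_labels[nn_idx]
          values, counts = np.unique(votes, return_counts=True)
          return values[np.argmax(counts)]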

Graph convolutional networks for learning with few clean and many noisy labels

  • DOI: 10.1007/978-3-030-58607-2_17
  • Link: https://doi.org/10.1007/978-3-030-58607-2_17
  • Department: Visual Recognition Group
  • Abstract:
    In this work we consider the problem of learning a classifier from noisy labels when a few clean labeled examples are given. The structure of clean and noisy data is modeled by a graph per class, and Graph Convolutional Networks (GCN) are used to predict class relevance of noisy examples. For each class, the GCN is treated as a binary classifier, which learns to discriminate clean from noisy examples using a weighted binary cross-entropy loss function. The GCN-inferred “clean” probability is then exploited as a relevance measure. Each noisy example is weighted by its relevance when learning a classifier for the end task. We evaluate our method on an extended version of a few-shot learning problem, where the few clean examples of novel classes are supplemented with additional noisy data. Experimental results show that our GCN-based cleaning process significantly improves classification accuracy over not cleaning the noisy data, as well as over standard few-shot classification where only the few clean examples are used.
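
    The cleaning step amounts to a per-class binary classifier over a graph plus a weighted loss. The sketch below is a loose illustration: a tiny dense GCN predicts a per-example "clean" probability, and a weighted binary cross-entropy treats the few clean examples as positives and the noisy ones as down-weighted negatives. The weights and graph construction are assumptions here; the inferred probability would then weight each noisy example in the end-task loss.

      import torch
      import torch.nn.functional as F

      class TinyGCN(torch.nn.Module):
          """Minimal two-layer GCN over a dense normalized adjacency a_hat;
          outputs a per-node 'clean' probability."""
          def __init__(self, dim, hidden=64):
              super().__init__()
              self.w1 = torch.nn.Linear(dim, hidden)
              self.w2 = torch.nn.Linear(hidden, 1)

          def forward(self, a_hat, x):
              h = F.relu(self.w1(a_hat @ x))      # propagate features over the graph
              return torch.sigmoid(self.w2(a_hat @ h)).squeeze(-1)

      def cleaning_loss(p_clean, is_clean, w_clean=1.0, w_noisy=0.1):
          """Weighted binary cross-entropy: clean examples act as positives,
          noisy ones as down-weighted negatives (weights chosen arbitrarily)."""
          weights = torch.where(is_clean,
                                torch.full_like(p_clean, w_clean),
                                torch.full_like(p_clean, w_noisy))
          return F.binary_cross_entropy(p_clean, is_clean.float(), weight=weights)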

Learning and aggregating deep local descriptors for instance-level recognition

  • DOI: 10.1007/978-3-030-58452-8_27
  • Link: https://doi.org/10.1007/978-3-030-58452-8_27
  • Department: Visual Recognition Group
  • Abstract:
    We propose an efficient method to learn deep local descriptors for instance-level recognition. The training only requires examples of positive and negative image pairs and is performed as metric learning of sum-pooled global image descriptors. At inference, the local descriptors are provided by the activations of internal components of the network. We demonstrate why such an approach learns local descriptors that work well for image similarity estimation with classical efficient match kernel methods. The experimental validation studies the trade-off between performance and memory requirements of the state-of-the-art image search approach based on match kernels. Compared to existing local descriptors, the proposed ones perform better in two instance-level recognition tasks and keep memory requirements lower. We experimentally show that global descriptors are not effective enough at large scale and that local descriptors are essential. We achieve state-of-the-art performance, in some cases even with a backbone network as small as ResNet18.

Explicit Spatial Encoding for Deep Local Descriptors

  • DOI: 10.1109/CVPR.2019.00962
  • Link: https://doi.org/10.1109/CVPR.2019.00962
  • Department: Visual Recognition Group
  • Abstract:
    We propose a kernelized deep local-patch descriptor based on efficient match kernels of neural network activations. The response of each receptive field is encoded together with its spatial location using explicit feature maps. Two location parametrizations, Cartesian and polar, are used to provide robustness to different types of canonical patch misalignment. Additionally, we analyze how the conventional architecture, i.e. a fully connected layer attached after the convolutional part, encodes responses in a spatially variant way. In contrast, explicit spatial encoding is used in our descriptor, whose potential applications are not limited to local patches. We evaluate the descriptor on standard benchmarks. Both versions, encoding 32x32 or 64x64 patches, consistently outperform all other methods on all benchmarks. The number of parameters of the model is independent of the input patch resolution.

Fine-tuning CNN Image Retrieval with No Human Annotation

  • DOI: 10.1109/TPAMI.2018.2846566
  • Link: https://doi.org/10.1109/TPAMI.2018.2846566
  • Department: Visual Recognition Group
  • Abstract:
    Image descriptors based on activations of Convolutional Neural Networks (CNNs) have become dominant in image retrieval due to their discriminative power, compactness of representation, and search efficiency. Training of CNNs, either from scratch or fine-tuning, requires a large amount of annotated data, where a high quality of annotation is often crucial. In this work, we propose to fine-tune CNNs for image retrieval on a large collection of unordered images in a fully automated manner. Reconstructed 3D models obtained by the state-of-the-art retrieval and structure-from-motion methods guide the selection of the training data. We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval. CNN descriptor whitening discriminatively learned from the same training data outperforms commonly used PCA whitening. We propose a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and average pooling and show that it boosts retrieval performance. Applying the proposed method to the VGG network achieves state-of-the-art performance on the standard benchmarks: Oxford Buildings, Paris, and Holidays datasets.
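
    The GeM layer itself is compact enough to state directly. The sketch below follows the standard generalized-mean formulation, pooling each channel as ((1/|X|) Σ x^p)^(1/p) with a learnable exponent p, so that p = 1 gives average pooling and large p approaches max pooling; the epsilon clamp is an implementation detail assumed here for numerical stability.

      import torch

      class GeM(torch.nn.Module):
          """Generalized-Mean pooling over a (B, C, H, W) activation map."""
          def __init__(self, p=3.0, eps=1e-6):
              super().__init__()
              self.p = torch.nn.Parameter(torch.tensor(float(p)))  # learnable exponent
              self.eps = eps

          def forward(self, x):
              # ((1/|X|) * sum x^p)^(1/p), computed per channel
              x = x.clamp(min=self.eps).pow(self.p)
              return x.mean(dim=(-2, -1)).pow(1.0 / self.p)         # (B, C) descriptor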

Graph-based particular object discovery

  • DOI: 10.1007/s00138-019-01005-z
  • Link: https://doi.org/10.1007/s00138-019-01005-z
  • Department: Visual Recognition Group
  • Abstract:
    Severe background clutter is challenging in many computer vision tasks, including large-scale image retrieval. Global descriptors, which are popular due to their memory and search efficiency, are especially prone to corruption by such clutter. Eliminating the impact of the clutter on the image descriptor increases the chance of retrieving relevant images and prevents topic drift caused by actually retrieving the clutter in the case of query expansion. In this work, we propose a novel salient region detection method. It captures, in an unsupervised manner, patterns that are both discriminative and common in the dataset. Saliency is based on a centrality measure of a nearest neighbor graph constructed from regional CNN representations of dataset images. The proposed method exploits recent CNN architectures trained for object retrieval to construct the image representation from the salient regions. We improve particular object retrieval on challenging datasets containing small objects.
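
    The saliency score is a centrality measure on a nearest-neighbour graph of regional descriptors. A rough illustration, using plain power iteration to obtain an eigenvector-centrality-like score as a stand-in for the specific measure used in the paper:

      import numpy as np

      def region_saliency(region_desc, k=10, iters=50):
          """region_desc: (N, D) L2-normalized regional CNN descriptors.
          Builds a k-NN similarity graph and scores regions by centrality."""
          sims = region_desc @ region_desc.T
          np.fill_diagonal(sims, -np.inf)
          graph = np.zeros_like(sims)
          nn = np.argsort(-sims, axis=1)[:, :k]         # k nearest neighbours per region
          rows = np.repeat(np.arange(len(sims)), k)
          graph[rows, nn.ravel()] = np.maximum(sims[rows, nn.ravel()], 0)
          graph = np.maximum(graph, graph.T)            # symmetrize the affinity
          # power iteration: dominant eigenvector acts as a per-region saliency score
          score = np.ones(len(sims)) / len(sims)
          for _ in range(iters):
              score = graph @ score
              score /= np.linalg.norm(score) + 1e-12
          return score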

Hybrid Diffusion: Spectral-Temporal Graph Filtering for Manifold Ranking

  • Authors: Iscen, A., Avrithis, Y., doc. Georgios Tolias, Ph.D., Furon, T., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: ACCV 2018: Proceedings of the 14th Asian Conference on Computer Vision, Part II. Springer, 2019. p. 301-316. LNCS. vol. 11362. ISSN 0302-9743. ISBN 978-3-030-20889-9.
  • Year: 2019
  • DOI: 10.1007/978-3-030-20890-5_20
  • Link: https://doi.org/10.1007/978-3-030-20890-5_20
  • Department: Visual Recognition Group
  • Abstract:
    State-of-the-art image retrieval performance is achieved with CNN features and manifold ranking using a k-NN similarity graph that is pre-computed off-line. The two most successful existing approaches are temporal filtering, where manifold ranking amounts to solving a sparse linear system online, and spectral filtering, where eigen-decomposition of the adjacency matrix is performed off-line and then manifold ranking amounts to dot-product search online. The former suffers from expensive queries and the latter from significant space overhead. Here we introduce a novel, theoretically well-founded hybrid filtering approach allowing full control of the space-time trade-off between these two extremes. Experimentally, we verify that our hybrid method delivers results on par with the state of the art, with lower memory demands compared to spectral filtering approaches and faster compared to temporal filtering.
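
    For orientation, the two extremes the paper interpolates between can be written in standard manifold-ranking notation (S the symmetrically normalized affinity matrix of the k-NN graph, y the sparse query vector, α < 1); the formulation below is the usual one, stated here as background rather than quoted from the paper.

      % temporal filtering: solve a sparse linear system on-line
      (I - \alpha S)\, f = y

      % spectral filtering: eigendecomposition S = U \Lambda U^{\top} off-line,
      % then ranking reduces to dot products on-line
      f = U\, h(\Lambda)\, U^{\top} y, \qquad h(\lambda) = \frac{1}{1 - \alpha\lambda}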

Label Propagation for Deep Semi-supervised Learning

  • DOI: 10.1109/CVPR.2019.00521
  • Link: https://doi.org/10.1109/CVPR.2019.00521
  • Department: Visual Recognition Group
  • Abstract:
    Semi-supervised learning is becoming increasingly important because it can combine data carefully labeled by humans with abundant unlabeled data to train deep neural networks. Classic semi-supervised methods that focus on transductive learning have not been fully exploited within the inductive framework followed by modern deep learning. The same holds for the manifold assumption, namely that similar examples should get the same prediction. In this work, we employ a transductive label propagation method based on the manifold assumption to make predictions on the entire dataset, use these predictions to generate pseudo-labels for the unlabeled data, and train a deep neural network on them. At the core of the transductive method lies a nearest neighbor graph of the dataset that we create based on the embeddings of the same network; our learning process therefore iterates between these two steps. We improve performance on several datasets, especially in the few-label regime, and show that our work is complementary to the current state of the art.
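
    The transductive step is classical label propagation on a nearest-neighbour graph built from the network's own embeddings. A compact sketch (conjugate-gradient solve of the standard propagation system; graph construction and the value of alpha are illustrative assumptions):

      import numpy as np
      from scipy.sparse import identity
      from scipy.sparse.linalg import cg

      def propagate_labels(S, Y, alpha=0.99):
          """S: (N, N) sparse, symmetrically normalized k-NN affinity matrix.
          Y: (N, C) one-hot labels for labeled rows, zeros elsewhere.
          Returns soft scores Z solving (I - alpha * S) Z = Y column by column."""
          n, c = Y.shape
          A = identity(n, format="csr") - alpha * S
          Z = np.zeros((n, c))
          for j in range(c):
              Z[:, j], _ = cg(A, Y[:, j], maxiter=50)
          pseudo = Z.argmax(axis=1)                 # hard pseudo-labels for training
          return Z, pseudo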

Targeted Mismatch Adversarial Attack: Query With a Flower to Retrieve the Tower

  • DOI: 10.1109/ICCV.2019.00514
  • Link: https://doi.org/10.1109/ICCV.2019.00514
  • Department: Visual Recognition Group
  • Abstract:
    Access to online visual search engines implies sharing of private user content, namely the query images. We introduce the concept of a targeted mismatch attack for deep-learning-based retrieval systems: an adversarial image is generated to conceal the query image. The generated image looks nothing like the user's intended query but leads to identical or very similar retrieval results. Transferring attacks to fully unseen networks is challenging. We show successful attacks on partially unknown systems by designing various loss functions for the adversarial image construction, including, for example, losses for an unknown global pooling operation or an unknown input resolution used by the retrieval system. We evaluate the attacks on standard retrieval benchmarks and compare the results retrieved with the original and the adversarial image.
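
    At its core, the attack optimizes a carrier image so that its descriptor matches that of the hidden target query. The sketch below shows only this generic formulation; the descriptor extractor embed, the weight lam and the plain L2 image regularizer are assumptions for illustration, not the specific losses proposed in the paper.

      import torch

      def targeted_mismatch(embed, target_img, carrier_img, steps=200, lr=0.01, lam=1.0):
          """Optimize an image that looks like carrier_img but whose global
          descriptor matches embed(target_img), concealing the real query."""
          with torch.no_grad():
              target_desc = embed(target_img)            # descriptor to reproduce
          adv = carrier_img.clone().requires_grad_(True)
          optim = torch.optim.Adam([adv], lr=lr)
          for _ in range(steps):
              desc_loss = 1.0 - torch.nn.functional.cosine_similarity(
                  embed(adv), target_desc, dim=-1).mean()
              visual_loss = (adv - carrier_img).pow(2).mean()  # stay close to carrier
              loss = desc_loss + lam * visual_loss
              optim.zero_grad()
              loss.backward()
              optim.step()
              adv.data.clamp_(0.0, 1.0)                  # keep a valid image
          return adv.detach()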

Understanding and Improving Kernel Local Descriptors

  • DOI: 10.1007/s11263-018-1137-8
  • Link: https://doi.org/10.1007/s11263-018-1137-8
  • Department: Visual Recognition Group
  • Abstract:
    We propose a multiple-kernel local-patch descriptor based on efficient match kernels from pixel gradients. It combines two parametrizations of gradient position and direction; each parametrization provides robustness to a different type of patch mis-registration: polar parametrization for noise in the detection of the patch dominant orientation, Cartesian for imprecise location of the feature point. Combined with whitening of the descriptor space, which is learned with or without supervision, the performance is significantly improved. We analyze the effect of the whitening on patch similarity and demonstrate its semantic meaning. Our unsupervised variant is the best-performing descriptor constructed without the need for labeled data. Despite the simplicity of the proposed descriptor, it competes well with deep learning approaches on a number of different tasks.

Deep Shape Matching

  • Authors: Radenović, F., doc. Georgios Tolias, Ph.D., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: ECCV2018: Proceedings of the European Conference on Computer Vision, Part V. Springer, Cham, 2018. p. 774-791. Lecture Notes in Computer Science. vol. 11209. ISSN 0302-9743. ISBN 978-3-030-01227-4.
  • Year: 2018
  • DOI: 10.1007/978-3-030-01228-1_46
  • Link: https://doi.org/10.1007/978-3-030-01228-1_46
  • Department: Visual Recognition Group
  • Abstract:
    We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. Firstly, well-established efficient methods are chosen to turn the images into edge maps. Secondly, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval or its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results on multiple benchmarks.

Efficient Contour Match Kernel

  • DOI: 10.1016/j.imavis.2018.04.006
  • Link: https://doi.org/10.1016/j.imavis.2018.04.006
  • Department: Visual Recognition Group
  • Abstract:
    We propose a novel concept of asymmetric feature maps (AFM), which makes it possible to evaluate multiple kernels between a query and database entries without increasing the memory requirements. To demonstrate the advantages of the AFM method, we derive an efficient contour match kernel, a short-vector image representation that, due to asymmetric feature maps, supports efficient scale- and translation-invariant sketch-based image retrieval. Unlike most short-code-based retrieval systems, the proposed method provides the query localization in the retrieved image. The efficiency of the search is boosted by approximating the 2D translation search, expressed as a trigonometric polynomial of scores, by 1D projections. The projections are a special case of AFM. An order-of-magnitude speed-up is achieved compared to traditional trigonometric polynomials. The results are boosted by an image-based average query expansion approach and, without any learning, significantly outperform the state-of-the-art hand-crafted descriptors on standard benchmarks. Our method competes well with recent CNN-based approaches that require large amounts of labeled sketches, images and sketch-image pairs.

Fast Spectral Ranking for Similarity Search

  • Authors: Iscen, A., Avrithis, Y., doc. Georgios Tolias, Ph.D., Furon, T., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: CVPR 2018: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018. p. 7632-7641. ISSN 2575-7075. ISBN 978-1-5386-6420-9.
  • Year: 2018
  • DOI: 10.1109/CVPR.2018.00796
  • Link: https://doi.org/10.1109/CVPR.2018.00796
  • Department: Visual Recognition Group
  • Abstract:
    Despite the success of deep learning in representing images for particular object retrieval, recent studies show that the learned representations still lie on manifolds in a high-dimensional space. This makes the Euclidean nearest neighbor search biased for this task. Exploring the manifolds online remains expensive even if a nearest neighbor graph has been computed offline. This work introduces an explicit embedding reducing manifold search to Euclidean search followed by dot-product similarity search. This is equivalent to linear graph filtering of a sparse signal in the frequency domain. To speed up online search, we compute an approximate Fourier basis of the graph offline. We improve the state of the art on particular object retrieval datasets including the challenging INSTRE dataset containing small objects. At a scale of 10^5 images, the offl

Mining on Manifolds: Metric Learning without Labels

  • Authors: Iscen, A., doc. Georgios Tolias, Ph.D., Avrithis, Y., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: CVPR 2018: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018. p. 7642-7651. ISSN 2575-7075. ISBN 978-1-5386-6420-9.
  • Year: 2018
  • DOI: 10.1109/CVPR.2018.00797
  • Link: https://doi.org/10.1109/CVPR.2018.00797
  • Department: Visual Recognition Group
  • Abstract:
    In this work we present a novel unsupervised framework for hard training example mining. The only input to the method is a collection of images relevant to the target application and a meaningful initial representation, provided, e.g., by a pre-trained CNN. Positive examples are distant points on a single manifold, while negative examples are nearby points on different manifolds. Both types of examples are revealed by disagreements between Euclidean and manifold similarities. The discovered examples can be used in training with any discriminative loss. The method is applied to unsupervised fine-tuning of pre-trained networks for fine-grained classification and particular object retrieval. Our models are on par with or outperform prior models that are fully or partially supervised.
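
    The mining rule can be paraphrased compactly: compare each anchor's neighbours under Euclidean similarity and under a manifold (diffusion-based) similarity, and harvest the disagreements. The sketch below captures that rule with arbitrary cut-offs and a precomputed manifold_sim matrix; the actual selection criteria in the paper are more involved.

      import numpy as np

      def mine_pairs(euclid_sim, manifold_sim, anchor, k_pos=5, k_neg=5, pool=50):
          """euclid_sim, manifold_sim: (N, N) similarity matrices over the collection.
          Hard positives: high manifold similarity but low Euclidean similarity.
          Hard negatives: high Euclidean similarity but low manifold similarity."""
          e = euclid_sim[anchor].copy()
          m = manifold_sim[anchor].copy()
          e[anchor] = -np.inf
          m[anchor] = -np.inf
          manifold_nn = np.argsort(-m)[:pool]          # same-manifold candidates
          euclid_nn = np.argsort(-e)[:pool]            # Euclidean neighbourhood
          euclid_set, manifold_set = set(euclid_nn.tolist()), set(manifold_nn.tolist())
          # positives: on the manifold yet far in Euclidean terms
          positives = [i for i in manifold_nn if i not in euclid_set][:k_pos]
          # negatives: Euclidean-close yet off the manifold
          negatives = [i for i in euclid_nn if i not in manifold_set][:k_neg]
          return positives, negatives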

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

  • Authors: Radenović, F., Iscen, A., doc. Georgios Tolias, Ph.D., Avrithis, Y., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: CVPR 2018: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018. p. 5706-5715. ISSN 2575-7075. ISBN 978-1-5386-6420-9.
  • Year: 2018
  • DOI: 10.1109/CVPR.2018.00598
  • Link: https://doi.org/10.1109/CVPR.2018.00598
  • Department: Visual Recognition Group
  • Abstract:
    In this paper we address issues with image retrieval benchmarking on the standard and popular Oxford 5k and Paris 6k datasets. In particular, annotation errors, the size of the dataset, and the level of challenge are addressed: new annotation for both datasets is created with extra attention to the reliability of the ground truth. Three new protocols of varying difficulty are introduced. The protocols allow fair comparison between different methods, including those using a dataset pre-processing stage. For each dataset, 15 new challenging queries are introduced. Finally, a new set of 1M hard, semi-automatically cleaned distractors is selected. An extensive comparison of the state-of-the-art methods is performed on the new benchmark. Different types of methods are evaluated, ranging from local-feature-based to modern CNN-based methods. The best results are achieved by taking the best of both worlds. Most importantly, image retrieval appears far from being solved.

Unsupervised object discovery for instance recognition

  • Authors: Simeoni, O., Iscen, A., doc. Georgios Tolias, Ph.D., Avrithis, Y., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018. Institute of Electrical and Electronics Engineers Inc, 2018. p. 1745-1754. ISSN 2472-6737. ISBN 978-1-5386-4886-5.
  • Year: 2018
  • DOI: 10.1109/WACV.2018.00194
  • Link: https://doi.org/10.1109/WACV.2018.00194
  • Department: Visual Recognition Group
  • Abstract:
    Severe background clutter is challenging in many computer vision tasks, including large-scale image retrieval. Global descriptors, which are popular due to their memory and search efficiency, are especially prone to corruption by such clutter. Eliminating the impact of the clutter on the image descriptor increases the chance of retrieving relevant images and prevents topic drift caused by actually retrieving the clutter in the case of query expansion. In this work, we propose a novel salient region detection method. It captures, in an unsupervised manner, patterns that are both discriminative and common in the dataset. Saliency is based on a centrality measure of a nearest neighbor graph constructed from regional CNN representations of dataset images. The descriptors derived from the salient regions improve particular object retrieval, most noticeably in large collections containing small objects.

Asymmetric Feature Maps with Application to Sketch Based Retrieval

  • DOI: 10.1109/CVPR.2017.655
  • Link: https://doi.org/10.1109/CVPR.2017.655
  • Department: Visual Recognition Group
  • Abstract:
    We propose a novel concept of asymmetric feature maps (AFM), which makes it possible to evaluate multiple kernels between a query and database entries without increasing the memory requirements. To demonstrate the advantages of the AFM method, we derive a short-vector image representation that, due to asymmetric feature maps, supports efficient scale- and translation-invariant sketch-based image retrieval. Unlike most short-code-based retrieval systems, the proposed method provides the query localization in the retrieved image. The efficiency of the search is boosted by approximating the 2D translation search, expressed as a trigonometric polynomial of scores, by 1D projections. The projections are a special case of AFM. An order-of-magnitude speed-up is achieved compared to traditional trigonometric polynomials. The results are boosted by an image-based average query expansion, significantly exceeding the state of the art on standard benchmarks.

Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations

  • Authors: Iscen, A., doc. Georgios Tolias, Ph.D., Avrithis, Y., Furon, T., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: CVPR 2017: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Press, 2017. p. 926-935. ISSN 1063-6919. ISBN 978-1-5386-0457-1.
  • Year: 2017
  • DOI: 10.1109/CVPR.2017.105
  • Link: https://doi.org/10.1109/CVPR.2017.105
  • Department: Visual Recognition Group
  • Abstract:
    Query expansion is a popular method to improve the quality of image retrieval with both conventional and CNN representations. So far, it has been limited to global image similarity. This work focuses on diffusion, a mechanism that captures the image manifold in the feature space. The diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor as in previous approaches. An efficient off-line stage allows an optional reduction in the number of stored regions. In the on-line stage, unseen queries are handled within the indexing scheme, removing the additional computation otherwise needed to adjust the precomputed data. We perform diffusion through a sparse linear system solver, yielding practical query times well below one second. Experimentally, we observe a significant boost in the performance of image retrieval with compact CNN descriptors on standard benchmarks, especially when the query object covers only a small part of the image. Small objects have been a common failure case of CNN-based retrieval.
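
    The on-line stage reduces to solving a sparse linear system over region descriptors. A compressed sketch, with the region graph assumed precomputed and the per-image aggregation simplified to a plain sum (both simplifications relative to the paper):

      import numpy as np
      from scipy.sparse import identity
      from scipy.sparse.linalg import cg

      def diffuse_regions(S, query_regions, db_regions, region2img, n_images,
                          alpha=0.99, topk=10):
          """S: sparse normalized affinity over database region descriptors.
          The query is injected through its nearest database regions, diffusion is
          run by conjugate gradients, and region scores are summed per image."""
          sims = db_regions @ query_regions.T             # (N_regions, n_query_regions)
          y = np.zeros(S.shape[0])
          nn = np.argsort(-sims, axis=0)[:topk]           # seed the closest regions
          np.add.at(y, nn.ravel(), np.take_along_axis(sims, nn, axis=0).ravel())
          f, _ = cg(identity(S.shape[0], format="csr") - alpha * S, y, maxiter=20)
          scores = np.zeros(n_images)
          np.add.at(scores, region2img, f)                # sum region scores per image
          return np.argsort(-scores)                      # ranked image list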

Multiple-Kernel Local-Patch Descriptor

  • DOI: 10.5244/C.31.184
  • Link: https://doi.org/10.5244/C.31.184
  • Department: Visual Recognition Group
  • Abstract:
    We propose a multiple-kernel local-patch descriptor based on efficient match kernels of patch gradients. It combines two parametrizations of gradient position and direction; each parametrization provides robustness to a different type of patch mis-registration: polar parametrization for noise in the detection of the patch dominant orientation, Cartesian for imprecise location of the feature point. Even though hand-crafted, the proposed descriptor consistently outperforms the state-of-the-art methods on two local-patch benchmarks.

Panorama to panorama matching for location recognition

  • DOI: 10.1145/3078971.3079033
  • Link: https://doi.org/10.1145/3078971.3079033
  • Department: Visual Recognition Group
  • Abstract:
    Location recognition is commonly treated as visual instance retrieval on “street view” imagery, i.e., dataset items and queries are panoramic views: groups of images taken at a single location. This work introduces a novel panorama-to-panorama matching process, either by aggregating features of individual images in a group or by explicitly constructing a larger panorama. In either case, multiple views are used as queries. We reach near-perfect location recognition on a standard benchmark with only four query views.

Robust data whitening as an iteratively re-weighted least squares problem

  • DOI: 10.1007/978-3-319-59126-1_20
  • Link: https://doi.org/10.1007/978-3-319-59126-1_20
  • Department: Visual Recognition Group
  • Abstract:
    The entries of high-dimensional measurements, such as image or feature descriptors, are often correlated, which leads to a bias in similarity estimation. To remove the correlation, a linear transformation, called whitening, is commonly used. In this work, we analyze robust estimation of the whitening transformation in the presence of outliers. Inspired by the Iteratively Re-weighted Least Squares approach, we iterate between centering and applying a transformation matrix, a process which is shown to converge to a solution that minimizes the sum of ℓ2 norms. The approach is developed for unsupervised scenarios, but is further extended to supervised cases. We demonstrate the robustness of our method to outliers on synthetic 2D data and also show improvements compared to conventional whitening on real data for image retrieval with CNN-based representations. Finally, our robust estimation is not limited to data whitening, but can be used for robust patch rectification, e.g. with MSER features.
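
    The iteration alternates robust centering and covariance estimation with weights that down-weight outliers, in the spirit of IRLS. The sketch below, with weights set to the inverse residual norm and a fixed iteration cap, is one plausible reading of such a scheme rather than the paper's exact algorithm.

      import numpy as np

      def robust_whitening(X, iters=20, eps=1e-8):
          """X: (N, D) descriptors. Iteratively re-weighted estimation of a centering
          vector mu and whitening matrix W, down-weighting outlier points."""
          n, d = X.shape
          mu, W = X.mean(axis=0), np.eye(d)
          for _ in range(iters):
              # residual norms under the current transform drive the IRLS weights
              r = np.linalg.norm((X - mu) @ W.T, axis=1)
              w = 1.0 / np.maximum(r, eps)
              w /= w.sum()
              mu = (w[:, None] * X).sum(axis=0)            # weighted center
              C = (X - mu).T @ ((X - mu) * w[:, None])     # weighted covariance
              vals, vecs = np.linalg.eigh(C + eps * np.eye(d))
              W = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T
          return mu, W   # whitened data: (X - mu) @ W.T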

CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

  • Authors: Radenović, F., doc. Georgios Tolias, Ph.D., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I. Springer, 2016. p. 3-20. Lecture Notes in Computer Science. vol. 9905. ISSN 0302-9743. ISBN 978-3-319-46447-3.
  • Year: 2016
  • DOI: 10.1007/978-3-319-46448-0_1
  • Link: https://doi.org/10.1007/978-3-319-46448-0_1
  • Department: Visual Recognition Group
  • Abstract:
    Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement comes at the cost of extensive manual annotation, whether training from scratch or fine-tuning for the target task. In this work, we propose to fine-tune CNNs for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes.

Particular object retrieval with integral max-pooling of CNN activations

  • Authors: doc. Georgios Tolias, Ph.D., Sicre, R., Jegou, H.
  • Published in: International Conference on Learning Representations 2016. Computational and Biological Learning Society, 2016.
  • Year: 2016
  • Department: Visual Recognition Group
  • Abstract:
    Recently, image representations built upon Convolutional Neural Networks (CNNs) have been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and are still outperformed, on some particular-object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report, for the first time, results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
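
    The compact vectors are built by max-pooling the convolutional activation map over several image regions and aggregating the normalized region vectors. The sketch below shows this regional max-pooling in simplified form, with a fixed uniform grid and no whitening step, rather than the paper's integral-image implementation.

      import torch
      import torch.nn.functional as F

      def regional_max_pool(feat, grid=3):
          """feat: (C, H, W) convolutional activations of one image.
          Max-pool over a uniform grid of regions, L2-normalize each region vector,
          sum them and L2-normalize again to obtain a compact global descriptor."""
          c, h, w = feat.shape
          ys = torch.linspace(0, h, grid + 1).long()
          xs = torch.linspace(0, w, grid + 1).long()
          regions = []
          for i in range(grid):
              for j in range(grid):
                  patch = feat[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                  regions.append(F.normalize(patch.amax(dim=(-2, -1)), dim=0))
          return F.normalize(torch.stack(regions).sum(dim=0), dim=0)   # (C,)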
