
doc. Georgios Tolias, Ph.D.

All publications

Test-time Training for Matching-based Video Object Segmentation

  • Department: Visual Recognition Group
  • Abstract:
    The video object segmentation (VOS) task involves the segmentation of an object over time based on a single initial mask. Current state-of-the-art approaches use a memory of previously processed frames and rely on matching to estimate segmentation masks of subsequent frames. Lacking any adaptation mechanism, such methods are prone to test-time distribution shifts. This work focuses on matching-based VOS under distribution shifts such as video corruptions, stylization, and sim-to-real transfer. We explore test-time training strategies that are agnostic to the specific task as well as strategies that are designed specifically for VOS. This includes a variant based on mask cycle consistency tailored to matching-based VOS methods. The experimental results on common benchmarks demonstrate that the proposed test-time training yields significant improvements in performance. In particular for the sim-to-real scenario and despite using only a single test video, our approach manages to recover a substantial portion of the performance gain achieved through training on real videos. Additionally, we introduce DAVIS-C, an augmented version of the popular DAVIS test set, featuring extreme distribution shifts like image-/video-level corruptions and stylizations. Our results illustrate that test-time training enhances performance even in these challenging cases.
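
    A minimal sketch of the mask cycle-consistency idea is given below. It assumes a hypothetical matching-based VOS model exposing a segment(memory, frame) call and uses an arbitrary optimizer and loss; it illustrates the general test-time training loop rather than the authors' implementation: the first-frame mask is propagated forward through the test video, then back to the first frame, and the model is updated so the round trip reproduces the given mask.

      import torch
      import torch.nn.functional as F

      def cycle_consistency_ttt(model, frames, init_mask, steps=10, lr=1e-5):
          """Hypothetical sketch: adapt a matching-based VOS model on one test
          video by enforcing that masks propagated forward and then backward
          through the video reproduce the given first-frame mask."""
          optim = torch.optim.Adam(model.parameters(), lr=lr)
          for _ in range(steps):
              # forward pass: propagate the initial mask to the last frame
              mask = init_mask
              for t in range(1, len(frames)):
                  mask = model.segment(memory=(frames[t - 1], mask), frame=frames[t])
              # backward pass: propagate the predicted mask back to the first frame
              for t in range(len(frames) - 2, -1, -1):
                  mask = model.segment(memory=(frames[t + 1], mask), frame=frames[t])
              # cycle-consistency loss against the given initial mask
              loss = F.binary_cross_entropy(mask.clamp(1e-6, 1 - 1e-6), init_mask)
              optim.zero_grad()
              loss.backward()
              optim.step()
          return model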

Edge Augmentation for Large-Scale Sketch Recognition without Sketches

  • DOI: 10.1109/ICPR56361.2022.9956233
  • Link: https://doi.org/10.1109/ICPR56361.2022.9956233
  • Department: Visual Recognition Group
  • Abstract:
    This work addresses scaling up the sketch classification task to a large number of categories. Collecting sketches for training is a slow and tedious process that has so far precluded attempts at large-scale sketch recognition. We overcome the lack of training sketch data by exploiting labeled collections of natural images that are easier to obtain. To bridge the domain gap we present a novel augmentation technique tailored to the task of learning sketch recognition from a training set of natural images. Randomization is introduced in the parameters of edge detection and edge selection. Natural images are translated to a pseudo-novel domain called "randomized Binary Thin Edges" (rBTE), which is used as a training domain instead of natural images. The ability to scale up is demonstrated by training a CNN-based sketch classifier on more than 2.5 times as many categories as used previously. For this purpose, a dataset of natural images from 874 categories is constructed by combining a number of popular computer vision datasets. The categories are selected to be suitable for sketch recognition. To estimate the performance, a subset of 393 categories with sketches is also collected.
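
    The randomized edge-map idea can be illustrated with a short augmentation routine. The snippet below is only a hedged stand-in: the actual rBTE pipeline combines several edge detectors with thinning and edge selection, whereas here a single Canny detector with randomized blur, thresholds and edge dropping plays that role.

      import random
      import cv2
      import numpy as np

      def random_edge_map(image_bgr):
          """Rough stand-in for rBTE-style augmentation: randomized edge detection
          turns a natural image into a binary, sketch-like training sample."""
          gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
          gray = cv2.GaussianBlur(gray, (5, 5), random.uniform(0.5, 2.0))
          low = random.randint(30, 100)             # randomized hysteresis thresholds
          high = low + random.randint(50, 150)
          edges = cv2.Canny(gray, low, high)
          # randomly drop a fraction of edge pixels to mimic edge selection
          keep = np.random.rand(*edges.shape) > random.uniform(0.0, 0.3)
          return (edges > 0) & keep                 # binary thin-edge map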

Recall@k Surrogate Loss with Large Batches and Similarity Mixup

  • DOI: 10.1109/CVPR52688.2022.00735
  • Link: https://doi.org/10.1109/CVPR52688.2022.00735
  • Department: Visual Recognition Group
  • Abstract:
    This work focuses on learning deep visual representation models for retrieval by exploring the interplay between a new loss function, the batch size, and a new regularization approach. Direct optimization of an evaluation metric by gradient descent is not possible when the metric is non-differentiable, which is the case for recall in retrieval. A differentiable surrogate loss for recall is proposed in this work. Using an implementation that sidesteps the hardware constraints of the GPU memory, the method trains with a very large batch size, which is essential for metrics computed on the entire retrieval database. It is assisted by an efficient mixup regularization approach that operates on pairwise scalar similarities and virtually increases the batch size further. The suggested method achieves state-of-the-art performance in several image retrieval benchmarks when used for deep metric learning. For instance-level recognition, the method outperforms similar approaches that train using an approximation of average precision.
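
    The general trick behind a differentiable recall surrogate is to replace hard rank indicators with sigmoids. The sketch below is an illustrative simplification (one positive per query, an arbitrary temperature tau), not the exact loss of the paper.

      import torch

      def smooth_recall_at_k(sim_pos, sim_neg, k=1, tau=0.05):
          """sim_pos: (B,) similarity of each query to its positive.
          sim_neg: (B, N) similarities of each query to negatives.
          Approximates recall@k by relaxing rank indicators with sigmoids."""
          # soft count of negatives ranked above the positive
          soft_rank = torch.sigmoid((sim_neg - sim_pos.unsqueeze(1)) / tau).sum(dim=1)
          # soft indicator that the positive lands within the top-k
          soft_hit = torch.sigmoid((k - 1 - soft_rank) / tau)
          return 1.0 - soft_hit.mean()   # loss: one minus smoothed recall@k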

Results and findings of the 2021 Image Similarity Challenge

  • Authors: Papakipos, Z., doc. Georgios Tolias, Ph.D., Ing. Tomáš Jeníček, Pizzi, E., Yokoo, S., Wang, W., Sun, Y., Zhang, W., Yang, Y., Addicam, S., Papadakis, S.M., Ferrer, C.C., prof. Mgr. Ondřej Chum, Ph.D., Douze, M.
  • Published in: Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track. Proceedings of Machine Learning Research, 2022. p. 1-12. vol. 176. ISSN 1938-7228.
  • Year: 2022
  • Department: Visual Recognition Group
  • Abstract:
    The 2021 Image Similarity Challenge introduced a dataset to serve as a benchmark to evaluate image copy detection methods. There were 200 participants in the competition. This paper presents a quantitative and qualitative analysis of the top submissions. It appears that the most difficult image transformations involve either severe image crops or overlaying onto unrelated images, combined with local pixel perturbations. The key algorithmic elements in the winning submissions are: training on strong augmentations, self-supervised learning, score normalization, explicit overlay detection, and global descriptor matching followed by pairwise image comparison.

The Met Dataset: Instance-level Recognition for Artworks

  • Authors: Ing. Nikolaos-Antonios Ypsilantis, Garcia, N., Han, G., Ibrahimi, S., van Noord, N., doc. Georgios Tolias, Ph.D.
  • Published in: NeurIPS Datasets and Benchmarks 2021: The Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks. Neural Information Processing Systems Foundation, Inc., 2022. ISBN 978-1-7138-7109-5.
  • Year: 2022
  • Department: Visual Recognition Group
  • Abstract:
    This work introduces a dataset for large-scale instance-level recognition in the domain of artworks. The proposed benchmark exhibits a number of different challenges such as large inter-class similarity, long-tail distribution, and many classes. We rely on the open-access collection of The Met museum to form a large training set of about 224k classes, where each class corresponds to a museum exhibit with photos taken under studio conditions. Testing is primarily performed on photos taken by museum guests depicting exhibits, which introduces a distribution shift between training and testing. Testing is additionally performed on a set of images not related to Met exhibits, making the task resemble an out-of-distribution detection problem. The proposed benchmark follows the paradigm of other recent datasets for instance-level recognition in different domains to encourage research on domain-independent approaches. A number of suitable approaches are evaluated to offer a testbed for future comparisons. Self-supervised and supervised contrastive learning are effectively combined to train the backbone, which is used for non-parametric classification that is shown to be a promising direction. Dataset webpage: http://cmp.felk.cvut.cz/met/.
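
    The non-parametric classification used in the evaluation is essentially k-nearest-neighbour search in the embedding space. A minimal sketch, assuming precomputed L2-normalized embeddings and simple majority voting (which may differ in detail from the paper's exact protocol):

      import numpy as np

      def knn_classify(query_emb, train_emb, train_labels, k=5):
          """Non-parametric classification: assign the majority label among the
          k nearest training embeddings (cosine similarity on normalized vectors)."""
          sims = train_emb @ query_emb                 # (N,) cosine similarities
          nn_idx = np.argpartition(-sims, k)[:k]       # indices of the k most similar
          votes = train_labels[nn_idx]
          values, counts = np.unique(votes, return_counts=True)
          return values[np.argmax(counts)]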

Graph convolutional networks for learning with few clean and many noisy labels

  • DOI: 10.1007/978-3-030-58607-2_17
  • Link: https://doi.org/10.1007/978-3-030-58607-2_17
  • Department: Visual Recognition Group
  • Abstract:
    In this work we consider the problem of learning a classifier from noisy labels when a few clean labeled examples are given. The structure of clean and noisy data is modeled by a graph per class, and Graph Convolutional Networks (GCN) are used to predict class relevance of noisy examples. For each class, the GCN is treated as a binary classifier, which learns to discriminate clean from noisy examples using a weighted binary cross-entropy loss function. The GCN-inferred “clean” probability is then exploited as a relevance measure. Each noisy example is weighted by its relevance when learning a classifier for the end task. We evaluate our method on an extended version of a few-shot learning problem, where the few clean examples of novel classes are supplemented with additional noisy data. Experimental results show that our GCN-based cleaning process significantly improves classification accuracy over not cleaning the noisy data, as well as over standard few-shot classification where only the few clean examples are used.
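
    The cleaning step amounts to a per-class binary classifier over a graph plus a weighted loss. The sketch below is a loose illustration: a tiny dense GCN predicts a per-example "clean" probability, and a weighted binary cross-entropy treats the few clean examples as positives and the noisy ones as down-weighted negatives. The weights and graph construction are assumptions here; the inferred probability would then weight each noisy example in the end-task loss.

      import torch
      import torch.nn.functional as F

      class TinyGCN(torch.nn.Module):
          """Minimal two-layer GCN over a dense normalized adjacency a_hat;
          outputs a per-node 'clean' probability."""
          def __init__(self, dim, hidden=64):
              super().__init__()
              self.w1 = torch.nn.Linear(dim, hidden)
              self.w2 = torch.nn.Linear(hidden, 1)

          def forward(self, a_hat, x):
              h = F.relu(self.w1(a_hat @ x))      # propagate features over the graph
              return torch.sigmoid(self.w2(a_hat @ h)).squeeze(-1)

      def cleaning_loss(p_clean, is_clean, w_clean=1.0, w_noisy=0.1):
          """Weighted binary cross-entropy: clean examples act as positives,
          noisy ones as down-weighted negatives (weights chosen arbitrarily)."""
          weights = torch.where(is_clean,
                                torch.full_like(p_clean, w_clean),
                                torch.full_like(p_clean, w_noisy))
          return F.binary_cross_entropy(p_clean, is_clean.float(), weight=weights)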

Learning and aggregating deep local descriptors for instance-level recognition

  • DOI: 10.1007/978-3-030-58452-8_27
  • Link: https://doi.org/10.1007/978-3-030-58452-8_27
  • Department: Visual Recognition Group
  • Abstract:
    We propose an efficient method to learn deep local descriptors for instance-level recognition. The training only requires examples of positive and negative image pairs and is performed as metric learning of sum-pooled global image descriptors. At inference, the local descriptors are provided by the activations of internal components of the network. We demonstrate why such an approach learns local descriptors that work well for image similarity estimation with classical efficient match kernel methods. The experimental validation studies the trade-off between performance and memory requirements of the state-of-the-art image search approach based on match kernels. Compared to existing local descriptors, the proposed ones perform better in two instance-level recognition tasks and keep memory requirements lower. We experimentally show that global descriptors are not effective enough at large scale and that local descriptors are essential. We achieve state-of-the-art performance, in some cases even with a backbone network as small as ResNet18.

Explicit Spatial Encoding for Deep Local Descriptors

  • DOI: 10.1109/CVPR.2019.00962
  • Link: https://doi.org/10.1109/CVPR.2019.00962
  • Department: Visual Recognition Group
  • Abstract:
    We propose a kernelized deep local-patch descriptor based on efficient match kernels of neural network activations. The response of each receptive field is encoded together with its spatial location using explicit feature maps. Two location parametrizations, Cartesian and polar, are used to provide robustness to different types of canonical patch misalignment. Additionally, we analyze how the conventional architecture, i.e. a fully connected layer attached after the convolutional part, encodes responses in a spatially variant way. In contrast, explicit spatial encoding is used in our descriptor, whose potential applications are not limited to local patches. We evaluate the descriptor on standard benchmarks. Both versions, encoding 32x32 or 64x64 patches, consistently outperform all other methods on all benchmarks. The number of parameters of the model is independent of the input patch resolution.

Fine-tuning CNN Image Retrieval with No Human Annotation

  • DOI: 10.1109/TPAMI.2018.2846566
  • Link: https://doi.org/10.1109/TPAMI.2018.2846566
  • Department: Visual Recognition Group
  • Abstract:
    Image descriptors based on activations of Convolutional Neural Networks (CNNs) have become dominant in image retrieval due to their discriminative power, compactness of representation, and search efficiency. Training of CNNs, either from scratch or fine-tuning, requires a large amount of annotated data, where a high quality of annotation is often crucial. In this work, we propose to fine-tune CNNs for image retrieval on a large collection of unordered images in a fully automated manner. Reconstructed 3D models obtained by the state-of-the-art retrieval and structure-from-motion methods guide the selection of the training data. We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval. CNN descriptor whitening discriminatively learned from the same training data outperforms commonly used PCA whitening. We propose a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and average pooling and show that it boosts retrieval performance. Applying the proposed method to the VGG network achieves state-of-the-art performance on the standard benchmarks: Oxford Buildings, Paris, and Holidays datasets.
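
    The GeM layer itself is compact enough to state directly. The sketch below follows the standard generalized-mean formulation, pooling each channel as ((1/|X|) Σ x^p)^(1/p) with a learnable exponent p, so that p = 1 gives average pooling and large p approaches max pooling; the epsilon clamp is an implementation detail assumed here for numerical stability.

      import torch

      class GeM(torch.nn.Module):
          """Generalized-Mean pooling over a (B, C, H, W) activation map."""
          def __init__(self, p=3.0, eps=1e-6):
              super().__init__()
              self.p = torch.nn.Parameter(torch.tensor(float(p)))  # learnable exponent
              self.eps = eps

          def forward(self, x):
              # ((1/|X|) * sum x^p)^(1/p), computed per channel
              x = x.clamp(min=self.eps).pow(self.p)
              return x.mean(dim=(-2, -1)).pow(1.0 / self.p)         # (B, C) descriptor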

Graph-based particular object discovery

  • DOI: 10.1007/s00138-019-01005-z
  • Link: https://doi.org/10.1007/s00138-019-01005-z
  • Department: Visual Recognition Group
  • Abstract:
    Severe background clutter is challenging in many computer vision tasks, including large-scale image retrieval. Global descriptors, which are popular due to their memory and search efficiency, are especially prone to corruption by such clutter. Eliminating the impact of the clutter on the image descriptor increases the chance of retrieving relevant images and prevents topic drift caused by actually retrieving the clutter in the case of query expansion. In this work, we propose a novel salient region detection method. It captures, in an unsupervised manner, patterns that are both discriminative and common in the dataset. Saliency is based on a centrality measure of a nearest neighbor graph constructed from regional CNN representations of dataset images. The proposed method exploits recent CNN architectures trained for object retrieval to construct the image representation from the salient regions. We improve particular object retrieval on challenging datasets containing small objects.
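
    The saliency score is a centrality measure on a nearest-neighbour graph of regional descriptors. A rough illustration, using plain power iteration to obtain an eigenvector-centrality-like score as a stand-in for the specific measure used in the paper:

      import numpy as np

      def region_saliency(region_desc, k=10, iters=50):
          """region_desc: (N, D) L2-normalized regional CNN descriptors.
          Builds a k-NN similarity graph and scores regions by centrality."""
          sims = region_desc @ region_desc.T
          np.fill_diagonal(sims, -np.inf)
          graph = np.zeros_like(sims)
          nn = np.argsort(-sims, axis=1)[:, :k]         # k nearest neighbours per region
          rows = np.repeat(np.arange(len(sims)), k)
          graph[rows, nn.ravel()] = np.maximum(sims[rows, nn.ravel()], 0)
          graph = np.maximum(graph, graph.T)            # symmetrize the affinity
          # power iteration: dominant eigenvector acts as a per-region saliency score
          score = np.ones(len(sims)) / len(sims)
          for _ in range(iters):
              score = graph @ score
              score /= np.linalg.norm(score) + 1e-12
          return score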

Hybrid Diffusion: Spectral-Temporal Graph Filtering for Manifold Ranking

  • Authors: Iscen, A., Avrithis, Y., doc. Georgios Tolias, Ph.D., Furon, T., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: ACCV 2018: Proceedings of the 14th Asian Conference on Computer Vision, Part II. Springer, 2019. p. 301-316. LNCS. vol. 11362. ISSN 0302-9743. ISBN 978-3-030-20889-9.
  • Year: 2019
  • DOI: 10.1007/978-3-030-20890-5_20
  • Link: https://doi.org/10.1007/978-3-030-20890-5_20
  • Department: Visual Recognition Group
  • Abstract:
    State-of-the-art image retrieval performance is achieved with CNN features and manifold ranking using a k-NN similarity graph that is pre-computed off-line. The two most successful existing approaches are temporal filtering, where manifold ranking amounts to solving a sparse linear system online, and spectral filtering, where eigen-decomposition of the adjacency matrix is performed off-line and then manifold ranking amounts to dot-product search online. The former suffers from expensive queries and the latter from significant space overhead. Here we introduce a novel, theoretically well-founded hybrid filtering approach allowing full control of the space-time trade-off between these two extremes. Experimentally, we verify that our hybrid method delivers results on par with the state of the art, with lower memory demands compared to spectral filtering approaches and faster compared to temporal filtering.
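
    For orientation, the two extremes the paper interpolates between can be written in standard manifold-ranking notation (S the symmetrically normalized affinity matrix of the k-NN graph, y the sparse query vector, α < 1); the formulation below is the usual one, stated here as background rather than quoted from the paper.

      % temporal filtering: solve a sparse linear system on-line
      (I - \alpha S)\, f = y

      % spectral filtering: eigendecomposition S = U \Lambda U^{\top} off-line,
      % then ranking reduces to dot products on-line
      f = U\, h(\Lambda)\, U^{\top} y, \qquad h(\lambda) = \frac{1}{1 - \alpha\lambda}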

Label Propagation for Deep Semi-supervised Learning

  • DOI: 10.1109/CVPR.2019.00521
  • Link: https://doi.org/10.1109/CVPR.2019.00521
  • Department: Visual Recognition Group
  • Abstract:
    Semi-supervised learning is becoming increasingly important because it can combine data carefully labeled by humans with abundant unlabeled data to train deep neural networks. Classic semi-supervised methods that focus on transductive learning have not been fully exploited within the inductive framework followed by modern deep learning. The same holds for the manifold assumption, namely that similar examples should get the same prediction. In this work, we employ a transductive label propagation method based on the manifold assumption to make predictions on the entire dataset, use these predictions to generate pseudo-labels for the unlabeled data, and train a deep neural network on them. At the core of the transductive method lies a nearest neighbor graph of the dataset that we create based on the embeddings of the same network; our learning process therefore iterates between these two steps. We improve performance on several datasets, especially in the few-label regime, and show that our work is complementary to the current state of the art.
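
    The transductive step is classical label propagation on a nearest-neighbour graph built from the network's own embeddings. A compact sketch (conjugate-gradient solve of the standard propagation system; graph construction and the value of alpha are illustrative assumptions):

      import numpy as np
      from scipy.sparse import identity
      from scipy.sparse.linalg import cg

      def propagate_labels(S, Y, alpha=0.99):
          """S: (N, N) sparse, symmetrically normalized k-NN affinity matrix.
          Y: (N, C) one-hot labels for labeled rows, zeros elsewhere.
          Returns soft scores Z solving (I - alpha * S) Z = Y column by column."""
          n, c = Y.shape
          A = identity(n, format="csr") - alpha * S
          Z = np.zeros((n, c))
          for j in range(c):
              Z[:, j], _ = cg(A, Y[:, j], maxiter=50)
          pseudo = Z.argmax(axis=1)                 # hard pseudo-labels for training
          return Z, pseudo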

Targeted Mismatch Adversarial Attack: Query With a Flower to Retrieve the Tower

  • DOI: 10.1109/ICCV.2019.00514
  • Link: https://doi.org/10.1109/ICCV.2019.00514
  • Department: Visual Recognition Group
  • Abstract:
    Access to online visual search engines implies sharing of private user content, namely the query images. We introduce the concept of a targeted mismatch attack for deep-learning-based retrieval systems: an adversarial image is generated to conceal the query image. The generated image looks nothing like the user's intended query but leads to identical or very similar retrieval results. Transferring attacks to fully unseen networks is challenging. We show successful attacks on partially unknown systems by designing various loss functions for the adversarial image construction, including, for example, losses for an unknown global pooling operation or an unknown input resolution used by the retrieval system. We evaluate the attacks on standard retrieval benchmarks and compare the results retrieved with the original and the adversarial image.
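
    At its core, the attack optimizes a carrier image so that its descriptor matches that of the hidden target query. The sketch below shows only this generic formulation; the descriptor extractor embed, the weight lam and the plain L2 image regularizer are assumptions for illustration, not the specific losses proposed in the paper.

      import torch

      def targeted_mismatch(embed, target_img, carrier_img, steps=200, lr=0.01, lam=1.0):
          """Optimize an image that looks like carrier_img but whose global
          descriptor matches embed(target_img), concealing the real query."""
          with torch.no_grad():
              target_desc = embed(target_img)            # descriptor to reproduce
          adv = carrier_img.clone().requires_grad_(True)
          optim = torch.optim.Adam([adv], lr=lr)
          for _ in range(steps):
              desc_loss = 1.0 - torch.nn.functional.cosine_similarity(
                  embed(adv), target_desc, dim=-1).mean()
              visual_loss = (adv - carrier_img).pow(2).mean()  # stay close to carrier
              loss = desc_loss + lam * visual_loss
              optim.zero_grad()
              loss.backward()
              optim.step()
              adv.data.clamp_(0.0, 1.0)                  # keep a valid image
          return adv.detach()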

Understanding and Improving Kernel Local Descriptors

  • DOI: 10.1007/s11263-018-1137-8
  • Link: https://doi.org/10.1007/s11263-018-1137-8
  • Department: Visual Recognition Group
  • Abstract:
    We propose a multiple-kernel local-patch descriptor based on efficient match kernels from pixel gradients. It combines two parametrizations of gradient position and direction; each parametrization provides robustness to a different type of patch mis-registration: polar parametrization for noise in the detection of the patch dominant orientation, Cartesian for imprecise location of the feature point. Combined with whitening of the descriptor space, which is learned with or without supervision, the performance is significantly improved. We analyze the effect of the whitening on patch similarity and demonstrate its semantic meaning. Our unsupervised variant is the best-performing descriptor constructed without the need for labeled data. Despite the simplicity of the proposed descriptor, it competes well with deep learning approaches on a number of different tasks.

Deep Shape Matching

  • Authors: Radenović, F., doc. Georgios Tolias, Ph.D., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: ECCV2018: Proceedings of the European Conference on Computer Vision, Part V. Springer, Cham, 2018. p. 774-791. Lecture Notes in Computer Science. vol. 11209. ISSN 0302-9743. ISBN 978-3-030-01227-4.
  • Year: 2018
  • DOI: 10.1007/978-3-030-01228-1_46
  • Link: https://doi.org/10.1007/978-3-030-01228-1_46
  • Department: Visual Recognition Group
  • Abstract:
    We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. Firstly, well-established efficient methods are chosen to turn the images into edge maps. Secondly, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval or its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results on multiple benchmarks.

Efficient Contour Match Kernel

  • DOI: 10.1016/j.imavis.2018.04.006
  • Link: https://doi.org/10.1016/j.imavis.2018.04.006
  • Department: Visual Recognition Group
  • Abstract:
    We propose a novel concept of asymmetric feature maps (AFM), which makes it possible to evaluate multiple kernels between a query and database entries without increasing the memory requirements. To demonstrate the advantages of the AFM method, we derive an efficient contour match kernel, a short-vector image representation that, due to asymmetric feature maps, supports efficient scale- and translation-invariant sketch-based image retrieval. Unlike most short-code-based retrieval systems, the proposed method provides the query localization in the retrieved image. The efficiency of the search is boosted by approximating the 2D translation search, expressed as a trigonometric polynomial of scores, by 1D projections. The projections are a special case of AFM. An order-of-magnitude speed-up is achieved compared to traditional trigonometric polynomials. The results are boosted by an image-based average query expansion approach and, without any learning, significantly outperform the state-of-the-art hand-crafted descriptors on standard benchmarks. Our method competes well with recent CNN-based approaches that require large amounts of labeled sketches, images and sketch-image pairs.

Fast Spectral Ranking for Similarity Search

  • Authors: Iscen, A., Avrithis, Y., doc. Georgios Tolias, Ph.D., Furon, T., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: CVPR 2018: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018. p. 7632-7641. ISSN 2575-7075. ISBN 978-1-5386-6420-9.
  • Year: 2018
  • DOI: 10.1109/CVPR.2018.00796
  • Link: https://doi.org/10.1109/CVPR.2018.00796
  • Department: Visual Recognition Group
  • Abstract:
    Despite the success of deep learning in representing images for particular object retrieval, recent studies show that the learned representations still lie on manifolds in a high-dimensional space. This makes the Euclidean nearest neighbor search biased for this task. Exploring the manifolds online remains expensive even if a nearest neighbor graph has been computed offline. This work introduces an explicit embedding reducing manifold search to Euclidean search followed by dot-product similarity search. This is equivalent to linear graph filtering of a sparse signal in the frequency domain. To speed up online search, we compute an approximate Fourier basis of the graph offline. We improve the state of the art on particular object retrieval datasets including the challenging INSTRE dataset containing small objects. At a scale of 10^5 images, the offl

Mining on Manifolds: Metric Learning without Labels

  • Authors: Iscen, A., doc. Georgios Tolias, Ph.D., Avrithis, Y., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: CVPR 2018: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018. p. 7642-7651. ISSN 2575-7075. ISBN 978-1-5386-6420-9.
  • Year: 2018
  • DOI: 10.1109/CVPR.2018.00797
  • Link: https://doi.org/10.1109/CVPR.2018.00797
  • Department: Visual Recognition Group
  • Abstract:
    In this work we present a novel unsupervised framework for hard training example mining. The only input to the method is a collection of images relevant to the target application and a meaningful initial representation, provided, e.g., by a pre-trained CNN. Positive examples are distant points on a single manifold, while negative examples are nearby points on different manifolds. Both types of examples are revealed by disagreements between Euclidean and manifold similarities. The discovered examples can be used in training with any discriminative loss. The method is applied to unsupervised fine-tuning of pre-trained networks for fine-grained classification and particular object retrieval. Our models are on par with or outperform prior models that are fully or partially supervised.
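
    The mining rule can be paraphrased compactly: compare each anchor's neighbours under Euclidean similarity and under a manifold (diffusion-based) similarity, and harvest the disagreements. The sketch below captures that rule with arbitrary cut-offs and a precomputed manifold_sim matrix; the actual selection criteria in the paper are more involved.

      import numpy as np

      def mine_pairs(euclid_sim, manifold_sim, anchor, k_pos=5, k_neg=5, pool=50):
          """euclid_sim, manifold_sim: (N, N) similarity matrices over the collection.
          Hard positives: high manifold similarity but low Euclidean similarity.
          Hard negatives: high Euclidean similarity but low manifold similarity."""
          e = euclid_sim[anchor].copy()
          m = manifold_sim[anchor].copy()
          e[anchor] = -np.inf
          m[anchor] = -np.inf
          manifold_nn = np.argsort(-m)[:pool]          # same-manifold candidates
          euclid_nn = np.argsort(-e)[:pool]            # Euclidean neighbourhood
          euclid_set, manifold_set = set(euclid_nn.tolist()), set(manifold_nn.tolist())
          # positives: on the manifold yet far in Euclidean terms
          positives = [i for i in manifold_nn if i not in euclid_set][:k_pos]
          # negatives: Euclidean-close yet off the manifold
          negatives = [i for i in euclid_nn if i not in manifold_set][:k_neg]
          return positives, negatives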

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

  • Authors: Radenović, F., Iscen, A., doc. Georgios Tolias, Ph.D., Avrithis, Y., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: CVPR 2018: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018. p. 5706-5715. ISSN 2575-7075. ISBN 978-1-5386-6420-9.
  • Year: 2018
  • DOI: 10.1109/CVPR.2018.00598
  • Link: https://doi.org/10.1109/CVPR.2018.00598
  • Department: Visual Recognition Group
  • Abstract:
    In this paper we address issues with image retrieval benchmarking on the standard and popular Oxford 5k and Paris 6k datasets. In particular, annotation errors, the size of the dataset, and the level of challenge are addressed: new annotation for both datasets is created with extra attention to the reliability of the ground truth. Three new protocols of varying difficulty are introduced. The protocols allow fair comparison between different methods, including those using a dataset pre-processing stage. For each dataset, 15 new challenging queries are introduced. Finally, a new set of 1M hard, semi-automatically cleaned distractors is selected. An extensive comparison of the state-of-the-art methods is performed on the new benchmark. Different types of methods are evaluated, ranging from local-feature-based to modern CNN-based methods. The best results are achieved by taking the best of both worlds. Most importantly, image retrieval appears far from being solved.

Unsupervised object discovery for instance recognition

  • Authors: Simeoni, O., Iscen, A., doc. Georgios Tolias, Ph.D., Avrithis, Y., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018. Institute of Electrical and Electronics Engineers Inc, 2018. p. 1745-1754. ISSN 2472-6737. ISBN 978-1-5386-4886-5.
  • Year: 2018
  • DOI: 10.1109/WACV.2018.00194
  • Link: https://doi.org/10.1109/WACV.2018.00194
  • Department: Visual Recognition Group
  • Abstract:
    Severe background clutter is challenging in many computer vision tasks, including large-scale image retrieval. Global descriptors, which are popular due to their memory and search efficiency, are especially prone to corruption by such clutter. Eliminating the impact of the clutter on the image descriptor increases the chance of retrieving relevant images and prevents topic drift caused by actually retrieving the clutter in the case of query expansion. In this work, we propose a novel salient region detection method. It captures, in an unsupervised manner, patterns that are both discriminative and common in the dataset. Saliency is based on a centrality measure of a nearest neighbor graph constructed from regional CNN representations of dataset images. The descriptors derived from the salient regions improve particular object retrieval, most noticeably in large collections containing small objects.

Asymmetric Feature Maps with Application to Sketch Based Retrieval

  • DOI: 10.1109/CVPR.2017.655
  • Link: https://doi.org/10.1109/CVPR.2017.655
  • Department: Visual Recognition Group
  • Abstract:
    We propose a novel concept of asymmetric feature maps (AFM), which makes it possible to evaluate multiple kernels between a query and database entries without increasing the memory requirements. To demonstrate the advantages of the AFM method, we derive a short-vector image representation that, due to asymmetric feature maps, supports efficient scale- and translation-invariant sketch-based image retrieval. Unlike most short-code-based retrieval systems, the proposed method provides the query localization in the retrieved image. The efficiency of the search is boosted by approximating the 2D translation search, expressed as a trigonometric polynomial of scores, by 1D projections. The projections are a special case of AFM. An order-of-magnitude speed-up is achieved compared to traditional trigonometric polynomials. The results are boosted by an image-based average query expansion, significantly exceeding the state of the art on standard benchmarks.

Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations

  • Authors: Iscen, A., doc. Georgios Tolias, Ph.D., Avrithis, Y., Furon, T., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: CVPR 2017: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Press, 2017. p. 926-935. ISSN 1063-6919. ISBN 978-1-5386-0457-1.
  • Year: 2017
  • DOI: 10.1109/CVPR.2017.105
  • Link: https://doi.org/10.1109/CVPR.2017.105
  • Department: Visual Recognition Group
  • Abstract:
    Query expansion is a popular method to improve the quality of image retrieval with both conventional and CNN representations. So far, it has been limited to global image similarity. This work focuses on diffusion, a mechanism that captures the image manifold in the feature space. The diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor as in previous approaches. An efficient off-line stage allows an optional reduction in the number of stored regions. In the on-line stage, unseen queries are handled within the indexing scheme, removing the additional computation otherwise needed to adjust the precomputed data. We perform diffusion through a sparse linear system solver, yielding practical query times well below one second. Experimentally, we observe a significant boost in the performance of image retrieval with compact CNN descriptors on standard benchmarks, especially when the query object covers only a small part of the image. Small objects have been a common failure case of CNN-based retrieval.
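
    The on-line stage reduces to solving a sparse linear system over region descriptors. A compressed sketch, with the region graph assumed precomputed and the per-image aggregation simplified to a plain sum (both simplifications relative to the paper):

      import numpy as np
      from scipy.sparse import identity
      from scipy.sparse.linalg import cg

      def diffuse_regions(S, query_regions, db_regions, region2img, n_images,
                          alpha=0.99, topk=10):
          """S: sparse normalized affinity over database region descriptors.
          The query is injected through its nearest database regions, diffusion is
          run by conjugate gradients, and region scores are summed per image."""
          sims = db_regions @ query_regions.T             # (N_regions, n_query_regions)
          y = np.zeros(S.shape[0])
          nn = np.argsort(-sims, axis=0)[:topk]           # seed the closest regions
          np.add.at(y, nn.ravel(), np.take_along_axis(sims, nn, axis=0).ravel())
          f, _ = cg(identity(S.shape[0], format="csr") - alpha * S, y, maxiter=20)
          scores = np.zeros(n_images)
          np.add.at(scores, region2img, f)                # sum region scores per image
          return np.argsort(-scores)                      # ranked image list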

Multiple-Kernel Local-Patch Descriptor

  • DOI: 10.5244/C.31.184
  • Link: https://doi.org/10.5244/C.31.184
  • Department: Visual Recognition Group
  • Abstract:
    We propose a multiple-kernel local-patch descriptor based on efficient match kernels of patch gradients. It combines two parametrizations of gradient position and direction; each parametrization provides robustness to a different type of patch mis-registration: polar parametrization for noise in the detection of the patch dominant orientation, Cartesian for imprecise location of the feature point. Even though hand-crafted, the proposed descriptor consistently outperforms the state-of-the-art methods on two local-patch benchmarks.

Panorama to panorama matching for location recognition

  • DOI: 10.1145/3078971.3079033
  • Link: https://doi.org/10.1145/3078971.3079033
  • Department: Visual Recognition Group
  • Abstract:
    Location recognition is commonly treated as visual instance retrieval on “street view” imagery, i.e., dataset items and queries are panoramic views: groups of images taken at a single location. This work introduces a novel panorama-to-panorama matching process, either by aggregating features of individual images in a group or by explicitly constructing a larger panorama. In either case, multiple views are used as queries. We reach near-perfect location recognition on a standard benchmark with only four query views.

Robust data whitening as an iteratively re-weighted least squares problem

  • DOI: 10.1007/978-3-319-59126-1_20
  • Link: https://doi.org/10.1007/978-3-319-59126-1_20
  • Department: Visual Recognition Group
  • Abstract:
    The entries of high-dimensional measurements, such as image or feature descriptors, are often correlated, which leads to a bias in similarity estimation. To remove the correlation, a linear transformation, called whitening, is commonly used. In this work, we analyze robust estimation of the whitening transformation in the presence of outliers. Inspired by the Iteratively Re-weighted Least Squares approach, we iterate between centering and applying a transformation matrix, a process which is shown to converge to a solution that minimizes the sum of ℓ2 norms. The approach is developed for unsupervised scenarios, but is further extended to supervised cases. We demonstrate the robustness of our method to outliers on synthetic 2D data and also show improvements compared to conventional whitening on real data for image retrieval with CNN-based representations. Finally, our robust estimation is not limited to data whitening, but can be used for robust patch rectification, e.g. with MSER features.
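
    The iteration alternates robust centering and covariance estimation with weights that down-weight outliers, in the spirit of IRLS. The sketch below, with weights set to the inverse residual norm and a fixed iteration cap, is one plausible reading of such a scheme rather than the paper's exact algorithm.

      import numpy as np

      def robust_whitening(X, iters=20, eps=1e-8):
          """X: (N, D) descriptors. Iteratively re-weighted estimation of a centering
          vector mu and whitening matrix W, down-weighting outlier points."""
          n, d = X.shape
          mu, W = X.mean(axis=0), np.eye(d)
          for _ in range(iters):
              # residual norms under the current transform drive the IRLS weights
              r = np.linalg.norm((X - mu) @ W.T, axis=1)
              w = 1.0 / np.maximum(r, eps)
              w /= w.sum()
              mu = (w[:, None] * X).sum(axis=0)            # weighted center
              C = (X - mu).T @ ((X - mu) * w[:, None])     # weighted covariance
              vals, vecs = np.linalg.eigh(C + eps * np.eye(d))
              W = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T
          return mu, W   # whitened data: (X - mu) @ W.T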

CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

  • Authors: Radenović, F., doc. Georgios Tolias, Ph.D., prof. Mgr. Ondřej Chum, Ph.D.
  • Published in: Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I. Springer, 2016. p. 3-20. Lecture Notes in Computer Science. vol. 9905. ISSN 0302-9743. ISBN 978-3-319-46447-3.
  • Year: 2016
  • DOI: 10.1007/978-3-319-46448-0_1
  • Link: https://doi.org/10.1007/978-3-319-46448-0_1
  • Department: Visual Recognition Group
  • Abstract:
    Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement comes at the cost of extensive manual annotation, whether training from scratch or fine-tuning for the target task. In this work, we propose to fine-tune CNNs for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes.

Particular object retrieval with integral max-pooling of CNN activations

  • Authors: doc. Georgios Tolias, Ph.D., Sicre, R., Jegou, H.
  • Published in: International Conference on Learning Representations 2016. Computational and Biological Learning Society, 2016.
  • Year: 2016
  • Department: Visual Recognition Group
  • Abstract:
    Recently, image representations built upon Convolutional Neural Networks (CNNs) have been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and are still outperformed, on some particular-object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report, for the first time, results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
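
    The compact vectors are built by max-pooling the convolutional activation map over several image regions and aggregating the normalized region vectors. The sketch below shows this regional max-pooling in simplified form, with a fixed uniform grid and no whitening step, rather than the paper's integral-image implementation.

      import torch
      import torch.nn.functional as F

      def regional_max_pool(feat, grid=3):
          """feat: (C, H, W) convolutional activations of one image.
          Max-pool over a uniform grid of regions, L2-normalize each region vector,
          sum them and L2-normalize again to obtain a compact global descriptor."""
          c, h, w = feat.shape
          ys = torch.linspace(0, h, grid + 1).long()
          xs = torch.linspace(0, w, grid + 1).long()
          regions = []
          for i in range(grid):
              for j in range(grid):
                  patch = feat[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                  regions.append(F.normalize(patch.amax(dim=(-2, -1)), dim=0))
          return F.normalize(torch.stack(regions).sum(dim=0), dim=0)   # (C,)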
