People
Ing. Vladan Stojnić
All publications
ILIAS: Instance-Level Image retrieval At Scale
- Authors: Georgios Kordopatis-Zilos, Ph.D., Ing. Vladan Stojnić, Manko, A., Ing. Pavel Šuma, Ypsilantis, N., Ing. Nikolaos Efthymiadis, Laskar, Z., prof. Ing. Jiří Matas, Ph.D., prof. Mgr. Ondřej Chum, Ph.D., doc. Georgios Tolias, Ph.D.
- Publication: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos: IEEE Computer Society, 2025. p. 14777-14787. ISSN 2575-7075. ISBN 979-8-3315-4364-8.
- Year: 2025
- DOI: 10.1109/CVPR52734.2025.01377
- Link: https://doi.org/10.1109/CVPR52734.2025.01377
- Workplace: Visual Recognition Group

Abstract:
This work introduces ILIAS, a new test dataset for Instance-Level Image retrieval At Scale. It is designed to evaluate the ability of current and future foundation models and retrieval techniques to recognize particular objects. The key benefits over existing datasets include large scale, domain diversity, accurate ground truth, and a performance that is far from saturated. ILIAS includes query and positive images for 1,000 object instances, manually collected to capture challenging conditions and diverse domains. Large-scale retrieval is conducted against 100 million distractor images from YFCC100M. To avoid false negatives without extra annotation effort, we include only query objects confirmed to have emerged after 2014, i.e. the compilation date of YFCC100M. An extensive benchmarking is performed with the following observations: i) models fine-tuned on specific domains, such as landmarks or products, excel in that domain but fail on ILIAS; ii) learning a linear adaptation layer using multi-domain class supervision results in performance improvements, especially for vision-language models; iii) local descriptors in retrieval re-ranking are still a key ingredient, especially in the presence of severe background clutter; iv) the text-to-image performance of the vision-language foundation models is surprisingly close to the corresponding image-to-image case. Website: https://vrg.fel.cvut.cz/ilias/
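As a rough illustration of the large-scale retrieval setting described above, the sketch below ranks database images by cosine similarity between global descriptors. It is a minimal toy example, not the ILIAS evaluation code; all function and variable names are illustrative.

```python
import numpy as np

def retrieve(query_desc, index_descs, top_k=5):
    """Rank database images by cosine similarity to a query descriptor."""
    # L2-normalize so that the dot product equals cosine similarity
    q = query_desc / np.linalg.norm(query_desc)
    db = index_descs / np.linalg.norm(index_descs, axis=1, keepdims=True)
    scores = db @ q                      # similarity of every database image
    order = np.argsort(-scores)[:top_k]  # indices of the top-k matches
    return order, scores[order]

# Toy example: 4 database descriptors, query nearly identical to entry 2
rng = np.random.default_rng(0)
db = rng.normal(size=(4, 8))
query = db[2] + 0.01 * rng.normal(size=8)
idx, sims = retrieve(query, db, top_k=2)
print(idx[0])  # 2
```

At ILIAS scale (100M distractors), exhaustive dot products like this would be replaced by an approximate nearest-neighbor index, and the top-ranked candidates would typically be re-ranked with local descriptors, as the benchmark's observation iii) suggests.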
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
- Authors: Ing. Vladan Stojnić, Kalantidis, Y., prof. Ing. Jiří Matas, Ph.D., doc. Georgios Tolias, Ph.D.
- Publication: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos: IEEE Computer Society, 2025. p. 9794-9803. ISSN 2575-7075. ISBN 979-8-3315-4364-8.
- Year: 2025
- DOI: 10.1109/CVPR52734.2025.00915
- Link: https://doi.org/10.1109/CVPR52734.2025.00915
- Workplace: Visual Recognition Group

Abstract:
We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs). Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships. Since VLMs are primarily optimized for cross-modal alignment and not for intra-modal similarity, we use a Vision Model (VM) that is observed to better capture these relationships. We address resolution limitations inherent to patch-based encoders by applying label propagation at the pixel level as a refinement step, significantly improving segmentation accuracy near class boundaries. Our method, called LPOSS+, performs inference over the entire image, avoiding window-based processing and thereby capturing contextual interactions across the full image. LPOSS+ achieves state-of-the-art performance among training-free methods, across a diverse set of datasets. Code: https://github.com/vladan-stojnic/LPOSS
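The label-propagation step at the heart of the method can be sketched with the classic diffusion update Y ← αSY + (1−α)Y₀ over a patch affinity graph. This is a generic toy sketch assuming a small dense affinity matrix; LPOSS itself builds patch-to-patch affinities from a vision model and adds a pixel-level refinement, neither of which is shown here.

```python
import numpy as np

def label_propagation(W, Y0, alpha=0.9, iters=50):
    """Diffuse initial predictions Y0 over an affinity graph W.

    W:  (n, n) non-negative patch-to-patch affinities
    Y0: (n, c) initial per-patch class scores (e.g. from a VLM)
    """
    # Symmetrically normalize the affinity matrix: S = D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    Y = Y0.copy()
    for _ in range(iters):
        Y = alpha * S @ Y + (1 - alpha) * Y0  # diffuse, stay anchored to Y0
    return Y

# Four "patches": 0-2 form a tight cluster, 3 hangs off patch 2.
# Patch 3 starts with a noisy class-1 score that propagation corrects.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y0 = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
Y = label_propagation(W, Y0)
print(Y.argmax(axis=1))  # [0 0 0 0]
```

The anchor term (1−α)Y₀ keeps the solution tied to the initial VLM predictions, while the diffusion term lets well-connected patches overrule noisy individual scores.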
Label Propagation for Zero-shot Classification with Vision-Language Models
- Authors: Ing. Vladan Stojnić, Kalantidis, Y., doc. Georgios Tolias, Ph.D.
- Publication: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos: IEEE Computer Society, 2024. p. 23209-23218. ISSN 2575-7075. ISBN 979-8-3503-5300-6.
- Year: 2024
- DOI: 10.1109/CVPR52733.2024.02190
- Link: https://doi.org/10.1109/CVPR52733.2024.02190
- Workplace: Visual Recognition Group

Abstract:
Vision-Language Models (VLMs) have demonstrated impressive performance on zero-shot classification, i.e. classification when provided merely with a list of class names. In this paper, we tackle the case of zero-shot classification in the presence of unlabeled data. We leverage the graph structure of the unlabeled data and introduce ZLaP, a method based on label propagation (LP) that utilizes geodesic distances for classification. We tailor LP to graphs containing both text and image features and further propose an efficient method for performing inductive inference based on a dual solution and a sparsification step. We perform extensive experiments to evaluate the effectiveness of our method on 14 common datasets and show that ZLaP outperforms the latest related works. Code: https://github.com/vladan-stojnic/ZLaP
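The idea of propagating labels over a graph that mixes text and image features can be sketched as follows: text embeddings act as labeled seed nodes, unlabeled image features join them in a kNN graph, and propagation spreads the class scores. This is a toy illustration under simplifying assumptions (small dense graph, no geodesic distances, none of the dual-solution or sparsification machinery) and is not the ZLaP implementation.

```python
import numpy as np

def knn_graph(X, k=3):
    """Symmetrized kNN affinity graph from L2-normalized features."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = X @ X.T
    np.fill_diagonal(S, -np.inf)          # no self-edges
    W = np.zeros_like(S)
    nn = np.argsort(-S, axis=1)[:, :k]    # k most similar neighbors per node
    for i, js in enumerate(nn):
        W[i, js] = np.clip(S[i, js], 0, None)
    return np.maximum(W, W.T)             # symmetrize

def propagate(W, Y0, alpha=0.5, iters=100):
    """Standard diffusion: Y <- alpha * S @ Y + (1 - alpha) * Y0."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
    Y = Y0.copy()
    for _ in range(iters):
        Y = alpha * S @ Y + (1 - alpha) * Y0
    return Y

# Two "text" embeddings (one per class) plus two clusters of image features
rng = np.random.default_rng(1)
text = np.array([[1.0, 0.0], [0.0, 1.0]])
imgs = np.vstack([text[0] + 0.1 * rng.normal(size=(5, 2)),
                  text[1] + 0.1 * rng.normal(size=(5, 2))])
X = np.vstack([text, imgs])               # text nodes first, then images
Y0 = np.zeros((12, 2))
Y0[0, 0] = Y0[1, 1] = 1.0                 # seed only the text nodes
Y = propagate(knn_graph(X, k=3), Y0)
pred = Y[2:].argmax(axis=1)               # predictions for the image nodes
print(pred)  # [0 0 0 0 0 1 1 1 1 1]
```

Seeding only the text nodes mirrors the zero-shot setting: no image carries a label, yet every image inherits a class by diffusion through its neighbors.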
Training Ensembles with Inliers and Outliers for Semi-supervised Active Learning
- Authors: Ing. Vladan Stojnić, Laskar, Z., doc. Georgios Tolias, Ph.D.
- Publication: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Piscataway: IEEE, 2024. p. 259-268. ISSN 2642-9381. ISBN 979-8-3503-1892-0.
- Year: 2024
- DOI: 10.1109/WACV57701.2024.00033
- Link: https://doi.org/10.1109/WACV57701.2024.00033
- Workplace: Visual Recognition Group

Abstract:
Deep active learning in the presence of outlier examples poses a realistic yet challenging scenario. Acquiring unlabeled data for annotation requires a delicate balance between avoiding outliers to conserve the annotation budget and prioritizing useful inlier examples for effective training. In this work, we present an approach that leverages three highly synergistic components, which are identified as key ingredients: joint classifier training with inliers and outliers, semi-supervised learning through pseudo-labeling, and model ensembling. Our work demonstrates that ensembling significantly enhances the accuracy of pseudo-labeling and improves the quality of data acquisition. By enabling semi-supervision through the joint training process, where outliers are properly handled, we observe a substantial boost in classifier accuracy through the use of all available unlabeled examples. Notably, we reveal that the integration of joint training renders explicit outlier detection unnecessary, a component conventionally used for acquisition in prior work. The three key components align seamlessly with numerous existing approaches, and through empirical evaluations we showcase that their combined use leads to a performance increase. Remarkably, despite its simplicity, our proposed approach outperforms all other methods. Code: https://github.com/vladan-stojnic/active-outliers
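The role of ensembling in pseudo-labeling can be sketched in a few lines: average the softmax outputs of the ensemble members and keep only examples where the averaged prediction is confident. This is a simplified illustration of the general technique, not the paper's training pipeline; the threshold and function names are illustrative.

```python
import numpy as np

def ensemble_pseudo_labels(prob_list, threshold=0.8):
    """Average ensemble softmax outputs and keep confident pseudo-labels.

    prob_list: list of (n, c) probability arrays, one per ensemble member
    Returns indices of confidently labeled examples and their labels.
    """
    mean_p = np.mean(prob_list, axis=0)    # ensemble-averaged probabilities
    conf = mean_p.max(axis=1)              # confidence of each example
    keep = np.where(conf >= threshold)[0]  # confident subset only
    return keep, mean_p[keep].argmax(axis=1)

# Two members agree confidently on example 0 but disagree on example 1,
# so only example 0 receives a pseudo-label.
p1 = np.array([[0.9, 0.1], [0.9, 0.1]])
p2 = np.array([[0.8, 0.2], [0.1, 0.9]])
keep, labels = ensemble_pseudo_labels([p1, p2], threshold=0.8)
print(keep, labels)  # [0] [0]
```

Averaging suppresses examples where the members disagree, which is why ensembling improves pseudo-label accuracy: disagreement lowers the mean confidence below the threshold.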