People

prof. Ing. Jiří Matas, Ph.D.

All publications

Dense Matchers for Dense Tracking

  • Department: Visual Recognition Group
  • Abstract:
    Optical flow is a useful input for various applications, including 3D reconstruction, pose estimation, tracking, and structure-from-motion. Despite its utility, the field of dense long-term tracking, especially over wide baselines, has not been extensively explored. This paper extends the concept of combining multiple optical flows over logarithmically spaced intervals as proposed by MFT. We demonstrate the compatibility of MFT with different optical flow networks, yielding results that surpass their individual performance. Moreover, we present a simple yet effective combination of these networks within the MFT framework. This approach proves to be competitive with more sophisticated, non-causal methods in terms of position prediction accuracy, highlighting the potential of MFT in enhancing long-term tracking applications.
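At its core, MFT's chaining of flows over logarithmically spaced intervals amounts to composing dense flow fields. A minimal numpy sketch of that composition, with nearest-neighbour lookup and no occlusion or uncertainty handling (unlike the actual method; the function name and (dx, dy) channel convention are mine):

```python
import numpy as np

def compose_flows(flow_ab, flow_bc):
    """Chain two dense flow fields: the result maps frame A pixels to frame C.

    flow_ab, flow_bc: (H, W, 2) arrays of (dx, dy) displacements.
    Nearest-neighbour lookup keeps the sketch short; an MFT-style tracker
    would use bilinear sampling and per-pixel uncertainty instead.
    """
    H, W, _ = flow_ab.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Where each A pixel lands in frame B (x = column, y = row).
    xb = np.clip(np.rint(xs + flow_ab[..., 0]).astype(int), 0, W - 1)
    yb = np.clip(np.rint(ys + flow_ab[..., 1]).astype(int), 0, H - 1)
    # Total displacement A->C = A->B displacement + B->C displacement at the landing point.
    return flow_ab + flow_bc[yb, xb]
```

Chaining a long interval from two shorter ones is then a single call; MFT selects among several such chains by their estimated reliability.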

A Large-Scale Homography Benchmark

  • Authors: Barath, D., Mgr. Dmytro Mishkin, Ph.D., Polic, M., Forstner, W., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2023. p. 21360-21370. ISSN 2575-7075. ISBN 979-8-3503-0129-8.
  • Year: 2023
  • DOI: 10.1109/CVPR52729.2023.02046
  • Link: https://doi.org/10.1109/CVPR52729.2023.02046
  • Department: Visual Recognition Group
  • Abstract:
    We present a large-scale dataset of Planes in 3D, Pi3D, of roughly 1000 planes observed in 10 000 images from the 1DSfM dataset, and HEB, a large-scale homography estimation benchmark leveraging Pi3D. The applications of the Pi3D dataset are diverse, e.g. training or evaluating monocular depth, surface normal estimation and image matching algorithms. The HEB dataset consists of 226 260 homographies and includes roughly 4M correspondences. The homographies link images that often undergo significant viewpoint and illumination changes. As applications of HEB, we perform a rigorous evaluation of a wide range of robust estimators and deep learning-based correspondence filtering methods, establishing the current state-of-the-art in robust homography estimation. We also evaluate the uncertainty of the SIFT orientations and scales w.r.t. the ground truth coming from the underlying homographies and provide codes for comparing uncertainty of custom detectors. The dataset is available at https://github.com/danini/homography-benchmark.
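Evaluating an estimated homography against a ground-truth one boils down to measuring transfer error on point correspondences. A minimal numpy sketch (one-way error only; the function name is mine and this is not the benchmark's evaluation code):

```python
import numpy as np

def homography_transfer_error(H, pts_src, pts_dst):
    """One-way transfer error: project pts_src with H and measure the
    Euclidean distance to pts_dst. Points are (N, 2) arrays; H is 3x3."""
    n = pts_src.shape[0]
    homog = np.hstack([pts_src, np.ones((n, 1))])   # to homogeneous coords
    proj = homog @ H.T
    proj = proj[:, :2] / proj[:, 2:3]               # back to inhomogeneous
    return np.linalg.norm(proj - pts_dst, axis=1)
```

For a correct homography and noise-free correspondences, the errors are zero; robust estimators are typically ranked by such errors aggregated over thresholds.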

Adaptive Reordering Sampler with Neurally Guided MAGSAC

  • Authors: Tong Wei, MSc., prof. Ing. Jiří Matas, Ph.D., Baráth, D.
  • Publication: ICCV2023: Proceedings of the International Conference on Computer Vision. Piscataway: IEEE, 2023. p. 18117-18127. ISSN 1550-5499. ISBN 979-8-3503-0719-1.
  • Year: 2023
  • DOI: 10.1109/ICCV51070.2023.01665
  • Link: https://doi.org/10.1109/ICCV51070.2023.01665
  • Department: Visual Recognition Group
  • Abstract:
    We propose a new sampler for robust estimators that always selects the sample with the highest probability of consisting only of inliers. After every unsuccessful iteration, the inlier probabilities are updated in a principled way via a Bayesian approach. The probabilities obtained by the deep network are used as prior (so-called neural guidance) inside the sampler. Moreover, we introduce a new loss that exploits, in a geometrically justifiable manner, the orientation and scale that can be estimated for any type of feature, e.g., SIFT or SuperPoint, to estimate two-view geometry. The new loss helps to learn higher-order information about the underlying scene geometry. Benefiting from the new sampler and the proposed loss, we combine the neural guidance with the state-of-the-art MAGSAC++. Adaptive Reordering Sampler with Neurally Guided MAGSAC (ARS-MAGSAC) is superior to the state-of-the-art in terms of accuracy and run-time on the PhotoTourism and KITTI datasets for essential and fundamental matrix estimation. The code and trained models are available at https://github.com/weitong8591/ars_magsac.
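The adaptive reordering idea — always try the currently most probable sample, then downgrade it after a failure — can be caricatured in a few lines of numpy. This is a toy stand-in: the function name and the crude multiplicative discount are mine, whereas the paper updates the inlier probabilities via a principled Bayesian rule:

```python
import numpy as np

def ars_sample(prior, k, max_iters, is_good_sample):
    """Adaptive reordering: always try the k points with the highest current
    inlier probability; after a failed attempt, discount those points so a
    different sample is tried next."""
    p = prior.astype(float).copy()
    for _ in range(max_iters):
        sample = np.argsort(-p)[:k]        # most probable k-tuple
        if is_good_sample(sample):
            return sample
        p[sample] *= 0.5                   # crude stand-in for the Bayesian update
    return None
```

With neural guidance, `prior` would come from a deep network; a misleading prior is corrected by the updates as unsuccessful samples are discounted.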

Binaural SoundNet: Predicting Semantics, Depth and Motion With Binaural Sounds

  • Authors: Dai, D., Vasudevan, A., prof. Ing. Jiří Matas, Ph.D., Van Gool, L.
  • Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023, 45(1), 123-136. ISSN 0162-8828.
  • Year: 2023
  • DOI: 10.1109/TPAMI.2022.3155643
  • Link: https://doi.org/10.1109/TPAMI.2022.3155643
  • Department: Visual Recognition Group
  • Abstract:
    Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene understanding purely based on binaural sounds. The considered tasks include predicting the semantic masks of sound-making objects, the motion of sound-making objects, and the depth map of the scene. To this aim, we propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360° camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of multiple vision 'teacher' methods and a sound 'student' method - the student method is trained to generate the same results as the teacher methods do. This way, the auditory system can be trained without using human annotations. To further boost the performance, we propose another novel auxiliary task, coined Spatial Sound Super-Resolution, to increase the directional resolution of sounds. We then formulate the four tasks into one end-to-end trainable multi-tasking network aiming to boost the overall performance. Experimental results show that 1) our method achieves good results for all four tasks, 2) the four tasks are mutually beneficial - training them together achieves the best performance, 3) the number and orientation of microphones are both important, and 4) features learned from the standard spectrogram and features obtained by the classic signal processing pipeline are complementary for auditory perception tasks. The data and code are released on the project page: https://www.trace.ethz.ch/publications/2020/sound_perception/index.html.

BOP Challenge 2022 on Detection, Segmentation and Pose Estimation of Specific Rigid Objects

  • Authors: Sundermeyer, M., Hodaň, T., Labbé, Y., Wang, G., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). USA: IEEE Computer Society, 2023. p. 2785-2794. ISSN 2160-7508. ISBN 979-8-3503-0250-9.
  • Year: 2023
  • DOI: 10.1109/CVPRW59228.2023.00279
  • Link: https://doi.org/10.1109/CVPRW59228.2023.00279
  • Department: Visual Recognition Group
  • Abstract:
    We present the evaluation methodology, datasets and results of the BOP Challenge 2022, the fourth in a series of public competitions organized with the goal to capture the status quo in the field of 6D object pose estimation from an RGB/RGB-D image. In 2022, we witnessed another significant improvement in the pose estimation accuracy: the state of the art, which was 56.9 AR_C in 2019 (Vidal et al.) and 69.8 AR_C in 2020 (CosyPose), moved to new heights of 83.7 AR_C (GDRNPP). Out of 49 pose estimation methods evaluated since 2019, the top 18 are from 2022. Methods based on point pair features, which were introduced in 2010 and achieved competitive results even in 2020, are now clearly outperformed by deep learning methods. The synthetic-to-real domain gap was again significantly reduced, with 82.7 AR_C achieved by GDRNPP trained only on synthetic images from BlenderProc. The fastest variant of GDRNPP reached 80.5 AR_C with an average time per image of 0.23s. Since most of the recent methods for 6D object pose estimation begin by detecting/segmenting objects, we also started evaluating 2D object detection and segmentation performance based on the COCO metrics. Compared to the Mask R-CNN results from CosyPose in 2020, detection improved from 60.3 to 77.3 AP_C and segmentation from 40.5 to 58.7 AP_C. The online evaluation system stays open and is available at: bop.felk.cvut.cz.
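The scores quoted above average recall over pose-error thresholds. A minimal sketch of that averaging (a simplification of mine: BOP combines three pose-error functions, VSD, MSSD and MSPD, each with its own threshold range):

```python
def average_recall(errors, thresholds):
    """Recall = fraction of pose errors under each threshold;
    AR = mean recall over the thresholds."""
    recalls = [sum(e < t for e in errors) / len(errors) for t in thresholds]
    return sum(recalls) / len(recalls)
```

For example, errors [0.1, 0.3, 0.5] against thresholds [0.2, 0.4, 0.6] give recalls 1/3, 2/3 and 1, hence AR = 2/3.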

Calibrated Out-of-Distribution Detection with a Generic Representation

  • DOI: 10.1109/ICCVW60793.2023.00485
  • Link: https://doi.org/10.1109/ICCVW60793.2023.00485
  • Department: Visual Recognition Group
  • Abstract:
    Out-of-distribution detection is a common issue in deploying vision models in practice and solving it is an essential building block in safety critical applications. Most of the existing OOD detection solutions focus on improving the OOD robustness of a classification model trained exclusively on in-distribution (ID) data. In this work, we take a different approach and propose to leverage generic pre-trained representation. We propose a novel OOD method, called GROOD, that formulates the OOD detection as a Neyman-Pearson task with well calibrated scores and which achieves excellent performance, predicated by the use of a good generic representation. Only a trivial training process is required for adapting GROOD to a particular problem. The method is simple, general, efficient, calibrated and with only a few hyper-parameters. The method achieves state-of-the-art performance on a number of OOD benchmarks, reaching near perfect performance on several of them. The source code is available at https://github.com/vojirt/GROOD.

DocILE Benchmark for Document Information Localization and Extraction

  • Authors: Šimsa, Š., Šulc, M., Uřičář, M., Patel, Y., prof. Ing. Jiří Matas, Ph.D.
  • Publication: ICDAR 2023: Proceedings of the Document Analysis and Recognition, Part II. Cham: Springer, 2023. p. 147-166. LNCS. vol. 14188. ISSN 0302-9743. ISBN 978-3-031-41678-1.
  • Year: 2023
  • DOI: 10.1007/978-3-031-41679-8_9
  • Link: https://doi.org/10.1007/978-3-031-41679-8_9
  • Department: Visual Recognition Group
  • Abstract:
    This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has been built with knowledge of domain- and task-specific aspects, resulting in the following key features: (i) annotations in 55 classes, which surpasses the granularity of previously published key information extraction datasets by a large margin; (ii) Line Item Recognition represents a highly practical information extraction task, where key information has to be assigned to items in a table; (iii) documents come from numerous layouts and the test set includes zero- and few-shot cases as well as layouts commonly seen in the training set. The benchmark comes with several baselines, including RoBERTa, LayoutLMv3 and DETR-based Table Transformer, applied to both tasks of the DocILE benchmark, with results shared in this paper, offering a quick starting point for future work. The dataset, baselines and supplementary material are available at https://github.com/rossumai/docile.

DoG Accuracy Via Equivariance: Get The Interpolation Right

  • DOI: 10.1109/ICIP49359.2023.10222153
  • Link: https://doi.org/10.1109/ICIP49359.2023.10222153
  • Department: Visual Recognition Group
  • Abstract:
    We study the influence of image interpolation algorithms on local feature detectors operating on a scale pyramid, focusing on the Difference-of-Gaussian, as used in SIFT. We show that commonly used implementations, such as in OpenCV and Kornia, are neither rotational nor scale equivariant. We present a simple solution and demonstrate its positive influence on the downstream image matching tasks. The implementation of the method has been accepted in standard libraries OpenCV and Kornia.

Efficient Visuo-Haptic Object Shape Completion for Robot Manipulation

  • DOI: 10.1109/IROS55552.2023.10342200
  • Link: https://doi.org/10.1109/IROS55552.2023.10342200
  • Department: Visual Recognition Group, Vision for Robots and Autonomous Systems
  • Abstract:
    For robot manipulation, a complete and accurate object shape is desirable. Here, we present a method that combines visual and haptic reconstruction in a closed-loop pipeline. From an initial viewpoint, the object shape is reconstructed using an implicit surface deep neural network. The location with highest uncertainty is selected for haptic exploration, the object is touched, the new information from touch and a new point cloud from the camera are added, object position is re-estimated and the cycle is repeated. We extend Rustler et al. (2022) by using a new theoretically grounded method to determine the points with highest uncertainty, and we increase the yield of every haptic exploration by adding not only the contact points to the point cloud but also incorporating the empty space established through the robot movement to the object. Additionally, the solution is compact in that the jaws of a closed two-finger gripper are directly used for exploration. The object position is re-estimated after every robot action and multiple objects can be present simultaneously on the table. We achieve a steady improvement with every touch using three different metrics and demonstrate the utility of the better shape reconstruction in grasping experiments on the real robot. On average, grasp success rate increases from 63.3% to 70.4% after a single exploratory touch and to 82.7% after five touches. The collected data and code are publicly available (https://osf.io/j6rkd/, https://github.com/ctu-vras/vishac).

Extended Overview of DocILE 2023: Document Information Localization and Extraction

  • Authors: Šimsa, Š., Uřičář, M., Šulc, M., Patel, Y., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Proceedings of the Working Notes of CLEF 2023. Aachen: CEUR Workshop Proceedings, 2023. p. 546-571. vol. 3497. ISSN 1613-0073.
  • Year: 2023
  • Department: Visual Recognition Group
  • Abstract:
    This paper provides an overview of the DocILE 2023 Competition, its tasks, participant submissions, the competition results and possible future research directions. This first edition of the competition focused on two Information Extraction tasks, Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR). Both of these tasks require detection of pre-defined categories of information in business documents. The second task additionally requires correctly grouping the information into tuples, capturing the structure laid out in the document. The competition used the recently published DocILE dataset and benchmark that stays open to new submissions. The diversity of the participant solutions indicates the potential of the dataset as the submissions included pure Computer Vision, pure Natural Language Processing, as well as multi-modal solutions and utilized all of the parts of the dataset, including the annotated, synthetic and unlabeled subsets. This is an extended version of the condensed overview paper [1].

Finding Geometric Models by Clustering in the Consensus Space

  • Authors: Barath, D., Ing. Denys Rozumnyi, Eichhardt, I., Hajder, L., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2023. p. 5414-5424. ISSN 2575-7075. ISBN 979-8-3503-0129-8.
  • Year: 2023
  • DOI: 10.1109/CVPR52729.2023.00524
  • Link: https://doi.org/10.1109/CVPR52729.2023.00524
  • Department: Visual Recognition Group
  • Abstract:
    We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. The problem is formalized as finding dominant model instances progressively without forming crisp point-to-model assignments. Dominant instances are found via a RANSAC-like sampling and a consolidation process driven by a model quality function considering previously proposed instances. New ones are found by clustering in the consensus space. This new formulation leads to a simple iterative algorithm with state-of-the-art accuracy while running in real-time on a number of vision problems - at least two orders of magnitude faster than the competitors on two-view motion estimation. Also, we propose a deterministic sampler reflecting the fact that real-world data tend to form spatially coherent structures. The sampler returns connected components in a progressively densified neighborhood-graph. We present a number of applications where the use of multiple geometric models improves accuracy. These include pose estimation from multiple generalized homographies; trajectory estimation of fast-moving objects; and we also propose a way of using multiple homographies in global SfM algorithms. Source code: https://github.com/danini/clustering-in-consensus-space.

Generalized Differentiable RANSAC

  • DOI: 10.1109/ICCV51070.2023.01618
  • Link: https://doi.org/10.1109/ICCV51070.2023.01618
  • Department: Visual Recognition Group
  • Abstract:
    We propose ∇-RANSAC, a generalized differentiable RANSAC that allows learning the entire randomized robust estimation pipeline. The proposed approach enables the use of relaxation techniques for estimating the gradients in the sampling distribution, which are then propagated through a differentiable solver. The trainable quality function marginalizes over the scores from all the models estimated within ∇-RANSAC to guide the network learning accurate and useful inlier probabilities or to train feature detection and matching networks. Our method directly maximizes the probability of drawing a good hypothesis, allowing us to learn better sampling distributions. We test ∇-RANSAC on various real-world scenarios on fundamental and essential matrix estimation, and 3D point cloud registration, outdoors and indoors, with handcrafted and learning-based features. It is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives. The code and trained models are available at https://github.com/weitong8591/differentiable_ransac.
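One standard way to make discrete sample selection amenable to relaxation, as used in differentiable-sampling pipelines of this kind, is Gumbel top-k sampling: perturb log-probabilities with Gumbel noise and take the top k, which draws a minimal sample without replacement. The numpy sketch below shows only the forward sampling step (the function name is mine; ∇-RANSAC's contribution is propagating gradients through this and the solver, which the sketch omits):

```python
import numpy as np

def gumbel_topk_sample(logits, k, rng):
    """Draw k indices without replacement, favouring high-logit entries:
    add Gumbel(0, 1) noise to the logits and take the top k."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    return np.argsort(-(logits + g))[:k]
```

In a learned robust estimator, `logits` would be the per-correspondence inlier scores predicted by the network.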

Guided Video Object Segmentation by Tracking

  • Authors: Pelhan, J., Kristan, M., Lukezic, A., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Electrotechnical Review. 2023, 90(4), 147-158. ISSN 0013-5852.
  • Year: 2023
  • Department: Visual Recognition Group
  • Abstract:
    The paper presents the Guided video object segmentation by tracking (gVOST) method for human-in-the-loop video object segmentation which significantly reduces the manual annotation effort. The method is designed for interactive object segmentation in a wide range of videos with minimal user input. The user iteratively selects and annotates a small set of anchor frames by just a few clicks on the object border. The segmentation is then propagated to intermediate frames. Experiments show that gVOST performs well on diverse and challenging videos used in visual object tracking (VOT2020 dataset), where it achieves an IoU of 73% with only 5% of the frames annotated by the user. This shortens the annotation time by 98% compared to the brute-force approach. gVOST outperforms the state-of-the-art interactive video object segmentation methods on the VOT2020 dataset and performs comparably on the less diverse DAVIS video object segmentation dataset.

Image-Consistent Detection of Road Anomalies As Unpredictable Patches

  • DOI: 10.1109/WACV56688.2023.00545
  • Link: https://doi.org/10.1109/WACV56688.2023.00545
  • Department: Visual Recognition Group
  • Abstract:
    We propose a novel method for anomaly detection primarily aiming at autonomous driving. The design of the method, called DaCUP (Detection of anomalies as Consistent Unpredictable Patches), is based on two general properties of anomalous objects: an anomaly is (i) not from a class that could be modelled and (ii) it is not similar (in appearance) to non-anomalous objects in the image. To this end, we propose a novel embedding bottleneck in an auto-encoder-like architecture that enables modelling of a diverse, multi-modal known class appearance (e.g. road). Secondly, we introduce novel image-conditioned distance features that allow known class identification in a nearest-neighbour manner on-the-fly, greatly increasing the ability to distinguish true and false positives. Lastly, an inpainting module is utilized to model the uniqueness of detected anomalies and significantly reduce false positives by filtering regions that are similar, thus reconstructable from their neighbourhood. We demonstrate that filtering of regions based on their similarity to neighbour regions, using e.g. an inpainting module, is general and can be used with other methods for reduction of false positives. The proposed method is evaluated on several publicly available datasets for road anomaly detection and on a maritime benchmark for obstacle avoidance. The method achieves state-of-the-art performance in both tasks with the same hyper-parameters with no domain specific design.

Overview of DocILE 2023: Document Information Localization and Extraction

  • Authors: Šimsa, Š., Uřičář, M., Šulc, M., Patel, Y., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Proceedings of the CLEF 2023: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Cham: Springer, 2023. p. 276-293. LNCS. vol. 14163. ISSN 0302-9743. ISBN 978-3-031-42447-2.
  • Year: 2023
  • DOI: 10.1007/978-3-031-42448-9_21
  • Link: https://doi.org/10.1007/978-3-031-42448-9_21
  • Department: Visual Recognition Group
  • Abstract:
    This paper provides an overview of the DocILE 2023 Competition, its tasks, participant submissions, the competition results and possible future research directions. This first edition of the competition focused on two Information Extraction tasks, Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR). Both of these tasks require detection of pre-defined categories of information in business documents. The second task additionally requires correctly grouping the information into tuples, capturing the structure laid out in the document. The competition used the recently published DocILE dataset and benchmark that stays open to new submissions. The diversity of the participant solutions indicates the potential of the dataset as the submissions included pure Computer Vision, pure Natural Language Processing, as well as multi-modal solutions and utilized all of the parts of the dataset, including the annotated, synthetic and unlabeled subsets.

Overview of FungiCLEF 2023: Fungi Recognition Beyond 1/0 Cost

  • Authors: Picek, L., Šulc, M., Chamidullin, R., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Proceedings of the Working Notes of CLEF 2023. Aachen: CEUR Workshop Proceedings, 2023. p. 1943-1953. vol. 3497. ISSN 1613-0073.
  • Year: 2023
  • Department: Visual Recognition Group
  • Abstract:
    Computer vision systems for fungi recognition aid mycologists, researchers, and enthusiasts in the efficient identification of mushroom species. FungiCLEF 2023, the second edition of the fungi recognition challenge at LifeCLEF, builds upon the Danish Fungi 2020 dataset and upon its predecessor by presenting several recognition tasks differing in the cost functions corresponding to different practical scenarios, including poisonous/edible decision making or discovering unseen species. With practical applications in mind, the 2023 challenge only accepted submissions with model size under 1GB. The competition received 16 final submissions from 3 teams. This overview paper provides a detailed description of the challenge data and tasks, a review of the submitted methods, and a discussion of the results.

Planar Object Tracking via Weighted Optical Flow

  • DOI: 10.1109/WACV56688.2023.00164
  • Link: https://doi.org/10.1109/WACV56688.2023.00164
  • Department: Visual Recognition Group
  • Abstract:
    We propose WOFT - a novel method for planar object tracking that estimates a full 8 degrees-of-freedom pose, i.e. the homography w.r.t. a reference view. The method uses a novel module that leverages dense optical flow and assigns a weight to each optical flow correspondence, estimating a homography by weighted least squares in a fully differentiable manner. The trained module assigns zero weights to incorrect correspondences (outliers) in most cases, making the method robust and eliminating the need for the typically used non-differentiable robust estimators like RANSAC. The proposed weighted optical flow tracker (WOFT) achieves state-of-the-art performance on two benchmarks, POT-210 [23] and POIC [7], tracking consistently well across a wide range of scenarios.
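The weighted least-squares step can be illustrated with a weighted DLT solve: each correspondence contributes two rows of a linear system scaled by its weight, and the homography is the null vector of that system. A minimal numpy sketch, under my own assumptions (no coordinate normalization, name mine; WOFT runs this inside a differentiable network with learned weights):

```python
import numpy as np

def weighted_dlt_homography(src, dst, w):
    """Weighted DLT. src, dst: (N, 2) point arrays; w: (N,) non-negative
    weights. A zero weight removes a correspondence from the solution."""
    rows = []
    for (x, y), (u, v), wi in zip(src, dst, w):
        # Two rows per correspondence, from u = (h1 x + h2 y + h3) / (h7 x + h8 y + h9), etc.
        rows.append(wi * np.array([-x, -y, -1, 0, 0, 0, u * x, u * y, u]))
        rows.append(wi * np.array([0, 0, 0, -x, -y, -1, v * x, v * y, v]))
    A = np.stack(rows)
    _, _, Vt = np.linalg.svd(A)        # null vector = last right singular vector
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

Down-weighting an outlier correspondence to zero recovers the same homography as if it had been discarded, which is exactly the behaviour the trained weighting module approximates.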

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results

  • DOI: 10.1109/ICCVW60793.2023.00195
  • Link: https://doi.org/10.1109/ICCVW60793.2023.00195
  • Department: Visual Recognition Group
  • Abstract:
    The Visual Object Tracking Segmentation VOTS2023 challenge is the eleventh annual tracker benchmarking activity of the VOT initiative. This challenge is the first to merge short-term and long-term as well as single-target and multiple-target tracking with segmentation masks as the only target location specification. A new dataset was created; the ground truth has been withheld to prevent overfitting. New performance measures and evaluation protocols have been created along with a new toolkit and an evaluation server. Results of the presented 47 trackers indicate that modern tracking frameworks are well-suited to deal with convergence of short-term and long-term tracking and that multiple and single target tracking can be considered a single problem. A leaderboard, with participating trackers details, the source code, the datasets, and the evaluation kit are publicly available at the challenge website.

The Tenth Visual Object Tracking VOT2022 Challenge Results

  • Authors: Kristan, M., Leonardis, A., prof. Ing. Jiří Matas, Ph.D., Mgr. Ondřej Drbohlav, Ph.D.
  • Publication: Computer Vision – ECCV 2022 Workshops, Part VIII. Cham: Springer, 2023. p. 431-460. Lecture Notes in Computer Science. vol. 13808. ISSN 0302-9743. ISBN 978-3-031-25084-2.
  • Year: 2023
  • DOI: 10.1007/978-3-031-25085-9_25
  • Link: https://doi.org/10.1007/978-3-031-25085-9_25
  • Department: Visual Recognition Group
  • Abstract:
    The Visual Object Tracking challenge VOT2022 is the tenth annual tracker benchmarking activity organized by the VOT initiative. Results of 93 entries are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2022 challenge was composed of seven sub-challenges focusing on different tracking domains: (i) VOT-STs2022 challenge focused on short-term tracking in RGB by segmentation, (ii) VOT-STb2022 challenge focused on short-term tracking in RGB by bounding boxes, (iii) VOT-RTs2022 challenge focused on “real-time” short-term tracking in RGB by segmentation, (iv) VOT-RTb2022 challenge focused on “real-time” short-term tracking in RGB by bounding boxes, (v) VOT-LT2022 focused on long-term tracking, namely coping with target disappearance and reappearance, (vi) VOT-RGBD2022 challenge focused on short-term tracking in RGB and depth imagery, and (vii) VOT-D2022 challenge focused on short-term tracking in depth-only imagery. New datasets were introduced in VOT-LT2022 and VOT-RGBD2022, VOT-ST2022 dataset was refreshed, and a training dataset was introduced for VOT-LT2022. The source code for most of the trackers, the datasets, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

Tracking by 3D Model Estimation of Unknown Objects in Videos

  • Authors: Ing. Denys Rozumnyi, prof. Ing. Jiří Matas, Ph.D., Pollefeys, M., Ferrari, V., Oswald, M.R.
  • Publication: ICCV2023: Proceedings of the International Conference on Computer Vision. Piscataway: IEEE, 2023. p. 14040-14050. ISSN 1550-5499. ISBN 979-8-3503-0719-1.
  • Year: 2023
  • DOI: 10.1109/ICCV51070.2023.01295
  • Link: https://doi.org/10.1109/ICCV51070.2023.01295
  • Department: Visual Recognition Group
  • Abstract:
    Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames, including frames where some points are invisible. To achieve that, the estimation is driven by re-rendering the input video frames as well as possible through differentiable rendering, which has not been used for tracking before. The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose. We improve the state-of-the-art in 2D segmentation tracking on three different datasets with mostly rigid objects.

Visual Object Tracking With Discriminative Filters and Siamese Networks: A Survey and Outlook

  • Authors: Javed, S., Danelljan, M., Khan, F., Khan, M., prof. Ing. Jiří Matas, Ph.D.
  • Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023, 45(5), 6552-6574. ISSN 0162-8828.
  • Year: 2023
  • DOI: 10.1109/TPAMI.2022.3212594
  • Link: https://doi.org/10.1109/TPAMI.2022.3212594
  • Department: Visual Recognition Group
  • Abstract:
    Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location and segmentation, or its rough approximation in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as dominating tracking paradigms, which have led to significant progress. Following the rapid evolution of visual object tracking in the last decade, this survey presents a systematic and thorough review of more than 90 DCFs and Siamese trackers, based on results in nine tracking benchmarks. First, we present the background theory of both the DCF and Siamese tracking core formulations. Then, we distinguish and comprehensively review the shared as well as specific open research challenges in both these tracking paradigms. Furthermore, we thoroughly analyze the performance of DCF and Siamese trackers on nine benchmarks, covering different experimental aspects of visual tracking: datasets, evaluation metrics, performance, and speed comparisons. We finish the survey by presenting recommendations and suggestions for distinguished open challenges based on our analysis.

A Discriminative Single-Shot Segmentation Network for Visual Object Tracking

  • Authors: Lukezic, A., prof. Ing. Jiří Matas, Ph.D., Kristan, M.
  • Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022, 44(12), 9742-9755. ISSN 0162-8828.
  • Year: 2022
  • DOI: 10.1109/TPAMI.2021.3137933
  • Link: https://doi.org/10.1109/TPAMI.2021.3137933
  • Department: Visual Recognition Group
  • Abstract:
    Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but are restricted to bounding box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker - D3S(2), which narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties, one invariant to a broad range of transformations, including non-rigid deformations, the other assuming a rigid object, to simultaneously achieve robust online target segmentation. The overall tracking reliability is further increased by decoupling the object and feature scale estimation. Without per-dataset finetuning, and trained only for segmentation as the primary output, D3S(2) outperforms all published trackers on the recent short-term tracking benchmark VOT2020 and performs very close to the state-of-the-art trackers on the GOT-10k, TrackingNet, OTB100 and LaSOT. D3S(2) outperforms the leading segmentation tracker SiamMask on video object segmentation benchmarks and performs on par with top video object segmentation algorithms.

Automatic Fungi Recognition: Deep Learning Meets Mycology

  • DOI: 10.3390/s22020633
  • Odkaz: https://doi.org/10.3390/s22020633
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The article presents an AI-based fungi species recognition system for a citizen-science community. The system's real-time identification tool, FungiVision, with a mobile application front-end, led to increased public interest in fungi, quadrupling the number of citizens collecting data. FungiVision, deployed with a human-in-the-loop, reaches nearly 93% accuracy. Using the collected data, we developed a novel fine-grained classification dataset, Danish Fungi 2020 (DF20), with several unique characteristics: species-level labels, a small number of errors, and rich observation metadata. The dataset enables testing the ability to improve classification using metadata, e.g., time, location, habitat and substrate; facilitates classifier calibration testing; and, finally, allows studying the impact of device settings on classification performance. The continual flow of labelled data supports improvements of the online recognition system. Finally, we present a novel method for the fungi recognition service, based on a Vision Transformer architecture. Trained on DF20 and exploiting the available metadata, it achieves a recognition error that is 46.75% lower than that of the current system. By providing a stream of labeled data in one direction, and an accuracy increase in the other, the collaboration creates a virtuous cycle helping both communities.

DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image

  • Autoři: Martyniuk, T., Kupyn, O., Kurlyak, Y., Krashenyi, I., prof. Ing. Jiří Matas, Ph.D., Sharmanska, V.
  • Publikace: Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022. p. 20910-20920. ISSN 2575-7075. ISBN 978-1-6654-6946-3.
  • Rok: 2022
  • DOI: 10.1109/CVPR52688.2022.02027
  • Odkaz: https://doi.org/10.1109/CVPR52688.2022.02027
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust model for 3D Dense Head Alignment in-the-wild. It contains annotations of over 3.5K landmarks that accurately represent 3D head shape when compared to the ground-truth scans. The data-driven model, DAD-3DNet, trained on our dataset, learns shape, expression, and pose parameters, and performs 3D reconstruction of a FLAME mesh. The model also incorporates a landmark prediction branch to take advantage of rich supervision and co-training of multiple related tasks. Experimentally, DAD-3DNet outperforms or is comparable to the state-of-the-art models in (i) 3D Head Pose Estimation on AFLW2000-3D and BIWI, (ii) 3D Face Shape Reconstruction on NoW and Feng, and (iii) 3D Dense Head Alignment and 3D Landmark Estimation on the DAD-3DHeads dataset. Finally, the diversity of DAD-3DHeads in camera angles, facial expressions, and occlusions enables a benchmark to study in-the-wild generalization and robustness to distribution shifts. The dataset webpage is https://p.farm/research/dad-3dheads.

Danish Fungi 2020 - Not Just Another Image Recognition Dataset

  • Autoři: Picek, L., Šulc, M., prof. Ing. Jiří Matas, Ph.D., Jeppesen, T.S.
  • Publikace: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022. USA: IEEE Computer Society, 2022. p. 3281-3291. ISSN 2642-9381. ISBN 978-1-6654-0915-5.
  • Rok: 2022
  • DOI: 10.1109/WACV51458.2022.00334
  • Odkaz: https://doi.org/10.1109/WACV51458.2022.00334
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We introduce a novel fine-grained dataset and benchmark, the Danish Fungi 2020 (DF20). The dataset, constructed from observations submitted to the Atlas of Danish Fungi, is unique in its taxonomy-accurate class labels, small number of errors, highly unbalanced long-tailed class distribution, rich observation metadata, and well-defined class hierarchy. DF20 has zero overlap with ImageNet, allowing unbiased comparison of models fine-tuned from publicly available ImageNet checkpoints. The proposed evaluation protocol enables testing the ability to improve classification using metadata, e.g., precise geographic location, habitat, and substrate; facilitates classifier calibration testing; and, finally, allows studying the impact of device settings on classification performance. Experiments using Convolutional Neural Networks (CNN) and the recent Vision Transformers (ViT) show that DF20 presents a challenging task. Interestingly, ViT achieves results superior to CNN baselines with 80.45% accuracy and 0.743 macro F1 score, reducing the CNN error by 9% and 12%, respectively. A simple procedure for including metadata in the decision process improves the classification accuracy by more than 2.95 percentage points, reducing the error rate by 15%. The source code for all methods and experiments is available at https://sites.google.com/view/danish-fungi-dataset.

Early queen infection shapes developmental dynamics and induces long-term disease protection in incipient ant colonies

  • Autoři: Casillas-Perez, B., Pull, Ch.D., Naiser, F., Naderlinger, E., prof. Ing. Jiří Matas, Ph.D., Cremer, S.
  • Publikace: Ecology Letters. 2022, 25(1), 89-100. ISSN 1461-023X.
  • Rok: 2022
  • DOI: 10.1111/ele.13907
  • Odkaz: https://doi.org/10.1111/ele.13907
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Infections early in life can have enduring effects on an organism's development and immunity. In this study, we show that this equally applies to developing 'superorganisms', that is, incipient social insect colonies. When we exposed newly mated Lasius niger ant queens to a low pathogen dose, their colonies grew more slowly than controls before winter, but reached similar sizes afterwards. Independent of exposure, queen hibernation survival improved when the ratio of pupae to workers was small. Queens that reared fewer pupae before worker emergence exhibited lower pathogen levels, indicating that high brood-rearing effort interferes with the ability of the queen's immune system to suppress pathogen proliferation. Early-life queen pathogen exposure also improved the immunocompetence of her worker offspring, as demonstrated by challenging the workers with the same pathogen a year later. Transgenerational transfer of the queen's pathogen experience to her workforce can hence durably reduce the disease susceptibility of the whole superorganism.

FEAR: Fast, Efficient, Accurate and Robust Visual Tracker

  • Autoři: Borsuk, V., Vei, R., Kupyn, O., Martyniuk, T., Krashenyi, I., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision - ECCV 2022, Part XXII. Springer, Cham, 2022. p. 644-663. LNCS. vol. 13682. ISSN 0302-9743. ISBN 978-3-031-20046-5.
  • Rok: 2022
  • DOI: 10.1007/978-3-031-20047-2_37
  • Odkaz: https://doi.org/10.1007/978-3-031-20047-2_37
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present FEAR, a family of fast, efficient, accurate, and robust Siamese visual trackers. We present a novel and efficient way to benefit from a dual-template representation for object model adaptation, which incorporates temporal information with only a single learnable parameter. We further improve the tracker architecture with a pixel-wise fusion block. By plugging in sophisticated backbones with the above-mentioned modules, the FEAR-M and FEAR-L trackers surpass most Siamese trackers on several academic benchmarks in both accuracy and efficiency. With a lightweight backbone, the optimized version FEAR-XS offers more than 10 times faster tracking than current Siamese trackers while maintaining near state-of-the-art results. The FEAR-XS tracker is 2.4x smaller and 4.3x faster than LightTrack, with superior accuracy. In addition, we broaden the definition of model efficiency by introducing the FEAR benchmark, which assesses energy consumption and execution speed. We show that energy consumption is a limiting factor for trackers on mobile devices. Source code, pretrained models, and the evaluation protocol are available at https://github.com/PinataFarms/FEARTracker.
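    The dual-template adaptation described above can be sketched compactly: the object model is a convex blend of the initial and the current template, controlled by one learnable scalar. This is an assumption-level illustration, not the released implementation; in FEAR the blend operates on learned feature maps, and `fuse_templates` is a hypothetical name.

```python
import numpy as np

def fuse_templates(init_feat, current_feat, w):
    """Blend the static initial template with the dynamic current one using
    a single scalar parameter w (a sketch of the dual-template idea; the
    real module operates on learned feature maps)."""
    w = float(np.clip(w, 0.0, 1.0))  # keep the blend convex
    return (1.0 - w) * np.asarray(init_feat, float) + w * np.asarray(current_feat, float)
```

With w = 0 the tracker keeps its initial model; w near 1 adapts aggressively to the latest appearance.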

Graph-Cut RANSAC: Local Optimization on Spatially Coherent Structures

  • Autoři: Baráth, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022, 44(9), 4961-4974. ISSN 0162-8828.
  • Rok: 2022
  • DOI: 10.1109/TPAMI.2021.3071812
  • Odkaz: https://doi.org/10.1109/TPAMI.2021.3071812
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose Graph-Cut RANSAC, GC-RANSAC in short, a new robust geometric model estimation method where the local optimization step is formulated as energy minimization with binary labeling, applying the graph-cut algorithm to select inliers. The minimized energy reflects the assumption that geometric data often form spatially coherent structures: it includes both a unary component representing point-to-model residuals and a binary term promoting spatially coherent inlier-outlier labelling of neighboring points. The proposed local optimization step is conceptually simple, easy to implement, and efficient, with a globally optimal inlier selection given the model parameters. Graph-Cut RANSAC, equipped with "the bells and whistles" of USAC and MAGSAC++, was tested on a range of problems using a number of publicly available datasets for homography, 6D object pose, and fundamental and essential matrix estimation. It is more geometrically accurate than state-of-the-art robust estimators, fails less often, and runs faster than or at a speed similar to less accurate alternatives. The source code is available at https://github.com/danini/graph-cut-ransac.
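    The minimized energy can be illustrated with a small sketch: a unary term charging inliers their normalized residual (and outliers a constant cost), plus a Potts-style pairwise term penalizing neighboring points with different labels. The cost choices here are assumptions for illustration, not the paper's exact energy, and `labeling_energy` is a hypothetical helper; the actual method minimizes such an energy with graph cuts rather than evaluating it for a fixed labeling.

```python
import numpy as np

def labeling_energy(residuals, labels, neighbors, threshold, lam=1.0):
    """Energy of a binary inlier/outlier labeling (illustrative sketch).

    residuals : (n,) point-to-model residuals
    labels    : (n,) 0/1 array, 1 = inlier
    neighbors : list of (i, j) index pairs of spatially neighboring points
    """
    residuals = np.asarray(residuals, dtype=float)
    labels = np.asarray(labels)
    # Unary term: inliers pay their residual normalized by the threshold,
    # outliers pay a constant cost of 1.
    unary = np.where(labels == 1, residuals / threshold, 1.0).sum()
    # Pairwise Potts term: neighboring points with different labels are
    # penalized, promoting spatially coherent inlier structures.
    pairwise = sum(1 for i, j in neighbors if labels[i] != labels[j])
    return unary + lam * pairwise
```

A spatially coherent labeling of low-residual points has lower energy than a fragmented one, which is why the graph-cut minimizer favors it.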

Human keypoint detection for close proximity human-robot interaction

  • DOI: 10.1109/Humanoids53995.2022.10000133
  • Odkaz: https://doi.org/10.1109/Humanoids53995.2022.10000133
  • Pracoviště: Skupina vizuálního rozpoznávání, Vidění pro roboty a autonomní systémy
  • Anotace:
    We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction. The detection in this scenario is specific in that only a subset of body parts such as hands and torso are in the field of view. In particular, (i) we survey existing datasets with human pose annotation from the perspective of close proximity images and prepare and make publicly available a new Human in Close Proximity (HiCP) dataset; (ii) we quantitatively and qualitatively compare state-of-the-art human whole-body 2D keypoint detection methods (OpenPose, MMPose, AlphaPose, Detectron2) on this dataset; (iii) since accurate detection of hands and fingers is critical in applications with handovers, we evaluate the performance of the MediaPipe hand detector; (iv) we deploy the algorithms on a humanoid robot with an RGB-D camera on its head and evaluate the performance in 3D human keypoint detection. A motion capture system is used as reference. The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection. Thus, we propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection. We also analyse the failure modes of individual detectors, for example, to what extent the absence of the head of the person in the image degrades performance. Finally, we demonstrate the framework in a scenario where a humanoid robot interacting with a person uses the detected 3D keypoints for whole-body avoidance maneuvers.

Lightweight Monocular Depth with a Novel Neural Architecture Search Method

  • Autoři: Huynh, L., Nguyen, P., prof. Ing. Jiří Matas, Ph.D., Rahtu, E., Heikkilä, J.
  • Publikace: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022. USA: IEEE Computer Society, 2022. p. 326-336. ISSN 2642-9381. ISBN 978-1-6654-0915-5.
  • Rok: 2022
  • DOI: 10.1109/WACV51458.2022.00040
  • Odkaz: https://doi.org/10.1109/WACV51458.2022.00040
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This paper presents a novel neural architecture search method, called LiDNAS, for generating lightweight monocular depth estimation models. Unlike previous neural architecture search (NAS) approaches, where finding optimized networks is computationally demanding, the introduced Assisted Tabu Search leads to efficient architecture exploration. Moreover, we construct the search space on a pre-defined backbone network to balance layer diversity and search space size. The LiDNAS method outperforms the state-of-the-art NAS approach proposed for disparity and depth estimation in terms of search efficiency and output model performance. The LiDNAS-optimized models achieve results superior to the compact depth estimation state of the art on NYU-Depth-v2, KITTI, and ScanNet, while being 7%-500% more compact in size, i.e., in the number of model parameters.

Marginalizing Sample Consensus

  • Autoři: Baráth, D., Nosková, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022, 44(11), 8420-8432. ISSN 0162-8828.
  • Rok: 2022
  • DOI: 10.1109/TPAMI.2021.3103562
  • Odkaz: https://doi.org/10.1109/TPAMI.2021.3103562
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A new method for robust estimation, MAGSAC++, is proposed. It introduces a new model quality (scoring) function that does not make inlier-outlier decisions, and a novel marginalization procedure formulated as an M-estimation with a novel class of M-estimators (a robust kernel), solved by an iteratively re-weighted least squares procedure. Instead of the inlier-outlier threshold, it requires only a loose upper bound on it, which can be chosen from a significantly wider range. Also, we propose a new termination criterion and a technique for selecting a set of inliers in a data-driven manner as a post-processing step after the robust estimation finishes. On a number of publicly available real-world datasets for homography, fundamental matrix fitting and relative pose estimation, MAGSAC++ produces results superior to the state-of-the-art robust methods. It is more geometrically accurate, fails fewer times, and is often faster. MAGSAC++ is shown to be significantly less sensitive to the setting of its threshold upper bound than the other state-of-the-art algorithms are to the inlier-outlier threshold. It is therefore easier to apply to unseen problems and scenes without manually tuning the inlier-outlier threshold. The source code and examples, both in C++ and Python, are available at https://github.com/danini/magsac.
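    The M-estimation view can be illustrated by a generic iteratively re-weighted least squares loop for line fitting, where only a loose upper bound on the noise scale is needed. This sketch uses a simple Gaussian-style weight, not the MAGSAC++ kernel, and `irls_line_fit` is a hypothetical name.

```python
import numpy as np

def irls_line_fit(x, y, sigma_max=1.0, iters=20):
    """Iteratively re-weighted least squares for y = a*x + b with a robust
    weight that decays with the residual (illustrative; not the paper's
    kernel). Only an upper bound sigma_max on the noise scale is needed."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    A = np.stack([x, np.ones_like(x)], axis=1)
    w = np.ones_like(x)
    for _ in range(iters):
        # Weighted least-squares solve with the current weights.
        theta, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
        r = y - A @ theta
        # Gaussian-like weights: points with residuals far beyond
        # sigma_max receive weights close to zero.
        w = np.exp(-0.5 * (r / sigma_max) ** 2)
    return theta  # (a, b)
```

Because only an upper bound on the noise scale enters the weight, the loop tolerates a loosely chosen threshold, which is the practical point of the marginalized formulation.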

Overview of FungiCLEF 2022: Fungi Recognition as an Open Set Classification Problem

  • Autoři: Picek, L., Šulc, M., prof. Ing. Jiří Matas, Ph.D., Heilmann-Clausen, J.
  • Publikace: Proceedings of the Working Notes of CLEF 2022. Aachen: CEUR Workshop Proceedings, 2022. p. 1970-1981. vol. 3180. ISSN 1613-0073.
  • Rok: 2022
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The main goal of the new LifeCLEF challenge, FungiCLEF 2022: Fungi Recognition as an Open Set Classification Problem, was to provide an evaluation ground for end-to-end fungi species recognition in an open class set scenario. An AI-based fungi species recognition system deployed in the Atlas of Danish Fungi helps mycologists to collect valuable data and allows users to learn about fungi species identification. Advances in fungi recognition from images and metadata will allow continuous improvement of the system deployed in this citizen science project. The training set is based on the Danish Fungi 2020 dataset and contains 295,938 photographs of 1,604 species. For testing, we provided a collection of 59,420 expert-approved observations collected in 2021. The test set includes 1,165 species from the training set and 1,969 unknown species, leading to an open-set recognition problem. This paper provides (i) a description of the challenge task and datasets, (ii) a summary of the evaluation methodology, (iii) a review of the systems submitted by the participating teams, and (iv) a discussion of the challenge results.

Plant recognition by AI: Deep neural nets, transformers, and kNN in deep embeddings

  • DOI: 10.3389/fpls.2022.787527
  • Odkaz: https://doi.org/10.3389/fpls.2022.787527
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The article reviews and benchmarks machine learning methods for automatic image-based plant species recognition and proposes a novel retrieval-based method for recognition by nearest-neighbor classification in a deep embedding space. The image retrieval method relies on a model trained via the Recall@k surrogate loss. State-of-the-art approaches to image classification, based on Convolutional Neural Networks (CNN) and Vision Transformers (ViT), are benchmarked and compared with the proposed image retrieval-based method. The impact of performance-enhancing techniques, e.g., class prior adaptation, image augmentations, learning rate scheduling, and loss functions, is studied. The evaluation is carried out on the PlantCLEF 2017, ExpertLifeCLEF 2018, and iNaturalist 2018 datasets, the largest publicly available datasets for plant recognition. The evaluation of CNN and ViT classifiers shows a gradual improvement in classification accuracy. The current state-of-the-art Vision Transformer model, ViT-Large/16, achieves 91.15% and 83.54% accuracy on the PlantCLEF 2017 and ExpertLifeCLEF 2018 test sets, respectively, reducing the error rate of the best CNN model (ResNeSt-269e) by 22.91% and 28.34%. Apart from that, additional tricks increased the performance of ViT-Base/32 by 3.72% on ExpertLifeCLEF 2018 and by 4.67% on PlantCLEF 2017. The retrieval approach achieved superior performance in all measured scenarios, with accuracy margins of 0.28%, 4.13%, and 10.25% on ExpertLifeCLEF 2018, PlantCLEF 2017, and iNat2018-Plantae, respectively.
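    The retrieval-based recognition step itself reduces to nearest-neighbor classification in the embedding space. A minimal cosine-similarity sketch follows; the embedding model, trained with the Recall@k surrogate loss, is assumed given, and `knn_classify` is a hypothetical helper.

```python
import numpy as np

def knn_classify(query, db_embeddings, db_labels, k=5):
    """Classify a query embedding by majority vote over its k most similar
    database embeddings under cosine similarity (a minimal sketch of the
    retrieval-based recognition step)."""
    q = query / np.linalg.norm(query)
    db = db_embeddings / np.linalg.norm(db_embeddings, axis=1, keepdims=True)
    sims = db @ q                        # cosine similarities
    top = np.argsort(-sims)[:k]          # indices of the k nearest items
    labels, counts = np.unique(np.asarray(db_labels)[top], return_counts=True)
    return labels[np.argmax(counts)]     # majority vote
```

In practice the vote can be similarity-weighted, but the plain majority already conveys the mechanism.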

Point Cloud Color Constancy

  • Autoři: Xing, X., Qian, Y., Feng, S., Dong, Y., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022. p. 19718-19727. ISSN 2575-7075. ISBN 978-1-6654-6946-3.
  • Rok: 2022
  • DOI: 10.1109/CVPR52688.2022.01913
  • Odkaz: https://doi.org/10.1109/CVPR52688.2022.01913
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    In this paper, we present Point Cloud Color Constancy, PCCC in short, an illumination chromaticity estimation algorithm exploiting a point cloud. We leverage the depth information captured by a time-of-flight (ToF) sensor rigidly mounted with the RGB sensor, and form a 6D cloud where each point contains the coordinates and RGB intensities, denoted as (x, y, z, r, g, b). PCCC applies the PointNet architecture to the color constancy problem, deriving the illumination vector point-wise and then making a global decision about the global illumination chromaticity. On two popular RGB-D datasets, which we extend with illumination information, as well as on a novel benchmark, PCCC obtains lower error than the state-of-the-art algorithms. Our method is simple and fast, requiring merely a 16 x 16-sized input and reaching speeds over 140 fps (CPU time), including the cost of building the point cloud and the network inference.
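    Building the 6D input cloud amounts to back-projecting each depth pixel through a pinhole camera model and attaching its RGB intensities. The sketch below assumes aligned sensors and known intrinsics; `rgbd_to_6d_cloud` is a hypothetical name, and the paper's actual preprocessing may differ.

```python
import numpy as np

def rgbd_to_6d_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project an aligned RGB-D image pair into a 6D point cloud
    (x, y, z, r, g, b), a minimal sketch of the PCCC input construction.

    depth : (H, W) depth map
    rgb   : (H, W, 3) color image aligned with the depth map
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx   # pinhole back-projection
    y = (v - cy) * z / fy
    cloud = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = cloud[:, 2] > 0  # drop pixels without a depth measurement
    return np.concatenate([cloud[valid], colors[valid]], axis=1)
```

The resulting (n, 6) array is the kind of per-point input a PointNet-style network consumes.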

Pose-graph via Adaptive Image Re-ordering

  • Autoři: Baráth, D., Nosková, J., Eichhardt, I., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: The 33rd British Machine Vision Conference Proceedings. Durham: The British Machine Vision Association and Society for Pattern Recognition, 2022.
  • Rok: 2022
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We introduce novel methods that speed up the pose-graph generation for global Structure-from-Motion algorithms. We replace the widely used "accept-or-reject" strategy for image pairs, where often thousands of RANSAC iterations are wasted on pairs with a low inlier ratio or on non-matchable ones. The new algorithm exploits the fact that every unsuccessful RANSAC iteration reduces the probability of an image pair being matchable, i.e., it reduces its inlier ratio expectation. The method always selects the most promising pair for matching. While running RANSAC on the pair, it updates the distribution of its inlier ratio probability in a principled way via a Bayesian approach. Once the expected inlier ratio drops below an adaptive threshold, the method puts the pair back into the processing queue, ordered by the updated inlier ratio expectations. The algorithms are tested on more than 600k real image pairs. They accelerate the pose-graph generation by an order of magnitude on average. The code will be made available.
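    The Bayesian update of the inlier-ratio belief can be sketched on a discretized grid: a minimal sample of size m is all-inlier with probability eps^m, so each unsuccessful iteration multiplies the belief by 1 - eps^m and lowers the posterior weight of high inlier ratios. This is an illustrative sketch of the idea, not the paper's exact formulation.

```python
import numpy as np

def update_inlier_ratio(prior, eps_grid, sample_size, failed_iters):
    """Bayesian update of a discretized inlier-ratio distribution after
    RANSAC iterations that did not yield a good model (illustrative sketch).

    prior        : (k,) probabilities over eps_grid
    eps_grid     : (k,) candidate inlier ratios in (0, 1)
    sample_size  : size m of a minimal sample
    failed_iters : number of unsuccessful iterations observed
    """
    # P(a minimal sample is NOT all-inlier | eps) = 1 - eps**m
    fail_lik = (1.0 - eps_grid ** sample_size) ** failed_iters
    post = prior * fail_lik
    return post / post.sum()
```

As failures accumulate, the expected inlier ratio drops; once it falls below the adaptive threshold, the pair is demoted in the processing queue.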

Recall@k Surrogate Loss with Large Batches and Similarity Mixup

  • DOI: 10.1109/CVPR52688.2022.00735
  • Odkaz: https://doi.org/10.1109/CVPR52688.2022.00735
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This work focuses on learning deep visual representation models for retrieval by exploring the interplay between a new loss function, the batch size, and a new regularization approach. Direct optimization of an evaluation metric by gradient descent is not possible when the metric is non-differentiable, which is the case for recall in retrieval. A differentiable surrogate loss for recall is proposed in this work. Using an implementation that sidesteps the hardware constraints of GPU memory, the method trains with a very large batch size, which is essential for metrics computed on the entire retrieval database. It is assisted by an efficient mixup regularization approach that operates on pairwise scalar similarities and virtually increases the batch size further. The suggested method achieves state-of-the-art performance in several image retrieval benchmarks when used for deep metric learning. For instance-level recognition, the method outperforms similar approaches that train using an approximation of average precision.
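    The idea of a differentiable recall surrogate can be sketched by replacing hard rank comparisons with sigmoids: the rank of the relevant item is approximated by a sum of smoothed similarity comparisons, and membership in the top k is tested by one more sigmoid. This is an illustrative relaxation; the paper's exact loss and temperatures may differ.

```python
import numpy as np

def sigmoid(t, tau=0.01):
    """Temperature-scaled logistic function: a smooth step."""
    return 1.0 / (1.0 + np.exp(-t / tau))

def smooth_recall_at_k(sim_pos, sim_neg, k, tau=0.05):
    """Differentiable surrogate of recall@k for a single query with one
    relevant item (illustrative sketch).

    sim_pos : similarity of the relevant item to the query
    sim_neg : (n,) similarities of all other items to the query
    """
    # Smoothly count how many negatives outrank the positive.
    approx_rank = 1.0 + np.sum(sigmoid(sim_neg - sim_pos, tau))
    # Smooth test of approx_rank <= k; near 1 when the positive is in top-k.
    return sigmoid(k - approx_rank, tau=1.0)
```

Being a composition of sigmoids, the surrogate has usable gradients with respect to the similarities, which is what enables direct gradient-descent training.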

The Hitchhiker's Guide to Prior-Shift Adaptation

  • Autoři: Šipka, T., Šulc, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022. USA: IEEE Computer Society, 2022. p. 2031-2039. ISSN 2642-9381. ISBN 978-1-6654-0915-5.
  • Rok: 2022
  • DOI: 10.1109/WACV51458.2022.00209
  • Odkaz: https://doi.org/10.1109/WACV51458.2022.00209
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    In many computer vision classification tasks, class priors at test time often differ from the priors on the training set. In the case of such a prior shift, classifiers must be adapted correspondingly to maintain close-to-optimal performance. This paper analyzes methods for the adaptation of probabilistic classifiers to new priors and for estimating new priors on an unlabeled test set. We propose a novel method to address a known issue of prior estimation methods based on confusion matrices, where inconsistent estimates of decision probabilities and confusion matrices lead to negative values in the estimated priors. Experiments on fine-grained image classification datasets provide insight into the best practice of prior shift estimation and classifier adaptation, and show that the proposed method achieves state-of-the-art results in prior adaptation. Applying the best practice to two tasks with naturally imbalanced priors, learning from web-crawled images and plant species classification, increased the recognition accuracy by 1.1% and 3.4%, respectively.
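    The classical adaptation step analyzed here is the re-weighting of the classifier posterior by the ratio of new to training priors, p_new(y|x) ∝ p(y|x) · p_new(y) / p_train(y). A minimal sketch follows; the harder part the paper addresses, estimating the new priors on unlabeled data, is assumed solved, and `adapt_to_new_priors` is a hypothetical name.

```python
import numpy as np

def adapt_to_new_priors(probs, train_priors, test_priors):
    """Adapt probabilistic classifier outputs to shifted class priors via
    the standard Bayes correction (a sketch, not the paper's full
    estimation pipeline): p_new(y|x) is proportional to
    p(y|x) * p_new(y) / p_train(y)."""
    probs = np.asarray(probs, float)
    w = np.asarray(test_priors, float) / np.asarray(train_priors, float)
    adapted = probs * w
    return adapted / adapted.sum(axis=-1, keepdims=True)
```

Even a mildly confident prediction can flip class once the priors shift strongly, which is why unadapted classifiers degrade under prior shift.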

Trans2k: Unlocking the Power of Deep Models for Transparent Object Tracking

  • Autoři: Trojer, Z., Lukezic, A., prof. Ing. Jiří Matas, Ph.D., Kristan, M.
  • Publikace: The 33rd British Machine Vision Conference Proceedings. Durham: The British Machine Vision Association and Society for Pattern Recognition, 2022.
  • Rok: 2022
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Visual object tracking has focused predominantly on opaque objects, while transparent object tracking received very little attention. Motivated by the uniqueness of transparent objects in that their appearance is directly affected by the background, the first dedicated evaluation dataset has emerged recently. We contribute to this effort by proposing the first transparent object tracking training dataset Trans2k that consists of over 2k sequences with 104,343 images overall, annotated by bounding boxes and segmentation masks. Noting that transparent objects can be realistically rendered by modern renderers, we quantify domain-specific attributes and render the dataset containing visual attributes and tracking situations not covered in the existing object training datasets. We observe a consistent performance boost (up to 16%) across a diverse set of modern tracking architectures when trained using Trans2k, and show insights not previously possible due to the lack of appropriate training sets. The dataset and the rendering engine will be publicly released to unlock the power of modern learning-based trackers and foster new designs in transparent object tracking.

A deep learning method for visual recognition of snake species

  • Autoři: Chamidullin, R., Šulc, M., prof. Ing. Jiří Matas, Ph.D., Picek, L.
  • Publikace: Proceedings of the 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum. CEUR Workshop Proceedings, 2021. p. 1512-1525. CEUR Workshop Proceedings. vol. 2936. ISSN 1613-0073.
  • Rok: 2021
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper presents a method for image-based snake species identification. The proposed method is based on deep residual neural networks (ResNeSt, ResNeXt and ResNet) fine-tuned from ImageNet pre-trained checkpoints. We achieve performance improvements by: discarding predictions of species that do not occur in the country of the query; combining predictions from an ensemble of classifiers; and applying mixed precision training, which allows training neural networks with a larger batch size. We experimented with loss functions inspired by the considered metrics: soft F1 loss and weighted cross-entropy loss. However, the standard cross-entropy loss achieved superior results both in accuracy and in F1 measures. The proposed method scored third in the SnakeCLEF 2021 challenge, achieving 91.6% classification accuracy, a Country F1 Score of 0.860, and an F1 Score of 0.830.

Acoustic vehicle speed estimation from single sensor measurements

  • Autoři: Djukanovic, S., prof. Ing. Jiří Matas, Ph.D., Virtanen, T.
  • Publikace: IEEE Sensors Journal. 2021, 21(20), 23317-23324. ISSN 1530-437X.
  • Rok: 2021
  • DOI: 10.1109/JSEN.2021.3110009
  • Odkaz: https://doi.org/10.1109/JSEN.2021.3110009
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper addresses acoustic vehicle speed estimation using single sensor measurements. We introduce a new speed-dependent feature based on the attenuation of the sound amplitude. The feature is predicted from the audio signal and used as input to a regression model for speed estimation. For this research, we have collected, annotated, and published a dataset of audio-video recordings of single vehicles passing by the camera at a known constant speed. The dataset contains 304 urban-environment real-field recordings of ten different vehicles. The proposed method is trained and tested on the collected dataset. Experiments show that it is able to accurately predict the pass-by instant of a vehicle and to estimate its speed with an average error of 7.39 km/h. When the speed is discretized into intervals of 10 km/h, the proposed method achieves the average accuracy of 53.2% for correct interval prediction and 93.4% when misclassification of one interval is allowed. Experiments also show that sound disturbances, such as wind, severely affect acoustic speed estimation.

Ballroom Dance Recognition from Audio Recordings

  • Autoři: Pavlín, T., Ing. Jan Čech, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2020 25th International Conference on Pattern Recognition (ICPR). Los Alamitos: IEEE Computer Society, 2021. p. 2142-2149. ISSN 1051-4651. ISBN 978-1-7281-8808-9.
  • Rok: 2021
  • DOI: 10.1109/ICPR48806.2021.9412255
  • Odkaz: https://doi.org/10.1109/ICPR48806.2021.9412255
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a CNN-based approach to classify ten genres of ballroom dance from audio recordings, five Latin and five standard: Cha Cha Cha, Jive, Paso Doble, Rumba, Samba, Quickstep, Slow Foxtrot, Slow Waltz, Tango and Viennese Waltz. We compute a spectrogram of the audio signal and treat it as an image that forms the input of the CNN. The classification is performed independently on 5-second spectrogram segments in a sliding-window fashion, and the results are then aggregated. The method was tested on the following datasets: the publicly available Extended Ballroom dataset collected by Marchand and Peeters, 2016, and two YouTube datasets collected by us, one in studio quality and the other, more challenging, recorded on mobile phones. The method achieved accuracies of 93.9%, 96.7% and 89.8%, respectively. The method runs in real time. We implemented a web application to demonstrate the proposed method.
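    The segment-level decisions can be combined as follows; a minimal sketch that averages the per-segment class probabilities over all 5-second windows (averaging is an assumption here, and the paper's exact aggregation rule may differ).

```python
import numpy as np

def aggregate_segments(segment_probs):
    """Aggregate per-segment class probabilities (one row per 5-second
    spectrogram window) into a single recording-level prediction by
    averaging (an illustrative aggregation sketch)."""
    mean_probs = np.asarray(segment_probs, float).mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs
```

Averaging over many windows suppresses the occasional misclassified segment, which is the point of sliding-window aggregation.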

Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion

  • Autoři: Huynh, L., Nguyen, P., prof. Ing. Jiří Matas, Ph.D., Rahtu, E., Heikkilä, J.
  • Publikace: ICCV2021: Proceedings of the International Conference on Computer Vision. Piscataway: IEEE, 2021. p. 12747-12756. ISSN 2380-7504. ISBN 978-1-6654-2812-5.
  • Rok: 2021
  • DOI: 10.1109/ICCV48922.2021.01253
  • Odkaz: https://doi.org/10.1109/ICCV48922.2021.01253
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    In this paper, we propose enhancing monocular depth estimation by adding 3D points as depth guidance. Unlike existing depth completion methods, our approach performs well on extremely sparse and unevenly distributed point clouds, which makes it agnostic to the source of the 3D points. We achieve this by introducing a novel multi-scale 3D point fusion network that is both lightweight and efficient. We demonstrate its versatility on two different depth estimation problems where the 3D points have been acquired with conventional structure-from-motion and LiDAR. In both cases, our network performs on par with state-of-the-art depth completion methods and achieves significantly higher accuracy when only a small number of points is used, while being more compact in terms of the number of parameters. We show that our method outperforms some contemporary deep learning based multi-view stereo and structure-from-motion methods both in accuracy and in compactness.

DAL: A Deep Depth-Aware Long-term Tracker

  • Autoři: Qian, Y., Yan, S., Lukezic, A., Kristan, M., Kamarainen, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2020 25th International Conference on Pattern Recognition (ICPR). Los Alamitos: IEEE Computer Society, 2021. p. 7825-7832. ISSN 1051-4651. ISBN 978-1-7281-8808-9.
  • Rok: 2021
  • DOI: 10.1109/ICPR48806.2021.9412984
  • Odkaz: https://doi.org/10.1109/ICPR48806.2021.9412984
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The best RGBD trackers provide high accuracy but are slow to run. On the other hand, the best RGB trackers are fast but clearly inferior on the RGBD datasets. In this work, we propose a deep depth-aware long-term tracker that achieves state-of-the-art RGBD tracking performance and is fast to run.

DeFMO: Deblurring and Shape Recovery of Fast Moving Objects

  • Autoři: Ing. Denys Rozumnyi, Oswald, M.R., Ferrari, V., prof. Ing. Jiří Matas, Ph.D., Pollefeys, M.
  • Publikace: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2021. p. 3455-3464. ISSN 2575-7075. ISBN 978-1-6654-4509-2.
  • Rok: 2021
  • DOI: 10.1109/CVPR46437.2021.00346
  • Odkaz: https://doi.org/10.1109/CVPR46437.2021.00346
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Objects moving at high speed appear significantly blurred when captured with cameras. The blurry appearance is especially ambiguous when the object has complex shape or texture. In such cases, classical methods, or even humans, are unable to recover the object's appearance and motion. We propose a method that, given a single image with its estimated background, outputs the object's appearance and position in a series of sub-frames as if captured by a high-speed camera (i.e. temporal super-resolution). The proposed generative model embeds an image of the blurred object into a latent space representation, disentangles the background, and renders the sharp appearance. Inspired by the image formation model, we design novel self-supervised loss function terms that boost performance and show good generalization capabilities. The proposed DeFMO method is trained on a complex synthetic dataset, yet it performs well on real-world data from several datasets. DeFMO outperforms the state of the art and generates high-quality temporal super-resolution frames.

Efficient Initial Pose-Graph Generation for Global SfM

  • Autoři: Baráth, D., Mgr. Dmytro Mishkin, Ph.D., Eichhardt, I., Shipachev, I., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2021. p. 14541-14550. ISSN 2575-7075. ISBN 978-1-6654-4509-2.
  • Rok: 2021
  • DOI: 10.1109/CVPR46437.2021.01431
  • Odkaz: https://doi.org/10.1109/CVPR46437.2021.01431
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose ways to speed up the initial pose-graph generation for global Structure-from-Motion algorithms. To avoid forming tentative point correspondences by FLANN and geometric verification by RANSAC, which are the most time-consuming steps of the pose-graph creation, we propose two new methods -- built on the fact that image pairs are usually matched consecutively. Thus, candidate relative poses can be recovered from paths in the partly-built pose-graph. We propose a heuristic for the A* traversal, considering the global similarity of images and the quality of the pose-graph edges. Given a relative pose from a path, descriptor-based feature matching is made "light-weight" by exploiting the known epipolar geometry. To speed up PROSAC-based sampling when RANSAC is applied, we propose a third method that orders the correspondences by their inlier probabilities from previous estimations. The algorithms are tested on 402130 image pairs from the 1DSfM dataset and they speed up the feature matching 17 times and pose estimation 5 times. The source code will be made public.
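
    The PROSAC-friendly ordering in the third method can be pictured with a minimal sketch; this is a hedged illustration with made-up correspondences and priors, not the paper's implementation:

```python
# Hypothetical sketch: order tentative correspondences so that a
# PROSAC-style sampler draws the most promising ones first. In the
# paper the inlier probabilities come from previous estimations; here
# they are illustrative numbers.

def order_for_prosac(correspondences, inlier_priors):
    """Return correspondences sorted by decreasing inlier probability."""
    paired = sorted(zip(inlier_priors, correspondences),
                    key=lambda pc: pc[0], reverse=True)
    return [c for _, c in paired]

corrs = ["c0", "c1", "c2", "c3"]
priors = [0.2, 0.9, 0.5, 0.7]
ordered = order_for_prosac(corrs, priors)  # c1 first, c0 last
```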

Fast Fourier Intrinsic Network

  • Autoři: Qian, Y., Shi, M., Kamarainen, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021. USA: IEEE Computer Society, 2021. p. 3168-3177. ISBN 978-0-7381-4266-1.
  • Rok: 2021
  • DOI: 10.1109/WACV48630.2021.00321
  • Odkaz: https://doi.org/10.1109/WACV48630.2021.00321
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We address the problem of decomposing an image into albedo and shading. We propose the Fast Fourier Intrinsic Network, FFI-Net in short, that operates in the spectral domain, splitting the input into several spectral bands. Weights in FFI-Net are optimized in the spectral domain, allowing faster convergence to a lower error. FFI-Net is lightweight and does not need auxiliary networks for training. The network is trained end-to-end with a novel spectral loss which measures the global distance between the network prediction and corresponding ground truth. FFI-Net achieves state-of-the-art performance on MPI-Sintel, MIT Intrinsic, and IIW datasets.
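
    The idea of measuring a loss in the spectral domain can be illustrated with a toy sketch; this is an assumption-laden single-band illustration, not FFI-Net's actual multi-band loss:

```python
import numpy as np

def spectral_loss(pred, gt):
    """Mean absolute difference between the 2D Fourier spectra of a
    predicted image and its ground truth -- a single global term,
    unlike FFI-Net's band-split formulation."""
    return float(np.mean(np.abs(np.fft.fft2(pred) - np.fft.fft2(gt))))

img = np.zeros((8, 8))  # toy "ground truth"
```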

Fast Text vs. Non-text Classification of Images

  • Autoři: Králíček, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ICDAR2021: 16th IAPR International Conference on Document Analysis and Recognition. Cham: Springer International Publishing, 2021. p. 18-32. LNCS. vol. 12824. ISSN 0302-9743. ISBN 978-3-030-86336-4.
  • Rok: 2021
  • DOI: 10.1007/978-3-030-86337-1_2
  • Odkaz: https://doi.org/10.1007/978-3-030-86337-1_2
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a fast method for classifying images as either containing scene text or not. The typical application is in processing large image streams, as encountered in social networks, for detection and recognition of scene text. The proposed classifier efficiently removes non-text images from consideration, thus allowing the potentially computationally heavy scene text detection and OCR to be applied to only a fraction of the images.

FEDS -- Filtered Edit Distance Surrogate

  • Autoři: Patel, Y., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ICDAR2021: 16th IAPR International Conference on Document Analysis and Recognition. Cham: Springer International Publishing, 2021. p. 171-186. LNCS. vol. 12824. ISSN 0302-9743. ISBN 978-3-030-86336-4.
  • Rok: 2021
  • DOI: 10.1007/978-3-030-86337-1_12
  • Odkaz: https://doi.org/10.1007/978-3-030-86337-1_12
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This paper proposes a procedure to train a scene text recognition model using a robust learned surrogate of edit distance. The proposed method borrows from self-paced learning and filters out the training examples that are hard for the surrogate. The filtering is performed by judging the quality of the approximation, using a ramp function, enabling end-to-end training. Following the literature, the experiments are conducted in a post-tuning setup, where a trained scene text recognition model is tuned using the learned surrogate of edit distance. The efficacy is demonstrated by improvements on various challenging scene text datasets such as IIIT-5K, SVT, ICDAR, SVTP, and CUTE. The proposed method provides an average improvement of 11.2% on total edit distance and an error reduction of 9.5% on accuracy.
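
    The self-paced filtering step can be pictured as a ramp that down-weights examples the surrogate approximates poorly; a minimal sketch, with hypothetical thresholds rather than the paper's values:

```python
# Hedged sketch: weight each training example by a ramp over the
# surrogate's approximation error, so poorly approximated (hard)
# examples are filtered out. `lo` and `hi` are illustrative.

def ramp_weight(approx_error, lo=0.1, hi=0.5):
    """1.0 for well-approximated examples, 0.0 above hi, linear between."""
    if approx_error <= lo:
        return 1.0
    if approx_error >= hi:
        return 0.0
    return (hi - approx_error) / (hi - lo)
```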

FMODetect: Robust Detection of Fast Moving Objects

  • Autoři: Ing. Denys Rozumnyi, prof. Ing. Jiří Matas, Ph.D., Šroubek, F., Pollefeys, M., Oswald, M.R.
  • Publikace: ICCV2021: Proceedings of the International Conference on Computer Vision. Piscataway: IEEE, 2021. p. 3521-3529. ISSN 2380-7504. ISBN 978-1-6654-2812-5.
  • Rok: 2021
  • DOI: 10.1109/ICCV48922.2021.00352
  • Odkaz: https://doi.org/10.1109/ICCV48922.2021.00352
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose the first learning-based approach for fast moving object detection. Such objects are highly blurred and move over large distances within one video frame. Fast moving objects are associated with a deblurring and matting problem, also called deblatting. We show that the separation of deblatting into consecutive matting and deblurring allows achieving real-time performance, i.e. an order-of-magnitude speed-up, thus enabling new classes of applications. The proposed method detects fast moving objects as a truncated distance function to the trajectory by learning from synthetic data. For sharp appearance estimation and accurate trajectory estimation, we propose a matting and fitting network that estimates the blurred appearance without background, followed by an energy-minimization-based deblurring. The state-of-the-art methods are outperformed in terms of recall, precision, trajectory estimation, and sharp appearance reconstruction. Compared to other methods, such as deblatting, the inference is several orders of magnitude faster and allows applications such as real-time fast moving object detection and retrieval in large video collections.
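
    The detection target, a truncated distance function to the trajectory, can be sketched for a trajectory discretised as points; an illustration only (the paper predicts such a map with a network), with a hypothetical truncation radius:

```python
import math

# Hedged sketch: distance from a point to the nearest trajectory
# sample, capped at a truncation radius.

def truncated_distance(p, trajectory_pts, radius=3.0):
    d = min(math.dist(p, q) for q in trajectory_pts)
    return min(d, radius)

traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # toy trajectory
```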

Image Matching across Wide Baselines: From Paper to Practice

  • DOI: 10.1007/s11263-020-01385-0
  • Odkaz: https://doi.org/10.1007/s11263-020-01385-0
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We introduce a comprehensive benchmark for local features and robust estimation algorithms, focusing on the downstream task -- the accuracy of the reconstructed camera pose -- as our primary metric. Our pipeline's modular structure allows easy integration, configuration, and combination of different methods and heuristics. This is demonstrated by embedding dozens of popular algorithms and evaluating them, from seminal works to the cutting edge of machine learning research. We show that with proper settings, classical solutions may still outperform the perceived state of the art. Besides establishing the actual state of the art, the conducted experiments reveal unexpected properties of Structure from Motion (SfM) pipelines that can help improve their performance, for both algorithmic and learned methods. Data and code are online at https://github.com/team-yi-ubc/image-matching-benchmark, providing an easy-to-use and flexible framework for the benchmarking of local features and robust estimation methods, both alongside and against top-performing methods. This work provides a basis for the Image Matching Challenge https://vision.uvic.ca/image-matching-challenge/.

Monocular Arbitrary Moving Object Discovery and Segmentation

  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a method for discovery and segmentation of objects that are, or whose parts are, independently moving in the scene. Given three monocular video frames, the method outputs semantically meaningful regions, i.e. regions corresponding to the whole object, even when only a part of it moves. The architecture of the CNN-based end-to-end method, called Raptor, combines semantic and motion backbones, which pass their outputs to a final region segmentation network. The semantic backbone is trained in a class-agnostic manner in order to generalise to object classes beyond the training data. The core of the motion branch is a geometrical cost volume computed from optical flow, optical expansion, mono-depth and the estimated camera motion. Evaluation of the proposed architecture on the instance motion segmentation and binary moving-static segmentation problems on KITTI, DAVIS-Moving and YTVOSMoving datasets shows that the proposed method achieves state-of-the-art results on all the datasets and is able to generalise well to various environments. For the KITTI dataset, we provide an upgraded instance motion segmentation annotation which covers all moving objects. Dataset, code and models are available on the github project page github.com/michalneoral/Raptor.

Monocular Depth Estimation Primed by Salient Point Detection and Normalized Hessian Loss

  • Autoři: Huynh, L., Pedone, M., Nguyen, P., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 3DV 2021: Proceedings of the International Conference on 3D Vision. Los Alamitos, CA: IEEE Computer Soc., 2021. p. 228-238. ISSN 2475-7888. ISBN 978-1-6654-2688-6.
  • Rok: 2021
  • DOI: 10.1109/3DV53792.2021.00033
  • Odkaz: https://doi.org/10.1109/3DV53792.2021.00033
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Deep neural networks have recently thrived on single image depth estimation. That being said, current developments on this topic highlight an apparent compromise between accuracy and network size. This work proposes an accurate and lightweight framework for monocular depth estimation based on a self-attention mechanism stemming from salient point detection. Specifically, we utilize a sparse set of keypoints to train a FuSaNet model that consists of two major components: Fusion-Net and Saliency-Net. In addition, we introduce a normalized Hessian loss term invariant to scaling and shear along the depth direction, which is shown to substantially improve the accuracy. The proposed method achieves state-of-the-art results on NYU-Depth-v2 and KITTI while using a model that is 3.1-38.4 times smaller, in terms of the number of parameters, than baseline approaches. Experiments on the SUN-RGBD dataset further demonstrate the generalizability of the proposed method.

Neural network-based acoustic vehicle counting

  • Autoři: Djukanovic, S., Patel, Y., prof. Ing. Jiří Matas, Ph.D., Virtanen, T.
  • Publikace: 29th European Signal Processing Conference (EUSIPCO). New Jersey: IEEE Signal Processing Society, 2021. p. 561-565. ISSN 2076-1465. ISBN 9789082797060.
  • Rok: 2021
  • DOI: 10.23919/EUSIPCO54536.2021.9615925
  • Odkaz: https://doi.org/10.23919/EUSIPCO54536.2021.9615925
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This paper addresses acoustic vehicle counting using one-channel audio. We predict the pass-by instants of vehicles from local minima of clipped vehicle-to-microphone distance. This distance is predicted from audio using a two-stage (coarse-fine) regression, with both stages realised via neural networks (NNs). Experiments show that the NN-based distance regression outperforms by far the previously proposed support vector regression. The 95% confidence interval for the mean of vehicle counting error is within [0.28%, −0.55%]. Besides the minima-based counting, we propose a deep learning counting that operates on the predicted distance without detecting local minima. Although outperformed in accuracy by the former approach, deep counting has a significant advantage in that it does not depend on minima detection parameters. Results also show that removing low frequencies in features improves the counting performance.
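
    The minima-based counting step can be sketched as follows; the signal and the clipping threshold are illustrative, not the network's actual output:

```python
# Hedged sketch: count pass-bys as local minima of the predicted
# (clipped) vehicle-to-microphone distance signal.

def count_passbys(distance_signal, clip=0.9):
    count = 0
    for i in range(1, len(distance_signal) - 1):
        prev, cur, nxt = distance_signal[i - 1 : i + 2]
        if cur < prev and cur < nxt and cur < clip:
            count += 1
    return count

signal = [1.0, 0.8, 0.3, 0.7, 1.0, 0.9, 0.2, 0.6, 1.0]  # two dips
```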

Performance Evaluation Methodology for Long-Term Single-Object Tracking

  • DOI: 10.1109/TCYB.2020.2980618
  • Odkaz: https://doi.org/10.1109/TCYB.2020.2980618
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. Performance measures are designed by following a long-term tracking definition to maximize the analysis probing strength. The new measures outperform existing ones in interpretation potential and in better distinguishing between different tracking behaviors. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than in the current datasets without increasing manual annotation labor. A new challenging dataset of carefully selected sequences with many target disappearances is proposed. A new tracking taxonomy is proposed to position trackers on the short-term/long-term spectrum. The benchmark contains an extensive evaluation of the largest number of long-term trackers and comparison to state-of-the-art short-term trackers. We analyze the influence of tracking architecture implementations on long-term performance and explore various redetection strategies as well as the influence of visual model update strategies on long-term tracking drift. The methodology is integrated into the VOT toolkit to automate experimental analysis and benchmarking and to facilitate the future development of long-term trackers.

RGBD-Net: Predicting Color and Depth Images for Novel Views Synthesis

  • Autoři: Nguyen, P., Karnewar, A., Huynh, L., Rahtu, E., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 3DV 2021: Proceedings of the International Conference on 3D Vision. Los Alamitos, CA: IEEE Computer Soc., 2021. p. 1095-1105. ISSN 2475-7888. ISBN 978-1-6654-2688-6.
  • Rok: 2021
  • DOI: 10.1109/3DV53792.2021.00117
  • Odkaz: https://doi.org/10.1109/3DV53792.2021.00117
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network. The former predicts depth maps of the target views by using adaptive depth scaling, while the latter leverages the predicted depths and renders spatially and temporally consistent target images. In the experimental evaluation on standard datasets, RGBD-Net not only outperforms the state-of-the-art by a clear margin, but it also generalizes well to new scenes without per-scene optimization. Moreover, we show that RGBD-Net can be optionally trained without depth supervision while still retaining high-quality rendering. Thanks to the depth regression network, RGBD-Net can be also used for creating dense 3D point clouds that are more accurate than those produced by some state-of-the-art multi-view stereo methods.

Road Anomaly Detection by Partial Image Reconstruction with Segmentation Coupling

  • Autoři: Ing. Tomáš Vojíř, Ph.D., Šipka, T., Aljundi, R., Chumerin, N., Reino, D.O., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ICCV2021: Proceedings of the International Conference on Computer Vision. Piscataway: IEEE, 2021. p. 15651-15660. ISSN 2380-7504. ISBN 978-1-6654-2812-5.
  • Rok: 2021
  • DOI: 10.1109/ICCV48922.2021.01536
  • Odkaz: https://doi.org/10.1109/ICCV48922.2021.01536
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present a novel approach to the detection of unknown objects in the context of autonomous driving. The problem is formulated as anomaly detection, since we assume that the unknown stuff or object appearance cannot be learned. To that end, we propose a reconstruction module that can be used with many existing semantic segmentation networks, and that is trained to recognize and reconstruct road (drivable) surface from a small bottleneck. We postulate that poor reconstruction of the road surface is due to areas that are outside of the training distribution, which is a strong indicator of an anomaly. The road structural similarity error is coupled with the semantic segmentation to incorporate information from known classes and produce final per-pixel anomaly scores. The proposed JSR-Net was evaluated on four datasets, Lost-and-found, Road Anomaly, Road Obstacles, and FishyScapes, achieving state-of-the-art performance on all, reducing the false positives significantly, while typically having the highest average precision for a wide range of operation points.

Text Recognition - Real World Data and Where to Find Them

  • Autoři: Ing. Klára Janoušková, prof. Ing. Jiří Matas, Ph.D., Gomez, L., Karatzas, D.
  • Publikace: 2020 25th International Conference on Pattern Recognition (ICPR). Los Alamitos: IEEE Computer Society, 2021. p. 4489-4496. ISSN 1051-4651. ISBN 978-1-7281-8808-9.
  • Rok: 2021
  • DOI: 10.1109/ICPR48806.2021.9412868
  • Odkaz: https://doi.org/10.1109/ICPR48806.2021.9412868
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present a method for exploiting weakly annotated images to improve text extraction pipelines. The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions. The method includes matching of imprecise transcriptions to weak annotations and an edit distance guided neighbourhood search. It produces nearly error-free, localised instances of scene text, which we treat as "pseudo ground truth" (PGT).
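
    The matching of imprecise transcriptions to weak annotations can be sketched with a plain Levenshtein distance; a toy stand-in for the paper's edit distance guided neighbourhood search:

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def best_match(transcription, annotations):
    """Weak annotation closest to a possibly erroneous transcription."""
    return min(annotations, key=lambda w: edit_distance(transcription, w))
```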

The Ninth Visual Object Tracking VOT2021 Challenge Results

  • Autoři: Kristan, M., prof. Ing. Jiří Matas, Ph.D., Leonardis, A., Mgr. Ondřej Drbohlav, Ph.D.,
  • Publikace: ICCVW2021: The Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE, 2021. p. 2711-2738. ISSN 2473-9944. ISBN 978-1-6654-0191-3.
  • Rok: 2021
  • DOI: 10.1109/ICCVW54120.2021.00305
  • Odkaz: https://doi.org/10.1109/ICCVW54120.2021.00305
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Visual Object Tracking challenge VOT2021 is the ninth annual tracker benchmarking activity organized by the VOT initiative. Results of 71 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2021 challenge was composed of four sub-challenges focusing on different tracking domains: (i) VOT-ST2021 challenge focused on short-term tracking in RGB, (ii) VOT-RT2021 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2021 focused on long-term tracking, namely coping with target disappearance and reappearance and (iv) VOT-RGBD2021 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2021 dataset was refreshed, while VOT-RGBD2021 introduces a training dataset and a sequestered dataset for winner identification. The source code for most of the trackers, the datasets, the evaluation kit and the results are publicly available at the challenge website.

Tracking by Deblatting

  • DOI: 10.1007/s11263-021-01480-w
  • Odkaz: https://doi.org/10.1007/s11263-021-01480-w
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Objects moving at high speed along complex trajectories often appear in videos, especially videos of sports. Such objects travel a considerable distance during exposure time of a single frame, and therefore, their position in the frame is not well defined. They appear as semi-transparent streaks due to the motion blur and cannot be reliably tracked by general trackers. We propose a novel approach called Tracking by Deblatting based on the observation that motion blur is directly related to the intra-frame trajectory of an object. Blur is estimated by solving two intertwined inverse problems, blind deblurring and image matting, which we call deblatting. By postprocessing, non-causal Tracking by Deblatting estimates continuous, complete, and accurate object trajectories for the whole sequence. Tracked objects are precisely localized with higher temporal resolution than by conventional trackers. Energy minimization by dynamic programming is used to detect abrupt changes of motion, called bounces. High-order polynomials are then fitted to smooth trajectory segments between bounces. The output is a continuous trajectory function that assigns location for every real-valued time stamp from zero to the number of frames. The proposed algorithm was evaluated on a newly created dataset of videos from a high-speed camera using a novel Trajectory-IoU metric that generalizes the traditional Intersection over Union and measures the accuracy of the intra-frame trajectory. The proposed method outperforms the baselines both in recall and trajectory accuracy. Additionally, we show that from the trajectory function precise physical calculations are possible, such as radius, gravity, and sub-frame object velocity. Velocity estimation is compared to the high-speed camera measurements and radars. Results show high performance of the proposed method in terms of Trajectory-IoU, recall, and velocity estimation.
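
    The output trajectory function, polynomials fitted to segments between bounces and evaluated at any real-valued time stamp, can be pictured with a minimal sketch; the segment boundaries and coefficients below are illustrative, not fitted to data as in the paper:

```python
# Hedged sketch of a piecewise-polynomial trajectory function.

def make_trajectory(segments):
    """segments: list of (t_start, t_end, coeffs), where coeffs are
    polynomial coefficients in (t - t_start), lowest order first."""
    def trajectory(t):
        for t0, t1, coeffs in segments:
            if t0 <= t <= t1:
                return sum(c * (t - t0) ** k for k, c in enumerate(coeffs))
        raise ValueError("t outside trajectory range")
    return trajectory

# one bounce at t = 1: linear descent, then linear ascent
traj = make_trajectory([(0.0, 1.0, [1.0, -1.0]),
                        (1.0, 2.0, [0.0, 1.0])])
```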

VSAC: Efficient and Accurate Estimator for H and F

  • Autoři: Ivashechkin, M., Baráth, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ICCV2021: Proceedings of the International Conference on Computer Vision. Piscataway: IEEE, 2021. p. 15223-15232. ISSN 2380-7504. ISBN 978-1-6654-2812-5.
  • Rok: 2021
  • DOI: 10.1109/ICCV48922.2021.01496
  • Odkaz: https://doi.org/10.1109/ICCV48922.2021.01496
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present VSAC, a RANSAC-type robust estimator with a number of novelties. It benefits from the introduction of the concept of independent inliers, which significantly improves the efficacy of the dominant plane handling and also allows near error-free rejection of incorrect models, without false positives. The local optimization process and its application are improved so that local optimization is run on average only once. Further technical improvements include adaptive sequential hypothesis verification and efficient model estimation via Gaussian elimination. Experiments on four standard datasets show that VSAC is significantly faster than all its predecessors and runs on average in 1-2 ms, on a CPU. It is two orders of magnitude faster and yet as precise as MAGSAC++, the currently most accurate estimator of two-view geometry. In the repeated runs on EVD, HPatches, PhotoTourism, and Kusvod2 datasets, it never failed.
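
    VSAC refines the classic hypothesise-and-verify loop; as context, a minimal generic RANSAC sketch on 2D line fitting (deliberately not the paper's H/F estimation, and without VSAC's independent-inlier and verification machinery):

```python
import random

def ransac_line(points, iters=100, thresh=0.1, seed=0):
    """Return the largest inlier set found for lines hypothesised
    from randomly sampled point pairs."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # skip vertical hypotheses in this toy
        a = (y2 - y1) / (x2 - x1)   # slope
        b = y1 - a * x1             # intercept
        inliers = [p for p in points if abs(p[1] - (a * p[0] + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# ten collinear points plus two gross outliers
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -5)]
```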

A Benchmark for Burst Color Constancy

  • Autoři: Qian, Y., Käpylä, J., Kämäräinen, J.-K., Koskinen, S., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision – ECCV 2020 Workshops, Part III. Cham: Springer International Publishing, 2020. p. 359-375. LNCS. vol. 12537. ISSN 0302-9743. ISBN 978-3-030-67069-6.
  • Rok: 2020
  • DOI: 10.1007/978-3-030-67070-2_22
  • Odkaz: https://doi.org/10.1007/978-3-030-67070-2_22
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Burst Color Constancy (CC) is a recently proposed approach that challenges the conventional single-frame color constancy. The conventional approach is to use a single frame - the shot frame - to estimate the scene illumination color. In burst CC, multiple frames from the view finder sequence are used to estimate the color of the shot frame. However, there are no realistic large-scale color constancy datasets with sequence input for method evaluation. In this work, a new such CC benchmark is introduced. The benchmark comprises (1) 600 real-world sequences recorded with a high-resolution mobile phone camera, (2) a fixed train-test split which ensures consistent evaluation, and (3) a baseline method which achieves high accuracy on the new benchmark and on the dataset used in previous works. Results for more than 20 well-known color constancy methods, including the recent state of the art, are reported in our experiments.

A new semi-supervised method improving optical flow on distant domains

  • Autoři: Novák, T., Mgr. Jan Šochman, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of the 25th Computer Vision Winter Workshop Conference February 3-5, 2020, Rogaška Slatina, Slovenia. Ljubljana: Slovenian Pattern Recognition Society, 2020. p. 37-45. ISBN 978-961-90901-9-0.
  • Rok: 2020
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a semi-supervised approach to learning by formulating the optimization as constrained gradient descent on a loss function that includes unsupervised terms. The method is demonstrated on semi-supervised optical flow training that promotes photo-consistency and smoothness of the flow. We show that the unsupervised objective significantly improves the estimation on a distant domain while maintaining the performance on the original domain. As a result, we achieve state-of-the-art results on the Creative Flow+ dataset among CNN based methods that did not train on any samples from the dataset.

Autonomous Car Chasing

  • DOI: 10.1007/978-3-030-66823-5_20
  • Odkaz: https://doi.org/10.1007/978-3-030-66823-5_20
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We developed an autonomous driving system that can chase another vehicle using only images from a single RGB camera. At the core of the system is a novel dual-task convolutional neural network simultaneously performing object detection as well as coarse semantic segmentation. The system was first tested in CARLA simulations. We created a new challenging publicly available CARLA Car Chasing Dataset collected by manually driving the chased car. Using the dataset, we showed that the system that uses the semantic segmentation was able to chase the pursued car on average 16% longer than other versions of the system. Finally, we integrated the system into a sub-scale vehicle platform built on a high-speed RC car and demonstrated its capabilities by autonomously chasing another RC car.

BOP Challenge 2020 on 6D Object Localization

  • Autoři: Hodaň, T., Sundermeyer, M., Drost, B., Labbe, Y., Brachmann, E., Michel, F., Rother, C., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision – ECCV 2020 Workshops, Part II. Basel: Springer, 2020. p. 577-594. LNCS. vol. 12536. ISSN 0302-9743. ISBN 978-3-030-66095-6.
  • Rok: 2020
  • DOI: 10.1007/978-3-030-66096-3_39
  • Odkaz: https://doi.org/10.1007/978-3-030-66096-3_39
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This paper presents the evaluation methodology, datasets, and results of the BOP Challenge 2020, the third in a series of public competitions organized with the goal of capturing the status quo in the field of 6D object pose estimation from an RGB-D image. In 2020, to reduce the domain gap between synthetic training and real test RGB images, the participants were provided 350K photorealistic training images generated by BlenderProc4BOP, a new open-source and light-weight physically-based renderer (PBR) and procedural data generator. Methods based on deep neural networks have finally caught up with methods based on point pair features, which were dominating previous editions of the challenge. Although the top-performing methods rely on RGB-D image channels, strong results were achieved when only RGB channels were used at both training and test time -- out of 26 evaluated methods, the third method was trained on RGB channels of PBR and real images, while the fifth was trained on PBR images only. Strong data augmentation was identified as a key component of the top-performing CosyPose method, and the photorealism of PBR images was demonstrated effective despite the augmentation. The online evaluation system stays open and is available at the project website: bop.felk.cvut.cz.

D3S - A discriminative single shot segmentation tracker

  • Autoři: Lukežič, A., prof. Ing. Jiří Matas, Ph.D., Kristan, M.
  • Publikace: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2020. p. 7131-7140. ISSN 2575-7075. ISBN 978-1-7281-7168-5.
  • Rok: 2020
  • DOI: 10.1109/CVPR42600.2020.00716
  • Odkaz: https://doi.org/10.1109/CVPR42600.2020.00716
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but are restricted to bounding box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker - D3S, which narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties, one invariant to a broad range of transformations, including non-rigid deformations, the other assuming a rigid object to simultaneously achieve high robustness and online target segmentation. Without per-dataset finetuning and trained only for segmentation as the primary output, D3S outperforms all trackers on VOT2016, VOT2018 and GOT-10k benchmarks and performs close to the state-of-the-art trackers on the TrackingNet. D3S outperforms the leading segmentation tracker SiamMask on video object segmentation benchmarks and performs on par with top video object segmentation algorithms, while running an order of magnitude faster, close to real-time.

EPOS: Estimating 6D Pose of Objects with Symmetries

  • Autoři: Hodaň, T., Baráth, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2020. p. 11700-11709. ISSN 1063-6919. ISBN 978-1-7281-7169-2.
  • Rok: 2020
  • DOI: 10.1109/CVPR42600.2020.01172
  • Odkaz: https://doi.org/10.1109/CVPR42600.2020.01172
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input image. The method is applicable to a broad range of objects, including challenging ones with global or partial symmetries. An object is represented by compact surface fragments which allow handling symmetries in a systematic manner. Correspondences between densely sampled pixels and the fragments are predicted using an encoder-decoder network. At each pixel, the network predicts: (i) the probability of each object's presence, (ii) the probability of the fragments given the object's presence, and (iii) the precise 3D location on each fragment. A data-dependent number of corresponding 3D locations is selected per pixel, and poses of possibly multiple object instances are estimated using a robust and efficient variant of the PnP-RANSAC algorithm. In the BOP Challenge 2019, the method outperforms all RGB and most RGB-D and D methods on the T-LESS and LM-O datasets. On the YCB-V dataset, it is superior to all competitors, with a large margin over the second-best RGB method. Source code is at: cmp.felk.cvut.cz/epos.

Fungi Recognition: A Practical Use Case

  • Autoři: Šulc, M., Picek, L., prof. Ing. Jiří Matas, Ph.D., Jeppesen, T.S., Heilmann-Clausen, J.
  • Publikace: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). New Jersey: IEEE, 2020. p. 2305-2313. ISSN 2642-9381. ISBN 978-1-7281-6553-0.
  • Rok: 2020
  • DOI: 10.1109/WACV45572.2020.9093624
  • Odkaz: https://doi.org/10.1109/WACV45572.2020.9093624
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper presents a system for visual recognition of 1394 fungi species based on deep convolutional neural networks and its deployment in a citizen-science project. The system allows users to automatically identify observed specimens, while providing valuable data to biologists and computer vision researchers. The underlying classification method scored first in the FGVCx Fungi Classification Kaggle competition organized in connection with the Fine-Grained Visual Categorization (FGVC) workshop at CVPR 2018. We describe our winning submission, evaluate the techniques that increased the recognition scores, and discuss the issues related to deploying the system via web and mobile interfaces.

Guiding Monocular Depth Estimation Using Depth-Attention Volume

  • Autoři: Huynh, L., Nguyen-Ha, P., prof. Ing. Jiří Matas, Ph.D., Rahtu, E.
  • Publikace: Computer Vision - ECCV 2020, Part XXVI. Cham: Springer, 2020. p. 581-597. LNCS. vol. 12371. ISSN 0302-9743. ISBN 978-3-030-58573-0.
  • Rok: 2020
  • DOI: 10.1007/978-3-030-58574-7_35
  • Odkaz: https://doi.org/10.1007/978-3-030-58574-7_35
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Recovering the scene depth from a single image is an ill-posed problem that requires additional priors, often referred to as monocular depth cues, to disambiguate different 3D interpretations. In recent works, those priors have been learned in an end-to-end manner from large datasets by using deep neural networks. In this paper, we propose guiding depth estimation to favor planar structures that are ubiquitous especially in indoor environments. This is achieved by incorporating a non-local coplanarity constraint to the network with a novel attention mechanism called depth-attention volume (DAV). Experiments on two popular indoor datasets, namely NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results while using only a fraction of the number of parameters needed by the competing methods. Code is available at: https://github.com/HuynhLam/DAV.

H-Patches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors

  • Autoři: Balntas, V., Lenc, K., Vedaldi, A., Tuytelaars, T., prof. Ing. Jiří Matas, Ph.D., Mikolajczyk, K.
  • Publikace: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020, 42(11), 2825-2841. ISSN 0162-8828.
  • Rok: 2020
  • DOI: 10.1109/TPAMI.2019.2915233
  • Odkaz: https://doi.org/10.1109/TPAMI.2019.2915233
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    In this paper, a novel benchmark is introduced for evaluating local image descriptors. We demonstrate limitations of the commonly used datasets and evaluation protocols that lead to ambiguities and contradictory results in the literature. Furthermore, these benchmarks are nearly saturated due to the recent improvements in local descriptors obtained by learning from large annotated datasets. To address these issues, we introduce a new large dataset suitable for training and testing modern descriptors, together with strictly defined evaluation protocols in several tasks such as matching, retrieval and verification. This allows for more realistic, thus more reliable comparisons in different application scenarios. We evaluate the performance of several state-of-the-art descriptors and analyse their properties. We show that a simple normalisation of traditional hand-crafted descriptors is able to boost their performance to the level of deep learning based descriptors once realistic benchmarks are considered. Additionally, we specify a protocol for learning and evaluating using cross-validation. We show that when training state-of-the-art descriptors on this dataset, the traditional verification task is almost entirely saturated.
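
The abstract does not name the normalisation; one widely used transform of hand-crafted descriptors with exactly this effect is RootSIFT (L1-normalise, then take an element-wise square root), shown here as an assumed example:

```python
import math

def rootsift(desc):
    """RootSIFT-style normalisation: L1-normalise the descriptor, then take
    an element-wise square root; the result is automatically L2-normalised,
    and Euclidean distance on it equals the Hellinger distance on the input."""
    s = sum(abs(v) for v in desc)
    if s == 0:
        return [0.0] * len(desc)
    return [math.sqrt(abs(v) / s) for v in desc]

d = rootsift([4.0, 0.0, 12.0])   # L1 norm 16 -> [0.5, 0.0, sqrt(0.75)]
```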

Learning Surrogates via Deep Embedding

  • Autoři: Patel, Y., Hodaň, T., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision - ECCV 2020, Part XXX. Cham: Springer International Publishing, 2020. p. 205-221. LNCS. vol. 12375. ISSN 0302-9743. ISBN 978-3-030-58576-1.
  • Rok: 2020
  • DOI: 10.1007/978-3-030-58577-8_13
  • Odkaz: https://doi.org/10.1007/978-3-030-58577-8_13
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This paper proposes a technique for training a neural network by minimizing a surrogate loss that approximates the target evaluation metric, which may be non-differentiable. The surrogate is learned via a deep embedding where the Euclidean distance between the prediction and the ground truth corresponds to the value of the evaluation metric. The effectiveness of the proposed technique is demonstrated in a post-tuning setup, where a trained model is tuned using the learned surrogate. Without a significant computational overhead and any bells and whistles, improvements are demonstrated on challenging and practical tasks of scene-text recognition and detection. In the recognition task, the model is tuned using a surrogate approximating the edit distance metric and achieves up to 39% relative improvement in the total edit distance. In the detection task, the surrogate approximates the intersection over union metric for rotated bounding boxes and yields up to 4.25% relative improvement in the F1 score.
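
For reference, the edit distance that the recognition surrogate approximates is the standard Levenshtein metric; a compact dynamic-programming implementation:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of character insertions,
    deletions and substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

edit_distance("kitten", "sitting")   # -> 3
```

The metric is non-differentiable in the network outputs, which is exactly why the paper learns a smooth surrogate for it.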

LSD_2 - Joint Denoising and Deblurring of Short and Long Exposure Images with CNNs

  • Autoři: Mustaniemi, J., Kannala, J., prof. Ing. Jiří Matas, Ph.D., Särkkä, S., Heikkilä, J.
  • Publikace: BMVC2020: Proceedings of the British Machine Vision Conference. London: British Machine Vision Association, 2020.
  • Rok: 2020
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper addresses the problem of acquiring high-quality photographs with hand-held smartphone cameras in low-light imaging conditions. We propose an approach based on capturing pairs of short and long exposure images in rapid succession and fusing them into a single high-quality photograph. Unlike existing methods, we take advantage of both images simultaneously and perform joint denoising and deblurring using a convolutional neural network. A novel approach is introduced to generate realistic short-long exposure image pairs. The method produces good images in extremely challenging conditions and outperforms existing denoising and deblurring methods. It also enables exposure fusion in the presence of motion blur.

MAGSAC++, a Fast, Reliable and Accurate Robust Estimator

  • Autoři: Baráth, D., Nosková, J., Ivashechkin, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2020. p. 1301-1309. ISSN 1063-6919. ISBN 978-1-7281-7169-2.
  • Rok: 2020
  • DOI: 10.1109/CVPR42600.2020.00138
  • Odkaz: https://doi.org/10.1109/CVPR42600.2020.00138
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A new method for robust estimation, MAGSAC++, is proposed. It introduces a new model quality (scoring) function that does not require the inlier-outlier decision, and a novel marginalization procedure formulated as an M-estimation with a novel class of M-estimators (a robust kernel) solved by an iteratively re-weighted least squares procedure. We also propose a new sampler, Progressive NAPSAC, for RANSAC-like robust estimators. Exploiting the fact that nearby points often originate from the same model in real-world data, it finds local structures earlier than global samplers. The progressive transition from local to global sampling does not suffer from the weaknesses of purely localized samplers. On six publicly available real-world datasets for homography and fundamental matrix fitting, MAGSAC++ produces results superior to the state-of-the-art robust methods. It is faster, more geometrically accurate and fails less often.
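
The iteratively re-weighted least squares procedure at the heart of M-estimation can be sketched on a toy robust-mean problem (MAGSAC++ itself uses a different, marginalised kernel; this only shows the generic IRLS pattern with Huber weights):

```python
def irls_mean(xs, c=0.5, n_iters=20):
    """Robust location estimate via iteratively re-weighted least squares
    (IRLS) with Huber weights: w(r) = 1 if |r| <= c, else c/|r|."""
    mu = sum(xs) / len(xs)                    # start from the plain mean
    for _ in range(n_iters):
        ws = [1.0 if abs(x - mu) <= c else c / abs(x - mu) for x in xs]
        mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    return mu

data = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]      # inliers near 1.0, one outlier
mu = irls_mean(data)                          # pulled towards 1, not the mean 2.5
```

Each iteration solves a weighted least-squares problem whose weights down-weight large residuals, which is how the robust kernel enters the estimate without a hard inlier-outlier decision.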

Restoration of Fast Moving Objects

  • Autoři: Kotera, J., prof. Ing. Jiří Matas, Ph.D., Sroubek, F.
  • Publikace: IEEE Transactions on Image Processing. 2020, 29 8577-8589. ISSN 1057-7149.
  • Rok: 2020
  • DOI: 10.1109/TIP.2020.3016490
  • Odkaz: https://doi.org/10.1109/TIP.2020.3016490
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    If an object is photographed in motion in front of a static background, the object will be blurred while the background is sharp and partially occluded by the object. The goal is to recover the object appearance from such a blurred image. We adopt the image formation model for fast moving objects and consider objects undergoing 2D translation and rotation. For this scenario we formulate the estimation of the object shape, appearance, and motion from a single image and known background as a constrained optimization problem with appropriate regularization terms. Both similarities and differences with blind deconvolution are discussed, with the latter caused mainly by the coupling of the object appearance and shape in the acquisition model. Necessary conditions for solution uniqueness are derived and a numerical solution based on the alternating direction method of multipliers is presented. The proposed method is evaluated on a new dataset.
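
The formation model commonly used in this line of work composites the motion-blurred object over the partially occluded background, I = H*F + (1 - H*M)B, with blur kernel H, object appearance F, object mask M and background B; a 1D toy sketch under that assumption:

```python
def blur(x, kernel):
    """Toy 1D 'same' convolution with zero padding (the operator H*)."""
    k = len(kernel) // 2
    return [sum(w * x[i + j - k] for j, w in enumerate(kernel)
                if 0 <= i + j - k < len(x))
            for i in range(len(x))]

def compose(F, M, B, kernel):
    """Toy FMO formation model: I = H*F + (1 - H*M) * B, per pixel."""
    HF, HM = blur(F, kernel), blur(M, kernel)
    return [hf + (1 - hm) * b for hf, hm, b in zip(HF, HM, B)]

F = [0, 0, 1, 1, 0, 0]       # object appearance (already masked)
M = [0, 0, 1, 1, 0, 0]       # object mask
B = [0.2] * 6                # flat background
I = compose(F, M, B, [1/3, 1/3, 1/3])   # 3-pixel motion blur
```

The coupling of F and M inside the same blur operator is what distinguishes the inverse problem from ordinary blind deconvolution.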

Robust Audio-Based Vehicle Counting in Low-to-Moderate Traffic Flow

  • Autoři: Dukanović, S., prof. Ing. Jiří Matas, Ph.D., Virtanen, T.
  • Publikace: 2020 IEEE Intelligent Vehicles Symposium (IV). Piscataway: IEEE Industrial Electronics Society, 2020. p. 1608-1614. ISSN 2642-7214. ISBN 978-1-7281-6673-5.
  • Rok: 2020
  • DOI: 10.1109/IV47402.2020.9304600
  • Odkaz: https://doi.org/10.1109/IV47402.2020.9304600
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper presents a method for audio-based vehicle counting (VC) in low-to-moderate traffic using one-channel sound. We formulate VC as a regression problem, i.e., we predict the distance between a vehicle and the microphone. Minima of the proposed distance function correspond to vehicles passing by the microphone. VC is carried out via local minima detection in the predicted distance. We propose to set the minima detection threshold at a point where the probabilities of false positives and false negatives coincide so they statistically cancel each other in total vehicle number. The method is trained and tested on a traffic-monitoring dataset comprising 422 short, 20-second one-channel sound files with a total of 1421 vehicles passing by the microphone. Relative VC error in a traffic location not used in the training is below 2% within a wide range of detection threshold values. Experimental results show that the regression accuracy in noisy environments is improved by introducing a novel high-frequency power feature.
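
The counting step then amounts to detecting local minima of the predicted distance signal that dip below the chosen threshold; a minimal sketch (function name hypothetical):

```python
def count_vehicles(distance, threshold):
    """Count local minima of the predicted vehicle-microphone distance
    that dip below the detection threshold; each such minimum is taken
    as one vehicle passing by."""
    count = 0
    for i in range(1, len(distance) - 1):
        if (distance[i] < threshold
                and distance[i] <= distance[i - 1]
                and distance[i] < distance[i + 1]):
            count += 1
    return count

# two vehicles pass: the predicted distance dips twice below the threshold
signal = [9, 7, 4, 1, 4, 7, 9, 8, 5, 2, 5, 8, 9]
count_vehicles(signal, threshold=3)   # -> 2
```

Raising or lowering `threshold` trades false positives against false negatives, which is the quantity the paper balances when choosing the operating point.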

Saddle: Fast and repeatable features with good coverage

  • DOI: 10.1016/j.imavis.2019.08.011
  • Odkaz: https://doi.org/10.1016/j.imavis.2019.08.011
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A novel similarity-covariant feature detector that extracts points whose neighborhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile is presented. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Saddle is a fast approximation of the Hessian detector, in the same way that ORB, which builds on the FAST detector, approximates the Harris detector. We propose to use a matching strategy, called first-geometric-inconsistent, with binary descriptors that is suitable for our feature detector, and include experiments with fixed-point descriptors, both hand-crafted and learned. Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets. Compared to recently proposed deep-learning-based interest point detectors and popular hand-crafted keypoint detectors, evaluated for repeatability on the ApolloScape dataset [1], the Saddle detector shows the best performance in most of the street-level view sequences, a.k.a. traversals.
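
The ring test can be illustrated by counting sign transitions of the ring intensities relative to the centre pixel; this toy sketch uses a single ring and omits the second ring and the geometric constraints of the actual detector:

```python
def ring_transitions(ring, center):
    """Count dark-to-bright and bright-to-dark transitions of the ring
    intensities relative to the center value (circular sequence)."""
    signs = [1 if v > center else -1 for v in ring]
    d2b = b2d = 0
    for i in range(len(signs)):
        prev, cur = signs[i - 1], signs[i]   # wraps around at i = 0
        if prev < cur:
            d2b += 1
        elif prev > cur:
            b2d += 1
    return d2b, b2d

def is_saddle(ring, center):
    """Saddle-like profile: exactly two dark-to-bright and two
    bright-to-dark transitions around the ring."""
    return ring_transitions(ring, center) == (2, 2)

# alternating bright/dark quadrants around the center: a saddle
is_saddle([9, 9, 1, 1, 9, 9, 1, 1], center=5)   # -> True
```

A blob (ring uniformly brighter or darker than the centre) or an edge produces fewer transitions and is rejected.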

Sub-Frame Appearance and 6D Pose Estimation of Fast Moving Objects

  • Autoři: Ing. Denys Rozumnyi, Kotera, J., Šroubek, F., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2020. p. 6777-6785. ISSN 2575-7075. ISBN 978-1-7281-7168-5.
  • Rok: 2020
  • DOI: 10.1109/CVPR42600.2020.00681
  • Odkaz: https://doi.org/10.1109/CVPR42600.2020.00681
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a novel method that tracks fast moving objects, mainly non-uniform spherical ones, in full 6 degrees of freedom, estimating simultaneously their 3D motion trajectory, 3D pose and object appearance changes with a time step that is a fraction of the video frame exposure time. The sub-frame object localization and appearance estimation allows realistic temporal super-resolution and precise shape estimation. The method, called TbD-3D (Tracking by Deblatting in 3D), relies on a novel reconstruction algorithm which solves a piece-wise deblurring and matting problem. The 3D rotation is estimated by minimizing the reprojection error. As a second contribution, we present a new challenging dataset with fast moving objects that change their appearance and distance to the camera. High-speed camera recordings with zero lag between frame exposures were used to generate videos with different frame rates annotated with ground-truth trajectory and pose.

The Eighth Visual Object Tracking VOT2020 Challenge Results

  • DOI: 10.1007/978-3-030-68238-5_39
  • Odkaz: https://doi.org/10.1007/978-3-030-68238-5_39
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2020 challenge was composed of five sub-challenges focusing on different tracking domains: (i) the VOT-ST2020 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2020 challenge focused on “real-time” short-term tracking in RGB, (iii) VOT-LT2020 focused on long-term tracking, namely coping with target disappearance and reappearance, (iv) the VOT-RGBT2020 challenge focused on short-term tracking in RGB and thermal imagery and (v) the VOT-RGBD2020 challenge focused on long-term tracking in RGB and depth imagery. Only the VOT-ST2020 datasets were refreshed. A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology and of segmentation ground truth in the VOT-ST2020 challenge – bounding boxes will no longer be used in the VOT-ST challenges. A new VOT Python toolkit that implements all these novelties was introduced. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

A Summary of the 4th International Workshop on Recovering 6D Object Pose

  • Autoři: Hodaň, T., Kouskouridas, R., Kim, T.-K., Tombari, F., Bekris, K., Drost, B., Groueix, T., Walas, K., Lepetit, V., Leonardis, A., Steger, C., Michel, F., Sahin, C., Rother, C., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision – ECCV 2018 Workshops. Basel: Springer, 2019. p. 589-600. Lecture Notes in Computer Science. vol. 11129. ISSN 1611-3349. ISBN 978-3-030-11009-3.
  • Rok: 2019
  • DOI: 10.1007/978-3-030-11009-3_36
  • Odkaz: https://doi.org/10.1007/978-3-030-11009-3_36
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This document summarizes the 4th International Workshop on Recovering 6D Object Pose which was organized in conjunction with ECCV 2018 in Munich. The workshop featured four invited talks, oral and poster presentations of accepted workshop papers, and an introduction of the BOP benchmark for 6D object pose estimation. The workshop was attended by 100+ people working on relevant topics in both academia and industry who shared up-to-date advances and discussed open problems.

Care Label Recognition

  • Autoři: Králíček, J., prof. Ing. Jiří Matas, Ph.D., Bušta, M.
  • Publikace: ICDAR2019: Proceedings of the 15th IAPR International Conference on Document Analysis and Recognition. Piscataway, NJ: IEEE, 2019. p. 959-966. ISSN 1520-5363. ISBN 978-1-7281-3015-6.
  • Rok: 2019
  • DOI: 10.1109/ICDAR.2019.00158
  • Odkaz: https://doi.org/10.1109/ICDAR.2019.00158
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper introduces the problem of care label recognition and presents a method addressing it. A care label, also called a care tag, is a small piece of cloth or paper attached to a garment providing instructions for its maintenance and information about e.g. the material and size. The information and instructions are written as symbols or plain text. Care label recognition is a challenging text and pictogram recognition problem – the often sewn text is small, looking as if printed using a non-standard font; the contrast of the text gradually fades, making OCR progressively more difficult. On the other hand, the information provided is typically redundant, which facilitates semi-supervised learning. The presented care label recognition method is based on the recently published End-to-End Method for Multi-Language Scene Text, E2E-MLT, Busta et al. 2018, exploiting specific constraints, e.g. a care label vocabulary with multi-language equivalences. Experiments conducted on a newly-created dataset of 63 care label images show that even when exploiting problem-specific constraints, a state-of-the-art scene text detection and recognition method achieves precision and recall only slightly above 0.6, confirming the challenging nature of the problem.

CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark

  • Autoři: Lukežič, A., Kart, U., Durmush, A., Kämäräinen, J.-K., prof. Ing. Jiří Matas, Ph.D., Kristan, M.
  • Publikace: 2019 IEEE International Conference on Computer Vision (ICCV 2019). Los Alamitos: IEEE Computer Society Press, 2019. p. 10012-10021. ISSN 2380-7504. ISBN 978-1-7281-4803-8.
  • Rok: 2019
  • DOI: 10.1109/ICCV.2019.01011
  • Odkaz: https://doi.org/10.1109/ICCV.2019.01011
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a new color-and-depth general visual object tracking benchmark (CDTB). CDTB is recorded by several passive and active RGB-D setups and contains indoor as well as outdoor sequences acquired in direct sunlight. The CDTB dataset is the largest and most diverse dataset in RGB-D tracking, with an order of magnitude larger number of frames than related datasets. The sequences have been carefully recorded to contain significant object pose change, clutter, occlusion, and periods of long-term target absence to enable tracker evaluation under realistic conditions. Sequences are per-frame annotated with 13 visual attributes for detailed analysis. Experiments with RGB and RGB-D trackers show that CDTB is more challenging than previous datasets. State-of-the-art RGB trackers outperform the recent RGB-D trackers, indicating a large gap between the two fields, which has not been previously detected by the prior benchmarks. Based on the results of the analysis we point out opportunities for future research in RGB-D tracker design.

Continual Occlusion and Optical Flow Estimation

  • DOI: 10.1007/978-3-030-20870-7_10
  • Odkaz: https://doi.org/10.1007/978-3-030-20870-7_10
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Two optical flow estimation problems are addressed: (i) occlusion estimation and handling, and (ii) estimation from image sequences longer than two frames. The proposed ContinualFlow method estimates occlusions before flow, avoiding the use of flow corrupted by occlusions for their estimation. We show that providing occlusion masks as an additional input to flow estimation improves the standard performance metric by more than 25% on both KITTI and Sintel. As a second contribution, a novel method for incorporating information from past frames into flow estimation is introduced. The previous frame flow serves as an input to occlusion estimation and as a prior in occluded regions, i.e. those without visual correspondences. By continually using the previous frame flow, ContinualFlow performance improves further by 18% on KITTI and 7% on Sintel, achieving top performance on KITTI and Sintel. © 2019, Springer Nature Switzerland AG.

Cumulative attribute space regression for head pose estimation and color constancy

  • Autoři: Chen, K., Jia, K., Huttunen, H., prof. Ing. Jiří Matas, Ph.D., Kämäräinena, J.-K.
  • Publikace: Pattern recognition. 2019, 87 29-37. ISSN 0031-3203.
  • Rok: 2019
  • DOI: 10.1016/j.patcog.2018.10.015
  • Odkaz: https://doi.org/10.1016/j.patcog.2018.10.015
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Two-stage Cumulative Attribute (CA) regression has been found effective in regression problems of computer vision such as facial age and crowd density estimation. The first stage regression maps input features to cumulative attributes that encode correlations between target values. The previous works have dealt with single output regression. In this work, we propose cumulative attribute spaces for 2- and 3-output (multivariate) regression. We show how the original CA space can be generalized to multiple output by the Cartesian product (CartCA). However, for target spaces with more than two outputs the CartCA becomes computationally infeasible and therefore we propose an approximate solution - multi-view CA (MvCA) - where CartCA is applied to output pairs. We experimentally verify improved performance of the CartCA and MvCA spaces in 2D and 3D face pose estimation and three-output (RGB) illuminant estimation for color constancy.

E2E-MLT - An Unconstrained End-to-End Method for Multi-language Scene Text

  • Autoři: Bušta, M., Patel, Y., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ACCVW 2018: Proceedings of the 14th Asian Conference on Computer Vision Workshops. Cham: Springer, 2019. p. 127-143. LNCS. vol. 11367. ISSN 0302-9743. ISBN 978-3-030-21073-1.
  • Rok: 2019
  • DOI: 10.1007/978-3-030-21074-8_11
  • Odkaz: https://doi.org/10.1007/978-3-030-21074-8_11
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem. Code and trained models are released publicly at https://github.com/MichalBusta/E2E-MLT.

Flash Lightens Gray Pixels

  • Autoři: Qian, Y., Yan, S., Kämäräinen, J.-K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2019 IEEE International Conference on Image Processing (ICIP). Piscataway, NJ: IEEE, 2019. p. 4604-4608. ISSN 2381-8549. ISBN 978-1-5386-6249-6.
  • Rok: 2019
  • DOI: 10.1109/ICIP.2019.8803468
  • Odkaz: https://doi.org/10.1109/ICIP.2019.8803468
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    In the real world, a scene is usually cast by multiple illuminants and herein we address the problem of spatial illumination estimation. Our solution is based on detecting gray pixels with the help of flash photography. We show that flash photography significantly improves the performance of gray pixel detection without illuminant prior, training data or calibration of the flash. We also introduce a novel flash photography dataset generated from the MIT intrinsic dataset.

FuCoLoT – A Fully-Correlational Long-Term Tracker

  • Autoři: Lukežič, A., Zajc, L.Č., Ing. Tomáš Vojíř, Ph.D., prof. Ing. Jiří Matas, Ph.D., Kristan, M.
  • Publikace: ACCV 2018: Proceedings of the 14th Asian Conference on Computer Vision, Part II. Springer, 2019. p. 595-611. LNCS. vol. 11362. ISSN 0302-9743. ISBN 978-3-030-20889-9.
  • Rok: 2019
  • DOI: 10.1007/978-3-030-20890-5_38
  • Odkaz: https://doi.org/10.1007/978-3-030-20890-5_38
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose FuCoLoT – a Fully Correlational Long-term Tracker. It exploits the novel DCF constrained filter learning method to design a detector that is able to re-detect the target in the whole image efficiently. FuCoLoT maintains several correlation filters trained on different time scales that act as the detector components. A novel mechanism based on the correlation response is used for tracking failure estimation. FuCoLoT achieves state-of-the-art results on standard short-term benchmarks and it outperforms the current best-performing tracker on the long-term UAV20L benchmark by over 19%. It has an order of magnitude smaller memory footprint than its best-performing competitors and runs at 15 fps in a single CPU thread.

Gyroscope-Aided Motion Deblurring with Deep Networks

  • Autoři: Mustaniemi, J., Kannala, J., Särkkä, S., prof. Ing. Jiří Matas, Ph.D., Heikkilä, J.
  • Publikace: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). New York, NY: IEEE, 2019. p. 1914-1922. ISSN 2472-6737. ISBN 978-1-7281-1975-5.
  • Rok: 2019
  • DOI: 10.1109/WACV.2019.00208
  • Odkaz: https://doi.org/10.1109/WACV.2019.00208
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a deblurring method that incorporates gyroscope measurements into a convolutional neural network (CNN). With the help of such measurements, it can handle extremely strong and spatially-variant motion blur. At the same time, the image data is used to overcome the limitations of gyro-based blur estimation. To train our network, we also introduce a novel way of generating realistic training data using the gyroscope. The evaluation shows a clear improvement in visual quality over the state-of-the-art while achieving real-time performance. Furthermore, the method is shown to improve the performance of existing feature detectors and descriptors against the motion blur.

How to make an RGBD tracker?

  • Autoři: Kart, U., Kämäräinen, J.-K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision – ECCV 2018 Workshops. Basel: Springer, 2019. p. 148-161. Lecture Notes in Computer Science. vol. 11129. ISSN 0302-9743. ISBN 978-3-030-11008-6.
  • Rok: 2019
  • DOI: 10.1007/978-3-030-11009-3_8
  • Odkaz: https://doi.org/10.1007/978-3-030-11009-3_8
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a generic framework for converting an arbitrary short-term RGB tracker into an RGBD tracker. The proposed framework has two mild requirements – the short-term tracker provides a bounding box and its object model update can be stopped and resumed. The core of the framework is a depth augmented foreground segmentation which is formulated as an energy minimization problem solved by graph cuts. The proposed framework offers two levels of integration. The first requires that the RGB tracker can be stopped and resumed according to the decision on target visibility. The level-two integration requires that the tracker accept an external mask (foreground region) in the target update. We integrate in the proposed framework the Discriminative Correlation Filter (DCF), and three state-of-the-art trackers – Efficient Convolution Operators for Tracking (ECOhc, ECOgpu) and Discriminative Correlation Filter with Channel and Spatial Reliability (CSR-DCF). Comprehensive experiments on Princeton Tracking Benchmark (PTB) show that level-one integration provides significant improvements for all trackers: DCF average rank improves from 18th to 17th, ECOgpu from 16th to 10th, ECOhc from 15th to 5th and CSR-DCF from 19th to 14th. CSR-DCF with level-two integration achieves the top rank by a clear margin on PTB. Our framework is particularly powerful in occlusion scenarios where it provides 13.5% average improvement and 26% for the best tracker (CSR-DCF).

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition – RRC-MLT-2019

  • Autoři: Nayef, N., Patel, Y., Bušta, M., Chowdhury, P.N., Karatzas, D., Khlif, W., prof. Ing. Jiří Matas, Ph.D., Pal, U., Burie, J.-Ch., Liu, Ch., Ogier, J.-M.
  • Publikace: ICDAR2019: Proceedings of the 15th IAPR International Conference on Document Analysis and Recognition. Piscataway, NJ: IEEE, 2019. p. 1582-1587. ISSN 1520-5363. ISBN 978-1-7281-3015-6.
  • Rok: 2019
  • DOI: 10.1109/ICDAR.2019.00254
  • Odkaz: https://doi.org/10.1109/ICDAR.2019.00254
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    With the growing cosmopolitan culture of modern cities, the need for robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more immense. With the goal to systematically benchmark and push the state-of-the-art forward, the proposed competition builds on top of the RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large-scale multi-lingual synthetic dataset to assist the training, and a baseline end-to-end recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the RRC-MLT-2019 challenge.

Improving CNN classifiers by estimating test-time priors

  • Autoři: Šulc, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2019 IEEE International Conference on Computer Vision Workshops (ICCVW 2019). Los Alamitos: IEEE Computer Society, 2019. p. 3220-3226. ISSN 2473-9944. ISBN 978-1-7281-5023-9.
  • Rok: 2019
  • DOI: 10.1109/ICCVW.2019.00402
  • Odkaz: https://doi.org/10.1109/ICCVW.2019.00402
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The problem of different training and test set class priors is addressed in the context of CNN classifiers. We compare two approaches to the estimation of the unknown test priors: an existing Maximum Likelihood Estimation (MLE) method and a proposed Maximum a Posteriori (MAP) approach introducing a Dirichlet hyper-prior on the class prior probabilities. Experimental results show a significant improvement in the fine-grained classification tasks using known evaluation-time priors, increasing top-1 accuracy by 4.0% on the FGVC iNaturalist 2018 validation set and by 3.9% on the FGVCx Fungi 2018 validation set. Estimation of the unknown test set priors noticeably increases the accuracy on the PlantCLEF dataset, allowing a single CNN model to achieve state-of-the-art results and to outperform the competition-winning ensemble of 12 CNNs. The proposed MAP estimation increases the prediction accuracy by 2.8% on PlantCLEF 2017 and by 1.8% on FGVCx Fungi, where the MLE method decreases accuracy.
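
The MLE approach referred to is typically realised as the EM procedure of Saerens et al. (2002): re-weight each posterior by the ratio of estimated test priors to training priors and iterate; a sketch under that assumption:

```python
def estimate_test_priors(posteriors, train_priors, n_iters=100):
    """EM estimate of test-time class priors from posteriors produced by a
    classifier trained under train_priors (Saerens et al., 2002)."""
    k = len(train_priors)
    n = len(posteriors)
    test_priors = list(train_priors)
    for _ in range(n_iters):
        counts = [0.0] * k
        for p in posteriors:
            # E-step: re-weight each posterior by the prior ratio, renormalise
            w = [p[c] * test_priors[c] / train_priors[c] for c in range(k)]
            s = sum(w)
            for c in range(k):
                counts[c] += w[c] / s
        # M-step: new priors are the average corrected posteriors
        test_priors = [cnt / n for cnt in counts]
    return test_priors

# classifier trained with uniform priors; the test set is roughly 80/20
train_priors = [0.5, 0.5]
posteriors = [[0.99, 0.01]] * 8 + [[0.01, 0.99]] * 2
priors = estimate_test_priors(posteriors, train_priors)
```

The paper's MAP variant adds a Dirichlet hyper-prior on `test_priors`, which regularises this estimate on small test sets.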

Intra-frame Object Tracking by Deblatting

  • Autoři: Kotera, J., Ing. Denys Rozumnyi, Šroubek, F., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2019 IEEE International Conference on Computer Vision Workshops (ICCVW 2019). Los Alamitos: IEEE Computer Society, 2019. p. 2300-2309. ISSN 2473-9944. ISBN 978-1-7281-5023-9.
  • Rok: 2019
  • DOI: 10.1109/ICCVW.2019.00283
  • Odkaz: https://doi.org/10.1109/ICCVW.2019.00283
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Objects moving at high speed along complex trajectories often appear in videos, especially videos of sports. Such objects cover a non-negligible distance during the exposure time of a single frame, and therefore their position in the frame is not well defined. They appear as semi-transparent streaks due to the motion blur and cannot be reliably tracked by standard trackers. We propose a novel approach called Tracking by Deblatting based on the observation that motion blur is directly related to the intra-frame trajectory of an object. Blur is estimated by solving two intertwined inverse problems, blind deblurring and image matting, which we call deblatting. The trajectory is then estimated by fitting a piecewise quadratic curve, which models physically justifiable trajectories. As a result, tracked objects are precisely localized with higher temporal resolution than by conventional trackers. The proposed TbD tracker was evaluated on a newly created dataset of videos with ground truth obtained by a high-speed camera, using a novel Trajectory-IoU metric that generalizes the traditional Intersection over Union and measures the accuracy of the intra-frame trajectory. The proposed method outperforms the baseline both in recall and trajectory accuracy.
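The Trajectory-IoU idea can be illustrated with a much-simplified sketch: model the object as a disk of fixed radius and average the IoU between the disk placed on the estimated trajectory and on the ground-truth trajectory over densely sampled timestamps within the frame. The function names and the disk model are illustrative assumptions, not the paper's exact metric:

```python
import numpy as np

def disk_iou(c1, c2, r):
    """IoU of two disks of equal radius r, in closed form."""
    d = np.hypot(*(np.asarray(c1) - np.asarray(c2)))
    if d >= 2 * r:
        return 0.0
    # area of intersection of two circles of radius r at distance d
    inter = 2 * r**2 * np.arccos(d / (2 * r)) - 0.5 * d * np.sqrt(4 * r**2 - d**2)
    union = 2 * np.pi * r**2 - inter
    return inter / union

def trajectory_iou(traj_est, traj_gt, r, n_samples=100):
    """Average IoU of the object placed on the estimated vs. ground-truth
    trajectory, sampled at n_samples timestamps within one frame exposure."""
    ts = np.linspace(0.0, 1.0, n_samples)
    return float(np.mean([disk_iou(traj_est(t), traj_gt(t), r) for t in ts]))
```

With identical trajectories the score is 1; it decays to 0 as the estimated path drifts away, which is what makes it a natural generalization of frame-level IoU to intra-frame motion.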

Leveraging Outdoor Webcams for Local Descriptor Learning

  • DOI: 10.3217/978-3-85125-652-9-06
  • Odkaz: https://doi.org/10.3217/978-3-85125-652-9-06
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present AMOS Patches, a large set of image cut-outs, intended primarily for the robustification of trainable local feature descriptors to illumination and appearance changes. Images contributing to AMOS Patches originate from the AMOS dataset of recordings from a large set of outdoor webcams. The semiautomatic method used to generate AMOS Patches is described. It includes camera selection, viewpoint clustering and patch selection. For training, we provide both the registered full source images as well as the patches. A new descriptor, trained on the AMOS Patches and 6Brown datasets, is introduced. It achieves state-of-the-art in matching under illumination changes on standard benchmarks.

MAGSAC: Marginalizing Sample Consensus

  • Autoři: Baráth, D., prof. Ing. Jiří Matas, Ph.D., Nosková, J.
  • Publikace: CVPR 2019: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2019. p. 10189-10197. ISSN 2575-7075. ISBN 978-1-7281-3293-8.
  • Rok: 2019
  • DOI: 10.1109/CVPR.2019.01044
  • Odkaz: https://doi.org/10.1109/CVPR.2019.01044
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A method called sigma-consensus is proposed to eliminate the need for a user-defined inlier-outlier threshold in RANSAC. Instead of estimating the noise sigma, it is marginalized over a range of noise scales. The optimized model is obtained by weighted least-squares fitting where the weights come from the marginalization over sigma of the point likelihoods of being inliers. A new quality function is proposed that requires neither sigma nor a set of inliers to determine the model quality. Also, a new termination criterion for RANSAC is built on the proposed marginalization approach. Applying sigma-consensus, the proposed MAGSAC eliminates the need for a user-defined sigma and significantly improves the accuracy of robust estimation. It is superior to the state-of-the-art in terms of geometric accuracy on publicly available real-world datasets for epipolar geometry (F and E) and homography estimation. In addition, applying sigma-consensus only once as a post-processing step to the RANSAC output always improved the model quality on a wide range of vision problems without noticeable deterioration in processing time, adding a few milliseconds.
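The sigma-marginalized weighting can be illustrated with a toy sketch: instead of thresholding residuals at a single sigma, average each point's (Gaussian) inlier likelihood over a grid of noise scales, then fit the model by weighted least squares. This is a deliberate simplification for intuition, not MAGSAC's exact derivation, and the function names are hypothetical:

```python
import numpy as np

def sigma_marginalized_weights(residuals, sigma_max=2.0, n_sigmas=10):
    """Toy sigma-consensus weighting: average each point's Gaussian inlier
    likelihood over a range of noise scales instead of fixing one threshold."""
    sigmas = np.linspace(sigma_max / n_sigmas, sigma_max, n_sigmas)
    r2 = residuals[:, None] ** 2
    likelihoods = np.exp(-r2 / (2 * sigmas[None, :] ** 2))
    return likelihoods.mean(axis=1)

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a*x + b using the weights above."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    sw = np.sqrt(w)
    a, b = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return a, b
```

Points with large residuals get near-zero weight under every plausible sigma, so outliers are suppressed without ever choosing a hard threshold.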

Non-causal Tracking by Deblatting

  • DOI: 10.1007/978-3-030-33676-9_9
  • Odkaz: https://doi.org/10.1007/978-3-030-33676-9_9
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Tracking by Deblatting (Deblatting = deblurring and matting) stands for solving an inverse problem of deblurring and image matting for tracking motion-blurred objects. We propose non-causal Tracking by Deblatting which estimates continuous, complete and accurate object trajectories. Energy minimization by dynamic programming is used to detect abrupt changes of motion, called bounces. High-order polynomials are fitted to segments, which are parts of the trajectory separated by bounces. The output is a continuous trajectory function which assigns location for every real-valued time stamp from zero to the number of frames. Additionally, we show that from the trajectory function precise physical calculations are possible, such as radius, gravity or sub-frame object velocity. Velocity estimation is compared to the high-speed camera measurements and radars. Results show high performance of the proposed method in terms of Trajectory-IoU, recall and velocity estimation.

Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters

  • Autoři: Kart, U., Lukežič, A., Kristan, M., Kamarainen, J.-K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVPR 2019: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2019. p. 1339-1348. ISSN 2575-7075. ISBN 978-1-7281-3293-8.
  • Rok: 2019
  • DOI: 10.1109/CVPR.2019.00143
  • Odkaz: https://doi.org/10.1109/CVPR.2019.00143
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Standard RGB-D trackers treat the target as a 2D structure, which makes modelling appearance changes related even to out-of-plane rotation challenging. This limitation is addressed by the proposed long-term RGB-D tracker called OTR – Object Tracking by Reconstruction. OTR performs online 3D target reconstruction to facilitate robust learning of a set of view-specific discriminative correlation filters (DCFs). The 3D reconstruction supports two performance-enhancing features: (i) generation of an accurate spatial support for constrained DCF learning from its 2D projection and (ii) point-cloud based estimation of 3D pose change for selection and storage of view-specific DCFs which robustly localize the target after out-of-view rotation or heavy occlusion. Extensive evaluation on the Princeton RGB-D tracking and STC Benchmarks shows OTR outperforms the state-of-the-art by a large margin.

On Finding Gray Pixels

  • Autoři: Qian, Y., Kamarainen, J.-K., Nikkanen, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVPR 2019: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2019. p. 8054-8062. ISSN 2575-7075. ISBN 978-1-7281-3293-8.
  • Rok: 2019
  • DOI: 10.1109/CVPR.2019.00825
  • Odkaz: https://doi.org/10.1109/CVPR.2019.00825
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a novel grayness index for finding gray pixels and demonstrate its effectiveness and efficiency in illumination estimation. The grayness index, GI in short, is derived using the Dichromatic Reflection Model and is learning-free. GI allows estimating one or multiple illumination sources in color-biased images. On standard single-illumination and multiple-illumination estimation benchmarks, GI outperforms state-of-the-art statistical methods and many recent deep methods. GI is simple and fast, written in a few dozen lines of code, processing a 1080p image in ~0.4 seconds with non-optimized Matlab code.

Performance analysis of single-query 6-DoF camera pose estimation in self-driving setups

  • Autoři: Fu, J., Pertuz, S., prof. Ing. Jiří Matas, Ph.D., Kamarainen, J.
  • Publikace: Computer Vision and Image Understanding. 2019, 186 58-73. ISSN 1077-3142.
  • Rok: 2019
  • DOI: 10.1016/j.cviu.2019.04.009
  • Odkaz: https://doi.org/10.1016/j.cviu.2019.04.009
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    In this work, we consider the problem of single-query 6-DoF camera pose estimation, i.e. estimating the position and orientation of a camera by using reference images and a point cloud. We perform a systematic comparison of three state-of-the-art strategies for 6-DoF camera pose estimation: feature-based, photometric-based and mutual-information-based approaches. Two standard datasets with self-driving setups are used for experiments, and the performance of the studied methods is evaluated in terms of success rate, translation error and maximum orientation error. Building on the analysis of the results, we evaluate a hybrid approach that combines feature-based and mutual-information-based pose estimation methods to benefit from their complementary properties for pose estimation. Experiments show that (1) in cases with large appearance change between query and reference, the hybrid approach outperforms feature-based and mutual-information-based approaches by an average increment of 9.4% and 8.7% in the success rate, respectively; (2) in cases where query and reference images are captured at similar imaging conditions, the hybrid approach performs similarly to the feature-based approach, but outperforms both photometric-based and mutual-information-based approaches with a clear margin; (3) the feature-based approach is consistently more accurate than mutual-information-based and photometric-based approaches when at least 4 consistent matching points are found between the query and reference images.

Progressive-X: Efficient, Anytime, Multi-Model Fitting Algorithm

  • Autoři: Baráth, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2019 IEEE International Conference on Computer Vision (ICCV 2019). Los Alamitos: IEEE Computer Society Press, 2019. p. 3779-3787. ISSN 1550-5499. ISBN 978-1-7281-4804-5.
  • Rok: 2019
  • DOI: 10.1109/ICCV.2019.00388
  • Odkaz: https://doi.org/10.1109/ICCV.2019.00388
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Progressive-X algorithm, Prog-X in short, is proposed for geometric multi-model fitting. The method interleaves sampling and consolidation of the current data interpretation via repetitive hypothesis proposal, fast rejection, and integration of the new hypothesis into the kept instance set by labeling energy minimization. Due to exploring the data progressively, the method has several beneficial properties compared with the state-of-the-art. First, a clear criterion, adopted from RANSAC, controls the termination and stops the algorithm when the probability of finding a new model with a reasonable number of inliers falls below a threshold. Second, Prog-X is an any-time algorithm. Thus, whenever it is interrupted, e.g. due to a time limit, the returned instances are real and, likely, the most dominant ones. The method is superior to the state-of-the-art in terms of accuracy in both synthetic experiments and on publicly available real-world datasets for homography, two-view motion, and motion segmentation.
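The RANSAC-style termination criterion referred to above has a standard closed form: the number of iterations needed so that, with a given confidence, at least one all-inlier minimal sample is drawn. A minimal sketch (Prog-X adapts this criterion to multi-model fitting; the plain single-model version is shown here):

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Number of RANSAC iterations so that, with the given confidence,
    at least one minimal sample contains only inliers."""
    if inlier_ratio <= 0.0:
        return float("inf")
    good = inlier_ratio ** sample_size  # prob. a minimal sample is all-inlier
    if good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - good))
```

For example, homography estimation (minimal sample of 4 correspondences) at 50% inliers needs 72 iterations for 99% confidence; the required count grows rapidly as the inlier ratio drops.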

Recognition of the Amazonian Flora by Inception Networks with Test-time Class Prior Estimation - CMP submission to PlantCLEF 2019

  • Autoři: Picek, L., Šulc, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum. CEUR-WS.org, 2019. vol. 2380. ISSN 1613-0073.
  • Rok: 2019
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper describes an automatic system for recognition of 10,000 plant species, with focus on species from the Guiana shield and the Amazon rain forest. The proposed system achieves the best results on the PlantCLEF 2019 test set with 31.9% accuracy. Compared against human experts in plant recognition, the system performed better than 3 of the 5 participating human experts and achieved 41.0% accuracy on the subset for expert evaluation. The proposed system is based on the Inception-v4 and Inception-ResNet-v2 Convolutional Neural Network (CNN) architectures. Performance improvements were achieved by: adjusting the CNN predictions according to the estimated change of the class prior probabilities, replacing network parameters with their running averages, test-time data augmentation, filtering the provided training set and adding additional training images from GBIF.

Revisiting gray pixel for statistical illumination estimation

  • Autoři: Qian, Y., Pertuz, S., Nikkanen, J., Kämäräinen, J.-K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: VISAPP2019: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Volume 5. Porto: SciTePress - Science and Technology Publications, 2019. p. 36-46. ISBN 978-989-758-354-4.
  • Rok: 2019
  • DOI: 10.5220/0007406900360046
  • Odkaz: https://doi.org/10.5220/0007406900360046
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present a statistical color constancy method that relies on novel gray pixel detection and mean shift clustering. The method, called Mean Shifted Grey Pixel – MSGP, is based on the observation that true-gray pixels are aligned along a single direction. Our solution is compact, easy to compute and requires no training. Experiments on two real-world benchmarks show that the proposed approach outperforms state-of-the-art methods in the camera-agnostic scenario. In the setting where the camera is known, MSGP outperforms all statistical methods.

The Seventh Visual Object Tracking VOT2019 Challenge Results

  • Autoři: Kristan, M., prof. Ing. Jiří Matas, Ph.D., Leonardis, A., Mgr. Ondřej Drbohlav, Ph.D.,
  • Publikace: 2019 IEEE International Conference on Computer Vision Workshops (ICCVW 2019). Los Alamitos: IEEE Computer Society, 2019. p. 2206-2241. ISSN 2473-9944. ISBN 978-1-7281-5023-9.
  • Rok: 2019
  • DOI: 10.1109/ICCVW.2019.00276
  • Odkaz: https://doi.org/10.1109/ICCVW.2019.00276
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) VOT-RT2019 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2019 focused on long-term tracking, namely coping with target disappearance and reappearance. Two new challenges have been introduced: (iv) VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support standard short-term and long-term tracking as well as tracking with multi-channel imagery. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.

The sixth visual object tracking VOT2018 challenge results

  • Autoři: Kristan, M., Leonardis, A., prof. Ing. Jiří Matas, Ph.D., Felsberg, M., Ing. Tomáš Vojíř, Ph.D.,
  • Publikace: Computer Vision – ECCV 2018 Workshops. Basel: Springer, 2019. p. 3-53. Lecture Notes in Computer Science. vol. 11129. ISSN 0302-9743. ISBN 978-3-030-11008-6.
  • Rok: 2019
  • DOI: 10.1007/978-3-030-11009-3_1
  • Odkaz: https://doi.org/10.1007/978-3-030-11009-3_1
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis and a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking sub-challenge has been introduced to the set of standard VOT sub-challenges. The new sub-challenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both the standard short-term and the new long-term tracking sub-challenges. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

Visual Coin-Tracking: Tracking of Planar Double-Sided Objects

  • DOI: 10.1007/978-3-030-33676-9_22
  • Odkaz: https://doi.org/10.1007/978-3-030-33676-9_22
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We introduce a new video analysis problem – tracking of rigid planar objects in sequences where both their sides are visible. Such coin-like objects often rotate fast with respect to an arbitrary axis, producing unique challenges, such as fast incident light and aspect ratio change and rotational motion blur. Despite being common, neither tracking sequences containing coin-like objects nor a suitable algorithm have been published. As a second contribution, we present a novel coin-tracking benchmark containing 17 video sequences annotated with object segmentation masks. Experiments show that the sequences differ significantly from the ones encountered in standard tracking datasets. We propose a baseline coin-tracking method based on convolutional neural network segmentation and explicit pose modeling. Its performance confirms that coin-tracking is an open and challenging problem.

ALFA: Agglomerative Late Fusion Algorithm for Object Detection

  • Autoři: Razinkov, E., Saveleva, I., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2018 24th International Conference on Pattern Recognition (ICPR). Piscataway, NJ: IEEE, 2018. p. 2594-2599. ISSN 1051-4651. ISBN 978-1-5386-3788-3.
  • Rok: 2018
  • DOI: 10.1109/ICPR.2018.8545182
  • Odkaz: https://doi.org/10.1109/ICPR.2018.8545182
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose ALFA - a novel late fusion algorithm for object detection. ALFA is based on agglomerative clustering of object detector predictions taking into consideration both the bounding box locations and the class scores. Each cluster represents a single object hypothesis whose location is a weighted combination of the clustered bounding boxes.
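The late-fusion idea described above can be sketched as follows: greedily group detections whose bounding boxes overlap, then represent each group by the score-weighted average of its member boxes. This is a simplified illustration of the clustering-and-fusion step (single-link grouping by IoU only), not the full ALFA algorithm, and the function names are hypothetical:

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def fuse_detections(boxes, scores, iou_thr=0.5):
    """Greedy single-link agglomerative grouping of detections by IoU;
    each cluster's box is the score-weighted average of its members."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    clusters = [[i] for i in range(len(boxes))]
    merged = True
    while merged:  # repeat until no pair of clusters can be merged
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(box_iou(boxes[a], boxes[b]) > iou_thr
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    fused = []
    for c in clusters:
        w = scores[c] / scores[c].sum()  # score-weighted box combination
        fused.append((w @ boxes[c], scores[c].max()))
    return fused
```

ALFA additionally takes the class score vectors of the individual detectors into account when clustering; the sketch above keeps only the geometric part to show the structure of the fusion.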

BOP: Benchmark for 6D Object Pose Estimation

  • Autoři: Hodaň, T., Michel, F., Brachmann, E., Kehl, W., Buch, A.G., Kraft, D., Drost, B., Vidal, J., Ihrke, S., Zabulis, X., Sahin, C., Manhardt, F., Tombari, F., Kim, T., prof. Ing. Jiří Matas, Ph.D., Rother, C.
  • Publikace: ECCV2018: Proceedings of the European Conference on Computer Vision, Part X. Springer, Cham, 2018. p. 19-35. Lecture Notes in Computer Science. vol. 11214. ISSN 0302-9743. ISBN 978-3-030-01248-9.
  • Rok: 2018
  • DOI: 10.1007/978-3-030-01249-6_2
  • Odkaz: https://doi.org/10.1007/978-3-030-01249-6_2
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. The training data consists of a texture-mapped 3D object model or images of the object in known 6D poses. The benchmark comprises: (i) eight datasets in a unified format that cover different practical scenarios, including two new datasets focusing on varying lighting conditions, (ii) an evaluation methodology with a pose-error function that deals with pose ambiguities, (iii) a comprehensive evaluation of 15 diverse recent methods that captures the status quo of the field, and (iv) an online evaluation system that is open for continuous submission of new results. The evaluation shows that methods based on point-pair features currently perform best, outperforming template matching methods, learning-based methods and methods based on 3D local features. The project website is available at bop.felk.cvut.cz.

DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

  • Autoři: Kupyn, O., Budzan, V., Mykhailych, M., Mgr. Dmytro Mishkin, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVPR 2018: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018. p. 8183-8192. ISSN 2575-7075. ISBN 978-1-5386-6420-9.
  • Rok: 2018
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and the content loss. DeblurGAN achieves state-of-the-art performance both in the structural similarity measure and visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem - object detection on (de-)blurred images. The method is 5 times faster than the closest competitor - DeepDeblur. We also introduce a novel method for generating synthetic motion blurred images from sharp ones, allowing realistic dataset augmentation. The model, code and the dataset are available at https://github.com/KupynOrest/DeblurGAN

Depth Masked Discriminative Correlation Filter

  • Autoři: Kart, U., Kamarainen, J., prof. Ing. Jiří Matas, Ph.D., Fan, L.
  • Publikace: 2018 24th International Conference on Pattern Recognition (ICPR). Piscataway, NJ: IEEE, 2018. p. 2112-2117. ISSN 1051-4651. ISBN 978-1-5386-3788-3.
  • Rok: 2018
  • DOI: 10.1109/ICPR.2018.8546179
  • Odkaz: https://doi.org/10.1109/ICPR.2018.8546179
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Depth information provides a strong cue for occlusion detection and handling, but has been largely omitted in generic object tracking until recently due to lack of suitable benchmark datasets and applications. In this work, we propose a Depth Masked Discriminative Correlation Filter (DM-DCF) which adopts novel depth segmentation based occlusion detection that stops correlation filter updating, and depth masking which adaptively adjusts the spatial support for the correlation filter. On the Princeton RGBD Tracking Benchmark, our DM-DCF is among the state-of-the-art in the overall ranking and the winner in multiple categories. Moreover, since it is based on DCF, DM-DCF runs an order of magnitude faster than its competitors, making it suitable for time-constrained applications.

Detecting decision ambiguity from facial images

  • Autoři: Jahoda, P., Vobecký, A., Ing. Jan Čech, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: FG 2018: Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition. Piscataway: IEEE, 2018. p. 499-503. ISSN 2326-5396. ISBN 978-1-5386-2335-0.
  • Rok: 2018
  • DOI: 10.1109/FG.2018.00080
  • Odkaz: https://doi.org/10.1109/FG.2018.00080
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    In situations when potentially costly decisions are being made, faces of people tend to reflect a level of certainty about the appropriateness of the chosen decision. This fact is known from the psychological literature. In the paper, we propose a method that uses facial images for automatic detection of the decision ambiguity state of a subject. To train and test the method, we collected a large-scale dataset from "Who Wants to Be a Millionaire?" -- a popular TV game show. The videos provide examples of various mental states of contestants, including uncertainty, doubts and hesitation. The annotation of the videos is done automatically from on-screen graphics. The problem of detecting decision ambiguity is formulated as binary classification. Video clips where a contestant asks for help (audience, friend, 50:50) are considered positive samples; clips where he or she replies directly, negative ones. We propose a baseline method combining a deep convolutional neural network with an SVM. The method has an error rate of 24%. The error of human volunteers on the same dataset is 45%, close to chance.

Discriminative Correlation Filter Tracker with Channel and Spatial Reliability

  • DOI: 10.1007/s11263-017-1061-3
  • Odkaz: https://doi.org/10.1007/s11263-017-1061-3
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Short-term tracking is an open and challenging problem for which discriminative correlation filters (DCF) have shown excellent performance. We introduce the channel and spatial reliability concepts to DCF tracking and provide a learning algorithm for their efficient and seamless integration in the filter update and the tracking process. The spatial reliability map adjusts the filter support to the part of the object suitable for tracking. This both allows enlarging the search region and improves tracking of non-rectangular objects. Reliability scores reflect channel-wise quality of the learned filters and are used as feature weighting coefficients in localization. Experimentally, with only two simple standard feature sets, HoGs and colornames, the novel CSR-DCF method---DCF with channel and spatial reliability---achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB100. The CSR-DCF runs close to real-time on a CPU.

Fast Motion Deblurring for Feature Detection and Matching Using Inertial Measurements

  • Autoři: Mustaniemi, J., Kannala, J., Sarkka, S., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2018 24th International Conference on Pattern Recognition (ICPR). Piscataway, NJ: IEEE, 2018. p. 3068-3073. ISSN 1051-4651. ISBN 978-1-5386-3788-3.
  • Rok: 2018
  • DOI: 10.1109/ICPR.2018.8546041
  • Odkaz: https://doi.org/10.1109/ICPR.2018.8546041
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Many computer vision and image processing applications rely on local features. It is well-known that motion blur decreases the performance of traditional feature detectors and descriptors. We propose an inertial-based deblurring method for improving the robustness of existing feature detectors and descriptors against motion blur. Unlike most deblurring algorithms, the method can handle spatially-variant blur and rolling shutter distortion. Furthermore, it is capable of running in real-time, contrary to state-of-the-art algorithms. The limitations of inertial-based blur estimation are taken into account by validating the blur estimates using image data. The evaluation shows that when the method is used with a traditional feature detector and descriptor, it increases the number of detected keypoints, provides higher repeatability and improves the localization accuracy. We also demonstrate that such features lead to more accurate and complete reconstructions when used in the application of 3D visual reconstruction.

Graph-Cut RANSAC

  • Autoři: Baráth, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVPR 2018: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018. p. 6733-6741. ISSN 2575-7075. ISBN 978-1-5386-6420-9.
  • Rok: 2018
  • DOI: 10.1109/CVPR.2018.00704
  • Odkaz: https://doi.org/10.1109/CVPR.2018.00704
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A novel method for robust estimation, called Graph-Cut RANSAC, GC-RANSAC in short, is introduced. To separate inliers and outliers, it runs the graph-cut algorithm in the local optimization (LO) step which is applied when a so-far-the-best model is found. The proposed LO step is conceptually simple, easy to implement, globally optimal and efficient. GC-RANSAC is shown experimentally, both on synthesized tests and real image pairs, to be more geometrically accurate than state-of-the-art methods on a range of problems, e.g. line fitting, homography, affine transformation, fundamental and essential matrix estimation. It runs in real-time for many problems at a speed approximately equal to that of the less accurate alternatives (in milliseconds on standard CPU).

ICDAR2017 Robust Reading Challenge on COCO-Text

  • Autoři: Gomez, R., Shi, B., Gomez, L., Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Los Alamitos: IEEE Computer Society, 2018. p. 1435-1443. ISSN 1520-5363. ISBN 978-1-5386-3586-5.
  • Rok: 2018
  • DOI: 10.1109/ICDAR.2017.234
  • Odkaz: https://doi.org/10.1109/ICDAR.2017.234
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This report presents the final results of the ICDAR 2017 Robust Reading Challenge on COCO-Text, a challenge on scene text detection and recognition based on the largest real scene text dataset currently available: the COCO-Text dataset. The competition is structured around three tasks: Text Localization, Cropped Word Recognition and End-To-End Recognition. The competition received a total of 27 submissions over the different opened tasks. This report describes the datasets and the ground truth, details the performance evaluation protocols used and presents the final results along with a brief summary of the participating methods.

Multi-class Model Fitting by Energy Minimization and Mode-Seeking

  • Autoři: Baráth, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ECCV2018: Proceedings of the European Conference on Computer Vision, Part XVI. Springer, Cham, 2018. p. 229-245. Lecture Notes in Computer Science. vol. 11220. ISSN 0302-9743. ISBN 978-3-030-01269-4.
  • Rok: 2018
  • DOI: 10.1007/978-3-030-01270-0_14
  • Odkaz: https://doi.org/10.1007/978-3-030-01270-0_14
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose a general formulation, called Multi-X, for multi-class multi-instance model fitting – the problem of interpreting the input data as a mixture of noisy observations originating from multiple instances of multiple classes. We extend the commonly used α-expansion-based technique with a new move in the label space. The move replaces a set of labels with the corresponding density mode in the model parameter domain, thus achieving fast and robust optimization. Key optimization parameters like the bandwidth of the mode seeking are set automatically within the algorithm. Considering that a group of outliers may form spatially coherent structures in the data, we propose a cross-validation-based technique removing statistically insignificant instances. Multi-X outperforms significantly the state-of-the-art on publicly available datasets for diverse problems: multiple plane and rigid motion detection; motion segmentation; simultaneous plane and cylinder fitting; circle and line fitting.

Non-contact reflectance photoplethysmography: Progress, limitations, and myths

  • DOI: 10.1109/FG.2018.00111
  • Odkaz: https://doi.org/10.1109/FG.2018.00111
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Photoplethysmography (PPG) is a non-invasive method of measuring changes of blood volume in human tissue. The literature on non-contact reflectance PPG related to cardiovascular activity is extensively reviewed. We identify key factors limiting the performance of the PPG methods and reproducibility of the research as: a lack of publicly available datasets and incomplete description of data used in published experiments (missing details on video compression, lighting setup and subject’s skin type), use of unreliable pulse oximeter devices for ground-truth reference and missing standard experimental protocols. Two experiments with 5 participants are presented showing that the quality of the reconstructed signal (1) is adversely affected by a reduction of spatial resolution that also amplifies the effects of H.264 video compression and (2) is improved by precise pixel-to-pixel stabilization.

Plant Recognition by Inception Networks with Test-time Class Prior Estimation

  • Autoři: Šulc, M., Picek, L., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum. CEUR-WS.org, 2018. vol. 2125. ISSN 1613-0073.
  • Rok: 2018
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper describes an automatic system for recognition of 10,000 plant species from one or more images. The system finished 1st in the ExpertLifeCLEF 2018 plant identification challenge with 88.4% accuracy and performed better than 5 of the 9 participating plant identification experts. The system is based on the Inception-ResNet-v2 and Inception-v4 Convolutional Neural Network (CNN) architectures. Performance improvements were achieved by: adjusting the CNN predictions according to the estimated change of the class prior probabilities, replacing network parameters with their running averages, and test-time data augmentation.
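
    The described adjustment of CNN predictions to a shifted class prior can be sketched as a simple posterior re-weighting; the function below is a hypothetical illustration (the names and toy numbers are ours, not the system's code):

```python
import numpy as np

def adjust_for_test_priors(probs, train_priors, test_priors):
    """Re-weight classifier posteriors for a shifted class prior.

    probs: (n_samples, n_classes) softmax outputs of the CNN
    train_priors / test_priors: (n_classes,) class prior estimates
    """
    w = test_priors / train_priors
    adjusted = probs * w
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# toy check: a uniform posterior shifts toward the class with larger test prior
p = np.array([[0.5, 0.5]])
out = adjust_for_test_priors(p, np.array([0.5, 0.5]), np.array([0.8, 0.2]))
```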

Repeatability Is Not Enough: Learning Affine Regions via Discriminability

  • Autoři: Mgr. Dmytro Mishkin, Ph.D., Radenović, F., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ECCV2018: Proceedings of the European Conference on Computer Vision, Part IX. Springer, Cham, 2018. p. 287-304. Lecture Notes in Computer Science. vol. 11213. ISSN 0302-9743. ISBN 978-3-030-01239-7.
  • Rok: 2018
  • DOI: 10.1007/978-3-030-01240-3_18
  • Odkaz: https://doi.org/10.1007/978-3-030-01240-3_18
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A method for learning local affine-covariant regions is presented. We show that maximizing geometric repeatability does not lead to local regions, a.k.a. features, that are reliably matched, and that this necessitates descriptor-based learning. We explore factors that influence such learning and registration: the loss function, descriptor type, geometric parametrization and the trade-off between matchability and geometric accuracy, and propose a novel hard negative-constant loss function for learning of affine regions. The affine shape estimator – AffNet – trained with the hard negative-constant loss outperforms the state-of-the-art in bag-of-words image retrieval and wide baseline stereo. The proposed training process does not require precisely geometrically aligned patches. The source codes and trained weights are available at https://github.com/ducha-aiki/affnet

Tracking and Re-Identification System for Multiple Laboratory Animals

  • Autoři: Naiser, F., Šmíd, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2018 VAIB (ICPR2018 Workshop). Edinburgh: University of Edinburgh, 2018.
  • Rok: 2018
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We present a tracking system for ecology and biology researchers suitable for movement and interaction analysis of multiple animals in laboratory conditions. The input is a single video with multiple animals; the output is a set of animal trajectories. The system is agnostic with regard to animal species and can adapt to a new animal appearance automatically, without annotation. For animal re-identification we use a discriminatively trained CNN embedding. The system was tested on sequences with multiple ants, zebrafish and sowbugs.

Visual Heart Rate Estimation with Convolutional Neural Network

  • Pracoviště: Skupina vizuálního rozpoznávání, Strojové učení
  • Anotace:
    We propose a novel two-step convolutional neural network to estimate a heart rate from a sequence of facial images. The network is trained end-to-end by alternating optimization and validated on three publicly available datasets, yielding state-of-the-art results against three baseline methods. The network outperforms the state-of-the-art method by a 40% margin on a newly collected dataset. A challenging dataset of 204 fitness-themed videos is introduced. The dataset is designed to test the robustness of heart rate estimation methods to illumination changes and subject motion. 17 subjects perform 4 activities (talking, rowing, exercising on a stationary bike and an elliptical trainer) in 3 lighting setups. Each activity is captured by two RGB web-cameras, one placed on a tripod, the other attached to the fitness machine, which vibrates significantly. The subjects' ages range from 20 to 53 years; the mean heart rate is ≈ 110 bpm with a standard deviation of ≈ 25.

Deep structured-output regression learning for computational color constancy

  • Autoři: Qian, Y., Chen, K., Kamarainen, J.-K., Nikkanen, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2016 23rd International Conference on Pattern Recognition (ICPR). Institute of Electrical and Electronics Engineers, 2017. p. 1899-1904. ISSN 1051-4651. ISBN 978-1-5090-4847-2.
  • Rok: 2017
  • DOI: 10.1109/ICPR.2016.7899914
  • Odkaz: https://doi.org/10.1109/ICPR.2016.7899914
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The color constancy problem is addressed by structured-output regression on the values of the fully-connected layers of a convolutional neural network. The AlexNet and the VGG are considered and VGG slightly outperformed AlexNet. Best results were obtained with the first fully-connected 'fc6' layer and with multi-output support vector regression. Experiments on the SFU Color Checker and Indoor Dataset benchmarks demonstrate that our method achieves competitive performance, outperforming the state of the art on the SFU indoor benchmark.

Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework

  • DOI: 10.1109/ICCV.2017.242
  • Odkaz: https://doi.org/10.1109/ICCV.2017.242
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A method for scene text localization and recognition is proposed. The novelties include: training of both text detection and recognition in a single end-to-end pass, the structure of the recognition CNN, and the geometry of its input layer that preserves the aspect of the text and adapts its resolution to the data. The proposed method achieves state-of-the-art accuracy in end-to-end text recognition on two standard datasets, ICDAR 2013 and ICDAR 2015, whilst being an order of magnitude faster than competing methods: the whole pipeline runs at 10 frames per second on an NVidia K80 GPU.

Discriminative Correlation Filter with Channel and Spatial Reliability

  • Autoři: Lukežic, A., Ing. Tomáš Vojíř, Ph.D., Cehovin Zajc, L., prof. Ing. Jiří Matas, Ph.D., Kristan, M.
  • Publikace: CVPR 2017: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Press, 2017. p. 4847-4856. ISSN 1063-6919. ISBN 978-1-5386-0457-1.
  • Rok: 2017
  • DOI: 10.1109/CVPR.2017.515
  • Odkaz: https://doi.org/10.1109/CVPR.2017.515
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Short-term tracking is an open and challenging problem for which discriminative correlation filters (DCF) have shown excellent performance. We introduce the channel and spatial reliability concepts to DCF tracking and provide a novel learning algorithm for its efficient and seamless integration in the filter update and the tracking process. The spatial reliability map adjusts the filter support to the part of the object suitable for tracking. This allows tracking of non-rectangular objects as well as extending the search region. Channel reliability reflects the quality of the learned filter and it is used as a feature weighting coefficient in localization. Experimentally, with only two simple standard features, HOGs and Colornames, the novel CSR-DCF method – DCF with Channel and Spatial Reliability – achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB. The CSR-DCF runs in real-time on a CPU.

Fine-grained recognition of plants from images

  • DOI: 10.1186/s13007-017-0265-4
  • Odkaz: https://doi.org/10.1186/s13007-017-0265-4
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Background: Fine-grained recognition of plants from images is a challenging computer vision task, due to the diverse appearance and complex structure of plants, high intra-class variability and small inter-class differences. We review the state-of-the-art and discuss plant recognition tasks, from identification of plants from specific plant organs to general plant recognition “in the wild”. Results: We propose texture analysis and deep learning methods for different plant recognition tasks. The methods are evaluated and compared to the state-of-the-art. Texture analysis is only applied to images with unambiguous segmentation (bark and leaf recognition), whereas CNNs are only applied when sufficiently large datasets are available. The results provide an insight into the complexity of different plant recognition tasks. The proposed methods outperform the state-of-the-art in leaf and bark classification and achieve very competitive results in plant recognition “in the wild”. Conclusions: The results suggest that recognition of segmented leaves is practically a solved problem when high volumes of training data are available. The generality and higher capacity of state-of-the-art CNNs make them suitable for plant recognition “in the wild”, where the views on plant organs or plants vary significantly and the difficulty is increased by occlusions and background clutter.

In the Saddle: Chasing fast and repeatable features

  • DOI: 10.1109/ICPR.2016.7899712
  • Odkaz: https://doi.org/10.1109/ICPR.2016.7899712
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A novel similarity-covariant feature detector is proposed that extracts points whose neighborhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets.
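
    The ring test can be sketched as counting intensity-label transitions around a sampled circle. This is a simplified, hypothetical version of the published condition (the real detector uses two concentric rings and additional geometric constraints):

```python
def ring_transitions(ring, center, theta=0):
    """Count dark-to-bright and bright-to-dark transitions on one ring.

    ring: circular list of intensities sampled around a candidate point.
    A saddle-like point has exactly two transitions of each kind with
    respect to the center intensity.
    """
    labels = [1 if v > center + theta else -1 if v < center - theta else 0
              for v in ring]
    labels = [l for l in labels if l != 0]   # drop "similar" samples
    pairs = list(zip(labels, labels[1:] + labels[:1]))   # circular pairs
    d2b = sum(1 for a, b in pairs if a < b)  # dark-to-bright
    b2d = sum(1 for a, b in pairs if a > b)  # bright-to-dark
    return d2b, b2d

# alternating bright/dark arcs around a saddle: two transitions each way
res = ring_transitions([9, 9, 1, 1, 9, 9, 1, 1], center=5)
```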

Inertial-Based Scale Estimation for Structure from Motion on Mobile Devices

  • Autoři: Mustaniemi, J., Kannala, J., Särkkä, S., prof. Ing. Jiří Matas, Ph.D., Heikkilä, J.
  • Publikace: Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on. Piscataway: IEEE, 2017. p. 4394-4401. ISSN 2153-0866. ISBN 978-1-5386-2682-5.
  • Rok: 2017
  • DOI: 10.1109/IROS.2017.8206303
  • Odkaz: https://doi.org/10.1109/IROS.2017.8206303
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Structure from motion algorithms have an inherent limitation that the reconstruction can only be determined up to an unknown scale factor. Modern mobile devices are equipped with an inertial measurement unit (IMU), which can be used for estimating the scale of the reconstruction. We propose a method that recovers the metric scale given inertial measurements and camera poses. In the process, we also perform a temporal and spatial alignment of the camera and the IMU. Therefore, our solution can be easily combined with any existing visual reconstruction software. The method can cope with noisy camera pose estimates, typically caused by motion blur or rolling shutter artifacts, via utilizing a Rauch-Tung-Striebel (RTS) smoother. Furthermore, the scale estimation is performed in the frequency domain, which provides more robustness to inaccurate sensor time stamps and noisy IMU samples than the previously used time domain representation. In contrast to previous methods, our approach has no parameters that need to be tuned to achieve good performance. In the experiments, we show that the algorithm outperforms the state-of-the-art in both accuracy and convergence speed of the scale estimate. The accuracy of the scale is around 1% from the ground truth depending on the recording. We also demonstrate that our method can improve the scale accuracy of the Project Tango's built-in motion tracking.

Learning with Noisy and Trusted Labels for Fine-Grained Plant Recognition

  • Autoři: Šulc, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum. Aachen: CEUR Workshop Proceedings, 2017. vol. 1866. ISSN 1613-0073.
  • Rok: 2017
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper describes the deep learning approach to automatic visual recognition of 10 000 plant species submitted to the PlantCLEF 2017 challenge. We evaluate modifications and extensions of the state-of-the-art Inception-ResNet-v2 CNN architecture, including maxout, bootstrapping for training with noisy labels, and filtering the data with noisy labels using a classifier pre-trained on the trusted dataset. The final pipeline consists of a set of CNNs trained with different modifications on different subsets of the provided training data. With the proposed approach, we were ranked as the third best team in the LifeCLEF 2017 challenge.

Recurrent Color Constancy

  • Autoři: Qian, Y., Chen, K., Nikkanen, J., Kämäräinen, J.-K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2017 IEEE International Conference on Computer Vision (ICCV 2017). Piscataway: IEEE, 2017. p. 5459-5467. ISSN 1550-5499. ISBN 978-1-5386-1032-9.
  • Rok: 2017
  • DOI: 10.1109/ICCV.2017.582
  • Odkaz: https://doi.org/10.1109/ICCV.2017.582
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We introduce a novel formulation of temporal color constancy which considers multiple frames preceding the frame for which illumination is estimated. We propose an end-to-end trainable recurrent color constancy network – the RCC-Net – which exploits convolutional LSTMs and a simulated sequence to learn compositional representations in space and time. We use a standard single frame color constancy benchmark, the SFU Gray Ball Dataset, which can be adapted to a temporal setting. Extensive experiments show that the proposed method consistently outperforms single-frame state-of-the-art methods and their temporal variants.

Rolling Shutter Camera Synchronization with Sub-millisecond Accuracy

  • Autoři: Šmíd, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Madeira: SciTePress, 2017. p. 238-245. vol. 4. ISBN 978-989-758-225-7.
  • Rok: 2017
  • DOI: 10.5220/0006175402380245
  • Odkaz: https://doi.org/10.5220/0006175402380245
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A simple method for synchronization of video streams with a precision better than one millisecond is proposed. The method is applicable to any number of rolling shutter cameras whenever a few photographic flashes or other abrupt lighting changes are present in the video. The approach exploits the rolling shutter sensor property that every sensor row starts its exposure with a small delay after the onset of the previous row. The cameras may have different frame rates and resolutions, and need not have overlapping fields of view. The method was validated on five minutes of four streams from an ice hockey match. The found transformation maps events visible in all cameras to a reference time with a standard deviation of the temporal error in the range of 0.3 to 0.5 milliseconds. The quality of the synchronization is demonstrated on temporally and spatially overlapping images of a fast moving puck observed in two cameras.
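
    The exploited sensor property maps an event's image row to a sub-frame timestamp. A minimal sketch, assuming a constant per-row delay and hypothetical parameter names (the actual method estimates the full time transformation from detected flashes):

```python
def row_capture_time(frame_idx, row, fps, readout_time, n_rows, t0=0.0):
    """Approximate capture time of one sensor row under a rolling shutter.

    Each row starts exposing readout_time / n_rows after the previous one,
    so an event's vertical position refines its timestamp well below the
    frame period.
    """
    return t0 + frame_idx / fps + (row / n_rows) * readout_time

# a flash hitting row 540 of 1080 in frame 100 of a 25 fps stream,
# with a 20 ms sensor readout
t = row_capture_time(100, 540, 25.0, 0.020, 1080)
```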

Spotting Facial Micro-Expressions “In the Wild”

  • Autoři: Husák, P., Ing. Jan Čech, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of the 22nd Computer Vision Winter Workshop. Wien: Pattern Recognition & Image Processing Group, Vienna University of Technology, 2017. ISBN 978-3-200-04969-7.
  • Rok: 2017
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Micro-expressions are quick facial motions appearing in high-stakes and stressful situations, typically when a subject tries to hide his or her emotions. Two attributes characterize them: short duration and low intensity. A simple detection method is proposed, which determines instants of micro-expressions in a video. The method is based on analyzing image intensity differences over a registered face sequence. The specific pattern is detected by an SVM classifier. The results are evaluated on the standard micro-expression datasets SMIC-E and CASME II. The proposed method outperformed competing methods in detection accuracy. Further, we collected a new real micro-expression dataset of mostly poker game videos downloaded from YouTube. We achieved an average cross-validation AUC of 0.88 for the SMIC, and 0.81 on the new challenging “in the Wild” database.

Systematic Evaluation of Convolution Neural Network Advances on the ImageNet

  • DOI: 10.1016/j.cviu.2017.05.007
  • Odkaz: https://doi.org/10.1016/j.cviu.2017.05.007
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The paper systematically studies the impact of a range of recent advances in convolutional neural network (CNN) architectures and learning methods on the object categorization (ILSVRC) problem. The evaluation tests the influence of the following choices of the architecture: non-linearity (ReLU, ELU, maxout, compatibility with batch normalization), pooling variants (stochastic, max, average, mixed), network width, classifier design (convolutional, fully-connected, SPP), image pre-processing, and of learning parameters: learning rate, batch size, cleanliness of the data, etc. The performance gains of the proposed modifications are first tested individually and then in combination. The sum of individual gains is greater than the observed improvement when all modifications are introduced, but the “deficit” is small, suggesting independence of their benefits. We show that the use of 128 × 128 pixel images is sufficient to make qualitative conclusions about optimal network structure that hold for the full size Caffe and VGG nets. The results are obtained an order of magnitude faster than with the standard 224 pixel images.

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects

  • DOI: 10.1109/WACV.2017.103
  • Odkaz: https://doi.org/10.1109/WACV.2017.103
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and with a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.

The Visual Object Tracking VOT2017 challenge results

  • Autoři: Kristan, M., Leonardis, A., prof. Ing. Jiří Matas, Ph.D., Felsberg, M., Ing. Tomáš Vojíř, Ph.D., Nosková, J.
  • Publikace: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW 2017). Piscataway, NJ: IEEE, 2017. p. 1949-1972. ISSN 2473-9944. ISBN 978-1-5386-1034-3.
  • Rok: 2017
  • DOI: 10.1109/ICCVW.2017.230
  • Odkaz: https://doi.org/10.1109/ICCVW.2017.230
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art published at major computer vision conferences or journals in recent years. The evaluation included the standard VOT and other popular methodologies and a new "real-time" experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. The performance of the tested trackers typically far exceeds that of standard baselines. The source code for most of the trackers is publicly available from the VOT page. The VOT2017 goes beyond its predecessors by (i) improving the VOT public dataset and introducing a separate VOT2017 sequestered dataset, (ii) introducing a real-time tracking experiment and (iii) releasing a redesigned toolkit that supports complex experiments. The dataset, the evaluation kit and the results are publicly available at the challenge website.

The World of Fast Moving Objects

  • Autoři: Ing. Denys Rozumnyi, Kotěra, J., Šroubek, F., Novotný, L., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVPR 2017: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Press, 2017. p. 4838-4846. ISSN 1063-6919. ISBN 978-1-5386-0457-1.
  • Rok: 2017
  • DOI: 10.1109/CVPR.2017.514
  • Odkaz: https://doi.org/10.1109/CVPR.2017.514
  • Pracoviště: Katedra kybernetiky, Skupina vizuálního rozpoznávání
  • Anotace:
    The notion of a Fast Moving Object (FMO), i.e. an object that moves over a distance exceeding its size within the exposure time, is introduced. FMOs may, and typically do, rotate with high angular speed. FMOs are very common in sports videos, but are not rare elsewhere. In a single frame, such objects are often barely visible and appear as semitransparent streaks. A method for the detection and tracking of FMOs is proposed. The method consists of three distinct algorithms, which form an efficient localization pipeline that operates successfully in a broad range of conditions. We show that it is possible to recover the appearance of the object and its axis of rotation, despite its blurred appearance. The proposed method is evaluated on a new annotated dataset. The results show that existing trackers are inadequate for the problem of FMO localization and a new approach is required. Two applications of localization, temporal superresolution and highlighting, are presented.

Visual Descriptors in Methods for Video Hyperlinking

  • Autoři: Galuščáková, P, Baťko, M, Ing. Jan Čech, Ph.D., prof. Ing. Jiří Matas, Ph.D., Novák, D, Pecina, P
  • Publikace: ICMR '17 Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. New York: ACM, 2017. p. 294-300. ISBN 978-1-4503-4701-3.
  • Rok: 2017
  • DOI: 10.1145/3078971.3079026
  • Odkaz: https://doi.org/10.1145/3078971.3079026
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    In this paper, we survey different state-of-the-art visual processing methods and utilize them in hyperlinking. Visual information, calculated using Feature Signatures, SIMILE descriptors and convolutional neural networks (CNN), is utilized as a similarity between video frames and used to find similar faces, objects and settings. Visual concepts in frames are also automatically recognized, and the textual output of the recognition is combined with search based on subtitles and transcripts. All presented experiments were performed in the Search and Hyperlinking 2014 MediaEval task and the Video Hyperlinking 2015 TRECVid task.

Visual Language Identification from Facial Landmarks

  • DOI: 10.1007/978-3-319-59129-2_33
  • Odkaz: https://doi.org/10.1007/978-3-319-59129-2_33
  • Pracoviště: Katedra kybernetiky, Skupina vizuálního rozpoznávání, Strojové učení
  • Anotace:
    The automatic Visual Language IDentification (VLID), i.e. the problem of using visual information to identify the language being spoken, using no audio information, is studied. The proposed method employs facial landmarks automatically detected in a video. A convex optimisation problem to find jointly both the discriminative representation (a soft histogram over a set of lip shapes) and the classifier is formulated. A 10-fold cross-validation is performed on a dataset consisting of 644 videos collected from youtube.com, resulting in an accuracy of 73% in a pairwise discrimination between English and French (50% for chance). A study in which 10 videos were used suggests that the proposed method performs better than the average human in discriminating between the languages.

Working hard to know your neighbor's margins: Local descriptor learning loss

  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We introduce a loss for metric learning, inspired by Lowe's matching criterion for SIFT. We show that the proposed loss, which maximizes the distance between the closest positive and closest negative example in the batch, is better than complex regularization methods; it works well for both shallow and deep convolutional network architectures. Applying the novel loss to the L2Net CNN architecture results in a compact descriptor named HardNet. It has the same dimensionality as SIFT (128) and shows state-of-the-art performance in wide baseline stereo, patch verification and instance retrieval benchmarks.
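
    The batch-level criterion can be sketched in a few lines of numpy. This is a simplified illustration of the hardest-in-batch idea, not the released HardNet training code (the margin value and plain L2 distances are assumptions):

```python
import numpy as np

def hardest_in_batch_loss(anchors, positives, margin=1.0):
    """Triplet margin loss with the hardest in-batch negative.

    anchors, positives: (n, d) descriptors; row i of each matrix is a
    matching pair. For every pair, the negative is the closest
    non-matching descriptor found anywhere in the batch.
    """
    d = np.linalg.norm(anchors[:, None, :] - positives[None, :, :], axis=2)
    pos = np.diag(d)                     # distances of matching pairs
    masked = d + np.eye(len(d)) * 1e6    # exclude the matching pairs
    hardest_neg = np.minimum(masked.min(axis=0), masked.min(axis=1))
    return float(np.maximum(0.0, margin + pos - hardest_neg).mean())

# two well-separated pairs: positives coincide with their anchors and the
# negatives are sqrt(2) away, so margin + 0 - sqrt(2) < 0 and the loss is 0
a = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = hardest_in_batch_loss(a, a.copy())
```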

A Novel Performance Evaluation Methodology for Single-Target Trackers

  • Autoři: Kristan, M., prof. Ing. Jiří Matas, Ph.D., Leonardis, A., Ing. Tomáš Vojíř, Ph.D., Pflugfelder, R., Fernandez, G., Nebehay, G., Porikli, F., Cehovin, L.
  • Publikace: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2016, 38(11), 2137-2155. ISSN 0162-8828.
  • Rok: 2016
  • DOI: 10.1109/TPAMI.2016.2516982
  • Odkaz: https://doi.org/10.1109/TPAMI.2016.2516982
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully-annotated dataset with per-frame annotations with several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most rigorously constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art since they outperform the standard baselines, resulting in a highly-challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison a new performance visualization technique is proposed.

Accurate Closed-form Estimation of Local Affine Transformations Consistent with the Epipolar Geometry

  • Autoři: Barath, D., prof. Ing. Jiří Matas, Ph.D., Hajder, L.
  • Publikace: Proceedings of the British Machine Vision Conference (BMVC) 2016. British Machine Vision Association, 2016. ISBN 1-901725-53-7.
  • Rok: 2016
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    For a pair of images satisfying the epipolar constraint, a method for accurate estimation of local affine transformations is proposed. The method returns the local affine transformation consistent with the epipolar geometry that is closest in the least squares sense to the initial estimate provided by an affine-covariant detector. The minimized L2-norm of the affine matrix elements is found in closed-form. We show that the used norm has an intuitive geometric interpretation. The method, with negligible computational requirements, is validated on publicly available benchmarking datasets and on synthetic data. The accuracy of the local affine transformations is improved for all detectors and all image pairs. Implicitly, precision of the tested feature detectors was compared. The Hessian-Affine detector combined with ASIFT view synthesis was the most accurate.

All you need is a good init

  • Pracoviště: Katedra kybernetiky, Skupina vizuálního rozpoznávání
  • Anotace:
    Layer-sequential unit-variance (LSUV) initialization – a simple method for weight initialization for deep net learning – is proposed. The method consists of two steps. First, pre-initialize the weights of each convolution or inner-product layer with orthonormal matrices. Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be equal to one. Experiments with different activation functions (maxout, ReLU-family, tanh) show that the proposed initialization leads to learning of very deep nets that (i) achieve test accuracy better than or equal to standard methods and (ii) train at least as fast as the complex schemes proposed specifically for very deep nets, such as FitNets (Romero et al. 2015) and Highway (Srivastava et al. 2015). Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets, and the state-of-the-art, or very close to it, is achieved on the MNIST, CIFAR-10/100 and ImageNet datasets.
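
    For a single fully-connected layer, the two steps can be sketched as follows (a numpy illustration under simplifying assumptions: no non-linearity, a plain variance rescaling on one layer rather than the full layer-by-layer network pass):

```python
import numpy as np

rng = np.random.default_rng(0)

def orthonormal_init(shape):
    """Step 1: orthonormal pre-initialization (assumes rows >= cols)."""
    q, _ = np.linalg.qr(rng.standard_normal(shape))
    return q

def lsuv_scale(w, x, tol=0.01, max_iter=10):
    """Step 2: rescale w until the layer output has unit variance on batch x."""
    for _ in range(max_iter):
        var = (x @ w).var()
        if abs(var - 1.0) < tol:
            break
        w = w / np.sqrt(var)
    return w

# one fully-connected layer, 64 -> 32, initialized on a batch of 256 inputs
x = rng.standard_normal((256, 64))
w = lsuv_scale(orthonormal_init((64, 32)), x)
```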

Detection of bubbles as concentric circular arrangements

  • Autoři: Strokina, N., prof. Ing. Jiří Matas, Ph.D., Eerola, T., Lensu, L., Kälviäinen, H.
  • Publikace: Machine Vision and Applications. 2016, 27(3), 387-396. ISSN 0932-8092.
  • Rok: 2016
  • DOI: 10.1007/s00138-016-0749-7
  • Odkaz: https://doi.org/10.1007/s00138-016-0749-7
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The paper proposes a method for the detection of bubble-like transparent objects in a liquid. The detection problem is non-trivial since bubble appearance varies considerably due to different lighting conditions causing contrast reversal and multiple interreflections. We formulate the problem as the detection of concentric circular arrangements (CCA). The CCAs are recovered in a hypothesize-optimize-verify framework. The hypothesis generation is based on sampling from the partially linked components of the non-maximum suppressed responses of oriented ridge filters, and is followed by the CCA parameter estimation. Parameter optimization is carried out by minimizing a novel cost function. The performance was tested on gas dispersion images of pulp suspension and oil dispersion images. The mean error of gas/oil volume estimation was used as a performance criterion, since the main goal of the applications driving the research was bubble volume estimation. The method achieved gas and oil volume estimation errors of 28% and 13%, respectively, outperforming the OpenCV Circular Hough Transform in both cases and the WaldBoost detector in gas volume estimation.

Fast L1-Based RANSAC for Homography Estimation

  • Pracoviště: Katedra kybernetiky, Skupina vizuálního rozpoznávání
  • Anotace:
    We revisit the problem of local optimization (LO) in RANSAC for homography estimation. The standard state-of-the-art LO-RANSAC improves the plain version's accuracy and stability, but it may be computationally demanding, it is complex to implement and requires setting multiple parameters. We show that employing L1 minimization instead of the standard LO step of LO-RANSAC leads to results with similar precision. At the same time, the proposed L1 minimization is significantly faster than the standard LO step of [8], it is easy to implement and it has only a few parameters, all with an intuitive interpretation. On the negative side, the L1 minimization does not achieve the robustness of the standard LO step; its probability of failure is higher.
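The core idea, replacing the least-squares local optimization with an L1 cost, is commonly implemented via iteratively reweighted least squares (IRLS). The sketch below illustrates IRLS on robust line fitting rather than on the homography cost itself; it is a hedged illustration under that simplification, not the paper's implementation:

```python
import numpy as np

def irls_l1(A, b, n_iter=100, eps=1e-6):
    """Minimize ||A x - b||_1 by iteratively reweighted least squares:
    each round solves a weighted L2 problem with weights 1/|residual|."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]          # L2 solution as a start
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(A @ x - b), eps)  # L1 weights, clipped
        # weighted normal equations: (A^T W A) x = A^T W b
        x = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * b))
    return x
```

Unlike plain least squares, the L1 fit is barely perturbed by a minority of gross outliers, which is what makes it attractive as a cheap LO step.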

From Dusk till Dawn: Modeling in the Dark

  • Autoři: Radenovič, F., Schönberger, J. L., Ji, D., Frahm, J., prof. Mgr. Ondřej Chum, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVPR 2016: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2016. p. 5488-5496. ISSN 1063-6919. ISBN 978-1-4673-8851-1.
  • Rok: 2016
  • DOI: 10.1109/CVPR.2016.592
  • Odkaz: https://doi.org/10.1109/CVPR.2016.592
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Internet photo collections naturally contain a large variety of illumination conditions, with the largest difference between day and night images. Current modeling techniques do not embrace the broad illumination range often leading to reconstruction failure or severe artifacts. We present an algorithm that leverages the appearance variety to obtain more complete and accurate scene geometry along with consistent multi-illumination appearance information. The proposed method relies on automatic scene appearance grouping, which is used to obtain separate dense 3D models. Subsequent model fusion combines the separate models into a complete and accurate reconstruction of the scene. In addition, we propose a method to derive the appearance information for the model under the different illumination conditions, even for scene parts that are not observed under one illumination condition. To achieve this, we develop a cross-illumination color transfer technique. We evaluate our method on a large variety of landmarks from across Europe reconstructed from a database of 7.4M images.

Hessian Interest Points on GPU

  • Pracoviště: Katedra počítačové grafiky a interakce, Skupina vizuálního rozpoznávání
  • Anotace:
    This paper is about interest point detection and GPU programming. We take a popular GPGPU implementation of SIFT - the de-facto standard in fast interest point detectors - SiftGPU and implement modifications that according to recent research result in better performance in terms of repeatability of the detected points. The interest points found at local extrema of the Difference of Gaussians (DoG) function in the original SIFT are replaced by the local extrema of the determinant of the Hessian matrix of the intensity function. Experimentally we show that the GPU implementation of the Hessian-based detector (i) surpasses in repeatability the original DoG-based implementation, (ii) gives results very close to those of a reference CPU implementation, and (iii) is significantly faster than the CPU implementation. We show what speedup is achieved for different image sizes and provide analysis of the computational cost of individual steps of the algorithm. The source code is publicly available.

Multi-H: Efficient Recovery of Tangent Planes in Stereo Images

  • Autoři: Barath, D., prof. Ing. Jiří Matas, Ph.D., Hajder, L.
  • Publikace: Proceedings of the British Machine Vision Conference (BMVC) 2016. British Machine Vision Association, 2016. ISBN 1-901725-53-7.
  • Rok: 2016
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    Multi-H, an efficient method for the recovery of the tangent planes of a set of point correspondences satisfying the epipolar constraint, is proposed. The problem is formulated as a search for a labeling minimizing an energy that includes data and spatial regularization terms. The number of planes is controlled by a combination of Mean-Shift and α-expansion. Experiments on the fountain-P11 3D dataset show that Multi-H provides highly accurate tangent plane estimates. It also outperforms all state-of-the-art techniques for multi-homography estimation on the publicly available AdelaideRMF dataset. Since Multi-H achieves nearly error-free performance, we introduce and make public a more challenging dataset for multi-plane fitting evaluation.

Multi-view facial landmark detection by using a 3D shape model

  • DOI: 10.1016/j.imavis.2015.11.003
  • Odkaz: https://doi.org/10.1016/j.imavis.2015.11.003
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An algorithm for accurate localization of facial landmarks coupled with a head pose estimation from a single monocular image is proposed. The algorithm is formulated as an optimization problem where the sum of individual landmark scoring functions is maximized with respect to the camera pose by fitting a parametric 3D shape model. The landmark scoring functions are trained by a structured output SVM classifier that takes a distance to the true landmark position into account when learning. The optimization criterion is non-convex and we propose a robust initialization scheme which employs a global method to detect a raw but reliable initial landmark position. Self-occlusions causing landmarks invisibility are handled explicitly by excluding the corresponding contributions from the data term. This allows the algorithm to operate correctly for a large range of viewing angles. Experiments on standard "in-the-wild" datasets demonstrate that the proposed algorithm outperforms several state-of-the-art landmark detectors especially for non-frontal face images. The algorithm achieves the average relative landmark localization error below 10% of the interocular distance in 98.3% of the 300W dataset test images.

On Evaluation of 6D Object Pose Estimation

  • DOI: 10.1007/978-3-319-49409-8_52
  • Odkaz: https://doi.org/10.1007/978-3-319-49409-8_52
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    A pose of a rigid object has 6 degrees of freedom and its full knowledge is required in many robotic and scene understanding applications. Evaluation of 6D object pose estimates is not straightforward. Object pose may be ambiguous due to object symmetries and occlusions, i.e. there can be multiple object poses that are indistinguishable in the given image and should be therefore treated as equivalent. The paper defines 6D object pose estimation problems, proposes an evaluation methodology and introduces three new pose error functions that deal with pose ambiguity. The new error functions are compared with functions commonly used in the literature and shown to remove certain types of non-intuitive outcomes. Evaluation tools are provided at: https://github.com/thodan/obj_pose_eval
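For context, the errors commonly used in the literature that the paper compares its new functions against include the ADD/ADI pair: ADD averages distances between corresponding model points, while ADI averages distances to the closest transformed point and therefore tolerates symmetric, indistinguishable poses. The sketch below is an illustration of that distinction, not the paper's evaluation code:

```python
import numpy as np

def add_err(R_est, t_est, R_gt, t_gt, pts):
    """ADD: mean distance between *corresponding* model points (no symmetry handling)."""
    d = (pts @ R_est.T + t_est) - (pts @ R_gt.T + t_gt)
    return np.linalg.norm(d, axis=1).mean()

def adi_err(R_est, t_est, R_gt, t_gt, pts):
    """ADI: mean distance to the *closest* transformed model point; a pose that is
    indistinguishable due to object symmetry scores (near) zero error."""
    p_est = pts @ R_est.T + t_est
    p_gt = pts @ R_gt.T + t_gt
    dists = np.linalg.norm(p_gt[:, None, :] - p_est[None, :, :], axis=2)
    return dists.min(axis=1).mean()
```

For a square model rotated by 90° about its symmetry axis, ADD reports a large error while ADI correctly reports zero, which is exactly the ambiguity problem the paper's new error functions address.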

Online adaptive hidden Markov model for multi-tracker fusion

  • DOI: 10.1016/j.cviu.2016.05.007
  • Odkaz: https://doi.org/10.1016/j.cviu.2016.05.007
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    In this paper, we propose a novel method for visual object tracking called HMMTxD. The method fuses observations from complementary out-of-the-box trackers and a detector by utilizing a hidden Markov model whose latent states correspond to a binary vector expressing the failure of individual trackers. The Markov model is trained in an unsupervised way, relying on an online learned detector to provide a source of tracker-independent information for a modified Baum-Welch algorithm that updates the model w.r.t. the partially annotated data. We show the effectiveness of the proposed method on combinations of two and three tracking algorithms. The performance of HMMTxD is evaluated on two standard benchmarks (CVPR2013 and VOT) and on a rich collection of 77 publicly available sequences. The HMMTxD outperforms the state-of-the-art, often significantly, on all datasets in almost all criteria.

Real-Time Lexicon-Free Scene Text Localization and Recognition

  • DOI: 10.1109/TPAMI.2015.2496234
  • Odkaz: https://doi.org/10.1109/TPAMI.2015.2496234
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    An end-to-end real-time text localization and recognition method is presented. Its real-time performance is achieved by posing the character detection and segmentation problem as an efficient sequential selection from the set of Extremal Regions. The ER detector is robust against blur, low contrast and illumination, color and texture variation. In the first stage, the probability of each ER being a character is estimated using features calculated by a novel algorithm in constant time and only ERs with locally maximal probability are selected for the second stage, where the classification accuracy is improved using computationally more expensive features. A highly efficient clustering algorithm then groups ERs into text lines and an OCR classifier trained on synthetic fonts is exploited to label character regions. The most probable character sequence is selected in the last stage when the context of each character is known. The method was evaluated on three public datasets. On the ICDAR 2013 dataset the method achieves state-of-the-art results in text localization; on the more challenging SVT dataset, the proposed method significantly outperforms the state-of-the-art methods and demonstrates that the proposed pipeline can incorporate additional prior knowledge about the detected text. The proposed method was exploited as the baseline in the ICDAR 2015 Robust Reading competition, where it compares favourably to the state of the art.

Significance of Colors in Texture Datasets

  • Autoři: Šulc, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of the 21st Computer Vision Winter Workshop. Ljubljana: Slovenian Pattern Recognition Society, 2016. ISBN 978-961-90901-7-6.
  • Rok: 2016
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This paper studies the significance of color in eight publicly available datasets commonly used for texture recognition through the classification results of "pure-color" and "pure-texture" (color-less) descriptors. The datasets are described using the state-of-the-art color descriptors, Discriminative Color Descriptors (DD) and Color Names (CN). The descriptors are based on partitioning of the color space into clusters and assigning the image probabilities of belonging to individual clusters. We propose a simple extension of the DD and the CN descriptors, adding the standard deviations of color cluster probabilities into the descriptor. The extension leads to a significant improvement in recognition rates on all datasets. On all datasets the 22-dimensional improved CN^σ descriptor outperforms all original 11-, 25- and 50-dimensional descriptors. Linear combination of the state-of-the-art "pure-texture" classifier with the CN^σ classifier improves the results on all datasets.
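The proposed extension, appending per-cluster standard deviations to the mean cluster probabilities (turning the 11-dimensional CN into the 22-dimensional CN^σ), can be sketched as follows; the exact construction in the paper may differ, so treat this as an illustration of the idea only:

```python
import numpy as np

def color_descriptor(pixel_cluster_probs):
    """pixel_cluster_probs: (n_pixels, n_clusters) soft assignments of each
    pixel to the color clusters. The baseline descriptor is the mean cluster
    probability over the image; the extended descriptor concatenates the
    per-cluster standard deviations, doubling the dimensionality."""
    mean = pixel_cluster_probs.mean(axis=0)
    std = pixel_cluster_probs.std(axis=0)
    return np.concatenate([mean, std])
```

For an 11-cluster CN-style input this yields a 22-dimensional vector, matching the dimensionality quoted in the abstract.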

Structured Output SVM Prediction of Apparent Age, Gender and Smile From Deep Features

  • Autoři: Uřičář, M., Timofte, R., Rothe, R., prof. Ing. Jiří Matas, Ph.D., Van Gool, L.
  • Publikace: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. Piscataway (New Jersey): IEEE, 2016. pp. 730-738. ISSN 2160-7508. ISBN 978-1-5090-1437-8.
  • Rok: 2016
  • DOI: 10.1109/CVPRW.2016.96
  • Odkaz: https://doi.org/10.1109/CVPRW.2016.96
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    We propose structured output SVM for predicting the apparent age as well as gender and smile from a single face image represented by deep features. We pose the problem of apparent age estimation as an instance of the multi-class structured output SVM classifier followed by a softmax expected value refinement. The gender and smile predictions are treated as binary classification problems. The proposed solution first detects the face in the image and then extracts deep features from the cropped image around the detected face. We use a convolutional neural network with VGG-16 architecture [25] for learning deep features. The network is pretrained on the ImageNet [24] database and then fine-tuned on IMDB-WIKI [21] and ChaLearn 2015 LAP datasets [8]. We validate our methods on the ChaLearn 2016 LAP dataset [9]. Our structured output SVMs are trained solely on ChaLearn 2016 LAP data. We achieve excellent results for both apparent age prediction and gender and smile classification.

Texture-Independent Long-Term Tracking Using Virtual Corners

  • Autoři: Lebeda, K., Hadfield, S., prof. Ing. Jiří Matas, Ph.D., Bowden, R.
  • Publikace: IEEE Transactions on Image Processing. 2016, 25(1), 359-371. ISSN 1057-7149.
  • Rok: 2016
  • DOI: 10.1109/TIP.2015.2497141
  • Odkaz: https://doi.org/10.1109/TIP.2015.2497141
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Long-term tracking of an object, given only a single instance in an initial frame, remains an open problem. We propose a visual tracking algorithm, robust to many of the difficulties which often occur in real-world scenes. Correspondences of edge-based features are used, to overcome the reliance on the texture of the tracked object and improve invariance to lighting. Furthermore we address long-term stability, enabling the tracker to recover from drift and to provide redetection following object disappearance or occlusion. The two-module principle is similar to the successful state-of-the-art long-term TLD tracker; however, our approach offers better performance in benchmarks and extends to cases of low-textured objects. This becomes obvious in cases of plain objects with no texture at all, where the edge-based approach proves the most beneficial. We perform several different experiments to validate the proposed method. Firstly, results on short-term sequences show the performance of tracking challenging (low-textured and/or transparent) objects which represent failure cases for competing state-of-the-art approaches. Secondly, long sequences are tracked, including one of almost 30,000 frames, which to our knowledge is the longest tracking sequence reported to date. This tests the re-detection and drift resistance properties of the tracker. Finally, we report results of the proposed tracker on the VOT Challenge 2013 and 2014 datasets as well as on the VTB1.0 benchmark and we show relative performance of the tracker compared to its competitors. All the results are comparable to the state-of-the-art on sequences with textured objects and superior on non-textured objects. The new annotated sequences are made publicly available.

The thermal infrared visual object tracking VOT-TIR2016 challenge results

  • Autoři: Felsberg, M., Kristan, M., prof. Ing. Jiří Matas, Ph.D., Leonardis, A., Ing. Tomáš Vojíř, Ph.D.,
  • Publikace: Computer Vision – ECCV 2016 Workshops, Part II. Cham: Springer International Publishing, 2016. pp. 824-849. Lecture Notes in Computer Science. vol. 9914. ISSN 0302-9743. ISBN 978-3-319-48880-6.
  • Rok: 2016
  • DOI: 10.1007/978-3-319-48881-3_55
  • Odkaz: https://doi.org/10.1007/978-3-319-48881-3_55
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Thermal Infrared Visual Object Tracking challenge 2016, VOT-TIR2016, aims at comparing short-term single-object visual trackers that work on thermal infrared (TIR) sequences and do not apply pre-learned models of object appearance. VOT-TIR2016 is the second benchmark on short-term tracking in TIR sequences. Results of 24 trackers are presented. For each participating tracker, a short description is provided in the appendix. The VOT-TIR2016 challenge is similar to the 2015 challenge; the main difference is the introduction of new, more difficult sequences into the dataset. Furthermore, the VOT-TIR2016 evaluation adopted the improvements regarding overlap calculation in VOT2016. Compared to VOT-TIR2015, a significant general improvement of results has been observed, which partly compensates for the more difficult sequences. The dataset, the evaluation kit, as well as the results are publicly available at the challenge website.

The visual object tracking VOT2016 challenge results

  • Autoři: Kristan, M., Leonardis, A., prof. Ing. Jiří Matas, Ph.D., Felsberg, M., Ing. Tomáš Vojíř, Ph.D.,
  • Publikace: Computer Vision – ECCV 2016 Workshops, Part II. Cham: Springer International Publishing, 2016. pp. 777-823. Lecture Notes in Computer Science. vol. 9914. ISSN 0302-9743. ISBN 978-3-319-48880-6.
  • Rok: 2016
  • DOI: 10.1007/978-3-319-48881-3_54
  • Odkaz: https://doi.org/10.1007/978-3-319-48881-3_54
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, with a large number of trackers having been published at major computer vision conferences and journals in recent years. The number of tested state-of-the-art trackers makes VOT2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the Appendix. VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground truth bounding box annotation methodology and (ii) extending the evaluation system with the no-reset experiment. The dataset, the evaluation kit as well as the results are publicly available at the challenge website (http://votchallenge.net).

Very Deep Residual Networks with MaxOut for Plant Identification in the Wild

  • Autoři: Šulc, M., Mgr. Dmytro Mishkin, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Working Notes of CLEF 2016 - Conference and Labs of the Evaluation forum. Aachen: CEUR Workshop Proceedings, 2016. pp. 579-586. CEUR Workshop Proceedings. vol. 1609. ISSN 1613-0073.
  • Rok: 2016
  • Pracoviště: Katedra kybernetiky, Skupina vizuálního rozpoznávání
  • Anotace:
    The paper presents our deep learning approach to automatic recognition of plant species from photos. We utilized a very deep 152-layer residual network model pre-trained on ImageNet, replaced the original fully connected layer with two randomly initialized fully connected layers connected with maxout, and fine-tuned the network on the PlantCLEF 2016 training data. Bagging of 3 networks was used to further improve accuracy. With the proposed approach we scored among the top 3 teams in the PlantCLEF 2016 plant identification challenge.

A machine learning approach to hypothesis decoding in scene text recognition

  • Autoři: Libovicky, J., Ing. Lukáš Neumann, Ph.D., Pecina, P., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision -- ACCV 2014 Workshops. Heidelberg: Springer, 2015, pp. 169-180. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16630-8. Available from: http://dx.doi.org/10.1007/978-3-319-16631-5_13
  • Rok: 2015
  • DOI: 10.1007/978-3-319-16631-5_13
  • Odkaz: https://doi.org/10.1007/978-3-319-16631-5_13
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Scene Text Recognition (STR) is a task of localizing and transcribing textual information captured in real-world images. With its increasing accuracy, it becomes a new source of textual data for standard Natural Language Processing tasks and poses new problems because of the specific nature of Scene Text. In this paper, we learn a string hypotheses decoding procedure in an STR pipeline using structured prediction methods that proved to be useful in automatic Speech Recognition and Machine Translation. The model allows employing a wide range of typographical and language features in the decoding process. The proposed method is evaluated on a standard dataset and improves both character and word recognition performance over the baseline.

Cascaded Sparse Spatial Bins for Efficient and Effective Generic Object Detection

  • Autoři: Novotný, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2015 IEEE International Conference on Computer Vision (ICCV 2015). Piscataway: IEEE, 2015. pp. 1152-1160. ISSN 1550-5499. ISBN 978-1-4673-8391-2.
  • Rok: 2015
  • DOI: 10.1109/ICCV.2015.137
  • Odkaz: https://doi.org/10.1109/ICCV.2015.137
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel efficient method for extraction of object proposals is introduced. Its "objectness" function exploits deep spatial pyramid features, a novel fast-to-compute HoG-based edge statistic and the EdgeBoxes score [42]. The efficiency is achieved by the use of spatial bins in a novel combination with sparsity-inducing group normalized SVM. State-of-the-art recall performance is achieved on Pascal VOC07, significantly outperforming methods with comparable speed. Interestingly, when only 100 proposals per image are considered the method attains 78% recall on VOC07. The method improves mAP of the state-of-the-art class-specific RCNN detector, increasing it by 10 points when only 50 proposals are used in each image. The system trained on twenty classes performs well on the two-hundred-class ILSVRC2013 set, confirming generalization capability.

Detection and Fine 3D Pose Estimation of Texture-less Objects in RGB-D Images

  • Autoři: Hodaň, T., Zabulis, X., Lourakis, M., Ing. Štěpán Obdržálek, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: IROS 2015: Proceedings IEEE/RSJ International Conference on Inteligent Robots and Systems. Los Alamitos: IEEE Computer Society, 2015. p. 4421-4428. ISSN 2153-0858. ISBN 978-1-4799-9994-1.
  • Rok: 2015
  • DOI: 10.1109/IROS.2015.7354005
  • Odkaz: https://doi.org/10.1109/IROS.2015.7354005
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Despite their ubiquitous presence, texture-less objects present significant challenges to contemporary visual object detection and localization algorithms. This paper proposes a practical method for the detection and accurate 3D localization of multiple texture-less and rigid objects depicted in RGB-D images. The detection procedure adopts the sliding window paradigm, with an efficient cascade-style evaluation of each window location. A simple pre-filtering is performed first, rapidly rejecting most locations. For each remaining location, a set of candidate templates (i.e. trained object views) is identified with a voting procedure based on hashing, which makes the method's computational complexity largely unaffected by the total number of known objects. The candidate templates are then verified by matching feature points in different modalities. Finally, the approximate object pose associated with each detected template is used as a starting point for a stochastic optimization procedure that estimates accurate 3D pose. Experimental evaluation shows that the proposed method yields a recognition rate comparable to the state of the art, while its complexity is sub-linear in the number of templates.

Efficient Character Skew Rectification in Scene Text Images

  • Autoři: Bušta, M., Drtina, T., Helekal, D., Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision -- ACCV 2014 Workshops. Heidelberg: Springer, 2015, pp. 134-146. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16630-8.
  • Rok: 2015
  • DOI: 10.1007/978-3-319-16631-5_10
  • Odkaz: https://doi.org/10.1007/978-3-319-16631-5_10
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We present an efficient method for character skew rectification in scene text images. The method is based on novel skew estimators, which exploit intuitive glyph properties and which can be efficiently computed in linear time. The estimators are evaluated on synthetically generated data (including Latin, Cyrillic, Greek, Runic scripts) and real scene text images, where the skew rectification by the proposed method improves the accuracy of a state-of-the-art scene text recognition pipeline.

Efficient Image Detail Mining

  • Autoři: Mikulík, A., Radenovič, F., prof. Mgr. Ondřej Chum, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ACCV 2014: Proceedings of the 12th Asian Conference on Computer Vision, Part II. Cham: Springer, 2015. p. 118-132. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16807-4.
  • Rok: 2015
  • DOI: 10.1007/978-3-319-16808-1_9
  • Odkaz: https://doi.org/10.1007/978-3-319-16808-1_9
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Two novel problems straddling the boundary between image retrieval and data mining are formulated: for every pixel in the query image, (i) find the database image with the maximum resolution depicting the pixel and (ii) find the frequency with which it is photographed in detail. An efficient and reliable solution for both problems is proposed based on two novel techniques, the hierarchical query expansion that exploits the document-at-a-time (DAAT) inverted file and a geometric consistency verification sufficiently robust to prevent topic drift within a zooming search. Experiments show that the proposed method finds surprisingly fine details on landmarks, even those that are hardly noticeable for humans.

Efficient Scene text localization and recognition with local character refinement

  • DOI: 10.1109/ICDAR.2015.7333861
  • Odkaz: https://doi.org/10.1109/ICDAR.2015.7333861
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An unconstrained end-to-end text localization and recognition method is presented. The method detects initial text hypotheses in a single pass by an efficient region-based method and subsequently refines the text hypotheses using a more robust local text model, which deviates from the common assumption of region-based methods that all characters are detected as connected components.

Efficient Texture-less Object Detection for Augmented Reality Guidance

  • Autoři: Hodaň, T., Damen, D., Mayol-Cuevas, W., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ISMARW 2015: Proceedings IEEE International Symposium on Mixed and Augmented Reality Workshops. Los Alamitos: IEEE Computer Society, 2015. p. 81-86. ISBN 978-1-4673-8471-1.
  • Rok: 2015
  • DOI: 10.1109/ISMARW.2015.23
  • Odkaz: https://doi.org/10.1109/ISMARW.2015.23
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Real-time scalable detection of texture-less objects in 2D images is a highly relevant task for augmented reality applications such as assembly guidance. The paper presents a purely edge-based method based on the approach of Damen et al. (2012). The proposed method exploits the recent structured edge detector by Dollar and Zitnick (2013), which uses supervised examples for improved object outline detection. It was experimentally shown to yield consistently better results than the standard Canny edge detector. The work identifies two other areas of improvement over the original method: a Hough-based tracing, bringing a speed-up of more than 5 times, and a search for edgelets in stripes instead of wedges, achieving improved performance especially at lower rates of false positives per image. Experimental evaluation proves the proposed method to be faster and more robust. The method is also demonstrated to be suitable to support an augmented reality application for assembly guidance.

Fast Features Invariant to Rotation and Scale of Texture

  • Autoři: Šulc, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision - ECCV 2014 Workshops, Part II. Cham: Springer, 2015. pp. 47-62. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16180-8.
  • Rok: 2015
  • DOI: 10.1007/978-3-319-16181-5_4
  • Odkaz: https://doi.org/10.1007/978-3-319-16181-5_4
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A family of novel texture representations called Ffirst, the Fast Features Invariant to Rotation and Scale of Texture, is introduced. New rotation invariants are proposed, extending the LBP-HF features, improving the recognition accuracy. Using the full set of LBP features, as opposed to uniform only, leads to further improvement. Linear Support Vector Machines with an approximate chi2 kernel map are used for fast and precise classification. Experimental results show that Ffirst exceeds the best reported results in texture classification on three difficult datasets KTH-TIPS2a, KTH-TIPS2b and ALOT, achieving 88%, 76% and 96% accuracy respectively. The recognition rates are above 99% on standard texture datasets KTH-TIPS, Brodatz32, UIUCTex, UMD, CUReT.
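For illustration, a plain (full, non-uniform) 8-neighbour LBP histogram, the building block underlying the Ffirst representation, can be computed as below. This sketch omits the LBP-HF rotation invariants and the multi-scale processing the paper adds, so it is only the starting point, not Ffirst itself:

```python
import numpy as np

def lbp_histogram(img):
    """Full 8-neighbour LBP: threshold each neighbour against the centre pixel
    and read off the 8 resulting bits as a code in 0..255; return the
    normalized 256-bin code histogram (the 'full set' of LBP features, as
    opposed to uniform-patterns-only)."""
    h, w = img.shape
    c = img[1:-1, 1:-1]                       # interior (centre) pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]   # shifted neighbour view
        code |= (n >= c).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

On a constant image every neighbour ties with the centre, so all mass falls into the single code 255; real textures spread the mass over many bins.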

FASText: Efficient Unconstrained Scene Text Detector

  • DOI: 10.1109/ICCV.2015.143
  • Odkaz: https://doi.org/10.1109/ICCV.2015.143
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a novel easy-to-implement stroke detector based on an efficie nt pixel intensity comparison to surrounding pixels. Stroke-specific keypoints are efficiently detected and text fragments are subsequently extracted by local thresholding guided by keypoint properties. Classification based on effectively calculated features then eliminates non-text regions.
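The flavour of the pixel intensity comparison can be conveyed by a toy check: the centre of a dark stroke on a light background is darker than most pixels sampled on a surrounding circle. This is not the FASText detector itself (which classifies specific stroke patterns), and the radius, margin, and fraction below are arbitrary illustration values:

```python
import numpy as np

def stroke_keypoint(img, y, x, r=2, margin=10, frac=0.75):
    """Toy stroke test: sample 12 pixels on a circle of radius r around (y, x)
    and report a keypoint if at least `frac` of them are brighter than the
    centre by more than `margin` (i.e. the centre lies on a dark stroke)."""
    angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
    ys = np.round(y + r * np.sin(angles)).astype(int)
    xs = np.round(x + r * np.cos(angles)).astype(int)
    ring = img[ys, xs]
    return np.mean(ring > img[y, x] + margin) >= frac
```

A dark pixel on a light background passes the test, while a pixel inside a uniform region does not.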

ICDAR 2015 competition on Robust Reading

  • Autoři: Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., prof. Ing. Jiří Matas, Ph.D., Ing. Lukáš Neumann, Ph.D., Chandrasekhar, V., Lu, S., Shafait, F., Uchida, S.
  • Publikace: Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. Piscataway: IEEE, 2015. pp. 1156-1160. ISSN 1520-5363. ISBN 978-1-4799-1805-8.
  • Rok: 2015
  • DOI: 10.1109/ICDAR.2015.7333942
  • Odkaz: https://doi.org/10.1109/ICDAR.2015.7333942
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Results of the ICDAR 2015 Robust Reading Competition are presented. A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Im ages, Focused Scene Images and Video Text. Challenge 4 is run on a newly acquired dataset of 1,670 images evaluating Text Localisation, Word Recognition and End-to-End pipelines. In addition, the dataset for Challenge 3 on Video Text has been substantially updated wi th more video sequences and more accurate ground truth data. Finally, tasks assessing End -to-End system performance have been introduced to all Challenges. The competition took p lace in the first quarter of 2015, and received a total of 44 submissions. Only the tasks newly introduced in 2015 are reported on. The datasets, the ground truth specification a nd the evaluation protocols are presented together with the results and a brief summary o f the participating methods.

MODS: Fast and robust method for two-view matching

  • DOI: 10.1016/j.cviu.2015.08.005
  • Odkaz: https://doi.org/10.1016/j.cviu.2015.08.005
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel algorithm for wide-baseline matching called MODS - matching on demand with view synthesis - is presented. The MODS algorithm is experimentally shown to solve a broader range of wide-baseline problems than the state of the art while being nearly as fast as standard matchers on simple problems. The apparent robustness vs. speed trade-off is finessed by the use of progressively more time-consuming feature detectors and by on-demand generation of synthesized images that is performed until a reliable estimate of geometry is obtained. We introduce an improved method for tentative correspondence selection, applicable both with and without view synthesis. A modification of the standard first to second nearest distance rule increases the number of correct matches by 5-20% at no additional computational cost. Performance of the MODS algorithm is evaluated on several standard publicly available datasets, and on a new set of geometrically challenging wide baseline problems that is made public together with the ground truth. Experiments show that the MODS outperforms the state-of-the-art in robustness and speed. Moreover, MODS performs well on other classes of difficult two-view problems like matching of images from different modalities, with wide temporal baseline or with significant lighting changes.
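The standard first-to-second nearest distance (SNN ratio) rule that MODS modifies can be sketched as follows; the paper's modification concerns how the second nearest neighbour is chosen, which this illustrative baseline version does not implement:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Standard first-to-second nearest neighbour ratio test: keep a tentative
    match only if its best distance is clearly smaller than the distance to
    the second-best candidate in the other image."""
    # pairwise Euclidean distances between the two descriptor sets
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    first, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc1))
    keep = d[rows, first] < ratio * d[rows, second]
    return [(int(i), int(first[i])) for i in np.flatnonzero(keep)]
```

A match with a distinctive nearest neighbour survives; a match whose two best candidates are nearly equidistant is discarded as ambiguous.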

Place Recognition with WxBS Retrieval

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We present a novel visual place recognition method designed for operation in challenging conditions such as day-to-night or winter-to-summer matching. The proposed WxBS Retrieval method is novel in enriching a bag-of-words approach with multiple detectors, descriptors with suitable visual vocabularies, view synthesis, and adaptive thresholding to compensate for large variations in contrast and richness of features in different conditions. Evaluated on the public Visual Place Recognition in Changing Environments (VPRiCE) dataset, the method achieves precision 0.689, recall 0.798 and F1-score 0.740; the precision and F1-score are the best results reported for the VPRiCE dataset so far. Experiments show that combining retrieval and matching algorithms with detectors and descriptors insensitive to gradient reversal and contrast leads to both high accuracy and scalability.

Sharing local information in scanning-window detection

  • Autoři: Pokorný, J., Trefný, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVWW 2015: Proceedings of the 20th Computer Vision Winter Workshop. Graz: Graz University of Technology, 2015. pp. 107-113. ISBN 978-3-85125-388-7.
  • Rok: 2015
  • DOI: 10.3217/978-3-85125-388-7
  • Odkaz: https://doi.org/10.3217/978-3-85125-388-7
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The WaldBoost algorithm is a state-of-the-art method for object detection due to its high detection accuracy and real-time speed. However, since the scanning-window procedure does not make use of information shared among overlapping windows, a significant speed-up is still possible by exploiting this property. Zemcik et al. recently proposed to use a second classifier to suppress the neighboring positions with a negligible computational overhead. In this paper we improve upon the work of Zemcik et al. and show that with an improved scanning strategy and predictor selection we outperform it in both geometric accuracy and detection rate on the FDDB dataset for face detection, while achieving the same or a higher speed-up.

Texture-Based Leaf Identification

  • Autoři: Šulc, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision - ECCV 2014 Workshops, Part IV. Cham: Springer, 2015, pp. 185-200. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16219-5.
  • Rok: 2015
  • DOI: 10.1007/978-3-319-16220-1_14
  • Odkaz: https://doi.org/10.1007/978-3-319-16220-1_14
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel approach to visual leaf identification is proposed. A leaf is represented by a pair of local feature histograms, one computed from the leaf interior, the other from the border. The histogrammed local features are an improved version of a recently proposed rotation and scale invariant descriptor based on local binary patterns (LBPs). Describing the leaf with multi-scale histograms of rotationally invariant features derived from sign- and magnitude-LBP provides a desirable level of invariance. The representation does not use colour. Using the same parameter settings in all experiments and standard evaluation protocols, the method outperforms the state-of-the-art on all tested leaf sets - the Austrian Federal Forests dataset, the Flavia dataset, the Foliage dataset, the Swedish dataset and the Middle European Woods dataset - achieving excellent recognition rates above 99%. Preliminary results on images from the north and south regions of France obtained from the LifeCLEF'14 Plant task dataset indicate that the proposed method is also applicable to recognizing the environmental conditions the plant has been exposed to.
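
The descriptors above build on local binary patterns; as background, a minimal sketch of the vanilla 3x3 LBP operator and its 256-bin histogram (the paper uses improved multi-scale, rotation-invariant sign- and magnitude-LBP variants, which this does not reproduce):

```python
def lbp_histogram(img):
    """img: 2D list of grey values; returns a 256-bin histogram of LBP codes.

    Each interior pixel gets an 8-bit code: bit k is set when the k-th
    neighbour (clockwise from top-left) is >= the centre value.
    """
    h, w = len(img), len(img[0])
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offs):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

Concatenating such histograms over scales, separately for the leaf interior and border, gives a representation of the kind the abstract describes.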

The Thermal Infrared Visual Object Tracking VOT-TIR2015 Challenge Results

  • Autoři: Felsberg, M., Berg, A., Hager, G., Ahlberg, J., prof. Ing. Jiří Matas, Ph.D., Ing. Tomáš Vojíř, Ph.D.,
  • Publikace: The IEEE International Conference on Computer Vision (ICCV) Workshops. New York: IEEE Computer Society Press, 2015. pp. 639-651. ISSN 1550-5499. ISBN 978-1-4673-8390-5.
  • Rok: 2015
  • DOI: 10.1109/ICCVW.2015.86
  • Odkaz: https://doi.org/10.1109/ICCVW.2015.86
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Thermal Infrared Visual Object Tracking challenge 2015, VOT-TIR2015, aims at comparing short-term single-object visual trackers that work on thermal infrared (TIR) sequences and do not apply pre-learned models of object appearance. VOT-TIR2015 is the first benchmark on short-term tracking in TIR sequences. Results of 24 trackers are presented. For each participating tracker, a short description is provided in the appendix. The VOT-TIR2015 challenge is based on the VOT2013 challenge, but introduces the following novelties: (i) the newly collected LTIR (Linkoping TIR) dataset is used, (ii) the VOT2013 attributes are adapted to TIR data, (iii) the evaluation is performed using insights gained during VOT2013 and VOT2014 and is similar to VOT2015.

The Visual Object Tracking VOT2014 Challenge Results

  • Autoři: Kristan, M., Pflugfelder, R., Leonardis, A., prof. Ing. Jiří Matas, Ph.D., Cehovin, L., Nebehay, G., Ing. Tomáš Vojíř, Ph.D., Fernandez, G.
  • Publikace: Computer Vision - ECCV 2014 Workshops, Part II. Cham: Springer, 2015. pp. 191-217. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16180-8.
  • Rok: 2015
  • DOI: 10.1007/978-3-319-16181-5_14
  • Odkaz: https://doi.org/10.1007/978-3-319-16181-5_14
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are presented. The number of tested trackers makes VOT2014 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2014 challenge that go beyond its VOT2013 predecessor are introduced: (i) a new VOT2014 dataset with full annotation of targets by rotated bounding boxes and per-frame attributes, (ii) extensions of the VOT2013 evaluation methodology, (iii) a new unit for tracking speed assessment less dependent on the hardware and (iv) the VOT2014 evaluation toolkit that significantly speeds up execution of experiments. The dataset, the evaluation kit as well as the results are publicly available at the challenge website (http://www.votchallenge.net/).

The Visual Object Tracking VOT2015 challenge results

  • Autoři: Kristan, M., prof. Ing. Jiří Matas, Ph.D., Leonardis, A., Felsberg, M., Ing. Tomáš Vojíř, Ph.D.,
  • Publikace: The IEEE International Conference on Computer Vision (ICCV) Workshops. New York: IEEE Computer Society Press, 2015. pp. 564-586. ISSN 1550-5499. ISBN 978-1-4673-8390-5.
  • Rok: 2015
  • DOI: 10.1109/ICCVW.2015.79
  • Odkaz: https://doi.org/10.1109/ICCVW.2015.79
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of tested trackers makes VOT2015 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2015 challenge that go beyond its VOT2014 predecessor are: (i) a new VOT2015 dataset twice as large as in VOT2014 with full annotation of targets by rotated bounding boxes and per-frame attributes, (ii) extensions of the VOT2014 evaluation methodology by introduction of a new performance measure. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.

Towards Visual Words to Words Text Detection with a General Bag of Words Representation

  • DOI: 10.1109/ICDAR.2015.7333840
  • Odkaz: https://doi.org/10.1109/ICDAR.2015.7333840
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We address the problem of text localization and retrieval in real-world images. We are the first to study the retrieval of text images, i.e. the selection of images containing text in large collections at high speed. We propose a novel representation, textual visual words, which describes text by generic visual words that geometrically consistently predict the bottom and top lines of text. The visual words are discretized SIFT descriptors of Hessian features. The features may correspond to various structures present in the text - character fragments, individual characters or their arrangements. The textual words representation is invariant to affine transformation of the image and local linear change of intensity. Experiments demonstrate that the proposed method outperforms the state-of-the-art on the MS dataset. The proposed method detects blurry, small-font, low-contrast and noisy text in real-world images.

WxBS: Wide Baseline Stereo Generalizations

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We have presented a new problem - wide multiple baseline stereo (WxBS) - which considers matching of images that simultaneously differ in more than one image acquisition factor, such as viewpoint, illumination, sensor type, or where object appearance changes significantly, e.g. over time. A new dataset with ground truth for the evaluation of matching algorithms has been introduced and will be made public. We have extensively tested a large set of popular and recent detectors and descriptors and show that the combination of RootSIFT and HalfRootSIFT as descriptors with MSER and Hessian-Affine detectors works best for many different nuisance factors. We show that simple adaptive thresholding improves the Hessian-Affine, DoG and MSER (and possibly other) detectors and allows them to be used on infrared and low-contrast images. A novel matching algorithm for addressing the WxBS problem has been introduced. We have shown experimentally that the WxBS-M matcher dominates the state-of-the-art methods on both the new and existing datasets.
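
The adaptive thresholding idea mentioned above - lowering a detector's response threshold until enough features fire - can be sketched generically. This is a hypothetical illustration, not the paper's implementation; `detect` stands for any detector callable and all names are assumptions:

```python
def detect_adaptive(detect, min_features, thresholds):
    """Try thresholds from strictest to loosest; stop once enough features fire.

    detect: callable mapping a threshold value to a list of features.
    Returns (threshold_used, features). On low-contrast or infrared images
    a fixed strict threshold yields too few features; relaxing it adaptively
    keeps the detector usable.
    """
    feats = []
    for t in thresholds:  # assumed sorted from strictest to loosest
        feats = detect(t)
        if len(feats) >= min_features:
            return t, feats
    return thresholds[-1], feats  # even the loosest setting fell short
```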

A 3D Approach to Facial Landmarks: Detection, Refinement, and Tracking

  • DOI: 10.1109/ICPR.2014.378
  • Odkaz: https://doi.org/10.1109/ICPR.2014.378
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A real-time algorithm for accurate localization of facial landmarks in a single monocular image is proposed. The algorithm is formulated as an optimization problem in which the sum of responses of local classifiers is maximized with respect to the camera pose by fitting a generic (not person-specific) 3D model. The algorithm simultaneously estimates the head position and orientation and detects the facial landmarks in the image. Despite being local, we show that the basin of attraction is large to the extent that the algorithm can be initialized by a scanning-window face detector. Other experiments on standard datasets demonstrate that the proposed algorithm outperforms a state-of-the-art landmark detector, especially for non-frontal face images, and that it is capable of reliable and stable tracking for a large set of viewing angles.

A Few Things One Should Know About Feature Extraction, Description and Matching

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We explore the computational bottlenecks of the affine feature extraction process and show how this process can be speeded up 2-3 times with no or very modest loss of performance. With our improvements, the speed of the Hessian-Affine and MSER detectors is comparable with the similarity-invariant SURF and DoG-SIFT detectors. The improvements presented include a faster anisotropic patch extraction algorithm which does not depend on the feature scale, a speed-up of the feature dominant orientation estimation, and SIFT descriptor computation using a look-up table. In the second part of the paper we explore the performance of the recently proposed first geometrically inconsistent nearest neighbour criterion and the dominant orientation generation process.

Matching of Images of Non-planar Objects with View Synthesis

  • Autoři: Mgr. Dmytro Mishkin, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: SOFSEM 2014: Theory and Practice of Computer Science. Cham: Springer International Publishing AG, 2014. pp. 30-39. Lecture notes in computer science. ISSN 0302-9743. ISBN 978-3-319-04297-8.
  • Rok: 2014
  • DOI: 10.1007/978-3-319-04298-5_4
  • Odkaz: https://doi.org/10.1007/978-3-319-04298-5_4
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We explore the performance of the recently proposed two-view image matching algorithms using affine view synthesis - ASIFT (Morel and Yu, 2009) [14] and MODS (Mishkin, Perdoch and Matas, 2013) [10] - on images of objects that do not have significant local texture and that are locally not well approximated by planes. Experiments show that view synthesis improves matching results on images of such objects, but the number of useful synthetic views is lower than for planar object matching. The best detector for matching images of 3D objects is the Hessian-Affine in the Sparse configuration. The iterative MODS matcher performs comparably, confirming that it is a robust, generic method for two-view matching that performs well for different types of scenes and a wide range of viewing conditions.

Detection, Rectification and Segmentation of Coplanar Repeated Patterns

  • DOI: 10.1109/CVPR.2014.380
  • Odkaz: https://doi.org/10.1109/CVPR.2014.380
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper presents a novel and general method for the detection, rectification and segmentation of imaged coplanar repeated patterns. The only assumption made of the scene geometry is that repeated scene elements are mapped to each other by planar Euclidean transformations. The class of patterns covered is broad and includes nearly all commonly seen, planar, man-made repeated patterns. In addition, novel linear constraints are used to reduce geometric ambiguity between the rectified imaged pattern and the scene pattern. Rectification to within a similarity of the scene plane is achieved from one rotated repeat, or to within a similarity with a scale ambiguity along the axis of symmetry from one reflected repeat. A stratum of constraints is derived that gives the necessary configuration of repeats for each successive level of rectification. A generative model for the imaged pattern is inferred and used to segment the pattern with pixel accuracy. Qualitative results are shown on a broad range of image types on which state-of-the-art methods fail.

Relevance Assessment for Visual Video Re-ranking

  • Autoři: Aldana Iuit, J., prof. Mgr. Ondřej Chum, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Image Analysis and Recognition: 11th International Conference (ICIAR 2014). Berlin: Springer-Verlag, 2014, pp. 421-430. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-11757-7. Available from: http://dx.doi.org/10.1007/978-3-319-11758-4_46
  • Rok: 2014
  • DOI: 10.1007/978-3-319-11758-4_46
  • Odkaz: https://doi.org/10.1007/978-3-319-11758-4_46
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The following problem is considered: given a name or phrase specifying an object, collect images and videos from the internet possibly depicting the object, using a textual query on their name or annotation. A visual model is built from the images and used to rank the videos by relevance to the object of interest. Shot relevance is defined as the duration of the visibility of the object of interest. The model is based on local image features. The relevant shot detection builds on wide baseline stereo matching. The method is tested on 10 text phrases corresponding to 10 landmarks. The pool of 100 videos collected by querying YouTube includes seven relevant videos for each landmark. The implementation runs faster than real-time, at 208 frames per second. Averaged over the set of landmarks, at recall 0.95 the method has a mean precision of 0.65 and a mean Average Precision (mAP) of 0.92.

Robust scale-adaptive mean-shift for tracking

  • DOI: 10.1016/j.patrec.2014.03.025
  • Odkaz: https://doi.org/10.1016/j.patrec.2014.03.025
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The mean-shift procedure is a popular object tracking algorithm since it is fast, easy to implement and performs well in a range of conditions. We address the problem of scale adaptation and present a novel, theoretically justified scale estimation mechanism which relies solely on the mean-shift procedure for the Hellinger distance. We also propose two improvements of the mean-shift tracker that make the scale estimation more robust in the presence of background clutter. The first is a novel histogram color weighting, called background ratio weighting (BRW), that exploits the object neighborhood to help discriminate the target. We show that the BRW improves the performance of MS-like tracking methods in general. The second improvement boosts the performance of the tracker with the proposed scale estimation by the introduction of a forward-backward consistency check and by adopting regularization terms that counter two major problems: scale expansion caused by background clutter and scale implosion on self-similar objects. The proposed mean-shift tracker with scale selection and BRW is compared with recent state-of-the-art algorithms on a dataset of 77 public sequences. It outperforms the reference algorithms in average recall and processing speed, and it achieves the best score for 30% of the sequences - the highest percentage among the reference algorithms.
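
The Hellinger distance that drives the scale estimation relates to the Bhattacharyya coefficient BC(p, q) = sum_u sqrt(p_u * q_u) by H(p, q) = sqrt(1 - BC(p, q)). A minimal sketch for comparing two colour histograms (illustrative only, not the tracker's code):

```python
def hellinger(p, q):
    """Hellinger distance between two histograms (normalised internally).

    0 for identical distributions, 1 for distributions with disjoint support.
    """
    sp, sq = sum(p), sum(q)
    bc = sum((a / sp * b / sq) ** 0.5 for a, b in zip(p, q))
    return max(0.0, 1.0 - bc) ** 0.5  # clamp guards tiny negative round-off
```

Minimising this distance between the target model histogram and the candidate-window histogram is exactly what the mean-shift iterations do.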

The Enhanced Flock of Trackers

  • DOI: 10.1007/978-3-642-44907-9_6
  • Odkaz: https://doi.org/10.1007/978-3-642-44907-9_6
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The paper presents contributions to the design of the Flock of Trackers (FoT). The FoT estimates the pose of the tracked object by robustly combining displacement estimates from a subset of local trackers that cover the object. The enhancements of the Flock of Trackers are: (i) new reliability predictors for the local trackers - the Neighbourhood consistency predictor and the Markov predictor, (ii) new rules for combining the predictions and (iii) introduction of a RANSAC-based estimator of object motion. The enhanced FoT was extensively tested on 62 sequences. Most of the sequences are standard and used in the literature. The improved FoT showed performance superior to the reference method. For all 62 sequences, the ground truth is made publicly available.

Approximate Models for Fast and Accurate Epipolar Geometry Estimation

  • DOI: 10.1109/IVCNZ.2013.6727000
  • Odkaz: https://doi.org/10.1109/IVCNZ.2013.6727000
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper investigates the plausibility of using approximate models for hypothesis generation in a RANSAC framework to accurately and reliably estimate the fundamental matrix. Two novel fundamental matrix estimators are introduced that sample two correspondences to generate affine-fundamental matrices for RANSAC hypotheses. A new RANSAC framework is presented that uses local optimization to estimate the fundamental matrix from the consensus correspondence sets of verified hypotheses, which are approximate models. In a rigorous evaluation, the proposed estimators are shown to perform better than other approximate models that have previously been used in the literature for fundamental matrix estimation. In addition, the proposed estimators are over 30 times faster, in terms of models verified, than the 7-point method, and offer comparable accuracy and repeatability on a large subset of the test set.

Fast Detection of Multiple Textureless 3-D Objects

  • Autoři: Cai, H., doc. Ing. Tomáš Werner, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision Systems - 9th International Conference, ICVS 2013, St. Petersburg, Russian Federation, July 16-18, 2013. Proceedings. Heidelberg: Springer, 2013. p. 103-112. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-39401-0.
  • Rok: 2013
  • DOI: 10.1007/978-3-642-39402-7_11
  • Odkaz: https://doi.org/10.1007/978-3-642-39402-7_11
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a fast edge-based approach for detection and approximate pose estimation of multiple textureless objects in a single image. The objects are trained from a set of edge maps, each showing one object in one pose. To each scanning window in the input image, the nearest neighbor is found among these training templates by a two-level cascade. The first cascade level, based on a novel edge-based sparse image descriptor and fast search by index table, prunes the majority of background windows. The second level verifies the surviving detection hypotheses by oriented chamfer matching, improved by selecting discriminative edges and by compensating a bias towards simple objects. The method outperforms the state-of-the-art approach by Damen et al. (2012). The processing is near real-time, ranging from 2 to 4 frames per second for the training set size 10^4.
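
Chamfer matching, the second cascade level above, scores a template by the average distance from each template edge pixel to the nearest image edge. A brute-force sketch (the paper's oriented variant additionally compares edge orientations and selects discriminative edges, which is omitted here; all names are illustrative):

```python
def distance_map(edges, h, w):
    """edges: iterable of (y, x) edge pixels -> h x w map of distances
    to the nearest edge pixel (brute force; real systems use a
    two-pass distance transform)."""
    pts = list(edges)
    return [[min(((y - ey) ** 2 + (x - ex) ** 2) ** 0.5 for ey, ex in pts)
             for x in range(w)] for y in range(h)]

def chamfer_score(dist_map, template_edges, dy, dx):
    """Average image-edge distance over template edges placed at offset (dy, dx).

    Lower is better; 0 means every template edge lands exactly on an image edge.
    """
    vals = [dist_map[y + dy][x + dx] for y, x in template_edges]
    return sum(vals) / len(vals)
```

Precomputing the distance map once lets every scanning-window hypothesis be scored by simple lookups.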

Image Retrieval for Online Browsing in Large Image Collections

  • DOI: 10.1007/978-3-642-41062-8_2
  • Odkaz: https://doi.org/10.1007/978-3-642-41062-8_2
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Two new methods for large scale image retrieval are proposed, showing that the classical ranking of images based on similarity addresses only one of possible user requirements. The novel retrieval methods add zoom-in and zoom-out capabilities and answer the 'What is this?' and 'Where is this?' questions. The functionality is obtained by modifying the scoring and ranking functions of a standard bag-of-words image retrieval pipeline. We show the importance of the DAAT scoring and query expansion for recall of zoomed images. The proposed methods were tested on a standard large annotated image dataset together with images of Sagrada Familia and 100000 image confusers downloaded from Flickr. For completeness, we present in detail components of image retrieval pipelines in state-of-the-art systems. Finally, open problems related to zoom-in and zoom-out queries are discussed.

Kernel-mapped Histograms of Multi-scale LBPs for Tree Bark Recognition

  • Autoři: Šulc, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 28th International Conference of Image and Vision Computing New Zealand (IVCNZ 2013). Piscataway: IEEE, 2013. pp. 82-87. ISSN 2151-2191. ISBN 978-1-4799-0882-0.
  • Rok: 2013
  • DOI: 10.1109/IVCNZ.2013.6726996
  • Odkaz: https://doi.org/10.1109/IVCNZ.2013.6726996
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a novel method for tree bark identification by SVM classification of feature-mapped multi-scale descriptors formed by concatenated histograms of Local Binary Patterns (LBPs). A feature map approximating the histogram intersection kernel significantly improves the method's accuracy. Contrary to common practice, we use the full 256 bin LBP histogram rather than the standard 59 bin histogram of uniform LBPs and obtain superior results. Robustness to scale changes is handled by forming multiple multi-scale descriptors. Experiments conducted on a standard dataset show 96.5% accuracy using ten-fold cross validation. Using the standard 15 training examples per class, the proposed method achieves a recognition rate of 82.5% and significantly outperforms both the state-of-the-art automatic recognition rate of 64.2% and human experts with recognition rates of 56.6% and 77.8%. Experiments on standard texture datasets confirm that the proposed method is suitable for general texture recognition.
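
The classifier uses a feature map approximating the histogram intersection kernel; the exact kernel being approximated is simply the sum of bin-wise minima of the two L1-normalised histograms. A minimal sketch of the exact kernel (the explicit feature-map approximation itself is not reproduced here):

```python
def intersection_kernel(h1, h2):
    """Histogram intersection kernel on two histograms of equal length.

    Histograms are L1-normalised internally, so the value lies in [0, 1]:
    1 for identical distributions, 0 for disjoint ones.
    """
    s1, s2 = sum(h1), sum(h2)
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))
```

An explicit feature map lets a fast linear SVM approximate an SVM with this kernel, which is the design choice the abstract refers to.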

Learning Vocabularies over a Fine Quantization

  • DOI: 10.1007/s11263-012-0600-1
  • Odkaz: https://doi.org/10.1007/s11263-012-0600-1
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel similarity measure for bag-of-words type large scale image retrieval is presented. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method and is more discriminative than both L2-based soft assignment and Hamming embedding. The novel similarity function achieves mean average precision that is superior to any result published in the literature on the standard Oxford 5k, Oxford 105k and Paris datasets/protocols. We study the effect of a fine quantization and very large vocabularies (up to 64 million words) and show that the performance of specific object retrieval increases with the size of the vocabulary. This observation is in contradiction with previously published methods. We further demonstrate that the large vocabularies increase the speed of the tf-idf scoring step.
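
For context, the tf-idf scoring step mentioned above works roughly as follows in a standard bag-of-words pipeline (a generic sketch, not the learned similarity proposed in the paper; all names are illustrative):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of visual-word id lists -> list of L2-normalised
    {word: tf-idf weight} dictionaries."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    vecs = []
    for d in docs:
        tf = Counter(d)
        v = {w: (c / len(d)) * math.log(n / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vecs.append({w: x / norm for w, x in v.items()})
    return vecs

def score(q, d):
    """Cosine similarity of two normalised sparse tf-idf vectors."""
    return sum(w * d.get(k, 0.0) for k, w in q.items())
```

With sparse vectors, the score only touches words shared by the query and the database image, which is why very large vocabularies can speed this step up.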

Long-Term Tracking Through Failure Cases

  • Autoři: Lebeda, K., Hadfield, S., Bowden, R., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 2013 IEEE International Conference on Computer Vision (ICCV 2013) Worskhops. Piscataway: IEEE, 2013. pp. 153-160. ISSN 1550-5499. ISBN 978-0-7695-5161-6.
  • Rok: 2013
  • DOI: 10.1109/ICCVW.2013.26
  • Odkaz: https://doi.org/10.1109/ICCVW.2013.26
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Long-term tracking of an object, given only a single instance in an initial frame, remains an open problem. We propose a visual tracking algorithm, robust to many of the difficulties which often occur in real-world scenes. Correspondences of edge-based features are used to overcome the reliance on the texture of the tracked object and improve invariance to lighting. Furthermore we address long-term stability, enabling the tracker to recover from drift and to provide redetection following object disappearance or occlusion. The two-module principle is similar to the successful state-of-the-art long-term TLD tracker, however our approach extends to cases of low-textured objects. Besides reporting our results on the VOT Challenge dataset, we perform two additional experiments. Firstly, results on short-term sequences show the performance of tracking challenging objects which represent failure cases for competing state-of-the-art approaches. Secondly, long sequences are tracked, including one of almost 30 000 frames which to our knowledge is the longest tracking sequence reported to date. This tests the re-detection and drift resistance properties of the tracker. All the results are comparable to the state-of-the-art on sequences with textured objects and superior on non-textured objects. The new annotated sequences are made publicly available.

On Combining Multiple Segmentations in Scene Text Recognition

  • DOI: 10.1109/ICDAR.2013.110
  • Odkaz: https://doi.org/10.1109/ICDAR.2013.110
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An end-to-end real-time scene text localization and recognition method is presented. The three main novel features are: (i) keeping multiple segmentations of each character until the very last stage of the processing, when the context of each character in a text line is known, (ii) an efficient algorithm for selection of character segmentations minimizing a global criterion, and (iii) showing that, despite using theoretically scale-invariant methods, operating on a coarse Gaussian scale space pyramid yields improved results as many typographical artifacts are eliminated. The method runs in real time and achieves state-of-the-art text localization results on the ICDAR 2011 Robust Reading dataset. Results are also reported for end-to-end text recognition on the ICDAR 2011 dataset.

Robust Scale-Adaptive Mean-Shift for Tracking

  • Autoři: Ing. Tomáš Vojíř, Ph.D., Nosková, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: SCIA 2013: Proceedings of the 18th Scandinavian Conference on Image Analysis. Heidelberg: Springer, 2013. p. 652-663. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-38885-9.
  • Rok: 2013
  • DOI: 10.1007/978-3-642-38886-6_61
  • Odkaz: https://doi.org/10.1007/978-3-642-38886-6_61
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Mean-Shift tracking is a popular algorithm for object tracking since it is easy to implement and it is fast and robust. In this paper, we address the problem of scale adaptation of the Hellinger distance based Mean-Shift tracker. We start from a theoretical derivation of scale estimation in the Mean-Shift framework. To make the scale estimation robust and suitable for tracking, we introduce regularization terms that counter two major problems: (i) scale expansion caused by background clutter and (ii) scale implosion on self-similar objects. To further robustify the scale estimate, it is validated by a forward-backward consistency check. The proposed Mean-Shift tracker with scale selection is compared with recent state-of-the-art algorithms on a dataset of 48 public color sequences, on which it achieved excellent results.
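
The forward-backward consistency check validates an estimate by tracking one step forward and then backward, accepting it only if the backward track returns near the starting point. A tracker-agnostic sketch (function names and the tolerance are illustrative, not the paper's implementation):

```python
def fb_error(forward, backward, point):
    """Forward-backward error of a point tracked one step each way.

    forward/backward: callables mapping an (x, y) point to its tracked
    position in the next/previous frame.
    """
    y = forward(point)
    z = backward(y)
    return ((z[0] - point[0]) ** 2 + (z[1] - point[1]) ** 2) ** 0.5

def fb_consistent(forward, backward, point, tol=2.0):
    """Accept the estimate only if the round trip stays within tol pixels."""
    return fb_error(forward, backward, point) <= tol
```

A drifting or occluded track typically fails the round trip, so the scale/position update can be rejected before it corrupts the model.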

Scene Text Localization and Recognition with Oriented Stroke Detection

  • DOI: 10.1109/ICCV.2013.19
  • Odkaz: https://doi.org/10.1109/ICCV.2013.19
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.

The Visual Object Tracking VOT2013 Challenge Results

  • Autoři: Kristan, M., Pflugfelder, R., Leonardis, A., prof. Ing. Jiří Matas, Ph.D., Porikli, F., Cehovin, L., Nebehay, G., Fernandez, G., Ing. Tomáš Vojíř, Ph.D.,
  • Publikace: IEEE International Conference on Computer Vision (ICCV 2013) Worskhops. Piscataway: IEEE, 2013. pp. 98-111. ISSN 1550-5499. ISBN 978-1-4799-3022-7.
  • Rok: 2013
  • DOI: 10.1109/ICCVW.2013.20
  • Odkaz: https://doi.org/10.1109/ICCVW.2013.20
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Visual tracking has attracted significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems has made it almost impossible to follow the developments in the field. One of the reasons is that there is a lack of commonly accepted annotated datasets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) workshop was organized in conjunction with ICCV2013. Researchers from academia as well as industry were invited to participate in the first VOT2013 challenge, which aimed at single-object visual trackers that do not apply pre-learned models of object appearance (model-free). Presented here is the VOT2013 benchmark dataset for evaluation of single-object visual trackers as well as the results obtained by the trackers competing in the challenge. In contrast to related attempts in tracker benchmarking, the dataset is labeled per-frame by visual attributes that indicate occlusion, illumination change, motion change, size change and camera motion, offering a more systematic comparison of the trackers. Furthermore, we have designed an automated system for performing and evaluating the experiments. We present the evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset. The dataset, the evaluation tools and the tracker rankings are publicly available from the challenge website.

Tracking the Untrackable: How to Track When Your Object Is Featureless

  • Autoři: Lebeda, K., prof. Ing. Jiří Matas, Ph.D., Bowden, R.
  • Publikace: Computer Vision - ACCV 2012 Workshops. Heidelberg: Springer, 2013, pp. 347-359. ISSN 0302-9743. ISBN 978-3-642-37483-8. Available from: http://cmp.felk.cvut.cz/~lebedkar/Lebeda-2012-FLOtrack-ACCV_DTCE.pdf
  • Rok: 2013
  • DOI: 10.1007/978-3-642-37484-5_29
  • Odkaz: https://doi.org/10.1007/978-3-642-37484-5_29
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a novel approach to tracking objects by low-level line correspondences. In our implementation we show that this approach is usable even when tracking objects that lack texture, exploiting situations when feature-based trackers fail due to the aperture problem. Furthermore, we suggest an approach to failure detection and recovery to maintain long-term stability. This is achieved by remembering configurations that lead to good pose estimates and using them later for tracking corrections.

Two-view Matching with View Synthesis Revisited

  • DOI: 10.1109/IVCNZ.2013.6727054
  • Odkaz: https://doi.org/10.1109/IVCNZ.2013.6727054
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Wide-baseline matching focusing on problems with extreme viewpoint change is considered. We introduce the use of view synthesis with affine-covariant detectors to solve such problems and show that matching with the Hessian-Affine or MSER detectors outperforms the state-of-the-art ASIFT [19]. To minimise the loss of speed caused by view synthesis, we propose the Matching On Demand with view Synthesis algorithm (MODS) that uses progressively more synthesized images and more (time-consuming) detectors until reliable estimation of geometry is possible. We show experimentally that the MODS algorithm solves problems beyond the state-of-the-art and yet is comparable in speed to standard wide-baseline matchers on simpler problems. Minor contributions include an improved method for tentative correspondence selection, applicable both with and without view synthesis, and a view synthesis setup that greatly improves MSER robustness to blur and scale change while increasing its running time by only 10%.

USAC: A Universal Framework for Random Sample Consensus

  • DOI: 10.1109/TPAMI.2012.257
  • Odkaz: https://doi.org/10.1109/TPAMI.2012.257
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A computational problem that arises frequently in computer vision is that of estimating the parameters of a model from data that have been contaminated by noise and outliers. More generally, any practical system that seeks to estimate quantities from noisy data measurements must have at its core some means of dealing with data contamination. The random sample consensus (RANSAC) algorithm is one of the most popular tools for robust estimation. Recent years have seen an explosion of activity in this area, leading to the development of a number of techniques that improve upon the efficiency and robustness of the basic RANSAC algorithm. In this paper, we present a comprehensive overview of recent research in RANSAC-based robust estimation by analyzing and comparing various approaches that have been explored over the years. We provide a common context for this analysis by introducing a new framework for robust estimation, which we call Universal RANSAC (USAC). USAC extends the simple hypothesize-and-verify structure of standard RANSAC to incorporate a number of important practical and computational considerations. In addition, we provide a general-purpose C++ software library that implements the USAC framework by leveraging state-of-the-art algorithms for the various modules. This implementation thus addresses many of the limitations of standard RANSAC within a single unified package. We benchmark the performance of the algorithm on a large collection of estimation problems. The implementation we provide can be used by researchers either as a stand-alone tool for robust estimation or as a benchmark for evaluating new techniques.
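    The hypothesize-and-verify structure that USAC extends can be illustrated with a minimal RANSAC sketch. This is an illustration only, not the USAC C++ library; the function name, parameters, and the line-fitting example are all assumptions made for the sketch:

    ```python
    import random

    def ransac_line(points, iters=200, inlier_thresh=0.1, seed=0):
        """Minimal RANSAC: fit a 2D line y = a*x + b to points with outliers."""
        rng = random.Random(seed)
        best_model, best_inliers = None, []
        for _ in range(iters):
            # Hypothesize: sample a minimal set (two points define a line).
            (x1, y1), (x2, y2) = rng.sample(points, 2)
            if x1 == x2:
                continue  # degenerate sample, skip
            a = (y2 - y1) / (x2 - x1)
            b = y1 - a * x1
            # Verify: count points consistent with the hypothesized model.
            inliers = [(x, y) for (x, y) in points
                       if abs(y - (a * x + b)) < inlier_thresh]
            if len(inliers) > len(best_inliers):
                best_model, best_inliers = (a, b), inliers
        return best_model, best_inliers
    ```

    USAC inserts additional modules into this loop (sample pre-filtering, degeneracy checks, local optimization, early termination of verification), but the hypothesize-and-verify skeleton is the same.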

A Real-Time Scene Text to Speech System

  • DOI: 10.1007/978-3-642-33885-4_66
  • Odkaz: https://doi.org/10.1007/978-3-642-33885-4_66
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An end-to-end real-time scene text localization and recognition method is demonstrated. The method localizes textual content in images, a video or a webcam stream, performs character recognition (OCR) and "reads" it out loud using a text-to-speech engine. The method has been recently published, achieves state-of-the-art results on public datasets and is able to recognize different fonts and scripts including non-Latin ones. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs), which has a linear computational complexity in the number of pixels in the image. Robustness to blur, noise, and illumination and color variations is also demonstrated. Finally, we show the effects of various control parameters.

A System for Real-time Detection and Tracking of Vehicles from a Single Car-mounted Camera

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel system for detection and tracking of vehicles from a single car-mounted camera is presented. The core of the system is a pair of high-performance vision algorithms: the WaldBoost detector and the TLD tracker, which are scheduled so that real-time performance is achieved. The vehicle monitoring system is evaluated on a new dataset collected on Italian motorways, which is provided with approximate ground truth (GT) obtained from laser scans. For a wide range of distances, the recall and precision of detection for cars are excellent. Statistics for trucks are also reported. The dataset with the ground truth is made public.

Detection of Bubbles As Concentric Circular Arrangements

  • Autoři: Strokina, N., prof. Ing. Jiří Matas, Ph.D., Eerola, T., Lensu, L., Kalviainen, H.
  • Publikace: ICPR 2012: Proceedings of 21st International Conference on Pattern Recognition. New York: IEEE, 2012. pp. 2655-2659. ISSN 1051-4651. ISBN 978-4-9906441-0-9.
  • Rok: 2012
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A method for the detection of bubble-like transparent objects with multiple interfaces in a liquid is proposed. Depending on the lighting conditions, bubble appearance varies significantly, including contrast reversal and multiple inter-reflections. We formulate the bubble detection problem as the detection of Concentric Circular Arrangements (CCA). The CCAs are recovered in a hypothesize-optimize-verify framework. The hypothesis generation proceeds by sampling from the components of the non-maximum suppressed responses of oriented ridge filters, followed by CCA parameter estimation. Parameter optimization is carried out by minimizing a novel cost function by the simplex method. The proposed method for bubble detection showed good performance in an industrial application requiring estimation of gas volume in pulp suspension, achieving 1.5% mean absolute relative error.

Fast Computation of min-Hash Signatures for Image Collections

  • Autoři: prof. Mgr. Ondřej Chum, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVPR 2012: Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Computer Society Press, 2012, pp. 3077-3084. ISSN 1063-6919. ISBN 978-1-4673-1228-8.
  • Rok: 2012
  • DOI: 10.1109/CVPR.2012.6248039
  • Odkaz: https://doi.org/10.1109/CVPR.2012.6248039
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A new method for highly efficient min-Hash generation for document collections is proposed. It exploits the inverted file structure which is available in many applications based on a bag or a set of words. Fast min-Hash generation is important in applications such as image clustering where good recall and precision requires a large number of min-Hash signatures.
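    As a rough illustration of the underlying primitive (not the paper's inverted-file generation algorithm; the hashing scheme and function names are assumptions), a min-Hash signature over a set of visual words, and the Jaccard-similarity estimate it supports, can be sketched as:

    ```python
    import random

    def minhash_signature(word_set, num_hashes=64, seed=0):
        """min-Hash signature: for each of num_hashes salted hash functions,
        keep the minimum hash value over the set's elements."""
        rng = random.Random(seed)
        salts = [rng.getrandbits(32) for _ in range(num_hashes)]
        return [min(hash((salt, w)) for w in word_set) for salt in salts]

    def estimate_jaccard(sig_a, sig_b):
        """The fraction of agreeing min-Hashes is an unbiased estimate of the
        Jaccard similarity |A ∩ B| / |A ∪ B| of the underlying sets."""
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
    ```

    The paper's contribution is to generate such signatures efficiently for a whole collection at once by iterating over the inverted file (word → documents) rather than over each document's word set independently.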

Fixing the Locally Optimized RANSAC

  • DOI: 10.5244/C.26.95
  • Odkaz: https://doi.org/10.5244/C.26.95
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The paper revisits the problem of local optimization for RANSAC. Improvements to the LO-RANSAC procedure are proposed: the use of a truncated quadratic cost function, the introduction of a limit on the number of inliers used for the least-squares computation, and several implementation issues are addressed. The implementation is made publicly available.

Homography Estimation from Correspondences of Local Elliptical Features

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a novel unified approach for homography estimation from two or more correspondences of local elliptical features. The method finds a homography defined by first-order Taylor expansions at two (or more) points. The approximations are affine transformations that are constrained by the ellipse-to-ellipse correspondences. Unlike methods based on projective invariants of conics, the proposed method generates only a single homography model per pair of ellipse correspondences. We show experimentally that the proposed method generates models of precision comparable to or better than the state-of-the-art at lower computational cost.

Real-time scene text localization and recognition

  • Autoři: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVPR 2012: Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Computer Society Press, 2012. p. 3538-3545. ISSN 1063-6919. ISBN 978-1-4673-1228-8.
  • Rok: 2012
  • DOI: 10.1109/CVPR.2012.6248097
  • Odkaz: https://doi.org/10.1109/CVPR.2012.6248097
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An end-to-end real-time scene text localization and recognition method is presented. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs). The ER detector is robust to blur, illumination, color and texture variation and handles low-contrast text. In the first classification stage, the probability of each ER being a character is estimated using novel features calculated with O(1) complexity per region tested. Only ERs with locally maximal probability are selected for the second stage, where the classification is improved using more computationally expensive features. A highly efficient exhaustive search with feedback loops is then applied to group ERs into words and to select the most probable character segmentation. Finally, text is recognized in an OCR stage trained using synthetic fonts. The method was evaluated on two public datasets. On the ICDAR 2011 dataset, the method achieves state-of-the-art text localization results amongst published methods and it is the first one to report results for end-to-end text recognition. On the more challenging Street View Text dataset, the method achieves state-of-the-art recall. The robustness of the proposed method against noise and low contrast of characters is demonstrated by false positives caused by detected watermark text in the dataset.

Rotation-Invariant Image and Video Description With Local Binary Pattern Features

  • Autoři: Zhao, G., Ahonen, T., prof. Ing. Jiří Matas, Ph.D., Pietikäinen, M.
  • Publikace: IEEE Transactions on Image Processing. 2012, 21(4), 1465-1477. ISSN 1057-7149.
  • Rok: 2012
  • DOI: 10.1109/TIP.2011.2175739
  • Odkaz: https://doi.org/10.1109/TIP.2011.2175739
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In this paper, we propose a novel approach to compute rotation-invariant features from histograms of local noninvariant patterns. We apply this approach to both static and dynamic local binary pattern (LBP) descriptors. For static-texture description, we present LBP histogram Fourier (LBP-HF) features, and for dynamic-texture recognition, we present two rotation-invariant descriptors computed from the LBPs from three orthogonal planes (LBP-TOP) features in the spatiotemporal domain. LBP-HF is a novel rotation-invariant image descriptor computed from discrete Fourier transforms of LBP histograms. The approach can also be generalized to embed any uniform features into this framework, and combining the supplementary information, e.g., sign and magnitude components of the LBP, can improve the description ability. Moreover, two variants of rotation-invariant descriptors are proposed for the LBP-TOP, which is an effective descriptor for dynamic-texture recognition, as shown by its recent success in different application problems, but is not rotation invariant. In the experiments, it is shown that the LBP-HF and its extensions outperform noninvariant and earlier versions of the rotation-invariant LBP in rotation-invariant texture classification. In experiments on two dynamic-texture databases with rotations or view variations, the proposed video features can effectively deal with rotation variations of dynamic textures (DTs). They are also robust with respect to changes in viewpoint, outperforming recent methods proposed for view-invariant recognition of DTs.
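    The core trick behind LBP-HF — taking the magnitudes of the discrete Fourier transform of a histogram so that the cyclic bin shifts induced by image rotation cancel out — can be sketched as follows. This is a simplified illustration under the assumption of a single cyclic histogram; the actual descriptor groups uniform LBP bins by rotation class before applying the DFT:

    ```python
    import cmath

    def fourier_magnitudes(hist):
        """Magnitudes of the DFT of a cyclic histogram. A rotation of the image
        cyclically shifts the histogram bins, which only changes the phase of
        each DFT coefficient; the magnitudes are therefore rotation invariant."""
        n = len(hist)
        return [abs(sum(hist[k] * cmath.exp(-2j * cmath.pi * u * k / n)
                        for k in range(n)))
                for u in range(n)]
    ```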

Tracking-Learning-Detection

  • Autoři: Kálal, Z., Mikolajczyk, K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2012, 34(7), 1409-1422. ISSN 0162-8828.
  • Rok: 2012
  • DOI: 10.1109/TPAMI.2011.239
  • Odkaz: https://doi.org/10.1109/TPAMI.2011.239
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of "experts": 1) P-expert estimates missed detections, and 2) N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.

Ultra-fast tracking based on zero-shift points

  • Autoři: Dupač, J., prof. Ing. Jiří Matas, Ph.D., Naiser, F.
  • Publikace: Image and Vision Computing. 2012, 30(12), 1016-1031. ISSN 0262-8856.
  • Rok: 2012
  • DOI: 10.1016/j.imavis.2012.08.015
  • Odkaz: https://doi.org/10.1016/j.imavis.2012.08.015
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel tracker of so-called zero-shift points (ZSPs) is presented. ZSPs are points where a dot product with a single period of a sinusoidal wave, both in horizontal and vertical directions, is equal to zero. Very efficient tracking and localization of ZSPs is possible as a consequence of the existence of the field of 2D shift vectors pointing toward them. A single point is tracked on average in less than 10 µs on a standard notebook. When organized in a Multi-scale Flock (MSF), the ZSPs become the core of a robust, fast and accurate tracker. We demonstrated the applicability of the combination of MSF-ZSP with RANSAC-based homography estimation on standard sequences, reporting good tracking results.

A method for text localization and recognition in real-world images

  • Autoři: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ACCV 2010: Proceedings of the 10th Asian Conference on Computer Vision, Part III. Heidelberg: Springer, 2011. p. 770-783. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-19317-0.
  • Rok: 2011
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A method for affine rectification of a plane exploiting knowledge of relative scale changes is presented. The rectifying transformation is fully specified by the relative scale change at three non-collinear points or by two pairs of points where the relative scale change is known; the relative scale change between the pairs is not required. The method also allows homography estimation between two views of a planar scene from three point-with-scale correspondences. The proposed method is simple to implement and without parameters; linear and thus supporting (algebraic) least squares solutions; and general, without restrictions on either the shape of the corresponding features or their mutual position. The wide applicability of the method is demonstrated on text rectification, detection of repetitive patterns, texture normalization and estimation of homography from three point-with-scale correspondences.

Detection and matching of curvilinear structures

  • Autoři: Lemaitre, C., Perďoch, M., Rahmoune, A., prof. Ing. Jiří Matas, Ph.D., Miteran, J.
  • Publikace: Pattern recognition. 2011, 44(7), 1514-1527. ISSN 0031-3203.
  • Rok: 2011
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose an approach to curvilinear and wiry object detection and matching based on a new curvilinear region detector (CRD) and a shape context-like descriptor (COH). Standard methods for local patch detection and description are not directly applicable to wiry objects and curvilinear structures, such as roads, railroads and rivers, since elliptical patches around features cover only a small part of the object. The detection process is first evaluated in terms of segmentation quality of curvilinear regions. The repeatability of the detection is then assessed using the protocol introduced in Mikolajczyk et al. Experiments show that the CRD is at least as robust to several changes in image acquisition conditions (viewpoint, scale, illumination, compression, blur) as the commonly used affine-covariant detectors. The paper also introduces an image collection containing wiry objects and curvilinear structures (the W-CS dataset).

Estimating hidden parameters for text localization and recognition

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A new method for text line formation for text localization and recognition is proposed. The method exhaustively enumerates short sequences of character regions in order to infer values of hidden text line parameters (such as text direction) and applies the parameters to efficiently limit the search space for longer sequences. The exhaustive enumeration of short sequences is achieved by finding all character region triplets that fulfill constraints of textual content, which keeps the proposed method efficient yet still capable of performing a robust estimation of the hidden parameters in order to correctly initialize the search. The method is applied to character regions which are detected as Maximally Stable Extremal Regions (MSERs). The performance of the method is evaluated on the standard ICDAR 2003 dataset, where the method outperforms (precision 0.60, recall 0.60) a previously published method for text line formation of MSERs.

Learning Linear Discriminant Projections for Dimensionality Reduction of Image Descriptors

  • Autoři: Cai, H., Mikolajczyk, K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011, 33(2), 338-352. ISSN 0162-8828.
  • Rok: 2011
  • DOI: 10.1109/TPAMI.2010.89
  • Odkaz: https://doi.org/10.1109/TPAMI.2010.89
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In this paper, we present Linear Discriminant Projections (LDP) for reducing dimensionality and improving discriminability of local image descriptors. We place LDP into the context of state-of-the-art discriminant projections and analyze its properties. LDP requires a large set of training data with point-to-point correspondence ground truth. We demonstrate that training data produced by a simulation of image transformations leads to nearly the same results as the real data with correspondence ground truth. This makes it possible to apply LDP as well as other discriminant projection approaches to the problems where the correspondence ground truth is not available, such as image categorization. We perform an extensive experimental evaluation on standard data sets in the context of image matching and categorization. We demonstrate that LDP enables significant dimensionality reduction of local descriptors and performance increases in different applications. The results improve upon the state-of-the-art.

Linear Regression and Adaptive Appearance Models for Fast Simultaneous Modelling and Tracking

  • Autoři: Ellis, L., Dowson, N., prof. Ing. Jiří Matas, Ph.D., Bowden, R.
  • Publikace: International Journal of Computer Vision. 2011, 95(2), 154-179. ISSN 0920-5691.
  • Rok: 2011
  • DOI: 10.1007/s11263-010-0364-4
  • Odkaz: https://doi.org/10.1007/s11263-010-0364-4
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose an approach to tracking by regression that uses no hard-coded models and no offline learning stage. The Linear Predictor (LP) tracker has been shown to be highly computationally efficient, resulting in fast tracking. Regression tracking techniques tend to require offline learning to learn suitable regression functions. We remove offline learning and therefore increase the applicability of the technique. The online-LP tracker can simply be seeded with an initial target location, akin to the ubiquitous Lucas-Kanade algorithm that tracks by registering an image template via minimisation. The issue is the representation of the target appearance and how this representation is able to adapt to changes in target appearance over time. We propose two methods, LP-SMAT and LP-MED, which demonstrate the ability to adapt to large appearance variations by incrementally building an appearance model.

Planar Affine Rectification from Change of Scale

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A method for affine rectification of a plane exploiting knowledge of relative scale changes is presented. The rectifying transformation is fully specified by the relative scale change at three non-collinear points or by two pairs of points where the relative scale change is known; the relative scale change between the pairs is not required. The method also allows homography estimation between two views of a planar scene from three point-with-scale correspondences. The proposed method is simple to implement and without parameters; linear and thus supporting (algebraic) least squares solutions; and general, without restrictions on either the shape of the corresponding features or their mutual position. The wide applicability of the method is demonstrated on text rectification, detection of repetitive patterns, texture normalization and estimation of homography from three point-with-scale correspondences.

Robustifying the Flock of Trackers

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The paper presents contributions to the design of the Flock of Trackers (FoT). The FoT trackers estimate the pose of the tracked object by robustly combining displacement estimates from local trackers that cover the object. The first contribution, called the Cell FoT, allows local trackers to drift to points good to track. The Cell FoT was compared with the Kalal et al. Grid FoT [4] and outperformed it on all sequences but one and for all local failure prediction methods. As a second contribution, we introduce two new predictors of local tracker failure - the neighbourhood consistency predictor (Nh) and the Markov predictor (Mp) - and show that the new predictors combined with the NCC predictor are more powerful than the Kalal et al. [4] predictor based on NCC and FB. The resulting tracker equipped with the new predictors combined with the NCC predictor was compared with state-of-the-art tracking algorithms and surpassed them in terms of the number of sequences where a given tracking method performed best.

Text Localization in Real-World Images Using Efficiently Pruned Exhaustive Search

  • DOI: 10.1109/ICDAR.2011.144
  • Odkaz: https://doi.org/10.1109/ICDAR.2011.144
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An efficient method for text localization and recognition in real-world images is proposed. Thanks to effective pruning, it is able to exhaustively search the space of all character sequences in real time (200ms on a 640x480 image). The method exploits higher-order properties of text such as word text lines. We demonstrate that the grouping stage plays a key role in the text localization performance and that a robust and precise grouping stage is able to compensate errors of the character detector. The method includes a novel selector of Maximally Stable Extremal Regions (MSER) which exploits region topology. Experimental validation shows that 95.7% characters in the ICDAR dataset are detected using the novel selector of MSERs with a low sensitivity threshold. The proposed method was evaluated on the standard ICDAR 2003 dataset where it achieved state-of-the-art results in both text localization and recognition.

Total Recall II: Query Expansion Revisited

  • Autoři: prof. Mgr. Ondřej Chum, Ph.D., Mikulík, A., Perďoch, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CVPR 2011: Proceedings of the 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2011. pp. 889-896. IEEE Conference on Computer Vision and Pattern Recognition. ISSN 1063-6919. ISBN 978-1-4577-0393-5.
  • Rok: 2011
  • DOI: 10.1109/CVPR.2011.5995601
  • Odkaz: https://doi.org/10.1109/CVPR.2011.5995601
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Most effective particular object and image retrieval approaches are based on the bag-of-words (BoW) model. All state-of-the-art retrieval results have been achieved by methods that include a query expansion that brings a significant boost in performance. We introduce three modifications to automatic query expansion: (i) a method capable of preventing query expansion failure caused by the presence of confusers, (ii) an improved spatial verification and re-ranking step that incrementally builds a statistical model of the query object and (iii) we learn relevant spatial context to boost retrieval performance. The three improvements of query expansion were evaluated on established Paris and Oxford datasets according to a standard protocol, and state-of-the-art results were achieved.

Ultra-fast tracking based on zero-shift points

  • Autoři: Dupač, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ICASSP'11: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2011, pp. 1429-1432. ISSN 1520-6149. ISBN 978-1-4577-0539-7.
  • Rok: 2011
  • DOI: 10.1109/ICASSP.2011.5946682
  • Odkaz: https://doi.org/10.1109/ICASSP.2011.5946682
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel tracker based on points where the intensity function is locally even is presented. Tracking of these so-called zero-shift points (ZSPs) is very efficient; a single point is tracked on average in less than 10 microseconds on a standard notebook. We demonstrate experimentally the robustness of the tracker to image transformations and a relatively long lifetime of ZSPs in real video sequences.

A Voting Strategy for Visual Ego-Motion from Stereo

  • DOI: 10.1109/IVS.2010.5548093
  • Odkaz: https://doi.org/10.1109/IVS.2010.5548093
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We present a procedure for egomotion estimation from visual input of a stereo pair of video cameras. The 3D egomotion problem, which has six degrees of freedom in general, is simplified to four dimensions and further decomposed into two two-dimensional subproblems. The decomposition allows us to use a voting strategy to identify the most probable solution, avoiding random sampling (RANSAC) or other approximation techniques. The input consists of image correspondences between consecutive stereo pairs, i.e. feature points do not need to be tracked over time. The experiments show that even if a trajectory is put together as a simple concatenation of frame-to-frame increments, it comes out reliable and precise.

Construction of Precise Local Affine Frames

  • Autoři: Mikulík, A., prof. Ing. Jiří Matas, Ph.D., Perďoch, M., prof. Mgr. Ondřej Chum, Ph.D.,
  • Publikace: ICPR'2010: Proceedings of the 20th International Conference on Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2010, pp. 3565-3569. ISSN 1051-4651. ISBN 978-0-7695-4109-9.
  • Rok: 2010
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a novel method for the refinement of Maximally Stable Extremal Region (MSER) boundaries to sub-pixel precision by taking into account the intensity function in the 2x2 neighborhood of the contour points. The proposed method improves the repeatability and precision of Local Affine Frames (LAFs) constructed on extremal regions. Additionally, we propose a novel method for detection of local curvature extrema on the refined contour. Experimental evaluation on publicly available datasets shows that matching with the modified LAFs leads to a higher number of correspondences and a higher inlier ratio in more than 80% of the test image pairs. Since the processing time of the contour refinement is negligible, there is no reason not to include the algorithms as a standard part of the MSER detector and LAF constructions.

Efficient Sequential Correspondence Selection by Cosegmentation

  • DOI: 10.1109/TPAMI.2009.176
  • Odkaz: https://doi.org/10.1109/TPAMI.2009.176
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In object recognition and wide-baseline stereo methods, correspondences of interest points (distinguished regions) are commonly established by matching compact descriptors such as SIFTs. We show that a subsequent cosegmentation process coupled with a quasi-optimal sequential decision process leads to a correspondence verification procedure that (i) has high precision, (ii) has good recall and (iii) is fast. The sequential decision on the correctness of a correspondence is based on simple statistics of a modified dense stereo matching algorithm. The statistics are projected on a prominent discriminative direction by SVM. Wald's sequential probability ratio test is performed on the SVM projection computed on progressively larger cosegmented regions. We show experimentally that the proposed Sequential Correspondence Verification (SCV) algorithm significantly outperforms the correspondence selection method based on SIFT distance ratios.

Face-TLD: Tracking-Learning-Detection Applied to Faces

  • Autoři: Kálal, Z., Mikolajczyk, K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 17th IEEE International Conference on Image Processing (ICIP 2010). New Jersey: IEEE Signal Processing Society, 2010. pp. 3789-3792. ISSN 1522-4880. ISBN 978-1-4244-7994-8.
  • Rok: 2010
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel system for long-term tracking of a human face in unconstrained videos is built on the Tracking-Learning-Detection (TLD) approach. The system extends TLD with the concept of a generic detector and a validator which is designed for real-time face tracking resistant to occlusions and appearance changes. The off-line trained detector localizes frontal faces and the online trained validator decides which faces correspond to the tracked subject. Several strategies for building the validator during tracking are quantitatively evaluated. The system is validated on a sitcom episode (23 min.) and a surveillance (8 min.) video. In both cases the system detects and tracks the face and automatically learns a multi-view model from a single frontal example and an unlabeled video.

Forward-Backward Error: Automatic Detection of Tracking Failures

  • Autoři: Kálal, Z., Mikolajczyk, K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ICPR'2010: Proceedings of the 20th International Conference on Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2010, pp. 2756-2760. ISSN 1051-4651. ISBN 978-0-7695-4109-9.
  • Rok: 2010
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper proposes a novel method for tracking failure detection. The detection is based on the Forward-Backward error: tracking is performed forward and backward in time, and the discrepancies between these two trajectories are measured. We demonstrate that the proposed error enables reliable detection of tracking failures and selection of reliable trajectories in video sequences. We demonstrate that the approach is complementary to the commonly used normalized cross-correlation (NCC). Based on the error, we propose a novel object tracker called Median Flow. State-of-the-art performance is achieved on challenging benchmark video sequences which include non-rigid objects.
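The forward-backward check above is independent of any particular point tracker. The sketch below is an illustration, not the paper's implementation: `track` is a hypothetical tracker interface (in practice, e.g. a Lucas-Kanade tracker), and the toy drifting tracker only simulates how an inconsistent trajectory produces a large error.

```python
def fb_error(track, point, n_frames):
    """Forward-Backward error: track `point` forward for n_frames,
    then track the result backward; the error is the Euclidean
    distance between the original point and the back-tracked point.
    `track(point, direction, n_frames)` is a hypothetical interface,
    direction = +1 (forward) or -1 (backward)."""
    fwd = track(point, +1, n_frames)
    back = track(fwd, -1, n_frames)
    return ((point[0] - back[0]) ** 2 + (point[1] - back[1]) ** 2) ** 0.5


def make_toy_tracker(drift):
    """Toy tracker: true motion is +2 px/frame in x; `drift` adds a
    forward-only bias that simulates tracker failure."""
    def track(point, direction, n_frames):
        x, y = point
        for _ in range(n_frames):
            x += direction * 2.0      # true motion
            if direction > 0:
                x += drift            # drift only on the forward pass
        return (x, y)
    return track


consistent = make_toy_tracker(drift=0.0)
drifting = make_toy_tracker(drift=1.0)
print(fb_error(consistent, (0.0, 0.0), 5))  # 0.0 -> trajectory is reliable
print(fb_error(drifting, (0.0, 0.0), 5))    # 5.0 -> tracking failure
```

A reliable tracker retraces its own trajectory, so the error stays near zero; thresholding it (or keeping the 50% of points with the smallest error, as in Median Flow) filters out failed tracks.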

Image Matching and Retrieval by Repetitive Patterns

  • Autoři: Doubek, P., prof. Ing. Jiří Matas, Ph.D., Perďoch, M., prof. Mgr. Ondřej Chum, Ph.D.,
  • Publikace: ICPR'2010: Proceedings of the 20th International Conference on Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2010, pp. 3195-3198. ISSN 1051-4651. ISBN 978-0-7695-4109-9.
  • Rok: 2010
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Detection of repetitive patterns in images has been studied for a long time in computer vision. This paper discusses a method for representing a lattice or line pattern by a shift-invariant descriptor of the repeating element. The descriptor overcomes shift ambiguity and can be matched between different views. The pattern matching is then demonstrated in a retrieval experiment, where different images of the same buildings are retrieved solely by their repetitive patterns.

Large Scale Discovery of Spatially Related Images

  • DOI: 10.1109/TPAMI.2009.166
  • Odkaz: https://doi.org/10.1109/TPAMI.2009.166
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a randomized data mining method that finds clusters of spatially overlapping images. The core of the method relies on the min-Hash algorithm for fast detection of pairs of images with spatial overlap, the so-called cluster seeds. The seeds are then used as visual queries to obtain clusters which are formed as transitive closures of sets of partially overlapping images that include the seed. We show that the probability of finding a seed for an image cluster rapidly increases with the size of the cluster.
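The min-Hash step above exploits the fact that the probability of two sets producing the same minimal hash value equals their Jaccard similarity, so overlapping image pairs collide often enough to be found without exhaustive comparison. The following is a minimal sketch of that property, with hypothetical linear hash functions (not the paper's configuration):

```python
import random

def make_hashers(n, seed=0, prime=2_147_483_647):
    """n random linear hash functions h(x) = (a*x + b) mod prime."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(n)]
    return [lambda x, a=a, b=b: (a * x + b) % prime for a, b in params]

def min_hash(visual_words, hashers):
    """min-Hash signature: the minimal hash of the set under each function."""
    return [min(h(w) for w in visual_words) for h in hashers]

def estimated_jaccard(sig1, sig2):
    """P(min-hashes collide) equals the Jaccard similarity of the sets."""
    return sum(a == b for a, b in zip(sig1, sig2)) / len(sig1)

# Two "images" as sets of visual-word ids sharing a third of their words:
img_a = set(range(0, 100))
img_b = set(range(50, 150))      # true Jaccard = 50 / 150 = 1/3
hashers = make_hashers(300)
est = estimated_jaccard(min_hash(img_a, hashers), min_hash(img_b, hashers))
print(round(est, 2))             # close to 0.33
```

In the clustering method, a few signature coordinates grouped into sketches serve as hash-table keys, so candidate seed pairs are discovered in roughly linear time.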

Learning a Fine Vocabulary

  • Autoři: Mikulík, A., Perďoch, M., prof. Mgr. Ondřej Chum, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Computer Vision - ECCV 2010, 11th European Conference on Computer Vision, Proceedings, Part III. Heidelberg: Springer, 2010. pp. 1-14. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-15557-4.
  • Rok: 2010
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We present a novel similarity measure for bag-of-words type large scale image retrieval. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method and is more discriminative than both L2-based soft assignment and Hamming embedding. Experimentally we show that the novel similarity function achieves mean average precision that is superior to any result published in the literature on the standard Oxford 105k dataset/protocol. At the same time, retrieval with the proposed similarity function is faster than the reference method.

P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints

  • Autoři: Kálal, Z., prof. Ing. Jiří Matas, Ph.D., Mikolajczyk, K.
  • Publikace: CVPR 2010: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Madison: Omnipress, 2010. pp. 49-56. ISSN 1063-6919. ISBN 978-1-4244-6984-0.
  • Rok: 2010
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper shows that the performance of a binary classifier can be significantly improved by the processing of structured unlabeled data, i.e. data are structured if knowing the label of one example restricts the labeling of the others. We propose a novel paradigm for training a binary classifier from labeled and unlabeled examples that we call P-N learning. The learning process is guided by positive (P) and negative (N) constraints which restrict the labeling of the unlabeled set. P-N learning evaluates the classifier on the unlabeled data, identifies examples that have been classified in contradiction with the structural constraints, and augments the training set with the corrected samples in an iterative process. We propose a theory that formulates the conditions under which P-N learning guarantees improvement of the initial classifier and validate it on synthetic and real data. P-N learning is applied to the problem of on-line learning of an object detector during tracking.

Tracking the Invisible: Learning Where the Object Might be

  • Autoři: Grabner, H., prof. Ing. Jiří Matas, Ph.D., Van Gool, L., Cattin, P.
  • Publikace: CVPR 2010: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Madison: Omnipress, 2010. pp. 1285-1292. ISSN 1063-6919. ISBN 978-1-4244-6984-0.
  • Rok: 2010
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Objects are usually embedded in context. Visual context has been successfully used in object detection tasks; however, it is often ignored in object tracking. We propose a method to learn supporters which are, even if only temporarily, useful for determining the position of the object of interest. Our approach exploits the General Hough Transform strategy. It couples the supporters with the target and naturally distinguishes between strongly and weakly coupled motions. Thereby, the position of an object can be estimated even when it is not seen directly (e.g., fully occluded or outside of the image region) or when it changes its appearance quickly and significantly. Experiments show substantial improvements in model-free tracking as well as in the tracking of virtual points, e.g., in medical applications.

Unsupervised Discovery of Co-occurrence in Sparse High Dimensional Data

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An efficient min-Hash based algorithm for discovery of dependencies in sparse high-dimensional data is presented. The dependencies are represented by sets of features co-occurring with high probability and are called co-ocsets. Sparse high-dimensional descriptors, such as bags of words, have proven very effective in the domain of image retrieval. To maintain high efficiency even for very large data collections, features are assumed independent. We show experimentally that co-ocsets are not rare, i.e. the independence assumption is often violated, and that they may ruin retrieval performance if present in the query image. Two methods for managing co-ocsets in such cases are proposed. Both methods significantly outperform the state-of-the-art in image retrieval, and one is also significantly faster.

Anytime learning for the NoSLLiP tracker

  • DOI: 10.1016/j.physletb.2003.10.07
  • Odkaz: https://doi.org/10.1016/j.physletb.2003.10.07
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Anytime learning for the Sequence of Learned Linear Predictors (SLLiP) tracker is proposed. Since learning might be time-consuming for large problems, we present an anytime learning algorithm which, after a very short initialization period, provides a solution with defined precision. As SLLiP tracking requires only a fraction of the processing power of an ordinary PC, the learning can continue in a parallel background thread, continuously delivering improved SLLiPs, i.e. faster ones with lower computational complexity and the same pre-defined precision. The proposed approach is verified on publicly available sequences with approximately 12000 ground-truthed frames. The learning time is shown to be twenty times smaller than learning based on linear programming proposed in the paper that introduced the SLLiP tracker [TR]. Its robustness and accuracy are similar. Superiority in frame-rate and robustness with respect to the SIFT detector, the Lucas-Kanade tracker and Jurie's tracker is also demonstrated.

Efficient Representation of Local Geometry for Large Scale Object Retrieval

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    State-of-the-art methods for image and object retrieval exploit both appearance (via visual words) and local geometry (spatial extent, relative pose). In large scale problems, memory becomes a limiting factor: local geometry is stored for each feature detected in each image and requires storage larger than the inverted file and the term frequency and inverted document frequency weights together. We propose a novel method for learning a discretized local geometry representation based on minimization of average reprojection error in the space of ellipses. The representation requires only 24 bits per feature without a drop in performance. Additionally, we show that if the gravity vector assumption is used consistently from the feature description to spatial verification, it improves retrieval performance and decreases the memory footprint. The proposed method outperforms state-of-the-art retrieval algorithms in a standard image retrieval benchmark.

Geometric min-Hashing: Finding a (thick) needle in a haystack

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a novel hashing scheme for image retrieval, clustering and automatic object discovery. Unlike commonly used bag-of-words approaches, the spatial extent of image features is exploited in our method. The geometric information is used both to construct repeatable hash keys and to increase the discriminability of the description. Each hash key combines visual appearance (visual words) with semi-local geometric information. Compared with the state-of-the-art min-hash, the proposed method has both higher recall (probability of collision for hashes on the same object) and lower false positive rates (random collisions). The advantages of the geometric min-hashing approach are most pronounced in the presence of viewpoint and scale change, significant occlusion or small physical overlap of the viewing fields. We demonstrate the power of the proposed method on small object discovery in a large unordered collection of images and on a large-scale image clustering problem.

Integrated vision system for the semantic interpretation of activities where a person handles objects

  • DOI: 10.1016/j.cviu.2008.10.008
  • Odkaz: https://doi.org/10.1016/j.cviu.2008.10.008
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Interpretation of human activity is known primarily from surveillance and video analysis tasks and is usually concerned with persons alone. In this paper we present an integrated system that gives a natural-language interpretation of activities in which a person handles objects. The system integrates low-level image components, such as hand and object tracking, detection and recognition, with high-level processes such as spatio-temporal object relationship generation, posture and gesture recognition, and activity reasoning. A task-oriented approach focuses processing to achieve near real-time performance and to react depending on the situation context.

Learning Fast Emulators of Binary Decision Processes

  • DOI: 10.1007/s11263-009-0229-x
  • Odkaz: https://doi.org/10.1007/s11263-009-0229-x
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We show how existing binary decision algorithms can be approximated by a fast, trained WaldBoost classifier. WaldBoost learning minimises the decision time of the classifier while guaranteeing predefined precision. The WaldBoost algorithm together with bootstrapping is able to efficiently handle an effectively unlimited number of training examples provided by the implementation of the approximated algorithm. Two interest point detectors, the Hessian-Laplace and the Kadir-Brady saliency detectors, are emulated to demonstrate the approach. Experiments show that while the repeatability and matching scores are similar for the original and emulated algorithms, a 9-fold speed-up for the Hessian-Laplace detector and a 142-fold speed-up for the Kadir-Brady detector is achieved.

Online learning of robust object detectors during unstable tracking

  • Autoři: Kálal, Z., prof. Ing. Jiří Matas, Ph.D., Mikolajczyk, K.
  • Publikace: 3rd On-line learning for Computer Vision Workshop OLCV'09 (held in conjunction with ICCV 2009). Los Alamitos: IEEE Computer Society Press, 2009. pp. 1417-1424. ISBN 978-1-4244-4441-0.
  • Rok: 2009
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This work investigates the problem of robust, long-term visual tracking of unknown objects in unconstrained environments. It therefore must cope with frame-cuts, fast camera movements and partial/total object occlusions/disappearances. We propose a new approach, called Tracking-Modeling-Detection (TMD), that closely integrates adaptive tracking with online learning of an object-specific detector. Starting from a single click in the first frame, TMD tracks the selected object with an adaptive tracker. The trajectory is observed by two processes (growing and pruning events) that robustly model the appearance and build an object detector on the fly. Both events make errors; the stability of the system is achieved by their cancellation. The learnt detector enables re-initialization of the tracker whenever a previously observed appearance reoccurs. We show that real-time learning and classification are achievable with random forests.

Rotation invariant image description with local binary pattern histogram fourier features.

  • Autoři: Ahonen, T., prof. Ing. Jiří Matas, Ph.D., He, C., Matti, P.
  • Publikace: SCIA 2009: Proceedings of the 16th Scandinavian Conference on Image Analysis. Berlin: Springer, 2009. p. 61-70. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-02229-6.
  • Rok: 2009
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In this paper, we propose Local Binary Pattern Histogram Fourier features (LBP-HF), a novel rotation-invariant image descriptor computed from discrete Fourier transforms of local binary pattern (LBP) histograms. Unlike most other histogram-based invariant texture descriptors, which normalize rotation locally, the proposed invariants are constructed globally for the whole region to be described. In addition to being rotation invariant, the LBP-HF features retain the highly discriminative nature of LBP histograms. In the experiments, it is shown that these features outperform the non-invariant and earlier rotation-invariant versions of LBP as well as the MR8 descriptor in texture classification, material categorization and face recognition tests.
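The key observation behind LBP-HF is that rotating the input image cyclically shifts the histogram of uniform LBP patterns, and the magnitudes of a discrete Fourier transform are invariant to cyclic shifts of their input. The toy histogram below is hypothetical; the sketch only demonstrates that shift-invariance property, not the full descriptor:

```python
import cmath

def dft_magnitudes(hist):
    """|DFT| of a histogram. Invariant to cyclic shifts of the input,
    which is the property LBP-HF exploits: an image rotation cyclically
    shifts the histogram of uniform LBP patterns."""
    n = len(hist)
    return [abs(sum(hist[k] * cmath.exp(-2j * cmath.pi * u * k / n)
                    for k in range(n)))
            for u in range(n)]

# A toy 8-bin LBP histogram and the same histogram after an image
# rotation (modelled as a cyclic shift by 3 bins):
hist = [12, 5, 0, 7, 3, 3, 1, 9]
shifted = hist[3:] + hist[:3]
feat_a = dft_magnitudes(hist)
feat_b = dft_magnitudes(shifted)
print(all(abs(a - b) < 1e-9 for a, b in zip(feat_a, feat_b)))  # True
```

Because the invariance is a property of the whole histogram rather than of each local pattern, discriminative information that local rotation normalization would destroy is preserved.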

Sputnik Tracker: Looking for a Companion Improves Robustness of the Tracker

  • Autoři: Cerman, L., Hlaváč, V., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: SCIA 2009: Proceedings of the 16th Scandinavian Conference on Image Analysis. Berlin: Springer, 2009, pp. 291-300. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-02229-6.
  • Rok: 2009
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Tracked objects rarely move alone. They are often temporarily accompanied by other objects undergoing similar motion. We propose a novel tracking algorithm called Sputnik (Sputnik, pronounced sput-nik in Russian, was the first Earth-orbiting satellite launched in 1957. According to Merriam-Webster dictionary, the English translation of the Russian word sputnik is a travelling companion.) Tracker. It is capable of identifying which image regions move coherently with the tracked object. This information is used to stabilize tracking in the presence of occlusions or fluctuations in the appearance of the tracked object, without the need to model its dynamics. In addition, Sputnik Tracker is based on a novel template tracker integrating foreground and background appearance cues. The time varying shape of the target is also estimated in each video frame, together with the target position. The time varying shape is used as another cue when estimating the target position in the next frame.

Tracking by an Optimal Sequence of Linear Predictors

  • DOI: 10.1109/TPAMI.2008.119
  • Odkaz: https://doi.org/10.1109/TPAMI.2008.119
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a learning approach to tracking that explicitly minimizes the computational complexity of the tracking process subject to user-defined probability of failure (loss-of-lock) and precision. The tracker is formed by a Number of Sequences of Learned Linear Predictors (NoSLLiP). Robustness of NoSLLiP is achieved by modeling the object as a collection of local motion predictors: object motion is estimated by the outlier-tolerant RANSAC algorithm from local predictions. Efficiency of the NoSLLiP tracker stems from (i) the simplicity of the local predictors and (ii) the fact that all design decisions - the number of local predictors used by the tracker, their computational complexity (i.e. the number of observations the prediction is based on) and locations, as well as the number of RANSAC iterations - are subject to the optimization (learning) process. All time-consuming operations are performed during the learning stage.

Dense Linear-Time Correspondences for Tracking

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel method is proposed for the problem of frame-to-frame correspondence search in video sequences. The method, based on hashing of low-dimensional image descriptors, establishes dense correspondences and allows large motions. All image pixels are considered for matching, and the notion of interest points is revisited: in our formulation, points of interest are those that can be reliably matched. Their saliency depends on properties of the chosen matching function and on the actual image content. Both the computational time and the memory requirements of the correspondence search are asymptotically linear in the number of image pixels, irrespective of correspondence density and of image content. All steps of the method are simple and allow a hardware implementation. Functionality is demonstrated on sequences taken from a vehicle moving in an urban environment.

Efficient Sequential Correspondence Selection by Cosegmentation

  • Autoři: Ing. Jan Čech, Ph.D., prof. Ing. Jiří Matas, Ph.D., Perďoch, M.
  • Publikace: CVPR 2008: Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Madison: Omnipress, 2008. pp. 1020-1027. ISSN 1063-6919. ISBN 978-1-4244-2242-5.
  • Rok: 2008
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In many retrieval, object recognition and wide baseline stereo methods, correspondences of interest points (distinguished regions, transformation covariant points) are established, possibly sublinearly, by matching a compact descriptor such as SIFT. We show that a subsequent cosegmentation process coupled with a quasi-optimal sequential decision process leads to a correspondence verification procedure that (i) has high precision (is highly discriminative), (ii) has good recall and (iii) is fast. The sequential decision on the correctness of a correspondence is based on simple attributes of a modified dense stereo matching algorithm. The attributes are projected on a prominent discriminative direction by SVM. Wald's sequential probability ratio test is performed on the SVM projection computed on progressively larger cosegmented regions. Experimentally we show that the process significantly outperforms the standard correspondence selection process based on SIFT distance ratios.

Learning Linear Discriminant Projections for Dimensionality Reduction of Image Descriptors

  • Autoři: Cai, H., Mikolajczyk, K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: BMVC 2008: Proceedings of the 19th British Machine Vision Conference. London: British Machine Vision Association, 2008. pp. 503-512. ISBN 978-1-901725-36-0.
  • Rok: 2008
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper proposes a general method for improving image descriptors using discriminant projections. Two methods based on Linear Discriminant Analysis have recently been introduced to improve the matching performance of local descriptors and to reduce their dimensionality. These methods require a large training set with ground-truth, accurate point-to-point correspondences, which limits their applicability. We demonstrate the theoretical equivalence of these methods and provide a means to derive projection vectors from data without available ground truth. This makes it possible to apply the technique to, and improve the performance of, any combination of interest point detectors and descriptors. We conduct an extensive evaluation of the discriminative projection methods in various application scenarios. The results validate the proposed method in viewpoint-invariant matching and category recognition.

Mobile Mapping of Vertical Traffic Infrastructure

  • Autoři: Doubek, P., Perďoch, M., prof. Ing. Jiří Matas, Ph.D., Mgr. Jan Šochman, Ph.D.,
  • Publikace: CVWW 2008: Proceedings of the 13th Computer Vision Winter Workshop. Ljubljana: Slovenian Pattern Recognition Society, 2008, pp. 115-122. ISBN 978-961-90901-4-5.
  • Rok: 2008
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In this paper, we present a method for detection and localization of vertical traffic infrastructure using video sequences recorded by a survey vehicle. A search for pole-like structures in the images creates initial 2D hypotheses. These are fused on the ground plane to form 3D hypotheses, which are finally verified and classified by searching for the distinguished part of the infrastructure. Each step is followed by pruning the set of hypotheses using an SVM classifier. The method was tested in a streetlight detection application with video sequences containing over one thousand streetlights.

Online Learning and Partitioning of Linear Displacement Predictors for Tracking

  • Autoři: Ellis, L., prof. Ing. Jiří Matas, Ph.D., Bowden, R.
  • Publikace: BMVC 2008: Proceedings of the 19th British Machine Vision Conference. London: British Machine Vision Association, 2008. pp. 33-42. ISBN 978-1-901725-36-0.
  • Rok: 2008
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel approach to learning and tracking arbitrary image features is presented. Tracking is tackled by learning the mapping from image intensity differences to displacements. Linear regression is used, resulting in low computational cost. An appearance model of the target is built on-the-fly by clustering sub-sampled image templates. The medoidshift algorithm is used to cluster the templates, thus identifying various modes or aspects of the target appearance; each mode is associated with the most suitable set of linear predictors, allowing piecewise linear regression from image intensity differences to warp updates. Despite no hard-coding or offline learning, excellent results are shown on three publicly available video sequences, and comparisons with related approaches are made.

Optimal Randomized RANSAC

  • DOI: 10.1109/TPAMI.2007.70787
  • Odkaz: https://doi.org/10.1109/TPAMI.2007.70787
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A randomized model verification strategy for RANSAC is presented. The proposed method finds, like RANSAC, a solution that is optimal with user-specified probability. The solution is found in time that is close to the shortest possible and superior to any deterministic verification strategy. A provably fastest model verification strategy is designed for the (theoretical) situation when the contamination of data by outliers is known. In this case, the algorithm is the fastest possible (on average) of all randomized RANSAC algorithms guaranteeing a given confidence in the solution. The derivation of the optimality property is based on Wald's theory of sequential decision making, in particular a modified sequential probability ratio test (SPRT). Next, the R-RANSAC with SPRT algorithm is introduced. The algorithm removes the requirement for a priori knowledge of the fraction of outliers and estimates the quantity online.
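The SPRT verification idea can be sketched as follows. This is an illustration, not the paper's implementation: points are tested sequentially against a candidate model, the likelihood ratio of the "bad model" hypothesis (inlier probability delta) versus the "good model" hypothesis (inlier probability epsilon > delta) is updated per point, and the model is rejected early once the ratio exceeds a threshold A. The parameter values below are illustrative, not taken from the paper.

```python
def sprt_verify(point_is_inlier, epsilon, delta, A):
    """Wald's SPRT model verification (sketch).
    epsilon: assumed inlier fraction under a good model,
    delta:   inlier fraction under a bad (random) model,
    A:       decision threshold controlling the error of rejecting
             a good model.  Returns True if the model survives."""
    ratio = 1.0
    for inlier in point_is_inlier:
        if inlier:
            ratio *= delta / epsilon              # inliers favour "good"
        else:
            ratio *= (1 - delta) / (1 - epsilon)  # outliers favour "bad"
        if ratio > A:
            return False                          # rejected early
    return True

# Illustrative parameters: a good model explains half the data,
# a random model only 5%.
eps, dlt, A = 0.5, 0.05, 100.0
print(sprt_verify([True] * 20, eps, dlt, A))    # True: consistent model
print(sprt_verify([False] * 12, eps, dlt, A))   # False: rejected early
```

The speed-up comes from the early exit: bad hypotheses, which dominate in RANSAC, are discarded after only a handful of point evaluations instead of being scored against the whole data set.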

Simultaneous learning of motion and appearance

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A new learning method for motion estimation of objects with significantly varying appearance is proposed. Varying object appearance is represented by a low dimensional space of appearance parameters. The appearance mapping and motion estimation method are optimized simultaneously. Appearance parameters are estimated by unsupervised learning. The method is experimentally verified by a tracking application on sequences which exhibit strong variable illumination, non-rigid deformations and self-occlusions.

Training Sequential On-line Boosting Classifier for Visual Tracking

  • Autoři: Grabner, H., Mgr. Jan Šochman, Ph.D., Bischof, H., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ICPR 2008: Proceedings of the 19th International Conference on Pattern Recognition. Madison: Omnipress, 2008. pp. 1360-1363. ISSN 1051-4651. ISBN 978-1-4244-2174-9.
  • Rok: 2008
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    On-line boosting makes it possible to adapt a trained classifier to changing environmental conditions or to use sequentially available training data. Yet, two important problems in on-line boosting training remain unsolved: (i) classifier evaluation speed optimization and (ii) automatic classifier complexity estimation. In this paper we show how on-line boosting can be combined with Wald's sequential decision theory to solve both of these problems. The properties of the proposed on-line WaldBoost algorithm are demonstrated on a visual tracking problem. The complexity of the classifier changes dynamically depending on the difficulty of the problem. On average, a speedup of a factor of 5-10 is achieved compared to non-sequential on-line boosting.

Wald's Sequential Analysis for Time-constrained Vision Problems

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In detection and matching problems in computer vision, both classification errors and time to decision characterize the quality of an algorithmic solution. It is shown how to formalize such problems in the framework of sequential decision-making and how to derive quasi-optimal time-constrained solutions for three vision problems. The methodology is applied to face and interest point detection and to the RANSAC robust estimator. Error rates of the proposed face detection algorithm are comparable to the state-of-the-art methods. In the interest point application, the output of the Hessian-Laplace detector [Mikolajczyk-IJCV04] is approximated by a sequential WaldBoost classifier which is about five times faster than the original with comparable repeatability. A sequential strategy based on Wald's SPRT for evaluation of model quality in RANSAC leads to significant speed-ups in geometric matching problems.

Weighted Sampling for Large-Scale Boosting

  • Autoři: Kálal, Z., prof. Ing. Jiří Matas, Ph.D., Mikolajczyk, K.
  • Publikace: BMVC 2008: Proceedings of the 19th British Machine Vision Conference. London: British Machine Vision Association, 2008. pp. 413-422. ISBN 978-1-901725-36-0.
  • Rok: 2008
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper addresses the problem of learning from very large databases where batch learning is impractical or even infeasible. Bootstrapping is a popular technique applicable in such situations. We show that the sampling strategy used for bootstrapping has a significant impact on the resulting classifier performance. We design a new general sampling strategy, quasi-random weighted sampling + trimming (QWS+), that includes well-established strategies as special cases. The QWS+ approach minimizes the variance of the hypothesis error estimate and leads to significant improvement in performance compared to standard sampling techniques. The superior performance is demonstrated on several problems, including profile and frontal face detection.

Adaptive parameter optimization for real-time tracking

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Adaptation of a tracking procedure, combined in a common way with a Kalman filter, is formulated as a constrained optimization problem, where a trade-off between precision and loss-of-lock probability is explicitly taken into account. While the tracker is learned to minimize computational complexity during the learning stage, in the tracking stage the precision is maximized online under a constraint imposed by the loss-of-lock probability, resulting in an optimal setting of the tracking procedure. We experimentally show that the proposed method converges to a steady solution in all variables. In contrast to common Kalman filter based tracking, we achieve a significantly lower state covariance matrix. We also show that if the covariance matrix is continuously updated, the method is able to adapt to different situations. If the dynamic model is precise enough, the tracker is allowed to spend a longer time on fine motion estimation.

Definition of a model-based detector of curvilinear regions

  • Autoři: Lemaitre, C., Mitéran, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: CAIP 2007: Proceedings of the 12th International Conference on Computer Analysis of Images and Patterns. Berlin: Springer, 2007. pp. 686-693. ISBN 978-3-540-74271-5.
  • Rok: 2007
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A new approach for detection of curvilinear regions is described.

Efficient Symmetry Detection Using Local Affine Frames

  • Autoři: Cornelius, H., Perďoch, M., prof. Ing. Jiří Matas, Ph.D., Loy, G.
  • Publikace: SCIA 2007: Proceedings of 15th Scandinavian Conference on Image Analysis. Heidelberg: Springer, 2007. pp. 152-161. ISSN 0302-9743. ISBN 978-3-540-73039-2.
  • Rok: 2007
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An efficient method for detecting planar bilateral symmetries under perspective projection is presented.

Improving SIFT for Fast Tree Matching by Optimal Linear Projection

  • Autoři: Mikolajczyk, K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: ICCV 2007: Proceedings of Eleventh IEEE International Conference on Computer Vision. Madison: Omnipress, 2007, ISSN 1550-5499. ISBN 978-1-4244-1631-8.
  • Rok: 2007
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose to transform an image descriptor so that nearest neighbor (NN) search for correspondences becomes the optimal matching strategy under the assumption that inter-image deviations of corresponding descriptors have a Gaussian distribution. The Euclidean NN in the transformed domain corresponds to the NN according to a truncated Mahalanobis metric in the original descriptor space. We provide theoretical justification for the proposed approach and show experimentally that the transformation allows a significant dimensionality reduction and improves the matching performance of the state-of-the-art SIFT descriptor. We observe consistent improvement in precision-recall and in the speed of fast matching in tree structures, at the expense of little overhead for projecting the descriptors into the transformed space. In the context of the SIFT vs. transformed MSIFT comparison, tree search structures are evaluated according to different criteria and query types.
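The equivalence between Euclidean NN in the transformed space and Mahalanobis NN in the original space rests on a whitening transform W = Λ^(-1/2) Uᵀ built from the eigendecomposition of the deviation covariance, since then WᵀW = C⁻¹. The sketch below verifies this identity numerically on a hypothetical 8-dimensional covariance (in the paper the covariance is estimated from matched SIFT pairs, and dimensionality reduction keeps only the leading rows of W):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance of inter-image descriptor deviations.
M = rng.normal(size=(8, 8))
cov = M @ M.T + 8 * np.eye(8)          # symmetric positive definite

# Whitening transform W = Lambda^{-1/2} U^T from the eigendecomposition,
# so that W^T W = cov^{-1}.
eigvals, U = np.linalg.eigh(cov)
W = np.diag(eigvals ** -0.5) @ U.T

x, y = rng.normal(size=8), rng.normal(size=8)
d = x - y
mahalanobis = np.sqrt(d @ np.linalg.inv(cov) @ d)
euclid_transformed = np.linalg.norm(W @ d)
print(np.isclose(mahalanobis, euclid_transformed))  # True
```

Because the transform is applied once per descriptor, any Euclidean structure (kd-trees, vocabulary trees) can then be reused unchanged while effectively matching under the Mahalanobis metric.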

Learning A Fast Emulator of a Binary Decision Process

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Computation time is an important performance characteristic of computer vision algorithms. This paper shows how existing (slow) binary-valued decision algorithms can be approximated by a trained WaldBoost classifier, which minimises the decision time while guaranteeing predefined approximation precision. The core idea is to take an existing algorithm as a black box performing some useful binary decision task and to train the WaldBoost classifier as its emulator. Two interest point detectors, Hessian-Laplace and Kadir-Brady saliency detector, are emulated to demonstrate the approach. The experiments show similar repeatability and matching score of the original and emulated algorithms while achieving a 70-fold speed-up for Kadir-Brady detector.

Linear Predictors for Fast Simultaneous Modeling and Tracking

  • Autoři: Ellis, L., prof. Ing. Jiří Matas, Ph.D., Dowson, N., Bowden, R.
  • Publikace: NRTL 2007: Proceedings of workshop on Non-rigid registration and tracking through learning - ICCV. Madison: Omnipress, 2007. ISSN 1550-5499. ISBN 978-1-4244-1631-8.
  • Rok: 2007
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An approach for fast tracking of arbitrary image features with no prior model and no offline learning stage is presented. Fast tracking is achieved using banks of linear displacement predictors learnt online. A multi-modal appearance model is also learnt on-the-fly that facilitates the selection of subsets of predictors suitable for prediction in the next frame. The approach is demonstrated in real-time on a number of challenging video sequences and experimentally compared to other simultaneous modeling and tracking approaches with favourable results.

Stable Affine Frames on Isophotes

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a new affine-covariant feature, the Stable Affine Frame (SAF). SAFs lie on the boundary of extremal regions, i.e. on isophotes. Instead of requiring the whole isophote to be stable with respect to intensity perturbation as in maximally stable extremal regions (MSERs), stability is required only locally, for the primitives constituting the three-point frames. The primitives are extracted by an affine invariant process that exploits properties of bitangents and algebraic moments. Thus, instead of using closed stable isophotes, i.e. MSERs, and detecting affine frames on them, SAFs are sought even on some unstable extremal regions. We show experimentally on standard datasets that SAFs have repeatability comparable to the best affine covariant detectors and consistently produce a significantly higher number of features per image. Moreover, the features cover images more evenly than MSERs, which facilitates robustness to occlusion.

Wald's Sequential Analysis for Time-constrained Vision Problems

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In detection and matching problems in computer vision, both classification errors and time to decision characterize the quality of an algorithmic solution. We show how to formalize such problems in the framework of sequential decision-making and derive quasi-optimal time-constrained solutions for three vision problems. The methodology is applied to face and interest point detection and to the RANSAC robust estimator. Error rates of the proposed face detection algorithm are comparable to state-of-the-art methods. In the interest point application, the output of the Hessian-Laplace detector [Mikolajczyk-IJCV04] is approximated by a sequential WaldBoost classifier which is about five times faster than the original with comparable repeatability. A sequential strategy based on Wald's SPRT for evaluation of model quality in RANSAC leads to significant speed-up in geometric matching problems.

3D Geometry from Uncalibrated Images

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We present an automatic pipeline for recovering the geometry of a 3D scene from a set of unordered, uncalibrated images. The contributions in the paper are the presentation of the system as a whole, from images to geometry, the estimation of the local scale for various scene components in the orientation-topology module, the procedure for orienting the cloud components, and the method for dealing with points of contact. The methods are aimed to process complex scenes and nonuniformly sampled, noisy data sets.

Embedded system study for real time boosting based face detection

  • Autoři: Khattab, K., Miteran, J., Dubois, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of the 32nd Annual Conference of the IEEE Industrial Electronics Society. ???: IEEE Industrial Electronic Society, 2006. p. 3461-3465. ISSN 1553-572X. ISBN 1-4244-0136-4.
  • Rok: 2006
  • DOI: 10.1109/IECON.2006.347828
  • Odkaz: https://doi.org/10.1109/IECON.2006.347828
  • Pracoviště: Skupina vizuálního rozpoznávání
  • Anotace:
    This paper describes a study of a real-time embedded face detection system. Recently, the boosting-based face detection algorithms proposed by [1, 2] have gained a lot of attention and are considered the fastest accurate face detection algorithms today. However, the embedded implementation of such algorithms in hardware is still a challenge, since these algorithms rely heavily on memory access. A sequential implementation model is built, showing its irregular processing time and detection speed. We propose a parallel implementation that exploits the parallelism and pipelining in these algorithms. This implementation proves capable of increasing the speed of the detector as well as making its processing time regular.

Epipolar Geometry from Two Correspondences

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A paper that stresses epipolar geometry estimation from two correspondences to an extreme.

Geometric Hashing with Local Affine Frames

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a novel representation of local image structure and a matching scheme that are insensitive to a wide range of appearance changes. The representation is a collection of local affine frames that are constructed on outer boundaries of maximally stable extremal regions (MSERs) in an affine-covariant way. Each local affine frame is described by a relative location of other local affine frames in its neighborhood. The image is thus represented by quantities that depend only on the location of the boundaries of MSERs. Inter-image correspondences between local affine frames are formed in constant time by geometric hashing. Direct detection of local affine frames removes the requirement of point-based hashing to establish reference frames in a combinatorial way, which has in the case of affine transform complexity that is cubic in the number of points. Local affine frames, which are also the quantities represented in the hash table, occupy a 6D space and hence data collisions

Learning Efficient Linear Predictors for Motion Estimation

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel object representation for tracking is proposed. The tracked object is represented as a constellation of spatially localised linear predictors which are learned on a single training image. In the learning stage, sets of pixels whose intensities allow for optimal least square predictions of the transformations are selected as a support of the linear predictor. The approach comprises three contributions: learning object specific linear predictors, explicitly dealing with the predictor precision - computational complexity trade-off and selecting a view-specific set of predictors suitable for global object motion estimate. Robustness to occlusion is achieved by RANSAC procedure. The learned tracker is very efficient, achieving frame rate generally higher than 30 frames per second despite the Matlab implementation.
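
The least-squares learning step can be illustrated on a 1-D toy signal: synthetic shifts of a single training signal provide (intensity difference, displacement) pairs, and the predictor is the least-squares map from differences to displacements. Illustrative code only; `learn_linear_predictor` and `predict_shift` are hypothetical names, and the real method works on 2-D images with richer transformations:

```python
import numpy as np

def learn_linear_predictor(I0, support, shifts):
    """Least-squares learning of a predictor P mapping intensity
    differences at `support` pixels to the shift t, from synthetic
    perturbations of a single training signal (1-D for brevity)."""
    d0 = I0[support]
    D = np.array([I0[support + t] - d0 for t in shifts], float)
    T = np.array(shifts, float).reshape(-1, 1)
    P, *_ = np.linalg.lstsq(D, T, rcond=None)   # min ||D P - T||
    return P

def predict_shift(P, I0, I1, support):
    """Estimate the shift between I0 and I1 from `support` intensities."""
    d = I1[support] - I0[support]
    return (d @ P).item()
```

At run time the prediction is a single dot product per predictor, which is what makes frame rates above 30 fps feasible even in interpreted code.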

Multiview 3D Tracking with an Incrementally Constructed 3D Model

  • DOI: 10.1109/3DPVT.2006.101
  • Odkaz: https://doi.org/10.1109/3DPVT.2006.101
  • Pracoviště: Katedra kybernetiky

Object Recognition using Local Affine Frames on Maximally Stable Extremal Regions

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The chapter focuses on a method exploiting local coordinate systems (local affine frames) established on maximally stable extremal regions. We provide a taxonomy of affine-covariant constructions of local coordinate systems, prove their affine covariance and present algorithmic details on their computation.

Počítačová podpora detekce "zajímavých" obrázků

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Manually searching through a large number of images is laborious; it can be made easier by computer preprocessing and by sorting the images according to their "interestingness". The operator can influence the "interestingness" of the images through the choice of features and of the coefficients of the scoring function.

A Comparison of Affine Region Detectors

  • Autoři: Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., prof. Ing. Jiří Matas, Ph.D., Schaffalitzky, F., Kadir, T., Van Gool, L.
  • Publikace: International Journal of Computer Vision. 2005, 65(7), 43-72. ISSN 0920-5691.
  • Rok: 2005
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The paper gives a snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris (Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002) and Hessian points (Mikolajczyk and Schmid, 2002), a detector of ''maximally stable extremal regions'', proposed by Matas et al. (2002); an edge-based region detector (Tuytelaars and Van Gool, 1999) and a detector based on intensity extrema (Tuytelaars and Van Gool, 2000), and a detector of ''salient regions'', proposed by Kadir, Zisserman and Brady (2004). The performance is measured against changes in viewpoint, scale, illumination, defocus and image compression.

A New Class of Learnable Detectors for Categorisation

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A new class of image-level detectors that can be adapted by machine learning techniques to detect parts of objects from a given category is proposed. A classifier (e.g. neural network or adaboost) within the detector selects a relevant subset of extremal regions, i.e. regions that are connected components of a thresholded image. Properties of extremal regions render the detector very robust to illumination change.
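
The candidate pool the classifier selects from can be sketched as connected components of a thresholded image. A minimal illustration (4-connectivity, single threshold, BFS labelling); `extremal_regions` is a hypothetical name, and the real detector considers all thresholds and then classifies each region:

```python
import numpy as np
from collections import deque

def extremal_regions(img, threshold):
    """Connected components (4-connectivity) of img <= threshold:
    the pool of extremal regions a trained classifier selects from."""
    mask = img <= threshold
    seen = np.zeros_like(mask, dtype=bool)
    H, W = mask.shape
    regions = []
    for sy in range(H):
        for sx in range(W):
            if mask[sy, sx] and not seen[sy, sx]:
                comp, queue = [], deque([(sy, sx)])
                seen[sy, sx] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                regions.append(comp)
    return regions
```

Because thresholding commutes with monotonic intensity changes, the set of regions produced this way is stable under illumination change, which is the robustness property the abstract refers to.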

Effective Use of Pattern Recognition Method for Composition of Structure Microphotographs

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Detailed knowledge of material structure is necessary in advanced applications. Commercial systems use systematic scanning of small patches of structure cuts. Since an exact geometric and photometric relation of the scanned images is not known, composition of the overall image is non-trivial. Two independent methods exploiting image overlaps for precise registration are proposed and evaluated. The first method is based on robust matching of maximally stable extremal regions, the second one compares image columns. Both methods show comparable performance.

Feature-Based Affine-Invariant Localization of Faces

  • Autoři: Hamouz, M., Kittler, J., Kamarainen, J., Paalanen, P., Kalviainen, H., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005, 27(9), 1490-1495. ISSN 0162-8828.
  • Rok: 2005
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We present a novel method for localizing faces in person identification scenarios. Such scenarios involve high resolution images of frontal faces. The proposed algorithm does not require color, copes well in cluttered backgrounds, and accurately localizes faces including eye centers. An extensive analysis and a performance evaluation on the XM2VTS database and on the realistic BioID and BANCA face databases is presented. We show that the algorithm has precision superior to reference methods.

Hardware Implementation of a Discrete AdaBoost Based Decision Rule

  • Autoři: Mitéran, J., prof. Ing. Jiří Matas, Ph.D., Bourennane, E., Paindavoine, M., Dubois, J.
  • Publikace: Journal on Applied Signal Processing. 2005, 2005(7), 1035-1046. ISSN 1110-8657.
  • Rok: 2005
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We propose a method and a tool for automatic generation of hardware implementation of a decision rule based on the Adaboost algorithm. We review the principles of the classification method and we evaluate its hardware implementation cost in terms of FPGA slices, using different weak classifiers based on the general concept of hyperrectangle. The main novelty of our approach is that the tool allows the user to automatically find an appropriate tradeoff between classification performance and hardware implementation cost, and that the generated architecture is optimized for each training process. We present results obtained using Gaussian distributions and examples from UCI databases. Finally, we present an example of industrial application of real-time textured image segmentation.

Matching with PROSAC - Progressive Sample Consensus

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A new robust matching method is proposed. The Progressive Sample Consensus (PROSAC) algorithm exploits the linear ordering defined on the set of correspondences by a similarity function used in establishing tentative correspondences. Unlike RANSAC, which treats all correspondences equally and draws random samples uniformly from the full set, PROSAC samples are drawn from progressively larger sets of top-ranked correspondences. Under the mild assumption that the similarity measure predicts correctness of a match better than random guessing, we show that PROSAC achieves large computational savings. Experiments demonstrate it is often significantly faster (up to more than a hundred times) than RANSAC. For the derived size of the sampled set of correspondences as a function of the number of samples already drawn, PROSAC converges towards RANSAC in the worst case. The power of the method is demonstrated on wide-baseline matching problems.
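
The progressive enlargement of the sampling pool follows a growth function derived from how many uniform samples would, on average, contain only top-ranked points. A sketch of that schedule under the standard recursion T(n+1) = T(n)·(n+1)/(n+1−m); `prosac_schedule` is a hypothetical name and the termination criteria of the full algorithm are omitted:

```python
import math
from itertools import islice

def prosac_schedule(N, m, T_N=200000):
    """Yield, for samples t = 1, 2, ..., the size n of the top-ranked
    subset the t-th minimal sample is drawn from (N correspondences,
    minimal sample size m, T_N the RANSAC sample budget)."""
    n = m
    T_n = float(T_N)
    for i in range(m):                 # T_m = T_N * C(m, m) / C(N, m)
        T_n *= (m - i) / (N - i)
    T_n_prime, t = 1.0, 0
    while True:
        t += 1
        if t > T_n_prime and n < N:    # time to grow the sampling pool
            T_next = T_n * (n + 1) / (n + 1 - m)
            T_n_prime += math.ceil(T_next - T_n)
            T_n = T_next
            n += 1
        yield n
```

The schedule starts at n = m (only the very best correspondences) and is non-decreasing, approaching uniform sampling from all N correspondences in the worst case, which matches the convergence-to-RANSAC property stated above.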

On the Stability of Local Affine Frames for the Correspondence Problem

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper presents an overview and a classification of affine-covariant constructions of local coordinate systems (frames), proves the affine covariance of the constructions, and gives details on their computation. A technique to avoid generating an unnecessarily abundant number of frames is then proposed, identifying the frames with the highest probability of also being generated (repeated) in other images. Ordering frames by expected repeatability provides a simple, single-parameter way of controlling the number of generated frames.

Optimal Randomised RANSAC

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A randomized model verification strategy for RANSAC is presented. The proposed method finds, like RANSAC, a solution that is optimal with user-controllable probability. A provably optimal model verification strategy is designed for the situation when the contamination of data by outliers is known, i.e. the algorithm is the fastest possible (on average) of all randomized RANSAC algorithms guaranteeing a given confidence in the solution. The derivation of the optimality property is based on Wald's theory of sequential decision making. The R-RANSAC with SPRT, which does not require a priori knowledge of the fraction of outliers and has results close to the optimal strategy, is introduced. We show experimentally that on standard test data the method is 2 to 10 times faster than the standard RANSAC and up to 4 times faster than previously published methods.

Randomized RANSAC with Sequential Probability Ratio Test

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A randomized model verification strategy for RANSAC is presented. The proposed method finds, like RANSAC, a solution that is optimal with user-controllable probability. A provably optimal model verification strategy is designed for the situation when the contamination of data by outliers is known, i.e. the algorithm is the fastest possible (on average) of all randomized RANSAC algorithms guaranteeing a given confidence in the solution. The derivation of the optimality property is based on Wald's theory of sequential decision making. The R-RANSAC with SPRT, which does not require a priori knowledge of the fraction of outliers and has results close to the optimal strategy, is introduced. We show experimentally that on standard test data the method is 2 to 10 times faster than the standard RANSAC and up to 4 times faster than previously published methods.
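
The SPRT-based verification can be sketched as follows: each data point updates a likelihood ratio between "model is bad" and "model is good", and evaluation stops as soon as the ratio crosses the decision threshold A. Illustrative code with hypothetical names; eps (inlier fraction of a good model), delta (consistency probability under a bad model) and A are tuned adaptively in the paper:

```python
def sprt_verify(residuals, inlier_thresh, eps, delta, A):
    """Wald's SPRT for model verification: return (accepted, points
    evaluated). eps: assumed inlier ratio under a good model; delta:
    probability a point is consistent with a bad model; A: threshold."""
    lam = 1.0
    for i, r in enumerate(residuals):
        if r < inlier_thresh:
            lam *= delta / eps               # consistent point observed
        else:
            lam *= (1 - delta) / (1 - eps)   # inconsistent point observed
        if lam > A:
            return False, i + 1              # early rejection as 'bad'
    return True, len(residuals)
```

Bad hypotheses, which dominate a RANSAC run, are typically rejected after only a handful of points, which is where the reported 2-10x speed-up comes from.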

Sub-linear Indexing for Large Scale Object Recognition

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Realistic approaches to large scale object recognition, i.e. for detection and localisation of hundreds or more objects, must support sub-linear time indexing. In the paper, we propose a method capable of recognising one of N objects in log(N) time. The 'visual memory' is organised as a binary decision tree that is built to minimise the average time to decision. Leaves of the tree represent a few local image areas, and each non-terminal node is associated with a 'weak classifier'. In the recognition phase, a single invariant measurement decides in which subtree a corresponding image area is sought. The method preserves all the strengths of local affine region methods: robustness to background clutter, occlusion, and large changes of viewpoints. Experimentally we show that it supports near real-time recognition of hundreds of objects with state-of-the-art recognition rates. After the test image is processed (in about a second on a current PC), the recognition via indexing into the visual memory

Two-view Geometry Estimation Unaffected by a Dominant Plane

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A RANSAC-based algorithm for robust estimation of epipolar geometry from point correspondences in the possible presence of a dominant scene plane is presented. The algorithm handles scenes with (i) all points in a single plane, (ii) majority of points in a single plane and the rest off the plane, (iii) no dominant plane. It is not required to know a priori which of the cases (i) - (iii) occurs. The algorithm exploits a theorem we proved, that if five or more of seven correspondences are related by a homography then there is an epipolar geometry consistent with the seven-tuple as well as with all correspondences related by the homography. This means that a seven point sample consisting of two outliers and five inliers lying in a dominant plane produces an epipolar geometry which is completely wrong and yet consistent with a high number of correspondences. The theorem explains why RANSAC often fails to estimate epipolar geometry in the presence of a dominant plane. Rather surprisingly, t

Unconstrained Licence Plate Detection

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Licence plate and traffic sign detection and recognition have a number of different applications relevant for transportation systems, such as traffic monitoring, detection of stolen vehicles, driver navigation support or statistical research. A number of methods have been proposed, but only for particular cases and working under constraints (e.g. known text direction or high resolution). Therefore a new class of locally threshold-separable detectors based on extremal regions, which can be adapted by machine learning techniques to arbitrary shapes, is proposed. On a test set of licence plate images taken from different viewpoints (−45°, 45°), at scales from seven to hundreds of pixels in height, even under bad illumination conditions and partial occlusions, high detection accuracy (95%) is achieved. Finally, we demonstrate the generic abilities of the detector on traffic sign detection. The standard classifier (neural network) within the detector selects a relevant subset of extremal region

WaldBoost - Learning for Time Constrained Sequential Detection

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In many computer vision classification problems, both the error rate and the time to decision characterize the quality of a decision. We show that such problems can be formalized in the framework of sequential decision-making. If the false positive and false negative error rates are given, the optimal strategy in terms of the shortest average time to decision (number of measurements used) is Wald's sequential probability ratio test (SPRT). We build on the optimal SPRT test and enlarge its capabilities to problems with dependent measurements. We show how the limitations of SPRT to a priori ordered measurements and known joint probability density functions can be overcome. We propose an algorithm with a near optimal time - error rate trade-off, called WaldBoost, which integrates the AdaBoost algorithm for measurement selection and ordering and the joint probability density estimation with the optimal SPRT decision strategy. The WaldBoost algorithm is tested on the face detection problem. The results are
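
The sequential decision structure can be sketched as a running boosted score compared, after each weak classifier, against per-stage accept/reject thresholds derived from the SPRT bounds. A minimal sketch with hypothetical names; learning the weak classifiers and estimating the thresholds is the substance of the actual algorithm:

```python
def waldboost_classify(x, weak_learners, thetas):
    """Sequential WaldBoost-style decision: accumulate weak responses
    and decide early when the score crosses a stage threshold.
    Returns (label, decided_early)."""
    H = 0.0
    for h, (th_rej, th_acc) in zip(weak_learners, thetas):
        H += h(x)
        if H >= th_acc:
            return +1, True                     # early accept
        if H <= th_rej:
            return -1, True                     # early reject
    return (+1 if H > 0 else -1), False         # forced final decision
```

Most background windows in a detection task fall below an early rejection threshold after a few measurements, which is how the average time to decision is minimised.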

AdaBoost with Totally Corrective Updates for Fast Face Detection

  • Autoři: Mgr. Jan Šochman, Ph.D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: FGR '04: Proceeding of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition. Los Alamitos: IEEE Computer Society Press, 2004. pp. 445-450. ISBN 0-7695-2122-3.
  • Rok: 2004

Automatic FPGA based implementation of a classification tree

  • Autoři: Mitéran, J., prof. Ing. Jiří Matas, Ph.D., Dubois, J., Bourennane, E.
  • Publikace: IEEE SCS: Proceedings of the 1st International Conference on "Signaux, Circuits et Systemes". Los Alamitos: IEEE Computer Society Press, 2004, pp. 188-192.
  • Rok: 2004

Boosting: From data to hardware using automatic implementation tool

  • Autoři: Mitéran, J., prof. Ing. Jiří Matas, Ph.D., Dubois, J., Bourennane, E.
  • Publikace: XIIth European Signal Processing Conference EUSIPCO - 2004. Vienna: TU Vienna, 2004, pp. 1721-1727. ISBN 3-200-00165-8.
  • Rok: 2004

Enhancing RANSAC by Generalized Model Optimization

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An extension of the RANSAC procedure is proposed. By adding a generalized model optimization step (the LO step) applied only to models with a score (quality) better than all previous ones, an algorithm with the following desirable properties is obtained: a near perfect agreement with theoretical (i.e. optimal) performance and lower sensitivity to noise and poor conditioning. The chosen scheduling strategy is shown to guarantee that the optimization step is applied so rarely that it has minimal impact on the execution time.
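
The key scheduling idea is that the LO step fires only on new-best models, so it runs rarely. A toy 1-D sketch (location model, inlier-count score) with hypothetical names; the real LO step uses iterated least squares on inliers of geometric models:

```python
import random

def lo_ransac(data, fit, score, local_opt, m=2, iters=500, seed=0):
    """RANSAC with a local optimization (LO) step applied only when a
    model beats the best score so far; in expectation the LO step runs
    O(log k) times over k iterations, so its cost is amortised away."""
    rng = random.Random(seed)
    best_model, best_score = None, float("-inf")
    for _ in range(iters):
        model = fit(rng.sample(data, m))        # hypothesis from minimal sample
        s = score(model, data)
        if s > best_score:                      # new best: run the LO step
            refined = local_opt(model, data)
            s_ref = score(refined, data)
            if s_ref >= s:
                model, s = refined, s_ref
            best_model, best_score = model, s
    return best_model, best_score
```

Because only new-best models are refined, the observed iteration count tracks the theoretical prediction while the per-iteration cost stays essentially that of plain RANSAC.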

Epipolar Geometry Estimation via RANSAC Benefits from the Oriented Epipolar Constraint

Geometric and photometric image stabilization for detection of significant events in video from low flying Unmanned Aerial Vehicles

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    On-line video sequences acquired by cameras on board a small surveillance plane are very unstable. As a first step facilitating visual interpretation, a dynamic adaptation of brightness and contrast has been designed and implemented. Secondly, stabilisation of camera movement is achieved. After stabilisation, moving objects are identified. Finally, objects of interest, whose models are automatically built from example images, are recognised and localised.

Inter-stage Feature Propagation in Cascade Building with AdaBoost

Object recognition methods based on transformation covariant features

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Methods based on distinguished regions (transformation covariant detectable patches) have achieved considerable success in a range of object recognition, retrieval and matching problems, in still images and videos. We review the state-of-the-art, describe relationship to other recognition methods, analyse their strengths and weaknesses, and present examples of successful applications.

Randomized RANSAC with T_d,d test

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Many computer vision algorithms include a robust estimation step where model parameters are computed from a data set containing a significant proportion of outliers. The RANSAC algorithm is possibly the most widely used robust estimator in the field of computer vision. In the paper we show that under a broad range of conditions, RANSAC efficiency is significantly improved if its hypothesis evaluation step is randomized.
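
The randomized hypothesis evaluation amounts to a cheap preliminary test: check the model on d randomly chosen points and pay for full verification only if all d are consistent (the paper's T(d,d) test; d = 1 is the typical choice). A minimal sketch with a hypothetical name:

```python
import random

def td_d_test(is_consistent, data, d=1, seed=0):
    """Preliminary T(d,d) test: probe the model on d random points and
    trigger full (costly) verification only if all d are consistent --
    most contaminated hypotheses are thus rejected almost for free."""
    rng = random.Random(seed)
    probe = rng.sample(range(len(data)), d)
    return all(is_consistent(data[i]) for i in probe)
```

A model that passes the probe is then scored against the whole data set as in standard RANSAC; a model that fails is discarded after only d residual evaluations.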

Robust wide-baseline stereo from maximally stable extremal regions

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints is studied.

Towards Complete Free-Form Reconstruction of Complex 3D Scenes from an Unordered Set of Uncalibrated Images

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    This paper describes a method for accurate dense reconstruction of a complex scene from a small set of high-resolution unorganized still images taken by a hand-held digital camera. A fully automatic data processing pipeline is proposed. Highly discriminative features are first detected in all images. Correspondences are then found in all image pairs by wide-baseline stereo matching and used in a scene structure and camera reconstruction step that can cope with occlusion and outliers. Image pairs suitable for dense matching are automatically selected, rectified and used in dense binocular matching. The dense point cloud obtained as the union of all pairwise reconstructions is fused by local approximation using oriented geometric primitives. For texturing, every primitive is mapped on the image with the best resolution.

Effective Use of Pattern Recognition Method for Composition of Structure Microphotographs

  • Autoři: prof. Ing. Jiří Matas, Ph.D., Košek, M.
  • Publikace: STRUTEX 2003 - 10th International Conference on Structure and Structural Mechanics of Textile Fabric. Liberec: Technická univerzita, 2003, pp. 99-103. ISBN 80-7083-769-1.
  • Rok: 2003
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A wide-baseline matching algorithm is applied to the problem of registration of microphotographs of textiles.

Ellipse detection using efficient grouping of arc segments

  • Autoři: Zillich, M., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: OAGM 2003: Vision in a Dynamic World: Proceedings of the 27th Workshop of the Austrian Association for Pattern Recognition. Wien: Österreichische Computer Gesellschaft, 2003, pp. 143-148. ISBN 3-85403-168-8.
  • Rok: 2003
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A fast algorithm for ellipse detection is presented. The algorithm makes use of an efficient grouping of arc segments based on tangent intersections.

Epipolar Geometry from Three Correspondences

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In this paper, LO-RANSAC 3-LAF, a new algorithm for the correspondence problem is described. Exploiting processes proposed for computation of affine-invariant local frames, three point-to-point correspondences are found for each region-to-region correspondence. Consequently, it is sufficient to select only triplets of region correspondences in the hypothesis stage of epipolar geometry estimation by RANSAC.

Face Detection from Discriminative Regions

  • Autoři: Bílek, P., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of Workshop 2003. Praha: České vysoké učení technické v Praze, 2003, pp. 282-283. ISBN 80-01-02708-2.
  • Rok: 2003

Image Retrieval Using Local Compact DCT-Based Representation

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An image retrieval system based on local affine frames is introduced. The system provides highly discriminative retrieval of rigid objects under a very wide range of viewing and illumination conditions, and is robust to occlusion and background clutter. Distinguished regions of data dependent shape are detected, and local affine frames (coordinate systems) are obtained. Photometrically and geometrically normalised image patches are extracted and used for matching. Local correspondences are formed either by direct comparison of photometrically normalised colour intensities in the normalised patches, or by comparison of DCT (discrete cosine transform) coefficients of the patches. Experimental results are presented on a publicly available database of real outdoor images of buildings. We demonstrate the effect of the number of DCT coefficients that are used for the matching. Using the DCT, retrieval performance of 100% in rank 1 is achieved, and memory usage is reduced by a factor of 4.

Locally optimized RANSAC

Obecné systémy rozpoznávání objektů ve snímcích a videosekvencích

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The article describes a general method for recognising objects in images. The advantage of the method is its low level of a priori assumptions about the nature of the object, which makes it suitable for recognising arbitrary objects.

On the Interaction between Object Recognition and Colour Constancy

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    In this paper we investigate some aspects of the interaction between colour constancy and object recognition. We demonstrate that even under severe changes of illumination, many objects are reliably recognised if relying only on geometry and on invariant representation of local colour appearance. We feel that colour constancy as a preprocessing step of an object recognition algorithm is important only in cases when colour is the major (or the only available) clue for object discrimination. We also show that successful object recognition allows for "colour constancy by recognition" - an approach where the global photometric transformation is estimated from locally corresponding image patches.

Colour-Based Object Recognition for Video Annotation

  • Autoři: Koubaroulis, D., prof. Ing. Jiří Matas, Ph.D., Kittler, J.
  • Publikace: ICPR 02: Proceedings 16th International Conference on Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2002. pp. 1069-1072. ISBN 0-7695-1695-X.
  • Rok: 2002

Discontinuity detection on industrial parts : real-time image segmentation using Parzen's Kernel

  • Autoři: Mitéran, J., Kohler, S., Geveaux, P., Gorria, P., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Materials Evaluation. 2002, 60(3), 430-436. ISSN 0025-5327.
  • Rok: 2002

Discriminative Regions for Human Face Detection

  • Autoři: prof. Ing. Jiří Matas, Ph.D., Bílek, P., Hamouz, M., Kittler, J.
  • Publikace: 5th Asian Conference on Computer Vision. Victoria: Asian Federation of Computer Vision Societies, 2002, pp. 604-609. ISBN 0-9580256-0-6.
  • Rok: 2002

Evaluating Colour-Based Object Recognition Algorithms on the SOIL-47 Database

  • Autoři: Koubaroulis, D., prof. Ing. Jiří Matas, Ph.D., Kittler, J.
  • Publikace: 5th Asian Conference on Computer Vision. Victoria: Asian Federation of Computer Vision Societies, 2002, pp. 840-845. ISBN 0-9580256-0-6.
  • Rok: 2002

Face Detection by Learned Affine Correspondences

  • Autoři: Hamouz, M., Kittler, J., prof. Ing. Jiří Matas, Ph.D., Bílek, P.
  • Publikace: Proceedings of Joint IAPR International Workshops SSPR02 and SPR02. Berlin: Springer, 2002. pp. 566-575. ISBN 3-540-44011-9.
  • Rok: 2002

Homogeneous Nucleation Rates in Supersaturated Vapor of N-Propanol: Raw Results

  • Autoři: Zdimal, V., Brus, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: 6th International Aerosol Conference. Taipei: Chinese Association for Aerosol Research in Taiwan (CAAR), 2002, pp. 1-2.
  • Rok: 2002

Homogeneous Nucleation Rates of N-Propanol in Static Diffusion Chamber: First Results

  • Autoři: Zdimal, V., Brus, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Czech-Finnish Aerosol Symposium. Helsinki: Aerosolitutkimusseura ry, 2002, pp. 175-180. ISBN 952-5027-34-1.
  • Rok: 2002

Learning Parameters of a Recognition System Based on Local Affine Frames

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    An approach to object recognition, based on matching of local image features, is presented. First, distinguished regions of data-dependent shape are robustly detected. On these regions, local affine frames are established using several affine-invariant constructions. Direct comparison of photometrically normalised colour intensities in local, geometrically aligned frames results in a matching scheme that is invariant to piecewise-affine image deformations, but still remains very discriminative. Invariance to such a wide range of local geometric and photometric transformations nevertheless reduces the discriminative power, since not all possible transformations are equiprobable. The probability of the transformations is therefore estimated from matches established by the invariant method on the training data, and the estimate is exploited in the recognition phase to favour local correspondences with more likely transformations.
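The photometric normalisation used in the comparison above can be sketched as follows: each colour channel of a geometrically aligned patch is normalised to zero mean and unit variance, which cancels any per-channel gain and offset. This is an illustrative reconstruction under that assumption, not the paper's implementation:

```python
import numpy as np

def photometric_normalise(patch):
    """Normalise each colour channel of an H x W x 3 patch to zero mean and
    unit variance, cancelling per-channel gain and offset changes."""
    p = np.asarray(patch, dtype=float)
    mean = p.mean(axis=(0, 1), keepdims=True)
    std = p.std(axis=(0, 1), keepdims=True)
    return (p - mean) / np.maximum(std, 1e-8)

def match_score(patch_a, patch_b):
    """Sum of squared differences between normalised patches (lower = better)."""
    diff = photometric_normalise(patch_a) - photometric_normalise(patch_b)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(1)
patch = rng.uniform(0, 255, size=(8, 8, 3))
brighter = patch * 1.5 + 20.0                     # same content, new illumination
other = rng.uniform(0, 255, size=(8, 8, 3))       # unrelated content
same_score = match_score(patch, brighter)
diff_score = match_score(patch, other)
```

The score between a patch and its re-illuminated copy is (numerically) zero, while unrelated patches score high, which is what makes the normalised comparison both invariant and discriminative.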

Local Affine Frames for Image Retrieval

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel approach to content-based image retrieval is presented. The method supports recognition of objects under a very wide range of viewing and illumination conditions and is robust to occlusion and background clutter. Starting from robustly detected 'distinguished regions' of data-dependent shape, local affine frames are established by affine-invariant constructions exploiting invariant properties of the second moment matrix and bi-tangent points. Direct comparison of photometrically normalised colour intensities in normalised frames facilitates robust, affine and illumination invariant, but still very selective matching. The potential of the proposed approach is experimentally verified on FOCUS - a publicly available image database - using a standard set of query images. The results obtained are superior to the state of the art. The method operates successfully on images with complex background, where the sought object covers only a fraction (around 2%) of the database image.
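One affine-covariant construction of the kind mentioned above, normalising a region by its second moment matrix, can be sketched in a few lines: whitening the region's point coordinates maps its second moment matrix to the identity, fixing the affine shape up to an unknown rotation. This is illustrative numpy code under that standard formulation, not the paper's implementation:

```python
import numpy as np

def shape_normalising_transform(points):
    """Return (W, centroid) such that (points - centroid) @ W.T has an
    identity second moment matrix; fixes affine shape up to rotation."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    centred = pts - centroid
    C = centred.T @ centred / len(pts)          # second moment matrix
    # inverse square root of C via eigendecomposition (C is symmetric PSD)
    vals, vecs = np.linalg.eigh(C)
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return W, centroid

# an elongated elliptical point cloud: after whitening, its covariance is ~identity
rng = np.random.default_rng(2)
pts = rng.normal(size=(5000, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
W, c = shape_normalising_transform(pts)
normalised = (pts - c) @ W.T
```

The remaining rotational ambiguity is what constructions such as bi-tangent points resolve, by pinning directions that are themselves affine covariant.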

Local Affine Frames for Wide-Baseline Stereo

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel procedure for establishing wide-baseline correspondence is introduced. Tentative correspondences are established by matching photometrically normalised colour measurements represented in a local affine frame. The affine frames are obtained by a number of affine invariant constructions on robustly detected maximally stable extremal regions of data-dependent shape. Several processes for local affine frame construction are proposed and proved affine covariant. The potential of the proposed approach is demonstrated on demanding wide-baseline matching problems. Correspondence between two views taken from different viewpoints and camera orientations as well as at very different scales is reliably established. For the scale change present (a factor of more than 3), the zoomed-in image covers less than 10% of the wider view.
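Tentative correspondence between two sets of normalised measurements can be sketched, for illustration, with a standard mutual-nearest-neighbour rule on descriptor vectors; this is a common scheme assumed here for the sketch, and the paper's actual matching details may differ:

```python
import numpy as np

def mutual_nearest_matches(desc_a, desc_b):
    """Tentative correspondences: pairs (i, j) such that descriptor i in A
    and descriptor j in B are each other's nearest neighbour (L2)."""
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # pairwise distances
    nn_ab = d.argmin(axis=1)   # best match in B for each descriptor in A
    nn_ba = d.argmin(axis=0)   # best match in A for each descriptor in B
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

rng = np.random.default_rng(3)
base = rng.normal(size=(6, 16))                      # descriptors in view 1
noisy = base + 0.01 * rng.normal(size=base.shape)    # same frames, slight noise
matches = mutual_nearest_matches(base, noisy)
```

The mutual check discards one-sided matches, so the tentative set fed to later geometric verification is already much cleaner than plain nearest-neighbour matching.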

Object Recognition using Local Affine Frames on Distinguished Regions

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    A novel approach to appearance-based object recognition is introduced. The proposed method, based on matching of local image features, reliably recognises objects under very different viewing conditions. First, distinguished regions of data-dependent shape are robustly detected. On these regions, local affine frames are established using several affine invariant constructions. Direct comparison of photometrically normalised colour intensities in local, geometrically aligned frames results in a matching scheme that is invariant to piecewise-affine image deformations, but still remains very discriminative. The potential of the approach is experimentally verified on public databases. On SOIL-47, 100% recognition rate is achieved for a single training view per object. On COIL-100, 99.9% recognition rate is obtained for 18 training views per object. Robustness to occlusions is demonstrated by only a moderate decrease of performance in an experiment where half of each test image is erased.

Periodic Textures as Distinguished Regions for Wide-Baseline Stereo Correspondence

  • Autoři: Chetverikov, D., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: TEXTURE 2002, The 2nd International Workshop on Texture Analysis and Synthesis. Edinburgh: Heriot-Watt University, 2002, pp. 25-30. ISBN 0-901658-99-5.
  • Rok: 2002

Randomized RANSAC

Randomized RANSAC with Td,d test

Robust Wide Baseline Stereo from Maximally Stable Extremal Regions

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The paper was awarded "Best Scientific Paper" at the BMVC 2002 conference.

Rotational Invariants for Wide-baseline Stereo

The Multimodal Neighborhood Signature for Modeling Object Color Appearance and Applications in Object Recognition and Image Retrieval

  • Autoři: prof. Ing. Jiří Matas, Ph.D., Koubaroulis, D., Kittler, J.
  • Publikace: Computer Vision and Image Understanding. 2002, 88(1), 1-23. ISSN 1077-3142.
  • Rok: 2002

Using Periodic Texture as a Tool for Wide-Baseline Stereo

  • Autoři: Chetverikov, D., Megyesi, Z., Janko, Z., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Vision with Non-Traditional Sensors, 26th Workshop of the Austrian Association for Pattern Recognition (ÖAGM-AAPR). Wien: Österreichische Computer Gesellschaft, 2002, pp. 37-44. ISBN 3-85403-160-0.
  • Rok: 2002

Affine Invariant Object Recognition without a 3D Model

  • Autoři: Buriánek, J., prof. Ing. Jiří Matas, Ph.D., Kittler, J.
  • Publikace: Proceedings of Workshop 2001. Praha: České vysoké učení technické v Praze, 2001, pp. 194-195. ISBN 80-01-02335-4.
  • Rok: 2001

Empirical Evaluation of a Calibration Chart Detector

  • Autoři: Soh, A., Kittler, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Machine Vision and Applications. 2001, 12(6), 305-325. ISSN 0932-8092.
  • Rok: 2001

Gradient Based Progressive Probabilistic Hough Transform

  • Autoři: Galambos, Ch., Kittler, J., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: IEE Proceedings - Vision Image and Signal Processing. 2001, 148(15), 158-165. ISSN 1350-245X.
  • Rok: 2001

Unifying View for Wide-Baseline Stereo

  • Autoři: prof. Ing. Jiří Matas, Ph.D., Urban, M., Pajdla, T.
  • Publikace: Proceedings of Computer Vision Winter Workshop. Ljubljana: Slovenian Pattern Recognition Society, 2001, pp. 214-222. ISBN 961-90901-0-1.
  • Rok: 2001

Colour Image Retrieval and Object Recognition Using the Multimodal Neighbourhood Signature

  • Autoři: prof. Ing. Jiří Matas, Ph.D., Koubaroulis, D., Kittler, J.
  • Publikace: Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2000. pp. 48-64. ISBN 3-540-67685-6.
  • Rok: 2000

Colour-based Image Retrieval from Video Sequences

  • Autoři: Koubaroulis, D., prof. Ing. Jiří Matas, Ph.D., Kittler, J.
  • Publikace: Proceedings of the Czech Pattern Recognition Workshop. Prague: Czech Pattern Recognition Society, 2000, pp. 1-12. ISBN 80-238-5215-9.
  • Rok: 2000

Comparison of Face Verification Results on the XM2VTS Database

  • Autoři: prof. Ing. Jiří Matas, Ph.D., Hamouz, M., Jonsson, K., Kittler, J., Li, Y.P., Kotropoulos, C., Tefas, A., Pitas, I., Tan, T., Yan, H., Smeraldi, F., Bigun, J., Capdevielle, N., Gerstner, W., Ben-Yacoub, S., Abdeljaoued, Y., Mayoraz, E.
  • Publikace: Proceedings of the 15th IAPR Int. Conf. on Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2000. pp. 858-863. ISBN 0-7695-0750-6.
  • Rok: 2000

Illumination Invariant Object Recognition Using the MNS Method

  • Autoři: Koubaroulis, D., prof. Ing. Jiří Matas, Ph.D., Kittler, J.
  • Publikace: Proceedings of the 10th European Signal Processing Conference. Tampere: Tampere University of Technology, 2000. pp. 2173-2176. ISBN 952-15-0443-9.
  • Rok: 2000

Improvement of the Homogeneous Nucleation Rate Measurements in a Static Diffusion Chamber with Use of a CCD Camera

  • Autoři: Zdimal, V., Smolik, J., Hopke, P.K., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Nucleation and Atmospheric Aerosols 2000: 15th Int.'l Conf. Woodbury: American Institute of Physics, 2000, pp. 311-314. ISBN 1-56396-958-0.
  • Rok: 2000

Learning Support Vectors for Face Verification and Recognition Biometrics and Benchmarking

  • Autoři: Jonsson, K., prof. Ing. Jiří Matas, Ph.D., Li, Y.P., Kittler, J.
  • Publikace: Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000. Piscataway: IEEE, 2000. pp. 208-213. ISBN 0-7695-0580-5.
  • Rok: 2000

Object Recognition using the Invariant Pixel-Set Signature

  • Autoři: prof. Ing. Jiří Matas, Ph.D., Buriánek, J., Kittler, J.
  • Publikace: Proceedings of British Machine Vision Conference BMVC2000. London: British Machine Vision Association, 2000, pp. 606-615. ISBN 1-901725-13-8.
  • Rok: 2000

Performance Evaluation of the Multi-modal Neighbourhood Signature Method for Colour Object Recognition

  • Autoři: prof. Ing. Jiří Matas, Ph.D., Koubaroulis, D., Kittler, J.
  • Publikace: Proceedings of the Czech Pattern Recognition Workshop. Prague: Czech Pattern Recognition Society, 2000, pp. 27-34. ISBN 80-238-5215-9.
  • Rok: 2000

Robust Detection of Lines Using Progressive Probabilistic Hough Transform

  • Autoři: prof. Ing. Jiří Matas, Ph.D., Galambos, Ch., Kittler, J.
  • Publikace: Computer Vision and Image Understanding. 2000, 78(1), 119-137. ISSN 1077-3142.
  • Rok: 2000

The Multimodal Signature Method: An Efficiency and Sensitivity Study

  • Autoři: Koubaroulis, D., prof. Ing. Jiří Matas, Ph.D., Kittler, J.
  • Publikace: Proceedings of the 15th IAPR Int. Conf. on Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2000, pp. 379-382. ISBN 0-7695-0750-6.
  • Rok: 2000

Using Gradient Information to Enhance the Progressive Probabilistic Hough Transform

  • Autoři: Galambos, Ch., prof. Ing. Jiří Matas, Ph.D., Kittler, J.
  • Publikace: Proceedings of the 15th IAPR Int. Conf. on Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2000. pp. 564-567. ISBN 0-7695-0750-6.
  • Rok: 2000

Learning Salient Features for Real-time Face Verification

  • Autoři: Jonsson, K., prof. Ing. Jiří Matas, Ph.D., Kittler, J.
  • Publikace: Second International Conference on Audio and Video-based Biometric Person Authentication. Washington: University of Maryland, 1999, pp. 60-66.
  • Rok: 1999

Support Vector Machines for Face Authentication

  • Autoři: Jonsson, K., Kittler, J., Li, Y.P., prof. Ing. Jiří Matas, Ph.D.,
  • Publikace: Proceedings of British Machine Vision Conference BMVC99. London: British Machine Vision Association, 1999, pp. 543-552. ISBN 1-901725-04-9.
  • Rok: 1999

Responsible for this page: Ing. Mgr. Radovan Suk