People

Mgr. Dmytro Mishkin, Ph.D.

All publications

HarrisZ+: Harris corner selection for next-gen image matching pipelines

  • Authors: Bellavia, F., Mgr. Dmytro Mishkin, Ph.D.
  • Publication: Pattern Recognition Letters. 2022, 2022(158) 141-147. ISSN 0167-8655.
  • Year: 2022
  • DOI: 10.1016/j.patrec.2022.04.022
  • Link: https://doi.org/10.1016/j.patrec.2022.04.022
  • Department: Visual Recognition Group
  • Abstract:
    Due to its role in many computer vision tasks, image matching has been subjected to active investigation by researchers, which has led to better and more discriminative feature descriptors and to more robust matching strategies, helped by the advent of deep learning and the increased computational power of modern hardware. Despite these achievements, the keypoint extraction process at the base of the image matching pipeline has not seen equivalent progress. This paper presents HarrisZ+, an upgrade to the HarrisZ corner detector, optimized to synergistically take advantage of the recent improvements in the other steps of the image matching pipeline. HarrisZ+ does not only consist of a tuning of the setup parameters, but also introduces further refinements to the selection criteria delineated by HarrisZ, thereby providing more, yet still discriminative, keypoints, which are better distributed on the image and have higher localization accuracy. The image matching pipeline including HarrisZ+, together with the other modern components, obtained state-of-the-art results among the classic image matching pipelines in several recent matching benchmarks. These results are quite close to those obtained by the more recent fully deep end-to-end trainable approaches and show that there is still a proper margin of improvement that can be granted by research in classic image matching methods.

Efficient Initial Pose-Graph Generation for Global SfM

  • Authors: Baráth, D., Mgr. Dmytro Mishkin, Ph.D., Eichhardt, I., Shipachev, I., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2021. p. 14541-14550. ISSN 2575-7075. ISBN 978-1-6654-4509-2.
  • Year: 2021
  • DOI: 10.1109/CVPR46437.2021.01431
  • Link: https://doi.org/10.1109/CVPR46437.2021.01431
  • Department: Visual Recognition Group
  • Abstract:
    We propose ways to speed up the initial pose-graph generation for global Structure-from-Motion algorithms. To avoid forming tentative point correspondences by FLANN and geometric verification by RANSAC, which are the most time-consuming steps of the pose-graph creation, we propose two new methods -- built on the fact that image pairs usually are matched consecutively. Thus, candidate relative poses can be recovered from paths in the partly-built pose-graph. We propose a heuristic for the A* traversal, considering global similarity of images and the quality of the pose-graph edges. Given a relative pose from a path, descriptor-based feature matching is made "light-weight" by exploiting the known epipolar geometry. To speed up PROSAC-based sampling when RANSAC is applied, we propose a third method to order the correspondences by their inlier probabilities from previous estimations. The algorithms are tested on 402130 image pairs from the 1DSfM dataset and they speed up the feature matching 17 times and pose estimation 5 times. The source code will be made public.

Image Matching across Wide Baselines: From Paper to Practice

  • DOI: 10.1007/s11263-020-01385-0
  • Link: https://doi.org/10.1007/s11263-020-01385-0
  • Department: Visual Recognition Group
  • Abstract:
    We introduce a comprehensive benchmark for local features and robust estimation algorithms, focusing on the downstream task -- the accuracy of the reconstructed camera pose -- as our primary metric. Our pipeline's modular structure allows easy integration, configuration, and combination of different methods and heuristics. This is demonstrated by embedding dozens of popular algorithms and evaluating them, from seminal works to the cutting edge of machine learning research. We show that with proper settings, classical solutions may still outperform the perceived state of the art. Besides establishing the actual state of the art, the conducted experiments reveal unexpected properties of Structure from Motion (SfM) pipelines that can help improve their performance, for both algorithmic and learned methods. Data and code are online at https://github.com/team-yi-ubc/image-matching-benchmark, providing an easy-to-use and flexible framework for the benchmarking of local features and robust estimation methods, both alongside and against top-performing methods. This work provides a basis for the Image Matching Challenge https://vision.uvic.ca/image-matching-challenge/.

Kornia: an Open Source Differentiable Computer Vision Library for PyTorch

  • Authors: Riba, E., Mgr. Dmytro Mishkin, Ph.D., Ponsa, D., Rublee, E., Bradski, G.
  • Publication: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). New Jersey: IEEE, 2020. p. 3663-3672. ISSN 2642-9381. ISBN 978-1-7281-6553-0.
  • Year: 2020
  • DOI: 10.1109/WACV45572.2020.9093363
  • Link: https://doi.org/10.1109/WACV45572.2020.9093363
  • Department: Visual Recognition Group
  • Abstract:
    This work presents Kornia -- an open source computer vision library which consists of a set of differentiable routines and modules to solve generic computer vision problems. At its core, the package uses PyTorch as its main backend both for efficiency and to take advantage of the reverse-mode auto-differentiation to define and compute the gradient of complex functions. Inspired by OpenCV, Kornia is composed of a set of modules containing operators that can be inserted inside neural networks to train models to perform image transformations, camera calibration, epipolar geometry, and low level image processing techniques such as filtering and edge detection that operate directly on high dimensional tensor representations. Examples of classical vision problems implemented using our framework are also provided including a benchmark comparing to existing vision libraries.

Saddle: Fast and repeatable features with good coverage

  • DOI: 10.1016/j.imavis.2019.08.011
  • Link: https://doi.org/10.1016/j.imavis.2019.08.011
  • Department: Visual Recognition Group
  • Abstract:
    A novel similarity-covariant feature detector that extracts points whose neighborhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile is presented. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Saddle is a fast approximation of the Hessian detector, just as ORB, which implements the FAST detector, is for the Harris detector. We propose to use the matching strategy called first geometrically inconsistent with binary descriptors, which is suitable for our feature detector, and include experiments with fixed point descriptors, both hand-crafted and learned. Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets. Compared to recently proposed deep-learning based interest point detectors and popular hand-crafted keypoint detectors, evaluated for repeatability on the ApolloScape dataset [1], the Saddle detector shows the best performance in most of the street-level view sequences, a.k.a. traversals.

Leveraging Outdoor Webcams for Local Descriptor Learning

  • DOI: 10.3217/978-3-85125-652-9-06
  • Link: https://doi.org/10.3217/978-3-85125-652-9-06
  • Department: Visual Recognition Group
  • Abstract:
    We present AMOS Patches, a large set of image cut-outs, intended primarily for the robustification of trainable local feature descriptors to illumination and appearance changes. Images contributing to AMOS Patches originate from the AMOS dataset of recordings from a large set of outdoor webcams. The semiautomatic method used to generate AMOS Patches is described. It includes camera selection, viewpoint clustering and patch selection. For training, we provide both the registered full source images and the patches. A new descriptor, trained on the AMOS Patches and 6Brown datasets, is introduced. It achieves state-of-the-art results in matching under illumination changes on standard benchmarks.

DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

  • Authors: Kupyn, O., Budzan, V., Mykhailych, M., Mgr. Dmytro Mishkin, Ph.D., prof. Ing. Jiří Matas, Ph.D.
  • Publication: CVPR 2018: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018. p. 8183-8192. ISSN 2575-7075. ISBN 978-1-5386-6420-9.
  • Year: 2018
  • Department: Visual Recognition Group
  • Abstract:
    We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and the content loss. DeblurGAN achieves state-of-the-art performance both in the structural similarity measure and visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem - object detection on (de-)blurred images. The method is 5 times faster than the closest competitor - DeepDeblur. We also introduce a novel method for generating synthetic motion blurred images from sharp ones, allowing realistic dataset augmentation. The model, code and the dataset are available at https://github.com/KupynOrest/DeblurGAN

Repeatability Is Not Enough: Learning Affine Regions via Discriminability

  • Authors: Mgr. Dmytro Mishkin, Ph.D., Radenović, F., prof. Ing. Jiří Matas, Ph.D.
  • Publication: ECCV2018: Proceedings of the European Conference on Computer Vision, Part IX. Springer, Cham, 2018. p. 287-304. Lecture Notes in Computer Science. vol. 11213. ISSN 0302-9743. ISBN 978-3-030-01239-7.
  • Year: 2018
  • DOI: 10.1007/978-3-030-01240-3_18
  • Link: https://doi.org/10.1007/978-3-030-01240-3_18
  • Department: Visual Recognition Group
  • Abstract:
    A method for learning local affine-covariant regions is presented. We show that maximizing geometric repeatability does not lead to local regions, a.k.a. features, that are reliably matched, and this necessitates descriptor-based learning. We explore factors that influence such learning and registration: the loss function, descriptor type, geometric parametrization and the trade-off between matchability and geometric accuracy, and propose a novel hard negative-constant loss function for learning of affine regions. The affine shape estimator – AffNet – trained with the hard negative-constant loss outperforms the state-of-the-art in bag-of-words image retrieval and wide baseline stereo. The proposed training process does not require precisely geometrically aligned patches. The source code and trained weights are available at https://github.com/ducha-aiki/affnet

In the Saddle: Chasing fast and repeatable features

  • DOI: 10.1109/ICPR.2016.7899712
  • Link: https://doi.org/10.1109/ICPR.2016.7899712
  • Department: Visual Recognition Group
  • Abstract:
    A novel similarity-covariant feature detector that extracts points whose neighborhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile is presented. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets.

Systematic Evaluation of Convolution Neural Network Advances on the ImageNet

  • DOI: 10.1016/j.cviu.2017.05.007
  • Link: https://doi.org/10.1016/j.cviu.2017.05.007
  • Department: Visual Recognition Group
  • Abstract:
    The paper systematically studies the impact of a range of recent advances in convolution neural network (CNN) architectures and learning methods on the object categorization (ILSVRC) problem. The evaluation tests the influence of the following choices of the architecture: non-linearity (ReLU, ELU, maxout, compatibility with batch normalization), pooling variants (stochastic, max, average, mixed), network width, classifier design (convolutional, fully-connected, SPP), image pre-processing, and of learning parameters: learning rate, batch size, cleanliness of the data, etc. The performance gains of the proposed modifications are first tested individually and then in combination. The sum of individual gains is greater than the observed improvement when all modifications are introduced, but the “deficit” is small, suggesting independence of their benefits. We show that the use of 128 × 128 pixel images is sufficient to make qualitative conclusions about optimal network structure that hold for the full size Caffe and VGG nets. The results are obtained an order of magnitude faster than with the standard 224 pixel images.

Working hard to know your neighbor's margins: Local descriptor learning loss

  • Department: Visual Recognition Group
  • Abstract:
    We introduce a loss for metric learning, which is inspired by Lowe's matching criterion for SIFT. We show that the proposed loss, which maximizes the distance between the closest positive and closest negative example in the batch, is better than complex regularization methods; it works well for both shallow and deep convolution network architectures. Applying the novel loss to the L2Net CNN architecture results in a compact descriptor named HardNet. It has the same dimensionality as SIFT (128) and shows state-of-the-art performance in wide baseline stereo, patch verification and instance retrieval benchmarks.
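
    The idea of a loss driven by the closest negative in the batch can be illustrated with a minimal sketch. It assumes a triplet margin formulation with a margin of 1.0 and Euclidean distances between anchor and positive descriptors; the function names and the toy 2-D descriptors are hypothetical and are not taken from the published implementation.

```python
import math


def l2(u, v):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hardest_in_batch_loss(anchors, positives, margin=1.0):
    """Triplet margin loss using the closest negative found in the batch.

    anchors[i] and positives[i] describe the same patch; every other
    descriptor in the batch is treated as a negative for pair i.
    The margin value is an assumption made for this sketch.
    """
    n = len(anchors)
    # dist[i][j] = distance between anchor i and positive j
    dist = [[l2(a, p) for p in positives] for a in anchors]
    total = 0.0
    for i in range(n):
        d_pos = dist[i][i]
        # closest non-matching descriptor, searched along row i and column i
        d_neg = min(min(dist[i][j], dist[j][i]) for j in range(n) if j != i)
        total += max(0.0, margin + d_pos - d_neg)
    return total / n


# Toy batch: well separated pairs give zero loss, swapped pairs are penalized.
good = hardest_in_batch_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
bad = hardest_in_batch_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

    Only one negative per pair contributes to the loss, so the cost per batch is dominated by computing the distance matrix.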

All you need is a good init

  • Department: Department of Cybernetics, Visual Recognition Group
  • Abstract:
    Layer-sequential unit-variance (LSUV) initialization - a simple method for weight initialization for deep net learning - is proposed. The method consists of two steps. First, pre-initialize weights of each convolution or inner-product layer with orthonormal matrices. Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be equal to one. Experiments with different activation functions (maxout, ReLU-family, tanh) show that the proposed initialization leads to learning of very deep nets that (i) produces networks with test accuracy better than or equal to standard methods and (ii) is at least as fast as the complex schemes proposed specifically for very deep nets such as FitNets (Romero et al. (2015)) and Highway (Srivastava et al. (2015)). Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the state-of-the-art, or very close to it, is achieved on the MNIST, CIFAR-10/100 and ImageNet datasets.
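
    The two steps described above can be sketched in plain Python for a small stack of fully connected layers. This is an illustrative sketch under stated assumptions, not the authors' code: the layer sizes, the ReLU non-linearity, the tolerance, the iteration cap and all function names are hypothetical, and orthonormal weights are obtained here by Gram-Schmidt on Gaussian rows.

```python
import math
import random


def orthonormal_rows(n_out, n_in, rng):
    """Step 1: orthonormal weight rows via Gram-Schmidt (needs n_out <= n_in)."""
    rows = []
    while len(rows) < n_out:
        v = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:
            rows.append([a / norm for a in v])
    return rows


def forward(weights, x):
    """Output of one linear layer for a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def lsuv_init(layer_sizes, batch, tol=1e-3, max_iter=10, seed=0):
    """Step 2: go layer by layer, rescaling weights until output variance is ~1."""
    rng = random.Random(seed)
    layers, out_vars = [], []
    inputs = batch
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = orthonormal_rows(n_out, n_in, rng)
        for _ in range(max_iter):
            outs = [forward(w, x) for x in inputs]
            var = variance([v for o in outs for v in o])
            if abs(var - 1.0) < tol:
                break
            std = math.sqrt(var)
            w = [[a / std for a in row] for row in w]
        outs = [forward(w, x) for x in inputs]
        out_vars.append(variance([v for o in outs for v in o]))
        layers.append(w)
        # ReLU is an assumption of this sketch; its output feeds the next layer
        inputs = [[max(0.0, v) for v in o] for o in outs]
    return layers, out_vars


data_rng = random.Random(1)
batch = [[data_rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(64)]
layers, out_vars = lsuv_init([16, 8, 8, 4], batch)
```

    For a purely linear layer a single rescaling already yields unit variance on the batch; the loop with a tolerance is kept so the same structure would also apply if the variance were measured after a non-linearity.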

Very Deep Residual Networks with MaxOut for Plant Identification in the Wild

  • Authors: Šulc, M., Mgr. Dmytro Mishkin, Ph.D., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Working Notes of CLEF 2016 - Conference and Labs of the Evaluation forum. Aachen: CEUR Workshop Proceedings, 2016. pp. 579-586. CEUR Workshop Proceedings. vol. 1609. ISSN 1613-0073.
  • Year: 2016
  • Department: Department of Cybernetics, Visual Recognition Group
  • Abstract:
    The paper presents our deep learning approach to automatic recognition of plant species from photos. We utilized a very deep 152-layer residual network model pre-trained on ImageNet, replaced the original fully connected layer with two randomly initialized fully connected layers connected with maxout, and fine-tuned the network on the PlantCLEF 2016 training data. Bagging of 3 networks was used to further improve accuracy. With the proposed approach we scored among the top 3 teams in the PlantCLEF 2016 plant identification challenge.

MODS: Fast and robust method for two-view matching

  • DOI: 10.1016/j.cviu.2015.08.005
  • Link: https://doi.org/10.1016/j.cviu.2015.08.005
  • Department: Department of Cybernetics
  • Abstract:
    A novel algorithm for wide-baseline matching called MODS - matching on demand with view synthesis - is presented. The MODS algorithm is experimentally shown to solve a broader range of wide-baseline problems than the state of the art while being nearly as fast as standard matchers on simple problems. The apparent robustness vs. speed trade-off is finessed by the use of progressively more time-consuming feature detectors and by on-demand generation of synthesized images that is performed until a reliable estimate of geometry is obtained. We introduce an improved method for tentative correspondence selection, applicable both with and without view synthesis. A modification of the standard first to second nearest distance rule increases the number of correct matches by 5-20% at no additional computational cost. Performance of the MODS algorithm is evaluated on several standard publicly available datasets, and on a new set of geometrically challenging wide baseline problems that is made public together with the ground truth. Experiments show that MODS outperforms the state-of-the-art in robustness and speed. Moreover, MODS performs well on other classes of difficult two-view problems like matching of images from different modalities, with wide temporal baseline or with significant lighting changes.
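
    For context, the standard first to second nearest distance rule that the abstract refers to can be sketched as follows. This shows only the unmodified baseline rule, not the paper's modification; the 0.8 ratio threshold, the function name and the toy 2-D descriptors are assumptions made for illustration.

```python
import math


def ratio_test_matches(query, reference, max_ratio=0.8):
    """Keep a tentative match only if the nearest reference descriptor is
    clearly closer than the second nearest one.

    Requires at least two reference descriptors. The 0.8 threshold is an
    assumed, commonly used value, not one taken from the paper.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, r), ri) for ri, r in enumerate(reference))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d2 > 0 and d1 / d2 < max_ratio:
            matches.append((qi, best))
    return matches


query = [[0.0, 0.0], [5.0, 5.0]]
reference = [[0.1, 0.0], [3.0, 0.0], [5.0, 4.0], [5.0, 6.0]]
# Query 0 has one clear nearest neighbour; query 1 has two equally close
# candidates, so its match is ambiguous and is rejected.
matches = ratio_test_matches(query, reference)
```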

Place Recognition with WxBS Retrieval

  • Department: Department of Cybernetics
  • Abstract:
    We present a novel visual place recognition method designed for operation in challenging conditions such as those encountered in day-to-night or winter-to-summer matching. The proposed WxBS Retrieval method is novel in enriching a bag of words approach with the use of multiple detectors, descriptors with suitable visual vocabularies, view synthesis, and adaptive thresholding to compensate for large variations in contrast and richness of features in different conditions. Evaluated on the public Visual Place Recognition in Changing Environments (VPRiCE) dataset, the method achieved a precision of 0.689, a recall of 0.798 and an F1-score of 0.740. The precision and F1-score are the best results reported so far for the VPRiCE dataset. Experiments show that the combination of retrieval and matching algorithms with detectors and descriptors insensitive to gradient reversal and contrast leads to both high accuracy and scalability.
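
    The reported F1-score is consistent with the reported precision and recall, as a short check shows. It assumes the usual definition of F1 as the harmonic mean of the two; the function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


f1 = f1_score(0.689, 0.798)
print(f"{f1:.3f}")  # prints 0.740
```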

WxBS: Wide Baseline Stereo Generalizations

  • Department: Department of Cybernetics
  • Abstract:
    We have presented a new problem -- the wide multiple baseline stereo (WxBS) -- which considers matching of images that simultaneously differ in more than one image acquisition factor such as viewpoint, illumination, sensor type or where object appearance changes significantly, e.g. over time. A new dataset with the ground truth for evaluation of matching algorithms has been introduced and will be made public. We have extensively tested a large set of popular and recent detectors and descriptors and show that the combination of RootSIFT and HalfRootSIFT as descriptors with MSER and Hessian-Affine detectors works best for many different nuisance factors. We show that simple adaptive thresholding improves Hessian-Affine, DoG, MSER (and possibly other) detectors and allows them to be used on infrared and low contrast images. A novel matching algorithm for addressing the WxBS problem has been introduced. We have shown experimentally that the WxBS-M matcher dominates the state-of-the-art methods on both the new and existing datasets.
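
    As a reminder of what the RootSIFT descriptor mentioned above does to a SIFT vector, here is a minimal sketch: L1-normalize the descriptor, then take the element-wise square root. The function name, the `eps` guard and the 4-element toy vector are assumptions made for illustration; HalfRootSIFT is not shown.

```python
import math


def root_sift(descriptor, eps=1e-12):
    """L1-normalize a non-negative descriptor, then take element-wise sqrt.

    eps guards against division by zero for an all-zero descriptor and is
    an assumption of this sketch.
    """
    l1 = sum(abs(v) for v in descriptor) + eps
    return [math.sqrt(abs(v) / l1) for v in descriptor]


# Toy 4-D stand-in for a 128-D SIFT histogram.
rooted = root_sift([1.0, 4.0, 4.0, 0.0])
squared_norm = sum(v * v for v in rooted)
```

    After this mapping the vector has unit L2 norm, so comparing two RootSIFT vectors with the Euclidean distance corresponds to comparing the original histograms with the Hellinger kernel.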

A Few Things One Should Know About Feature Extraction, Description and Matching

  • Department: Department of Cybernetics
  • Abstract:
    We explore the computational bottlenecks of the affine feature extraction process and show how this process can be sped up by 2-3 times with no or very modest loss of performance. With our improvements, the speed of the Hessian-Affine and MSER detectors is comparable with the similarity-invariant SURF and DoG-SIFT detectors. The improvements presented include a faster anisotropic patch extraction algorithm which does not depend on the feature scale, and a speed-up of feature dominant orientation estimation and SIFT descriptor computation using a look-up table. In the second part of the paper we explore the performance of the recently proposed first geometrically inconsistent nearest neighbour criterion and the dominant orientation generation process.

Matching of Images of Non-planar Objects with View Synthesis

  • Authors: Mgr. Dmytro Mishkin, Ph.D., prof. Ing. Jiří Matas, Ph.D.
  • Publication: SOFSEM 2014: Theory and Practice of Computer Science. Cham: Springer International Publishing AG, 2014. pp. 30-39. Lecture notes in computer science. ISSN 0302-9743. ISBN 978-3-319-04297-8.
  • Year: 2014
  • DOI: 10.1007/978-3-319-04298-5_4
  • Link: https://doi.org/10.1007/978-3-319-04298-5_4
  • Department: Department of Cybernetics
  • Abstract:
    We explore the performance of the recently proposed two-view image matching algorithms using affine view synthesis, ASIFT (Morel and Yu, 2009) [14] and MODS (Mishkin, Perdoch and Matas, 2013) [10], on images of objects that do not have significant local texture and that are locally not well approximated by planes. Experiments show that view synthesis improves matching results on images of such objects, but the number of useful synthetic views is lower than for planar object matching. The best detector for matching images of 3D objects is the Hessian-Affine in the Sparse configuration. The iterative MODS matcher performs comparably, confirming it is a robust, generic method for two-view matching that performs well for different types of scenes and a wide range of viewing conditions.

Two-view Matching with View Synthesis Revisited

  • DOI: 10.1109/IVCNZ.2013.6727054
  • Link: https://doi.org/10.1109/IVCNZ.2013.6727054
  • Department: Department of Cybernetics
  • Abstract:
    Wide-baseline matching focussing on problems with extreme viewpoint change is considered. We introduce the use of view synthesis with affine-covariant detectors to solve such problems and show that matching with the Hessian-Affine or MSER detectors outperforms the state-of-the-art ASIFT [19]. To minimise the loss of speed caused by view synthesis, we propose the Matching On Demand with view Synthesis algorithm (MODS) that uses progressively more synthesized images and more (time-consuming) detectors until reliable estimation of geometry is possible. We show experimentally that the MODS algorithm solves problems beyond the state-of-the-art and yet is comparable in speed to standard wide-baseline matchers on simpler problems. Minor contributions include an improved method for tentative correspondence selection, applicable both with and without view synthesis, and a view synthesis setup that greatly improves MSER robustness to blur and scale change while increasing its running time by only 10%.

Responsible for this page: Ing. Mgr. Radovan Suk