People

Ing. Štěpán Obdržálek, Ph.D.

All publications

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects

  • DOI: 10.1109/WACV.2017.103
  • Link: https://doi.org/10.1109/WACV.2017.103
  • Department: Visual Recognition Group
  • Abstract:
    We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and with a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.

Design and Evaluation of an Interactive Exercise Coaching System for Older Adults: Lessons Learned

  • Authors: Ofli, F., Kurillo, G., Ing. Štěpán Obdržálek, Ph.D., Bajcsy, R., Jimison, H.B., Pavel, M.
  • Publication: IEEE Journal of Biomedical and Health Informatics. 2016, 20(1), 201-212. ISSN 2168-2194.
  • Year: 2016
  • DOI: 10.1109/JBHI.2015.2391671
  • Link: https://doi.org/10.1109/JBHI.2015.2391671
  • Department: Visual Recognition Group
  • Abstract:
    Although the positive effects of exercise on the well-being and quality of independent living for older adults are well accepted, many elderly individuals lack access to exercise facilities, or the skills and motivation to perform exercise at home. To provide a more engaging environment that promotes physical activity, various fitness applications have been proposed. Many of the available products, however, are geared toward a younger population and are not appropriate or engaging for an older population. To address these issues, we developed an automated interactive exercise coaching system using the Microsoft Kinect. The coaching system guides users through a series of video exercises, tracks and measures their movements, provides real-time feedback, and records their performance over time. Our system consists of exercises to improve balance, flexibility, strength, and endurance, with the aim of reducing fall risk and improving performance of daily activities. In this paper, we report on the development of the exercise system, discuss the results of our recent field pilot study with six independently living elderly individuals, and highlight the lessons learned relating to the in-home system setup, user tracking, feedback, and exercise performance evaluation.

Hessian Interest Points on GPU

  • Department: Department of Computer Graphics and Interaction, Visual Recognition Group
  • Abstract:
    This paper is about interest point detection and GPU programming. We take SiftGPU, a popular GPGPU implementation of SIFT (the de facto standard in fast interest point detectors), and implement modifications that, according to recent research, improve the repeatability of the detected points. The interest points found at local extrema of the Difference of Gaussians (DoG) function in the original SIFT are replaced by the local extrema of the determinant of the Hessian matrix of the intensity function. Experimentally we show that the GPU implementation of the Hessian-based detector (i) surpasses the original DoG-based implementation in repeatability, (ii) gives results very close to those of a reference CPU implementation, and (iii) is significantly faster than the CPU implementation. We show what speedup is achieved for different image sizes and provide an analysis of the computational cost of individual steps of the algorithm. The source code is publicly available.
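
    The determinant-of-Hessian response mentioned in the abstract can be illustrated with a minimal CPU sketch. It is not the SiftGPU-based implementation: the function names, the plain finite-difference stencils, the single scale and the 8-neighbour maximum test are all assumptions made for illustration.

```python
def hessian_determinant(img):
    """Return det(H) at each interior pixel using central finite differences.

    img is a list of rows of floats; border pixels are left at 0.0.
    (Hypothetical helper: single scale, no Gaussian smoothing.)
    """
    h, w = len(img), len(img[0])
    det = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Second derivatives of the intensity function.
            dxx = img[y][x + 1] - 2.0 * img[y][x] + img[y][x - 1]
            dyy = img[y + 1][x] - 2.0 * img[y][x] + img[y - 1][x]
            dxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            det[y][x] = dxx * dyy - dxy * dxy
    return det


def local_maxima(resp, threshold=0.0):
    """Return (x, y) positions whose response exceeds all 8 neighbours."""
    h, w = len(resp), len(resp[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = resp[y][x]
            if v <= threshold:
                continue
            neighbours = [resp[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dx, dy) != (0, 0)]
            if all(v > n for n in neighbours):
                points.append((x, y))
    return points
```

    A full detector would additionally search over scale; this sketch only shows where the response comes from.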

On Evaluation of 6D Object Pose Estimation

  • DOI: 10.1007/978-3-319-49409-8_52
  • Link: https://doi.org/10.1007/978-3-319-49409-8_52
  • Department: Visual Recognition Group
  • Abstract:
    A pose of a rigid object has 6 degrees of freedom and its full knowledge is required in many robotic and scene understanding applications. Evaluation of 6D object pose estimates is not straightforward. Object pose may be ambiguous due to object symmetries and occlusions, i.e. there can be multiple object poses that are indistinguishable in the given image and should therefore be treated as equivalent. The paper defines 6D object pose estimation problems, proposes an evaluation methodology and introduces three new pose error functions that deal with pose ambiguity. The new error functions are compared with functions commonly used in the literature and shown to remove certain types of non-intuitive outcomes. Evaluation tools are provided at: https://github.com/thodan/obj_pose_eval

Detection and Fine 3D Pose Estimation of Texture-less Objects in RGB-D Images

  • Authors: Hodaň, T., Zabulis, X., Lourakis, M., Ing. Štěpán Obdržálek, Ph.D., prof. Ing. Jiří Matas, Ph.D.
  • Publication: IROS 2015: Proceedings IEEE/RSJ International Conference on Intelligent Robots and Systems. Los Alamitos: IEEE Computer Society, 2015. p. 4421-4428. ISSN 2153-0858. ISBN 978-1-4799-9994-1.
  • Year: 2015
  • DOI: 10.1109/IROS.2015.7354005
  • Link: https://doi.org/10.1109/IROS.2015.7354005
  • Department: Department of Cybernetics
  • Abstract:
    Despite their ubiquitous presence, texture-less objects present significant challenges to contemporary visual object detection and localization algorithms. This paper proposes a practical method for the detection and accurate 3D localization of multiple texture-less and rigid objects depicted in RGB-D images. The detection procedure adopts the sliding window paradigm, with an efficient cascade-style evaluation of each window location. A simple pre-filtering is performed first, rapidly rejecting most locations. For each remaining location, a set of candidate templates (i.e. trained object views) is identified with a voting procedure based on hashing, which makes the method's computational complexity largely unaffected by the total number of known objects. The candidate templates are then verified by matching feature points in different modalities. Finally, the approximate object pose associated with each detected template is used as a starting point for a stochastic optimization procedure that estimates accurate 3D pose. Experimental evaluation shows that the proposed method yields a recognition rate comparable to the state of the art, while its complexity is sub-linear in the number of templates.

A Voting Strategy for Visual Ego-Motion from Stereo

  • DOI: 10.1109/IVS.2010.5548093
  • Link: https://doi.org/10.1109/IVS.2010.5548093
  • Department: Department of Cybernetics
  • Abstract:
    We present a procedure for egomotion estimation from visual input of a stereo pair of video cameras. The 3D egomotion problem, which has six degrees of freedom in general, is simplified to four dimensions and further decomposed into two two-dimensional subproblems. The decomposition allows us to use a voting strategy to identify the most probable solution, avoiding random sampling (RANSAC) or other approximation techniques. The input consists of image correspondences between consecutive stereo pairs, i.e. feature points do not need to be tracked over time. The experiments show that even if a trajectory is put together as a simple concatenation of frame-to-frame increments, it comes out reliable and precise.

Integrated vision system for the semantic interpretation of activities where a person handles objects

  • DOI: 10.1016/j.cviu.2008.10.008
  • Link: https://doi.org/10.1016/j.cviu.2008.10.008
  • Department: Department of Cybernetics
  • Abstract:
    Interpretation of human activity is primarily known from surveillance and video analysis tasks and concerned with the persons alone. In this paper we present an integrated system that gives a natural language interpretation of activities where a person handles objects. The system integrates low-level image components such as hand and object tracking, detection and recognition, with high-level processes such as spatio-temporal object relationship generation, posture and gesture recognition, and activity reasoning. A task-oriented approach focuses processing to achieve near real-time and to react depending on the situation context.

Dense Linear-Time Correspondences for Tracking

  • Department: Department of Cybernetics
  • Abstract:
    A novel method is proposed for the problem of frame-to-frame correspondence search in video sequences. The method, based on hashing of low-dimensional image descriptors, establishes dense correspondences and allows large motions. All image pixels are considered for matching, and the notion of interest points is reviewed. In our formulation, points of interest are those that can be reliably matched. Their saliency depends on properties of the chosen matching function and on actual image content. Both computational time and memory requirements of the correspondence search are asymptotically linear in the number of image pixels, irrespective of correspondence density and of image content. All steps of the method are simple and allow for a hardware implementation. Functionality is demonstrated on sequences taken from a vehicle moving in an urban environment.
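
    The idea of matching by hashing descriptors can be sketched as follows. This is only an illustration under stated assumptions: descriptors are taken to be already quantised into hashable tuples, and a pixel counts as "reliably matched" when its descriptor is unique in both frames. Neither the descriptor nor this uniqueness criterion is taken from the paper, and the function names are hypothetical.

```python
from collections import defaultdict


def build_table(descriptors):
    """Map each descriptor (a hashable tuple) to the pixels that carry it."""
    table = defaultdict(list)
    for pixel, desc in descriptors.items():
        table[desc].append(pixel)
    return table


def match_frames(desc_a, desc_b):
    """Return {pixel_in_a: pixel_in_b} for descriptors unique in both frames.

    One pass builds each table and one pass reads them, so expected time
    and memory grow linearly with the number of pixels.
    """
    table_a = build_table(desc_a)
    table_b = build_table(desc_b)
    matches = {}
    for desc, pixels_a in table_a.items():
        pixels_b = table_b.get(desc)
        if pixels_b is not None and len(pixels_a) == 1 and len(pixels_b) == 1:
            matches[pixels_a[0]] = pixels_b[0]
    return matches
```

    Because lookup does not depend on the distance between the two pixel positions, large motions cost no more than small ones in this scheme.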

Stable Affine Frames on Isophotes

  • Department: Department of Cybernetics
  • Abstract:
    We propose a new affine-covariant feature, the Stable Affine Frame (SAF). SAFs lie on the boundary of extremal regions, i.e. on isophotes. Instead of requiring the whole isophote to be stable with respect to intensity perturbation as in maximally stable extremal regions (MSERs), stability is required only locally, for the primitives constituting the three-point frames. The primitives are extracted by an affine invariant process that exploits properties of bitangents and algebraic moments. Thus, instead of using closed stable isophotes, i.e. MSERs, and detecting affine frames on them, SAFs are sought even on some unstable extremal regions. We show experimentally on standard datasets that SAFs have repeatability comparable to the best affine covariant detectors and consistently produce a significantly higher number of features per image. Moreover, the features cover images more evenly than MSERs, which facilitates robustness to occlusion.

3D Geometry from Uncalibrated Images

  • Department: Department of Cybernetics
  • Abstract:
    We present an automatic pipeline for recovering the geometry of a 3D scene from a set of unordered, uncalibrated images. The contributions in the paper are the presentation of the system as a whole, from images to geometry, the estimation of the local scale for various scene components in the orientation-topology module, the procedure for orienting the cloud components, and the method for dealing with points of contact. The methods are aimed to process complex scenes and nonuniformly sampled, noisy data sets.

Object Recognition using Local Affine Frames on Maximally Stable Extremal Regions

  • Department: Department of Cybernetics
  • Abstract:
    The chapter focuses on a method exploiting local coordinate systems (local affine frames) established on maximally stable extremal regions. We provide a taxonomy of affine-covariant constructions of local coordinate systems, prove their affine covariance and present algorithmic details on their computation.

On the Stability of Local Affine Frames for the Correspondence Problem

  • Department: Department of Cybernetics
  • Abstract:
    This paper presents an overview and a classification of affine-covariant constructions of local coordinate systems (frames), proves the affine covariance of the constructions, and gives details on their computation. Then a technique to avoid generating an unnecessarily large number of frames is proposed, which identifies the frames with the highest probability of also being generated (repeated) in other images. Ordering of frames by expected repeatability provides a simple, single-parameter way of controlling the number of generated frames.

Sub-linear Indexing for Large Scale Object Recognition

  • Department: Department of Cybernetics
  • Abstract:
    Realistic approaches to large scale object recognition, i.e. for detection and localisation of hundreds or more objects, must support sub-linear time indexing. In the paper, we propose a method capable of recognising one of N objects in log(N) time. The "visual memory" is organised as a binary decision tree that is built to minimise average time to decision. Leaves of the tree represent a few local image areas, and each non-terminal node is associated with a "weak classifier". In the recognition phase, a single invariant measurement decides in which subtree a corresponding image area is sought. The method preserves all the strengths of local affine region methods: robustness to background clutter, occlusion, and large changes of viewpoints. Experimentally we show that it supports near real-time recognition of hundreds of objects with state-of-the-art recognition rates. After the test image is processed (in a second on a current PC), the recognition via indexing into the visual memory

Enhancing RANSAC by Generalized Model Optimization

  • Department: Department of Cybernetics
  • Abstract:
    An extension of the RANSAC procedure is proposed. By adding a generalized model optimization step (the LO step) applied only to models with a score (quality) better than all previous ones, an algorithm with the following desirable properties is obtained: a near perfect agreement with theoretical (i.e. optimal) performance and lower sensitivity to noise and poor conditioning. The chosen scheduling strategy is shown to guarantee that the optimization step is applied so rarely that it has minimal impact on the execution time.
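
    The scheduling idea in the abstract, running the optimisation step only when a new best model appears, can be sketched on a toy problem. The example below fits a 2D line and uses a least-squares refit on the current inliers as the local optimisation. The line model, the tolerance, the fixed iteration count and all function names are assumptions for illustration and not the paper's generalized model optimization.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a*x + b; returns None for degenerate input."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if n < 2 or abs(denom) < 1e-12:
        return None
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


def inliers(model, points, tol):
    """Return the points whose vertical residual is within tol."""
    a, b = model
    return [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]


def lo_ransac_line(points, tol=0.1, iterations=200, seed=0):
    """RANSAC with a local optimisation step on each new best model."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_line(rng.sample(points, 2))  # minimal sample
        if model is None:
            continue
        support = inliers(model, points, tol)
        if len(support) > len(best_inliers):
            best_model, best_inliers = model, support
            # Local optimisation: run only when a new best model appears.
            refined = fit_line(support)
            if refined is not None:
                refined_support = inliers(refined, points, tol)
                if len(refined_support) > len(best_inliers):
                    best_model, best_inliers = refined, refined_support
    return best_model, best_inliers
```

    Since the best-so-far score improves only rarely over a run, the refit is executed far less often than the sampling loop, which is why its cost stays small.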

Geometric and photometric image stabilization for detection of significant events in video from a low flying Unmanned Aerial Vehicles

  • Department: Department of Cybernetics
  • Abstract:
    On-line video sequences acquired by cameras on board a small surveillance plane are very unstable. As a first step facilitating visual interpretation, a dynamic adaptation of brightness and contrast has been designed and implemented. Secondly, stabilisation of camera movement is achieved. After stabilisation, moving objects are identified. Finally, objects of interest, whose models are automatically built from example images, are recognised and localised.

Object recognition methods based on transformation covariant features

  • Department: Department of Cybernetics
  • Abstract:
    Methods based on distinguished regions (transformation covariant detectable patches) have achieved considerable success in a range of object recognition, retrieval and matching problems, in still images and videos. We review the state of the art, describe their relationship to other recognition methods, analyse their strengths and weaknesses, and present examples of successful applications.

Epipolar Geometry from Three Correspondences

  • Department: Department of Cybernetics
  • Abstract:
    In this paper, LO-RANSAC 3-LAF, a new algorithm for the correspondence problem, is described. Exploiting processes proposed for computation of affine-invariant local frames, three point-to-point correspondences are found for each region-to-region correspondence. Consequently, it is sufficient to select only triplets of region correspondences in the hypothesis stage of epipolar geometry estimation by RANSAC.

Image Retrieval Using Local Compact DCT-Based Representation

  • Department: Department of Cybernetics
  • Abstract:
    An image retrieval system based on local affine frames is introduced. The system provides highly discriminative retrieval of rigid objects under a very wide range of viewing and illumination conditions, and is robust to occlusion and background clutter. Distinguished regions of data dependent shape are detected, and local affine frames (coordinate systems) are obtained. Photometrically and geometrically normalised image patches are extracted and used for matching. Local correspondences are formed either by direct comparison of photometrically normalised colour intensities in the normalised patches, or by comparison of DCT (discrete cosine transform) coefficients of the patches. Experimental results are presented on a publicly available database of real outdoor images of buildings. We demonstrate the effect of the number of DCT coefficients that are used for the matching. Using the DCT, retrieval performance of 100% in rank 1 is achieved, and memory usage is reduced by a factor of 4.
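
    A compact DCT-based patch descriptor of the kind described above might look like the following sketch. It assumes a square greyscale patch, an orthonormal 2D DCT-II, and selection of the lowest-frequency coefficients ordered by u + v. The coefficient ordering, the distance measure and the function names are illustrative assumptions and are not taken from the paper.

```python
import math


def dct_2d(patch):
    """Orthonormal 2D DCT-II of a square patch (list of rows of floats)."""
    n = len(patch)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (patch[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def compact_descriptor(patch, keep):
    """Keep the `keep` lowest-frequency coefficients, ordered by u + v."""
    coeffs = dct_2d(patch)
    n = len(coeffs)
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda uv: (uv[0] + uv[1], uv[0]))
    return [coeffs[u][v] for u, v in order[:keep]]


def distance(d1, d2):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))
```

    Storing only `keep` coefficients instead of all n*n pixel values is what reduces memory; the number kept is the parameter whose effect the abstract says is studied.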

Obecné systémy rozpoznávání objektů ve snímcích a videosekvencích (General Systems for Object Recognition in Images and Video Sequences)

  • Department: Department of Cybernetics
  • Abstract:
    The article describes a general method for recognising objects in images. An advantage of the method is that it makes few a priori assumptions about the nature of the object. The method is therefore suitable for recognising arbitrary objects.

On the Interaction between Object Recognition and Colour Constancy

  • Department: Department of Cybernetics
  • Abstract:
    In this paper we investigate some aspects of the interaction between colour constancy and object recognition. We demonstrate that even under severe changes of illumination, many objects are reliably recognised when relying only on geometry and on an invariant representation of local colour appearance. We feel that colour constancy as a preprocessing step of an object recognition algorithm is important only in cases when colour is the major (or the only available) clue for object discrimination. We also show that successful object recognition allows for "colour constancy by recognition" - an approach where the global photometric transformation is estimated from locally corresponding image patches.

Learning Parameters of a Recognition System Based on Local Affine Frames

  • Department: Department of Cybernetics
  • Abstract:
    An approach to object recognition, based on matching of local image features, is presented. First, distinguished regions of data-dependent shape are robustly detected. On these regions, local affine frames are established using several affine invariant constructions. Direct comparison of photometrically normalised colour intensities in local, geometrically aligned frames results in a matching scheme that is invariant to piecewise-affine image deformations, but still remains very discriminative. Nevertheless, invariance to a wide range of local geometric and photometric transformations reduces the discriminative power - not all possible transformations are equiprobable. Probability of the transformations is estimated from matches established by the invariant method on the training data. The estimate is exploited in the recognition phase to favour local correspondences with more likely transformations.

Local Affine Frames for Image Retrieval

  • Department: Department of Cybernetics
  • Abstract:
    A novel approach to content-based image retrieval is presented. The method supports recognition of objects under a very wide range of viewing and illumination conditions and is robust to occlusion and background clutter. Starting from robustly detected 'distinguished regions' of data dependent shape, local affine frames are established by affine-invariant constructions exploiting invariant properties of the second moment matrix and bi-tangent points. Direct comparison of photometrically normalised colour intensities in normalised frames facilitates robust, affine and illumination invariant, but still very selective matching. The potential of the proposed approach is experimentally verified on FOCUS - a publicly available image database - using a standard set of query images. The results obtained are superior to the state of the art. The method operates successfully on images with complex background, where the sought object covers only a fraction (around 2%) of the database image.

Local Affine Frames for Wide-Baseline Stereo

  • Department: Department of Cybernetics
  • Abstract:
    A novel procedure for establishing wide-baseline correspondence is introduced. Tentative correspondences are established by matching photometrically normalised colour measurements represented in a local affine frame. The affine frames are obtained by a number of affine invariant constructions on robustly detected maximally stable extremal regions of data-dependent shape. Several processes for local affine frame construction are proposed and proved affine covariant. The potential of the proposed approach is demonstrated on demanding wide-baseline matching problems. Correspondence between two views taken from different viewpoints and camera orientations as well as at very different scales is reliably established. For the scale change present (a factor more than 3), the zoomed-in image covers less than 10% of the wider view.

Object Recognition using Local Affine Frames on Distinguished Regions

  • Department: Department of Cybernetics
  • Abstract:
    A novel approach to appearance based object recognition is introduced. The proposed method, based on matching of local image features, reliably recognises objects under very different viewing conditions. First, distinguished regions of data-dependent shape are robustly detected. On these regions, local affine frames are established using several affine invariant constructions. Direct comparison of photometrically normalised colour intensities in local, geometrically aligned frames results in a matching scheme that is invariant to piecewise-affine image deformations, but still remains very discriminative. The potential of the approach is experimentally verified on public databases. On SOIL-47, 100% recognition rate is achieved for single training view per object. On COIL-100, 99.9% recognition rate is obtained for 18 training views per object. Robustness to occlusions is demonstrated by only a moderate decrease of performance in an experiment where half of each test image is erased.

Improvement of Oliva's Algorithm for Surface Reconstruction from Contours

  • Department: Department of Computer Science
  • Abstract:
    Oliva proposed an interesting algorithm for 3D surface reconstruction from contours in parallel cross-sections. As stated in the article [9], the algorithm constructs the surface for any non-self-intersecting contour shape, including shapes with holes, by adding an appropriate number of intermediate cross-sections between complicated contours and triangulating every pair of contours in different slices separately. For this task a straight skeleton (Angular Bisector Network) [1, 9] is exploited. We have implemented Oliva's algorithm and found cases that are not handled properly. The resulting surface can contain overhangs, and self-intersecting triangles can occur. We propose a modification of the triangulation step in Oliva's algorithm that handles differently the cases in which overhangs (artifacts looking like folds) can appear and generates an overhang-free surface. By excluding overhangs we also prevent the creation of a degenerate surface with mutually intersecting triangles.

Straight Skeleton Implementation

  • Department: Department of Computer Science
  • Abstract:
    Straight skeleton (Angular Bisector Network, ABN) of a planar polygon, which can be grasped as a modification of a planar Voronoi diagram without parabolic arcs, has been successfully used by Oliva et al. as a part of a system for three dimensional reconstruction of objects from a given set of 2D contours in parallel cross sections. The algorithm itself is used for the construction of intermediate contour layers during the reconstruction process, in order not to create self intersected surface triangles or a surface with holes. But Oliva's algorithm is not publicly available and we have not found any other useful code on the net. We have followed our older ideas and implemented our version of straight skeleton. Our algorithm runs in O(nm + n log n) time, where n denotes the total number of polygon vertices and m the number of reflex ones.
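
    The running time above depends on m, the number of reflex vertices. As a small aid to reading that bound, the sketch below counts reflex vertices of a simple polygon using the sign of the cross product at each vertex. It is not part of the straight skeleton algorithm itself, and the function names are hypothetical.

```python
def signed_area(polygon):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def count_reflex_vertices(polygon):
    """Count vertices of a simple polygon whose interior angle exceeds 180 degrees."""
    n = len(polygon)
    # Normalise the turn sign so that the test works for either orientation.
    orientation = 1.0 if signed_area(polygon) > 0 else -1.0
    reflex = 0
    for i in range(n):
        px, py = polygon[i - 1]
        cx, cy = polygon[i]
        nx, ny = polygon[(i + 1) % n]
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        if cross * orientation < 0:
            reflex += 1
    return reflex
```

    For a convex polygon m is 0 and the stated bound reduces to O(n log n).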

Responsible for this page: Ing. Mgr. Radovan Suk