
Ing. Lukáš Neumann, Ph.D.

All publications

Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across Objects and Views

  • Authors: McCraith, R., Insafutdinov, E., Ing. Lukáš Neumann, Ph.D., Vedaldi, A.
  • Publication: 2022 International Conference on Robotics and Automation (ICRA). Piscataway: IEEE, 2022. p. 2411-2418. ISSN 1050-4729. ISBN 978-1-7281-9681-7.
  • Year: 2022
  • DOI: 10.1109/ICRA46639.2022.9811693
  • Link: https://doi.org/10.1109/ICRA46639.2022.9811693
  • Affiliation: Visual Recognition Group
  • Abstract:
    We present a system for automatically converting 2D object mask predictions and raw LiDAR point clouds into full 3D bounding boxes of objects. Because the LiDAR point clouds are partial, directly fitting bounding boxes to them is meaningless. Instead, we suggest that obtaining good results requires sharing information jointly between all objects in the dataset, over multiple frames. We then make three improvements to this baseline. First, we address ambiguities in predicting object rotations via direct optimization in this space while still backpropagating rotation prediction through the model. Second, we explicitly model outliers and task the network with learning their typical patterns, thus better discounting them. Third, we enforce temporal consistency when video data is available. With these contributions, our method significantly outperforms previous work, even though those methods use significantly more complex pipelines, 3D models and additional human-annotated external sources of prior information.

Pedestrian and Ego-vehicle Trajectory Prediction from Monocular Camera

  • Authors: Ing. Lukáš Neumann, Ph.D., Vedaldi, A.
  • Publication: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2021. p. 10199-10207. ISSN 2575-7075. ISBN 978-1-6654-4509-2.
  • Year: 2021
  • DOI: 10.1109/CVPR46437.2021.01007
  • Link: https://doi.org/10.1109/CVPR46437.2021.01007
  • Affiliation: Visual Recognition Group
  • Abstract:
    Predicting future pedestrian trajectories is a crucial component of autonomous driving systems, as recognizing critical situations based only on the current pedestrian position may come too late for any meaningful corrective action (e.g. braking) to take place. In this paper, we propose a new method to predict the future position of pedestrians with respect to a predicted future position of the ego-vehicle, thus giving an assistive/autonomous driving system sufficient time to respond. The method explicitly disentangles the actual movement of pedestrians in the real world from the ego-motion of the vehicle, using a future pose prediction network trained in a self-supervised fashion. This allows the method to observe and predict the intrinsic pedestrian motion in a normalised view that captures the same real-world location across multiple frames.
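
The ego-motion disentangling described above amounts to expressing pedestrian positions observed from a moving vehicle in one common frame. The sketch below assumes the ego-vehicle pose (x, y, yaw) in a fixed world frame is known for every frame and that pedestrians are already localised on the ground plane in the vehicle frame; both are simplifying assumptions (the paper predicts the future ego pose with a self-supervised network), and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # ego-vehicle (x, y, yaw) in a fixed world frame
Point = Tuple[float, float]        # ground-plane position in metres


def vehicle_to_world(p: Point, ego: Pose) -> Point:
    """Rotate a vehicle-frame point by the ego yaw, then translate by the ego position."""
    x, y, yaw = ego
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * p[0] - s * p[1], y + s * p[0] + c * p[1])


def world_to_vehicle(p: Point, ego: Pose) -> Point:
    """Inverse of vehicle_to_world: undo the translation, then the rotation."""
    x, y, yaw = ego
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = p[0] - x, p[1] - y
    return (c * dx + s * dy, -s * dx + c * dy)


def normalise_track(obs: List[Point], poses: List[Pose], ref: int) -> List[Point]:
    """Re-express each per-frame observation in the vehicle frame of frame `ref`.

    After this step a pedestrian standing still maps to the same point in every
    frame, so any remaining motion is the pedestrian's own.
    """
    return [world_to_vehicle(vehicle_to_world(p, e), poses[ref])
            for p, e in zip(obs, poses)]


# A static pedestrian at world position (10, 2), seen from a turning vehicle.
poses = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (2.0, 0.1, 0.2)]
obs = [world_to_vehicle((10.0, 2.0), e) for e in poses]
print(normalise_track(obs, poses, ref=0))  # every entry is (10, 2) up to rounding
```

The raw observations change from frame to frame only because the vehicle moves; after normalisation they coincide, which is the property the normalised view relies on.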

Real time monocular vehicle velocity estimation using synthetic data

  • Authors: McCraith, R., Ing. Lukáš Neumann, Ph.D., Vedaldi, A.
  • Publication: Proceedings of 2021 IEEE Intelligent Vehicles Symposium (IV). Piscataway: IEEE, 2021. p. 1406-1412. ISBN 978-1-7281-5394-0.
  • Year: 2021
  • DOI: 10.1109/IV48863.2021.9575204
  • Link: https://doi.org/10.1109/IV48863.2021.9575204
  • Affiliation: Visual Recognition Group
  • Abstract:
    Vision is one of the primary sensing modalities in autonomous driving. In this paper we look at the problem of estimating the velocity of road vehicles from a camera mounted on a moving car. Contrary to prior methods that train end-to-end deep networks to estimate the vehicles' velocity from the video pixels, we propose a two-step approach: first, an off-the-shelf tracker is used to extract vehicle bounding boxes, and then a small neural network regresses the vehicle velocity from the tracked bounding boxes. Surprisingly, we find that this still achieves state-of-the-art estimation performance, with the significant benefit of separating perception from dynamics estimation via a clean, interpretable and verifiable interface, which allows us to distill the statistics that are crucial for velocity estimation. We show that the latter can be used to easily generate synthetic training data in the space of bounding boxes, and we use this data to improve the performance of our method further.
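
Because the interface between perception and dynamics is just a sequence of bounding boxes, synthetic training tracks can be produced from a simple camera model. The sketch below is a hypothetical illustration, not the paper's method: it assumes a pinhole camera with focal length f in pixels and a known real vehicle height H in metres, so that distance is Z = f·H/h for a box of height h pixels. It generates box heights for a vehicle at constant relative velocity and recovers that velocity by inverting the model and fitting a line, where the paper instead trains a small neural network on the tracks.

```python
from typing import List


def synth_box_heights(z0: float, v: float, f: float, H: float,
                      fps: float, n: int) -> List[float]:
    """Synthetic bounding-box heights (pixels) for a vehicle of real height H (m)
    starting at distance z0 (m) with constant relative velocity v (m/s, positive
    means moving away), seen by a pinhole camera with focal length f (pixels)."""
    return [f * H / (z0 + v * i / fps) for i in range(n)]


def velocity_from_boxes(heights: List[float], f: float, H: float, fps: float) -> float:
    """Invert the pinhole model (Z = f*H/h) and fit a least-squares line Z(t);
    its slope is the relative velocity in m/s. Needs at least two heights."""
    ts = [i / fps for i in range(len(heights))]
    zs = [f * H / h for h in heights]
    n = len(ts)
    mt, mz = sum(ts) / n, sum(zs) / n
    num = sum((t - mt) * (z - mz) for t, z in zip(ts, zs))
    den = sum((t - mt) ** 2 for t in ts)
    return num / den


track = synth_box_heights(z0=20.0, v=3.0, f=720.0, H=1.5, fps=30.0, n=15)
print(round(velocity_from_boxes(track, f=720.0, H=1.5, fps=30.0), 6))  # 3.0
```

Noise, varying vehicle sizes or tracker jitter could be added to `synth_box_heights` to make such synthetic tracks more realistic; which perturbations the paper uses is not stated in the abstract.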

ICDAR2017 Robust Reading Challenge on COCO-Text

  • Authors: Gomez, R., Shi, B., Gomez, L., Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
  • Publication: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Los Alamitos: IEEE Computer Society, 2018. p. 1435-1443. ISSN 1520-5363. ISBN 978-1-5386-3586-5.
  • Year: 2018
  • DOI: 10.1109/ICDAR.2017.234
  • Link: https://doi.org/10.1109/ICDAR.2017.234
  • Affiliation: Visual Recognition Group
  • Abstract:
    This report presents the final results of the ICDAR 2017 Robust Reading Challenge on COCO-Text, a challenge on scene text detection and recognition based on the largest real scene text dataset currently available: the COCO-Text dataset. The competition is structured around three tasks: Text Localization, Cropped Word Recognition and End-To-End Recognition. The competition received a total of 27 submissions across the different open tasks. This report describes the datasets and the ground truth, details the performance evaluation protocols used, and presents the final results along with a brief summary of the participating methods.
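
Text localization in robust-reading challenges is commonly scored by matching detections to ground-truth boxes by overlap. The sketch below assumes axis-aligned boxes, greedy one-to-one matching in detection order and an intersection-over-union (IoU) threshold of 0.5; the protocol actually used is the one described in the report, and its details may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f(dets: List[Box], gts: List[Box],
                       thr: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching: each detection takes the still unmatched
    ground-truth box with the highest IoU, provided that IoU is >= thr."""
    matched = set()
    tp = 0
    for d in dets:
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j not in matched:
                o = iou(d, g)
                if o > best:
                    best, best_j = o, j
        if best_j >= 0 and best >= thr:
            matched.add(best_j)
            tp += 1
    p = tp / len(dets) if dets else 0.0
    r = tp / len(gts) if gts else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(precision_recall_f(dets, gts))  # (0.5, 0.5, 0.5)
```

Greedy matching depends on the order of the detections; processing them sorted by confidence is a common choice.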

Relaxed softmax: Efficient confidence auto-calibration for safe pedestrian detection

  • Authors: Ing. Lukáš Neumann, Ph.D., Vedaldi, A., Zisserman, A.
  • Publication: NIPS 2018 Workshop MLITS. Massachusetts: OpenReview.net / University of Massachusetts, 2018.
  • Year: 2018
  • Affiliation: Visual Recognition Group
  • Abstract:
    As machine learning moves from the lab into the real world, reliability is often of paramount importance. The clearest examples are safety-critical applications such as pedestrian detection in autonomous driving. Since algorithms can never be expected to be perfect in all cases, managing reliability becomes crucial. To this end, in this paper we investigate the problem of learning, in an end-to-end manner, object detectors that are accurate while providing an unbiased estimate of the reliability of their own predictions. We do so by proposing a modification of the standard softmax layer where a probabilistic confidence score is explicitly pre-multiplied into the incoming activations to modulate confidence. We adopt a rigorous assessment protocol based on reliability diagrams to evaluate the quality of the resulting calibration and show excellent results in pedestrian detection on two challenging public benchmarks.
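
One way to read "pre-multiplying a confidence score into the incoming activations" is to scale the logits by a learned confidence c in (0, 1) before a standard softmax, so that low confidence flattens the output distribution. The sketch below assumes c is the sigmoid of one extra network output; this parameterisation and the function names are assumptions, not necessarily the paper's. It also computes the expected calibration error (ECE), a common scalar summary of a reliability diagram with equal-width confidence bins.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def relaxed_softmax(logits: Sequence[float], conf_logit: float) -> List[float]:
    """Scale the logits by a confidence c = sigmoid(conf_logit) in (0, 1) before
    the softmax: c near 0 flattens the output towards uniform, c near 1
    approaches the ordinary softmax."""
    c = 1.0 / (1.0 + math.exp(-conf_logit))
    return softmax([c * v for v in logits])


def expected_calibration_error(confs: Sequence[float], correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins: List[list] = [[] for _ in range(n_bins)]
    for p, ok in zip(confs, correct):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, ok))
    total = len(confs)
    ece = 0.0
    for b in bins:
        if b:
            acc = sum(ok for _, ok in b) / len(b)
            avg = sum(p for p, _ in b) / len(b)
            ece += len(b) / total * abs(acc - avg)
    return ece


print(relaxed_softmax([2.0, 0.0], 0.0))  # flatter than softmax([2.0, 0.0])
```

A perfectly calibrated detector has an ECE of zero: within each bin, the fraction of correct predictions equals their mean confidence.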

Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework

  • DOI: 10.1109/ICCV.2017.242
  • Link: https://doi.org/10.1109/ICCV.2017.242
  • Affiliation: Visual Recognition Group
  • Abstract:
    A method for scene text localization and recognition is proposed. The novelties include: training of both text detection and recognition in a single end-to-end pass, the structure of the recognition CNN, and the geometry of its input layer, which preserves the aspect ratio of the text and adapts its resolution to the data. The proposed method achieves state-of-the-art accuracy in end-to-end text recognition on two standard datasets (ICDAR 2013 and ICDAR 2015), whilst being an order of magnitude faster than competing methods: the whole pipeline runs at 10 frames per second on an NVIDIA K80 GPU.
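
The input geometry described above, which keeps the text's aspect ratio and adapts the resolution to the data, can be illustrated by resizing each detected text region to a fixed height with a proportional, variable width. The 32-pixel height and the width bounds below are hypothetical values chosen for illustration, not parameters from the paper.

```python
from typing import Tuple


def recognition_input_size(box_w: float, box_h: float, target_h: int = 32,
                           min_w: int = 8, max_w: int = 1024) -> Tuple[int, int]:
    """Return (height, width) for the recognizer input: the height is fixed and
    the width is scaled by the same factor, so the aspect ratio is preserved
    (up to rounding and the clamping to [min_w, max_w])."""
    if box_w <= 0 or box_h <= 0:
        raise ValueError("box must have a positive size")
    w = round(box_w * target_h / box_h)
    return target_h, max(min_w, min(max_w, w))


print(recognition_input_size(120, 40))  # (32, 96)
```

Compared with warping every word to one fixed size, a variable width keeps long words from being squashed and short ones from being stretched.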

Real-Time Lexicon-Free Scene Text Localization and Recognition

  • DOI: 10.1109/TPAMI.2015.2496234
  • Link: https://doi.org/10.1109/TPAMI.2015.2496234
  • Affiliation: Visual Recognition Group
  • Abstract:
    An end-to-end real-time text localization and recognition method is presented. Its real-time performance is achieved by posing the character detection and segmentation problem as an efficient sequential selection from the set of Extremal Regions (ERs). The ER detector is robust against blur, low contrast, and illumination, color and texture variation. In the first stage, the probability of each ER being a character is estimated using features calculated by a novel algorithm in constant time, and only ERs with locally maximal probability are selected for the second stage, where the classification accuracy is improved using computationally more expensive features. A highly efficient clustering algorithm then groups ERs into text lines, and an OCR classifier trained on synthetic fonts is exploited to label character regions. The most probable character sequence is selected in the last stage, when the context of each character is known. The method was evaluated on three public datasets. On the ICDAR 2013 dataset the method achieves state-of-the-art results in text localization; on the more challenging SVT dataset, the proposed method significantly outperforms the state-of-the-art methods and demonstrates that the proposed pipeline can incorporate additional prior knowledge about the detected text. The proposed method was used as the baseline in the ICDAR 2015 Robust Reading competition, where it compares favourably to the state of the art.
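
Extremal Regions are the connected components of a thresholded image, taken over all thresholds. They can be enumerated in a single upward sweep of the threshold with union-find, updating simple per-region descriptors in constant time at each merge. The sketch below assumes 4-connectivity and regions of pixels with value at or below the threshold (run it on the inverted image for the opposite polarity); it tracks only area and bounding box and is not the paper's full incremental feature set or classifier. All names are hypothetical.

```python
from typing import Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y)


def extremal_regions(img: List[List[int]]) -> List[Tuple[int, int, BBox]]:
    """Enumerate extremal regions (4-connected components of pixels with value
    <= t) as the threshold t sweeps upwards. A region is reported at each
    threshold where it appears or grows, as (t, area, bbox)."""
    h, w = len(img), len(img[0])
    parent: Dict[int, int] = {}
    area: Dict[int, int] = {}
    bbox: Dict[int, BBox] = {}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if area[ra] < area[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        parent[rb] = ra
        # O(1) descriptor update: areas add, bounding boxes take min/max
        area[ra] += area[rb]
        a0, a1, a2, a3 = bbox[ra]
        b0, b1, b2, b3 = bbox[rb]
        bbox[ra] = (min(a0, b0), min(a1, b1), max(a2, b2), max(a3, b3))

    pixels = sorted((img[y][x], y, x) for y in range(h) for x in range(w))
    regions: List[Tuple[int, int, BBox]] = []
    k = 0
    while k < len(pixels):
        t = pixels[k][0]
        added = []
        while k < len(pixels) and pixels[k][0] == t:  # add all pixels of value t
            _, y, x = pixels[k]
            k += 1
            i = y * w + x
            parent[i], area[i], bbox[i] = i, 1, (x, y, x, y)
            added.append(i)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and ny * w + nx in parent:
                    union(i, ny * w + nx)
        for r in {find(i) for i in added}:  # components that changed at level t
            regions.append((t, area[r], bbox[r]))
    return sorted(regions)


img = [[0, 9, 0],
       [9, 9, 9],
       [0, 9, 5]]
for region in extremal_regions(img):
    print(region)
```

Each pixel is added once and each merge costs near-constant time, which is what makes per-region descriptors affordable for every threshold.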

A machine learning approach to hypothesis decoding in scene text recognition

  • Authors: Libovicky, J., Ing. Lukáš Neumann, Ph.D., Pecina, P., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Computer Vision -- ACCV 2014 Workshops. Heidelberg: Springer, 2015, pp. 169-180. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16630-8.
  • Year: 2015
  • DOI: 10.1007/978-3-319-16631-5_13
  • Link: https://doi.org/10.1007/978-3-319-16631-5_13
  • Affiliation: Department of Cybernetics
  • Abstract:
    Scene Text Recognition (STR) is the task of localizing and transcribing textual information captured in real-world images. With its increasing accuracy, it becomes a new source of textual data for standard Natural Language Processing tasks and poses new problems because of the specific nature of scene text. In this paper, we learn a string hypothesis decoding procedure in an STR pipeline using structured prediction methods that have proved useful in automatic Speech Recognition and Machine Translation. The model allows a wide range of typographical and language features to be employed in the decoding process. The proposed method is evaluated on a standard dataset and improves both character and word recognition performance over the baseline.
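
Decoding a string from per-position character hypotheses with a weighted combination of features can be sketched as a beam search over a linear score. The sketch below assumes just two features, the recognizer's log-probability and a character-bigram log-probability, with hand-set weights, a start symbol "^" and a fixed probability for unseen bigrams; in the paper the weights and a richer feature set are learned with structured-prediction methods, and every name and number here is hypothetical.

```python
import math
from typing import Dict, List, Tuple


def beam_decode(cands: List[List[Tuple[str, float]]],
                bigram: Dict[Tuple[str, str], float],
                w_ocr: float = 1.0, w_lm: float = 1.0,
                beam: int = 5, unseen: float = 1e-4) -> str:
    """cands[i] lists (character, OCR probability) hypotheses for position i.
    Each string is scored by w_ocr * sum(log p_ocr) + w_lm * sum(log p_bigram)."""
    hyps = [("", 0.0)]
    for options in cands:
        expanded = []
        for text, score in hyps:
            prev = text[-1] if text else "^"  # "^" marks the start of the word
            for ch, p in options:
                lm = bigram.get((prev, ch), unseen)
                expanded.append((text + ch,
                                 score + w_ocr * math.log(p) + w_lm * math.log(lm)))
        # keep only the `beam` best partial strings
        hyps = sorted(expanded, key=lambda hyp: hyp[1], reverse=True)[:beam]
    return hyps[0][0]


bigram = {("^", "c"): 0.5, ("c", "a"): 0.5, ("a", "t"): 0.5}
cands = [[("e", 0.6), ("c", 0.4)], [("a", 1.0)], [("t", 1.0)]]
print(beam_decode(cands, bigram, w_lm=0.0))  # eat (OCR scores only)
print(beam_decode(cands, bigram))            # cat (language feature flips it)
```

Learning the weights instead of fixing them by hand is what lets such a decoder balance many typographical and language features.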

Efficient Character Skew Rectification in Scene Text Images

  • Authors: Bušta, M., Drtina, T., Helekal, D., Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
  • Publication: Computer Vision -- ACCV 2014 Workshops. Heidelberg: Springer, 2015, pp. 134-146. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16630-8.
  • Year: 2015
  • DOI: 10.1007/978-3-319-16631-5_10
  • Link: https://doi.org/10.1007/978-3-319-16631-5_10
  • Affiliation: Department of Cybernetics
  • Abstract:
    We present an efficient method for character skew rectification in scene text images. The method is based on novel skew estimators, which exploit intuitive glyph properties and can be efficiently computed in linear time. The estimators are evaluated on synthetically generated data (including Latin, Cyrillic, Greek and Runic scripts) and on real scene text images, where skew rectification by the proposed method improves the accuracy of a state-of-the-art scene text recognition pipeline.
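
As a point of comparison, a glyph's skew can be estimated by brute force: try candidate shear factors and keep the one that makes the glyph's foreground pixels narrowest. This is a simpler, swapped-in estimator, not the paper's linear-time estimators; it assumes a binarised glyph given as (x, y) foreground pixel coordinates and a hypothetical grid of candidate shears, and it costs O(pixels × candidates).

```python
from typing import Iterable, List, Tuple


def estimate_shear(pixels: List[Tuple[int, int]], candidates: Iterable[float]) -> float:
    """Return the shear factor k for which x' = x - k*y gives the narrowest glyph."""
    def width(k: float) -> float:
        xs = [x - k * y for x, y in pixels]
        return max(xs) - min(xs)
    return min(candidates, key=width)


def deshear(pixels: List[Tuple[int, int]], k: float) -> List[Tuple[float, int]]:
    """Apply the inverse shear to rectify the glyph."""
    return [(x - k * y, y) for x, y in pixels]


# A slanted 2-pixel-wide bar: x grows by about 0.3 per row.
bar = [(round(0.3 * y) + dx, y) for y in range(20) for dx in range(2)]
print(estimate_shear(bar, [i / 10 for i in range(-5, 6)]))  # 0.3
```

A search like this grows with the number of candidate shears, which is the cost the paper's linear-time estimators avoid.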

Efficient Scene text localization and recognition with local character refinement

  • DOI: 10.1109/ICDAR.2015.7333861
  • Link: https://doi.org/10.1109/ICDAR.2015.7333861
  • Affiliation: Department of Cybernetics
  • Abstract:
    An unconstrained end-to-end text localization and recognition method is presented. The method detects initial text hypotheses in a single pass by an efficient region-based method and subsequently refines them using a more robust local text model, which deviates from the common assumption of region-based methods that all characters are detected as connected components.

FASText: Efficient Unconstrained Scene Text Detector

  • DOI: 10.1109/ICCV.2015.143
  • Link: https://doi.org/10.1109/ICCV.2015.143
  • Affiliation: Department of Cybernetics
  • Abstract:
    We propose a novel, easy-to-implement stroke detector based on an efficient comparison of pixel intensity to surrounding pixels. Stroke-specific keypoints are efficiently detected, and text fragments are subsequently extracted by local thresholding guided by keypoint properties. Classification based on efficiently calculated features then eliminates non-text regions.
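
A detector based on comparing a pixel's intensity with its surrounding pixels is in the spirit of the well-known FAST segment test, sketched below: a pixel fires if at least n contiguous pixels on a 16-pixel circle of radius 3 are all brighter, or all darker, than it by more than a threshold t. Here n = 9 corresponds to the common FAST-9 variant and t = 10 is an arbitrary illustrative threshold; FASText's own stroke-specific keypoint definition differs and is not reproduced here.

```python
from typing import List

# 16-pixel Bresenham circle of radius 3, as (dx, dy) offsets in circular order
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def segment_test(img: List[List[int]], x: int, y: int, t: int = 10, n: int = 9) -> bool:
    """True if at least n contiguous circle pixels are all brighter than
    img[y][x] + t or all darker than img[y][x] - t."""
    if not (3 <= x < len(img[0]) - 3 and 3 <= y < len(img) - 3):
        raise ValueError("pixel too close to the image border")
    p = img[y][x]
    labels = []
    for dx, dy in CIRCLE:
        q = img[y + dy][x + dx]
        labels.append(1 if q > p + t else -1 if q < p - t else 0)
    best = run = prev = 0
    for lab in labels + labels:  # doubled list handles runs that wrap around
        run = run + 1 if lab != 0 and lab == prev else (1 if lab != 0 else 0)
        prev = lab
        best = max(best, run)
    return min(best, len(CIRCLE)) >= n


# A dark vertical stroke on a bright 7x7 background, in two variants.
mid = [[0 if c == 3 else 200 for c in range(7)] for _ in range(7)]
end = [[0 if c == 3 and r >= 3 else 200 for c in range(7)] for r in range(7)]
print(segment_test(mid, 3, 3), segment_test(end, 3, 3))  # False True
```

With these settings the test ignores a pixel in the middle of a thin stroke but fires at the stroke's end point, which hints at why such comparisons can localise stroke-specific keypoints.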

ICDAR 2015 competition on Robust Reading

  • Authors: Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., prof. Ing. Jiří Matas, Ph.D., Ing. Lukáš Neumann, Ph.D., Chandrasekhar, V., Lu, S., Shafait, F., Uchida, S.
  • Publication: Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. Piscataway: IEEE, 2015. pp. 1156-1160. ISSN 1520-5363. ISBN 978-1-4799-1805-8.
  • Year: 2015
  • DOI: 10.1109/ICDAR.2015.7333942
  • Link: https://doi.org/10.1109/ICDAR.2015.7333942
  • Affiliation: Department of Cybernetics
  • Abstract:
    Results of the ICDAR 2015 Robust Reading Competition are presented. A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text. Challenge 4 is run on a newly acquired dataset of 1,670 images evaluating Text Localisation, Word Recognition and End-to-End pipelines. In addition, the dataset for Challenge 3 on Video Text has been substantially updated with more video sequences and more accurate ground truth data. Finally, tasks assessing End-to-End system performance have been introduced to all Challenges. The competition took place in the first quarter of 2015 and received a total of 44 submissions. Only the tasks newly introduced in 2015 are reported on. The datasets, the ground truth specification and the evaluation protocols are presented together with the results and a brief summary of the participating methods.

On Combining Multiple Segmentations in Scene Text Recognition

  • DOI: 10.1109/ICDAR.2013.110
  • Link: https://doi.org/10.1109/ICDAR.2013.110
  • Affiliation: Department of Cybernetics
  • Abstract:
    An end-to-end real-time scene text localization and recognition method is presented. The three main novel features are: (i) keeping multiple segmentations of each character until the very last stage of the processing, when the context of each character in a text line is known, (ii) an efficient algorithm for selection of character segmentations minimizing a global criterion, and (iii) showing that, despite using theoretically scale-invariant methods, operating on a coarse Gaussian scale-space pyramid yields improved results, as many typographical artifacts are eliminated. The method runs in real time and achieves state-of-the-art text localization results on the ICDAR 2011 Robust Reading dataset. Results are also reported for end-to-end text recognition on the ICDAR 2011 dataset.

Scene Text Localization and Recognition with Oriented Stroke Detection

  • DOI: 10.1109/ICCV.2013.19
  • Link: https://doi.org/10.1109/ICCV.2013.19
  • Affiliation: Department of Cybernetics
  • Abstract:
    An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation, efficiently calculated from the values obtained in the stroke detection phase, is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and to the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.

A Real-Time Scene Text to Speech System

  • DOI: 10.1007/978-3-642-33885-4_66
  • Link: https://doi.org/10.1007/978-3-642-33885-4_66
  • Affiliation: Department of Cybernetics
  • Abstract:
    An end-to-end real-time scene text localization and recognition method is demonstrated. The method localizes textual content in images, a video or a webcam stream, performs character recognition (OCR) and "reads" it out loud using a text-to-speech engine. The method has been recently published, achieves state-of-the-art results on public datasets and is able to recognize different fonts and scripts, including non-Latin ones. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs), which has linear computational complexity in the number of pixels in the image. Robustness to blur, noise, and illumination and color variations is also demonstrated. Finally, we show the effects of various control parameters.

Real-time scene text localization and recognition

  • Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
  • Publication: CVPR 2012: Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Computer Society Press, 2012. p. 3538-3545. ISSN 1063-6919. ISBN 978-1-4673-1228-8.
  • Year: 2012
  • DOI: 10.1109/CVPR.2012.6248097
  • Link: https://doi.org/10.1109/CVPR.2012.6248097
  • Affiliation: Department of Cybernetics
  • Abstract:
    An end-to-end real-time scene text localization and recognition method is presented. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs). The ER detector is robust to blur, illumination, color and texture variation and handles low-contrast text. In the first classification stage, the probability of each ER being a character is estimated using novel features calculated with O(1) complexity per region tested. Only ERs with locally maximal probability are selected for the second stage, where the classification is improved using more computationally expensive features. A highly efficient exhaustive search with feedback loops is then applied to group ERs into words and to select the most probable character segmentation. Finally, text is recognized in an OCR stage trained using synthetic fonts. The method was evaluated on two public datasets. On the ICDAR 2011 dataset, the method achieves state-of-the-art text localization results amongst published methods, and it is the first one to report results for end-to-end text recognition. On the more challenging Street View Text dataset, the method achieves state-of-the-art recall. The robustness of the proposed method against noise and low contrast of characters is demonstrated by false positives caused by detected watermark text in the dataset.
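
The selection of ERs "with locally maximal probability" can be illustrated on a single branch of the ER inclusion tree, i.e. the chain of nested regions obtained as the threshold increases. The sketch below assumes the first-stage probabilities along one such chain are already computed, treats a local maximum as strictly greater than both neighbours in the chain (an endpoint is compared with its single neighbour), and adds a hypothetical minimum-probability cut-off; the paper's exact selection rule may differ.

```python
from typing import List, Sequence


def local_maxima(probs: Sequence[float], min_prob: float = 0.0) -> List[int]:
    """Indices of regions along one inclusion chain whose character probability
    is strictly greater than that of both neighbours and at least min_prob."""
    idx = []
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("-inf")
        if p > left and p > right and p >= min_prob:
            idx.append(i)
    return idx


# Probabilities of nested regions, from the lowest threshold to the highest.
chain = [0.1, 0.7, 0.4, 0.2, 0.6, 0.5]
print(local_maxima(chain))                # [1, 4]
print(local_maxima(chain, min_prob=0.65)) # [1]
```

Keeping only these peaks means the expensive second-stage features are computed for a handful of regions per chain rather than for every threshold.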

A method for text localization and recognition in real-world images

  • Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
  • Publication: ACCV 2010: Proceedings of the 10th Asian Conference on Computer Vision, Part III. Heidelberg: Springer, 2011. p. 770-783. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-19317-0.
  • Year: 2011
  • Affiliation: Department of Cybernetics
  • Abstract:
    A method for affine rectification of a plane exploiting knowledge of relative scale changes is presented. The rectifying transformation is fully specified by the relative scale change at three non-collinear points or by two pairs of points where the relative scale change is known; the relative scale change between the pairs is not required. The method also allows homography estimation between two views of a planar scene from three point-with-scale correspondences. The proposed method is simple to implement and without parameters; linear and thus supporting (algebraic) least squares solutions; and general, without restrictions on either the shape of the corresponding features or their mutual position. The wide applicability of the method is demonstrated on text rectification, detection of repetitive patterns, texture normalization and estimation of homography from three point-with-scale correspondences.
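
Why relative scale gives a linear problem can be seen from the homography that maps the plane's vanishing line l = (l1, l2, l3) to infinity, H = [[1, 0, 0], [0, 1, 0], [l1, l2, l3]]. Its Jacobian determinant at image point (x, y) is l3 / (l1·x + l2·y + l3)^3. If features of equal true size appear with image areas a_i, equal areas after rectification require a_i ∝ (l1·x_i + l2·y_i + l3)^3, so a_i^(1/3) = l1·x_i + l2·y_i + l3 up to a common scale, which is linear in l. The sketch below covers only this single-view, three-or-more-point case, solving the normal equations by Gauss-Jordan elimination; the function names are hypothetical and the paper's formulation may differ in detail.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def vanishing_line_from_scales(pts: Sequence[Tuple[float, float]],
                               areas: Sequence[float]) -> Vec3:
    """Least-squares l = (l1, l2, l3), up to scale, from
    a_i**(1/3) = l1*x_i + l2*y_i + l3. Needs three or more non-collinear points."""
    rows = [(x, y, 1.0) for x, y in pts]
    rhs = [a ** (1.0 / 3.0) for a in areas]
    # Augmented normal equations [A^T A | A^T b] as a 3x4 matrix
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * b for r, b in zip(rows, rhs))] for i in range(3)]
    tol = 1e-12 * max(abs(v) for row in m for v in row[:3])
    for c in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        if abs(m[c][c]) < tol:
            raise ValueError("degenerate configuration (e.g. collinear points)")
        for r in range(3):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return (m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2])


def rectify(pt: Tuple[float, float], l: Vec3) -> Tuple[float, float]:
    """Apply H = [[1,0,0],[0,1,0],[l1,l2,l3]] to an image point."""
    w = l[0] * pt[0] + l[1] * pt[1] + l[2]
    return (pt[0] / w, pt[1] / w)


# Synthetic check: areas generated from a known line, up to an unknown scale 5.
true_l = (0.001, 0.002, 1.0)
pts = [(10.0, 20.0), (300.0, 40.0), (50.0, 250.0), (200.0, 200.0)]
areas = [5.0 * (true_l[0] * x + true_l[1] * y + true_l[2]) ** 3 for x, y in pts]
l = vanishing_line_from_scales(pts, areas)
print(l[0] / l[2], l[1] / l[2])  # approximately 0.001 and 0.002
```

With more than three points the same normal equations give the algebraic least-squares solution mentioned in the abstract.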

Estimating hidden parameters for text localization and recognition

  • Affiliation: Department of Cybernetics
  • Abstract:
    A new method for text line formation for text localization and recognition is proposed. The method exhaustively enumerates short sequences of character regions in order to infer values of hidden text line parameters (such as text direction) and applies the parameters to efficiently limit the search space for longer sequences. The exhaustive enumeration of short sequences is achieved by finding all character region triplets that fulfill constraints of textual content, which keeps the proposed method efficient yet still capable of performing a robust estimation of the hidden parameters in order to correctly initialize the search. The method is applied to character regions which are detected as Maximally Stable Extremal Regions (MSERs). The performance of the method is evaluated on the standard ICDAR 2003 dataset, where the method outperforms (precision 0.60, recall 0.60) a previously published method for text line formation of MSERs.
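
The triplet enumeration above can be sketched as follows: take every triple of character regions, keep those with similar heights and nearly collinear centres, and read a text-direction estimate off each accepted triple. The sketch assumes regions are given as (centre x, centre y, height) and uses hypothetical tolerances; the paper's actual textual constraints and parameters are not given in the abstract.

```python
import itertools
import math
from typing import List, Tuple

Region = Tuple[float, float, float]  # (centre x, centre y, height)


def direction_from_triplets(regions: List[Region], max_height_ratio: float = 1.5,
                            max_dev: float = 0.2) -> List[float]:
    """Text-direction angles (radians) from all admissible triplets.

    A triplet is admissible if its heights differ by at most max_height_ratio
    and the middle centre lies within max_dev mean heights of the line joining
    the outer two centres."""
    angles = []
    # sorting by x means each combination (a, b, c) runs left to right
    for a, b, c in itertools.combinations(sorted(regions), 3):
        hs = [a[2], b[2], c[2]]
        if max(hs) / min(hs) > max_height_ratio:
            continue
        dx, dy = c[0] - a[0], c[1] - a[1]
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        # perpendicular distance of the middle centre from the line a-c
        dev = abs(dx * (b[1] - a[1]) - dy * (b[0] - a[0])) / length
        if dev / (sum(hs) / 3) > max_dev:
            continue
        angles.append(math.atan2(dy, dx))
    return angles


regions = [(0, 0, 10), (12, 1, 10), (24, 2, 11), (30, 40, 10)]
print(direction_from_triplets(regions))  # one angle, atan2(2, 24)
```

Because only triples are enumerated, the cost stays cubic in the number of nearby regions, and the recovered direction can then prune the search for longer sequences.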

Text Localization in Real-World Images Using Efficiently Pruned Exhaustive Search

  • DOI: 10.1109/ICDAR.2011.144
  • Link: https://doi.org/10.1109/ICDAR.2011.144
  • Affiliation: Department of Cybernetics
  • Abstract:
    An efficient method for text localization and recognition in real-world images is proposed. Thanks to effective pruning, it is able to exhaustively search the space of all character sequences in real time (200 ms on a 640x480 image). The method exploits higher-order properties of text such as word text lines. We demonstrate that the grouping stage plays a key role in the text localization performance and that a robust and precise grouping stage is able to compensate for errors of the character detector. The method includes a novel selector of Maximally Stable Extremal Regions (MSERs) which exploits region topology. Experimental validation shows that 95.7% of characters in the ICDAR dataset are detected using the novel selector of MSERs with a low sensitivity threshold. The proposed method was evaluated on the standard ICDAR 2003 dataset, where it achieved state-of-the-art results in both text localization and recognition.

Page maintained by: Ing. Mgr. Radovan Suk