Ing. Lukáš Neumann, Ph.D.

Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across Objects and Views

Authors: McCraith, R., Insafutdinov, E., Ing. Lukáš Neumann, Ph.D., Vedaldi, A.
Publication: 2022 International Conference on Robotics and Automation (ICRA). Piscataway: IEEE, 2022. p. 2411-2418. ISSN 1050-4729. ISBN 978-1-7281-9681-7.
Year: 2022

DOI: 10.1109/ICRA46639.2022.9811693
Link: https://doi.org/10.1109/ICRA46639.2022.9811693
Affiliation: Visual Recognition Group
Abstract:
We present a system for automatically converting 2D mask object predictions and raw LiDAR point clouds into full 3D bounding boxes of objects. Because the LiDAR point clouds are partial, directly fitting bounding boxes to the point clouds is meaningless. Instead, we suggest that obtaining good results requires sharing information between all objects in the dataset jointly, over multiple frames. We then make three improvements to the baseline. First, we address ambiguities in predicting the object rotations via direct optimization in this space while still backpropagating rotation prediction through the model. Second, we explicitly model outliers and task the network with learning their typical patterns, thus better discounting them. Third, we enforce temporal consistency when video data is available. With these contributions, our method significantly outperforms previous work despite the fact that those methods use significantly more complex pipelines, 3D models and additional human-annotated external sources of prior information.

Pedestrian and Ego-vehicle Trajectory Prediction from Monocular Camera

Authors: Ing. Lukáš Neumann, Ph.D., Vedaldi, A.
Publication: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). USA: IEEE Computer Society, 2021. p. 10199-10207. ISSN 2575-7075. ISBN 978-1-6654-4509-2.
Year: 2021

DOI: 10.1109/CVPR46437.2021.01007
Link: https://doi.org/10.1109/CVPR46437.2021.01007
Affiliation: Visual Recognition Group
Abstract:
Predicting future pedestrian trajectory is a crucial component of autonomous driving systems, as recognizing critical situations based only on current pedestrian position may come too late for any meaningful corrective action (e.g. braking) to take place. In this paper, we propose a new method to predict the future position of pedestrians with respect to a predicted future position of the ego-vehicle, thus giving an assistive/autonomous driving system sufficient time to respond. The method explicitly disentangles the actual movement of pedestrians in the real world from the ego-motion of the vehicle, using a future pose prediction network trained in a self-supervised fashion. This allows the method to observe and predict the intrinsic pedestrian motion in a normalised view that captures the same real-world location across multiple frames.

Real time monocular vehicle velocity estimation using synthetic data

Authors: McCraith, R., Ing. Lukáš Neumann, Ph.D., Vedaldi, A.
Publication: Proceedings of 2021 IEEE Intelligent Vehicles Symposium (IV). Piscataway: IEEE, 2021. p. 1406-1412. ISBN 978-1-7281-5394-0.
Year: 2021

DOI: 10.1109/IV48863.2021.9575204
Link: https://doi.org/10.1109/IV48863.2021.9575204
Affiliation: Visual Recognition Group
Abstract:
Vision is one of the primary sensing modalities in autonomous driving. In this paper we look at the problem of estimating the velocity of road vehicles from a camera mounted on a moving car. Contrary to prior methods that train end-to-end deep networks that estimate the vehicles' velocity from the video pixels, we propose a two-step approach where first an off-the-shelf tracker is used to extract vehicle bounding boxes and then a small neural network is used to regress the vehicle velocity from the tracked bounding boxes. Surprisingly, we find that this still achieves state-of-the-art estimation performance with the significant benefit of separating perception from dynamics estimation via a clean, interpretable and verifiable interface, which allows us to distill the statistics that are crucial for velocity estimation. We show that the latter can be used to easily generate synthetic training data in the space of bounding boxes and use this to improve the performance of our method further.

ICDAR2017 Robust Reading Challenge on COCO-Text

Authors: Gomez, R., Shi, B., Gomez, L., Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Los Alamitos: IEEE Computer Society, 2018. p. 1435-1443. ISSN 1520-5363. ISBN 978-1-5386-3586-5.
Year: 2018

DOI: 10.1109/ICDAR.2017.234
Link: https://doi.org/10.1109/ICDAR.2017.234
Affiliation: Visual Recognition Group
Abstract:
This report presents the final results of the ICDAR 2017 Robust Reading Challenge on COCO-Text, a challenge on scene text detection and recognition based on the largest real scene text dataset currently available: the COCO-Text dataset. The competition is structured around three tasks: Text Localization, Cropped Word Recognition and End-To-End Recognition. The competition received a total of 27 submissions across the open tasks. This report describes the datasets and the ground truth, details the performance evaluation protocols used and presents the final results along with a brief summary of the participating methods.

Relaxed softmax: Efficient confidence auto-calibration for safe pedestrian detection

Authors: Ing. Lukáš Neumann, Ph.D., Vedaldi, A., Zisserman, A.
Publication: NIPS 2018 Workshop MLITS. Massachusetts: OpenReview.net / University of Massachusetts, 2018.
Year: 2018

Affiliation: Visual Recognition Group
Abstract:
As machine learning moves from the lab into the real world, reliability is often of paramount importance. The clearest examples are safety-critical applications such as pedestrian detection in autonomous driving. Since algorithms can never be expected to be perfect in all cases, managing reliability becomes crucial. To this end, in this paper we investigate the problem of learning, in an end-to-end manner, object detectors that are accurate while providing an unbiased estimate of the reliability of their own predictions. We do so by proposing a modification of the standard softmax layer where a probabilistic confidence score is explicitly pre-multiplied into the incoming activations to modulate confidence. We adopt a rigorous assessment protocol based on reliability diagrams to evaluate the quality of the resulting calibration and show excellent results in pedestrian detection on two challenging public benchmarks.
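The idea of pre-multiplying a confidence score into the softmax inputs can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the scalar form `softmax(alpha * z)` used here, along with the names `relaxed_softmax`, `alpha`, `reliability_bins` and `n_bins`, are assumptions made for illustration only. The second function shows one common way to tabulate the data behind a reliability diagram: predictions are grouped by confidence and the mean confidence of each group is compared with its empirical accuracy.

```python
import math

def relaxed_softmax(logits, alpha):
    """Softmax over logits scaled by a confidence score alpha >= 0 (assumed form)."""
    scaled = [alpha * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def reliability_bins(confidences, correct, n_bins=10):
    """Return (mean confidence, accuracy, count) for each non-empty confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 falls in the last bin
        bins[idx].append((c, ok))
    rows = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            rows.append((mean_conf, accuracy, len(members)))
    return rows
```

In this sketch, `alpha = 0` flattens the output to a uniform distribution and `alpha = 1` recovers the standard softmax, so a small `alpha` expresses low confidence. A well-calibrated detector would yield bins whose accuracy is close to their mean confidence.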

Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework

Authors: Bušta, M., Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: 2017 IEEE International Conference on Computer Vision (ICCV 2017). Piscataway: IEEE, 2017. p. 2223-2231. ISSN 1550-5499. ISBN 978-1-5386-1032-9.
Year: 2017

DOI: 10.1109/ICCV.2017.242
Link: https://doi.org/10.1109/ICCV.2017.242
Affiliation: Visual Recognition Group
Abstract:
A method for scene text localization and recognition is proposed. The novelties include: training of both text detection and recognition in a single end-to-end pass, the structure of the recognition CNN and the geometry of its input layer that preserves the aspect of the text and adapts its resolution to the data. The proposed method achieves state-of-the-art accuracy in end-to-end text recognition on two standard datasets, ICDAR 2013 and ICDAR 2015, whilst being an order of magnitude faster than competing methods: the whole pipeline runs at 10 frames per second on an NVidia K80 GPU.

Real-Time Lexicon-Free Scene Text Localization and Recognition

Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2016, 38(9), 1872-1885. ISSN 0162-8828.
Year: 2016

DOI: 10.1109/TPAMI.2015.2496234
Link: https://doi.org/10.1109/TPAMI.2015.2496234
Affiliation: Visual Recognition Group
Abstract:
An end-to-end real-time text localization and recognition method is presented. Its real-time performance is achieved by posing the character detection and segmentation problem as an efficient sequential selection from the set of Extremal Regions. The ER detector is robust against blur, low contrast and illumination, color and texture variation. In the first stage, the probability of each ER being a character is estimated using features calculated by a novel algorithm in constant time and only ERs with locally maximal probability are selected for the second stage, where the classification accuracy is improved using computationally more expensive features. A highly efficient clustering algorithm then groups ERs into text lines and an OCR classifier trained on synthetic fonts is exploited to label character regions. The most probable character sequence is selected in the last stage when the context of each character is known. The method was evaluated on three public datasets. On the ICDAR 2013 dataset the method achieves state-of-the-art results in text localization; on the more challenging SVT dataset, the proposed method significantly outperforms the state-of-the-art methods and demonstrates that the proposed pipeline can incorporate additional prior knowledge about the detected text. The proposed method was exploited as the baseline in the ICDAR 2015 Robust Reading competition, where it compares favourably to the state of the art.

A machine learning approach to hypothesis decoding in scene text recognition

Authors: Libovicky, J., Ing. Lukáš Neumann, Ph.D., Pecina, P., prof. Ing. Jiří Matas, Ph.D.
Publication: Computer Vision -- ACCV 2014 Workshops. Heidelberg: Springer, 2015, pp. 169-180. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16630-8. Available from: http://dx.doi.org/10.1007/978-3-319-16631-5_13
Year: 2015

DOI: 10.1007/978-3-319-16631-5_13
Link: https://doi.org/10.1007/978-3-319-16631-5_13
Affiliation: Department of Cybernetics
Abstract:
Scene Text Recognition (STR) is the task of localizing and transcribing textual information captured in real-world images. With its increasing accuracy, it becomes a new source of textual data for standard Natural Language Processing tasks and poses new problems because of the specific nature of Scene Text. In this paper, we learn a string hypotheses decoding procedure in an STR pipeline using structured prediction methods that proved to be useful in automatic Speech Recognition and Machine Translation. The model allows a wide range of typographical and language features to be employed in the decoding process. The proposed method is evaluated on a standard dataset and improves both character and word recognition performance over the baseline.

Efficient Character Skew Rectification in Scene Text Images

Authors: Bušta, M., Drtina, T., Helekal, D., Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: Computer Vision -- ACCV 2014 Workshops. Heidelberg: Springer, 2015, pp. 134-146. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-16630-8.
Year: 2015

DOI: 10.1007/978-3-319-16631-5_10
Link: https://doi.org/10.1007/978-3-319-16631-5_10
Affiliation: Department of Cybernetics
Abstract:
We present an efficient method for character skew rectification in scene text images. The method is based on novel skew estimators, which exploit intuitive glyph properties and which can be efficiently computed in linear time. The estimators are evaluated on synthetically generated data (including Latin, Cyrillic, Greek and Runic scripts) and real scene text images, where the skew rectification by the proposed method improves the accuracy of a state-of-the-art scene text recognition pipeline.

Efficient Scene text localization and recognition with local character refinement

Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. Piscataway: IEEE, 2015. pp. 746-750. ISSN 1520-5363. ISBN 978-1-4799-1805-8.
Year: 2015

DOI: 10.1109/ICDAR.2015.7333861
Link: https://doi.org/10.1109/ICDAR.2015.7333861
Affiliation: Department of Cybernetics
Abstract:
An unconstrained end-to-end text localization and recognition method is presented. The method detects initial text hypotheses in a single pass by an efficient region-based method and subsequently refines the text hypotheses using a more robust local text model, which deviates from the common assumption of region-based methods that all characters are detected as connected components.

FASText: Efficient Unconstrained Scene Text Detector

Authors: Bušta, M., Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: 2015 IEEE International Conference on Computer Vision (ICCV 2015). Piscataway: IEEE, 2015. p. 1206-1214. ISSN 1550-5499. ISBN 978-1-4673-8391-2.
Year: 2015

DOI: 10.1109/ICCV.2015.143
Link: https://doi.org/10.1109/ICCV.2015.143
Affiliation: Department of Cybernetics
Abstract:
We propose a novel easy-to-implement stroke detector based on an efficient pixel intensity comparison to surrounding pixels. Stroke-specific keypoints are efficiently detected and text fragments are subsequently extracted by local thresholding guided by keypoint properties. Classification based on effectively calculated features then eliminates non-text regions.

ICDAR 2015 competition on Robust Reading

Authors: Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., prof. Ing. Jiří Matas, Ph.D., Ing. Lukáš Neumann, Ph.D., Chandrasekhar, V., Lu, S., Shafait, F., Uchida, S.
Publication: Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. Piscataway: IEEE, 2015. pp. 1156-1160. ISSN 1520-5363. ISBN 978-1-4799-1805-8.
Year: 2015

DOI: 10.1109/ICDAR.2015.7333942
Link: https://doi.org/10.1109/ICDAR.2015.7333942
Affiliation: Department of Cybernetics
Abstract:
Results of the ICDAR 2015 Robust Reading Competition are presented. A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text. Challenge 4 is run on a newly acquired dataset of 1,670 images evaluating Text Localisation, Word Recognition and End-to-End pipelines. In addition, the dataset for Challenge 3 on Video Text has been substantially updated with more video sequences and more accurate ground truth data. Finally, tasks assessing End-to-End system performance have been introduced to all Challenges. The competition took place in the first quarter of 2015, and received a total of 44 submissions. Only the tasks newly introduced in 2015 are reported on. The datasets, the ground truth specification and the evaluation protocols are presented together with the results and a brief summary of the participating methods.

On Combining Multiple Segmentations in Scene Text Recognition

Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: ICDAR 2013: Proceedings of the 12th International Conference on Document Analysis and Recognition. Los Alamitos: IEEE Computer Society, 2013. pp. 523-527. ISSN 1520-5363.
Year: 2013

DOI: 10.1109/ICDAR.2013.110
Link: https://doi.org/10.1109/ICDAR.2013.110
Affiliation: Department of Cybernetics
Abstract:
An end-to-end real-time scene text localization and recognition method is presented. The three main novel features are: (i) keeping multiple segmentations of each character until the very last stage of the processing when the context of each character in a text line is known, (ii) an efficient algorithm for selection of character segmentations minimizing a global criterion, and (iii) showing that, despite using theoretically scale-invariant methods, operating on a coarse Gaussian scale space pyramid yields improved results as many typographical artifacts are eliminated. The method runs in real time and achieves state-of-the-art text localization results on the ICDAR 2011 Robust Reading dataset. Results are also reported for end-to-end text recognition on the ICDAR 2011 dataset.

Scene Text Localization and Recognition with Oriented Stroke Detection

Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: IEEE International Conference on Computer Vision (ICCV 2013). Piscataway: IEEE, 2013. pp. 97-104. ISSN 1550-5499. ISBN 978-1-4799-2839-2.
Year: 2013

DOI: 10.1109/ICCV.2013.19
Link: https://doi.org/10.1109/ICCV.2013.19
Affiliation: Department of Cybernetics
Abstract:
An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.

A Real-Time Scene Text to Speech System

Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: Computer Vision - ECCV 2012. Workshops and Demonstrations. Heidelberg: Springer, 2012. pp. 619-622. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-33884-7.
Year: 2012

DOI: 10.1007/978-3-642-33885-4_66
Link: https://doi.org/10.1007/978-3-642-33885-4_66
Affiliation: Department of Cybernetics
Abstract:
An end-to-end real-time scene text localization and recognition method is demonstrated. The method localizes textual content in images, a video or a webcam stream, performs character recognition (OCR) and "reads" it out loud using a text-to-speech engine. The method has been recently published, achieves state-of-the-art results on public datasets and is able to recognize different fonts and scripts including non-Latin ones. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs), which has linear computational complexity in the number of pixels in the image. Robustness to blur, noise and illumination and color variations is also demonstrated. Finally, we show the effects of various control parameters.

Real-time scene text localization and recognition

Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: CVPR 2012: Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Computer Society Press, 2012. p. 3538-3545. ISSN 1063-6919. ISBN 978-1-4673-1228-8.
Year: 2012

DOI: 10.1109/CVPR.2012.6248097
Link: https://doi.org/10.1109/CVPR.2012.6248097
Affiliation: Department of Cybernetics
Abstract:
An end-to-end real-time scene text localization and recognition method is presented. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs). The ER detector is robust to blur, illumination, color and texture variation and handles low-contrast text. In the first classification stage, the probability of each ER being a character is estimated using novel features calculated with O(1) complexity per region tested. Only ERs with locally maximal probability are selected for the second stage, where the classification is improved using more computationally expensive features. A highly efficient exhaustive search with feedback loops is then applied to group ERs into words and to select the most probable character segmentation. Finally, text is recognized in an OCR stage trained using synthetic fonts. The method was evaluated on two public datasets. On the ICDAR 2011 dataset, the method achieves state-of-the-art text localization results amongst published methods and it is the first one to report results for end-to-end text recognition. On the more challenging Street View Text dataset, the method achieves state-of-the-art recall. The robustness of the proposed method against noise and low contrast of characters is demonstrated by false positives caused by detected watermark text in the dataset.
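The first-stage selection rule described above (keep only ERs whose character probability is locally maximal) can be sketched as follows. This is a simplified illustration, not the published implementation: it assumes the probabilities are given as a list ordered along one chain of nested regions, and the function name `select_local_maxima` and the minimum-probability parameter `p_min` are hypothetical.

```python
def select_local_maxima(probs, p_min=0.2):
    """Return indices whose probability is a local maximum and at least p_min.

    probs is assumed to hold the character probability of each region along
    one chain of nested regions (an assumption made for this sketch).
    """
    selected = []
    n = len(probs)
    for i, p in enumerate(probs):
        # Neighbours outside the list are treated as minus infinity.
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i < n - 1 else float("-inf")
        # Strictly greater on the left keeps only the first item of a plateau.
        if p >= p_min and p > left and p >= right:
            selected.append(i)
    return selected
```

Only the selected indices would be passed to the second, more expensive classification stage, which is what keeps the overall pipeline cheap.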

A method for text localization and recognition in real-world images

Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: ACCV 2010: Proceedings of the 10th Asian Conference on Computer Vision, Part III. Heidelberg: Springer, 2011. p. 770-783. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-19317-0.
Year: 2011

Affiliation: Department of Cybernetics
Abstract:
A method for affine rectification of a plane exploiting knowledge of relative scale changes is presented. The rectifying transformation is fully specified by the relative scale change at three non-collinear points or by two pairs of points where the relative scale change is known; the relative scale change between the pairs is not required. The method also allows homography estimation between two views of a planar scene from three point-with-scale correspondences. The proposed method is simple to implement and without parameters; linear and thus supporting (algebraic) least squares solutions; and general, without restrictions on either the shape of the corresponding features or their mutual position. The wide applicability of the method is demonstrated on text rectification, detection of repetitive patterns, texture normalization and estimation of homography from three point-with-scale correspondences.

Estimating hidden parameters for text localization and recognition

Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: CVWW 2011: Proceedings of the 16th Computer Vision Winter Workshop. Graz: Graz University of Technology, 2011, pp. 29-36. ISBN 978-3-85125-129-6.
Year: 2011

Affiliation: Department of Cybernetics
Abstract:
A new method for text line formation for text localization and recognition is proposed. The method exhaustively enumerates short sequences of character regions in order to infer values of hidden text line parameters (such as text direction) and applies the parameters to efficiently limit the search space for longer sequences. The exhaustive enumeration of short sequences is achieved by finding all character region triplets that fulfill constraints of textual content, which keeps the proposed method efficient yet still capable of performing a robust estimation of the hidden parameters in order to correctly initialize the search. The method is applied to character regions which are detected as Maximally Stable Extremal Regions (MSERs). The performance of the method is evaluated on the standard ICDAR 2003 dataset, where the method outperforms (precision 0.60, recall 0.60) a previously published method for text line formation of MSERs.

Text Localization in Real-World Images Using Efficiently Pruned Exhaustive Search

Authors: Ing. Lukáš Neumann, Ph.D., prof. Ing. Jiří Matas, Ph.D.
Publication: Document Analysis and Recognition (ICDAR), 2011 International Conference on. Los Alamitos: IEEE Computer Society, 2011, pp. 687-691. ISSN 1520-5363. ISBN 978-1-4577-1350-7.
Year: 2011

DOI: 10.1109/ICDAR.2011.144
Link: https://doi.org/10.1109/ICDAR.2011.144
Affiliation: Department of Cybernetics
Abstract:
An efficient method for text localization and recognition in real-world images is proposed. Thanks to effective pruning, it is able to exhaustively search the space of all character sequences in real time (200ms on a 640x480 image). The method exploits higher-order properties of text such as word text lines. We demonstrate that the grouping stage plays a key role in the text localization performance and that a robust and precise grouping stage is able to compensate for errors of the character detector. The method includes a novel selector of Maximally Stable Extremal Regions (MSER) which exploits region topology. Experimental validation shows that 95.7% of characters in the ICDAR dataset are detected using the novel selector of MSERs with a low sensitivity threshold. The proposed method was evaluated on the standard ICDAR 2003 dataset where it achieved state-of-the-art results in both text localization and recognition.
