doc. Ing. Petr Pollák, CSc.

Automatic Phonetic Segmentation and Pronunciation Detection with Various Approaches of Acoustic Modeling

Authors: Mizera, P., Pollák, P.
Publication: Speech and Computer. Basel: Springer, 2018. pp. 419-429. LNAI, vol. 11096. ISSN 0302-9743. ISBN 978-3-319-99578-6.
Year: 2018

DOI: 10.1007/978-3-319-99579-3_44
Link: https://doi.org/10.1007/978-3-319-99579-3_44
Affiliation: Department of Circuit Theory
Abstract:
The paper describes HMM-based phonetic segmentation implemented with the KALDI toolkit, focusing on the accuracy obtained with various acoustic modeling approaches: GMM-HMM vs. DNN-HMM, monophone vs. triphone, and speaker-independent vs. speaker-dependent models. The analysis was performed on the TIMIT database and demonstrated the contribution of advanced acoustic modeling, especially for choosing the proper pronunciation variant. For this purpose, a lexicon covering the pronunciation variability among TIMIT speakers was created from the phonetic transcriptions available in the TIMIT corpus. Once the proper sequence of phones has been recognized by the DNN-HMM system, more precise boundary placement can then be obtained using basic monophone acoustic models.
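The lexicon construction step described above can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes word and phone segments given as (start_sample, end_sample, label) tuples in the style of TIMIT .WRD and .PHN files, assigns each phone to the word whose span contains the phone's midpoint, and counts the distinct phone sequences observed for each word. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def pair_phones_with_words(word_segs, phone_segs):
    """Assign each phone to the word whose [start, end) span contains the
    phone's midpoint; phones outside every word (e.g. pauses) are dropped.
    Segments are (start_sample, end_sample, label) tuples."""
    pairs = []
    for w_start, w_end, word in word_segs:
        phones = [p for s, e, p in phone_segs if w_start <= (s + e) / 2 < w_end]
        pairs.append((word, phones))
    return pairs


def build_variant_lexicon(word_phone_pairs):
    """Count every distinct phone sequence realised for each word."""
    lexicon = defaultdict(Counter)
    for word, phones in word_phone_pairs:
        lexicon[word.lower()][tuple(phones)] += 1
    return lexicon


def lexicon_lines(lexicon, min_count=1):
    """Format 'word ph1 ph2 ...' lines, most frequent variant of each word first
    (ties keep the order in which the variants were first seen)."""
    lines = []
    for word in sorted(lexicon):
        for phones, count in lexicon[word].most_common():
            if count >= min_count:
                lines.append(word + " " + " ".join(phones))
    return lines


# Hypothetical toy data: two speakers reading the same two words.
words = [(0, 100, "the"), (100, 300, "had")]
speaker_a = [(0, 40, "dh"), (40, 100, "ax"), (100, 150, "hh"),
             (150, 220, "ae"), (220, 260, "dcl"), (260, 300, "d"), (300, 400, "h#")]
speaker_b = [(0, 40, "dh"), (40, 100, "iy"), (100, 150, "hh"),
             (150, 220, "ae"), (220, 260, "dcl"), (260, 300, "d")]
pairs = pair_phones_with_words(words, speaker_a) + pair_phones_with_words(words, speaker_b)
for line in lexicon_lines(build_variant_lexicon(pairs)):
    print(line)
```

Keeping counts per variant, rather than a plain set, leaves room to prune rare variants with `min_count`.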

Dithering techniques in automatic recognition of speech corrupted by MP3 compression: Analysis, solutions and experiments

Authors: Borský, M., Mizera, P., Pollák, P., Nouza, J.
Publication: Speech Communication. 2017, 86, 75-84. ISSN 0167-6393.
Year: 2017

DOI: 10.1016/j.specom.2016.11.007
Link: https://doi.org/10.1016/j.specom.2016.11.007
Affiliation: Department of Circuit Theory
Abstract:
A large portion of the audio files distributed over the Internet or stored in personal and corporate media archives are in compressed form. Several compression techniques and algorithms exist, but it is MPEG Layer-3 (known as MP3) that has achieved truly wide popularity in general audio coding, and in speech as well. However, the algorithm is lossy in nature and introduces distortion into the spectral and temporal characteristics of a signal. In this paper we study its impact on automatic speech recognition (ASR). We show that, with decreasing MP3 bitrates, the major source of ASR performance degradation is deep spectral valleys (i.e. bins with almost zero energy) caused by the masking effect of the MP3 algorithm. We demonstrate that these unnatural gaps in the spectrum can be effectively compensated by adding a certain amount of noise to the distorted signal. We provide the theoretical background for this approach, showing that the added noise mainly affects the spectral valleys: they are filled by the noise, while the spectral bins containing speech remain almost unchanged. This helps to restore a more natural shape of the log spectrum and cepstrum and consequently has a positive impact on ASR performance. In our previous work, we proposed two types of the signal dithering (noise addition) technique, one applied globally and the other in a more selective way. In this paper, we offer a more detailed insight into their performance. We provide results from many experiments testing them in various scenarios, using a large vocabulary continuous speech recognition (LVCSR) system, acoustic models based on Gaussian mixture models (GMM) as well as on deep neural networks (DNN), and multiple speech databases in three languages (Czech, English and German). Our results show that both proposed techniques, and the selective dithering method in particular, consistently compensate for the negative impact of MP3 compression on ASR performance.
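The effect of global dithering on a spectral valley can be illustrated with a toy example. This is a minimal sketch, not the method evaluated in the paper: it uses a synthetic frame containing a single sinusoid, so every other DFT bin is a near-zero "valley" standing in for the gaps left by MP3 masking, a direct DFT from the Python standard library, and an arbitrarily chosen noise standard deviation of 1e-3.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """|X[k]|^2 for k = 0..N/2, computed with a direct O(N^2) DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def log_spectrum_db(frame, floor=1e-12):
    """Log power spectrum in dB; the floor keeps empty bins finite."""
    return [10.0 * math.log10(p + floor) for p in power_spectrum(frame)]


def dither(frame, noise_std, seed=0):
    """Global dithering: add white Gaussian noise with a fixed standard deviation."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in frame]


N = 64
frame = [math.cos(2 * math.pi * 8 * t / N) for t in range(N)]  # all energy in bin 8
clean = log_spectrum_db(frame)
noisy = log_spectrum_db(dither(frame, noise_std=1e-3))
# Bin 8 (the "speech" bin) changes only marginally, while the empty bin 20
# rises from the numerical floor to roughly the noise level.
print(f"bin 8:  {clean[8]:7.2f} dB -> {noisy[8]:7.2f} dB")
print(f"bin 20: {clean[20]:7.2f} dB -> {noisy[20]:7.2f} dB")
```

The log operation is what makes this matter: a near-zero bin maps to a huge negative log value that dominates the cepstrum, and a small amount of noise removes that outlier without visibly changing the strong bins.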

Improving of LVCSR for Casual Czech Using Publicly Available Language Resources

Authors: Mizera, P., Pollák, P.
Publication: Speech and Computer. Heidelberg: Springer, 2017. pp. 427-437. Lecture Notes in Artificial Intelligence, vol. 10458. ISSN 0302-9743. ISBN 978-3-319-66428-6.
Year: 2017

DOI: 10.1007/978-3-319-66429-3_42
Link: https://doi.org/10.1007/978-3-319-66429-3_42
Affiliation: Department of Circuit Theory
Abstract:
The paper presents the design of a Czech casual speech recognizer, which is part of wider research focused on understanding very informal speaking styles. The study was carried out on the NCCCz corpus, and the contributions of optimized acoustic and language models, as well as of pronunciation lexicon optimization, were analyzed. Special attention was paid to the impact of publicly available corpora suitable for language model (LM) creation. Our final DNN-HMM system achieved a WER of 30-60% on casual speech recognition, depending on the LM used. Recognition results for other speaking styles are also presented for comparison. The system was built with the KALDI toolkit, and the resulting recipes are available to the research community.
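WER figures like those above are conventionally computed from a minimum edit distance alignment of the recognized word sequence against the reference, WER = (S + D + I) / N. A minimal sketch of that standard definition follows, with errors pooled over all utterances; the paper does not describe its scoring setup, so this is only an illustration, and the Czech example sentences are made up.

```python
def word_errors(reference, hypothesis):
    """Minimum number of substitutions + deletions + insertions needed to turn
    the reference word sequence into the hypothesis (word-level Levenshtein).
    Returns (errors, number_of_reference_words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1], len(ref)


def corpus_wer(pairs):
    """WER in percent, pooled over (reference, hypothesis) pairs."""
    errors = total = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        total += n
    return 100.0 * errors / total


print(corpus_wer([("no to je dobrý", "no je to dobrý"), ("a b c", "a c")]))
```

Pooling errors and reference words over the whole test set, rather than averaging per-utterance WERs, keeps short utterances from being over-weighted.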

KALDI Recipes for the Czech Speech Recognition Under Various Conditions

Authors: Mizera, P., Fiala, J., Brich, A., Pollák, P.
Publication: Text, Speech, and Dialogue. 19th International Conference, TSD 2016. Heidelberg: University of Heidelberg, 2016. Lecture Notes in Artificial Intelligence. ISSN 0302-9743. ISBN 978-3-319-45510-5.
Year: 2016

DOI: 10.1007/978-3-319-45510-5_45
Link: https://doi.org/10.1007/978-3-319-45510-5_45
Affiliation: Department of Circuit Theory
Abstract:
The paper presents the implementation of a Czech ASR system for various conditions using the KALDI speech recognition toolkit in two standard state-of-the-art architectures (GMM-HMM and DNN-HMM). We present recipes for building LVCSR systems on the SpeechDat, SPEECON, CZKCC, and NCCCz corpora, together with an updated version of the feature extraction tool CtuCopy, which now supports the KALDI format. All presented recipes, as well as the CtuCopy tool, are publicly available under the Apache License v2.0. Finally, we describe an extension of the KALDI toolkit that supports running the described LVCSR recipes on the MetaCentrum computing facilities (the Czech National Grid Infrastructure operated by CESNET). The experimental part presents the baseline performance of both GMM-HMM and DNN-HMM LVCSR systems on the given Czech corpora. These results also demonstrate the behaviour of the designed LVCSR under various acoustic conditions as well as for various speaking styles.
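KALDI recipes such as these start from a per-corpus data directory of plain-text lists keyed by utterance ID, among them wav.scp, text, utt2spk and spk2utt. The sketch below shows how those four lists could be produced from a corpus index. It is illustrative only and does not reproduce the published recipes or CtuCopy; the function name, utterance IDs, paths and transcripts are hypothetical. Utterance IDs are prefixed with the speaker ID and every list is sorted by its key, following KALDI's data-preparation conventions.

```python
def kaldi_data_lists(utterances):
    """Build the contents of the basic KALDI data-directory files.

    utterances: iterable of (speaker_id, utt_id, wav_path, transcript).
    Returns {filename: [lines]} for wav.scp, text, utt2spk and spk2utt,
    each sorted by its first field.
    """
    wav_scp, text, utt2spk, spk2utt = [], [], [], {}
    for spk, utt, wav, words in utterances:
        wav_scp.append((utt, wav))
        text.append((utt, words))
        utt2spk.append((utt, spk))
        spk2utt.setdefault(spk, []).append(utt)

    def fmt(pairs):
        # Sorting (key, value) tuples orders the lines by key first.
        return [f"{key} {value}" for key, value in sorted(pairs)]

    return {
        "wav.scp": fmt(wav_scp),
        "text": fmt(text),
        "utt2spk": fmt(utt2spk),
        "spk2utt": fmt((spk, " ".join(sorted(utts))) for spk, utts in spk2utt.items()),
    }


# Hypothetical corpus index entries.
corpus = [
    ("spk02", "spk02-0001", "/data/spk02/0001.wav", "dobrý den"),
    ("spk01", "spk01-0002", "/data/spk01/0002.wav", "jedna dva"),
    ("spk01", "spk01-0001", "/data/spk01/0001.wav", "ahoj"),
]
for name, lines in kaldi_data_lists(corpus).items():
    print(name, lines)
```

Prefixing utterance IDs with the speaker ID makes sorting by utterance and sorting by speaker consistent, which is why KALDI recommends it.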

Advanced Acoustic Modelling Techniques in MP3 Speech Recognition

Authors: Borský, M., Pollák, P., Mizera, P.
Publication: EURASIP Journal on Audio, Speech, and Music Processing. 2015, 2015:20. ISSN 1687-4722.
Year: 2015

DOI: 10.1186/s13636-015-0064-7
Link: https://doi.org/10.1186/s13636-015-0064-7
Affiliation: Department of Circuit Theory
Abstract:
The automatic recognition of MP3 compressed speech presents a challenge to current systems due to the lossy nature of the compression, which causes irreversible degradation of the speech waveform. This article evaluates the performance of a recognition system optimized for MP3 compressed speech with current state-of-the-art acoustic modelling techniques and one specific front-end compensation method. The article concentrates on acoustic model adaptation, discriminative training and additional dithering as prominent means of compensating for the described distortion in phoneme recognition and large vocabulary continuous speech recognition (LVCSR) tasks. The experiments on the phoneme task show a dramatic increase in the recognition error for unvoiced speech units as a direct result of compression. Acoustic model adaptation yielded the highest relative contribution, while the gain from discriminative training diminished with decreasing bit rate. Additional dithering yielded a consistent improvement only for the MFCC features, but the overall results were still worse than those for the PLP features.

Analysis and automatic recognition of compressed speech

Authors: Borský, M., Pollák, P.
Publication: Tackling the Complexity in Speech. Praha: Filozofická fakulta Univerzity Karlovy v Praze, 2015. pp. 205-221. Opera Facultatis philosophicae Universitatis Carolinae Pragensis, vol. 14. ISBN 978-80-7308-558-2.
Year: 2015

Affiliation: Department of Circuit Theory
Abstract:
The deployment of automatic speech recognition (ASR) systems in real life is often hampered by diverse acoustic conditions. This diversity makes it necessary to build robust systems that perform reliably regardless of the conditions. MP3 compression represents one such condition: its lossy encoding degrades the quality of the extracted features and therefore the recognition. Optimized settings for MP3 recognition have been studied by various authors, and different solutions have been proposed. This work presents an analysis of an optimized setup focused on the feature extraction and acoustic modeling blocks. It summarizes the effects of methods proposed by the author and by other authors, each tested to determine its potential contribution both separately and in combination. The main goals of the optimization were to find the proper segmentation, to determine the importance of feature normalization and dithering, and to evaluate acoustic model adaptation. The experiments were performed on high-quality signals that were artificially compressed to simulate the effect of the spectral distortion. PLP features were extracted and normalized using CMVN, and various levels of noise were added, mainly to reduce the spectral distortion introduced by compression. Context-dependent AMs were trained on uncompressed (RAW) data and on data compressed at 160, 32, 24 and 16 kbit/s. The final AMs were adapted using CMLLR and MAP, in order to further improve AM quality and to test model interchangeability. Recognition was performed on a 1-hour LVCSR task with a trigram LM.
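The CMVN step mentioned above normalizes each feature dimension to zero mean and unit variance. A minimal sketch follows. The abstract does not state whether CMVN was applied per utterance, per speaker or over a sliding window, so per-utterance statistics are an assumption here; the small `eps` term only avoids division by zero in constant dimensions.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization.

    frames: list of equal-length feature vectors (e.g. PLP coefficients per frame).
    Returns vectors whose every dimension has zero mean and (up to eps) unit variance.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) for d in range(dim)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)] for f in frames]


utterance = [[1.0, 10.0], [3.0, 10.0], [5.0, 16.0]]  # toy 2-dimensional features
normalized = cmvn(utterance)
```

Mean subtraction removes a stationary channel offset in the cepstral domain, while variance scaling reduces the mismatch in feature dynamic range, which is the part most affected by additive distortions.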

Improved Estimation of Articulatory Features Based on Acoustic Features with Temporal Context

Authors: Mizera, P., Pollák, P.
Publication: Text, Speech, and Dialogue. 18th International Conference, TSD 2015. Heidelberg: Springer, 2015. pp. 560-568. Lecture Notes in Artificial Intelligence. ISSN 0302-9743. ISBN 978-3-319-24032-9.
Year: 2015

DOI: 10.1007/978-3-319-24033-6_63
Link: https://doi.org/10.1007/978-3-319-24033-6_63
Affiliation: Department of Circuit Theory
Abstract:
The paper deals with neural network-based estimation of articulatory features for Czech, intended for use in automatic phonetic segmentation or automatic speech recognition. In our current approach, we use multi-layer perceptron networks to extract the articulatory features through a non-linear mapping from standard acoustic features extracted from the speech signal. The suitability of various acoustic features and the optimum length of the temporal context at the network input were analysed. The temporal context is represented by a context window created from stacked feature vectors. The optimum length of this context window was identified in the range from 9 to 21 frames. We obtained an average frame-level accuracy of 90.5% across all articulatory feature classes for mel-log filter-bank features. The highest classification rate, 95.3%, was achieved for the voicing class.
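A context window of stacked feature vectors, as described above, can be built as follows. In this sketch the window is assumed to be symmetric around the current frame (`left = right = 4` gives 9 frames, `left = right = 10` gives 21), and frames at the utterance edges are replicated; the paper's exact padding strategy is not stated, and the function name is illustrative.

```python
def stack_context(frames, left, right):
    """Concatenate each frame with `left` preceding and `right` following frames.

    Frames beyond the utterance edges are replaced by the first or last frame
    (replicate padding), so every output vector has (left + 1 + right) * dim values.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            window.extend(frames[min(max(k, 0), n - 1)])  # clamp index to [0, n-1]
        stacked.append(window)
    return stacked


# Toy utterance of 4 one-dimensional frames with a 3-frame window.
print(stack_context([[0], [1], [2], [3]], left=1, right=1))
```

The stacked vector is what an MLP sees at its input, so the input layer size grows linearly with the window length; this is the trade-off behind searching for an optimum window length.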

Phonetic Segmentation Using KALDI and Reduced Pronunciation Detection in Causal Czech Speech

Authors: Patč, Z., Mizera, P., Pollák, P.
Publication: Text, Speech, and Dialogue. 18th International Conference, TSD 2015. Heidelberg: Springer, 2015. pp. 433-441. Lecture Notes in Artificial Intelligence. ISSN 0302-9743. ISBN 978-3-319-24032-9.
Year: 2015

DOI: 10.1007/978-3-319-24033-6_49
Link: https://doi.org/10.1007/978-3-319-24033-6_49
Affiliation: Department of Circuit Theory
Abstract:
The paper describes an implementation of phonetic segmentation using tools from the KALDI toolkit. Its use is motivated by the rapid development of, and support for, current ASR techniques available in KALDI. The presented work is related to research on pronunciation variability in casual Czech speech, for which we use automatic phonetic segmentation to analyze phone boundaries, deletions, etc. We also present a tool for pronunciation detection. Both tools can be used for processing large databases as well as for interactive work within the Praat environment. An illustrative analysis of the segmentation accuracy and the design of a new environment for phonetic segmentation in Praat are also presented.

Spectrally Selective Dithering for Distorted Speech Recognition

Authors: Borský, M., Mizera, P., Pollák, P.
Publication: INTERSPEECH 2015. Bochum: ISCA - International Speech Communication Association, 2015. ISSN 2308-457X.
Year: 2015

Affiliation: Department of Circuit Theory
Abstract:
The performance of speech recognition systems can be significantly degraded if the speech spectrum is distorted. This includes situations such as the use of an improper recording device, enhancement technique or speech coder. This paper presents a front-end compensation method called spectrally selective dithering, aimed at reconstructing the spectral characteristics of nonlinearly distorted speech. The technique is designed to detect the suppressed frequency bands in the speech signal and add a weighted amount of additive noise. The detection algorithm is based on the smoothness of the excitation signal spectrum obtained by LPC analysis (inverse) filtering. The gain of the added noise is estimated from the unaffected frequency bands. The practical usability of the algorithm has been studied in the task of MP3 speech recognition at very low bit rates. The obtained results demonstrate the advantage of the proposed technique: we achieved an absolute WER reduction of up to 1.85% using the standard HMM-GMM architecture in an LVCSR task.
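The excitation signal used by the detector is obtained by LPC analysis followed by inverse filtering. The sketch below covers only that step, using the autocorrelation method and the Levinson-Durbin recursion. The spectral-smoothness measure, the band detection and the noise-gain estimation of the proposed technique are not reproduced, and the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..p] with x[n] ~ sum_k a[k] * x[n-k], solved from
    the autocorrelation r[0..p]. Returns (coefficients, prediction_error_power)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x, coeffs):
    """Inverse (whitening) filter e[n] = x[n] - sum_k a[k] * x[n-k];
    samples before the start of the frame are taken as zero."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(x))]


frame = [1.0, 0.5, 0.25, 0.125]  # toy decaying frame
coeffs, _ = levinson_durbin(autocorrelation(frame, 2), 2)
excitation = lpc_residual(frame, coeffs)
```

The residual approximately removes the spectral envelope, so frequency bands wiped out by a codec show up as dips in an otherwise roughly flat excitation spectrum; that flatness is what a smoothness-based detector can exploit.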

Estimation of Articulatory Features for Czech Language

Authors: Mizera, P., Pollák, P.
Publication: 22nd Czech-German Workshop on Speech Communication. Book of Abstracts. 2014. pp. 25-26.
Year: 2014

Affiliation: Department of Circuit Theory
Abstract:
Automatic speech recognition (ASR) for the Czech language has been studied intensively over the past decades. Researchers have successfully developed several practical applications, such as dictation programs and automatic broadcast transcription (subtitling). The accuracy of these ASR systems is generally satisfactorily high; however, it is significantly lower when the signal is corrupted, e.g. by high-level background noise, for spontaneous speech, or when speech is masked or pronounced in a reduced form. These issues are still an obstacle to wider use of voice recognition technology under such conditions, because the WER (Word Error Rate) commonly achieved in spontaneous speech recognition is above 50% on average. A possible way to overcome this deficiency is to use speech production knowledge within ASR systems. Accordingly, speech production knowledge in the form of articulatory features (AFs) is increasingly used at the feature level, mainly to improve the recognition of spontaneous or casual speech. The aim of our research is to analyse the possible contribution of articulatory features to the description of spontaneous or casual speech in Czech.

Impact of Irregular Pronunciation on Phonetic Segmentation of Nijmegen Corpus of Casual Czech

Authors: Mizera, P., Pollák, P., Kolman, A., Ernestus, M.
Publication: Text, Speech, and Dialogue. 17th International Conference, TSD 2014. Heidelberg: Springer, 2014. pp. 499-507. Lecture Notes in Artificial Intelligence. ISSN 0302-9743. ISBN 978-3-319-10815-5.
Year: 2014

DOI: 10.1007/978-3-319-10816-2_60
Link: https://doi.org/10.1007/978-3-319-10816-2_60
Affiliation: Department of Circuit Theory
Abstract:
This paper describes a pilot study of phonetic segmentation applied to the Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal speech of a strongly spontaneous nature, which influences the character of the produced speech at various levels. This work is part of wider research on pronunciation reduction in such informal speech. We present an analysis of phonetic segmentation accuracy when canonical or reduced pronunciation is used. The achieved segmentation accuracy indicates the general accuracy of the acoustic modelling to be expected in spontaneous speech recognition. As a byproduct of the spontaneous speech segmentation, this paper also describes the created lexicon with canonical pronunciations of words in NCCCz, a tool supporting pronunciation checks of lexicon items, and finally a mini-database of selected NCCCz utterances manually labelled at the phonetic level for evaluation purposes.

Recognition of Spectrally Distorted Speech after MP3 Compression

Authors: Borský, M., Pollák, P.
Publication: 22nd Czech-German Workshop on Speech Communication. Book of Abstracts. 2014. pp. 3-4.
Year: 2014

Affiliation: Department of Circuit Theory
Abstract:
The deployment of automatic speech recognition (ASR) systems in real life is often hampered by diverse acoustic conditions. This diversity makes it necessary to build robust systems that perform reliably regardless of the conditions. MP3 compression represents one such condition: its lossy encoding degrades the quality of the extracted features and therefore the recognition. Optimized settings for MP3 recognition have been studied by various authors, and different solutions have been proposed. This work presents an analysis of an optimized setup focused on the feature extraction and acoustic modeling blocks. It summarizes the effects of methods proposed by the author and by other authors, each tested to determine its potential contribution both separately and in combination.

Robust Neural Network-Based Estimation of Articulatory Features for Czech

Authors: Mizera, P., Pollák, P.
Publication: Neural Network World. 2014, 24(5), 463-478. ISSN 1210-0552.
Year: 2014

DOI: 10.14311/NNW.2014.24.027
Link: https://doi.org/10.14311/NNW.2014.24.027
Affiliation: Department of Circuit Theory
Abstract:
The article describes neural network-based estimation of articulatory features (AFs) for Czech speech. First, the relationship between AFs and the Czech phone inventory is defined; then the estimation is performed with MLP neural networks. Several speech representations are proposed as inputs to the MLP classifiers with the aim of obtaining robust AF estimation. The experiments showed that ANN-based AF estimation works very reliably, especially in low-noise environments. Moreover, when the number of neurons in the hidden layer is increased and temporal-context DCT-TRAP features are used at the input of the MLP network, AF classification also works accurately for signals recorded in environments with high background noise.
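The phone-to-AF relationship and the frame-level accuracy used to evaluate AF classifiers can be illustrated as follows. The table is a hypothetical six-phone excerpt with three AF classes; it is not the article's definition for the full Czech phone inventory, the "n/a" place value for vowels is a simplification, and the "estimated" labels in the example are invented rather than produced by an MLP.

```python
# Hypothetical excerpt of a phone-to-articulatory-feature table.
PHONE_TO_AF = {
    "p": {"voicing": "voiceless", "manner": "stop",      "place": "bilabial"},
    "b": {"voicing": "voiced",    "manner": "stop",      "place": "bilabial"},
    "m": {"voicing": "voiced",    "manner": "nasal",     "place": "bilabial"},
    "s": {"voicing": "voiceless", "manner": "fricative", "place": "alveolar"},
    "z": {"voicing": "voiced",    "manner": "fricative", "place": "alveolar"},
    "a": {"voicing": "voiced",    "manner": "vowel",     "place": "n/a"},
}


def af_targets(phone_frames, af_class):
    """Turn a frame-level phone labelling into frame-level targets for one AF class."""
    return [PHONE_TO_AF[p][af_class] for p in phone_frames]


def frame_accuracy(reference, estimated):
    """Percentage of frames whose estimated AF value equals the reference value."""
    hits = sum(r == e for r, e in zip(reference, estimated))
    return 100.0 * hits / len(reference)


phones = ["s", "s", "a", "a", "m", "b"]                # toy frame labelling
reference = af_targets(phones, "voicing")
estimated = ["voiceless", "voiced", "voiced", "voiced", "voiced", "voiced"]
print(frame_accuracy(reference, estimated))
```

Because each AF class is a separate classification task, accuracy is reported per class (e.g. voicing) and can then be averaged across classes.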

Speech reduction in Czech

Authors: Kolman, A., Pollák, P.
Publication: LabPhon 14. The 14th Conference on Laboratory Phonology. Tokyo: National Institute for Japanese Language and Linguistics, 2014. Available from: http://www.ninjal.ac.jp/labphon14/LP14_FINAL_20140708.pdf
Year: 2014

Affiliation: Department of Circuit Theory
Abstract:
The present study contributes to this research by investigating speech reduction in Czech. Our study is based on the Nijmegen Corpus of Casual Czech, which was recorded in Prague in November 2008 and consists of 39 hours of casual conversations between 26 groups of three friends. We studied speech reduction in this corpus by focusing on a number of frequent words and frequent phoneme sequences. First, we see patterns that have also been observed in other languages. Second, Czech also shows clear effects of morphology, which have not been attested for other languages so far. A third interesting topic concerns syllabic consonants: we find that the probability of the segment being absent is modulated by the complexity of the resulting consonant cluster. Our study of Czech clearly shows that it is worthwhile to extend the study of reduction to typologically different languages.

The Nijmegen Corpus of Casual Czech

Authors: Ernestus, M., Kockova-Amortova, L., Pollák, P.
Publication: Proceedings of the 9th Language Resources and Evaluation Conference. Paris: ELRA - European Language Resources Association, 2014. ISBN 978-2-9517408-8-4.
Year: 2014

Affiliation: Department of Circuit Theory
Abstract:
This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which contains more than 30 hours of high-quality recordings of casual conversations in Common Czech, among ten groups of three male and ten groups of three female friends. All speakers were native speakers of Czech, raised in Prague or in the region of Central Bohemia, and were between 19 and 26 years old. Every group of speakers consisted of one confederate, who was instructed to keep the conversations lively, and two speakers naive to the purposes of the recordings. The naive speakers were engaged in conversations for approximately 90 minutes, while the confederate joined them for approximately the last 72 minutes. The corpus was orthographically annotated by experienced transcribers and this orthographic transcription was aligned with the speech signal. In addition, the conversations were videotaped. This corpus can form the basis for all types of research on casual conversations in Czech, including phonetic research and research on how to improve automatic speech recognition. The corpus will be freely available.

The optimization of PLP feature extraction for LVCSR recognition of MP3 data

Authors: Borský, M., Pollák, P.
Publication: 19th International Conference on Applied Electronics 2014. Pilsen: University of West Bohemia, 2014. pp. 55-58. ISSN 1803-7232. ISBN 978-80-261-0276-2.
Year: 2014

Affiliation: Department of Circuit Theory
Abstract:
This paper analyses the contribution of an optimized PLP feature extraction setup and of feature normalization to the performance of an automatic speech recognition system for data compressed by the MP3 algorithm. The experimental study, performed on loop-digit recognition and large vocabulary continuous speech recognition tasks, showed that a proper setup can offset the effect of lower bit rates, yielding results comparable to those obtained at higher bit rates. The second finding is that the normalization techniques contribute significantly to overall performance, especially for shorter windows/shifts and lower bit rates. The acoustic models trained on 160 kbit/s, 32 kbit/s and 16 kbit/s data achieved WERs of 34.17%, 41.88% and 36.4%, respectively, on the LVCSR task. For comparison, the acoustic models trained on uncompressed data achieved a WER of 28.56%.

Accuracy of HMM-Based Phonetic Segmentation Using Monophone or Triphone Acoustic Model

Authors: Mizera, P., Pollák, P.
Publication: Applied Electronics - 2013 International Conference on Applied Electronics. Pilsen: University of West Bohemia, 2013. pp. 181-184. ISSN 1803-7232. ISBN 978-80-261-0166-6.
Year: 2013

Affiliation: Department of Circuit Theory
Abstract:
The paper compares the accuracy of HMM-based automatic phonetic segmentation using various signal representations as well as acoustic models of various complexity, i.e. monophone or word-internal triphone acoustic models with various numbers of mixtures. The precision of the automatic phonetic segmentation was measured by comparison with manually segmented speech data. The analysis showed that segmentation with word-internal triphone acoustic models yielded better accuracy. The best results were attained for word-internal triphone acoustic models with four mixtures. In this case, the average shift of phone boundaries and the average change of phone length were about 5.9 ms and 0.2 ms, respectively.
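The two figures quoted above can be computed by aligning the automatic and manual segmentations of the same phone sequence. The abstract does not define them precisely, so this sketch assumes that the boundary shift is the mean absolute difference between corresponding boundary times and that the length change is the mean signed difference in phone duration (automatic minus manual). The share of boundaries within a tolerance (20 ms by default), a commonly reported additional measure, is included as an assumption as well.

```python
def segmentation_deviation(auto, manual, tolerance_ms=20.0):
    """Compare two contiguous segmentations of the same phone sequence.

    auto, manual: lists of (start_ms, end_ms, phone).
    Returns (mean absolute boundary shift in ms,
             mean signed phone-length change in ms, auto minus manual,
             percentage of boundaries within tolerance_ms).
    """
    if [p for _, _, p in auto] != [p for _, _, p in manual]:
        raise ValueError("the segmentations must contain the same phone sequence")
    # Boundaries: every phone start plus the end of the last phone.
    auto_b = [s for s, _, _ in auto] + [auto[-1][1]]
    manual_b = [s for s, _, _ in manual] + [manual[-1][1]]
    shifts = [abs(a - m) for a, m in zip(auto_b, manual_b)]
    length_changes = [(a_end - a_start) - (m_end - m_start)
                      for (a_start, a_end, _), (m_start, m_end, _) in zip(auto, manual)]
    mean_shift = sum(shifts) / len(shifts)
    mean_length_change = sum(length_changes) / len(length_changes)
    within = 100.0 * sum(s <= tolerance_ms for s in shifts) / len(shifts)
    return mean_shift, mean_length_change, within


manual = [(0, 50, "a"), (50, 120, "h"), (120, 200, "o")]  # toy reference labels
auto = [(0, 55, "a"), (55, 110, "h"), (110, 200, "o")]
print(segmentation_deviation(auto, manual))
```

A signed length change can average out to nearly zero even when individual boundaries move, which would be consistent with a length change much smaller than the boundary shift.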

Noise and Channel Normalized Cepstral Features for Far-Speech Recognition

Authors: Borský, M., Mizera, P., Pollák, P.
Publication: Speech and Computer. Cham: Springer International Publishing AG, 2013. pp. 241-248. Lecture Notes in Artificial Intelligence. ISSN 0302-9743. ISBN 978-3-319-01930-7.
Year: 2013

DOI: 10.1007/978-3-319-01931-4_32
Link: https://doi.org/10.1007/978-3-319-01931-4_32
Affiliation: Department of Circuit Theory
Abstract:
The paper analyses suitable features for distorted speech recognition. The aim is to explore the application of a command ASR system when speech is recorded with distant microphones and possibly strong additive and convolutional noise. The paper analyses the possible contribution of basic spectral subtraction coupled with cepstral mean normalization to minimizing the influence of the distortion present in such a far-talk channel. The results are compared with a reference close-talk speech recognition system and show an improvement in WER for channels with low or medium SNR. Using a combination of these basic techniques, a WERR of 55.6% was obtained for the medium-distance channel and a WERR of 22.5% for the far-distance channel.
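The two compensation steps combined above can be sketched as follows: power spectral subtraction applied per frequency bin, followed by cepstral mean normalization. CMN removes a time-invariant convolutional channel because such a channel becomes an additive constant in the cepstral domain. The noise estimate from the first few (assumed speech-free) frames, the over-subtraction factor `alpha` and the spectral floor `beta` are illustrative assumptions, not the settings used in the paper.

```python
def spectral_subtraction(power_frames, noise_frames=5, alpha=1.0, beta=0.01):
    """Power spectral subtraction.

    power_frames: list of per-frame power spectra (equal-length lists).
    The noise spectrum is the average of the first `noise_frames` frames,
    assumed to contain no speech. Each bin is reduced by alpha * noise and
    floored at beta * noise so that no power value becomes negative.
    """
    n_bins = len(power_frames[0])
    head = power_frames[:noise_frames]
    noise = [sum(f[k] for f in head) / len(head) for k in range(n_bins)]
    return [[max(f[k] - alpha * noise[k], beta * noise[k]) for k in range(n_bins)]
            for f in power_frames]


def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean of every cepstral coefficient."""
    n = len(cepstra)
    means = [sum(c[d] for c in cepstra) / n for d in range(len(cepstra[0]))]
    return [[c[d] - means[d] for d in range(len(means))] for c in cepstra]


# Toy power spectra: two noise-only frames followed by one speech frame.
cleaned = spectral_subtraction([[1.0, 2.0], [1.0, 2.0], [5.0, 2.01]], noise_frames=2)
normalized = cepstral_mean_normalization([[1.0, 2.0], [3.0, 4.0]])
```

In a full front end the cepstra passed to CMN would be computed from the subtracted spectra; the two functions are kept separate here only to show each step on its own.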

Optimized State-Tying for Triphone-Based HMMs under Training Data Deficiency

Authors: Borský, M., Pollák, P.
Publication: Applied Electronics - 2013 International Conference on Applied Electronics. Pilsen: University of West Bohemia, 2013. pp. 45-48. ISSN 1803-7232. ISBN 978-80-261-0166-6.
Year: 2013

Affiliation: Department of Circuit Theory
Abstract:
This paper deals with the optimization of state tying for triphone-based HMMs in the case of training data deficiency. The main goal is to analyse the importance of the stopping threshold for the criterion function in tree-based clustering. The log-likelihood measure was used as the criterion function, and varying thresholds were evaluated with training sets of different sizes. Tied-state triphone HMMs with multiple Gaussian mixtures were trained under various setups. The experiments showed that more complex AMs with fewer mixtures added could achieve better results than less complex models with more mixtures. The same conclusion held even for a significantly reduced amount of training data.
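The role of the stopping threshold in tree-based state clustering can be illustrated with the log-likelihood gain of a candidate split. In this sketch each cluster is modelled by a single diagonal-covariance Gaussian with maximum-likelihood parameters, as is usual in decision-tree clustering, and splitting stops when the best question's gain falls below the threshold. The state names, questions and data are hypothetical, and a real system would compute the likelihoods from accumulated occupancy statistics rather than from raw frames.

```python
import math


def cluster_loglik(frames):
    """Log-likelihood of frames under one diagonal Gaussian with ML mean and variance:
    L = -0.5 * n * sum_d (log(2 * pi * var_d) + 1)."""
    n = len(frames)
    total = 0.0
    for d in range(len(frames[0])):
        mean = sum(f[d] for f in frames) / n
        var = max(sum((f[d] - mean) ** 2 for f in frames) / n, 1e-6)  # variance floor
        total += math.log(2 * math.pi * var) + 1.0
    return -0.5 * n * total


def best_split(states, questions, threshold):
    """states: {state_name: list of frames}; questions: {name: set of states answering yes}.
    Returns (question, gain) of the best split, or None when no split gains at least
    `threshold` (i.e. the node is not split further)."""
    base = cluster_loglik([f for frames in states.values() for f in frames])
    best = None
    for q, yes_set in questions.items():
        yes = [f for s, frames in states.items() if s in yes_set for f in frames]
        no = [f for s, frames in states.items() if s not in yes_set for f in frames]
        if not yes or not no:
            continue  # the question does not divide this node
        gain = cluster_loglik(yes) + cluster_loglik(no) - base
        if best is None or gain > best[1]:
            best = (q, gain)
    if best is None or best[1] < threshold:
        return None
    return best


states = {"a-b+c": [[0.0], [1.0]], "o-b+c": [[10.0], [11.0]]}  # toy 1-D data
questions = {"L_is_a": {"a-b+c"}, "L_any": set()}
print(best_split(states, questions, threshold=5.0))
print(best_split(states, questions, threshold=20.0))
```

A lower threshold lets the tree grow more tied states, each with fewer training frames, which is why the threshold interacts with the amount of training data and the number of mixtures added later.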

Various Approaches of Small Vocabulary Speech Recognizer Implementation Using HTK Toolkit

Authors: Borský, M., Pollák, P.
Publication: POSTER 2013 - 17th International Student Conference on Electrical Engineering. Prague: Czech Technical University, 2013. pp. 1-5. ISBN 978-80-01-05242-6.
Year: 2013

Affiliation: Department of Circuit Theory
Abstract:
This paper presents the construction of a small-vocabulary recognizer using the publicly available HTK toolkit. HTK provides two decoders, HVite and HDecode; the approaches to building a recognizer are described for both, together with the creation of suitable acoustic models, because the two tools require slightly different kinds of subword acoustic models. In the experimental part, both decoders were evaluated on a loop-digit recognition task with word-based and cross-word triphone-based AMs. The computational costs of the described approaches are compared as well.

Analysis of Acoustic Echo Cancellation and DTD Detection in Smartphones (Czech title: Analýza potlačování akustického echa a DTD detekce v chytrých telefonech)

Authors: Klapuch, J., Pollák, P.
Publication: 20th Annual Conference Proceedings: Technical Computing Bratislava 2012. Praha: Humusoft, 2012. pp. 1-8. ISBN 978-80-970519-4-5.
Year: 2012

Affiliation: Department of Circuit Theory
Abstract:
The article deals with testing algorithms for acoustic echo cancellation and double-talk detection (DTD) in smartphones. Acoustic echo, together with ambient noise, is generally a highly undesirable phenomenon in telecommunications, and in mobile phones in particular. The design of current mobile phones and the way they are used cause high levels of echo and distortion, which makes the task more difficult than in classic fixed-line telephone systems. The main task was to create a suitable DTD detector, which is an important element of echo cancellation. A detector based on the average coherence of the loudspeaker and microphone signals, combined with cepstral speech detection in both channels, was analysed. For this and other purposes, a GUI application was developed in MATLAB for analysing and testing the algorithms. Simulations were carried out on a newly created speech database representing mobile communication. DTD was evaluated statistically as the false detection rate over the whole signal and, separately, as the false detection rate in critical signal segments, which are important for correct operation of the AEC system.

Knowledge-Based and Automated Clustering in MLLR Adaptation of Acoustic Models for LVCSR

Authors: Borský, M., Pollák, P.
Publication: 2012 International Conference on Applied Electronics. Pilsen: University of West Bohemia, 2012. pp. 33-36. ISSN 1803-7232. ISBN 978-80-261-0038-6.
Year: 2012

Affiliation: Department of Circuit Theory
Abstract:
This paper analyses the performance of MLLR-based speaker adaptation in a large vocabulary continuous speech recognition system. Two approaches to clustering in MLLR adaptation with multiple regression classes were analysed: knowledge-based clustering and automatic clustering. The contributions of acoustic model adaptation using these two clustering approaches were compared based on the word error rate ratio (WERR) of the target LVCSR system. The study showed that knowledge-based clustering may bring an improvement comparable to tree-based clustering when only a few transformation classes are defined manually.

Fundamental Frequency Estimation of Speech with Glottal Pulse Localization and Pitch-Synchronous Segmentation (Czech title: Odhad základního tónu řeči s lokalizací hlasivkových pulsů a pitch-synchronní segmentace)

Authors: Mizera, P., Pollák, P.
Publication: 20th Annual Conference Proceedings: Technical Computing Bratislava 2012. Praha: Humusoft, 2012. pp. 1-8. ISBN 978-80-970519-4-5.
Year: 2012

Affiliation: Department of Circuit Theory
Abstract:
The article presents the results of an analysis of the accuracy of standard fundamental frequency estimation algorithms and the influence of the individual pitch detection algorithms (PDA) on the accuracy of glottal pulse localization in the implemented PMA algorithm. The analysis focuses on algorithms based on the autocorrelation function (ACF), the average magnitude difference function (AMDF) and the cepstrum. The article also presents the procedure of pitch-synchronous segmentation and subsequent resynthesis from the decomposed segments, which allows prosodic characteristics to be modified.
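The ACF- and AMDF-based pitch detection algorithms analysed in the article can be sketched as a search for the best lag within a plausible F0 range. This minimal version works on a single frame and assumes a 60-400 Hz search range; it omits the voicing decision, the cepstral variant and the glottal-pulse localization step, and it is not the implementation evaluated in the article. The test frame is a synthetic 125 Hz sinusoid sampled at 8 kHz.

```python
import math


def acf_pitch(x, fs, f_min=60.0, f_max=400.0):
    """F0 estimate from the lag with the largest autocorrelation in [fs/f_max, fs/f_min]."""
    lags = range(int(fs / f_max), int(fs / f_min) + 1)

    def r(k):
        return sum(x[n] * x[n + k] for n in range(len(x) - k))

    return fs / max(lags, key=r)


def amdf_pitch(x, fs, f_min=60.0, f_max=400.0):
    """F0 estimate from the lag with the smallest average magnitude difference.
    min() keeps the first (shortest) of equally deep minima, which avoids
    picking a multiple of the period."""
    lags = range(int(fs / f_max), int(fs / f_min) + 1)

    def d(k):
        return sum(abs(x[n] - x[n + k]) for n in range(len(x) - k)) / (len(x) - k)

    return fs / min(lags, key=d)


fs = 8000
period = 64  # 8000 / 64 = 125 Hz
frame = [math.cos(2 * math.pi * (n % period) / period) for n in range(800)]
print(acf_pitch(frame, fs), amdf_pitch(frame, fs))
```

The biased autocorrelation sums fewer products at longer lags, so it naturally favours the fundamental period over its multiples; the AMDF has equally deep minima at every multiple, which is why the tie-breaking matters there.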

Small and Large Vocabulary Speech Recognition of MP3 Data under Real-Word Conditions: Experimental Study

Authors: Pollák, P., Borský, M.
Publication: Communications in Computer and Information Science. 2012, 314, 409-419. ISSN 1865-0929.
Year: 2012

DOI: 10.1007/978-3-642-35755-8_29
Link: https://doi.org/10.1007/978-3-642-35755-8_29
Affiliation: Department of Circuit Theory
Abstract:
This paper presents a study of speech recognition accuracy for both small- and large-vocabulary tasks at different levels of MP3 compression of the processed data. The motivation behind the work was to evaluate the use of an ASR system for off-line automatic transcription of recordings collected with standard current MP3 devices under different levels of background noise and channel distortion. Although MP3 may not be an optimal compression algorithm, the experiments showed that it does not distort the speech signal significantly at higher bit rates. The experiments also showed that the accuracy of speech recognition (both small- and large-vocabulary) decreased very slowly for bit rates of 24 kbps and higher. However, a slightly different setup of the speech feature computation is needed for MP3 speech data; in particular, PLP features give significantly better results than MFCC.

Accuracy of MP3 Speech Recognition Under Real-World Conditions. Experimental Study

Autoři: doc. Ing. Petr Pollák, CSc., Běhunek, M.
Publikace: Proceedings of SIGMAP 2011 - International Conference on Signal Processing and Multimedia Applications. Sevilla: University of Seville, 2011. pp. 5-10. ISBN 978-989-8425-72-0.
Rok: 2011

Pracoviště: Katedra teorie obvodů
Anotace:
This paper presents a study of speech recognition accuracy with respect to different levels of MP3 compression. Special attention is paid to the processing of speech signals of different quality, i.e. with different levels of background noise and channel distortion. The work was motivated by the possible use of ASR for offline automatic transcription of audio recordings collected with standard, widespread MP3 devices. The experiments showed that the MP3 format does not distort speech significantly, especially at high or moderate bit rates and with high-quality source data. The accuracy of connected-digit ASR decreased only very slowly as the bit rate was lowered to 24 kbps. In the best case, PLP parameterization in the close-talk channel, recognition accuracy decreased by just 3% while the compressed file was approximately 10% of the original size. All results were slightly worse in the presence of additive background noise and channel distortion.

ASR systems in Noisy Environment: Analysis and Solutions for Increasing Noise Robustness

Autoři: Rajnoha, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Radioengineering. 2011, 20(1), 74-84. ISSN 1210-2512.
Rok: 2011

Pracoviště: Katedra teorie obvodů
Anotace:
This paper analyzes Automatic Speech Recognition (ASR) for use in noisy environments and suggests an optimum configuration under various noisy conditions. The behavior of standard parameterization techniques was analyzed from the viewpoint of robustness against background noise. This analysis covered Mel-frequency cepstral coefficients (MFCC), Perceptual linear predictive (PLP) coefficients, and their modified forms combining the main blocks of PLP and MFCC. The second part analyzes the contribution of modified techniques that include frequency-domain noise suppression and voice activity detection. These techniques were tested on signals recorded in real noisy environments within a Czech digit recognition task and on the AURORA databases. Finally, the contribution of special VAD-based selective training and of MLLR adaptation of acoustic models was studied for various signal features.

Coverage of Spontaneous Conversational Speech from Nijmegen Corpus of Casual Czech by General ASR Language Models

Autoři: Procházka, V., doc. Ing. Petr Pollák, CSc.,
Publikace: Workshop Production and Comprehension of Conversational Speech. Radboud University Nijmegen, 2011. pp. 34-35.
Rok: 2011

Pracoviště: Katedra teorie obvodů
Anotace:
Large Vocabulary Continuous Speech Recognition (LVCSR), one of the most frequent applications of speech technology, is nowadays used in a growing number of applications in everyday life. Consequently, the need for spontaneous speech recognition also arises; however, spontaneous speech differs strongly in character from non-spontaneous speech, so its specific phenomena are not expected to be covered by a standard general Language Model (LM). In this contribution we analyze the Nijmegen Corpus of Casual Czech (NCCCz) from the point of view of several publicly available LMs. We analyze the rate of Out-Of-Vocabulary (OOV) words, the rate of word fragments, repetitions, and restarts, the perplexity computed at the text level on the NCCCz transcriptions, and the LVCSR performance on the recordings using the above-mentioned LMs.

Performance of Czech Speech Recognition with Language Models Created from Public Resources

Autoři: Procházka, V., doc. Ing. Petr Pollák, CSc., Žďánský, J., Nouza, J.
Publikace: Radioengineering. 2011, 20(4), 1002-1008. ISSN 1210-2512.
Rok: 2011

Pracoviště: Katedra teorie obvodů
Anotace:
In this paper, we investigate the usability of publicly available n-gram corpora for creating language models (LM) applicable to Czech speech recognition systems. N-gram LMs with various parameters and settings were created from two publicly available sets: the Czech Web 1T 5-gram corpus provided by Google and a 5-gram corpus created from the Czech National Corpus. We also tested an LM built from a large private resource of newspaper and broadcast texts collected by a Czech media mining company. The LMs were analyzed and compared via their perplexity and their performance in large vocabulary continuous speech recognition systems. Our study shows that the Web1T-based LMs, even after intensive cleaning and normalization, cannot compete with those made from smaller but more consistent corpora. Experiments on large test data also illustrate the impact of Czech, a highly inflective language, on perplexity, OOV, and recognition accuracy rates.

Analysis of Czech Web 1T 5-gram corpus and its comparison with Czech National Corpus Data

Autoři: Procházka, V., doc. Ing. Petr Pollák, CSc.,
Publikace: Lecture Notes in Artificial Intelligence. 2010, 6231(2010933819), 181-188. ISSN 0302-9743.
Rok: 2010

DOI: 10.1007/978-3-642-15760-8_24
Odkaz: https://doi.org/10.1007/978-3-642-15760-8_24
Pracoviště: Katedra teorie obvodů
Anotace:
In this paper, the newly issued Czech Web 1T 5-gram corpus created by Google and LDC is analysed and compared with a reference n-gram corpus obtained from the Czech National Corpus. The original 5-grams from both corpora were post-processed, and statistical trigram language models of various vocabulary sizes and parameters were created. Various corpus statistics, such as unique and total word and n-gram counts before and after post-processing, are compared and discussed, with particular focus on cleaning invalid tokens from the Web 1T data. Tools from the HTK Toolkit were used for the evaluation; accuracy, OOV rates, and perplexity were measured on sentence transcriptions from the Czech SPEECON database.
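OOV rate and perplexity, the measures used in this and the previous study, can be sketched as follows. The paper used HTK tools and trigram models; this hypothetical example instead uses a toy add-k smoothed bigram model in plain Python, maps OOV words to <unk>, and counts the sentence end as a predicted event. These are illustrative conventions, not necessarily those of HTK.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


def _tokens(sentence, vocab):
    """Wrap a sentence in boundary markers and map out-of-vocabulary words to <unk>."""
    return [BOS] + [w if w in vocab else UNK for w in sentence] + [EOS]


def train_bigram(sentences, vocab):
    """Count histories and bigrams over the training sentences."""
    hist, big = Counter(), Counter()
    for s in sentences:
        toks = _tokens(s, vocab)
        hist.update(toks[:-1])
        big.update(zip(toks[:-1], toks[1:]))
    return hist, big


def evaluate(sentences, vocab, hist, big, k=1.0):
    """Return (OOV rate, perplexity) of an add-k smoothed bigram model."""
    n_out = len(vocab) + 2  # predicted symbols: vocabulary words, <unk>, </s>
    n_words = n_oov = n_events = 0
    log_prob = 0.0
    for s in sentences:
        n_words += len(s)
        n_oov += sum(w not in vocab for w in s)
        toks = _tokens(s, vocab)
        for h, w in zip(toks[:-1], toks[1:]):
            log_prob += math.log((big[(h, w)] + k) / (hist[h] + k * n_out))
            n_events += 1
    return n_oov / n_words, math.exp(-log_prob / n_events)
```

Perplexity here is exp of the negative mean log-probability per predicted token; whether OOV tokens are scored or skipped changes the value, which is one reason perplexities from different toolkits are not always directly comparable.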

Detekce řečové aktivity na bázi HMM a GMM modelování [Voice Activity Detection Based on HMM and GMM Modelling]

Autoři: Tatarinov, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Akustické listy. 2010, 16(2-3), 5-13. ISSN 1212-4702.
Rok: 2010

Pracoviště: Katedra teorie obvodů
Anotace:
The paper describes several voice activity detection algorithms. Voice activity detection is an important task in speech processing that is still being intensively developed in current research worldwide. The algorithms described here are based on Gaussian mixture models (GMM) and hidden Markov models (HMM), including an analysis of the use of suitable parameters. The studied detectors were compared with reference energy-based and cepstral detectors and also with an algorithm based on the ITU-T G.729 standard. Experiments were carried out with signals from the CZKCC database, which contains utterances recorded in a moving car; they showed a significant benefit of the GMM and HMM detectors, especially for more heavily corrupted signals.

Multi-Channel Database of Spontaneous Czech with Synchronization of Channels Recorded by Independent Devices

Autoři: doc. Ing. Petr Pollák, CSc., Rajnoha, J.
Publikace: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10). Paris: ELRA, 2010. ISBN 2-9517408-6-7.
Rok: 2010

Pracoviště: Katedra teorie obvodů
Anotace:
This paper describes a Czech spontaneous speech database of lectures collected at the Czech Technical University in Prague, together with the procedure of its recording and annotation. Special attention is paid to the time synchronization of signals recorded by two independent devices. This synchronization is based on cross-correlation analysis with a simple automated selection of suitable short signal subparts. The database contains 21.7 hours of speech material recorded in 4 channels with 3 principally different microphones. The annotation of the database consists of basic time segmentation, orthographic transcription, a pronunciation lexicon, session and speaker information, and documentation. The collection and annotation of this database are complete, and its distribution via ELRA is currently being prepared.
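A minimal sketch of cross-correlation synchronization of two channels, assuming both channels share a sampling rate and differ only by a constant offset (no clock drift). The rule for choosing the short subpart (here simply the highest-energy window) and the window and search lengths are hypothetical stand-ins for the paper's automated selection.

```python
import numpy as np


def loudest_window(x, win):
    """Start index of the highest-energy window of length win (hypothetical selection rule)."""
    c = np.concatenate([[0.0], np.cumsum(x ** 2)])
    energy = c[win:] - c[:-win]  # energy[i] = sum(x[i:i + win] ** 2)
    return int(np.argmax(energy))


def channel_offset(ref, other, win=4000, max_lag=8000):
    """Estimate d (samples) such that other[n] ~ ref[n - d].

    A short, energetic subpart of ref is located inside other by
    cross-correlation, searching only lags in roughly [-max_lag, max_lag].
    """
    ref = np.asarray(ref, float)
    other = np.asarray(other, float)
    start = loudest_window(ref, win)
    seg = ref[start:start + win]
    lo = max(0, start - max_lag)
    hi = min(len(other), start + win + max_lag)
    # corr[j] = sum_n other[lo + j + n] * seg[n]
    corr = np.correlate(other[lo:hi], seg, mode="valid")
    return lo + int(np.argmax(corr)) - start
```

Correlating only a short subpart against a bounded search region keeps the cost low for long recordings; a positive result means the second device started recording later relative to the reference.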

Příprava a analýza Českého Web 1T 5-gram korpusu pro použití v jazykovém modelu [Preparation and Analysis of the Czech Web 1T 5-gram Corpus for Use in a Language Model]

Autoři: Procházka, V., doc. Ing. Petr Pollák, CSc.,
Publikace: Analýza a zpracování řečových a biologických signálů - sborník prací 2010. Praha: České vysoké učení technické v Praze, 2010. pp. 67-73. ISBN 978-80-01-04680-7.
Rok: 2010

Pracoviště: Katedra teorie obvodů
Anotace:
This work describes the procedure for analyzing the Czech Web 1T 5-gram corpus. The corpus was analyzed, and its basic characteristics were evaluated before and during processing. During processing, the corpus vocabulary was filtered by various methods so that it would, as far as possible, contain only meaningful words. Language models for Large Vocabulary Continuous Speech Recognition (LVCSR) were generated from the cleaned corpus and their perplexity was computed. For comparison, a 5-gram corpus based on the SYN2006PUB corpus compiled by the Czech National Corpus (ČNK) was processed with the same filtering procedures.

Tvorba rozpoznávače plynulých promluv v českém jazyce standardními nástroji HTK [Building a Czech Continuous Speech Recognizer with Standard HTK Tools]

Autoři: Rajnoha, J., Procházka, V., doc. Ing. Petr Pollák, CSc.,
Publikace: Akustické listy. 2010, 16(1), 5-10. ISSN 1212-4702.
Rok: 2010

Pracoviště: Katedra teorie obvodů
Anotace:
The paper describes how to build a large-vocabulary continuous speech recognizer for Czech using the HTK tools. The standard procedure presented in the HTK documentation is supplemented with language-specific particularities. The paper gives an overview of the individual steps for quickly building a system for first experiments with continuous speech recognition; such a system is not optimal in terms of speed or attainable accuracy, but it offers flexibility when testing modifications of individual modules of the recognizer. The paper also describes building triphone models with cross-word context and the basic procedure for creating a language model. Finally, experimental results are presented for a balanced setting of attainable speed and accuracy. The system currently runs 1.5 to 2 times slower than the minimum required for real-time operation, with acceptable accuracy for medium-vocabulary recognition.

Accuracy Analysis of Generalized Pronunciation Variant Selection in ASR Systems

Autoři: Hanžl, V., doc. Ing. Petr Pollák, CSc.,
Publikace: Lecture Notes in Artificial Intelligence. 2009, 5641(2009931057), 399-408. ISSN 0302-9743.
Rok: 2009

DOI: 10.1007/978-3-642-03320-9_37
Odkaz: https://doi.org/10.1007/978-3-642-03320-9_37
Pracoviště: Katedra teorie obvodů
Anotace:
Automatic speech recognition (ASR) systems typically use a pronunciation dictionary to generate the expected phonetic content of particular words in the recognized utterance. However, pronunciation can vary in many situations. Besides the cases where several pronunciation variants are specified manually in the dictionary, there are typically many other pronunciation changes depending on word context or speaking style, which is very typical of Czech. In this paper we study how accurately the proper variant is selected from automatically predicted pronunciation variants in Czech HMM-based ASR systems. We analyzed the correctness of pronunciation variant selection in forced alignment of known utterances. Selecting the proper pronunciation variants serves mainly for more accurate training of acoustic HMM models. Finally, the accuracy of LVCSR using different levels of automated pronunciation generation was tested.

Czech Spontaneous Speech Collection and Annotation: The Database of Technical Lectures

Autoři: Rajnoha, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Lecture Notes in Artificial Intelligence. 2009, 5641(2009931057), 377-385. ISSN 0302-9743.
Rok: 2009

DOI: 10.1007/978-3-642-03320-9_35
Odkaz: https://doi.org/10.1007/978-3-642-03320-9_35
Pracoviště: Katedra teorie obvodů
Anotace:
As speech recognition is applied in real working systems, spontaneous speech recognition is becoming increasingly important. The need for a spontaneous speech database is therefore evident, and this paper describes the collection of Czech spontaneous data recorded during technical lectures. The data are intended both as material for analyzing phenomena specific to spontaneous speech and as additional material for training spontaneous speech recognizers. Speech signals are captured in two channels of slightly different quality, and about 14 hours of speech from 15 speakers have been collected and annotated so far. First analyses of spontaneous-speech effects in the collected data have been performed, and a comparison with read speech databases is presented.

Design and Utilization of Testing Database for VAD Classification

Autoři: Tatarinov, J., doc. Ing. Petr Pollák, CSc.,
Publikace: 19th Czech-German Workshop on Speech Processing. Prague: Institute of Photonics and Electronics AS CR, 2009. pp. 42-47. ISBN 978-80-86269-18-4.
Rok: 2009

Pracoviště: Katedra teorie obvodů
Anotace:
Voice activity detection is one of the important issues addressed in current speech processing research. Many authors are developing different VADs, and the need for their objective comparison and evaluation arises. This article presents the design of a VAD testing database together with criteria that numerically describe VAD accuracy. Recordings for this database were selected as subsets of the CZKCC, CAR2ECS, and SPEECON databases. As the source databases contain only transcriptions without time boundaries, these boundaries had to be added either manually or using HMM-based forced alignment. Our selection consists of different kinds of utterances, such as isolated digits, short commands, names, and phonetically rich sentences. We also selected signals recorded in different environments.

Long Recording Segmentation Based on Simple Power Voice Activity Detection with Adaptive Threshold and Post-Processing

Autoři: doc. Ing. Petr Pollák, CSc., Rajnoha, J.
Publikace: SPECOM 2009 Proceedings. St. Petersburg: Institute for Informatics and Automation of RAS (SPIIRAS), 2009. pp. 55-60. ISBN 978-5-8088-0442-5.
Rok: 2009

Pracoviště: Katedra teorie obvodů
Anotace:
This paper describes a method of segmenting long recordings based on Voice Activity Detection (VAD). The core of the approach is power-based detection using an adaptive threshold derived from the power dynamics. Simple post-processing based on long-time sub-segmentation smooths the primary VAD output to obtain the start and end points of particular utterances within long recordings. Because the algorithm is based on a simple power VAD, it is much easier to implement than approaches based on speech recognition. Although the approach is simple, it gives quite robust and satisfactory results for the pure segmentation task. Tests with two different data types gave satisfactory results, as did its practical use during the creation of new speech corpora.
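The power-based VAD with an adaptive threshold and post-processing can be sketched as below. The concrete choices are assumptions, not the paper's parameters: the threshold lies at a fraction alpha between the 5th and 95th percentiles of frame power in dB, pauses shorter than min_gap_s are bridged, and segments shorter than min_speech_s are discarded. Function names are hypothetical.

```python
import numpy as np


def frame_power_db(x, frame_len, hop):
    """Short-time power of each frame in dB."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return 10.0 * np.log10(np.mean(x[idx] ** 2, axis=1) + 1e-12)


def _runs(mask):
    """(start, end) frame indices of consecutive True values, end exclusive."""
    d = np.diff(np.concatenate([[False], mask, [False]]).astype(int))
    return list(zip(np.flatnonzero(d == 1), np.flatnonzero(d == -1)))


def segment_recording(x, fs, frame_ms=25, hop_ms=10, alpha=0.3,
                      min_gap_s=0.5, min_speech_s=0.3):
    """Return (start_s, end_s) speech segments of a long recording."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    p = frame_power_db(np.asarray(x, float), frame_len, hop)
    # Adaptive threshold placed between the noise floor and the speech level.
    floor, peak = np.percentile(p, 5), np.percentile(p, 95)
    active = p > floor + alpha * (peak - floor)
    # Post-processing: bridge short pauses, then drop very short bursts.
    merged = []
    for s, e in _runs(active):
        if merged and (s - merged[-1][1]) * hop / fs < min_gap_s:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return [(s * hop / fs, ((e - 1) * hop + frame_len) / fs)
            for s, e in merged if (e - s) * hop / fs >= min_speech_s]
```

Deriving the threshold from the recording's own power distribution lets the same settings work on quiet and noisy recordings, while the gap bridging keeps short pauses inside an utterance from splitting it.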

Robust Speech Recognition in Car Environment Combining Noise Reduction and Acoustic Model Adaptation

Autoři: Rajnoha, J., doc. Ing. Petr Pollák, CSc.,
Publikace: 19th Czech-German Workshop on Speech Processing. Prague: Institute of Photonics and Electronics AS CR, 2009. pp. 27-34. ISBN 978-80-86269-18-4.
Rok: 2009

Pracoviště: Katedra teorie obvodů
Anotace:
This paper presents a study of suitable front-end signal processing combined with model adaptation, focused on ASR in the car environment. First, we present the application of our noise suppression technique within standard feature extraction and an analysis of the results achieved with such features under different car conditions. The proposed technique significantly reduces the influence of the noisy background, so that a quite low WER can be achieved for the digit recognition task on noisy data, especially in the close-talk channel. Second, further improvement is achieved by MLLR adaptation, studied mainly with respect to adapting to the continuously changing background in the car environment.

The Dynamic Dimension of the Global Speech-Rhythm Attributes

Autoři: Volín, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of Interspeech 2009. Grenoble: International Speech Communication Association, 2009. pp. 1543-1546. ISSN 1990-9772. ISBN 978-1-61567-692-7.
Rok: 2009

Pracoviště: Katedra teorie obvodů
Anotace:
Recent years have shown that certain global attributes of speech rhythm can be captured quite successfully with respect to consonantal and vocalic intervals in spoken texts. One problem of this approach lies in complex syllabic structures. Unless an a priori phonological decision is made, sonorant consonants may contribute to either the vocalic or the consonantal part of the speech signal in post-initial and prefinal positions of syllable onsets and codas. We offer a procedure that avoids both phonological dilemmas and tedious manual work. The method is tested on continuous Czech and English texts read by several professional speakers.

Detektory řečové aktivity na bázi perceptivní kepstrální analýzy [Voice Activity Detectors Based on Perceptual Cepstral Analysis]

Autoři: Rajnoha, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Technical Computing Prague 2008. Praha: Humusoft, 2008. pp. 1-9. ISBN 978-80-7080-692-0.
Rok: 2008

Pracoviště: Katedra teorie obvodů
Anotace:
This paper describes a voice activity detector (VAD) based on perceptual cepstral analysis of the speech signal and its implementation. Cepstral detectors are more robust to background noise than simpler algorithms, e.g. energy-based ones. Perceptual analysis of the speech signal, realized with a suitable filter bank on a nonlinear frequency axis, extracts speech features that are better suited to this detection. The paper describes the individual steps of the detection algorithm, with a more detailed description of the important blocks and their implementation in MATLAB. The detector is compared with the standard algorithm used in the G.729 voice codec. Finally, possible uses of the detector in various applications are discussed, with an example from robust speech recognition, where the detector improved recognition accuracy by almost 50%.

HMM and EHMM Based Voice Activity Detectors and Design of Testing Platform for VAD Classification

Autoři: Tatarinov, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Digital Technologies 2008. Žilina: Žilinská universita, Elektrotechnická fakulta, 2008. pp. 1-4. ISBN 978-80-8070-953-2.
Rok: 2008

Pracoviště: Katedra teorie obvodů
Anotace:
This article presents the use of LR and ergodic Markov models in voice activity detection, together with a VAD testing platform. These HMM- and EHMM-based detectors achieve better results than traditional energy or cepstral detectors. The proposed algorithms were tested on data recorded in a running car, and their contribution is evident especially in this very noisy environment. Together with the experimental results, the paper describes the data selection and the design of the VAD testing platform. The speech recordings used consist of isolated digits, various commands, and names, recorded in a quiet car with the engine off, in a running car, or in a standing car with the engine running.

Phone Segmentation Tool with Integrated Pronunciation Lexicon and Czech Phonetically Labelled Reference Database

Autoři: doc. Ing. Petr Pollák, CSc., Volín, J., Skarnitzl, R.
Publikace: 6th International Conference on Language Resources and Evaluation. Paris: ELRA - European Language Resources Association, 2008. p. 1-5. ISBN 2-9517408-4-0.
Rok: 2008

Pracoviště: Katedra teorie obvodů
Anotace:
Phonetic segmentation is used in many speech processing applications, both as a subpart of automated systems and as a tool for interactive work. In this paper we present the latest developments in our automated phonetic segmentation tool. The tool is based on HMM forced alignment implemented with the publicly available HTK toolkit. It is integrated into the Praat environment and can be used with several optional settings. The tool is designed for segmenting utterances with known orthographic transcriptions; the phonetic content is obtained from the pronunciation lexicon or, for new unknown words, from an orthoepic transcription generated by rules. The second part of the paper describes a small Czech reference database, precisely labelled at the phonetic level, which is intended for analyzing the accuracy of automatic phonetic segmentation.

Problems and Solutions in the Creation of Czech and Slovak Lexica for Speech Technology Applications: General Experiences and LC-Star2 Lexica

Autoři: doc. Ing. Petr Pollák, CSc., Hanžl, V., Černocký, J., Smrž, P.
Publikace: Digital Technologies 2008. Žilina: Žilinská universita, Elektrotechnická fakulta, 2008. pp. 1-5. ISBN 978-80-8070-953-2.
Rok: 2008

Pracoviště: Katedra teorie obvodů
Anotace:
This paper presents results of interdisciplinary research devoted to the design and collection of lexica for speech technology applications. Such lexica are required by automatic speech recognizers (ASR), text-to-speech synthesis systems (TTS), and translation systems. In the design and creation of such lexica, linguistic or phonetic solutions are sometimes constrained by the nature of ASR or TTS systems. In this paper, we present our general experience in this field and also some experience from creating the Czech and Slovak LC-Star2 lexica.

Řečové detektory využívající ergodické Markovovské modely [Speech Detectors Using Ergodic Markov Models]

Autoři: Tatarinov, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Technical Computing Prague 2008. Praha: Humusoft, 2008. pp. 1-6. ISBN 978-80-7080-692-0.
Rok: 2008

Pracoviště: Katedra teorie obvodů
Anotace:
This paper presents the use of ergodic Markov models in voice activity detection. Traditional detectors solve this task with classifiers based on thresholding suitable speech characteristics, whereas this paper uses an approach based on statistical modelling. A classifier was designed, and a voice activity detector was built on its basis. The detector was tested and evaluated on signals from the CAR2CS database. The detector using ergodic hidden Markov models achieves better results than traditional detectors. The main benefit of the presented detectors lies in improved classification of heavily corrupted signals.

Speaker Non-Speech Event Modelling in Recognition of Read and Spontaneous Speech

Autoři: Rajnoha, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Digital Technologies 2008. Žilina: Žilinská universita, Elektrotechnická fakulta, 2008. pp. 1-6. ISBN 978-80-8070-953-2.
Rok: 2008

Pracoviště: Katedra teorie obvodů
Anotace:
Modelling of non-speech events brings the necessary robustness to the recognition of natural or spontaneous utterances, which are usually full of such acoustic disfluencies. This paper presents a solution for modelling speaker non-speech events, together with an analysis of how efficiently these events are modelled. First, a procedure for efficient training of non-speech event models on read speech data is presented. Experiments with a simple ASR system showed a 26% decrease in word error rate and a significant decrease in the insertion rate with these models. Second, the extension of the training data with a spontaneous speech collection is described. This provides more natural data for training and mainly improves the training of non-speech event models, as demonstrated by an experiment on filled pause recognition.

Accuracy analysis of phonetic segmentation with multiple word-pronunciation variants and segmentation tool in Praat environment

Autoři: doc. Ing. Petr Pollák, CSc., Volín, J., Skarnitzl, R.
Publikace: Speech Processing. Prague: Institute of Photonics and Electronics AS CR, 2007. pp. 37-42. ISBN 978-80-86269-00-9.
Rok: 2007

Pracoviště: Katedra teorie obvodů
Anotace:
The paper describes further research on automated phonetic segmentation of Czech speech. This research deals with finding phone boundaries in known utterances with a given orthographic transcription. The work presented here focused on using multiple pronunciation variants for the available orthography, which should lead to automatic determination of the actual phonetic content of the analyzed utterance, followed by placement of the phone boundaries. We also present a phonetic segmentation tool implemented in the Praat environment. The tool is now used for automatic pre-segmentation before subsequent precise labelling at the phonetic level.

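Automatická fonetická segmentace řečového signálu na bázi HMM a její implementace v prostředí programu Praat [HMM-Based Automatic Phonetic Segmentation of the Speech Signal and Its Implementation in the Praat Environment]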

Autoři: doc. Ing. Petr Pollák, CSc.,
Publikace: Acta Universitatis Carolinae: Philologica. 2007, 48(2), 117-129. ISSN 0323-0767.
Rok: 2007

Pracoviště: Katedra teorie obvodů
Anotace:
Phonetic segmentation is a task frequently used in speech processing technologies. The typical solution, also used here, is based on aligning HMM models of individual phones. This is implemented in a tool built on the HTK Toolkit, which is embedded in the Praat program environment. The system is freely available and can be used both for automatic segmentation of large amounts of data and for interactive work, especially for subsequent manual segmentation. This was the main motivation for cooperation with experts from the Institute of Phonetics at Charles University, led by Prof. Palková, within which the accuracy of the segmentation algorithm was analyzed in detail; on average it is about 10 ms.
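Segmentation accuracy of the kind quoted here (about 10 ms on average) is typically computed by comparing automatic boundaries with manually placed reference boundaries. A minimal sketch, assuming both boundary lists describe the same phone sequence so that they correspond one-to-one; the 20 ms tolerance is a common convention in the segmentation literature, used here only as an example, and the function name is hypothetical.

```python
import numpy as np


def boundary_accuracy(auto_s, ref_s, tol_s=0.020):
    """Mean absolute boundary deviation (s) and share of boundaries within tol_s.

    Assumes auto_s[i] and ref_s[i] refer to the same phone boundary.
    """
    auto_s = np.asarray(auto_s, float)
    ref_s = np.asarray(ref_s, float)
    if auto_s.shape != ref_s.shape:
        raise ValueError("boundary lists must correspond one-to-one")
    err = np.abs(auto_s - ref_s)
    return float(err.mean()), float(np.mean(err <= tol_s))
```

Reporting both numbers is useful because a small mean deviation can hide a few badly misplaced boundaries, which the within-tolerance share exposes.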

HMM-Based Phonetic Segmentation in Praat Environment

Autoři: doc. Ing. Petr Pollák, CSc., Volín, J., Skarnitzl, R.
Publikace: The XII International Conference Speech and Computer - SPECOM 2007. Moscow: Moskovskij gosudarstvennyj universitet im. M. V. Lomonosova, 2007. pp. 537-541. ISBN 6-7452-0110-X.
Rok: 2007

Pracoviště: Katedra teorie obvodů
Anotace:
Phonetic segmentation is required in many applications of current speech technologies. One of the most frequently used methods is based on forced alignment of trained hidden Markov models of phones. This approach is used in our phonetic segmentation tool, which is built on the HTK toolkit and integrated with the Praat environment. The system is currently used for Czech, and the required input is speech of known content, i.e. with its orthographic transcription. The system creates a regular orthoepic transcription using conversion rules. Exceptions to regular pronunciation can be marked with a simple syntax, so that the forced alignment is finally performed on the actual phonetic content of the utterance. The system is publicly available.

Modified Feature Extraction Methods in Robust Speech Recognition

Autoři: Rajnoha, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of 17th International Conference Radioelektronika 2007. Piscataway: Institute of Electrical and Electronic Engineers, 2007. pp. 521-524. ISBN 1-4244-0821-0.
Rok: 2007

DOI: 10.1109/RADIOELEK.2007.371488
Odkaz: https://doi.org/10.1109/RADIOELEK.2007.371488
Pracoviště: Katedra teorie obvodů
Anotace:
Speech recognisers use a parametric form of the signal to extract the features most important for the recognition task. Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction coefficients (PLP) are among the most commonly used methods. There is no rule for deciding which one is better; it depends mainly on the particular conditions. This paper presents tests that combine different parts of each parametrization process to obtain the best results under given conditions. A robust hidden Markov model (HMM) based Czech digit recogniser operating in a slightly noisy environment is used for this purpose. The experiments show that using Bark-frequency scaling, equal-loudness pre-emphasis, and the intensity-loudness power law in the original MFCC method can improve robustness to white noise under particular conditions. The results also revealed that LP-based methods tend to generate insertion errors in the given environment.
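The three PLP-derived blocks mentioned above can be sketched with Hermansky's PLP formulas: the Bark warping z = 6 asinh(f/600), the equal-loudness curve, and an intensity-loudness power law with exponent 0.33. How the paper inserted them into the MFCC chain is not specified here; this sketch assumes triangular filters spaced uniformly on the Bark scale, equal-loudness weighting at the filter centres, and the power law in place of the logarithm before an unnormalized DCT-II. All function names are hypothetical.

```python
import numpy as np


def hz_to_bark(f):
    """PLP Bark warping: z = 6 * asinh(f / 600)."""
    return 6.0 * np.arcsinh(np.asarray(f, float) / 600.0)


def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 600.0 * np.sinh(np.asarray(z, float) / 6.0)


def equal_loudness(f):
    """PLP equal-loudness curve, evaluated at omega = 2*pi*f."""
    w2 = (2 * np.pi * np.asarray(f, float)) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def bark_cepstrum(power_spec, fs, n_filters=20, n_ceps=13):
    """Cepstrum from a one-sided power spectrum using the PLP-style blocks above."""
    n_bins = len(power_spec)
    freqs = np.linspace(0.0, fs / 2, n_bins)
    edges = bark_to_hz(np.linspace(0.0, hz_to_bark(fs / 2), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        lo, c, hi = edges[m], edges[m + 1], edges[m + 2]
        rise = (freqs - lo) / (c - lo)
        fall = (hi - freqs) / (hi - c)
        fb[m] = np.clip(np.minimum(rise, fall), 0.0, None)  # triangular filter
    energies = (fb @ power_spec) * equal_loudness(edges[1:-1])
    loud = np.power(energies + 1e-12, 0.33)  # intensity-loudness power law
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (m + 0.5) / n_filters)  # DCT-II basis
    return dct @ loud
```

The cube-root compression is much gentler than the logarithm near zero energy, which is one plausible reason such a modification can change behaviour in white noise; whether it helps depends on the conditions, as the annotation notes.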

Technologie hlasových komunikací [Voice Communication Technologies]

Autoři: prof. Ing. Jan Uhlíř, CSc., prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc., Hanžl, V., prof. Ing. Roman Čmejla, CSc.,
Publikace: Praha: Nakladatelství ČVUT, 2007. ISBN 978-80-01-03888-8.
Rok: 2007

Pracoviště: Katedra teorie obvodů
Anotace:
The monograph provides comprehensive information on methods of digital processing of human speech signals in the areas of transmission, synthesis, and recognition. It serves as study material for the course Digitální zpracování signálu řeči (Digital Speech Signal Processing), which is part of the structured master's programme, and for the course Fonetické signály a jejich kódování (Phonetic Signals and Their Coding) in the doctoral programme curriculum. Its aim is not only to introduce this scientific discipline to the wider public but also to publish some of the scientific results obtained in research at the laboratory of the Department of Circuit Theory, Faculty of Electrical Engineering, CTU in Prague.

Voice Activity Detection in Small Vocabulary Speech Recognition

Autoři: Rajnoha, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Speech Processing. Prague: Institute of Photonics and Electronics AS CR, 2007. pp. 43-48. ISBN 978-80-86269-00-9.
Rok: 2007

Pracoviště: Katedra teorie obvodů
Anotace:
This work presents experiments on using voice activity detection (VAD) as part of the frame dropping method for suppressing the influence of background noise in speech recognition. A speaker-independent, phoneme-based Czech digit sequence recogniser working in a real environment was used for this purpose. A parametrization-based VAD is used, and the results are compared under different conditions: noisy environment, distribution level, and auditory-based signal parametrization. The experiments show that VAD-based frame dropping can improve recognition by decreasing insertion errors and making the speech models more precise, improving the word error rate by up to 20%. However, the need for a universal setting of the detection algorithm across general environmental conditions introduces detection inaccuracy, which is reflected in the recognition results.
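VAD-based frame dropping can be sketched as discarding feature frames that the detector labels as non-speech before decoding. The pad parameter, which keeps a few frames on either side of each speech region, is a hypothetical addition for illustration, as are the function name and the array layout (one row per frame).

```python
import numpy as np


def drop_frames(features, is_speech, pad=5):
    """Keep only frames flagged as speech, widened by `pad` frames on each side.

    features:  array of shape (n_frames, n_coeffs)
    is_speech: per-frame VAD decisions of length n_frames
    """
    is_speech = np.asarray(is_speech, bool)
    keep = is_speech.copy()
    for s in range(1, pad + 1):
        keep[s:] |= is_speech[:-s]   # extend speech regions forward
        keep[:-s] |= is_speech[s:]   # extend speech regions backward
    return np.asarray(features)[keep]
```

The padding trades some retained noise frames for protection against clipped word onsets and endings, which is where an imperfect universal VAD setting tends to hurt recognition.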

Analysis of Glottal Stop Presence in Large Speech Corpus and Influence of Its Modelling on Segmentation Accuracy

Autoři: doc. Ing. Petr Pollák, CSc., Volín, J., Skarnitzl, R.
Publikace: Proceedings of the 16th Czech-German Workshop on Speech Processing. Praha: AV ČR, Ústav radiotechniky a elektroniky, 2006. pp. 98-104. ISBN 80-86269-15-9.
Rok: 2006

Pracoviště: Katedra teorie obvodů
Anotace:
This work presents research on phonetic segmentation of real fluent Czech speech. Its main goal was to overcome segmentation inaccuracy caused by missing glottal-stop modelling. An HMM for the glottal stop was trained using an iterative procedure, because the training data contain no manually annotated information about glottal-stop presence in particular utterances. The experiments confirmed that glottal-stop modelling improves phonetic segmentation, and the localization of glottal stops was also generally successful. Finally, experiments with changes in the HMM structure were carried out; however, the results confirm an improvement for only a few phone contexts.

Data-Driven Design of Front-End Filter Bank for Lombard Speech Recognition

Autoři: Bořil, H., Fousek, P., doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of Ninth International Conference on Spoken Language Processing. Rundle Mall: CAUSAL Production, 2006. pp. 381-384. ISSN 1990-9772.
Rok: 2006

Pracoviště: Katedra teorie obvodů
Anotace:
Adverse environments not only corrupt the speech signal with additive and convolutional noise, which can be successfully addressed by a number of suppression algorithms, but also affect the way speech is produced. Speech production variations introduced by a speaker in reaction to a noisy background (Lombard effect) may severely degrade automatic speech recognition. This paper contributes to the solution of the Lombard speech recognition problem by providing a robust filter bank for use in front-ends. It is shown that cepstral features derived from the proposed filter bank significantly outperform conventional cepstral features.

Methodology of Lombard Speech Database Acquisition: Experiences with CLSD

Autoři: Bořil, H., Bořil, T., doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of 5th International Conference on Language Resources and Evaluation. Paris: ELRA - European Language Resources Association, 2006. p. 1644-1647. ISBN 2-9517408-2-4.
Rok: 2006

Pracoviště: Katedra teorie obvodů
Anotace:
The aim of this paper is to describe the hardware platform, scenarios and recording tool used for the acquisition of CLSD'05. A method for minimizing the speech attenuation introduced to the speaker by headphones is proposed. Finally, the contents and corpus of the database are presented to outline its suitability for the analysis and modeling of the Lombard effect. The whole CLSD'05 database, with detailed documentation, is now released for public use.

Modelling of Speaker Non-speech Events in Robust Speech Recognition

Autoři: Rajnoha, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of the 16th Czech-German Workshop on Speech Processing. Praha: AV ČR, Ústav radiotechniky a elektroniky, 2006. pp. 149-155. ISBN 80-86269-15-9.
Rok: 2006

Pracoviště: Katedra teorie obvodů
Anotace:
Experiments on modelling speaker non-speech events (SNE) using a robust speech recogniser based on hidden Markov models (HMM) are presented in this work. A speaker-independent recogniser of spoken Czech digits, based on Czech phoneme modelling and working in a real environment, was used for this purpose. Only SNEs positioned between words are modelled, as they can easily be added to the recogniser grammar as if they were another word. The recognition results were analysed for two different testing datasets, each derived from one of the training sets (which differ in environmental conditions). With the events modelled, the recognition score increased by about 22% and 11% for the two testing datasets compared with the results obtained without modelling the events. The recogniser was also tested on data with unknown recording conditions. The low number of incorrectly inserted words shows that this modelling seems to be less dependent on recording conditions than the pure phoneme model case.

Analysis of Lombard Effect in Several Czech Databases

Autoři: Bořil, H., doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of the 16th Conference Joined with the 15th Czech-German Workshop "Speech Processing". Dresden: TU Dresden, 2005. pp. 253-259. ISBN 3-938863-17-X.
Rok: 2005

Pracoviště: Katedra teorie obvodů

Comparison of Three Czech Speech Databases from the Standpoint of Lombard Effect Appearance

Autoři: Bořil, H., doc. Ing. Petr Pollák, CSc.,
Publikace: ASIDE 2005 - Applied Spoken Language Interaction in Distributed Environments - Book of Abstracts. Grenoble: International Speech Communication Association, 2005. ISSN 0908-1224. ISBN 87-90834-85-2.
Rok: 2005

Pracoviště: Katedra teorie obvodů

Confronting HMM-based Phone Labelling with Human Evaluation of Speech Production

Autoři: Volín, J., Skarnitzl, R., doc. Ing. Petr Pollák, CSc.,
Publikace: Interspeech Lisboa 2005. Grenoble: International Speech Communication Association, 2005. pp. 1541-1544. ISSN 1018-4074.
Rok: 2005

Pracoviště: Katedra teorie obvodů

Design and Collection of Czech Lombard Speech Database

Autoři: Bořil, H., doc. Ing. Petr Pollák, CSc.,
Publikace: Interspeech Lisboa 2005. Grenoble: International Speech Communication Association, 2005. pp. 1577-1580. ISSN 1018-4074.
Rok: 2005

Pracoviště: Katedra teorie obvodů

Design of Lombard Effect Speech Database

Autoři: Bořil, H., Bořil, T., doc. Ing. Petr Pollák, CSc.,
Publikace: Radioelektronika 2005 - Conference Proceedings. Brno: VUT v Brně, FEI, Ústav radioelektroniky, 2005. pp. 144-147. ISBN 80-214-2904-6.
Rok: 2005

Pracoviště: Katedra teorie obvodů
Anotace:
Speech recognition efficiency decreases remarkably for speech uttered in adverse conditions. Besides the negative impact of ambient noise corrupting the speech signal, the Lombard effect (LE), i.e. the modification of speech characteristics by speakers in an effort to increase communication intelligibility, results in a significant degradation of clean-speech recognizer performance. While a lot of attention has been given to noise suppression in speech signals, LE classification and elimination is a relatively new task, promising further improvements in speech recognition accuracy in natural environments. Some Czech speech databases include recordings made in noisy conditions, e.g. SPEECON and Temic, but in most cases the recorded utterances do not contain LE due to the low level of environmental noise and/or the speakers' lack of effort to react to the actual noise. In this paper, the design of an LE speech database and the recording platform for its collection are presented.

HMM Based VAD Using Token Passing Algorithm and Generalized Speech and Silence Models

Autoři: Tatarinov, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of the 16th Conference Joined with the 15th Czech-German Workshop "Speech Processing". Dresden: TU Dresden, 2005. pp. 316-322. ISBN 3-938863-17-X.
Rok: 2005

Pracoviště: Katedra teorie obvodů

Influence of HMM's Parameters on the Accuracy of Phone Segmentation - Evaluation Baseline

Autoři: doc. Ing. Petr Pollák, CSc., Volín, J., Skarnitzl, R.
Publikace: Proceedings of the 16th Conference Joined with the 15th Czech-German Workshop "Speech Processing". Dresden: TU Dresden, 2005. pp. 302-309. ISBN 3-938863-17-X.
Rok: 2005

Pracoviště: Katedra teorie obvodů

LexEdit: GUI to Czech Pronunciation Lexicon for Speech Recognition Purposes

Autoři: Brada, M., doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of the 16th Conference Joined with the 15th Czech-German Workshop "Speech Processing". Dresden: TU Dresden, 2005. pp. 260-266. ISBN 3-938863-17-X.
Rok: 2005

Pracoviště: Katedra teorie obvodů

Methods for Speech SNR Estimation: Evaluation Tool and Analysis of VAD Dependency

Autoři: Vondrášek, M., doc. Ing. Petr Pollák, CSc.,
Publikace: Radioengineering. 2005, 14(1), 6-11. ISSN 1210-2512.
Rok: 2005

Pracoviště: Katedra teorie obvodů
Anotace:
The tool can estimate the SNR of a noisy speech signal with or without a reference signal. It can also be used to create a speech and noise mixture with a required SNR.
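The abstract does not describe the tool's algorithms, so the following is only a sketch of the two operations it mentions, under the common assumption that SNR is the ratio of mean speech power to mean noise power in dB. It covers the reference-based case and mixing at a required SNR; reference-free estimation, which typically depends on a VAD to separate speech from pauses, is not shown. The function names are hypothetical.

```python
import math


def power(x):
    """Mean power of a signal given as a list of samples."""
    return sum(v * v for v in x) / len(x)


def snr_with_reference(clean, noisy):
    """SNR in dB when the clean reference is available (noise = noisy - clean)."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10.0 * math.log10(power(clean) / power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Return speech + g * noise with the requested SNR.

    From SNR = 10 * log10(Ps / (g**2 * Pn)) it follows that
    g = sqrt(Ps / (Pn * 10**(snr_db / 10))).
    Speech and noise are assumed to have the same length.
    """
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

For instance, `mix_at_snr(speech, noise, 5.0)` produces a mixture whose SNR, measured by `snr_with_reference(speech, mixture)`, is 5 dB up to floating-point rounding. Practical tools often measure speech power only over active-speech segments rather than over the whole signal, which is where VAD accuracy starts to matter.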

Voice Activity Detector Based on Sample Synchronous Probability Evaluation Using HMM

Autoři: Tatarinov, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Radioelektronika 2005 - Conference Proceedings. Brno: VUT v Brně, FEI, Ústav radioelektroniky, 2005. pp. 440-443. ISBN 80-214-2904-6.
Rok: 2005

Pracoviště: Katedra teorie obvodů

Czech Speech Database for Consumer Devices (SPEECON): Description and Experiences from Collection

Autoři: doc. Ing. Petr Pollák, CSc.,
Publikace: Speech Processing. Praha: AV ČR, Ústav radiotechniky a elektroniky, 2004. pp. 126-128. ISBN 80-86269-10-8.
Rok: 2004

Pracoviště: Katedra teorie obvodů

Direct Time Domain Fundamental Frequency Estimation of Speech in Noisy Conditions

Autoři: Bořil, H., doc. Ing. Petr Pollák, CSc.,
Publikace: EUSIPCO-2004 - Proceedings. Wien: Technische Universität, 2004. pp. 1003-1006. ISBN 3-200-00165-8.
Rok: 2004

Pracoviště: Katedra teorie obvodů

Experiments in Voice Activity Detection Using Hidden Markov Models

Autoři: Tatarinov, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Speech Processing. Praha: AV ČR, Ústav radiotechniky a elektroniky, 2004. pp. 102-105. ISBN 80-86269-11-6.
Rok: 2004

Pracoviště: Katedra teorie obvodů

Hidden Markov Models in Voice Activity Detection

Autoři: Tatarinov, J., doc. Ing. Petr Pollák, CSc.,
Publikace: Robust2004: Robustness Issues in Conversational Interaction. Brussels: COST Office, 2004.
Rok: 2004

Pracoviště: Katedra teorie obvodů

Orthographic and Phonetic Annotation of Very Large Czech Corpora with Quality Assessment

Autoři: doc. Ing. Petr Pollák, CSc., Černocký, J.
Publikace: LREC 2004 - IV. International Conference on Language Resources and Evaluation. Paris: ELRA - European Language Resources Association, 2004. pp. 595-598. ISBN 2-9517408-1-6.
Rok: 2004

Pracoviště: Katedra teorie obvodů

Additive Noise and Channel Distortion-Robust Parameterization Tool - Performance Evaluation on Aurora 2 & 3

Autoři: Fousek, P., doc. Ing. Petr Pollák, CSc.,
Publikace: EUROSPEECH '03. Berlin: ESCA, 2003. pp. 63. ISSN 1018-4074.
Rok: 2003

Pracoviště: Katedra teorie obvodů

Efficient and Reliable Measurement and Simulation of Noisy Speech Background

Autoři: doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of the 11th European Signal Processing Conference. Bretagne: ENST, 2002.
Rok: 2002

Pracoviště: Katedra teorie obvodů

Tool for Czech Pronunciation Generation Combining Fixed Rules with Pronunciation Lexicon and Lexicon Management Tool

Autoři: doc. Ing. Petr Pollák, CSc., Hanžl, V.
Publikace: Proceedings of the Third International Conference on Language Resources and Evaluation. Paris: ELRA - European Language Resources Association, 2002. p. 1264-1269. ISBN 2-9517408-0-8.
Rok: 2002

Pracoviště: Katedra teorie obvodů

Czech Pronunciation Lexicon and Annotation of Very Large Databases

Autoři: doc. Ing. Petr Pollák, CSc.,
Publikace: Speech Processing - 11th Czech-German Workshop. Praha: AV ČR, Ústav radiotechniky a elektroniky, 2001. pp. 44-45. ISBN 80-86269-07-8.
Rok: 2001

Pracoviště: Katedra teorie obvodů

Metody odhadu odstupu signálu od šumu v řečovém signálu

Autoři: doc. Ing. Petr Pollák, CSc.,
Publikace: Akustické listy. 2001, 7(3), 14-21. ISSN 1212-4702.
Rok: 2001

Pracoviště: Katedra teorie obvodů

SNR of Noisy Speech and Methods for Its Estimation

Autoři: doc. Ing. Petr Pollák, CSc.,
Publikace: Polish-Czech-Hungarian Workshop on Circuit Theory, Signal Processing and Telecommunications Network. Budapest: Technical University, 2001. pp. 33-40.
Rok: 2001

Pracoviště: Katedra teorie obvodů

SpeechDat-E: Five Eastern European Speech Databases for Voice-Operated Teleservices Completed

Autoři: van den Heuvel, H., Boudy, J., Bakcsi, Z., Černocký, J., Galunov, V., Kochanina, J., Majewski, W., doc. Ing. Petr Pollák, CSc., Rusko, M., Sadowski, J., Staroniewicz, P., Tropf, H.
Publikace: Eurospeech 2001 Scandinavia. Aalborg: Aalborg University, 2001. pp. 2059-2062. ISBN 87-90834-09-7.
Rok: 2001

Pracoviště: Katedra teorie obvodů

SpeechDat(E) - Eastern European Telephone Speech Databases

Autoři: doc. Ing. Petr Pollák, CSc., Černocký, J., Boudy, J., Choukri, K., van den Heuvel, H., Vicsi, K., Virag, A., Siemund, R., Majewski, W., Staroniewicz, P., Tropf, H., Kochanina, J., Ostrouchov, A., Rusko, M., Trnka, M.
Publikace: XLDB - Very Large Telephone Speech Databases. Paris: European Language Resources Association (ELRA), 2000. pp. 20-25.
Rok: 2000

Pracoviště: Katedra teorie obvodů

ASR with Noisy Speech Pre-processing and Phoneme Model Re-estimation

Autoři: Vopička, J., doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc., prof. Ing. Jan Uhlíř, CSc.,
Publikace: Proceedings of Robust Methods for Speech Recognition in Adverse Conditions. Brussels: COST Office, 1999. pp. 151-154.
Rok: 1999

Pracoviště: Katedra teorie obvodů

CAR2 - Czech Database of Car Speech

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc., Hanžl, V., Vopička, J.
Publikace: Radioengineering. 1999, 8(4), 1-6. ISSN 1210-2512.
Rok: 1999

Pracoviště: Katedra teorie obvodů
Anotace:
This paper presents a new two-channel (stereo) Czech speech database recorded in a car environment. The database was designed for experiments with speech enhancement for communication purposes and for the study and design of robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were developed, and the car background noise was analysed.

Combined Noise Suppression System for Monaural Cochlear Implants

Autoři: Svoboda, M., prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc.,
Publikace: Proceedings of 6th European Conference on Speech Communication and Technology. Berlin: ESCA, 1999. p. 2635-2638. ISSN 1018-4074.
Rok: 1999

Pracoviště: Katedra teorie obvodů

Czech Language Database of Car Speech and Environmental Noise

Autoři: doc. Ing. Petr Pollák, CSc., Vopička, J., prof. Ing. Pavel Sovka, CSc.,
Publikace: Proceedings of 6th European Conference on Speech Communication and Technology. Berlin: ESCA, 1999. pp. 2263-2266. ISSN 1018-4074.
Rok: 1999

Pracoviště: Katedra teorie obvodů

Czech Telephony Speech Database of 1000 Speakers - SpeechDat(E)

Autoři: doc. Ing. Petr Pollák, CSc., Hanžl, V., Černocký, J.
Publikace: Polish-Hungarian-Czech Workshop on Circuit Theory, Signal Processing, and Application. Praha: České vysoké učení technické v Praze, 1999. pp. 77-80. ISBN 80-01-02047-9.
Rok: 1999

Pracoviště: Katedra teorie obvodů

Generating Phonetically Rich Sentences and Words for Czech SpeechDat

Autoři: Hanžl, V., doc. Ing. Petr Pollák, CSc., Černocký, J.
Publikace: 9th Czech-German Workshop in Speech Processing. Praha: AV ČR, Ústav radiotechniky a elektroniky, 1999. pp. 15-16.
Rok: 1999

Pracoviště: Katedra teorie obvodů

Influence of Parameter Estimation in Kalman Filtering of Speech Signals

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc.,
Publikace: 9th International Czech-Slovak Scientific Conference Radioelektronika 99. Brno: VUT v Brně, 1999. pp. 182-185. ISBN 80-214-1327-1.
Rok: 1999

Pracoviště: Katedra teorie obvodů

Phoneme Model Based ASR of Words in Car Environment

Autoři: Vopička, J., doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc., prof. Ing. Jan Uhlíř, CSc.,
Publikace: Polish-Hungarian-Czech Workshop on Circuit Theory, Signal Processing, and Application. Praha: České vysoké učení technické v Praze, 1999. pp. 89-92. ISBN 80-01-02047-9.
Rok: 1999

Pracoviště: Katedra teorie obvodů

Real-Time Fixed-Point DSP-Implementation of Spectral Subtraction Algorithm for Speech Enhancement in Noisy Environment

Autoři: Laengler, A., Gruhler, G., prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc., Davídek, V.
Publikace: 9th International Czech-Slovak Scientific Conference Radioelektronika 99. Brno: VUT v Brně, 1999. pp. 186-189. ISBN 80-214-1327-1.
Rok: 1999

Pracoviště: Katedra teorie obvodů

Recording of Czech and Slovak Telephone Databases within SpeechDat-E

Autoři: Černocký, J., doc. Ing. Petr Pollák, CSc., Rusko, M., Hanžl, V., Trnka, M.
Publikace: Proceedings TSD'99. Berlin: Springer, 1999. p. 388-391. ISBN 3-540-66494-7.
Rok: 1999

Pracoviště: Katedra teorie obvodů
Anotace:
Databases of five East European languages (Czech, Slovak, Russian, Polish and Hungarian) are being created within the SpeechDat-E project. This paper describes the overall design of the SpeechDat-E databases and concentrates on the Czech (1000 speakers) and Slovak (1000 speakers) databases. The item structure and recording specifications are presented, with a more detailed description of the language-specific items. Attention is also paid to the geographic and dialect distribution of speakers. The paper also presents the recruitment strategy.

Czech Car Noisy Speech Database

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc.,
Publikace: Proceedings of the Polish-Czech-Hungarian Workshop on Circuit Theory, Signal Processing and Applications. Warsaw University of Technology, 1998. pp. 55-60.
Rok: 1998

Pracoviště: Katedra teorie obvodů

Database of Car Speech, Analysis of Collected Data, Tools for Automated Labeling

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc.,
Publikace: Proceedings of the 8-th Czech-German Workshop on Speech Processing. Praha: AV ČR, Ústav radiotechniky a elektroniky, 1998. pp. 53-54. ISBN 80-86269-00-0.
Rok: 1998

Pracoviště: Katedra teorie obvodů

Experimental Study of Speech Recognition in Noisy Environment

Autoři: Kreisinger, T., doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc., prof. Ing. Jan Uhlíř, CSc.,
Publikace: Signal Analysis and Prediction. Boston: Birkhaeuser, 1998. p. 455-466. ISBN 0-8176-4042-8.
Rok: 1998

Pracoviště: Katedra teorie obvodů

Suppression of Acoustic Noise in Speech Using Kalman Filtering

Autoři: Chládek, P., doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc.,
Publikace: Proceedings of the 8-th Czech-German Workshop on Speech Processing. Praha: AV ČR, Ústav radiotechniky a elektroniky, 1998. pp. 51-52. ISBN 80-86269-00-0.
Rok: 1998

Pracoviště: Katedra teorie obvodů

Study of Speech Recognition in Noisy Environment

Autoři: Kreisinger, T., prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc., prof. Ing. Jan Uhlíř, CSc.,
Publikace: Signal Analysis and Prediction I. Praha: ICT Press, 1997. p. 334-337. ISBN 80-7080-282-0.
Rok: 1997

Pracoviště: Katedra teorie obvodů
Anotace:
Achieving reliable speech recognition performance in the car for mobile telephony applications has been studied intensively for more than a decade. This paper addresses the effects of mismatched conditions and their minimization with respect to the performance of speaker-independent isolated-word recognition in the car-noise environment, without consideration of the Lombard effect. The study is primarily intended to examine the dependence of the recognition rate on the SNR of the input signal with and without noise-enhancement preprocessing, and especially to find the conditions under which modified spectral subtraction can be effectively used for speech recognition in a real non-stationary car-noise environment. If, e.g., 80% is admitted as the worst acceptable recognition rate, then the use of spectral subtraction methods enables a wider interval of input SNRs: for training on clean speech this interval is (40, 6) dB; for training on noisy speech it is (40, -2) dB; for training on enhanced speech it is (40, -8) dB. The third case gives the widest interval of SNRs in which a recogniser (with a final recognition rate in the interval (100, 80)%) can be used.
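Spectral subtraction, used above as enhancement preprocessing, estimates the clean-speech power spectrum by subtracting a noise estimate, typically obtained from frames that a speech/pause detector labels as pauses. The sketch below shows basic per-frame power spectral subtraction with an over-subtraction factor and a spectral floor. It assumes the FFT and the noise estimation are done elsewhere; it is the generic textbook form, not the modified spectral subtraction evaluated in the paper, and the parameter values are illustrative.

```python
def spectral_subtraction(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction for one frame.

    noisy_power: |Y(k)|^2 for each frequency bin of the noisy frame
    noise_power: noise power estimate per bin (e.g. averaged over pauses)
    alpha: over-subtraction factor
    beta: spectral floor, a fraction of the noisy power kept in every bin;
          it prevents negative power values and limits musical noise
    Returns the enhanced power spectrum and the per-bin amplitude gains.
    """
    enhanced, gains = [], []
    for y, n in zip(noisy_power, noise_power):
        s = max(y - alpha * n, beta * y)
        enhanced.append(s)
        # Amplitude gain sqrt(S/Y); a zero-power bin simply gets gain 0.
        gains.append((s / y) ** 0.5 if y > 0 else 0.0)
    return enhanced, gains
```

In a complete system, the gains would multiply the complex noisy spectrum (keeping the noisy phase) before the inverse FFT and overlap-add, or the enhanced power spectrum would be passed directly to the feature extraction of the recogniser.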

The Problems of Robust LPC Parametrization

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc.,
Publikace: Polish-Czech-Hungarian Workshop on Circuit Theory, Signal Processing and Applications. Budapest: Technical University, 1997. pp. 15-21.
Rok: 1997

Pracoviště: Katedra teorie obvodů

Extended Spectral Subtraction

Autoři: prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc., prof. Dr. Ing. Jan Kybic,
Publikace: EUSIPCO '96 Eighth European Signal Processing Conference. Trieste: LINT, 1996. pp. 963-966. ISBN 88-86179-83-9.
Rok: 1996

Pracoviště: Katedra teorie obvodů, Katedra kybernetiky

Extended Spectral Subtraction

Autoři: prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc.,
Publikace: Czech-Hungarian-Polish Workshop 1996. Praha: České vysoké učení technické v Praze, 1996. pp. 8-11.
Rok: 1996

Pracoviště: Katedra teorie obvodů

Noise Cancellation Systems

Autoři: prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc., Davídek, V., prof. Ing. Jan Uhlíř, CSc.,
Publikace: Workshop 96. Praha: České vysoké učení technické v Praze, 1996. pp. 725-726.
Rok: 1996

Pracoviště: Katedra teorie obvodů

Cepstral Speech/Pause Detectors

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc.,
Publikace: Proceedings of IEEE Workshop on Nonlinear Signal and Image Processing. Neos Marmaras: The Institute of Electrical and Electronics Engineers, Inc., 1995. pp. 388-391.
Rok: 1995

Pracoviště: Katedra teorie obvodů

Implementation of Spectral Subtraction

Autoři: Davídek, V., prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc.,
Publikace: Elektro 95. Žilina: VŠDS, 1995. pp. 207-210. ISBN 80-7100-251-8.
Rok: 1995

Pracoviště: Katedra teorie obvodů

Speech Detection in the Real Car Environment

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc.,
Publikace: Speech Processing on 5th Czech-German Workshop. Praha: AV ČR, Ústav radiotechniky a elektroniky, 1995. pp. 54-55. ISBN 80-901658-3-4.
Rok: 1995

Pracoviště: Katedra teorie obvodů

Speech/Pause Detection for Real-Time Implementation of Spectral Subtraction Algorithm

Autoři: prof. Ing. Pavel Sovka, CSc., Davídek, V., doc. Ing. Petr Pollák, CSc., prof. Ing. Jan Uhlíř, CSc.,
Publikace: ICSPAT 95. The 6th International Conference on Signal Processing Applications and Technology. Boston: DSP, 1995. pp. 1955-1958.
Rok: 1995

Pracoviště: Katedra teorie obvodů

The Study of Speech/Pause Detectors for Speech Enhancement Methods

Autoři: prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc.,
Publikace: Eurospeech 95 Proceedings. Berlin: ESCA, 1995. pp. 1575-1578. ISSN 1018-4074.
Rok: 1995

Pracoviště: Katedra teorie obvodů

Real-time Noise Suppression System on TMS320C30

Autoři: Davídek, V., doc. Ing. Petr Pollák, CSc.,
Publikace: Speech Processing on 4th Czech-German Workshop. Praha: AV ČR, Ústav radiotechniky a elektroniky, 1994. pp. 10-11. ISBN 80-901658-1-8.
Rok: 1994

Pracoviště: Katedra teorie obvodů

Speech Identification Algorithms in a Noisy Environment

Autoři: doc. Ing. Petr Pollák, CSc.,
Publikace: 31st Conference on Acoustics. Praha: České vysoké učení technické v Praze, 1994. pp. 147-151. ISBN 80-01-01146-1.
Rok: 1994

Pracoviště: Katedra teorie obvodů

Speech Recognition Systems

Autoři: prof. Ing. Jan Uhlíř, CSc., prof. Ing. Pavel Sovka, CSc., doc. Ing. Petr Pollák, CSc., Hanžl, V.
Publikace: CTU Seminar 94. Praha: České vysoké učení technické v Praze, 1994. pp. 197-198.
Rok: 1994

Pracoviště: Katedra teorie obvodů

Noise Suppression System for a Car

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc., prof. Ing. Jan Uhlíř, CSc.,
Publikace: Proceedings of the 3rd European Conference on Speech, Communication and Technology. Berlin: ESCA, 1993. pp. 1073-1076. ISSN 1018-4074.
Rok: 1993

Pracoviště: Katedra teorie obvodů

Noise Suppression System for Speech Degraded in Running Car

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc., prof. Ing. Jan Uhlíř, CSc.,
Publikace: Sprachverarbeitung: 3. Tschechisch-Deutscher Workshop. Praha: AV ČR, Ústav radiotechniky a elektroniky, 1993. pp. 25.
Rok: 1993

Pracoviště: Katedra teorie obvodů

Non-musical Tone Spectral Subtraction

Autoři: prof. Ing. Jan Uhlíř, CSc., doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc.,
Publikace: Proceedings of the Czech-Polish-Hungarian Workshop on Circuits Theory and Application. Praha: České vysoké učení technické v Praze, 1993. pp. 99-103.
Rok: 1993

Pracoviště: Katedra teorie obvodů

One Channel Suppression System for a Car

Autoři: doc. Ing. Petr Pollák, CSc., prof. Ing. Pavel Sovka, CSc., prof. Ing. Jan Uhlíř, CSc.,
Publikace: VIII Symposium Nacional de Union Cientifica Internacional de Radio. Valencia: URSI, 1993. pp. 409-413.
Rok: 1993

Pracoviště: Katedra teorie obvodů
