Start of 2019
Welcome to my notebook, with some of my study in progress. Three goals for 2019: Exercise Regularly, Read Regularly, and Graduate. Recent courses include phonetics, corpus linguistics, and text information systems....
Web Search Challenges and Applications: scalability → parallel indexing & searching (MapReduce); low-quality information and spam → spam detection & robust ranking; dynamics of the web. Opportunities: many additional heuristics can be leveraged to improve search accuracy; rich link information, layout, etc. → link analysis & multi-feature ranking. Architecture: Web → Crawler → Indexer <-> Retriever <-> Browser ← User...
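As a toy illustration of the MapReduce-style parallel indexing idea (my addition, not from the original notes), the sketch below maps each document to (term, doc_id) pairs and reduces them into postings lists, i.e. an inverted index; the documents and function names are made up for the example.

    from collections import defaultdict

    def map_phase(doc_id, text):
        # Map: emit a (term, doc_id) pair for every token in the document.
        for term in text.lower().split():
            yield term, doc_id

    def reduce_phase(pairs):
        # Reduce: group pairs by term into postings lists (the inverted index).
        index = defaultdict(set)
        for term, doc_id in pairs:
            index[term].add(doc_id)
        return index

    docs = {1: "web search challenges", 2: "web crawler and indexer"}
    pairs = (p for doc_id, text in docs.items() for p in map_phase(doc_id, text))
    index = reduce_phase(pairs)
    print(index["web"])   # {1, 2}

In a real engine the map tasks run over document shards in parallel and the reducer merges per-term postings, but the data flow is the same.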
1. Generating acoustic feature files: sphinx_fe -argfile ../TrainingSet/acoustic-model/feat.params -samprate 16000 -c ted.fileids -di . -do . -ei wav -eo mfc -mswav yes. The audio files must be mono 16 kHz; the following bash command can convert them: for filename in *.wav; do ffmpeg -i "$filename" -ac 1 -ar 16000 ./Test1/"$filename"; done. (Reference screenshot.) 2. Accumulating observation counts: ../TrainingSet/bw -hmmdir ....
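Before running sphinx_fe it can help to confirm the mono/16 kHz requirement. A small stdlib-only sketch (my addition; the directory is just an example):

    import wave
    from pathlib import Path

    for path in Path(".").glob("*.wav"):          # current directory as an example
        with wave.open(str(path), "rb") as w:
            channels = w.getnchannels()           # must be 1 (mono)
            rate = w.getframerate()               # must be 16000 Hz
            if channels != 1 or rate != 16000:
                print(f"{path}: {channels} ch, {rate} Hz -> needs conversion")

Any file flagged here can be fixed with the ffmpeg loop above.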
Practice Makes Perfect...
Introduction There are two file formats for speech recognition training and analysis. If we have an SRT file as the initial transcription, we need to convert it to plain text. Section 1 covers how to convert SRT to plain text, and Section 2 discusses how to convert the plain-text file from Section 1 into an acceptable training format. Section 1 - SRT to Plain Text A ....
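A minimal sketch of the SRT-to-plain-text step (my addition), assuming the standard SRT layout of a numeric cue index, a timestamp line containing "-->", the subtitle text, and a blank separator; the file names are hypothetical:

    def srt_to_text(srt_path, txt_path):
        lines = []
        with open(srt_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                # Skip blank separators, numeric cue indices, and timestamp lines.
                if not line or line.isdigit() or "-->" in line:
                    continue
                lines.append(line)
        with open(txt_path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))

    srt_to_text("talk.srt", "talk.txt")   # file names are examples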
Introduction When we are adapting acoustic models, some utterances fail to produce a phonetic transcription because they contain words that are not in the model dictionary. Therefore, we would like to extend our dictionary. Using g2p-seq2seq to extend the dictionary: for consistency, we use the CMUSphinx-recommended g2p-seq2seq. Installation: git clone the module from https://github.com/cmusphinx/g2p-seq2seq, then run sudo python setup.py install and python setup.py test. Remember to update or install python setuptools....
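Once g2p-seq2seq has generated pronunciations for the missing words, they have to be merged back into the base dictionary. A sketch of that merge (my addition), assuming both files use the CMUdict-style "WORD PH1 PH2 ..." line format and that the file names are hypothetical:

    def extend_dictionary(base_dic, g2p_output, merged_dic):
        # Keep the base dictionary's entry when a word appears in both files.
        entries = {}
        for path in (base_dic, g2p_output):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        word = line.split()[0]
                        entries.setdefault(word, line.strip())
        with open(merged_dic, "w", encoding="utf-8") as f:
            for word in sorted(entries):
                f.write(entries[word] + "\n")

    extend_dictionary("ted.dic", "g2p_new_words.dic", "ted_extended.dic")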
WER Word error rate (WER) is a common metric of the performance of a speech recognition system. The formula is WER = (S + D + I) / N, where, given a reference transcript of N words and a recognition hypothesis, S is the number of substitutions, D the number of deletions, I the number of insertions, and N the total number of words in the reference. WER is derived from the Levenshtein distance, computed at the word level instead of the phoneme level....
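A minimal word-level Levenshtein implementation of this formula (my sketch, not from the original post):

    def wer(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j].
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                       # i deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j                       # j insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(ref)][len(hyp)] / len(ref)   # (S + D + I) / N

    print(wer("the cat sat", "the cat sat down"))   # 1 insertion / 3 words ≈ 0.33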
Training Error Identified When I was training on Astrom Audio, the following words were not in the dictionary: 'week-long' 0, 'an-and' 2, 'multi-disciplinary' 3, 'valkenburg' 4, 'ehht' 5, 'creatives' 6, 'cross-fertilization' 7, 'far-reaching' 9, 'it's-it's' 10, '1964' 13, 'the-the' 14, '5th' 15, '15th' 16. 72% of the sentences fail to produce a phonetic transcription because of a single word missing from the dictionary model....
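A sketch of how such a report can be produced (my addition): scan each transcript sentence against the dictionary and count the sentences containing at least one out-of-vocabulary word. The file names are hypothetical, and the check ignores .dic niceties such as alternate-pronunciation markers.

    def oov_report(transcript_path, dic_path):
        # Dictionary words are the first token of each .dic line.
        with open(dic_path, encoding="utf-8") as f:
            vocab = {line.split()[0].lower() for line in f if line.strip()}
        failed = total = 0
        with open(transcript_path, encoding="utf-8") as f:
            for sentence in f:
                words = sentence.lower().split()
                if not words:
                    continue
                total += 1
                missing = [w for w in words if w not in vocab]
                if missing:
                    failed += 1
                    print(f"OOV: {missing}")
        print(f"{failed}/{total} sentences contain an out-of-dictionary word")

    oov_report("astrom.transcription", "astrom.dic")   # hypothetical file names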
Speech Recognition consists of 3 main models: Acoustic Model: acoustic properties for each senone (HMM); Phonetic Dictionary: a mapping from words to phones; Language Model: restricts the word search. What is the next word? In spite __ In our application, we are working on Speech-to-Text Auto Captioning for talks at NCSA (the National Center for Supercomputing Applications). Thus, most of the talks are related to science fields....
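To make the "next word" idea concrete, here is a toy bigram language model (my addition; the corpus is made up) showing how counts over word pairs restrict the search to likely continuations, e.g. "of" after "In spite":

    from collections import Counter, defaultdict

    corpus = "in spite of the rain we went out in spite of everything".split()

    # Count bigrams: how often each word follows the previous one.
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigrams[prev][nxt] += 1

    # The model predicts the most likely continuation of "in spite __".
    print(bigrams["spite"].most_common(1))   # [('of', 2)]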
Training Set: 50 audio sentences from Ted, 18 audio sentences from Astrom. WER by test audio and model:

Test audio | Default | Ted-HMM | Astrom-HMM (num) | Ted+Astrom-HMM | Ted+Astrom-HMM+g2p | Ted+Astrom-HMM+openslrG2P
Ted        | 33.1%   | 15.8%   | -                | 19.2%          | 18%                | 16.8%
Astrom     | 44.9%   | -       | 41.4%            | (35.2%)        |                    |

Ted-HMM follows the adapting-acoustic-models steps using the Ted audio as training files; Ted+Astrom-HMM+g2p adapted the acoustic models and added an extended dictionary covering the words missing from the first training session...