Training involves learning language models by processing millions of web pages, and learning acoustic models by listening to labelled data sets of real human speech. Good AI transcription services include accurate speaker segmentation (sometimes referred to as “diarisation”) and proper punctuation, including exclamation and question marks.

For professional use, it is important to make the results of automated transcription editable in an interface that highlights confidence scores and allows the user to modify words, speakers or timing. This is called the “Expert in the loop” workflow. More info on editing transcripts is available on the knowledge base.

AI transcription and the “Expert in the loop” interface

The key differences between manual and AI Transcription

AI transcription is executed in a fraction of the time needed to create a transcript manually. AI transcription as offered by Limecraft runs 4 times faster than real-time, whereas manual transcription typically takes 6 to 8 times the length of the clip. That boils down to a turn-around time that differs by a factor of at least 24 (6 ÷ 0.25).

The quality of automated transcription is often expressed as word error rate (WER): the number of substituted, deleted and inserted words divided by the number of words in the reference transcript. Depending on the audio quality and the ASR engine type, expect a WER of 2%-20%, whereas manual transcription is considered completely accurate. While a small WER might be acceptable for indexing purposes, it may be prohibitive when the results are intended for publication. Hence it is sometimes better to optimise for accuracy rather than for the lowest price per hour, and you may want to look for a complete solution that includes a proper editor for reviewing and correcting the automatically created transcript. Limecraft offers customised ASR solutions with a WER that is significantly lower than that of the standard solutions offered by Microsoft Video Indexer, Google Speech or Amazon Recognition.

Given a WER of 1%-2%, post-editing an automated transcript takes roughly twice the length of the clip, which is still 3 to 4 times faster than conventional transcription. So while it may seem that post-editing requires extra work, combining ASR and manual post-editing (the “Expert in the loop” workflow) still saves you 60%-80% of the time compared to conventional transcription.

But the most important characteristic of AI transcription is the fact that each word has a time-code. For you as a video professional, accurate time codes are your lifeblood.
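To make per-word time codes and confidence scores more concrete, here is a minimal Python sketch of how a reviewing tool might represent a transcript and pick out the words an editor should check first. The `Word` structure, the 0.8 threshold, the 25 fps frame rate and the sample values are all assumptions made for illustration; they do not describe Limecraft's data model or any particular ASR engine's output.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognised word (hypothetical structure, for illustration only)."""
    text: str
    start: float       # seconds from the start of the clip
    end: float         # seconds from the start of the clip
    confidence: float  # 0.0 to 1.0, as an ASR engine might report it


def to_timecode(seconds: float, fps: int = 25) -> str:
    """Format a time in seconds as non-drop-frame HH:MM:SS:FF (assumes 25 fps)."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


def words_to_review(words, threshold=0.8):
    """Return the words whose confidence is below the (assumed) threshold."""
    return [w for w in words if w.confidence < threshold]


# Made-up sample data: three words with start/end times and confidences.
transcript = [
    Word("accurate", 62.40, 62.88, 0.97),
    Word("time", 62.88, 63.12, 0.95),
    Word("codes", 63.12, 63.52, 0.61),
]

# List the words an editor would want highlighted, with their time codes.
for w in words_to_review(transcript):
    print(f"{to_timecode(w.start)}  {w.text}  (confidence {w.confidence:.2f})")
```

Because every word carries its own start and end time, an editor can jump straight to the low-confidence words instead of replaying the whole clip, and corrected text stays aligned with the picture.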
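The word error rate (WER) mentioned in this article can be computed by aligning the automated transcript against a trusted reference and counting word-level edits. The sketch below is a plain Python illustration using a standard edit-distance calculation; the function name and the simple lowercase, whitespace-based tokenisation are assumptions, and production tools typically normalise punctuation and numbers more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word inserted in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
# One substituted word and one dropped word against nine reference words.
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

In this made-up example two of the nine reference words are wrong, which is far above the 1%-2% range where post-editing stays quick. Measuring WER on a sample of your own material is a reasonable way to compare engines before committing to one.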