Automatic speech recognition and translation: faster working and learning

KIT researchers enhance their intelligent speech recognition and translation system with automatic text segmentation, summarization and intelligent querying


Automatic speech recognition and translation systems such as the Lecture Translator from the Karlsruhe Institute of Technology (KIT) can convert what is said in lectures into text in several languages in real time. Such systems improve access to information for students with disabilities and for international students. More generally, they support faster working and learning through intelligent post-processing and archiving of spoken content. To take this further, KIT researchers have added new functions to the Lecture Translator: automatic recognition of speech in several languages simultaneously, real-time text segmentation and title generation, summaries, links for technical terms, and the ability to query what has been heard. Together, these functions make lectures easier to understand and to work through efficiently.

"With the Lecture Translator's automatic simultaneous translation, we have brought spoken lectures closer to an international audience. However, this usually only makes up 15 percent of the audience. With the new AI tools, we want to break down not only language barriers, but also barriers to understanding," says Alexander Waibel, Professor of Computer Science at KIT. "Automatically transcribed texts of spoken language are often difficult to read, as they appear too quickly as one long text without paragraphs and subheadings - exactly as the lecture was delivered orally." Working through a lecture afterwards is also laborious, Waibel adds, because you have to search through the lecture to fill gaps in understanding.

Better overview of documents

The further development of the Lecture Translator provides a remedy here. The researchers have developed several new automatic functions such as "Smart Chaptering", "Summarization", "Q&A" and "Auto-Links". A new type of artificial intelligence (AI) automatically recognizes the language, transforms the spoken text into a transcript in several languages and identifies paragraphs, chapter headings and key points. It also generates audio output, for which users can select one of 18 languages. In addition, the program automatically displays links as cross-references to relevant sources in lecture notes or Wikipedia, which students can use to work through the lecture more effectively. "With our new AI models, conversations and lectures can be better structured and even videos can be divided into easily navigable chapters," says Waibel. This improves understanding not only during the lecture but also afterwards.
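
To give a rough sense of what dividing a transcript into chapters involves, here is a deliberately simple Python sketch. It is not the method used in the Lecture Translator, which relies on AI models not described in this article. The function names, the tiny stopword list and the word-overlap threshold are all assumptions made for illustration: a new chapter starts whenever a sentence shares too few words with the sentence before it, and the most frequent word serves as a placeholder title.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in",
             "we", "it", "this", "that", "by", "with"}


def content_words(sentence):
    """Lowercase the sentence and keep only tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def jaccard(a, b):
    """Jaccard overlap of two word collections; 0.0 if both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def chapter_transcript(sentences, threshold=0.1):
    """Toy chaptering: open a new chapter when word overlap with the
    previous sentence falls below `threshold` (a hypothetical parameter)."""
    chapters = []
    for sentence in sentences:
        words = content_words(sentence)
        if chapters and jaccard(chapters[-1]["last_words"], words) >= threshold:
            # Enough shared vocabulary: extend the current chapter.
            chapters[-1]["sentences"].append(sentence)
            chapters[-1]["counts"].update(words)
        else:
            # Topic shift (or first sentence): start a new chapter.
            chapters.append({"sentences": [sentence], "counts": Counter(words)})
        chapters[-1]["last_words"] = words
    # Use the most frequent content word as a placeholder chapter title.
    return [
        {"title": c["counts"].most_common(1)[0][0] if c["counts"] else "untitled",
         "sentences": c["sentences"]}
        for c in chapters
    ]


transcript = [
    "Neural networks learn weights from data.",
    "The weights of neural networks are updated by gradient descent.",
    "Speech recognition converts audio into text.",
    "Modern speech recognition handles noisy audio.",
]
for chapter in chapter_transcript(transcript):
    print(chapter["title"], "-", len(chapter["sentences"]), "sentences")
```

On this four-sentence example the sketch splits the transcript into two chapters of two sentences each, one on neural networks and one on speech recognition. A production system would need far more robust signals than word overlap, particularly for spontaneous speech.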

Lecture Translator translates into 18 languages

The research team has integrated this work into the Lecture Translator, which is used at KIT to transcribe lectures automatically in real time. Chapter division, title generation, paragraph layout and summaries with links, which can be used both online and offline, now extend the Lecture Translator service and simplify working with the material. The translation is available in 18 languages. The technology has specific applications for content creators, students, teachers and podcasters, who can, for the first time, structure their audio and video content. "Users can navigate more efficiently through videos and lectures, find relevant sections more quickly and capture important core content compactly and efficiently - they have a much better general overview and faster access to details," says Waibel.

The research took place as part of the "How is AI Changing Science?" project and was funded by the Volkswagen Foundation for four years. In addition to KIT, the University of Bonn and the University of Vienna were also involved in the project.