Danish speech technology needs to be developed and improved so that all Danish speakers can use it, including those who speak a dialect or have an accent. This can be done by recording hours of spoken and read-aloud Danish and by developing Danish language datasets and language models.
Speech technology can help understand and reproduce spoken language, and it can support:
- Speech recognition tools for elderly people, citizens with visual impairment or special needs
- Medical dictation or automatic documentation of conversations with citizens or customers
- Voicebots and chatbots for IT support, such as resetting passwords and website navigation
- Better machine translation between Danish and other languages
- Better voice navigation, for example in GPS devices or public transport
- Decision support with relevant knowledge for the employees
The problem: Danish speech technology is lagging behind
Currently, Danish speech technology has difficulty understanding women, elderly people, and speakers with dialects or accents.
For speech recognition to be used in, for example, voicebots and assistive tools with voice interfaces, it must work for all citizens.
Furthermore, the market for the Danish language is too small for private companies alone to drive the technological development.
The solution: Our aim is to make speech technology available to everyone
Innovation Fund Denmark has granted DKK 14 million to a project that aims to bring Danish speech technology up to an international level. Over the next two years, we will develop a speech dataset called CoRal, which is an abbreviation for Danish Conversational and read-aloud speech dataset.
The dataset will contain 1000-1500 hours of conversational and read-aloud speech from a broad and representative section of the population with regard to gender, age, Danish dialects and foreign accents. At the same time, we will develop language models that are able to recognize Danish speech and read Danish text aloud.
All data and models will be tested and published regularly, so that developers, companies and public institutions can benefit from them right from the start.
We are looking for speakers
Do you want to make our speech technology better at understanding you?
At the Alexandra Institute, we are interested in recruiting speakers from all over Denmark. We especially encourage you to sign up as a speaker if you are a woman, an elderly person, or if you speak with an accent or dialect.
If you would like to sign up as a speaker or want to know more about the project, please contact Kasper Fænø Bay Noer, Senior Digital Strategist at the Alexandra Institute:
Phone: +45 26 83 80 44, email: firstname.lastname@example.org
We will contact you with further information when we start recording.
About the project
Official title: Danish Conversational and read-aloud speech dataset (CoRal)
Duration: 2 years and 10 months
Innovation Fund Denmark investment: DKK 14,217,380
Total budget: DKK 22,172,400
About the partners
The Alexandra Institute is the only government-approved Research and Technology Organisation that specialises in IT and digitalisation. The aim of the company is to ensure that the latest digital technologies are made available to Danish businesses and Danish society in general.
Alvenir is a Danish spin-out company from the Technical University of Denmark that works on domain-specialized speech recognition, for example for documenting financial advisory conversations. In addition, Alvenir plays an active role in the Danish open-source environment, and since its foundation the company has contributed language models as well as data resources.
Corti is a Danish company that develops speech, sound and text-based AI software to assist with patient treatment and documentation in healthcare. With its AI software, Corti saves time and increases the quality of the interaction with the individual patient. Corti's software handles approximately 100 million interactions annually in Scandinavia and English-speaking countries, primarily in the US.
The Department of Computer Science at the University of Copenhagen (DIKU) is the first computer science department in Denmark and one of Europe’s leading computer science departments. DIKU teaches and conducts research across the three corners of computer science: Algorithms, Humans, and Data. The department is also actively involved in the development of technological innovation in society through a wide range of collaborations.
The Agency for Digital Government is responsible for the implementation of the government's digital ambitions in the public sector. The agency supports efficiency and flexible digital services through solutions for citizens, private companies and public authorities. To support the development of Danish language technology solutions, the Danish Government, KL – Local Government Denmark and Danish Regions have developed sprogteknologi.dk with the aim of providing easy access to Danish language resources.