There is a growing digital divide between languages with sufficient resources and languages with fewer resources, which increases the danger of digital extinction for the latter. For majority languages, generating useful tools and resources is much easier because of their large web presence. Many minority languages, however, lack the material and human resources needed to create such tools. Lack of state support and public visibility, together with societal and institutional oppression, are direct causes of these languages being deprioritized in today's digital spaces.
Language preservation efforts focus mainly on documentation, teaching, and physical community building. One overlooked area is the creation of tools based on artificial intelligence. Tools like machine translation, speech synthesis, and speech recognition are now important components of human-machine interfaces. These tools can also help model the knowledge of dying languages and preserve it for future generations.
Who is this document for?
This document is for you if you are:
A language activist who is interested in extending tools and resources in their language
A linguist who is interested in collecting data for research and building language technology
A natural language processing (NLP) researcher who is interested in augmenting data for their language of interest
An ally of language activists who wants to support the revitalization of under-resourced languages
Can I contribute?
Yes. This is a living document with an open license (CC-BY). Its source file is shared publicly at https://github.com/CollectivaT-dev/language-toolkit, where you can pull a version to work on independently and then submit your contribution. Contributions can range from correcting typos to adding a translation, expanding a section, or describing your own case study. If you have questions, please feel free to write to us at email@example.com.