Turkey and European Union flags

This project is funded by the European Union.

Languages and the digital age

The digital age has brought many new opportunities and advantages. It has undoubtedly connected humankind in unimaginable ways in a short amount of time. However, no innovation comes without its challenges and threats. The World Wide Web is a resource accessible to (almost) all, but it is dominated by a handful of languages.

English, for example, is spoken by 15% of the world’s population, yet it accounts for 54% of all content on the web. Languages like Russian, Chinese and Spanish, on the other hand, each represent only around 5 to 6% of web content, in spite of their geopolitical dominance.

Where does this leave the world’s many endangered and dying languages? Sadly, on the very margins of this picture.

Languages thrive in communities and are passed on to new generations through day-to-day usage. The more our lives are connected through digital media, the less we are exposed to mother tongues that are not represented online. This eventually leads to a decline in their usage among younger generations.

Note

How? To illustrate, a Kurdish speaker in Turkey accesses their government’s health services website and sees that everything is in Turkish, or checks the latest social media platform and sees that it is available in English by default. These small encounters lead them to think that, in order to find their way and address their needs, speaking their mother tongue is not enough: they have to know the nearest majority language, and often more than one.

Digital extinction

According to UNESCO, around 3,500 languages are expected to become extinct by the end of this century, and we cannot deny the role of technology in this. Kornai states that the vast majority (over 95%) of languages have already lost the capacity to ascend digitally. Digital ascent requires use in a broad variety of digital contexts, ranging from maintaining a Wikipedia page to making language classes available and creating language technology data.

Of course, we should not fall into the trap of making technology the culprit of language loss. It merely reflects power dynamics that already exist in society. States that deprioritize or even oppress certain languages start their digital transformation by excluding these languages from their digital infrastructure. Likewise, North American and Eurocentric big technology companies follow an English-first approach by default.

Technology can also be used to form communities around language preservation, knowledge sharing and language documentation.

Online presence of a language

Traditionally, the responsibilities of a language activist have been actively speaking the language, passing it on to younger generations, forming language learning and speaking communities, negotiating with public institutions for the inclusion of their language, collaborating with linguists on the documentation of their language, and so on.

Nowadays, the challenge is to keep the language alive not just in the physical world but also online. An online presence matters for the survival of a language in two ways:

  1. Online exchanges and visibility spark interest and help engage existing and new language learners.

  2. What is stored online becomes, in turn, a digital record of the language, which helps documentation and technology development.

Below, we will describe some ways the internet is becoming multilingual and plural while helping revive endangered languages.

Access to knowledge

One of the most popular initiatives for bringing languages online is Wikipedia. Wikipedia is an open online encyclopedia written and maintained by a community of volunteers through an open collaboration and reviewing system.

Wikipedia’s broad aim is to democratize access to knowledge. Naturally, the culture built around this ethos goes hand in hand with multilingualism. Although it started only in English, it quickly expanded to many of the world’s languages. It is probably the most linguistically diverse platform on the internet, with 326 languages (as of 03.05.22) and counting.

Note

The first edit on a non-English Wikipedia was made in Catalan on March 16, 2001. Today, the Catalan Wikipedia is most notable for its large number of quality articles, which illustrates the Catalan language’s important online presence despite its being a minority language. It is currently the 20th largest Wikipedia.

Relative sizes of different Wikipedias (source)

Making a new language available in Wikipedia is no easy task (as explained here), but it is definitely a great way to make knowledge accessible online and build a virtual community around it.

Immersion into language

One great example of language revitalization with the help of technology is Yiddish. After the Holocaust, the number of Yiddish speakers decreased drastically from around 10 million. Those who survived were forced to adopt the languages of their new lands to avoid persecution. In the last century, the use of Yiddish had almost disappeared except in small and dispersed Hasidic communities.

With the rise of the internet and the popularity of online forums, Yiddish speakers began using these platforms to converse in their language. Over time, the virtual world became the primary meeting point for Yiddish speakers, in forums like The Idishe Velt (The Jewish World) and Kave Shtiebel (The Coffee House).

Language documentation projects

The rapid loss of languages in the last century has spurred many initiatives for language documentation and revitalization. One of these is the Endangered Languages Project (https://www.endangeredlanguages.com/), a web-based platform that acts as a collaborative hub for language enthusiasts, linguists and industry partners to help strengthen endangered languages. Users of the website act as contributors by uploading language samples in text, audio, link or video format, using a unique geotagging system that allows for easy searchability.

Ladino in Endangered Languages Project

Similarly, Wikitongues, which started in 2014, collects recordings and resources of the world’s languages. Currently it holds videos in over 700 languages, lexicons in 200 languages and links to hundreds of external resources.

Ladino videos in Wikitongues

“Traditional” technology

Preserving a language isn’t just a matter of recording words or phrases and digitising them to be held in an online vault. Language is inherently about people, culture and identity. In order to keep a language alive, it needs to be spoken by many, immersed in everyday culture and actively passed on to future generations. These days, the internet, social media, software and platforms occupy a large space in our everyday lives. In this section, we list some of the core tools a language needs in order to thrive digitally.

Unicode-supported font

A digital font is what tells a computer how to display the characters of your language. Unicode, formally the Unicode Standard, is an information technology standard for the consistent encoding, representation and handling of text expressed in most of the world’s writing systems. The standard, which is maintained by the Unicode Consortium, defines 144,697 characters covering 159 modern and historic scripts (as of version 14.0), as well as symbols, emoji, and non-visual control and formatting codes.
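To make the idea of a consistent encoding concrete, here is a minimal Python sketch that lists the code point and official Unicode name of each character in a string. It relies only on the standard `unicodedata` module; the helper name `describe` and the sample Tifinagh string are illustrative choices, not something taken from the Unicode Standard itself.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# A short string in the Tifinagh script, used here purely as sample input.
for ch, code_point, name in describe("ⴰⵣⵓⵍ"):
    print(ch, code_point, name)
```

Every character a language needs must have a code point like these before fonts and keyboards can support it.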

You can check whether your language is supported by a font by going to Google Noto and searching among fonts that represent more than 500 writing systems. If it is not there, you can create your own font with help from a font designer and install it manually on your computer.
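The same check can be sketched in code. The example below assumes you already have the set of code points a font covers (font tools can export such a list); the function name `missing_characters` and the `latin_only` coverage set are hypothetical and only serve to illustrate the comparison.

```python
def missing_characters(text, supported_code_points):
    """Return the characters in text that the font cannot display,
    in order of first appearance (whitespace is ignored)."""
    missing = []
    for ch in text:
        if ch.isspace():
            continue
        if ord(ch) not in supported_code_points and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical coverage: a font that only contains the basic Latin letters.
latin_only = set(range(ord("A"), ord("Z") + 1)) | set(range(ord("a"), ord("z") + 1))

# Sample text mixing Latin and Tifinagh letters.
print(missing_characters("azul ⴰⵣⵓⵍ", latin_only))
```

An empty result would suggest the font can display the whole sample, while any returned characters point to gaps a font designer would need to fill.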

Tifinagh alphabet of Moroccan Berber in Google Fonts

Keyboard

Until the day we speak naturally with computers, the most common interface for interacting with them will be the keyboard. It is a technology easily taken for granted by speakers of many of the world’s languages, but unfortunately it is not available for all of the world’s writing systems. If a keyboard is not present or not developed enough for a language, its speakers tend to prefer other alphabets or even other languages to communicate. For example, speakers of Ethiopian languages like Amharic, Tigrinya and Oromo switch to using English because the Ge’ez script is not pre-installed on their smartphones. Young Arabic speakers in many countries have invented their own chat alphabet, Arabizi, consisting of Latin characters and numerals, to make up for the lack of Arabic script support in early mobile and web technology.

If a keyboard is not available on your phone or computer, here are some resources to help you search for one or create your own:

Online dictionary

A dictionary, or lexicon, is a solid way of documenting a language since it acts as a reference of words and their meanings. An online dictionary never goes out of print, as it is accessible from any device with an internet connection. In addition, open-source dictionaries can live and grow collaboratively as a community effort involving speakers of the language, linguists and technologists.
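As a rough illustration of what a shareable digital lexicon can look like, the Python sketch below stores entries as simple records, looks words up by headword, and saves them as JSON text. The field names, the helper functions and the sample Catalan entries are assumptions made for this example; they do not reflect the data format of any particular dictionary platform.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Entry:
    headword: str
    gloss: str                 # meaning given in a contact language
    part_of_speech: str = ""
    audio_file: str = ""       # optional path to a recording of the word


def build_index(entries):
    """Map each lowercased headword to its entry for quick lookup."""
    return {entry.headword.lower(): entry for entry in entries}


def to_json(entries):
    """Serialize entries; ensure_ascii=False keeps non-ASCII letters readable."""
    return json.dumps([asdict(e) for e in entries], ensure_ascii=False, indent=2)


def from_json(text):
    """Rebuild the list of entries from JSON text produced by to_json."""
    return [Entry(**item) for item in json.loads(text)]


# Illustrative entries: Catalan headwords with English glosses.
entries = [
    Entry("aigua", "water", "noun"),
    Entry("gràcies", "thank you", "interjection"),
]
index = build_index(entries)
print(index["aigua"].gloss)
print(to_json(entries))
```

Keeping the lexicon in a plain, open text format like this makes it easier for speakers, linguists and technologists to contribute to the same file.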

Living Dictionaries is an online dictionary-builder platform created by the Living Tongues Institute for Endangered Languages. It provides comprehensive, free online tech tools that assist language communities in conservation and revitalization efforts. It also allows recording of words and phrases. As of May 2022, it supports 237 languages. To start a dictionary for your language in Living Dictionaries, you can make use of their elicitation lists and watch the tutorials on their YouTube channel.

SIL Dictionary App Builder “helps you to build customized dictionary apps for Android and iOS smartphones and tablets. You specify the lexicon data file to use, the app name, fonts, colors, the ‘about box’ information, the audio, illustrations and the icons. Dictionary App Builder will package everything together and build the customized app for you. You can then install it on your phone, send it to others by Bluetooth, share it on microSD memory cards and publish it to app stores on the Internet.”

A woman using a Bambara dictionary on her mobile phone (image credit: SIL International)

Language learning applications

The availability of online educational platforms has revolutionized the way many people approach language learning today. Even though they do not replace a teacher, they either complement traditional classes or, for some languages, are the only option available. They also offer advantages such as letting people learn on any device (mobile or desktop), at their own pace and on their own schedule. These apps serve language classes and exercises in short, fun and digestible sprints, let students track their progress, and even let them chat with or hire language tutors in online community spaces.

Many of the world’s endangered, minority and under-resourced languages do not yet have a significant presence online or adequate language documentation to create online courses. However, thanks to pressure from language communities and a growing worldwide sensitivity towards learning indigenous languages, the companies that develop these apps are increasingly interested in investing in endangered and minority languages. Languages like Maori, Scots Gaelic, Hawaiian, Quechua, Navajo and Lakota are making their way onto well-known educational platforms like Duolingo, Babbel and uTalk.

We can group these platforms into four categories:

  • Module-based: Using these apps feels more or less like taking a class in a school or college, where users follow a modular curriculum planned by educators. They allow learners to track their progress, receive notifications and earn points. Some notable examples are Duolingo, Babbel, uTalk and Master Any Language. Unfortunately, language communities cannot themselves decide to add a new language to these platforms. However, it is possible to “lobby” for and participate in the creation of new modules through the communities some of these platforms provide. It should also be noted that most of these platforms are for-profit and that the work of language communities remains uncompensated.

Content for learning Cree language in Master Any Language platform

  • Game-based: These applications feed the learner “question and answer” pairs and evaluate how well the person memorizes each pairing over time. This method is popular online among visual as well as auditory learners of languages, and also in many other areas of study such as math and science. Memrise may be the most compelling option for indigenous and other under-resourced minority languages. It is an online educational platform that uses memory techniques such as flashcards, SRS (spaced repetition software) and visual aids to optimise language learning. Memrise has content available for over 170 languages, a much greater number than most other language-learning platforms. Notably, the site has an impressive, community-created selection of content for indigenous languages and dialects from around the globe such as Yup’ik, Cherokee, Algonquian, Alutiiq, Choctaw, Greenlandic, Inuktitut, Lakota, Nahuatl, Yucatec Maya, K’iche’, Quechua, Guarani, Ainu, Jeju and many other medium-sized tongues spoken in Europe, Africa, the Middle East, Asia and the Pacific. The Memrise platform has a DIY, democratic and grassroots feeling to it. The platform is innovative because: 1) users can follow existing courses and flashcard sets and seamlessly upload their own mnemonic aids to help recall words and phrases as they progress through a course; 2) the site provides an engaging way for users to connect with language content through repetition, little quizzes, short videos, funny images and recordings made by fluent speakers; and 3) the platform allows community users to easily create their own language courses that others can use as well. Similar platforms include AnkiApp, Language Drops and MosaLingua.

Community created Quechua learning exercises in Memrise

  • Chat-based: These apps allow learners to connect with speakers of the language they are interested in through a live interactive chat. This provides a stress-free and social environment for learners. Some examples, like HiNative and HelloTalk, have recently exploded in popularity, especially in Asian countries.

  • Online student-tutor platforms: For learners who prefer a classic teacher-student relationship but do not have access to teachers in their vicinity, platforms like iTalki and Verbling help set up online classes. This also contributes directly to the language community, as it generates direct income for the teachers.
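The spaced repetition technique mentioned under game-based platforms can be sketched in a few lines of Python. This is a simplified Leitner-box scheduler written for illustration only: the interval values, the `Card` fields and the `review` function are assumptions, and the sketch does not describe how Memrise or any other platform actually schedules reviews.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Days to wait before the next review, one value per box (illustrative numbers).
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    prompt: str
    answer: str
    box: int = 0                 # index into INTERVALS
    due: Optional[date] = None   # None means the card has never been reviewed


def is_due(card, today):
    """A card is due if it was never reviewed or its due date has arrived."""
    return card.due is None or card.due <= today


def review(card, correct, today):
    """Promote the card after a correct answer, reset it after a mistake,
    then schedule the next review according to its box."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


# Sample card: a Catalan word with its English meaning.
card = Card("gràcies", "thank you")
review(card, correct=True, today=date(2022, 5, 1))
print(card.box, card.due)
```

Words the learner remembers move to boxes with longer intervals, while forgotten words return to the first box and come up again soon.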

Sources


This document was created with the financial support of the European Union. The content of this website is the sole responsibility of Col·lectivaT and SKAD and does not necessarily reflect the views of the European Union.