MIT Using Artificial Intelligence to Translate Ancient “Dead” Languages


While developing a system for deciphering lost languages, MIT researchers studied the Ugaritic language, which is related to Hebrew and which has previously been analyzed and deciphered by linguists.
Photo credit: SRK Branavan

A system developed at MIT CSAIL aims to help linguists decipher languages that have been lost to history.

Recent research suggests that most languages that ever existed are no longer spoken. Dozens of these dead languages are also considered lost, or “undeciphered”; that is, we don’t know enough about their grammar, vocabulary, or syntax to actually understand their texts.

Lost languages are more than just an academic curiosity; without them, we miss an entire body of knowledge about the people who spoke them. Unfortunately, most of them have so few surviving records that scientists cannot decipher them using machine translation algorithms such as Google Translate. Some have no well-researched “relative” language to compare against, and traditional dividers such as spaces and punctuation are often absent. (To illustrate, imagine trying to decipher a foreign language written with no spaces between the words.)

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) recently made a major advance in this area: a new system that has been shown to automatically decipher a lost language without needing advance knowledge of its relationship to other languages. They also showed that their system can itself determine relationships between languages, and they used it to corroborate recent scholarship suggesting that Iberian is not actually related to Basque.

The team’s ultimate goal is for the system to decipher lost languages that have eluded linguists for decades, using just a few thousand words.

Developed under the direction of MIT professor Regina Barzilay, the system relies on several principles grounded in insights from historical linguistics, such as the fact that languages generally evolve only in certain predictable ways. For example, while a given language rarely adds or deletes an entire sound, certain sound substitutions are likely to occur. A word with a “p” in the parent language may change to a “b” in the descendant language, but a change to a “k” is less likely because of the significant gap in pronunciation.
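The substitution principle can be illustrated with a weighted edit distance, in which swapping similar sounds is cheap and adding or dropping a sound is expensive. This is only a minimal sketch: the cost values, the `SIMILAR` table, and the function names are assumptions made for illustration, not the researchers' model.

```python
# Hypothetical cost table: pairs of sounds treated as "close" in pronunciation.
SIMILAR = {frozenset("pb"), frozenset("td"), frozenset("kg")}

INDEL_COST = 1.5  # assumed: adding or deleting a whole sound is rare, so costly


def sub_cost(a, b):
    """Cost of replacing sound a with sound b (illustrative values)."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in SIMILAR:
        return 0.3  # plausible sound change, e.g. p -> b
    return 1.0      # unlikely sound change, e.g. p -> k


def weighted_edit_distance(src, tgt):
    """Dynamic-programming edit distance using the costs above."""
    rows, cols = len(src) + 1, len(tgt) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * INDEL_COST
    for j in range(1, cols):
        dist[0][j] = j * INDEL_COST
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + INDEL_COST,      # delete a sound
                dist[i][j - 1] + INDEL_COST,      # insert a sound
                dist[i - 1][j - 1] + sub_cost(src[i - 1], tgt[j - 1]),
            )
    return dist[-1][-1]
```

Under these assumed costs, `weighted_edit_distance("pat", "bat")` comes out lower than `weighted_edit_distance("pat", "kat")`, mirroring the claim that a “p” is more likely to become a “b” than a “k”.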

Taking these and other linguistic constraints into account, Barzilay and MIT doctoral student Jiaming Luo developed a decipherment algorithm that can cope with the vast space of possible transformations and the scarcity of a guiding signal in the input. The algorithm learns to embed language sounds in a multidimensional space in which differences in pronunciation are reflected in the distance between the corresponding vectors. This design enables the researchers to capture relevant patterns of language change and express them as computational constraints. The resulting model can segment words in an ancient language and map them to counterparts in a related language.
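To make the idea of sounds as vectors concrete, here is a small sketch in which each sound is a point in a three-dimensional space and pronunciation difference is the Euclidean distance between points. The system described above learns its embeddings; the hand-picked coordinates, the `FEATURES` table, and the function names below are assumptions for illustration only.

```python
import math

# Hand-picked coordinates (an assumption, not learned values):
# (place of articulation, voicing, nasality)
# place: 0.0 = lips, 1.0 = tongue tip, 2.0 = back of the mouth
FEATURES = {
    "p": (0.0, 0.0, 0.0),
    "b": (0.0, 0.5, 0.0),
    "m": (0.0, 0.5, 1.0),
    "t": (1.0, 0.0, 0.0),
    "d": (1.0, 0.5, 0.0),
    "k": (2.0, 0.0, 0.0),
    "g": (2.0, 0.5, 0.0),
}


def sound_distance(a, b):
    """Euclidean distance between the vectors of two sounds."""
    return math.dist(FEATURES[a], FEATURES[b])


def nearest_sounds(sound):
    """All other sounds, ordered from closest to farthest."""
    others = [s for s in FEATURES if s != sound]
    return sorted(others, key=lambda s: sound_distance(sound, s))
```

With these coordinates, “b” is the closest neighbour of “p” and “k” is far away, so a model that penalises substitutions by vector distance would prefer the p-to-b change.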

The project builds on a paper that Barzilay and Luo wrote last year, which deciphered the dead languages Ugaritic and Linear B, each of which had previously taken humans decades to decode. A key difference with that earlier project, however, was that the team knew in advance that these languages were related to early forms of Hebrew and Greek, respectively.

With the new system, the algorithm itself infers the relationship between languages. This question is one of the biggest challenges in decipherment. In the case of Linear B, it took several decades to discover the correct known descendant. For Iberian, scholars still cannot agree on the related language: some argue for Basque, while others reject this hypothesis and claim that Iberian is not related to any known language.

The proposed algorithm can assess the proximity between two languages; when tested on known languages, it can even accurately identify language families. The team applied their algorithm to Iberian, considering Basque as well as less likely candidates from the Romance, Germanic, Turkic, and Uralic families. While Basque and Latin were closer to Iberian than the other languages, they were still too different to be considered related.
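A rough feel for what a proximity score between languages can look like is sketched below. The article does not describe the team's actual measure, so this sketch substitutes a generic string-similarity ratio from Python's standard `difflib` module, averaged over a tiny word list; the `proximity` function and the four-word vocabularies are illustrative assumptions.

```python
from difflib import SequenceMatcher


def proximity(words, candidate_words):
    """Average, over words, of the best similarity to any candidate word.

    Returns a value between 0.0 (nothing in common) and 1.0 (identical lists).
    """
    total = 0.0
    for w in words:
        total += max(SequenceMatcher(None, w, c).ratio() for c in candidate_words)
    return total / len(words)


# Tiny illustrative vocabularies: moon, water, sun, sea.
spanish = ["luna", "agua", "sol", "mar"]
italian = ["luna", "acqua", "sole", "mare"]
finnish = ["kuu", "vesi", "aurinko", "meri"]

romance_score = proximity(spanish, italian)
uralic_score = proximity(spanish, finnish)
```

On this toy data `romance_score` is higher than `uralic_score`, which matches the known grouping of Spanish with Italian rather than Finnish. A real system would need far more data and a model of sound change rather than raw spelling overlap.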

In future work, the team hopes to expand beyond connecting texts to related words in a known language, an approach referred to as “cognate-based decipherment.” This paradigm assumes that such a known language exists, but the example of Iberian shows that this is not always the case. The team’s new approach would involve identifying the semantic meaning of words even when the researchers don’t know how to read them.

“For example, we can identify any references to people or places in the document, which can then be further investigated in the light of known historical evidence,” says Barzilay. “These ‘entity recognition’ methods are now widely used in various text processing applications and are very accurate. The central research question, however, is whether the task can be carried out without any training data in the ancient language.”

The project was partially supported by the Intelligence Advanced Research Projects Activity (IARPA).
