Antiplagiat Develops a Module for Detecting Translated Borrowings in Texts in 100 Languages


Antiplagiat has developed and begun testing a specialized module for detecting translated borrowings in texts written in the world's one hundred most widely used languages. In December 2019, the company's project for the pan-linguistic analysis of natural-language texts won the competitive selection of leading companies held within the national program “Digital Economy of the Russian Federation,” operated by RVC.

Machine translation systems have reached a new level in recent years and have become an everyday tool for scientists and students. At the same time, the number of attempts to pass off translated text as original work has risen sharply. Such attempts are not limited to the obvious direction of translation, from English into a national language. Translations from Russian into the national languages of the CIS countries are found regularly, and other languages, such as Chinese, German, and French, can also act as “donors.”

The strategic goal of Antiplagiat's development work is for the system to detect borrowings regardless of the language from which the translation was made and regardless of whether it was produced by a person or by a machine translator. Although there is a great deal of research in this area worldwide, most of it is not aimed at solutions that can work under high load, that is, process hundreds of documents per minute while comparing them against collections of many millions of potential sources.

“We have completed our study of the latest technologies for multilingual vectorization of text fragments. Modern machine learning algorithms will make it possible to compare the semantic content of texts in a hundred languages without an intermediate translation stage. In particular, the family of BERT-based approaches is the hottest topic in the NLP community right now. Our company's research group has been actively following developments in this direction since 2017, which made it possible to develop a module for comparing texts in one hundred languages and to launch an active phase of its testing now,” commented Yuri Chekhovich, Executive Director of Antiplagiat.
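To illustrate the general idea of comparing texts through vectors rather than through translation, here is a minimal sketch. It is not Antiplagiat's implementation: the function names, the similarity threshold, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a multilingual encoder and would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def find_candidate_sources(query_vec, source_vecs, threshold=0.9):
    """Return (source_id, score) pairs scoring at or above threshold, best first.

    The threshold value is a hypothetical setting, not a published one.
    """
    scored = [(sid, cosine_similarity(query_vec, vec))
              for sid, vec in source_vecs.items()]
    hits = [(sid, score) for sid, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy vectors standing in for encoder output (assumed values).
sources = {
    "ru_fragment": [0.9, 0.1, 0.4],   # source fragment in Russian
    "de_fragment": [0.1, 0.8, 0.2],   # unrelated fragment in German
}
query = [0.88, 0.12, 0.41]            # fragment from the checked document

matches = find_candidate_sources(query, sources)
print(matches)
```

Because fragments with the same meaning are expected to land close together in the shared vector space whatever their language, only the nearby source is reported, and no translation step is involved.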

The new functionality will be tested, including in a production environment on real users' documents, at the end of 2020 and in 2021. In the first stage, the algorithms are tuned to maximize accuracy (precision) so that users are not inconvenienced by false positives. The algorithm settings will then be finalized on the basis of the test results. This approach will gradually expand the completeness of the search (recall) while maintaining a high level of accuracy in detecting borrowings.
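The trade-off between accuracy (few false positives, usually called precision) and search completeness (usually called recall) can be sketched with a similarity threshold. The scores, labels, and threshold values below are invented for illustration and do not reflect the company's actual settings or test data.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged as a borrowing."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, is_borrowed in pairs if s >= threshold and is_borrowed)
    fp = sum(1 for s, is_borrowed in pairs if s >= threshold and not is_borrowed)
    fn = sum(1 for s, is_borrowed in pairs if s < threshold and is_borrowed)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical similarity scores and ground-truth labels (True = real borrowing).
scores = [0.97, 0.93, 0.85, 0.80, 0.72, 0.60]
labels = [True, True, False, True, True, False]

strict = precision_recall(scores, labels, threshold=0.90)   # first-stage style setting
relaxed = precision_recall(scores, labels, threshold=0.70)  # later, broader setting

print("strict: ", strict)
print("relaxed:", relaxed)
```

In this toy data the strict threshold flags nothing incorrectly but misses half of the real borrowings, while the relaxed threshold finds all of them at the cost of one false positive.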
