The Antiplagiat company has developed a specialized module for detecting translated borrowings in texts written in the world's one hundred most widely used languages, and has begun testing it. In December 2019, the company's project for pan-linguistic analysis of natural-language texts won the competitive selection of leading companies operated by RVC within the national program “Digital Economy of the Russian Federation.”
Machine translation systems have improved markedly in recent years and have become a constant aid to researchers and students. At the same time, attempts to pass off translated text as original work have increased sharply. These attempts are not limited to the obvious direction, translation from English into a national language. Translations from Russian into the national languages of the CIS countries are found regularly, and other languages also serve as “donors,” including Chinese, German, and French.
The strategic goal of Antiplagiat's development is a system that detects borrowings regardless of the language the text was translated from and regardless of whether the translation was made by a person or by a machine translation system. Although there is a great deal of research in this area worldwide, most of it is not aimed at solutions that can operate under high load, that is, process hundreds of documents per minute while comparing them against collections of millions of potential sources.
The new functionality will be tested at the end of 2020 and in 2021, including in a production environment on real users' documents. At the first stage, the algorithms are tuned for maximum precision so that users are not burdened with false positives. The algorithm settings will then be refined based on the test results. This approach will make it possible to gradually increase the completeness (recall) of the search while keeping the precision of borrowing detection high.