Synthetic parallel corpora LAD-EN,TR

Col·lectivaT SCCL
Created: 11/05/2022
Updated: 14/06/2022

This dataset contains parallel corpora with synthetically produced Ladino sentences. Source sentences were obtained from various ES-TR and ES-EN corpora in the OPUS collection. Each dataset has four columns: source tag, English or Turkish sentence, Spanish sentence, and synthetic Ladino sentence.
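
The four-column layout described above could be read along the following lines. This is only a minimal sketch: the description does not state the file format, so the tab delimiter is an assumption, and the names `Row` and `read_rows` are hypothetical helpers introduced here for illustration. Adjust the delimiter to match the actual files.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Row(NamedTuple):
    """One record, in the column order given in the description."""
    source_tag: str  # tag of the corpus the sentence pair came from
    target: str      # English or Turkish sentence
    spanish: str     # Spanish sentence
    ladino: str      # synthetic Ladino sentence


def read_rows(handle: TextIO, delimiter: str = "\t") -> Iterator[Row]:
    """Yield rows from an open text file.

    ASSUMPTION: one record per line, four delimiter-separated fields.
    Lines with a different field count are skipped.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 4:
            continue
        yield Row(*fields)
```

With a real file, the handle would typically come from `open(path, encoding="utf-8", newline="")`, where `path` points at one of the downloaded corpora.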

License: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/

Citation and more information: https://arxiv.org/abs/2205.15599

Total size: EN-ES-LAD: 5,748,013; TR-ES-LAD: 4,574,023

This resource was created as part of the "Judeo-Spanish: Connecting the two ends of the Mediterranean" project, within the framework of the "Grant Scheme for Common Cultural Heritage: Preservation and Dialogue between Turkey and the EU–II (CCH-II)", implemented by the Ministry of Culture and Tourism of the Republic of Turkey with the financial support of the European Union.
