This project is funded by the European Union.
Ladino Data Hub
All data on Sephardic culture and Ladino language in one place
Tatoeba parallel corpus
English, Turkish, and Spanish sentences aligned with Ladino. Sentences from Tatoeba
(CC-BY License), extracted on 10.03.2022. ENG-LAD: 1261; SPA-LAD: 671; TUR-LAD: 335
Neural machine translation models
OpenNMT models, training configuration and logs, dev/test sets. Citation and more information:
License: CC-BY-ShareAlike. This resource is...
Text-to-speech (TTS) training dataset and models
1987 voice samples totaling 3.2 hours of audio data. Voice: Karen Şarhon. License: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)...
Synthetic parallel corpora LAD-EN,TR
This dataset contains parallel corpora with synthetically produced Ladino sentences. Source sentences were obtained from various ES-TR and ES-EN corpora in the OPUS collection. Each...