📐 The Open Persian ASR Leaderboard ranks and evaluates speech recognition models on the Hugging Face Hub.
We report the Average WER (⬇️ lower is better) and Average CER (⬇️ lower is better). Check the 📈 Metrics tab to understand how the models are evaluated.
If you want results for a model that is not listed here, you can submit a request for it to be included ✉️✨.
The leaderboard includes Persian/Farsi ASR evaluation benchmarks.
We created our own high-quality evaluation dataset, the Persian ASR Benchmark, which is used to evaluate the models listed here.
Here you will find details about the speech recognition metrics and datasets reported in our leaderboard.
Metrics
Models are evaluated using Character Error Rate (CER) and Word Error Rate (WER). The CER metric measures the accuracy of a system at the character level, capturing detailed errors such as misspellings, missing letters, or small deviations that WER might miss. A lower CER indicates better accuracy in reproducing the reference transcript character by character.
WER is also reported to provide a word-level perspective, but models are primarily ranked based on their CER, emphasizing fine-grained transcription quality.
For details on reproducing the benchmark numbers, refer to the Persian-ASR-Leaderboard GitHub repository.
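The ranking rule above can be sketched in a few lines. This is a minimal illustration, not the leaderboard's code: it assumes the "Average" is an unweighted mean of per-dataset CER scores (the leaderboard may weight datasets differently), and the model names, dataset names, and scores are hypothetical placeholders, not leaderboard results.

```python
from statistics import mean

# Hypothetical per-dataset CER scores; names and numbers are placeholders only.
cer_by_model = {
    "model-a": {"dataset-1": 0.10, "dataset-2": 0.20},
    "model-b": {"dataset-1": 0.05, "dataset-2": 0.15},
}


def rank_by_average_cer(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (model, average CER) pairs, best (lowest CER) first.

    Assumption: the average is an unweighted mean over datasets.
    """
    averages = {model: mean(per_dataset.values()) for model, per_dataset in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1])


for model, avg in rank_by_average_cer(cer_by_model):
    print(f"{model}: {avg:.3f}")
```

Sorting ascending puts the lowest (best) average CER first, matching "lower is better".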
Character Error Rate (CER)
Character Error Rate measures the accuracy of automatic speech recognition systems at the character level. It is the minimum number of character substitutions (S), insertions (I), and deletions (D) needed to turn the system's output into the reference (correct) transcript, divided by the number of characters in the reference (N). A lower CER value indicates higher accuracy.
Take the following example:
Reference: علی کتاب خواند
Prediction: علی کتاه خاند
The table lists characters from right to left (the last character of each sentence appears first); ␣ marks a space.
| Reference: | د | ن | ا | و | خ | ␣ | ب | ا | ت | ک | ␣ | ی | ل | ع |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Prediction: | د | ن | ا | - | خ | ␣ | ه | ا | ت | ک | ␣ | ی | ل | ع |
| Label: | ✅ | ✅ | ✅ | D | ✅ | ✅ | S | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Explanation of labels:
- S (Substitution): ب was substituted with ه
- D (Deletion): و was deleted
Total reference characters (**N**) = 14 (12 letters + 2 spaces)
Errors = 1 substitution (ب→ه) + 1 deletion (و) = **2 errors**
CER = (S + I + D) / N = (1 + 0 + 1) / 14 = 2/14
Final CER ≈ 0.1429 (≈ 14.3%)
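The character-level calculation above can be checked with a short, dependency-free Levenshtein (edit-distance) sketch. It is an illustration only, not the leaderboard's evaluation code (that lives in the GitHub repository and may rely on a dedicated library). Like the worked example, it counts spaces as characters, which is why N = 14.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance: minimum substitutions + insertions + deletions."""
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference char missing from output
                curr[j - 1] + 1,         # insertion: extra char in output
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1]


def cer(reference: str, prediction: str) -> float:
    """Character Error Rate = edit distance / number of reference characters."""
    return edit_distance(reference, prediction) / len(reference)


reference = "علی کتاب خواند"
prediction = "علی کتاه خاند"
print(len(reference))                        # 14
print(edit_distance(reference, prediction))  # 2
print(round(cer(reference, prediction), 4))  # 0.1429
```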
Word Error Rate (WER)
Word Error Rate measures the accuracy of automatic speech recognition systems at the word level. It is the minimum number of word substitutions (S), insertions (I), and deletions (D) needed to turn the system's output into the reference (correct) transcript, divided by the number of words in the reference (N). A lower WER value indicates higher accuracy.
Take the following example:
Reference: آرش و پارسا به مدرسه رفتند
Prediction: آرش و بارسا مدرسه رفتن
| Reference: | رفتند | مدرسه | به | پارسا | و | آرش |
|---|---|---|---|---|---|---|
| Prediction: | رفتن | مدرسه | - | بارسا | و | آرش |
| Label: | S | ✅ | D | S | ✅ | ✅ |
Here, we have:
- 2 substitutions ("پارسا" → "بارسا" and "رفتند" → "رفتن")
- 0 insertions
- 1 deletion ("به" is missing)
This gives 3 errors in total. To get our word error rate (WER), we divide the total number of errors (substitutions + insertions + deletions) by the total number of words in our reference (N), which for this example is 6:
WER = (S + I + D) / N = (2 + 0 + 1) / 6 = 0.5
Giving a WER of 0.5, or 50%.
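To reproduce the 2 / 0 / 1 breakdown, the word-level alignment has to be traced back, not just scored. The sketch below is a minimal example under two assumptions: words are obtained by splitting on whitespace (real Persian text may need more careful segmentation, as the limitations section explains), and ties between equally cheap alignments are broken in favour of matches and substitutions. It is not the leaderboard's own implementation.

```python
def align_counts(ref: list[str], hyp: list[str]) -> tuple[int, int, int]:
    """Return (S, I, D) from one minimum-cost alignment of ref and hyp."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete every reference word
    for j in range(m + 1):
        dp[0][j] = j  # insert every predicted word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution

    # Trace back from the bottom-right cell, preferring the diagonal step.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                subs += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, ins, dels


def wer(reference: str, prediction: str) -> float:
    """Word Error Rate = (S + I + D) / N, with N = number of reference words."""
    ref, hyp = reference.split(), prediction.split()
    s, i, d = align_counts(ref, hyp)
    return (s + i + d) / len(ref)


reference = "آرش و پارسا به مدرسه رفتند"
prediction = "آرش و بارسا مدرسه رفتن"
print(align_counts(reference.split(), prediction.split()))  # (2, 0, 1)
print(wer(reference, prediction))                            # 0.5
```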
For a fair comparison, we calculate normalized CER and WER for all model checkpoints: punctuation and casing are removed from both the references and the predictions. You can find the evaluation code in our GitHub repository.
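A minimal sketch of this kind of normalization, assuming that "removing punctuation" means dropping every Unicode punctuation character (category P*, which also covers Persian marks such as ، ؛ ؟ « ») and that casing only matters for Latin-script text. The function name is hypothetical, and the leaderboard's actual normalizer in the GitHub repository may apply more rules.

```python
import unicodedata


def normalize_for_scoring(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()  # only affects scripts with case, e.g. Latin
    # Replace every Unicode punctuation character (categories Pc, Pd, Ps, Pe, Pi, Pf, Po)
    # with a space so that neighbouring words are not glued together.
    text = "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


print(normalize_for_scoring("«سلام»، دنیا! Hello, World؟"))
```

Replacing punctuation with a space rather than deleting it keeps "word,word" from collapsing into a single token.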
Limitations of WER for Persian
Persian has complex linguistic features that make Word Error Rate (WER) less reliable as a metric.
1. Formal vs. Informal Variations
Persian often has multiple valid forms for the same sentence depending on formality:
- Formal: کتابم را از علی گرفتم
- Informal: کتابم رو از علی گرفتم
Both sentences are correct, but WER would count the difference between را and رو as a full word error, penalizing the model unfairly.
2. Morphological Complexity
Persian words often include clitics or attached pronouns (e.g., کتابم, رفتم), which can be split differently depending on tokenization. WER can exaggerate errors in these cases.
3. Word Segmentation Ambiguity
Persian does not always use spaces consistently, especially with prepositions, conjunctions, and enclitics. WER is sensitive to such inconsistencies, which can inflate error rates.
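The segmentation problem is easy to see with the zero-width non-joiner (ZWNJ, U+200C), the Persian "half-space". The verb می‌روم ("I go") is commonly written with a ZWNJ, with a regular space, or fully joined. All three are the same word, but whitespace tokenization (assumed here as a stand-in for whatever tokenizer a scorer uses) yields different word counts. If the reference uses the ZWNJ form and a model outputs the spaced form, WER sees one substitution plus one insertion against a single reference word (a WER of 200%), while CER sees a single character substitution.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, the Persian "half-space"

# Three spellings of the same verb, "I go".
spellings = {
    "half-space": "می" + ZWNJ + "روم",
    "space": "می روم",
    "joined": "میروم",
}

for name, text in spellings.items():
    # str.split() splits on whitespace only; ZWNJ is not whitespace,
    # so the half-space spelling stays a single token.
    tokens = text.split()
    print(f"{name:10} {len(tokens)} token(s), {len(text)} characters")
```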
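To see the effect on the formal/informal example above, treat the formal sentence as the reference and the informal sentence as the prediction.
Word Error Rate (WER) Calculation
- Substitution: را → رو counts as 1 word error
- Total words in reference: 5
- WER = 1 / 5 = 0.2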
Character Error Rate (CER) Calculation
- Character-level difference: ا → و (1 character error)
- Total characters in reference: 21 (17 letters + 4 spaces)
- CER = 1 / 21 ≈ 0.0476
The same one-letter change costs 20% in WER but less than 5% in CER.
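Both numbers can be recomputed with one edit-distance routine, applied once to word lists and once to character strings. This is a minimal sketch assuming whitespace tokenization and no normalization beyond what the sentences already have; it is not the leaderboard's own code.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


formal = "کتابم را از علی گرفتم"    # reference
informal = "کتابم رو از علی گرفتم"  # prediction

word_err = edit_distance(formal.split(), informal.split()) / len(formal.split())
char_err = edit_distance(formal, informal) / len(formal)
print(f"WER = {word_err:.4f}, CER = {char_err:.4f}")  # WER = 0.2000, CER = 0.0476
```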
How to reproduce our results
The ASR Leaderboard will be a continued effort to benchmark open-source and open-access speech recognition models where possible. Along with the Leaderboard, we're open-sourcing the codebase used to run these evaluations. For more details, head over to our repo: Persian-ASR-Leaderboard GitHub repository.
P.S. We'd love to know which other models you'd like us to benchmark next. Contributions are more than welcome! ♥️
Benchmark datasets
| Dataset | Total Duration (h) | License |
|---|---|---|
| FLEURS | - | CC-BY-4.0 |
| Persian-ASR-Benchmark | - | CC-BY-4.0 |
| common_voice | - | CC0 |
| ManaTTS (parts 70-77 only) | - | CC0 |
Dataset and Normalization
During preprocessing, we noticed that some Persian words contained Arabic forms (e.g., دایرة المعارف), which added unnecessary complexity and confused the model. We normalized such words to standard Persian forms to improve consistency and model understanding.
For more information about our normalization methods, please refer to our GitHub page where we describe our preprocessing pipeline in detail.
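A common form of this normalization is mapping Arabic code points to their Persian counterparts. The sketch below is an assumption about what such a step can look like, not the leaderboard's documented rules. It maps Arabic yeh (U+064A) to Farsi yeh (U+06CC) and Arabic kaf (U+0643) to keheh (U+06A9), and, purely for illustration, teh marbuta (U+0629) to heh (U+0647), which turns the دایرة of the example above into دایره. The repository describes which mappings are actually applied.

```python
# Hypothetical Arabic -> Persian character map (illustrative, not the project's rules).
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u0629": "\u0647",  # ARABIC LETTER TEH MARBUTA -> ARABIC LETTER HEH
})


def to_persian_letters(text: str) -> str:
    """Replace Arabic letter forms with their Persian equivalents."""
    return text.translate(ARABIC_TO_PERSIAN)


# "\u0643\u062a\u0627\u0628" is "book" typed with Arabic kaf; the result uses Persian keheh.
print(to_persian_letters("\u0643\u062a\u0627\u0628") == "\u06a9\u062a\u0627\u0628")  # True
```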
Since many models do not release their training data, we built the evaluation dataset from audio recorded after those models' public release dates (2 November 2025). Because the recordings postdate the models' release, none of these samples could have been used during training, which prevents data leakage and keeps the comparison fair.
About
This leaderboard showcases benchmark results for speech recognition models.
Data is sourced from local evaluations in Benchmark_data.csv.
Last updated on Oct 14th 2025
For further information, get in touch:
info@c1tech.group