LEHJA: build status
Updated 2026-06-13 15:15:01 UTC · auto-refresh 60s · cron every 5 min
OVERALL (Arabic track + MT): 50%
H100 NVL: 85% GPU · active jobs 10 · VRAM 45.1 GB used / 48.0 GB free of 93.6 GB
🔥 TRAINING LIVE: model is learning diacritized Quranic recitation (GPU-heavy phase).
Tasks: main & sub
✅ ✦. Urdu voice model (previous milestone) · 100%
  ✅ CosyVoice2 Urdu LoRA trained + delivered · validation PASS · median CER 0.065 · model on app-server
✅ 1. Build Quranic-Arabic pipeline (canonical alignment, full tashkeel) · 100%
  ✅ Diacritic-preserving aligner + mining/prep/train/validate scripts
✅ 2. Canonical Uthmani Quran text + alignment index · 100%
  ✅ 6,236 ayahs, full Uthmani Hafs (zabar/zer/pesh/shadda/tanwin/ghunna)
  ✅ Skeleton search index: rough transcript → exact ayah span
✅ 3. Mine in-domain Arabic recitation (your QTM corpus) · 100%
  ✅ whisper-ar VAD-segment + inline Quran align · 11,900/12,000 files · 8/8 shards done
  ✅ clean recitation clips harvested · 11,237 clips · 1,223.6 min · 3,041 distinct ayahs
✅ 4. Canonical-align → diacritized labels + confidence gate · 100%
  ✅ perfect diacritized labels (alignment inline per clip)
  ✅ high-confidence gate (match ≥ 85) + dedup · 7,238 train-grade clips
⬜ 5. Train + validate + package Quranic Arabic LoRA · 0%
  CosyVoice2 fine-tune (learn diacritized recitation) · 25 epochs planned
  ⬜ validate: recites correct Arabic? (whisper-readback)
  ⬜ package CosyVoice2-0.5B-ar + download to app-server
⬜ MT. Flagship MT benchmark (Qwen3.6-35B-A3B vs Qwen3-14B) · 0%
  ⬜ models downloaded (Qwen3-14B + Qwen3.6-35B-A3B) · queued after training
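Sketch of how a whisper-readback check could be scored (an assumption, not the project's validation script): the synthesized audio is transcribed, the transcript is compared with the reference text by character error rate (CER), and the median over clips is reported, as in the "median CER 0.065" figure above. The names `edit_distance`, `cer`, and `median_cer` are hypothetical, the ASR step is left out, and no pass threshold is implied.

```python
from statistics import median


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def median_cer(pairs) -> float:
    """Median CER over (reference, readback transcript) pairs."""
    return median(cer(ref, hyp) for ref, hyp in pairs)
```

Whether diacritics count as characters changes the score noticeably for Arabic, so the reference and the readback should be normalized the same way before calling `cer`.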
Arabic data harvest
Recitation clips mined: 11,237 · 1,223.6 min clean audio
Train-grade (match ≥ 85): 7,238 clips · 3,041 distinct ayahs covered
Source: your QTM Quran-recitation corpus (30,165 Arabic files available)
Approach: every clip's rough transcript is matched to the canonical Uthmani Quran text and the label is taken from that canonical text, so labels carry its full diacritics (zabar/zer/pesh/tashdeed/tanwin/ghunna), and non-recitation audio (Urdu explanation, repetition) is filtered out automatically.
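Sketch of the approach above, under stated assumptions: the "skeleton" is taken to be the text with combining marks removed, the match score is a 0-100 similarity from Python's `difflib` (the dashboard does not say which metric sits behind "match ≥ 85"), and the search is a plain scan over whole ayahs rather than an index over spans. `CANON`, `skeleton`, `best_match`, and `label_clip` are hypothetical names, and `CANON` holds only two ayahs for illustration.

```python
import unicodedata
from difflib import SequenceMatcher

TATWEEL = "\u0640"      # elongation character, carries no sound
ALEF_WASLA = "\u0671"   # Uthmani alef wasla; ASR output usually has plain alef
ALEF = "\u0627"

# Tiny stand-in for the canonical text (assumption: keyed by "surah:ayah").
CANON = {
    "1:1": "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ",
    "1:2": "ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ",
}


def skeleton(text: str) -> str:
    """Strip combining marks (tashkeel) and tatweel, normalize alef wasla."""
    kept = [ch for ch in text
            if unicodedata.category(ch) != "Mn" and ch != TATWEEL]
    return " ".join("".join(kept).replace(ALEF_WASLA, ALEF).split())


def best_match(rough: str, canon: dict[str, str]) -> tuple[str, float]:
    """Return (ayah key, score 0-100) for the closest canonical ayah."""
    target = skeleton(rough)
    score, key = max(
        (SequenceMatcher(None, target, skeleton(text)).ratio() * 100, key)
        for key, text in canon.items()
    )
    return key, score


def label_clip(rough: str, canon: dict[str, str], threshold: float = 85.0):
    """Return (ayah key, diacritized label), or None if below the gate."""
    key, score = best_match(rough, canon)
    if score < threshold:
        return None  # treated as non-recitation or too noisy
    return key, canon[key]
```

The label always comes from `canon`, never from the ASR transcript, which is why an undiacritized transcript can still yield a fully diacritized label. A clip whose transcript resembles no ayah closely enough returns `None` and is dropped.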