LLM Roadmap

The phased plan for building and deploying custom AI models for Thamizhi.

  • Collect 100,000+ processed Tamil news articles
  • Build labeled dataset (categories, entities, quality scores)
  • Create evaluation benchmark (1,000 hand-labeled articles)
  • Export in LLM training format (JSONL, Alpaca format)
  • Fine-tune Llama 3 8B on Tamil news corpus
  • Evaluate against baseline (generic Groq prompts)
  • Deploy as a custom Groq model or self-host it
  • Bilingual model (Tamil + English)
  • Integrated fact-checking capability
  • Replace generic AI prompts
  • Continuous fine-tuning from citizen report feedback
  • Quantized model for edge/on-device
  • Basic inference on Cloudflare Workers
  • Offline capabilities for mobile app
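
The export step in the list above (JSONL, Alpaca format) can be sketched as follows. This is a minimal illustration and not the project's actual exporter: the article fields (`title`, `body`, `category`), the instruction text, and the function names are all assumptions. Only the three-key record shape (`instruction`, `input`, `output`) comes from the Alpaca convention.

```python
import json
from typing import Iterable


def to_alpaca_record(article: dict) -> dict:
    """Map one labeled article to an Alpaca-style training record.

    The article keys used here are hypothetical; adapt them to the real schema.
    """
    return {
        "instruction": "Classify this Tamil news article into a category.",
        "input": f"{article['title']}\n\n{article['body']}",
        "output": article["category"],
    }


def to_jsonl(articles: Iterable[dict]) -> str:
    """Serialize articles as JSONL: one JSON object per line."""
    # ensure_ascii=False keeps Tamil text readable instead of \uXXXX escapes.
    return "".join(
        json.dumps(to_alpaca_record(a), ensure_ascii=False) + "\n"
        for a in articles
    )


# Hypothetical sample record, for illustration only.
sample = [
    {
        "title": "தேர்தல் முடிவுகள்",
        "body": "தேர்தல் முடிவுகள் அறிவிக்கப்பட்டன.",
        "category": "politics",
    }
]
jsonl_text = to_jsonl(sample)
```

Writing `jsonl_text` to a file opened with `encoding="utf-8"` gives a dataset file that most fine-tuning tools can load line by line. The same record shape could carry other tasks from the labeled dataset (entities, quality scores) by changing the instruction and output fields.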