AI Pipeline Overview
How Thamizhi uses AI to process, verify, and enrich news articles.
AI Stack
Section titled “AI Stack”| Component | Provider | Model |
|---|---|---|
| Text processing | Groq | Llama 3.3 70B |
| Fact-checking | Groq | Mixtral 8x7B |
| Translation | Groq | Llama 3 70B |
| Extraction | Groq | Llama 3.3 70B |
What the AI Does
Section titled “What the AI Does”For Scraped News
Section titled “For Scraped News”- Title extraction — Extract clean title in original language + English translation
- Summarization — Generate 50-word and 150-word summaries
- Classification — Assign category, sub-category, district, location
- Entity extraction — Identify people, places, organizations
- Sentiment analysis — Determine tone and sentiment
- Cross-referencing — Match against other articles on same topic
For Citizen Reports
Section titled “For Citizen Reports”All of the above plus:
- Consistency check — Does the report make logical sense?
- Verification scoring — How likely is this to be true?
- Flag generation — Highlight suspicious claims for human review
Batch Processing
Section titled “Batch Processing”After each GitHub Actions scrape completes, a processing job runs:
# Triggered after scrapejobs: process: runs-on: ubuntu-latest steps: - name: Fetch unprocessed articles run: python fetch_pending.py - name: Process with Groq run: python ai_process.py env: GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}Prompt Template Example
Section titled “Prompt Template Example”You are a Tamil Nadu news analyst. Given a news article, extract:
1. Title in Tamil (original)2. Title in English (translated)3. 50-word summary in English4. 150-word summary in English5. Category (one of: crime_against_women, crime_against_children, politics, corruption, accident, health, education, environment)6. Sub-category (more specific)7. District in Tamil Nadu8. Specific location9. Incident date10. Sentiment (positive/negative/neutral)11. Key entities (JSON array of people, places, organizations)
Article: {article_text}