AI Pipeline Overview

How Thamizhi uses AI to process, verify, and enrich news articles.

AI Stack

Component	Provider	Model
Text processing	Groq	Llama 3.3 70B
Fact-checking	Groq	Mixtral 8x7B
Translation	Groq	Llama 3 70B
Extraction	Groq	Llama 3.3 70B

What the AI Does

For Scraped News

Title extraction — Extract clean title in original language + English translation
Summarization — Generate 50-word and 150-word summaries
Classification — Assign category, sub-category, district, location
Entity extraction — Identify people, places, organizations
Sentiment analysis — Determine tone and sentiment
Cross-referencing — Match against other articles on same topic

For Citizen Reports

All of the above plus:

Consistency check — Does the report make logical sense?
Verification scoring — How likely is this to be true?
Flag generation — Highlight suspicious claims for human review

Batch Processing

After each GitHub Actions scrape completes, a processing job runs:

# Triggered after scrape
jobs:
  process:
    runs-on: ubuntu-latest
    steps:
      - name: Fetch unprocessed articles
        run: python fetch_pending.py
      - name: Process with Groq
        run: python ai_process.py
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}

Prompt Template Example

You are a Tamil Nadu news analyst. Given a news article, extract:

1. Title in Tamil (original)
2. Title in English (translated)
3. 50-word summary in English
4. 150-word summary in English
5. Category (one of: crime_against_women, crime_against_children,
   politics, corruption, accident, health, education, environment)
6. Sub-category (more specific)
7. District in Tamil Nadu
8. Specific location
9. Incident date
10. Sentiment (positive/negative/neutral)
11. Key entities (JSON array of people, places, organizations)

Article: {article_text}

AI Pipeline Overview

AI Stack

What the AI Does

For Scraped News

For Citizen Reports

Batch Processing

Prompt Template Example

Philosophy

User Guide

Technical

Future