Maarten Sukel - Jounai.nl: Playing with New Tech to Reinvent the News | PyData Amsterdam 2024

Explore how Maarten Sukel built Jounai.nl, an AI-driven news platform that uses LLMs for automated content generation, and learn about the technical challenges and solutions involved.

Key takeaways
  • Built Jounai.nl, an AI-driven news platform that uses LLMs to generate news articles and podcasts automatically from ~50 trusted sources, without human intervention

  • Tech stack includes:

    • Backend: Java/Spring Boot
    • Frontend: Vue.js/Nuxt.js
    • Azure Container Apps for deployment
    • OpenAI APIs for content generation
    • Structured LLM output validation using Pandera
  • Cost optimization:

    • Current operational cost ~$80/month on Azure
    • API costs under $2/week
    • Costs reduced by switching to smaller models
    • Image generation discontinued due to high API costs
  • Key technical implementation details:

    • Uses Jaccard similarity for article deduplication
    • Server-side rendering implemented for SEO
    • Automated deployment via GitHub Actions
    • Simple clustering for related content
    • Multiple AI “personalities” for different content types
  • Challenges faced:

    • Cultural localization of AI-generated content
    • Legal considerations around content sourcing
    • Cost management when scaling
    • Quality control of AI outputs
    • Speech synthesis quality in Dutch
  • Lessons learned:

    • Importance of data validation and testing
    • Benefits of experimenting with new technologies
    • Value of small-scale projects for learning
    • Need for careful prompt engineering
    • Importance of structured output validation for LLMs
  • Future considerations:

    • Potential for local model deployment
    • Expectation of decreasing API costs
    • Need for better cost optimization at scale
    • Possibility of multilingual expansion
    • Focus on maintaining truthful and objective reporting
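
The Jaccard-based deduplication listed under the implementation details can be sketched as follows. This is a minimal illustration, not the platform's actual code: the word-level tokenization, the 0.6 threshold, and the function names `tokenize`, `jaccard`, and `deduplicate` are all assumptions made for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(articles: list[str], threshold: float = 0.6) -> list[str]:
    """Keep an article only if it is below the threshold against every kept one.

    The threshold is an assumed value; a real pipeline would tune it on data.
    """
    kept: list[tuple[str, set[str]]] = []
    for text in articles:
        tokens = tokenize(text)
        if all(jaccard(tokens, seen) < threshold for _, seen in kept):
            kept.append((text, tokens))
    return [text for text, _ in kept]


if __name__ == "__main__":
    articles = [
        "Amsterdam opens new bike tunnel under the central station",
        "New bike tunnel opens under the central station in Amsterdam",
        "Dutch parliament debates the housing budget",
    ]
    for article in deduplicate(articles):
        print(article)
```

Comparing every new article against all kept ones is quadratic, which is likely acceptable at the scale of roughly 50 sources; larger collections would typically move to an approximate method such as MinHash.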
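
The structured output validation mentioned in the takeaways can be illustrated with a small sketch. The talk used Pandera for this step; the example below is a dependency-free stand-in using only the standard library, and the field names (`title`, `summary`, `sources`) and the function `validate_article` are hypothetical, chosen only to show the idea of rejecting malformed LLM responses before they are published.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "summary": str, "sources": list}


def validate_article(raw: str) -> dict:
    """Parse an LLM response and raise ValueError if it violates the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"field {field!r} must be of type {expected.__name__}")
    # Content-level checks beyond types.
    if not data["title"].strip():
        raise ValueError("title must not be empty")
    if not data["sources"]:
        raise ValueError("at least one source is required")
    return data


if __name__ == "__main__":
    good = '{"title": "T", "summary": "S", "sources": ["https://example.com"]}'
    print(validate_article(good)["title"])
    try:
        validate_article('{"title": "T"}')
    except ValueError as err:
        print(f"rejected: {err}")
```

A failed validation can then trigger a retry of the LLM call or skip the article, so that no unchecked output reaches the site without a human in the loop.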