Multi-Modal AI Content Generation

Article-Gen

Gemma 2 9B, Flux.1-dev, LLaVA, Streamlit, NVIDIA GPU, LoRA

A multi-modal AI content generation platform that creates professional articles with AI-generated images and publishes them directly to Substack.

Article-Gen auto-generates articles from trending news topics, creates three contextual images per article via Flux.1-dev, and captions each image with the LLaVA vision model. A custom prompt mode provides full creative control over article topics and style.

The writing style is fine-tunable via LoRA on Gemma 2 9B, letting you train the model on your own writing samples for a personalized voice. One-click publishing to Substack streamlines the entire workflow from ideation to publication.

Gallery

Article-Gen - Auto Mode

Article-Gen - Custom Prompts

Case Study

The Problem

Content marketing requires a relentless publishing cadence (blog posts, newsletters, social updates), but producing quality articles with original imagery is slow and expensive. Stock photos feel generic. Commissioning illustrations doesn't scale. And AI-generated text without images looks like what it is: low-effort filler. The gap is a tool that produces complete, publish-ready content, words and visuals together, without requiring a writer, a photographer, and an editor.

Design Challenge

Fully automated content generation has an obvious trust problem: if a human isn't writing it, how do you maintain voice, accuracy, and relevance? AI-generated images paired with AI-generated text can also compound the uncanny valley effect, so the whole piece feels synthetic. The challenge was designing a pipeline where automation handles the labor but a human's style and editorial judgment still shape the output.

Key Design Decisions

The writing style is fine-tunable via LoRA on Gemma 2 9B. Train it on your own published writing and the generated articles carry your voice, not a generic AI tone. Each article gets three contextual images generated via Flux.1-dev, with prompts derived from the article content so the visuals actually match the text. LLaVA then captions each image for accessibility and SEO. The pipeline offers two modes: auto mode (discovers trending topics and writes about them) and custom mode (you provide the topic and framing). One-click publishing to Substack means the output goes from zero to published without copy-pasting between tools.
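The flow described above can be sketched as a small orchestrator. This is only an illustration under stated assumptions: the `Stages` container, the `Article` type, and every callable name are hypothetical and not taken from the project's code. Each stage is passed in as a plain function, so the sketch makes no claim about how the models or Substack are actually called.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

IMAGES_PER_ARTICLE = 3  # three images per article, as stated in the text


@dataclass
class Article:
    topic: str
    body: str
    images: list[str] = field(default_factory=list)    # image paths or identifiers
    captions: list[str] = field(default_factory=list)  # one caption per image


@dataclass
class Stages:
    """Hypothetical stage callables; real ones would wrap the models and services."""
    discover_topic: Callable[[], str]        # auto mode: pick a trending topic
    write_article: Callable[[str], str]      # the LoRA-tuned Gemma 2 9B step
    make_image: Callable[[str, int], str]    # the Flux.1-dev step
    caption_image: Callable[[str], str]      # the LLaVA step
    publish: Callable[[Article], None]       # the Substack step


def run_pipeline(
    stages: Stages, topic: Optional[str] = None, auto_publish: bool = True
) -> Article:
    # Custom mode supplies a topic; auto mode leaves it as None and discovers one.
    chosen = topic if topic is not None else stages.discover_topic()
    body = stages.write_article(chosen)
    article = Article(topic=chosen, body=body)
    for index in range(IMAGES_PER_ARTICLE):
        image = stages.make_image(body, index)
        article.images.append(image)
        article.captions.append(stages.caption_image(image))
    # Setting auto_publish=False keeps a human in the loop before anything goes live.
    if auto_publish:
        stages.publish(article)
    return article
```

Passing the stages in as callables keeps the two modes on one code path: the only difference between auto and custom mode in this sketch is whether a topic is supplied.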

Outcome

A complete zero-input-to-published pipeline. In auto mode, it discovers trending topics, writes an article in a trained voice, generates three matched images with captions, and publishes directly to Substack. No human in the loop unless you want one. Custom mode gives full editorial control while still automating the production work.

System Architecture

Technical Highlights

Auto Mode

Discovers trending news topics via DuckDuckGo, selects the most relevant one, and generates a complete article with zero human input. The system handles topic selection, research, writing, image generation, captioning, and publishing.
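How "most relevant" is decided is not spelled out here, so the following is a minimal sketch of one possible approach: score each headline by keyword overlap with a set of interests and keep the best one. The `Headline` type, the `interests` set, and all function names are assumptions for illustration. Fetching results from DuckDuckGo is deliberately left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Headline:
    title: str
    source: str


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric words in the text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(headline: Headline, interests: set[str]) -> float:
    """Fraction of interest keywords (lowercase) that appear in the title."""
    if not interests:
        return 0.0
    return len(tokenize(headline.title) & interests) / len(interests)


def select_topic(headlines: list[Headline], interests: set[str]) -> Headline:
    """Return the highest-scoring headline; ties keep the earliest result."""
    if not headlines:
        raise ValueError("no headlines to choose from")
    return max(headlines, key=lambda h: relevance(h, interests))
```

Keyword overlap is the simplest possible scorer. A real system might instead rank by recency, source quality, or embedding similarity.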

Custom Mode

Full editorial control: provide the topic, angle, and tone. The AI handles the writing and image generation while you direct the content strategy.
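As a rough sketch of how the three inputs might be combined into a single generation prompt, assuming a simple line-based template: the `CustomRequest` type and the template wording below are hypothetical, not the project's actual prompt format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomRequest:
    topic: str
    angle: str = ""          # optional framing for the piece
    tone: str = "neutral"    # falls back to "neutral" when left blank


def build_prompt(request: CustomRequest) -> str:
    """Assemble a plain-text prompt from the user's topic, angle, and tone."""
    topic = request.topic.strip()
    if not topic:
        raise ValueError("custom mode needs a topic")
    lines = [f"Write an article about: {topic}"]
    angle = request.angle.strip()
    if angle:
        lines.append(f"Angle: {angle}")
    lines.append(f"Tone: {request.tone.strip() or 'neutral'}")
    return "\n".join(lines)
```

For example, `build_prompt(CustomRequest("local LLM inference", angle="cost", tone="practical"))` yields a three-line prompt with the topic, angle, and tone on separate lines.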

LoRA-Tuned Writing Style

Fine-tune Gemma 2 9B on your own published writing samples to match your voice. The generated articles carry your tone, phrasing, and editorial perspective rather than generic AI prose.
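LoRA keeps the pretrained weight matrix frozen and trains only two small low-rank factors whose product is added to it, which is why fine-tuning a 9B model on personal writing samples is practical on a single GPU. The arithmetic below shows the saving for one weight matrix. The dimensions and rank are illustrative assumptions, not values taken from Gemma 2 9B's configuration or from this project's training setup.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    if rank <= 0 or rank > min(d_out, d_in):
        raise ValueError("rank must be between 1 and min(d_out, d_in)")
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen d_out x d_in weight matrix."""
    return d_out * d_in


# Illustrative dimensions only; not Gemma 2 9B's actual layer sizes.
d_out, d_in, rank = 4096, 4096, 16
adapter = lora_trainable_params(d_out, d_in, rank)
frozen = full_params(d_out, d_in)
print(f"adapter: {adapter:,} params, frozen: {frozen:,} params, "
      f"ratio: {adapter / frozen:.4%}")
```

With these assumed sizes the adapter holds 131,072 trainable parameters against 16,777,216 frozen ones, under 1% of the matrix.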

Contextual Image Generation

Three images per article are generated via Flux.1-dev, with prompts derived from the article content. The LLaVA vision model then generates a caption for each image for accessibility and SEO. The images match the article context rather than being generic stock.
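The exact way prompts are derived from the article is not described, so this is a minimal sketch of one plausible heuristic: split the article into three contiguous parts and use the opening sentence of each part as the basis for an image prompt. The function names and the `style` prefix are assumptions. The resulting strings would then be handed to the image model.

```python
import re

IMAGES_PER_ARTICLE = 3  # three images per article, as stated in the text


def split_into_chunks(paragraphs: list[str], count: int) -> list[list[str]]:
    """Split paragraphs into `count` contiguous, nearly equal groups."""
    size, extra = divmod(len(paragraphs), count)
    chunks, start = [], 0
    for i in range(count):
        end = start + size + (1 if i < extra else 0)
        chunks.append(paragraphs[start:end])
        start = end
    return chunks


def first_sentence(text: str) -> str:
    """Return the text up to and including the first sentence-ending mark."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def image_prompts(article: str, style: str = "editorial illustration") -> list[str]:
    """Derive one prompt per article section from that section's opening sentence."""
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    if len(paragraphs) < IMAGES_PER_ARTICLE:
        raise ValueError("need at least three paragraphs to derive three prompts")
    return [
        f"{style}, {first_sentence(chunk[0])}"
        for chunk in split_into_chunks(paragraphs, IMAGES_PER_ARTICLE)
    ]
```

Tying each prompt to a different section spreads the images across the article instead of illustrating the introduction three times. A language model could also be asked to summarize each section into a visual description.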