How We Use Claude Haiku to Stop Gemini From Hallucinating (Before It Starts)
Most AI writing tools send a single prompt and hope for the best. Here's the parallel pre-pass architecture Bloomberry uses to eliminate hallucination before generation starts.
By Sadok Hasan
When you paste a URL into an AI writing tool, most tools do one thing: send the URL and a prompt to a single model and hope for the best.
The model has to simultaneously fetch the content, extract the story, decide on structure, match your voice, and write, all in one pass. That is a lot to ask.
This is why AI writing tools hallucinate. Not because the models are bad. Because the task is too complex for a single unstructured call.
The Bloomberry ice cream launch is a good example. When we tested naive generation against the blog post about Bloomberry launching Bloomberry ice cream at Snowees in Morgan Hill, CA, the model produced content about "AI-generated ice cream flavors." Technically accurate about the topic category. Completely wrong about the actual story.
The model was not lying. It was pattern-matching. The pattern happened to miss the point.
The problem is structural, not model-quality
Most people's instinct when they see AI hallucination is to switch models. GPT-4 hallucinated; try Claude. Claude missed the point; try Gemini.
This is the wrong diagnosis. The failure is not in the model's capability. It is in the architecture of the request.
When a model receives an unstructured input ("here is an article URL, write me LinkedIn and X posts about it"), it has to make a lot of inferences. What is the central claim? What are the specific named facts versus the general topic? What structure should the output follow? What details must appear versus what can be inferred?
When that inference load is high and the prompt provides no structural scaffolding, the model fills the gaps with plausible-sounding output. The output is coherent. It is not accurate.
You do not solve this by asking the model more nicely. You solve it by reducing the inference load before the generation starts.
What Bloomberry built: a parallel pre-pass architecture
Before Gemini Flash ever sees your URL, Bloomberry runs four parallel Claude Haiku calls. Each call is scoped to a single extraction task.
Call 1: Story spine extraction. Three sentences maximum. Who did what, where, and why it matters. Every sentence must contain a real named entity from the source β a person, a place, a product, a number. No generalities allowed. If the source does not contain enough named specifics to fill three sentences, this call returns a thin-source flag that stops the generation before it starts.
Call 2: Concrete anchor extraction. Up to five specific details that can only come from this particular article, not from general knowledge about the topic. Named people. Named places. Specific numbers. Exact quotes. The logic: if none of these anchors appear in the final output, the output is hallucinated, regardless of how fluent it sounds.
Call 3: LinkedIn outline generation. Structure before prose. Central claim, counterintuitive angle, hook options, body development sequence. The outline is the brief the Gemini Flash call writes from: not a loose instruction, but a specific roadmap that tells the model which anchor to use where.
Call 4: X angle extraction. A single distilled claim for tweet generation: the most specific, non-obvious argument the source makes. Not the topic. The argument.
Total Haiku time: roughly 1-2 seconds in parallel. The four calls run simultaneously and their outputs are assembled into a structured brief before any generation starts.
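The concurrency pattern above can be sketched in a few lines. This is a minimal, runnable illustration, not Bloomberry's actual code: the prompt wording, task names, and the stubbed `call_haiku` function are all assumptions standing in for real Anthropic API calls.

```python
import asyncio

# Illustrative prompts for the four scoped extraction tasks.
PREPASS_TASKS = {
    "story_spine": "Extract a 3-sentence story spine using only named entities.",
    "anchors": "List up to 5 details that can only come from this article.",
    "linkedin_outline": "Produce an outline: claim, angle, hooks, body sequence.",
    "x_angle": "Distill the single most specific argument the source makes.",
}

async def call_haiku(task_name: str, prompt: str, article_text: str) -> str:
    """Stub for one scoped Haiku call. A real version would invoke an
    LLM client (e.g. the Anthropic messages API) with a tight prompt."""
    await asyncio.sleep(0)  # stands in for network latency
    return f"[{task_name} extracted from source]"

async def run_prepass(article_text: str) -> dict[str, str]:
    """Run all four extraction calls concurrently, then assemble the brief."""
    names = list(PREPASS_TASKS)
    results = await asyncio.gather(
        *(call_haiku(n, PREPASS_TASKS[n], article_text) for n in names)
    )
    return dict(zip(names, results))

brief = asyncio.run(run_prepass("...article text..."))
```

Because the four calls are independent, `asyncio.gather` makes total pre-pass latency roughly the slowest single call rather than the sum of all four.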
Why Haiku specifically
The obvious question: why use a smaller, cheaper model for this step instead of the model that does the final generation?
Three reasons.
Speed. Haiku is fast enough to run four calls in parallel without adding meaningful latency to the user experience. Running four Gemini Pro calls in parallel would be significantly slower.
Cost. Haiku is cheap enough to run on every generation without making the economics unsustainable. The extraction tasks do not require the full capability of a frontier model. You do not need Opus to pull three named facts from an article.
Task fit. Extraction is a different task than generation. Extraction requires precision and constraint β return exactly what is in the source, nothing more. Generation requires creativity and synthesis. Using a highly capable generative model for a constrained extraction task often produces worse results because the model tends to infer, elaborate, and complete patterns rather than just extracting.
The right model for the right task is not always the most powerful model available. For extraction, Haiku's constraint is a feature.
What this prevents
Without the pre-pass: the model reads your URL, infers that the topic is "AI company" and "ice cream," and produces content about AI-generated ice cream flavors. It is plausible. It is fluent. It is completely wrong.
With the pre-pass: the Gemini Flash call receives the story spine, the concrete anchors, and the content plan before it writes anything. It knows that the opening sentence must name Sadok Hasan, the color gradient, and Snowees in Morgan Hill. It knows the causal chain: morning smoothie ritual → berry color → company name → physical product. It knows which hook options were generated and which the outline selected.
The generation model is not inferring the story from the URL. It is writing from a structured brief about the story. The inference load is almost entirely offloaded to the extraction step, where it is constrained by the instruction to return only what is explicitly in the source.
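Concretely, "writing from a structured brief" means the extracted fields are inlined into the generation prompt as hard requirements. The template below is a sketch under assumed field names, not Bloomberry's actual prompt:

```python
def build_generation_prompt(brief: dict) -> str:
    """Turn a pre-pass brief into the instruction the generation model
    writes from. Field names and template wording are illustrative."""
    anchors = "\n".join(f"- {a}" for a in brief["anchors"])
    return (
        "Write a LinkedIn post following this outline exactly.\n\n"
        f"Story spine:\n{brief['story_spine']}\n\n"
        f"Required named details (each must appear in the output):\n{anchors}\n\n"
        f"Outline:\n{brief['linkedin_outline']}\n"
    )

prompt = build_generation_prompt({
    "story_spine": "Bloomberry launched Bloomberry ice cream at Snowees in Morgan Hill.",
    "anchors": ["Snowees", "Morgan Hill", "berry-to-lavender gradient"],
    "linkedin_outline": "Hook, causal chain, close.",
})
```

The generation model never sees a bare URL; it sees a contract that already names every detail it is accountable for.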
This is not the same as telling the model "don't hallucinate." That instruction reduces but does not eliminate the problem. The pre-pass eliminates the gaps that hallucination fills.
The broader pattern: brief before the brief
This is what we call the "brief before the brief" β the AI writes a content brief about your content before the content gets written.
The pattern generalizes to any multi-step AI pipeline that transforms source material into output. The principle is always the same: separate extraction from generation. Use a fast, cheap, constrained model for extraction. Use a capable generative model for synthesis. Give the generative model a structured brief that it writes from, not a loose instruction that it interprets.
This is also the pattern behind the fidelity judge β a separate verification step that runs after generation, scores the output against the extracted anchors, and flags any output that contains zero source-specific named details. The judge is the quality gate after generation; the pre-pass is the quality gate before.
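The deterministic floor of that judge can be sketched as an anchor-presence check: any output containing zero source-specific details is flagged, no matter how fluent it reads. The real judge presumably does more (LLM-based scoring), but the gate itself is simple. Function names here are illustrative:

```python
def anchor_score(output: str, anchors: list[str]) -> float:
    """Fraction of extracted source anchors that appear in the output."""
    if not anchors:
        return 0.0
    text = output.lower()
    hits = sum(1 for a in anchors if a.lower() in text)
    return hits / len(anchors)

def passes_fidelity_gate(output: str, anchors: list[str]) -> bool:
    """Flag any output that contains zero source-specific named details."""
    return anchor_score(output, anchors) > 0.0
```

On the ice cream example, "content about AI-generated ice cream flavors" scores zero against anchors like "Snowees" and "Morgan Hill" and is rejected before anyone sees it.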
Both are cheaper than fixing hallucinated content after a user sees it.
What this looks like in production
The Bloomberry ice cream post we used as a test case is a real example. The article is at bloomberry.ai/blog/bloomberry-ice-cream.
Before the pre-pass architecture: generation produced content about AI-generated ice cream. After: generation produced content that named the morning smoothie ritual, the berry-to-lavender color gradient, and Snowees in Morgan Hill, because those were the anchors the Haiku extraction call flagged as required named details.
The model did not get smarter. The architecture got better.
If you want to see the pre-pass working on your own content, try Bloomberry on any URL and watch the generation. The specificity is not a coincidence.
Related: The Fidelity Judge (how we built an LLM to fact-check its own output) · How AI learns your writing voice · The Emergence of AI Dialects