AI Sentence DNA: A Corpus Study of Recurring AI-Writing Signals Across Vocabulary, Cadence, Structure, and Style
A Bloomberry Research Technical Report · Version 1.0 · June 2026 · Sadok Hasan
Cite as
Bloomberry Research. AI Sentence DNA: A Corpus Study of Recurring AI-Writing Signals Across Vocabulary, Cadence, Structure, and Style. Version 1.0. June 2026. bloomberry.ai/research/ai-writing-patterns
APA: Bloomberry Research. (2026, June). AI Sentence DNA: A corpus study of recurring AI-writing signals… (Version 1.0). Bloomberry AI.
Abstract
Generative AI systems increasingly produce writing that is fluent, coherent, and semantically plausible. Yet across large volumes of AI-assisted writing, recurring linguistic signals emerge: vocabulary that clusters around abstraction, transitions that perform logic without providing evidence, sentence structures that default toward symmetry, hooks that follow predictable formulas, and conclusions that over-resolve the complexity they were tasked with addressing.
This report introduces Bloomberry's AI Sentence DNA corpus: a 7,400+ entry catalogue of AI-writing signal entries assembled from production enforcement lists, prompt-layer guardrails, regex and cadence detectors, hard-banned reply patterns, replacement rules, finite surface-form variants, and source-backed research-tracked signals drawn from external academic, editorial, and community sources. The corpus is designed to catalogue recurring AI-writing patterns, not to determine authorship.
Our central finding is that AI writing does not become recognizable because of any single word or phrase. It becomes recognizable when signals stack: elevated generic vocabulary, smooth but low-information transitions, symmetrical rhetorical forms, cadence uniformity, predictable opening and closing structures, and low-specificity conclusions. We term this compound pattern layer AI Sentence DNA.
External academic research supports the core vocabulary findings. Kobak et al. (2025), analyzing excess vocabulary in over 15 million PubMed abstracts, documented statistically significant post-ChatGPT increases in words including delve, showcasing, underscores, and pivotal — with frequency ratios reaching 28× baseline. The full corpus is maintained as a structured internal research dataset. Representative examples are published publicly. A research-visible machine-readable subset is available for research and enterprise evaluation.
Counts represent catalogued AI-writing signal entries, not unique phrases or authorship determinations.
About this corpus
Counts include production enforcement entries, prompt guardrails, regex/pattern detectors, structural cadence detectors, hard-banned reply patterns, replacement pairs, finite regex surface-form variants, and source-backed research-tracked AI-writing signals from public research sources.
These are writing signals, not authorship determinations. Human writers may use the same phrases, and AI systems may produce outputs outside these patterns.
Version 1.0 · June 2026 · Last corpus audit: June 2026
Corpus summary — Version 1.0 · June 2026
| Metric | Count |
|---|---|
| Catalogued AI-writing signal entries | 7,400+ |
| Unique signal entries | 4,500+ |
| Structural cadence detectors | 12 |
| Hook patterns | 17 |
| Transition / filler entries | 400+ |
| Pattern surface forms | 700+ |
| Replacement pairs | 287 |
| Count layer | Entries |
|---|---|
| Static ESM corpus raw entries | 6,246 |
| Runtime persona-specific bans | +61 |
| Finite regex surface-form expansion | +703 |
| Reviewed source-backed research entries | +612 |
| Final audited corpus count | 7,622 |
| Public label | 7,400+ |
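The audited total can be reproduced by adding the layers in the table. A minimal check, using only the figures listed above (the dictionary keys are just the row labels):

```python
# Count layers copied from the table above.
layers = {
    "Static ESM corpus raw entries": 6246,
    "Runtime persona-specific bans": 61,
    "Finite regex surface-form expansion": 703,
    "Reviewed source-backed research entries": 612,
}

# The final audited corpus count is the sum of all layers.
audited_total = sum(layers.values())
print(audited_total)  # 7622
```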
What are AI writing patterns?
AI writing patterns are recurring phrases, sentence structures, and stylistic habits produced consistently by large language models. These patterns emerge because AI systems trained on large corpora converge toward consistent defaults — vocabulary that works for any topic, transitions that perform logical connection without providing it, and structures that score well on coherence metrics.
The patterns fall into six measurable categories — vocabulary markers, phrase-level clichés, cadence templates, structural patterns, hook formulas, and replacement pairs — and are detectable because they appear at elevated frequency in AI-generated text relative to natural human writing in equivalent contexts.
Understanding these patterns supports AI-writing screening, avoidance systems, and voice calibration — not authorship classification.
Definitions
Vocabulary Patterns · 4,500+ entries
Words and phrases that appear at elevated frequency in AI-generated text relative to human writing. Several entries — including delve, showcasing, and underscores — are independently validated by Kobak et al. (2025) with frequency ratios up to 28× pre-LLM baselines.
| Phrase / Word | Model | Pattern Type | Frequency |
|---|---|---|---|
| at its core | All models | Framing phrase | Very High |
| when it comes to | All models | Transition filler | Very High |
| let's unpack | ChatGPT | Hook phrase | Very High |
| needless to say | All models | Filler phrase | High |
| delve | ChatGPT / Claude | Vocabulary cliché | High |
| tapestry | Claude | Vocabulary cliché | High |
| move the needle | ChatGPT | Corporate cliché | High |
| paradigm shift | All models | Corporate cliché | High |
| double down on | ChatGPT | Idiom cliché | High |
| speaks volumes | All models | Idiomatic filler | High |
Showing a subset. Full dataset: 7,400+ catalogued AI-writing signal entries.
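As an illustration of how vocabulary entries like these could be counted in a text, here is a minimal sketch. It assumes plain case-insensitive, whole-phrase matching over the ten entries shown in the table; the function name `count_vocab_signals` is hypothetical, and the production matching logic is not published.

```python
import re
from collections import Counter

# The ten entries from the table above; the full corpus is not public.
VOCAB_SIGNALS = [
    "at its core", "when it comes to", "let's unpack", "needless to say",
    "delve", "tapestry", "move the needle", "paradigm shift",
    "double down on", "speaks volumes",
]

def count_vocab_signals(text: str) -> Counter:
    """Count whole-phrase, case-insensitive hits for each listed entry."""
    # Lowercase and normalise curly apostrophes so "Let’s" matches "let's".
    normalized = text.lower().replace("\u2019", "'")
    hits = Counter()
    for phrase in VOCAB_SIGNALS:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        found = len(re.findall(pattern, normalized))
        if found:
            hits[phrase] = found
    return hits

sample = "At its core, this is a paradigm shift. Let's unpack why, and delve into the details."
print(count_vocab_signals(sample))
```

Because the sketch matches on word boundaries, inflected forms such as "delves" are not counted; covering them would need extra entries or a stemming step. A hit count on its own says nothing about authorship.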
Cadence Structures · 12 detectors
Sentence-level rhythm patterns that form the structural backbone of AI-generated text. Each is identified by its repeating shape, not just its words. The corpus contains 12 named structural cadence detectors.
Rhetorical Contrast
Model: All models · Frequency: Very High
Structure
- Negative framing of X
- Pivot word (but / however / it's not just)
- Positive reframe of X as Y
Example
"It's not just about getting more done. It's about doing the right things."
Motivational Cadence
Model: ChatGPT / Open-source LLMs · Frequency: High
Structure
- Short declarative claim
- Brief expansion or evidence
- Imperative or payoff statement
Example
"Most people wait for permission. You don't need it. The choice is yours."
Generic Opener
Model: All models · Frequency: Very High
Structure
- Temporal or world-state frame
- Present-tense generalization
- Transition to main claim
Example
"In today's fast-paced landscape, staying ahead requires more than effort."
Aphorism Pattern
Model: Claude / ChatGPT · Frequency: High
Structure
- Abstract noun or concept
- Simple declarative predicate
- Optional contrasting clause
Example
"Clarity is speed. Less is more. The simplest version often wins."
Hedge-Assertion Pair
Model: All models · Frequency: High
Structure
- Qualifying hedge
- Assertive claim that follows
Example
"While individual cases vary, the evidence suggests the pattern is consistent."
Resolution Closer
Model: All models · Frequency: Very High
Structure
- Brief acknowledgment of tension
- Forward-looking synthesis
- Clean, earned-feeling ending
Example
"The path forward is clear. The companies that adapt will be the ones that lead."
Hook Patterns · 17 total
Predictable first-line constructions that AI models default to when opening posts or paragraphs. All 17 named hook patterns are verified through direct analysis of AI-generated LinkedIn and professional writing content.
| Pattern Name | Model | Example |
|---|---|---|
| Temporal landscape opener | All models | "In today's landscape…" |
| World-state opener | All models | "In a world where…" |
| Curiosity hook | ChatGPT / Gemini | "Have you ever wondered…" |
| Candor opener | ChatGPT | "Let's be honest…" |
| Reveal setup | ChatGPT | "Here's the thing…" |
Transition & Filler Phrases · 400+ entries
Connective phrases that appear between ideas without adding meaning. They are among the most reliable signal indicators: AI-generated prose uses them at elevated rates because they create the appearance of logical flow without requiring actual logical connection.
| Phrase | Model | Type |
|---|---|---|
| at the end of the day | All models | Summary filler |
| in other words | All models | Restatement bridge |
| ultimately | All models | Resolution filler |
| on the other hand | All models | Contrast bridge |
| needless to say | All models | Filler affirmation |
| the reality is | ChatGPT / Claude | Reframe opener |
| less is more | Claude / ChatGPT | Aphoristic filler |
| in today's world | All models | Temporal filler |
Showing a subset. Full transition/filler dataset: 400+ entries.
Replacement Pairs · 287 pairs
Structured mappings from AI-coded vocabulary and phrases to more direct, concrete human-language alternatives. The most practically actionable category in the corpus.
| AI-coded expression | Human alternative |
|---|---|
| utilize | use |
| facilitate | help, allow, make possible |
| leverage (verb) | use, apply, draw on |
| navigate the complexities | handle, deal with, work through |
| delve into | look at, examine, explore |
| unpack | explain, break down, examine |
| unlock the potential | enable, release, open up |
| empower | let, allow, enable |
| transformative | major, significant, meaningful |
| seamlessly | smoothly, without friction |
| robust | strong, reliable, consistent |
| holistic | complete, whole, full-scope |
| actionable insights | practical steps, specific recommendations |
| game-changer | major development, significant shift |
| moving forward | from now on, next |
| at its core | fundamentally, essentially |
| in today's digital age | [delete — begin with the actual claim] |
| it is important to note | [delete — state the information directly] |
| in conclusion | [delete — end on a fact or insight] |
| due to the fact that | because |
| in order to | to |
| serves as a | is a |
Showing a representative selection. Full corpus: 287 verified replacement pairs.
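A replacement-pair table of this kind can be applied mechanically. The sketch below is one possible approach, not Bloomberry's implementation: it uses four pairs from the table, arbitrarily takes the first listed alternative where there are several, and skips the "[delete]" entries, which need sentence-level rewriting rather than substitution. The name `apply_replacements` is hypothetical.

```python
import re

# Subset of the pairs in the table above. Where the table lists several
# alternatives, this sketch arbitrarily takes the first one.
REPLACEMENT_PAIRS = {
    "due to the fact that": "because",
    "in order to": "to",
    "delve into": "look at",
    "utilize": "use",
}

# Longest phrases first so multi-word entries win over shorter overlaps.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(REPLACEMENT_PAIRS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def apply_replacements(text: str) -> str:
    """Swap each listed expression for its plainer alternative."""
    def swap(match):
        original = match.group(0)
        replacement = REPLACEMENT_PAIRS[original.lower()]
        # Keep sentence-initial capitalisation.
        return replacement.capitalize() if original[0].isupper() else replacement
    return _PATTERN.sub(swap, text)

print(apply_replacements("In order to grow, we utilize data due to the fact that it works."))
# To grow, we use data because it works.
```

Word-boundary matching means inflected forms such as "utilizes" pass through unchanged. A substitution pass like this only touches surface vocabulary; it leaves cadence and structure alone.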
Key findings
What the corpus reveals about AI-writing patterns
AI writing style is compound, not lexical
No single word or phrase is a reliable indicator of AI-generated writing. The presence of "pivotal" alone indicates nothing. The pattern of co-occurrence — multiple transition fillers, symmetrical tricolon, vague authority attribution, and a resolution closer in the same 200-word piece — begins to constitute a fingerprint. AI Sentence DNA is the accumulation.
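The stacking idea can be sketched as a count of distinct signal categories rather than a count of individual hits. The lists below are tiny illustrative samples drawn from examples in this report, and the threshold of three categories is an assumption made for the sketch, not a published Bloomberry parameter.

```python
import re

# Tiny illustrative lists; the real corpus is far larger and not public.
SIGNALS = {
    "vocabulary": ["pivotal", "delve", "landscape"],
    "transition": ["furthermore", "moreover", "that being said", "ultimately"],
    "hook": ["in today's", "in a world where", "here's the thing"],
    "closer": ["the path forward is clear", "in conclusion"],
}

def stacked_categories(text: str) -> set:
    """Return the signal categories that have at least one hit in the text."""
    normalized = text.lower().replace("\u2019", "'")
    return {
        category
        for category, phrases in SIGNALS.items()
        if any(re.search(r"\b" + re.escape(p) + r"\b", normalized) for p in phrases)
    }

def is_stacked(text: str, min_categories: int = 3) -> bool:
    """True when hits span at least `min_categories` distinct categories (assumed threshold)."""
    return len(stacked_categories(text)) >= min_categories

single = "This was a pivotal meeting with the supplier on 3 March."
stacked = ("In today's landscape, change is constant. Furthermore, leaders must adapt. "
           "The path forward is clear.")

print(stacked_categories(single), is_stacked(single))
print(sorted(stacked_categories(stacked)), is_stacked(stacked))
```

The first sample triggers one category and is not flagged; the second triggers four. Even a flagged text is a writing signal, not an authorship determination.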
AI writing systematically prefers abstraction over specificity
Across all signal categories, AI-generated writing defaults to abstract, reusable language where human writing would use specific, contextual language. Vocabulary like landscape, realm, journey, framework, ecosystem, and transformation could appear in writing about any topic. Concrete human writing is reusable across fewer contexts precisely because it contains specific facts, named people, dates, and observations.
Transition and filler phrases are among the highest-signal indicators
The corpus contains 400+ transition and filler phrase entries. AI-generated prose uses them at elevated rates because they create the appearance of logical flow without requiring the writer to construct actual logical connection. High-signal entries include: furthermore, moreover, additionally (as opener), it is important to note, that being said, in conclusion, ultimately.
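Since several of these entries matter mainly as sentence openers, one simple measure is the share of sentences that begin with a listed transition. The sketch below assumes a naive sentence splitter and uses only six of the entries named above; `transition_opener_rate` is a hypothetical name.

```python
import re

# Entries named in the paragraph above, checked only at sentence start.
TRANSITION_OPENERS = (
    "furthermore", "moreover", "additionally",
    "that being said", "in conclusion", "ultimately",
)

def transition_opener_rate(text: str) -> float:
    """Fraction of sentences that open with a listed transition."""
    # Naive split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return 0.0
    opened = sum(1 for s in sentences if s.lower().startswith(TRANSITION_OPENERS))
    return opened / len(sentences)

text = "Furthermore, costs rose. We cut two vendors. Ultimately, margins held. Sales grew 4%."
print(transition_opener_rate(text))  # 0.5
```

The splitter will mis-handle abbreviations and decimals, so a real pipeline would likely use a proper sentence tokenizer.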
Structural symmetry is a reliable AI cadence signal
Multiple cadence templates share a structural characteristic: symmetry. AI-generated prose applies balanced, parallel structure at rates that exceed natural human writing — at the word, phrase, sentence, paragraph, and document levels. Natural human writing produces more asymmetric structures: abrupt endings, unresolved tensions, arguments that concede more than they recover.
Social writing has distinct AI cadence patterns
AI-generated LinkedIn and professional social writing exhibits a distinct pattern that differs from other formats: stacked short lines for false emphasis, the "this is not about X, it is about Y" reframing formula, framework closers that gesture at principles without specifics, and "steal this" phrasing that signals generosity through formula rather than content.
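The "this is not about X, it is about Y" formula is regular enough to approximate with a single regular expression. The pattern below is a hypothetical sketch written for this illustration, not one of the corpus detectors, and it will also match ordinary human contrasts.

```python
import re

# Hypothetical pattern: "it's / it is / this is not (just|only|merely) (about) X"
# followed by "(but) it's / it is / this is (about) Y" after a comma or sentence break.
RHETORICAL_CONTRAST = re.compile(
    r"\b(?:it|this)(?:'s| is)\s+not\s+(?:just\s+|only\s+|merely\s+)?(?:about\s+)?"
    r"[^.!?]+[.!?,;]\s*"
    r"(?:but\s+)?(?:it|this)(?:'s| is)\s+(?:about\s+)?[^.!?]+",
    re.IGNORECASE,
)

def has_rhetorical_contrast(text: str) -> bool:
    """True when the negative-frame, pivot, positive-reframe shape appears."""
    # Normalise curly apostrophes so "It’s" is treated like "It's".
    return RHETORICAL_CONTRAST.search(text.replace("\u2019", "'")) is not None

print(has_rhetorical_contrast(
    "It's not just about getting more done. It's about doing the right things."
))  # True
print(has_rhetorical_contrast("We shipped the fix on Tuesday."))  # False
```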
External academic research independently validates core vocabulary signals
Kobak et al. (2025), analyzing 15 million+ PubMed abstracts, documented statistically significant post-ChatGPT frequency increases for words that also appear in Bloomberry's independently assembled production corpus, including delve (r = 28), showcasing (r = 10.2), and underscores (r = 10.9), where r is the frequency ratio against the pre-LLM baseline. The alignment between independently derived lists constitutes cross-validation of the vocabulary signal layer.
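For readers unfamiliar with the metric, a frequency ratio of this kind is simply observed frequency divided by the frequency a baseline would predict. The sketch below assumes that reading; the numbers are made up for illustration and are not taken from Kobak et al. (2025).

```python
def excess_ratio(observed: float, expected: float) -> float:
    """Ratio of observed word frequency to the expected (baseline) frequency."""
    if expected <= 0:
        raise ValueError("expected frequency must be positive")
    return observed / expected

# Made-up frequencies, chosen only so the ratio comes out near 28.
observed_freq = 28 / 100_000   # share of abstracts containing the word
expected_freq = 1 / 100_000    # share the pre-LLM trend would predict

ratio = excess_ratio(observed_freq, expected_freq)
print(round(ratio, 1))  # 28.0
```

A large ratio can come from a very rare baseline word, so ratios are most informative when read alongside the absolute frequencies.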
Pattern reduction requires voice modeling, not list application
Applying a banned-word list to AI-generated output reduces surface vocabulary signal but does not address cadence, structure, specificity, or voice. Effective reduction requires voice calibration from the specific writer's actual patterns, structural interruption of default cadences, specificity injection, transition reduction, and resistance to the clean "resolution closer" that AI systems default toward.
Limitations
What this corpus does not claim
These are writing signals, not authorship determinations. Human writers may use the same phrases, and AI systems may produce outputs outside these patterns.
No single entry proves AI authorship
The presence of any corpus entry in a text does not constitute evidence that the text was written by AI. Every word, phrase, and cadence structure in the corpus also appears in human writing. The corpus's diagnostic utility comes from co-occurrence patterns and signal density, not from the presence of any individual entry.
Human writers may exhibit high signal density
Academic writers, corporate communications professionals, and those trained in formal business writing may produce text that scores high on signal metrics without any AI involvement. The corpus was developed for social and professional writing contexts and should not be applied in academic integrity contexts.
Research-tracked entries are not production-validated
The 612 research-tracked entries added in the June 2026 external expansion are staged candidates, not production-validated signals. They were included based on source backing, not through Bloomberry's own empirical validation. Their elevation to production enforcement requires additional review.
Regex counts differ from unique phrase counts
Finite regex surface-form expansion produces entry counts that exceed unique root phrase counts. A corpus count of 7,400+ cannot be read as "7,400 distinct phrases." It must be read as "7,400 catalogued signal entries" — a technically precise but meaningfully different quantity.
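To see why entry counts exceed root phrase counts, consider a single pattern with a few optional or alternative slots. The template below is hypothetical (the corpus regexes are not published), but it shows how one root phrase can expand into many catalogued surface forms.

```python
from itertools import product

# Hypothetical slot template: each inner list holds the finite alternatives
# for one slot; an empty string means the slot may be omitted.
TEMPLATE = [
    ["in today's"],
    ["", "fast-paced", "digital", "ever-changing"],
    ["world", "landscape", "age"],
]

def expand(template):
    """Enumerate every surface form the template can produce."""
    return [
        " ".join(part for part in combo if part)
        for combo in product(*template)
    ]

forms = expand(TEMPLATE)
print(len(forms))   # 12
print(forms[:3])
```

Here one root phrase yields 1 × 4 × 3 = 12 entries, which is the sense in which a catalogued-entry count differs from a distinct-phrase count.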
Model behavior changes over time
AI-writing patterns change as models are updated. Kobak et al. (2025) documents that some words spiked sharply after ChatGPT's release and then began declining. The corpus reflects patterns observed through June 2026; some signals may become less prevalent as model defaults shift.
Domain specificity applies
Some signals are more domain-specific than their corpus inclusion acknowledges. Words flagged as AI-style in social writing — stakeholders, deliverables, leverage — are standard vocabulary in certain professional domains. Signal review must be calibrated to context.
Download the full AI Sentence DNA research report
The complete technical report includes the full taxonomy, methodology, source tier classifications, count breakdown tables, all appendices, and the cite-as block.
A research-visible machine-readable subset is available for research and enterprise evaluation. The full internal enforcement corpus is not published.
Explore by model
For a deeper explanation of how these patterns form across models — and the theory behind AI Dialects — see our research on AI writing dialects. This page is data. That page is the framework.
References
Kobak, D., González-Márquez, R., Horvát, E.-Á., & Lause, J. (2025). Delving into LLM-assisted writing in biomedical publications through excess vocabulary. Science Advances, 11. https://doi.org/10.1126/sciadv.adt3813
Liang, W., et al. (2025). Quantifying large language model usage in scientific papers. Nature Human Behaviour. https://doi.org/10.1038/s41562-025-02273-8
Wikipedia contributors. (2026, April). Signs of AI writing. Wikipedia. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
Welch, J. (2025, March; updated May 2025). A list of words that AI over-uses. Embryo. https://embryo.com/blog/list-words-ai-overuses/
Antony, A. (2026, March). 300+ ChatGPT words to avoid (overused AI words list 2026). alstonantony.com. https://alstonantony.com/seo-strategy/chatgpt-overused/
Slopwash. (2026). Anti-slop writing ruleset for LLMs. slopwash.com. https://www.slopwash.com
Merriam-Webster. (2025). Slop: Word of the Year 2025. Merriam-Webster. https://www.merriam-webster.com/wordplay/word-of-the-year
Cite this research
Bloomberry Research. AI Sentence DNA: A Corpus Study of Recurring AI-Writing Signals Across Vocabulary, Cadence, Structure, and Style. Version 1.0. June 2026. bloomberry.ai/research/ai-writing-patterns
APA: Bloomberry Research. (2026, June). AI Sentence DNA: A corpus study of recurring AI-writing signals across vocabulary, cadence, structure, and style (Version 1.0). Bloomberry AI. https://bloomberry.ai/research/ai-writing-patterns
Version 1.0 · June 2026 · This corpus is updated as new AI writing patterns emerge. Last audit: June 2026.
Bloomberry detects these patterns in real time and helps rewrite content to sound human.
Every Bloomberry generation runs the live dataset as a filter. Flagged patterns are rewritten against your calibrated voice — not replaced with different clichés.
License: CC BY 4.0. Dataset: 7,400+ catalogued AI-writing signal entries. Last audited: June 2026. Related: AI Dialects study.
Why does AI writing sound the same — even when it's supposed to sound like you?
Bloomberry built its Voice Memory Layer specifically to avoid these patterns. Instead of generating from model defaults, it generates from a persistent memory of how each specific person writes — vocabulary, cadence, sentence structure, and tone.