Bloomberry Research · Technical Report · Version 1.0

AI Sentence DNA: A Corpus Study of Recurring AI-Writing Signals Across Vocabulary, Cadence, Structure, and Style

A Bloomberry Research Technical Report  ·  Version 1.0  ·  June 2026  ·  Sadok Hasan

Cite as

Bloomberry Research. AI Sentence DNA: A Corpus Study of Recurring AI-Writing Signals Across Vocabulary, Cadence, Structure, and Style. Version 1.0. June 2026. bloomberry.ai/research/ai-writing-patterns

APA: Bloomberry Research. (2026, June). AI Sentence DNA: A corpus study of recurring AI-writing signals… (Version 1.0). Bloomberry AI.

Abstract

Generative AI systems increasingly produce writing that is fluent, coherent, and semantically plausible. Yet across large volumes of AI-assisted writing, recurring linguistic signals emerge: vocabulary that clusters around abstraction, transitions that perform logic without providing evidence, sentence structures that default toward symmetry, hooks that follow predictable formulas, and conclusions that over-resolve the complexity they were tasked with addressing.

This report introduces Bloomberry's AI Sentence DNA corpus: a catalogue of 7,400+ AI-writing signal entries assembled from production enforcement lists, prompt-layer guardrails, regex and cadence detectors, hard-banned reply patterns, replacement rules, finite surface-form variants, and source-backed research-tracked signals drawn from external academic, editorial, and community sources. The corpus is designed to catalogue recurring AI-writing patterns, not to determine authorship.

Our central finding is that AI writing does not become recognizable because of any single word or phrase. It becomes recognizable when signals stack: elevated generic vocabulary, smooth but low-information transitions, symmetrical rhetorical forms, cadence uniformity, predictable opening and closing structures, and low-specificity conclusions. We term this compound pattern layer AI Sentence DNA.

External academic research supports the core vocabulary findings. Kobak et al. (2025), analyzing excess vocabulary in over 15 million PubMed abstracts, documented statistically significant post-ChatGPT increases in words including delve, showcasing, underscores, and pivotal, with frequency ratios reaching 28× baseline.

The full corpus is maintained as a structured internal research dataset. Representative examples are published publicly. A research-visible machine-readable subset is available for research and enterprise evaluation.

Counts represent catalogued AI-writing signal entries, not unique phrases or authorship determinations.

About this corpus

Counts include production enforcement entries, prompt guardrails, regex/pattern detectors, structural cadence detectors, hard-banned reply patterns, replacement pairs, finite regex surface-form variants, and source-backed research-tracked AI-writing signals from public research sources.

These are writing signals, not authorship determinations. Human writers may use the same phrases, and AI systems may produce outputs outside these patterns.

Version 1.0 · June 2026 · Last corpus audit: June 2026

Corpus summary — Version 1.0 · June 2026

  • Catalogued AI-writing signal entries: 7,400+
  • Unique signal entries: 4,500+
  • Structural cadence detectors: 12
  • Hook patterns: 17
  • Transition / filler entries: 400+
  • Pattern surface forms: 700+
  • Replacement pairs: 287

| Count layer | Entries |
| --- | --- |
| Static ESM corpus raw entries | 6,246 |
| Runtime persona-specific bans | +61 |
| Finite regex surface-form expansion | +703 |
| Reviewed source-backed research entries | +612 |
| Final audited corpus count | 7,622 |
| Public label | 7,400+ |
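
The count layers listed above can be checked with simple arithmetic. The sketch below only re-adds the published figures; the dictionary and function names are our own illustration, not part of Bloomberry's tooling.

```python
# Count layers as published in the Version 1.0 corpus summary.
COUNT_LAYERS = {
    "Static ESM corpus raw entries": 6246,
    "Runtime persona-specific bans": 61,
    "Finite regex surface-form expansion": 703,
    "Reviewed source-backed research entries": 612,
}
PUBLIC_FLOOR = 7400  # the "7,400+" public label


def audited_total(layers):
    """Sum every count layer into the audited corpus total."""
    return sum(layers.values())


total = audited_total(COUNT_LAYERS)
print(total)                  # 7622
print(total >= PUBLIC_FLOOR)  # True
```

The sum matches the audited count of 7,622, and the public label sits below it as a floor.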

7,400+ catalogued AI-writing signal entries across vocabulary, cadence, structure, and style.

Examples are representative of patterns observed across AI-generated content and Bloomberry's repo-verified and source-backed research corpus. These are writing signals, not authorship determinations.

Definitions

AI-writing signal: A word, phrase, cadence structure, rhetorical pattern, or organizational template that appears at elevated frequency in AI-generated or AI-assisted writing relative to natural human writing in equivalent contexts.

Signal entry: One catalogued item in the corpus. Includes vocabulary markers, phrase-level entries, regex pattern detectors, cadence detectors, hook pattern templates, replacement pairs, surface-form variants, and research-tracked candidates.

AI Sentence DNA: The compound stylistic fingerprint produced when multiple AI-writing signals co-occur across sentence structure, vocabulary, rhythm, transitions, and rhetorical framing within a single text or passage.

Production enforcement signal: An entry actively checked, flagged, or screened in Bloomberry's content generation and output screening systems. Directly affects what the system generates or allows through.

Vocabulary Patterns  ·  4,500+ entries

Words and phrases that appear at elevated frequency in AI-generated text relative to human writing. Several entries — including delve, showcasing, and underscores — are independently validated by Kobak et al. (2025) with frequency ratios up to 28× pre-LLM baselines.

| Phrase / Word | Model | Pattern Type | Frequency |
| --- | --- | --- | --- |
| at its core | All models | Framing phrase | Very High |
| when it comes to | All models | Transition filler | Very High |
| let's unpack | ChatGPT | Hook phrase | Very High |
| needless to say | All models | Filler phrase | High |
| delve | ChatGPT / Claude | Vocabulary cliché | High |
| tapestry | Claude | Vocabulary cliché | High |
| move the needle | ChatGPT | Corporate cliché | High |
| paradigm shift | All models | Corporate cliché | High |
| double down on | ChatGPT | Idiom cliché | High |
| speaks volumes | All models | Idiomatic filler | High |

Showing a subset. Full dataset: 7,400+ catalogued AI-writing signal entries.

Cadence Structures  ·  12 detectors

Sentence-level rhythm patterns that form the structural backbone of AI-generated text. Each is identified by its repeating shape, not just its words. The corpus contains 12 named structural cadence detectors.

Rhetorical Contrast

Model: All models  ·  Frequency: Very High

Structure

  • Negative framing of X
  • Pivot word (but / however / it's not just)
  • Positive reframe of X as Y

Example

"It's not just about getting more done. It's about doing the right things."

Motivational Cadence

Model: ChatGPT / Open-source LLMs  ·  Frequency: High

Structure

  • Short declarative claim
  • Brief expansion or evidence
  • Imperative or payoff statement

Example

"Most people wait for permission. You don't need it. The choice is yours."

Generic Opener

Model: All models  ·  Frequency: Very High

Structure

  • Temporal or world-state frame
  • Present-tense generalization
  • Transition to main claim

Example

"In today's fast-paced landscape, staying ahead requires more than effort."

Aphorism Pattern

Model: Claude / ChatGPT  ·  Frequency: High

Structure

  • Abstract noun or concept
  • Simple declarative predicate
  • Optional contrasting clause

Example

"Clarity is speed. Less is more. The simplest version often wins."

Hedge-Assertion Pair

Model: All models  ·  Frequency: High

Structure

  • Qualifying hedge
  • Assertive claim that follows

Example

"While individual cases vary, the evidence suggests the pattern is consistent."

Resolution Closer

Model: All models  ·  Frequency: Very High

Structure

  • Brief acknowledgment of tension
  • Forward-looking synthesis
  • Clean, earned-feeling ending

Example

"The path forward is clear. The companies that adapt will be the ones that lead."

Hook Patterns  ·  17 total

Predictable first-line constructions that AI models default to when opening posts or paragraphs. All 17 named hook patterns are verified through direct analysis of AI-generated LinkedIn and professional writing content.

| Pattern Name | Model | Example |
| --- | --- | --- |
| Temporal landscape opener | All models | "In today's landscape…" |
| World-state opener | All models | "In a world where…" |
| Curiosity hook | ChatGPT / Gemini | "Have you ever wondered…" |
| Candor opener | ChatGPT | "Let's be honest…" |
| Reveal setup | ChatGPT | "Here's the thing…" |
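
Because hook patterns are first-line constructions, a detector only needs to inspect the opening line. The following is a minimal sketch covering the five hooks listed above; the regexes and the `match_hook` helper are illustrative assumptions, not Bloomberry's production patterns.

```python
import re
from typing import Optional

# Pattern names come from the hook table; the regexes are hypothetical.
HOOK_PATTERNS = {
    "Temporal landscape opener": r"^in today's\b.*\blandscape\b",
    "World-state opener": r"^in a world where\b",
    "Curiosity hook": r"^have you ever wondered\b",
    "Candor opener": r"^let's be honest\b",
    "Reveal setup": r"^here's the thing\b",
}


def match_hook(text: str) -> Optional[str]:
    """Return the name of the hook pattern found on the first non-empty line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    # Normalize curly apostrophes and case before matching.
    first = lines[0].replace("\u2019", "'").lower()
    for name, pattern in HOOK_PATTERNS.items():
        if re.search(pattern, first):
            return name
    return None


print(match_hook("In today's fast-paced landscape, staying ahead requires more than effort."))
# Temporal landscape opener
print(match_hook("We closed the quarter 4% up."))
# None
```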

Transition & Filler Phrases  ·  400+ entries

Connective phrases that appear between ideas without adding meaning. They are among the most reliable signal indicators: AI-generated prose uses them at elevated rates because they create the appearance of logical flow without requiring actual logical connection.

| Phrase | Model | Type |
| --- | --- | --- |
| at the end of the day | All models | Summary filler |
| in other words | All models | Restatement bridge |
| ultimately | All models | Resolution filler |
| on the other hand | All models | Contrast bridge |
| needless to say | All models | Filler affirmation |
| the reality is | ChatGPT / Claude | Reframe opener |
| less is more | Claude / ChatGPT | Aphoristic filler |
| in today's world | All models | Temporal filler |

Showing a subset. Full transition/filler dataset: 400+ entries.
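
"Elevated rates" can be made concrete as filler hits per 100 words. The sketch below uses only the eight phrases shown above; the `filler_rate` function and the per-100-words normalization are our assumptions, and the report does not publish a threshold.

```python
import re

# The eight sample phrases from the table above, not the full 400+ list.
FILLERS = [
    "at the end of the day",
    "in other words",
    "ultimately",
    "on the other hand",
    "needless to say",
    "the reality is",
    "less is more",
    "in today's world",
]


def filler_rate(text: str, fillers=FILLERS) -> float:
    """Return filler-phrase occurrences per 100 words of text."""
    normalized = text.replace("\u2019", "'").lower()
    words = re.findall(r"[a-z0-9']+", normalized)
    if not words:
        return 0.0
    hits = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", normalized))
        for phrase in fillers
    )
    return 100.0 * hits / len(words)


sample = "Ultimately, the plan failed. In other words, we ran out of money."
print(round(filler_rate(sample), 2))  # 16.67 (2 hits in 12 words)
```

A rate on its own says little; it becomes informative only when compared against a baseline for the same writer or format.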

Replacement Pairs  ·  287 pairs

Structured mappings from AI-coded vocabulary and phrases to more direct, concrete human-language alternatives. This is the most practically actionable category in the corpus.

| AI-coded expression | Human alternative |
| --- | --- |
| utilize | use |
| facilitate | help, allow, make possible |
| leverage (verb) | use, apply, draw on |
| navigate the complexities | handle, deal with, work through |
| delve into | look at, examine, explore |
| unpack | explain, break down, examine |
| unlock the potential | enable, release, open up |
| empower | let, allow, enable |
| transformative | major, significant, meaningful |
| seamlessly | smoothly, without friction |
| robust | strong, reliable, consistent |
| holistic | complete, whole, full-scope |
| actionable insights | practical steps, specific recommendations |
| game-changer | major development, significant shift |
| moving forward | from now on, next |
| at its core | fundamentally, essentially |
| in today's digital age | [delete: begin with the actual claim] |
| it is important to note | [delete: state the information directly] |
| in conclusion | [delete: end on a fact or insight] |
| due to the fact that | because |
| in order to | to |
| serves as a | is |

Showing a representative selection. Full corpus: 287 verified replacement pairs.
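
A replacement pair can be applied mechanically with a word-boundary match. The sketch below is a hypothetical illustration using four pairs from the selection above. It assumes the first listed alternative is always chosen, does not handle inflected forms such as "utilizes", and leaves the "[delete]" entries to an editor's judgment.

```python
import re

# Four pairs taken from the table; the first listed alternative is used.
REPLACEMENTS = {
    "due to the fact that": "because",
    "in order to": "to",
    "delve into": "look at",
    "utilize": "use",
}


def apply_replacements(text: str, pairs=REPLACEMENTS) -> str:
    """Swap each AI-coded expression for its plainer alternative."""
    # Longest phrases first so multi-word entries are handled before short ones.
    for phrase in sorted(pairs, key=len, reverse=True):
        replacement = pairs[phrase]

        def swap(match, repl=replacement):
            # Keep a leading capital when the original phrase had one.
            if match.group(0)[0].isupper():
                return repl[0].upper() + repl[1:]
            return repl

        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(swap, text)
    return text


print(apply_replacements("We utilize the cache in order to delve into logs."))
# We use the cache to look at logs.
```

As finding 7 notes, this kind of list application only reduces surface vocabulary signal; it does nothing for cadence or specificity.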

Key findings

What the corpus reveals about AI-writing patterns

1. AI writing style is compound, not lexical

No single word or phrase is a reliable indicator of AI-generated writing. The presence of "pivotal" alone indicates nothing. A fingerprint begins to form only through co-occurrence: multiple transition fillers, a symmetrical tricolon, vague authority attribution, and a resolution closer in the same 200-word piece. AI Sentence DNA is that accumulation.
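
One way to picture co-occurrence is to count how many distinct signal categories a passage touches rather than how many words it matches. The sketch below is a minimal illustration: the phrases come from examples in this report, but the category grouping, the regexes, and the threshold of three categories are assumptions, not Bloomberry's production configuration.

```python
import re
from typing import Dict, List, Set

# Illustrative patterns only; the real corpus is far larger.
SIGNALS: Dict[str, List[str]] = {
    "vocabulary": [r"\bpivotal\b", r"\bdelve\b", r"\btapestry\b"],
    "transition_filler": [r"\bultimately\b", r"\bfurthermore\b", r"\bmoreover\b"],
    "generic_opener": [r"\bin today's\b", r"\bin a world where\b"],
    "rhetorical_contrast": [r"\bnot just about\b"],
    "resolution_closer": [r"\bthe path forward is clear\b"],
}


def categories_hit(text: str) -> Set[str]:
    """Return the signal categories with at least one match in the text."""
    normalized = text.replace("\u2019", "'").lower()
    return {
        category
        for category, patterns in SIGNALS.items()
        if any(re.search(pattern, normalized) for pattern in patterns)
    }


def is_stacked(text: str, min_categories: int = 3) -> bool:
    """True when signals from several distinct categories co-occur."""
    return len(categories_hit(text)) >= min_categories


single = "The result was pivotal."
stacked = ("In today's fast-paced landscape, it's not just about speed. "
           "Ultimately, the path forward is clear.")
print(sorted(categories_hit(single)))  # ['vocabulary']
print(is_stacked(single))              # False
print(is_stacked(stacked))             # True
```

Even a stacked result is a writing signal, not an authorship determination.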

2. AI writing systematically prefers abstraction over specificity

Across all signal categories, AI-generated writing defaults to abstract, reusable language where human writing would use specific, contextual language. Vocabulary like landscape, realm, journey, framework, ecosystem, and transformation could appear in writing about any topic. Concrete human writing is reusable across fewer contexts precisely because it contains specific facts, named people, dates, and observations.

3. Transition and filler phrases are among the highest-signal indicators

The corpus contains 400+ transition and filler phrase entries. AI-generated prose uses them at elevated rates because they create the appearance of logical flow without requiring the writer to construct actual logical connection. High-signal entries include: furthermore, moreover, additionally (as opener), it is important to note, that being said, in conclusion, ultimately.

4. Structural symmetry is a reliable AI cadence signal

Multiple cadence templates share a structural characteristic: symmetry. AI-generated prose applies balanced, parallel structure at rates that exceed natural human writing — at the word, phrase, sentence, paragraph, and document levels. Natural human writing produces more asymmetric structures: abrupt endings, unresolved tensions, arguments that concede more than they recover.
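
A crude proxy for the balance described above is how little sentence length varies within a passage. The sketch below computes the coefficient of variation of sentence lengths. It assumes naive punctuation-based sentence splitting, and the metric is our own illustration rather than one of the corpus's 12 cadence detectors.

```python
import re
import statistics
from typing import List


def sentence_lengths(text: str) -> List[int]:
    """Split on sentence-ending punctuation and count words per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(sentence.split()) for sentence in sentences]


def length_variation(text: str) -> float:
    """Coefficient of variation of sentence lengths (0.0 means uniform)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "Clarity is speed. Less is more."
uneven = "No. We shipped it late because the vendor missed two deadlines."
print(length_variation(uniform))           # 0.0
print(round(length_variation(uneven), 2))  # 0.82
```

Lower values indicate more uniform rhythm. Short passages give unstable numbers, so any cut-off would need calibration on real text.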

5. Social writing has distinct AI cadence patterns

AI-generated LinkedIn and professional social writing exhibits a distinct pattern that differs from other formats: stacked short lines for false emphasis, the "this is not about X, it is about Y" reframing formula, framework closers that gesture at principles without specifics, and "steal this" phrasing that signals generosity through formula rather than content.
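
The "this is not about X, it is about Y" reframing formula has a regular enough shape to sketch as a regular expression. The pattern below is a hypothetical approximation written for this example; it is not the production Rhetorical Contrast detector and will miss many paraphrases.

```python
import re

# Hypothetical pattern: a negated frame, a clause boundary, then a
# positive reframe introduced by "it's" or "it is".
RHETORICAL_CONTRAST = re.compile(
    r"\b(?:it'?s|it is|this is)\s+not\s+(?:just\s+|only\s+)?(?:about\s+)?"
    r"[^.!?]+[.!?,;]\s*"
    r"(?:but\s+)?(?:it'?s|it is)\s+(?:about\s+)?[^.!?]+[.!?]",
    re.IGNORECASE,
)


def has_rhetorical_contrast(text: str) -> bool:
    """True when the text contains a 'not X, it's Y' style reframe."""
    # Normalize curly apostrophes so both apostrophe styles match.
    return bool(RHETORICAL_CONTRAST.search(text.replace("\u2019", "'")))


print(has_rhetorical_contrast(
    "It's not just about getting more done. It's about doing the right things."
))  # True
print(has_rhetorical_contrast("We shipped the release on Tuesday."))  # False
```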

6. External academic research independently validates core vocabulary signals

Kobak et al. (2025), analyzing 15 million+ PubMed abstracts, documented statistically significant post-ChatGPT frequency increases for words appearing in Bloomberry's independent production corpus — including delve (r=28×), showcasing (r=10.2×), and underscores (r=10.9×). The alignment between independently derived lists constitutes cross-validation of the vocabulary signal layer.

7. Pattern reduction requires voice modeling, not list application

Applying a banned-word list to AI-generated output reduces surface vocabulary signal but does not address cadence, structure, specificity, or voice. Effective reduction requires voice calibration from the specific writer's actual patterns, structural interruption of default cadences, specificity injection, transition reduction, and resistance to the clean "resolution closer" that AI systems default toward.

Limitations

What this corpus does not claim

These are writing signals, not authorship determinations. Human writers may use the same phrases, and AI systems may produce outputs outside these patterns.

No single entry proves AI authorship

The presence of any corpus entry in a text does not constitute evidence that the text was written by AI. Every word, phrase, and cadence structure in the corpus also appears in human writing. The corpus's diagnostic utility comes from co-occurrence patterns and signal density, not from the presence of any individual entry.

Human writers may exhibit high signal density

Academic writers, corporate communications professionals, and those trained in formal business writing may produce text that scores high on signal metrics without any AI involvement. The corpus was developed for social and professional writing contexts and should not be applied in academic integrity contexts.

Research-tracked entries are not production-validated

The 612 research-tracked entries added in the June 2026 external expansion are staged candidates, not production-validated signals. They were included based on source backing, not through Bloomberry's own empirical validation. Their elevation to production enforcement requires additional review.

Regex counts differ from unique phrase counts

Finite regex surface-form expansion produces entry counts that exceed unique root phrase counts. A corpus count of 7,400+ cannot be read as "7,400 distinct phrases." It must be read as "7,400 catalogued signal entries" — a technically precise but meaningfully different quantity.
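
The gap between entry counts and root phrase counts is easiest to see by expanding one pattern. The sketch below uses a hypothetical mini-syntax with parenthesized alternatives only (no nesting, no quantifiers); it is a simplified stand-in for finite regex expansion, not Bloomberry's actual expander.

```python
import itertools
import re
from typing import List


def expand_surface_forms(template: str) -> List[str]:
    """Expand '(a|b)' alternation groups in a template into all finite forms."""
    # re.split with one capture group alternates literal text (even indices)
    # with the contents of each alternation group (odd indices).
    parts = re.split(r"\(([^()]*)\)", template)
    choices = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


forms = expand_surface_forms("delv(e|es|ed|ing) into")
print(forms)
# ['delve into', 'delves into', 'delved into', 'delving into']
print(len(forms))  # 4 catalogued entries from 1 root phrase
```

Under this counting convention, one root phrase with four inflections contributes four signal entries.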

Model behavior changes over time

AI-writing patterns change as models are updated. Kobak et al. (2025) document that some words spiked sharply after ChatGPT's release and then began declining. The corpus reflects patterns observed through June 2026; some signals may become less prevalent as model defaults shift.

Domain specificity applies

Some signals are more domain-specific than their corpus inclusion acknowledges. Words flagged as AI-style in social writing — stakeholders, deliverables, leverage — are standard vocabulary in certain professional domains. Signal review must be calibrated to context.

A research-visible machine-readable subset is available for research and enterprise evaluation. The full internal enforcement corpus is not published.

Explore by model

For a deeper explanation of how these patterns form across models — and the theory behind AI Dialects — see our research on AI writing dialects. This page is data. That page is the framework.

Frequently asked questions

What does 7,400+ AI-writing signal entries mean?

The 7,400+ count refers to catalogued signal entries — not unique phrases. Signal entries include vocabulary markers, multi-word phrase clichés, regex-pattern detectors, structural cadence detectors, hook pattern formulas, replacement pairs, finite regex surface-form variants, and source-backed research-tracked signals. A regex pattern matching four inflected forms counts as four entries, not one. The precise audited count is 7,622; "7,400+" is the conservative public floor.

Is AI Sentence DNA a detection tool?

No. AI Sentence DNA is a pattern taxonomy, not an authorship detector. Recurring patterns can be identified when multiple signals co-occur, but these signals should not be used to make authorship determinations. Human writers may use these patterns, and AI systems can produce writing that avoids them. The corpus is designed to catalogue recurring AI-writing signals and to support screening and avoidance systems, not to classify text as human or AI.

What is the difference between a production enforcement signal and a research-tracked signal?

A production enforcement signal is actively checked, flagged, or screened in Bloomberry's content generation and output screening systems — it directly affects what the system generates or allows through. A research-tracked signal is catalogued in the research corpus based on external source backing but not yet integrated into production enforcement. Of the 7,622 total audited entries, 612 are research-tracked; the remainder are production-validated.

How does Bloomberry use this corpus?

Bloomberry uses the AI Sentence DNA corpus as the empirical foundation for its AI-writing screening and avoidance systems. The corpus informs vocabulary-level bans, cadence detectors, hard-banned reply patterns, prompt-layer guardrails, and replacement pairs. These systems work together to generate writing that avoids identifiable AI-style fingerprints and stays calibrated to individual user voice.

When was the corpus last updated?

The corpus was last audited in June 2026, resulting in the Version 1.0 count of 7,622 total entries. The June 2026 audit included a structured cross-reference of externally published AI-writing signal lists, adding 612 source-backed research-tracked entries. The corpus is versioned: significant additions or methodology changes will increment the version number.

References

Kobak, D., González-Márquez, R., Horvát, E.-Á., & Lause, J. (2025). Delving into LLM-assisted writing in biomedical publications through excess vocabulary. Science Advances, 11. https://doi.org/10.1126/sciadv.adt3813

Liang, W., et al. (2025). Quantifying large language model usage in scientific papers. Nature Human Behaviour. https://doi.org/10.1038/s41562-025-02273-8

Wikipedia contributors. (2026, April). Signs of AI writing. Wikipedia. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

Welch, J. (2025, March; updated May 2025). A list of words that AI over-uses. Embryo. https://embryo.com/blog/list-words-ai-overuses/

Antony, A. (2026, March). 300+ ChatGPT words to avoid (overused AI words list 2026). alstonantony.com. https://alstonantony.com/seo-strategy/chatgpt-overused/

Slopwash. (2026). Anti-slop writing ruleset for LLMs. slopwash.com. https://www.slopwash.com

Merriam-Webster. (2025). Slop: Word of the Year 2025. Merriam-Webster. https://www.merriam-webster.com/wordplay/word-of-the-year

Cite this research

Bloomberry Research. AI Sentence DNA: A Corpus Study of Recurring AI-Writing Signals Across Vocabulary, Cadence, Structure, and Style. Version 1.0. June 2026. bloomberry.ai/research/ai-writing-patterns

APA: Bloomberry Research. (2026, June). AI Sentence DNA: A corpus study of recurring AI-writing signals across vocabulary, cadence, structure, and style (Version 1.0). Bloomberry AI. https://bloomberry.ai/research/ai-writing-patterns

Version 1.0 · June 2026 · This corpus is updated as new AI writing patterns emerge. Last audit: June 2026.

License: CC BY 4.0. Dataset: 7,400+ catalogued AI-writing signal entries. Last audited: June 2026. Related: AI Dialects study.
