Research & Data

The Content Freshness Paradox: Why Older Content Gets Cited More by AI

Jun 14, 20259 min read

Counterintuitive finding from 3,000 citation events: content published 18–36 months ago is cited more frequently than content published in the last 6 months. The authority accumulation model explained.

When we analyzed 3,000 citation events across ChatGPT, Perplexity, and Gemini over Q3–Q4 2026, we expected to find a recency bias — AI models should prefer fresh content because it is more likely to be accurate. What we found was the opposite.

Content published between 18 and 36 months ago was cited 2.3 times more frequently than content published in the last 6 months, controlling for domain authority and content quality. Content in the 36–60 month range also outperformed recent content for non-time-sensitive queries.

The finding: citation age distribution

Content age at time of citationRelative citation frequencyBest query type
0–6 months0.7x (below average)News, product launches, current events
6–12 months0.9x (slightly below)Trend analysis, annual reports
12–18 months1.1x (slightly above)Evergreen how-to, strategy guides
18–36 months2.3x (highest)Fundamentals, frameworks, explainers
36–60 months1.6x (above average)Established best practices
60+ months0.8x (declining)Most topics, except historical reference

Sample and methodology

3,000 citation events were analyzed from 47 domains across 8 industries. Queries were selected from a standardized bank of 200 industry-agnostic informational and commercial queries. Citation age was measured as the content publication date relative to the query date.

Why this seems paradoxical

The intuition is that AI should prefer fresh content for accuracy. Recency does matter — but only for a specific category of query: time-sensitive queries where the answer changes over time (news, prices, current statistics, recent product updates). For this category, fresh content performs best.

The majority of informational queries, however, are not time-sensitive. "How do RAG pipelines work?" or "What is a B2B content strategy?" are answered the same way today as 24 months ago. For these queries, authority accumulation — not recency — is the dominant signal.

The authority accumulation model

Content accumulates authority signals over time through several mechanisms that together increase RAG retrieval probability:

  • Inbound link accumulation: More external pages reference the content over time, increasing its presence in training data
  • Repeated web crawl indexing: Content that has been crawled 10+ times is treated as more authoritative by RAG pipelines than freshly crawled content
  • Third-party citation: Older content gets mentioned, quoted, and linked from newer content, creating a citation trail the LLM can follow
  • Training data weighting: Content present in multiple training data snapshots carries higher base weights in the model's knowledge base

When freshness wins

Recency is the dominant citation signal in three specific situations:

  • Queries containing temporal cues: "in 2026", "latest", "current", "recent changes to"
  • Fast-moving topic areas: AI model releases, regulatory changes, platform policy updates
  • Perplexity specifically — it weights recency more heavily than ChatGPT or Gemini for most query types

The timestamp dependency

Fresh content only benefits from recency weighting if it has a proper ISO 8601 machine-readable timestamp in its HTML. Content without a time datetime element is not eligible for recency retrieval boosting.

Strategic implications for content teams

This finding fundamentally reframes the content refresh decision. The goal of a content refresh is not to make content look "new" to AI models — it is to maintain authority accumulation while updating factual accuracy. Rebuilding an old page from scratch destroys accumulated authority. Updating it preserves and extends it.

Content teams should prioritize evergreen pages that have accumulated 18+ months of inbound authority for schema improvements and structural optimization, while reserving new content creation for topic areas where no authoritative piece yet exists.

The optimal content age window

Based on the citation data, the optimal state for a non-time-sensitive page is: published 18–36 months ago, updated with new structural elements (Schema, tables, FAQ) in the last 6 months, with a preserved publication date and a visible "last updated" date in ISO 8601 format. This combination maximizes both authority accumulation and freshness eligibility.

Was this article helpful?
Back to all articles