PapersAdda runs a two-stage deterministic verification gate on every article before it reaches a Cloudflare Pages deploy. This page documents exactly what that gate does, what claims it catches, and how correction requests are handled. We publish this because a site cited in AI Overviews owes its readers a verifiable account of how its facts are checked.
Gate built: 2026-05-24. Last updated: 2026-05-24. Author: Aditya Sharma.
In May 2026, an internal content audit found three fabricated claims across live articles: an invented source citation ("BharatNQT 2026 disclosure"), a fabricated anonymous-recruiter quote, and a false exam-pattern change claim ("Programming-Logic raised to 60%"). All three appeared in pages with active AI Overview citations. They were stripped the same day.
The root cause was an AI-assisted drafting pipeline without a deterministic claim-level gate. The existing structural shield caught template-pattern fabrications but missed specific numeric claims that looked syntactically correct.
The verification gate described below closes that gap. It does not use an LLM to find fabrications produced by an LLM. It uses regex-based deterministic extraction followed by rule-based status assignment under invariants that cannot be weakened without a failing fixture test.
The claim extractor (scripts/claim-extractor.py) reads every article in content/*.md and identifies every falsifiable claim using deterministic regex patterns. A falsifiable claim is any concrete specific that can be checked against a primary source: a salary or CTC figure, a percentage, an exam date or cycle reference, an eligibility number, a cutoff, a fee, a deadline, a count, a superlative, or a "X raised to Y" change statement.
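To make the idea of deterministic extraction concrete, here is a minimal sketch of regex-based claim extraction. It is not the code in scripts/claim-extractor.py: the pattern set, the `Claim` type, and the `extract_claims` function are hypothetical names chosen for illustration, and only three of the claim types listed above are covered.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; the real extractor's
# patterns are not reproduced on this page.
PATTERNS = {
    "money": re.compile(
        r"(?:₹|Rs\.?\s?)\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:LPA|lakh|crore))?",
        re.IGNORECASE,
    ),
    "percentage": re.compile(r"\d+(?:\.\d+)?\s?%"),
    "change_claim": re.compile(
        r"\b(?:raised|increased|reduced|cut)\s+to\s+\d+(?:\.\d+)?\s?%?",
        re.IGNORECASE,
    ),
}


@dataclass(frozen=True)
class Claim:
    kind: str      # which pattern matched
    text: str      # the matched span
    line_no: int   # 1-based line number in the article


def extract_claims(markdown: str) -> list[Claim]:
    """Return every pattern match, line by line, in a fixed order."""
    claims = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                claims.append(Claim(kind, match.group(0).strip(), line_no))
    return claims
```

Because the same input always yields the same list of matches, a fixture test can pin the extractor's behaviour exactly, which is the property an LLM-based detector would not give us.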
Each extracted claim is assigned a risk class:
| Class | Claim types | Default action if unverified |
|---|---|---|
| HIGH | Salary / CTC figures, exam dates, eligibility numbers, cutoffs, fees, deadlines | Strip or soften if unverified and undisclosed. No exceptions. |
| MED | Percentages, counts ("4,400 roles"), change-claims ("raised to 60%") | Soften to non-asserted phrasing if unverified. |
| LOW | Superlatives, vague quantifiers | Flag for review; remove if not supportable. |
Practice-drill pages (profit-and-loss, percentages, DI) are exempt from the claim gate. Their numbers are pedagogical math, not real-world facts, and are owned by the structural shield.
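The risk table and the practice-drill exemption can be sketched as a plain lookup. The kind names in `RISK_BY_KIND`, the `is_practice_drill` flag, and the `risk_class` function are assumptions made for this example; how the real extractor tags drill pages is not described here.

```python
from typing import Optional

# Hypothetical claim-kind names mapped to the risk classes in the table above.
RISK_BY_KIND = {
    "money": "HIGH",
    "date": "HIGH",
    "eligibility": "HIGH",
    "cutoff": "HIGH",
    "fee": "HIGH",
    "deadline": "HIGH",
    "percentage": "MED",
    "count": "MED",
    "change_claim": "MED",
    "superlative": "LOW",
    "vague_quantifier": "LOW",
}


def risk_class(kind: str, is_practice_drill: bool = False) -> Optional[str]:
    """Return HIGH, MED, or LOW, or None when the page is exempt from the claim gate."""
    if is_practice_drill:
        # Drill pages carry pedagogical numbers, so the claim gate skips them.
        return None
    # An unknown kind raises KeyError rather than silently getting a default class.
    return RISK_BY_KIND[kind]
```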
The verifier (scripts/claim-verifier.py) assigns each extracted claim a status (VERIFIED, SECONDARY, BLOCKED, STRIP_OR_SOFTEN) and a recommended action, under the following hard invariants. These invariants are locked by a 15-fixture deterministic test suite. Any change to the verifier that breaks a fixture is a regression that blocks the commit.
A claim is VERIFIED only when the source URL returns HTTP 200 and its host is on the per-company official-source allowlist. Every primary source is listed in verification-sources.yaml alongside the company's career subdomain. A secondary source (coaching blog, aggregator, or another prep site) never upgrades a claim to VERIFIED.
When a primary source returns a 403, times out, or bot-blocks the request, the claim status stays BLOCKED. We do not promote an unconfirmed claim to VERIFIED on the basis of a failed fetch. This is the most common failure mode: careers.tcs.com and infosys.com both return 403 to automated user agents.
Glassdoor, AmbitionBox, Wikipedia, Quora, and other prep sites are secondary sources. If a candidate salary number traces back only to Glassdoor, the claim is INVESTIGATE (not VERIFIED), and we attribute it as "candidate-reported via Glassdoor" rather than asserting it as fact.
Any HIGH-risk claim (money, date, eligibility, cutoff, exam_pattern, fee, deadline) that is unverified and not explicitly disclosed defaults to the STRIP_OR_SOFTEN action. The gate rejects the content, not the editor.
A 200 response only counts as VERIFIED if the final effective host after all HTTP redirects is still on the allowlist. A lookalike domain or a dangling-subdomain redirect does not pass.
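The allowlist and redirect invariants reduce to one check on the final URL. The sketch below is an illustration under assumptions: `ALLOWLIST` is a hypothetical in-memory stand-in for scripts/verification-sources.yaml, `is_verified` is a hypothetical name, and hosts are compared by exact match, so a `www.` variant would need its own entry.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical stand-in for the hosts listed in scripts/verification-sources.yaml.
ALLOWLIST = frozenset({"careers.tcs.com", "infosys.com"})


def is_verified(http_status: Optional[int], final_url: str,
                allowlist: frozenset = ALLOWLIST) -> bool:
    """True only for a 200 whose post-redirect host is on the allowlist.

    http_status is None when the fetch timed out or got no response.
    final_url is the URL the HTTP client reports after following all redirects.
    """
    if http_status != 200:
        # 403, timeout, or bot-block: never promoted to VERIFIED.
        return False
    host = (urlsplit(final_url).hostname or "").lower()
    # Exact host match, so "careers.tcs.com.example.net" does not pass.
    return host in allowlist
```

Checking the final host rather than the requested one is what stops a lookalike domain or a dangling-subdomain redirect from passing as an official source.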
A disclaimer ("candidate reports", "Source: official page") rescues claims only within three lines of the disclaimer or its immediately-adjacent table block. A single boilerplate disclosure at the top of an article cannot launder specific numbers 800 words below it.
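A rough sketch of the three-line adjacency rule follows. The `DISCLAIMER` pattern and the `is_disclosed` function are hypothetical, raw lines (including blank ones) are counted as an assumption, and the adjacent-table-block case is left out for brevity.

```python
import re

# Hypothetical disclaimer pattern; the real verifier's phrase list is not shown here.
DISCLAIMER = re.compile(r"candidate[- ]report|source:\s*official", re.IGNORECASE)


def is_disclosed(lines: list, claim_index: int, window: int = 3) -> bool:
    """True if a disclaimer appears within `window` lines of the claim's line.

    claim_index is the 0-based index of the line holding the claim.
    """
    lo = max(0, claim_index - window)
    hi = min(len(lines), claim_index + window + 1)
    return any(DISCLAIMER.search(lines[i]) for i in range(lo, hi))
```

With this scoping, a disclosure on the first line of an article covers a claim three lines later but not one four or more lines away.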
Claim is VERIFIED against a current official primary source. Ship as-is.
Unverified but honestly disclosed with adjacency-scoped attribution. The acceptable resting state for most candidate-reported numbers.
MED or LOW-risk unverified asserted claim. Reword to non-asserted phrasing ("candidates report" rather than "the cutoff is").
HIGH-risk unverified asserted claim with no disclosure. The human must strip the specific or reframe it as candidate-reported before the article ships. This is the must-fix set.
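The four outcomes above can be read as a small decision function. This is a sketch, not the verifier itself: STRIP_OR_SOFTEN and SOFTEN are the names used on this page, while the `SHIP` and `KEEP_DISCLOSED` labels and the `recommended_action` function are hypothetical placeholders.

```python
def recommended_action(risk: str, verified: bool, disclosed: bool) -> str:
    """Map a claim's risk class and verification state to an action.

    risk is "HIGH", "MED", or "LOW"; disclosed means an adjacency-scoped
    attribution covers the claim.
    """
    if verified:
        return "SHIP"             # hypothetical label: ship as-is
    if disclosed:
        return "KEEP_DISCLOSED"   # hypothetical label: acceptable resting state
    if risk == "HIGH":
        return "STRIP_OR_SOFTEN"  # the must-fix set
    return "SOFTEN"               # MED or LOW: reword to non-asserted phrasing
```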
The pre-commit hook runs verification-gate-precommit.sh on any staged content/*.md file before the commit is written. The hook can be bypassed by setting VERIFY_GATE_BYPASS=1, but every bypass is audit-logged to the claim-health ledger.
Every push to master triggers the GitHub Actions workflow at .github/workflows/deploy-and-notify.yml. The verification gate runs on changed content files after the structural shield and before the Cloudflare Pages deploy step. If the gate produces any STRIP_OR_SOFTEN result on a FACT-class page, the deploy does not run. There is no bypass path on the CI side.
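The difference between the two enforcement points, a bypassable local hook and a non-bypassable CI step, can be sketched as one exit-code function. The `gate_exit_code` name and the `(page_class, action)` pair shape are hypothetical, and applying the FACT-class filter to the local hook as well is an assumption; the real hook also writes an audit entry on bypass, which this sketch omits.

```python
def gate_exit_code(results, in_ci: bool, env) -> int:
    """Return 0 to let the commit or deploy proceed, 1 to block it.

    results: iterable of (page_class, action) pairs from the verifier.
    env: mapping of environment variables, e.g. os.environ.
    """
    must_fix = [r for r in results if r == ("FACT", "STRIP_OR_SOFTEN")]
    if not must_fix:
        return 0
    # The bypass variable is honoured only outside CI.
    if not in_ci and env.get("VERIFY_GATE_BYPASS") == "1":
        return 0
    return 1
```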
A corpus-wide scan (569 pages as of 2026-05-24) produces a claim-health ledger at _pipeline/claim-health/CLAIM_HEALTH_LEDGER.md in the public GitHub repository. It lists every remaining STRIP_OR_SOFTEN and SOFTEN claim site-wide, ranked by traffic priority. We work it down as part of the regular content refresh cycle.
The per-company official-source allowlist lives in scripts/verification-sources.yaml. Each entry names the canonical career subdomain (for example, careers.tcs.com, infosys.com/careers), documents known bot-blocking behaviour, and sets the freshness TTL for each claim class.
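As an illustration of what one allowlist entry carries, here is a sketch that models an entry in Python and applies a freshness TTL. The `SourceEntry` fields, the `is_fresh` function, and the TTL value in the usage note are all placeholders; the actual schema and values are defined in scripts/verification-sources.yaml.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SourceEntry:
    company: str
    hosts: tuple        # canonical career hosts for this company
    bot_blocked: bool   # documented bot-blocking behaviour
    ttl_days: dict      # claim class -> freshness TTL in days (placeholder values)


def is_fresh(entry: SourceEntry, claim_class: str,
             checked_on: date, today: date) -> bool:
    """True while the last successful check is still within the TTL for this claim class."""
    return today - checked_on <= timedelta(days=entry.ttl_days[claim_class])
```

For example, with a placeholder TTL of 30 days for HIGH claims, a check made on 2026-05-01 is still fresh on 2026-05-24 and stale by 2026-06-15.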
Where official primary sources are bot-blocked or return a 403, we do not assert those specifics as facts. We either attribute the number to a verified candidate thread (LinkedIn, r/developersIndia, offer-letter screenshot via the Share Your Story programme) with an explicit "candidate-reported" label, or we omit the specific entirely.
Sources we treat as secondary (never primary for verification): Glassdoor, AmbitionBox, Quora, Wikipedia, other prep sites, and PapersAdda itself. Secondary sources can provide context but cannot verify a claim.
If a salary figure, cutoff, eligibility number, or exam pattern in any article does not match your primary source, email us. Corrections are verified within 48 hours.
Last updated: 2026-05-24 · Owner: Aditya Sharma · Contact: editorial@papersadda.com · Gate source: github.com/Declan142/placement-papers