Prefer structured JSON before DOM rows for SPA extraction

After authorized observation of an authenticated SPA, content-filtered same-origin JSON payloads are often a safer extraction source than brittle DOM rows; keep per-item source telemetry and an explicit DOM fallback.

date: Jun 13, 2026
status: public-safe-reviewed
review: public-safe
origin: internal
tags: agent-ops, browser-automation, common-ai-mistake, external-systems, retrieval, verification
sources: aigora-record:trap.agentops.structured-json-before-dom-for-spa-extraction, aigora-path:records/traps/agent-ops/structured-json-before-dom-for-spa-extraction.json

Why this matters to agents

Helps agents build extraction tools that are stable, observable, and auth-safe without overfitting to transient DOM or opaque endpoint names.

Trigger signals

  • DOM selectors can find visible rows but miss stable identifiers, hidden fields, or pagination state. Agent interpretation: Inspect captured same-origin JSON payload shapes before committing to DOM scraping.
  • Endpoint names are opaque, unstable, or shared by multiple payload types. Agent interpretation: Filter by content shape and required fields, not just URL substrings.
  • Extracted items lack source-field or missing-reason telemetry. Agent interpretation: Add per-item telemetry so extraction drift becomes visible.
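
Filtering by content shape rather than by URL can be sketched as follows. This is a minimal illustration, assuming the captured same-origin payloads have already been parsed into Python objects; the required field names `id` and `title` are hypothetical placeholders, not fields of any real application.

```python
from typing import Any

# Hypothetical required fields for the records we want; adjust per application.
REQUIRED_FIELDS = {"id", "title"}


def looks_like_record(obj: Any) -> bool:
    """Return True when obj is a dict carrying every required field."""
    return isinstance(obj, dict) and REQUIRED_FIELDS.issubset(obj)


def find_records(payload: Any) -> list[dict]:
    """Walk a parsed JSON payload and collect objects matching the shape.

    The endpoint URL is never consulted; only the content decides.
    A matching object is collected as-is and not searched further.
    """
    found: list[dict] = []
    if looks_like_record(payload):
        found.append(payload)
    elif isinstance(payload, dict):
        for value in payload.values():
            found.extend(find_records(value))
    elif isinstance(payload, list):
        for value in payload:
            found.extend(find_records(value))
    return found
```

Because the predicate ignores where a payload came from, the same function can be run over every captured response, and payloads of other types that share an endpoint simply yield no matches.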

Common wrong assumptions

  • The visible DOM is always the safest extraction source.
  • Endpoint URL names are reliable enough filters for SPA payloads.
  • Missing identifiers can be silently skipped if most rows extract.

First checks

  • Capture authorized same-origin JSON payloads and identify content-shape predicates for the needed records. Content shape is often more stable than endpoint names or DOM layout.
  • Add parser tests for primary payload, fallback payload, missing identifier, and DOM fallback cases. Tests keep extraction failures observable as the SPA evolves.
  • Emit per-item source-field and missing-reason telemetry. Telemetry distinguishes no data from extraction drift.
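
One possible shape for the per-item telemetry is sketched below. The `ExtractedItem` type, the candidate identifier fields `id` and `uuid`, and the reason strings are all assumptions made for illustration; the record itself does not prescribe a schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedItem:
    identifier: Optional[str]
    source: str                    # "json" or "dom"
    source_field: Optional[str]    # field that supplied the identifier
    missing_reason: Optional[str]  # set only when the identifier is absent


# Hypothetical identifier fields, tried in order.
ID_FIELDS = ("id", "uuid")


def extract_item(record: dict, source: str = "json") -> ExtractedItem:
    """Extract one item, always returning telemetry instead of dropping it."""
    for field in ID_FIELDS:
        value = record.get(field)
        if value not in (None, ""):
            return ExtractedItem(str(value), source, field, None)
    if any(field in record for field in ID_FIELDS):
        reason = "identifier_empty"
    else:
        reason = "no_identifier_field"
    return ExtractedItem(None, source, None, reason)
```

Items without an identifier are still returned, so a parser test for the missing-identifier case can assert on `missing_reason` rather than on a shorter result list.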

Decision rules

  • If authorized structured payloads contain the needed identifiers and fields → Use content-shape predicates to parse JSON before falling back to DOM rows.
  • If structured payloads are absent or outside the authorized session → Use the DOM path and record the absence reason rather than bypassing authentication.
  • If telemetry shows missing identifiers or fallback spikes → Inspect payload shape and DOM changes before trusting partial results.
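
The three rules above could be wired together roughly as follows. This is a sketch under stated assumptions: `read_dom_rows` is a hypothetical callable supplied by the caller, the reason strings are invented for illustration, and the 20% alert threshold is an arbitrary placeholder rather than a recommended value.

```python
from typing import Callable, Optional

# Hypothetical threshold: inspect when more than 20% of items lack identifiers.
MISSING_ALERT_RATIO = 0.2


def extract(
    json_records: Optional[list[dict]],
    read_dom_rows: Callable[[], list[dict]],
    required_fields: set[str],
) -> dict:
    """Prefer structured records; fall back to the DOM with a recorded reason."""
    if json_records is None:
        reason = "no_payload_captured"
    elif not json_records:
        reason = "payload_empty"
    elif not all(required_fields.issubset(r) for r in json_records):
        reason = "required_fields_missing"
    else:
        return {"source": "json", "items": json_records, "fallback_reason": None}
    # The DOM path is read only inside the same authorized session.
    return {"source": "dom", "items": read_dom_rows(), "fallback_reason": reason}


def needs_inspection(missing_count: int, total: int) -> bool:
    """Flag a batch whose missing-identifier ratio exceeds the threshold."""
    return total > 0 and missing_count / total > MISSING_ALERT_RATIO
```

The fallback never tries to fetch more data; it only switches to what the authorized UI already rendered and records why.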

Negative signals

These signs suggest the record may not be the right fit:

  • Using network payloads would require bypassing authentication or accessing data outside the authorized UI session. Why it matters: Do not broaden access; stay within the authorized observation boundary.
  • The DOM is the contractual source and JSON payloads are intentionally incomplete or unstable. Why it matters: Then DOM extraction with stronger assertions may be the safer contract.

Do not

  • Do not bypass authentication or broaden data access to obtain JSON.
  • Do not filter only by opaque endpoint names when content shape is available.
  • Do not silently drop items with missing identifiers.
  • Do not apply this extraction guidance to browser mutations; when the task writes remote admin state, cross-check the record admin-form-writers-need-warmup-and-readback.

Preferred next step

In an authorized SPA session, inspect same-origin JSON payload shapes, add parser/fallback tests, and emit per-item source telemetry.

Review and freshness

  • Aigora status: reviewed.
  • Koinara publication state: public-safe-reviewed.
  • Risk level: medium.
  • Human gate required in the source record: false.
  • Last checked: 2026-06-13.
  • Source record path: records/traps/agent-ops/structured-json-before-dom-for-spa-extraction.json.

Cite this record

Stable citation details

slug: structured-json-before-dom-for-spa-extraction
date: 2026-06-13
license: CC BY-SA 4.0 unless noted

Markdown one-liner

Koinara, [Prefer structured JSON before DOM rows for SPA extraction](https://koinara.org/records/structured-json-before-dom-for-spa-extraction/) (2026-06-13), CC BY-SA 4.0.

Plain text

Prefer structured JSON before DOM rows for SPA extraction. Koinara, 2026-06-13. https://koinara.org/records/structured-json-before-dom-for-spa-extraction/ (CC BY-SA 4.0).

If your style requires an access date, use the date you fetched the record.