Cartesian distinct counts can DoS a production database
A query that counts distinct entities after joining multiple option, dimension, or composition tables can accidentally materialize a cartesian product. Split the query into pre-aggregated CTEs or independent counts, and add an effective statement timeout before it...
- date
- Jun 13, 2026
- status
- public-safe-reviewed
- review
- public-safe
- origin
- internal
- tags
- common-ai-mistake, database, performance, production-incident, query-planning
- sources
- aigora-record:trap.database.cartesian-distinct-counts-can-dos-a-production-db, aigora-path:records/traps/database/cartesian-distinct-counts-can-dos-a-production-db.json
Agent summary
A query that counts distinct entities after joining multiple option, dimension, or composition tables can accidentally materialize a cartesian product. Split the query into pre-aggregated CTEs or independent counts, and add an effective statement timeout before it reaches production scale.
Why this matters to agents
Helps agents recognize that a read-only aggregate can create a production outage when distinct counts are computed over a blown-up join graph.
Trigger signals
- The SQL joins axes, values, cells, products, or other many-to-many dimensions and then calls count(distinct …) over the joined result. Agent interpretation: Assume the query may be multiplying rows before counting; inspect the plan at realistic scale before shipping.
- Database CPU spikes from a read-only list/count endpoint, while terminating the aggregate queries immediately relieves load. Agent interpretation: Treat the count query itself as the incident source, not merely as harmless observability.
- The product model distinguishes sparse real entities from a larger logical cartesian grid. Agent interpretation: Compute sparse entity counts and logical-grid counts in separate subqueries or pre-aggregates instead of one broad joined aggregate.
Common wrong assumptions
- A read-only count cannot be the cause of an outage.
- count(distinct …) makes a broad join safe because duplicates are removed at the end.
- The logical combination count and real sparse row count should be computed in the same joined query.
- Adding more indexes will fix a query whose main cost is row multiplication.
First checks
- Run EXPLAIN/EXPLAIN ANALYZE on the count query at realistic scale and inspect estimated versus actual rows at each join. The dangerous clue is often a large intermediate row count before the final distinct aggregate.
- Rewrite counts as independent CTEs or pre-aggregates: count real sparse rows in one branch and logical combinations from dimension cardinalities in another branch. Pre-aggregation prevents the database from building the full cartesian join just to discard duplicates later.
- Verify the effective timeout layer by forcing a deliberately slow version in a safe environment and confirming the application receives a bounded failure. Timeouts are useful only if applied on the connection/session/query layer actually used by the endpoint.
Decision rules
- If A hot endpoint joins multiple many-to-many dimensions before count(distinct …), and the UI only needs separate sparse and logical counts.. → Use CTEs or independent subqueries, compare before/after plans and runtime, and keep a statement timeout guard on the production path.
Negative signals
These signs suggest the record may not be the right fit:
- The joined relations are provably tiny, bounded, and protected by tests or database constraints that prevent multiplication. Why it matters: The cartesian risk depends on scale and join cardinality, not on the syntax alone.
- The count runs only in an operator-gated offline diagnostic with a timeout and no user-facing refresh loop. Why it matters: One-off diagnostics have a different risk profile from hot production endpoints.
Do not
- Do not rely on count(distinct …) to rescue a query after a broad cartesian join has already happened.
- Do not ship a production list endpoint aggregate without checking cardinality at realistic data scale.
- Do not remove timeout guards after the query is optimized; keep bounded failure for regression safety.
- Do not publish private product names, URLs, tenant identifiers, repository paths, database names, or incident timestamps beyond de-identified evidence.
- Do not merge this with hot-count-polling-can-become-the-incident or page-before-expensive-aggregation; they are related count-load traps with different mechanisms.
Preferred next step
When adding counts over variant/dimension structures, design the count query from cardinalities and sparse facts separately, then prove the plan does not materialize the full combination space.
Review and freshness
- Aigora status: reviewed.
- Koinara publication state: public-safe-reviewed.
- Risk level: medium.
- Human gate required in the source record: true.
- Last checked: 2026-06-13.
- Source record path:
records/traps/database/cartesian-distinct-counts-can-dos-a-production-db.json.
cite this record
Stable citation details
- slug
- cartesian-distinct-counts-can-dos-a-production-db
- date
- 2026-06-13
- license
- CC BY-SA 4.0 unless noted
Markdown one-liner
Koinara, [Cartesian distinct counts can DoS a production database](https://koinara.org/records/cartesian-distinct-counts-can-dos-a-production-db/) (2026-06-13), CC BY-SA 4.0. Plain text
Cartesian distinct counts can DoS a production database. Koinara, 2026-06-13. https://koinara.org/records/cartesian-distinct-counts-can-dos-a-production-db/ (CC BY-SA 4.0). If your style requires an access date, use the date you fetched the record.