
Atomic Duplicate Guard

Status: Shipped 2026-05-17 on dev; moves to master once verified. Replaces the previous 2-minute find-then-insert guard, whose TOCTOU race let concurrent duplicates through.

TL;DR

  • Every POST /api/messages claims an atomic lock on (workspaceId, recipientPhone, recipientEmail, exact body text) before sending.
  • If a second identical POST arrives within 15 minutes, it gets the original message back with duplicateOnly: true and no new send fires.
  • Works WITHOUT an Idempotency-Key header — automatic, server-side, no caller change required.
  • Idempotency-Key is still preferred when your client controls retries (24-hour window, exact-request match), but this guard exists for callers that don’t set one.

Why this exists

Real-world incident, 2026-05-15 → 2026-05-17 (Frank Sondors’ Mailforge integration → Tuco workspace org_3C059wfibDQCpSfA0rbXVFsoJUk):

| Failure mode | Count in 7 days | Cause |
| --- | --- | --- |
| Concurrent identical POSTs ~90-350 ms apart | 9 of 12 | Mailforge fired the same request twice; old guard’s findOne returned null for both before either inserted, so both proceeded. |
| Identical POST retried 2-15 min later | 3 of 12 | Upstream retry-with-backoff. Old 2-min window expired before the retry. |
| Identical POST replayed 14 h later | 1 of 12 | Outside any reasonable window; needs Idempotency-Key. |

The fix closes the first two failure modes for every caller without requiring them to send Idempotency-Key.

What gets blocked vs. allowed

Blocked: two POSTs with byte-identical (workspaceId, recipientPhone, recipientEmail, message body) within 15 minutes.

Allowed (NOT blocked), with one blocked edge case listed for contrast:

| Scenario | Outcome and why |
| --- | --- |
| Bulk send to 100 leads using the same template | Allowed. Each lead has a different recipientPhone/recipientEmail → 100 distinct hashes |
| Same recipient, body differs by one character | Allowed. Different hash |
| Same recipient + same body, >15 min apart | Allowed. First lock expired via TTL |
| Same recipient + same body, different workspace | Allowed. Different workspaceId → different hash |
| Same body to two different leads who happen to share a phone | Blocked. Same hash (this is the right call: one human, one phone) |
| Failed-validation requests (404 lead, 400 attachments, etc.) | Allowed. Guard runs AFTER validation, so failed requests never hold a lock |
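
To make the scenarios above concrete, here is a minimal sketch of how such a dedup key behaves. It assumes the four fields are serialized as a JSON array before hashing; this page only states that the key is a sha256 over the four values, so the serialization, the dedupHash name, and the sample values are illustrative assumptions rather than Tuco's actual code.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: field order and JSON-array serialization are
// assumptions, not Tuco's confirmed implementation.
function dedupHash({ workspaceId, recipientPhone, recipientEmail, body }) {
  const parts = [workspaceId, recipientPhone ?? "", recipientEmail ?? "", body];
  return createHash("sha256").update(JSON.stringify(parts), "utf8").digest("hex");
}

// Sample values are made up for illustration.
const base = {
  workspaceId: "ws_example",
  recipientPhone: "+15550100",
  recipientEmail: "lead@example.com",
  body: "Hi, following up on your demo request.",
};

const sameRequest = dedupHash({ ...base });
const otherRecipient = dedupHash({ ...base, recipientPhone: "+15550101" });
const oneCharDiff = dedupHash({ ...base, body: base.body + "!" });
const otherWorkspace = dedupHash({ ...base, workspaceId: "ws_other" });

console.log("identical request collides:", dedupHash(base) === sameRequest);
console.log("different recipient collides:", dedupHash(base) === otherRecipient);
console.log("one-character change collides:", dedupHash(base) === oneCharDiff);
console.log("different workspace collides:", dedupHash(base) === otherWorkspace);
```

Only the first comparison yields a matching key, which is why a bulk send of one template to many recipients is never blocked.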

How it works under the hood

  1. POST arrives, passes auth + validation.
  2. Tuco computes hash = sha256(workspaceId + recipientPhone + recipientEmail + body).
  3. Tuco attempts insertOne into message_dedup_locks with _id = hash and expiresAt = now + 15 min.
  4. Mongo’s built-in unique _id index decides the race:
    • Insert succeeds → caller owns the slot. Proceeds with the send. The resulting message ID is attached to the lock so future retries get the right sibling back. Lock is auto-released if the send hard-fails before the message doc lands.
    • Insert fails with E11000 → another request beat us. We look up the sibling message in messages (workspace + recipient + body, last 15 min) and return it with duplicateOnly: true. If the sibling hasn’t been written yet (the original is still in flight), we poll for up to 3 s, then return duplicateOnly: true, message: null rather than risk a duplicate send.
    • Insert fails with any other error → guard logs the error and falls through. The request proceeds without dedup protection. The guard is built to never break the send path.
  5. A TTL index on expiresAt (with expireAfterSeconds: 0) clears the lock 15 minutes after acquisition. Mongo’s TTL monitor sweeps every ~60 s, so worst-case lock lifetime is ~16 min.
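
The three branches in step 4 can be sketched as follows. This is not Tuco's implementation: FakeLockCollection is a hypothetical in-memory stand-in for message_dedup_locks that only mimics the duplicate-_id rejection (an error whose code is 11000, as the MongoDB Node.js driver reports it), and tryAcquireLock is an illustrative name. The real driver's insertOne is asynchronous; the sketch is synchronous to keep the control flow easy to read.

```javascript
// Hypothetical in-memory stand-in for the message_dedup_locks collection.
// It reproduces one behaviour only: a second insert with the same _id throws
// an error carrying code 11000, like a unique-index violation.
class FakeLockCollection {
  constructor() {
    this.docs = new Map();
  }
  insertOne(doc) {
    if (this.docs.has(doc._id)) {
      const err = new Error("E11000 duplicate key error");
      err.code = 11000;
      throw err;
    }
    this.docs.set(doc._id, doc);
    return { insertedId: doc._id };
  }
}

const LOCK_TTL_MS = 15 * 60 * 1000; // 15-minute window from the steps above

// Returns "acquired", "duplicate", or "unguarded", one per branch of step 4.
function tryAcquireLock(locks, hash, now = new Date()) {
  try {
    locks.insertOne({ _id: hash, expiresAt: new Date(now.getTime() + LOCK_TTL_MS) });
    return "acquired"; // caller owns the slot and proceeds with the send
  } catch (err) {
    if (err && err.code === 11000) return "duplicate"; // another request won the race
    // Any other failure: log and fall through so the send path never breaks.
    console.error("dedup guard error, proceeding without protection:", err);
    return "unguarded";
  }
}

const locks = new FakeLockCollection();
const first = tryAcquireLock(locks, "example-hash");
const second = tryAcquireLock(locks, "example-hash");
console.log(first, second);
```

The design point is that the insert itself is the check: there is no separate read that a concurrent request could slip past. Against a real collection, the TTL index from step 5 would be created with the driver's createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 }).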

Response shapes

Normal send

{
  "success": true,
  "status": "sent",
  "message": { /* full message doc */ },
  "leadId": "...",
  "ghlContactId": "...",
  "ghlLocationId": "...",
  "hsContactId": null,
  "hsPortalId": null
}
HTTP 201 (or 200 when an existing lead was reused).

Duplicate caught — sibling found

{
  "success": true,
  "duplicateOnly": true,
  "duplicateMessage": "Same message text already sent to this contact 42s ago — returning existing message instead of sending again",
  "message": { /* the original sibling message doc */ },
  "leadId": "...",
  "ghlContactId": "...",
  "ghlLocationId": "...",
  "hsContactId": null,
  "hsPortalId": null
}
HTTP 200. No new send fires. message._id is the original sibling — your CRM workflow can branch on duplicateOnly === true and treat it as a no-op success.

Duplicate caught — original still in flight

{
  "success": true,
  "duplicateOnly": true,
  "duplicateMessage": "A previous identical request is still being processed — refusing to send again. Retry once the original completes.",
  "message": null,
  "leadId": null,
  "ghlContactId": null,
  "ghlLocationId": null,
  "hsContactId": null,
  "hsPortalId": null
}
HTTP 200. The guard saw an active sibling lock but the original’s message doc hadn’t landed within the 3-s poll window. Safer to refuse than risk a duplicate — when the original finishes, a fresh retry from your side will receive the standard “duplicate caught — sibling found” response.
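
On the caller side, the three response shapes can be told apart with two checks. The sketch below uses only fields shown in the examples above (duplicateOnly, message, message._id); the classifySendResponse name and the outcome labels are illustrative assumptions.

```javascript
// Classifies a parsed /api/messages response body into one of three outcomes.
// The function name and outcome labels are hypothetical.
function classifySendResponse(res) {
  if (res.duplicateOnly !== true) {
    // Normal send: a new message was created.
    return { outcome: "sent", messageId: res.message?._id ?? null };
  }
  if (res.message) {
    // Duplicate caught, sibling found: treat as a no-op success.
    return { outcome: "duplicate", messageId: res.message._id ?? null };
  }
  // Duplicate caught, original still in flight: retry later to fetch the sibling.
  return { outcome: "pending", messageId: null };
}

const sent = classifySendResponse({ success: true, status: "sent", message: { _id: "m1" } });
const duplicate = classifySendResponse({ success: true, duplicateOnly: true, message: { _id: "m1" } });
const pending = classifySendResponse({ success: true, duplicateOnly: true, message: null });
console.log(sent.outcome, duplicate.outcome, pending.outcome);
```

Because all three shapes carry success: true, branching on success alone cannot distinguish a fresh send from a suppressed duplicate.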

Known sharp edges

  • Failed sends still return the failed sibling on retry within 15 min. If your first send ended in status: "failed" and you immediately retry with the same body, you’ll get the failed sibling back with duplicateOnly: true. To force a fresh send: change the body, wait 15 min for the lock to TTL, or manually delete the lock (db.collection('message_dedup_locks').deleteOne({ _id: '<hash>' })).
  • Polling cost. A duplicate request that arrives before the original’s message doc lands holds the connection for up to 3 s. Real-world: createMessage inserts the message doc within ~50 ms of acquiring the lock, so most polls resolve on the first iteration.
  • Lock leak on process crash. Mitigated by the TTL — locks always expire 15 min after acquisition regardless of process state.
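
For intuition about the polling cost, a bounded poll of this kind can be sketched as follows. This is a generic pattern under stated assumptions, not Tuco's code: findSibling is a hypothetical async lookup that resolves to the sibling message doc or null, and the 100 ms interval is an assumed value (this page only specifies the 3 s ceiling).

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls findSibling until it returns a doc or the time budget is spent.
// Resolves to the sibling doc, or null if none appeared in time.
async function pollForSibling(findSibling, { timeoutMs = 3000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const sibling = await findSibling();
    if (sibling) return sibling; // common case: found on the first iteration
    if (Date.now() + intervalMs > deadline) return null; // no budget left for another wait
    await sleep(intervalMs);
  }
}

// Usage with a stub lookup that finds the doc on its third call.
let lookups = 0;
pollForSibling(async () => (++lookups >= 3 ? { _id: "m1" } : null), { timeoutMs: 500, intervalMs: 10 })
  .then((doc) => console.log("resolved with", doc));
```

The lookup runs before the first sleep, so a duplicate that arrives after the original's message doc has landed pays no waiting cost at all.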

How it interacts with Idempotency-Key

The Idempotency-Key header check at /api/messages runs before this guard. If your client sends an Idempotency-Key, retries of the same request return the original response from idempotency_keys (24-hour window) without ever reaching the dedup guard.

| You set Idempotency-Key? | Behavior |
| --- | --- |
| Yes, same key on retry | Original response replayed from idempotency_keys: exact-request semantics, 24 h window |
| Yes, different key | Treated as new request; dedup guard then catches body-level duplicates within 15 min |
| No | Dedup guard is the only protection. 15 min window on (workspace, recipient, body) |

Idempotency-Key is preferred when you control the caller. The dedup guard is the safety net for everyone else.
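
If you do control the caller, the key point is to generate the key once per logical send and reuse it on every retry. The sketch below builds fetch() options to show that; the payload field names, the use of a UUID as the key, and the buildSendRequest helper are assumptions for illustration, and authentication headers are omitted.

```javascript
const { randomUUID } = require("node:crypto");

// Builds fetch() options for POST /api/messages. Hypothetical helper:
// the payload shape is an assumption, and auth headers are left out.
function buildSendRequest(payload, idempotencyKey) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Idempotency-Key": idempotencyKey,
    },
    body: JSON.stringify(payload),
  };
}

// One key per logical send, created before the first attempt.
const sendKey = randomUUID();
const payload = { recipientPhone: "+15550100", body: "Hello" }; // assumed field names

const attempt1 = buildSendRequest(payload, sendKey);
const attempt2 = buildSendRequest(payload, sendKey); // retry: same key, same body

console.log(attempt1.headers["Idempotency-Key"] === attempt2.headers["Idempotency-Key"]);
```

Generating a new key inside the retry loop would turn every retry into the "different key" row of the table, leaving only the 15-minute dedup guard as protection.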

Loki events for ops

| Event | When it fires |
| --- | --- |
| message.api.duplicate_blocked | A duplicate was caught and a sibling returned. mode is lock_resolved (sibling found immediately) or lock_resolved_polled (sibling found during the 3-s poll). |
| message.api.duplicate_pending | Lock held but sibling never written within the poll window. If you see a flood of these from one workspace, inspect cp_outbound_idempotency and message_dedup_locks for stale rows. |
| message.api.duplicate_guard_error | Non-E11000 Mongo error on the lock insert. Request proceeds without dedup protection; investigate Mongo health. |

What’s NOT changing

  • (workspaceId, contactIdentifiers) unique index on leads. One phone = one lead per workspace.
  • One conversation thread per (workspaceId, recipientPhone).
  • Idempotency-Key semantics on /api/messages (still 24 h, still preferred when callers can set it).
  • HubSpot/GHL workflow plugin behavior — the guard fires identically for every caller because it sits on the shared /api/messages endpoint.

See also