Atomic Duplicate Guard

Status: Shipped on dev 2026-05-17; goes to master once verified. Replaces the previous 2-minute find-then-insert guard, whose TOCTOU (time-of-check to time-of-use) race let concurrent duplicates through.

TL;DR

Every POST /api/messages claims an atomic lock on (workspaceId, recipientPhone, recipientEmail, exact body text) before sending.
If a second identical POST arrives within 15 minutes, it gets the original message back with duplicateOnly: true and no new send fires.
Works WITHOUT an Idempotency-Key header — automatic, server-side, no caller change required.
Idempotency-Key is still preferred when your client controls retries (24-hour window, exact-request match), but this guard exists for callers that don’t set one.

Why this exists

Real-world incident, 2026-05-15 → 2026-05-17 (Frank Sondors’ Mailforge integration → Tuco workspace org_3C059wfibDQCpSfA0rbXVFsoJUk):

Failure mode	Count in 7 days	Cause
Concurrent identical POSTs ~90-350 ms apart	9 of 12	Mailforge fired the same request twice; old guard’s `findOne` returned null for both before either inserted, so both proceeded.
Identical POST retried 2-15 min later	3 of 12	Upstream retry-with-backoff. Old 2-min window expired before the retry.
Identical POST replayed 14 h later	1 of 12	Outside any reasonable window — needs `Idempotency-Key`.

The fix closes the first two failure modes for every caller without requiring them to send Idempotency-Key.

What gets blocked vs. allowed

Blocked: two POSTs with byte-identical (workspaceId, recipientPhone, recipientEmail, message body) within 15 minutes. This includes the same body sent to two different leads who happen to share a phone: the hash is the same, so the second send is blocked (this is the right call: one human, one phone).

Allowed (NOT blocked):

Scenario	Why it’s allowed
Bulk send to 100 leads using the same template	Each lead has a different `recipientPhone`/`recipientEmail` → 100 distinct hashes
Same recipient, body differs by one character	Different hash
Same recipient + same body, >15 min apart	First lock expired via TTL
Same recipient + same body, different workspace	Different `workspaceId` → different hash
Failed-validation requests (404 lead, 400 attachments, etc.)	Guard runs AFTER validation, so failed requests never hold a lock
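
To make the "different hash" rows concrete, here is a minimal sketch of a key computation over the four fields. The function name `dedupKey`, the NUL field separator, and the empty-string fallback for a missing phone or email are assumptions for illustration; this page only states that the key is a SHA-256 over workspaceId, recipientPhone, recipientEmail, and body.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the fields that feed the dedup hash.
interface DedupFields {
  workspaceId: string;
  recipientPhone?: string;
  recipientEmail?: string;
  body: string;
}

// Assumption: fields are joined with a NUL separator so that
// ("ab", "c") and ("a", "bc") cannot produce the same input string.
// The server's actual join rule is not documented on this page.
function dedupKey(f: DedupFields): string {
  const parts = [f.workspaceId, f.recipientPhone ?? "", f.recipientEmail ?? "", f.body];
  return createHash("sha256").update(parts.join("\u0000"), "utf8").digest("hex");
}

const base: DedupFields = { workspaceId: "ws_1", recipientPhone: "+15550100", body: "Hi there" };

const sameAgain = dedupKey({ ...base });                                 // identical request
const otherLead = dedupKey({ ...base, recipientPhone: "+15550101" });    // bulk send, next lead
const oneCharOff = dedupKey({ ...base, body: "Hi there!" });             // body differs by one character
const otherWorkspace = dedupKey({ ...base, workspaceId: "ws_2" });       // same send, other workspace
```

Under these assumptions only `sameAgain` matches the key of the original request; the other three are distinct keys and would each claim their own lock.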

How it works under the hood

1. POST arrives, passes auth + validation.
2. Tuco computes hash = sha256(workspaceId + recipientPhone + recipientEmail + body).
3. Tuco attempts insertOne into message_dedup_locks with _id = hash and expiresAt = now + 15 min.
4. Mongo’s built-in unique _id index decides the race:
   - Insert succeeds → caller owns the slot. Proceeds with the send. The resulting message ID is attached to the lock so future retries get the right sibling back. Lock is auto-released if the send hard-fails before the message doc lands.
   - Insert fails with E11000 → another request beat us. We look up the sibling message in messages (workspace + recipient + body, last 15 min) and return it with duplicateOnly: true. If the sibling hasn’t been written yet (the original is still in flight), we poll for up to 3 s, then return duplicateOnly: true, message: null rather than risk a duplicate send.
   - Insert fails with any other error → guard logs the error and falls through. The request proceeds without dedup protection. The guard is built to never break the send path.
5. A TTL index on expiresAt (with expireAfterSeconds: 0) clears the lock 15 minutes after acquisition. Mongo’s TTL monitor sweeps every ~60 s, so worst-case lock lifetime is ~16 min.
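
The three insert outcomes above can be sketched as a single claim function. This is an illustration, not the server's code: `claimLock`, `LockStore`, and `FakeLockStore` are hypothetical names, and the in-memory store stands in for the message_dedup_locks collection so the sketch runs without a database. The one detail taken from MongoDB itself is that a unique-index violation surfaces as error code 11000.

```typescript
// Minimal structural types so the sketch runs without a database.
interface LockDoc { _id: string; expiresAt: Date; messageId?: string }
interface LockStore { insertOne(doc: LockDoc): Promise<void> }

type Claim = "owner" | "duplicate" | "unguarded";

const LOCK_TTL_MS = 15 * 60 * 1000; // 15-minute window from the text

// With the official Node.js driver, the TTL index would be created once as:
//   collection.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })

// Try to claim the slot for `hash`; returns which of the three branches applies.
async function claimLock(locks: LockStore, hash: string, now: Date = new Date()): Promise<Claim> {
  try {
    await locks.insertOne({ _id: hash, expiresAt: new Date(now.getTime() + LOCK_TTL_MS) });
    return "owner"; // insert succeeded: this request performs the send
  } catch (err) {
    // MongoDB reports a unique-index violation as code 11000 (E11000).
    if ((err as { code?: number }).code === 11000) return "duplicate";
    console.error("dedup guard error, continuing without protection", err);
    return "unguarded"; // never block the send path on a guard failure
  }
}

// In-memory stand-in: a Map rejects a repeated _id the way the unique index does.
class FakeLockStore implements LockStore {
  private docs = new Map<string, LockDoc>();
  async insertOne(doc: LockDoc): Promise<void> {
    if (this.docs.has(doc._id)) {
      throw Object.assign(new Error("E11000 duplicate key error"), { code: 11000 });
    }
    this.docs.set(doc._id, doc);
  }
}
```

Calling `claimLock` twice with the same hash against one `FakeLockStore` yields `"owner"` for the first call and `"duplicate"` for the second, even when both are started before either is awaited. The point of the design is that the decision is made by the insert itself, so there is no gap between checking and writing for a second request to slip through.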

Response shapes

Normal send

{
  "success": true,
  "status": "sent",
  "message": { /* full message doc */ },
  "leadId": "...",
  "ghlContactId": "...",
  "ghlLocationId": "...",
  "hsContactId": null,
  "hsPortalId": null
}

HTTP 201 (or 200 when an existing lead was reused).

Duplicate caught — sibling found

{
  "success": true,
  "duplicateOnly": true,
  "duplicateMessage": "Same message text already sent to this contact 42s ago — returning existing message instead of sending again",
  "message": { /* the original sibling message doc */ },
  "leadId": "...",
  "ghlContactId": "...",
  "ghlLocationId": "...",
  "hsContactId": null,
  "hsPortalId": null
}

HTTP 200. No new send fires. message._id is the original sibling — your CRM workflow can branch on duplicateOnly === true and treat it as a no-op success.

Duplicate caught — original still in flight

{
  "success": true,
  "duplicateOnly": true,
  "duplicateMessage": "A previous identical request is still being processed — refusing to send again. Retry once the original completes.",
  "message": null,
  "leadId": null,
  "ghlContactId": null,
  "ghlLocationId": null,
  "hsContactId": null,
  "hsPortalId": null
}

HTTP 200. The guard saw an active sibling lock but the original’s message doc hadn’t landed within the 3-s poll window. Safer to refuse than risk a duplicate — when the original finishes, a fresh retry from your side will receive the standard “duplicate caught — sibling found” response.
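
On the caller side, the three shapes above can be told apart with two checks: `duplicateOnly` and whether `message` is null. The sketch below is one way to do that. The `SendResponse` type lists only a subset of the fields shown above, the `Outcome` names are hypothetical, and the handling of `success: false` is an assumption, since error responses are not covered in this section.

```typescript
// Subset of the response fields shown above; other fields omitted for brevity.
interface SendResponse {
  success: boolean;
  duplicateOnly?: boolean;
  duplicateMessage?: string;
  message: { _id: string } | null;
}

// Hypothetical caller-side classification of a POST /api/messages response.
type Outcome =
  | { kind: "sent"; messageId: string }
  | { kind: "duplicate"; messageId: string } // sibling found: treat as a no-op success
  | { kind: "in_flight" }                    // original still processing: retry later
  | { kind: "error" };

function classify(res: SendResponse): Outcome {
  if (!res.success) return { kind: "error" }; // assumption: failures set success to false
  if (res.duplicateOnly === true) {
    return res.message
      ? { kind: "duplicate", messageId: res.message._id }
      : { kind: "in_flight" };
  }
  return res.message ? { kind: "sent", messageId: res.message._id } : { kind: "error" };
}
```

A workflow would typically treat `sent` and `duplicate` the same way (record `messageId`, move on) and schedule a later retry only for `in_flight`.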

Known sharp edges

Failed sends still return the failed sibling on retry within 15 min. If your first send ended in status: "failed" and you immediately retry with the same body, you’ll get the failed sibling back with duplicateOnly: true. To force a fresh send: change the body, wait 15 min for the lock to expire via TTL, or manually delete the lock (db.collection('message_dedup_locks').deleteOne({ _id: '<hash>' })).
Polling cost. A duplicate request that arrives before the original’s message doc lands holds the connection for up to 3 s. In practice, createMessage inserts the message doc within ~50 ms of acquiring the lock, so most polls resolve on the first iteration.
Lock leak on process crash. Mitigated by the TTL — locks always expire 15 min after acquisition regardless of process state.
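
For the polling cost item, a bounded poll loop along these lines shows why the hold time is capped. This is a sketch under stated assumptions: `pollForSibling` and `FindSibling` are hypothetical names, the 100 ms interval is a guess, and only the 3 s budget comes from this page.

```typescript
// Hypothetical lookup: resolves to the sibling message, or null if not yet written.
type FindSibling<M> = () => Promise<M | null>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the sibling appears or the time budget runs out.
// The 3 s budget is from the text; the 100 ms interval is an assumption.
async function pollForSibling<M>(
  find: FindSibling<M>,
  budgetMs = 3000,
  intervalMs = 100,
): Promise<M | null> {
  const deadline = Date.now() + budgetMs;
  for (;;) {
    const found = await find();
    if (found !== null) return found; // common case: resolves on the first iteration
    // Stop if another sleep would overrun the budget; caller then returns message: null.
    if (Date.now() + intervalMs > deadline) return null;
    await sleep(intervalMs);
  }
}
```

Because the first lookup happens before any sleep, a sibling that is already written costs one query and no added latency; only a genuinely in-flight original pays the wait.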

How it interacts with `Idempotency-Key`

The Idempotency-Key header check at /api/messages runs before this guard. If your client sends an Idempotency-Key, retries of the same request return the original response from idempotency_keys (24-hour window) without ever reaching the dedup guard.

You set `Idempotency-Key`?	Behavior
Yes, same key on retry	Original response replayed from `idempotency_keys` — exact-request semantics, 24 h window
Yes, different key	Treated as new request; dedup guard then catches body-level duplicates within 15 min
No	Dedup guard is the only protection. 15 min window on `(workspace, recipient, body)`

Idempotency-Key is preferred when you control the caller. The dedup guard is the safety net for everyone else.
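
If you do control the caller, the key point is that the same Idempotency-Key value must go out on every retry of one logical send. A minimal sketch, assuming a hypothetical `buildSend` helper and a UUID as the key format (this page does not prescribe one); authentication headers and the request payload fields are left to your integration:

```typescript
import { randomUUID } from "node:crypto";

interface OutboundRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the request once per logical send and reuse it for every retry,
// so the Idempotency-Key stays stable across attempts.
function buildSend(
  baseUrl: string,
  payload: unknown,
  idempotencyKey: string = randomUUID(), // assumption: any unique string is accepted
): OutboundRequest {
  return {
    url: `${baseUrl}/api/messages`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Idempotency-Key": idempotencyKey,
      },
      body: JSON.stringify(payload),
    },
  };
}
```

A retry loop would call `buildSend` once, then pass the same `url` and `init` to `fetch` on each attempt. Generating a new key inside the loop would turn each retry into the "different key" row of the table above, leaving only the 15-minute dedup guard as protection.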

Loki events for ops

Event	When it fires
`message.api.duplicate_blocked`	A duplicate was caught and a sibling returned. `mode` is `lock_resolved` (sibling found immediately) or `lock_resolved_polled` (sibling found during the 3-s poll).
`message.api.duplicate_pending`	Lock held but sibling never written within the poll window. If you see a flood of these from one workspace, inspect `cp_outbound_idempotency` and `message_dedup_locks` for stale rows.
`message.api.duplicate_guard_error`	Non-E11000 Mongo error on the lock insert. Request proceeds without dedup protection — investigate Mongo health.

What’s NOT changing

(workspaceId, contactIdentifiers) unique index on leads. One phone = one lead per workspace.
One conversation thread per (workspaceId, recipientPhone).
Idempotency-Key semantics on /api/messages (still 24 h, still preferred when callers can set it).
HubSpot/GHL workflow plugin behavior — the guard fires identically for every caller because it sits on the shared /api/messages endpoint.
