Atomic Duplicate Guard
Status: Shipped 2026-05-17 on dev, on master once verified. Replaces the previous 2-min find-then-insert guard, which had a TOCTOU race that allowed concurrent dupes through.
TL;DR
- Every `POST /api/messages` claims an atomic lock on (workspaceId, recipientPhone, recipientEmail, exact body text) before sending.
- If a second identical POST arrives within 15 minutes, it gets the original message back with `duplicateOnly: true` and no new send fires.
- Works WITHOUT an `Idempotency-Key` header: automatic, server-side, no caller change required.
- Idempotency-Key is still preferred when your client controls retries (24-hour window, exact-request match), but this guard exists for callers that don’t set one.
Why this exists
Real-world incident, 2026-05-15 → 2026-05-17 (Frank Sondors’ Mailforge integration → Tuco workspace `org_3C059wfibDQCpSfA0rbXVFsoJUk`):
| Failure mode | Count in 7 days | Cause |
|---|---|---|
| Concurrent identical POSTs ~90-350 ms apart | 9 of 12 | Mailforge fired the same request twice; old guard’s findOne returned null for both before either inserted, so both proceeded. |
| Identical POST retried 2-15 min later | 3 of 12 | Upstream retry-with-backoff. Old 2-min window expired before the retry. |
| Identical POST replayed 14 h later | 1 of 12 | Outside any reasonable window — needs Idempotency-Key. |
What gets blocked vs. allowed
Blocked: two POSTs with byte-identical (workspaceId, recipientPhone, recipientEmail, message body) within 15 minutes.
Allowed (NOT blocked):
| Scenario | Why it’s allowed |
|---|---|
| Bulk send to 100 leads using the same template | Each lead has a different recipientPhone/recipientEmail → 100 distinct hashes |
| Same recipient, body differs by one character | Different hash |
| Same recipient + same body, >15 min apart | First lock expired via TTL |
| Same recipient + same body, different workspace | Different workspaceId → different hash |
| Same body to two different leads who happen to share a phone | Exception: same hash, so the second send IS blocked (this is the right call: one human, one phone) |
| Failed-validation requests (404 lead, 400 attachments, etc.) | Guard runs AFTER validation, so failed requests never hold a lock |
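The rows above all follow from how the hash is keyed. The sketch below is one way to illustrate that, not the guard's actual code: `DedupFields`, `dedupHash`, the NUL separator, and the empty-string fallback for a missing phone or email are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the fields that feed the dedup hash.
interface DedupFields {
  workspaceId: string;
  recipientPhone?: string;
  recipientEmail?: string;
  body: string;
}

// Assumption: fields are joined with a NUL separator so that
// ("ab", "c") and ("a", "bc") cannot collide. The real guard may
// concatenate the fields differently.
function dedupHash(f: DedupFields): string {
  const parts = [f.workspaceId, f.recipientPhone ?? "", f.recipientEmail ?? "", f.body];
  return createHash("sha256").update(parts.join("\u0000"), "utf8").digest("hex");
}

const base: DedupFields = {
  workspaceId: "org_demo",
  recipientPhone: "+15550100",
  body: "Hi there",
};

// Identical input gives an identical hash, so a repeat would hit the lock.
const sameAgain = dedupHash({ ...base });
// Any differing field gives a different hash, so these would not be blocked.
const otherLead = dedupHash({ ...base, recipientPhone: "+15550101" });
const oneCharOff = dedupHash({ ...base, body: "Hi there!" });
const otherWorkspace = dedupHash({ ...base, workspaceId: "org_other" });
```

Because the body is hashed byte for byte, even whitespace changes produce a new key in this sketch.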
How it works under the hood
- POST arrives, passes auth + validation.
- Tuco computes `hash = sha256(workspaceId + recipientPhone + recipientEmail + body)`.
- Tuco attempts `insertOne` into `message_dedup_locks` with `_id = hash` and `expiresAt = now + 15 min`.
- Mongo’s built-in unique `_id` index decides the race:
  - Insert succeeds → caller owns the slot. Proceeds with the send. The resulting message ID is attached to the lock so future retries get the right sibling back. Lock is auto-released if the send hard-fails before the message doc lands.
  - Insert fails with E11000 → another request beat us. We look up the sibling message in `messages` (workspace + recipient + body, last 15 min) and return it with `duplicateOnly: true`. If the sibling hasn’t been written yet (the original is still in flight), we poll for up to 3 s, then return `duplicateOnly: true, message: null` rather than risk a duplicate send.
  - Insert fails with any other error → guard logs the error and falls through. The request proceeds without dedup protection. The guard is built to never break the send path.
- A TTL index on `expiresAt` (with `expireAfterSeconds: 0`) clears the lock 15 minutes after acquisition. Mongo’s TTL monitor sweeps every ~60 s, so worst-case lock lifetime is ~16 min.
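The three insert outcomes can be sketched as a small function. This is a minimal illustration under stated assumptions, not the code in `src/lib/dedupGuard.ts`: `LockStore`, `tryAcquireLock`, `GuardOutcome`, and `InMemoryLockStore` are hypothetical names, and the in-memory store stands in for a collection with a unique `_id` index. The only MongoDB-specific fact relied on is that duplicate-key violations carry error code 11000.

```typescript
// Hypothetical lock document, mirroring the fields named in the steps above.
interface LockDoc {
  _id: string;
  expiresAt: Date;
  messageId?: string;
}

// Minimal stand-in for a collection with a unique _id index.
interface LockStore {
  insertOne(doc: LockDoc): Promise<void>;
}

type GuardOutcome = "acquired" | "duplicate" | "guard_error";

const LOCK_TTL_MS = 15 * 60 * 1000;

// MongoDB reports duplicate-key violations with error code 11000.
function isDuplicateKeyError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === 11000;
}

async function tryAcquireLock(
  store: LockStore,
  hash: string,
  now: Date = new Date(),
): Promise<GuardOutcome> {
  try {
    // A single insert is the whole check: there is no separate "find" step to race.
    await store.insertOne({ _id: hash, expiresAt: new Date(now.getTime() + LOCK_TTL_MS) });
    return "acquired";
  } catch (err) {
    if (isDuplicateKeyError(err)) return "duplicate";
    // Any other failure: log and let the send proceed unprotected.
    console.error("dedup guard error, proceeding without protection", err);
    return "guard_error";
  }
}

// In-memory store used only to exercise the sketch.
class InMemoryLockStore implements LockStore {
  private readonly docs = new Map<string, LockDoc>();

  async insertOne(doc: LockDoc): Promise<void> {
    if (this.docs.has(doc._id)) {
      throw Object.assign(new Error("E11000 duplicate key error"), { code: 11000 });
    }
    this.docs.set(doc._id, doc);
  }
}
```

The point of the design is that uniqueness is enforced by the index at write time, so two concurrent requests cannot both observe "no lock yet" the way they could with find-then-insert.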
Response shapes
Normal send
Duplicate caught — sibling found
message._id is the original sibling — your CRM workflow can branch on duplicateOnly === true and treat it as a no-op success.
Duplicate caught — original still in flight
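A caller can branch on the three shapes above with a small classifier. The sketch below is an assumption-heavy illustration: only `duplicateOnly`, `message`, and `message._id` come from this page, while `SendMessageResponse`, `SendResult`, and `classifySendResponse` are hypothetical names and the real response carries more fields.

```typescript
// Hypothetical response type: only duplicateOnly, message, and message._id
// are taken from this page; the real payload has additional fields.
interface SendMessageResponse {
  duplicateOnly?: boolean;
  message: { _id: string } | null;
}

type SendResult =
  | { kind: "sent"; messageId: string }
  | { kind: "duplicate"; messageId: string }
  | { kind: "duplicate_pending" };

function classifySendResponse(res: SendMessageResponse): SendResult {
  if (res.duplicateOnly === true) {
    // message is null when the original send is still in flight.
    return res.message === null
      ? { kind: "duplicate_pending" }
      : { kind: "duplicate", messageId: res.message._id };
  }
  if (res.message === null) {
    throw new Error("unexpected response: no message and duplicateOnly not set");
  }
  return { kind: "sent", messageId: res.message._id };
}
```

Both duplicate kinds can usually be treated as a no-op success; the pending case simply has no message ID to record yet.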
Known sharp edges
- Failed sends still return the failed sibling on retry within 15 min. If your first send ended in `status: "failed"` and you immediately retry with the same body, you’ll get the failed sibling back with `duplicateOnly: true`. To force a fresh send: change the body, wait 15 min for the lock to TTL, or manually delete the lock (`db.collection('message_dedup_locks').deleteOne({ _id: '<hash>' })`).
- Polling cost. A duplicate request that arrives before the original’s message doc lands holds the connection for up to 3 s. Real-world: `createMessage` inserts the message doc within ~50 ms of acquiring the lock, so most polls resolve on the first iteration.
- Lock leak on process crash. Mitigated by the TTL: locks always expire 15 min after acquisition regardless of process state.
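The polling cost comes from a bounded wait loop along these lines. This is a generic sketch, not the guard's implementation: `pollForSibling` and the 100 ms interval are assumptions, and only the 3 s ceiling comes from this page.

```typescript
// Poll `lookup` until it returns a non-null value or `timeoutMs` elapses.
// Assumption: a fixed 100 ms interval; the real guard's interval is not documented here.
async function pollForSibling<T>(
  lookup: () => Promise<T | null>,
  timeoutMs = 3000,
  intervalMs = 100,
): Promise<T | null> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const found = await lookup();
    if (found !== null) return found; // sibling landed
    // Give up if another sleep would overshoot the deadline.
    if (Date.now() + intervalMs > deadline) return null;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The first lookup runs immediately, so when the message doc already exists the caller pays for one query and no sleep.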
How it interacts with Idempotency-Key
The Idempotency-Key header check at /api/messages runs before this guard. If your client sends an Idempotency-Key, retries of the same request return the original response from idempotency_keys (24-hour window) without ever reaching the dedup guard.
| You set Idempotency-Key? | Behavior |
|---|---|
| Yes, same key on retry | Original response replayed from idempotency_keys — exact-request semantics, 24 h window |
| Yes, different key | Treated as new request; dedup guard then catches body-level duplicates within 15 min |
| No | Dedup guard is the only protection. 15 min window on (workspace, recipient, body) |
Idempotency-Key is preferred when you control the caller. The dedup guard is the safety net for everyone else.
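For callers that do control retries, the key point is to generate the key once and reuse it on every attempt. The sketch below illustrates that under stated assumptions: `sendWithIdempotencyKey` and `PostFn` are hypothetical, the transport is injected rather than tied to a specific HTTP client, and retrying only on 5xx without backoff is a simplification.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical transport: receives headers and a JSON body, returns the HTTP status.
type PostFn = (headers: Record<string, string>, body: string) => Promise<{ status: number }>;

async function sendWithIdempotencyKey(
  post: PostFn,
  body: string,
  maxAttempts = 3,
): Promise<{ status: number; key: string }> {
  // One key per logical send, created before the first attempt and reused on retries.
  const key = randomUUID();
  const headers = { "Content-Type": "application/json", "Idempotency-Key": key };
  let last = { status: 0 };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await post(headers, body);
    // Assumption: only server-side errors are worth retrying.
    if (last.status < 500) break;
  }
  return { status: last.status, key };
}
```

Generating a fresh key per attempt would land in the "different key" row of the table, leaving only the 15-minute body-level guard as protection.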
Loki events for ops
| Event | When it fires |
|---|---|
| `message.api.duplicate_blocked` | A duplicate was caught and a sibling returned. `mode` is `lock_resolved` (sibling found immediately) or `lock_resolved_polled` (sibling found during the 3-s poll). |
| `message.api.duplicate_pending` | Lock held but sibling never written within the poll window. If you see a flood of these from one workspace, inspect `cp_outbound_idempotency` and `message_dedup_locks` for stale rows. |
| `message.api.duplicate_guard_error` | Non-E11000 Mongo error on the lock insert. Request proceeds without dedup protection; investigate Mongo health. |
What’s NOT changing
- `(workspaceId, contactIdentifiers)` unique index on `leads`. One phone = one lead per workspace.
- One conversation thread per `(workspaceId, recipientPhone)`.
- `Idempotency-Key` semantics on `/api/messages` (still 24 h, still preferred when callers can set it).
- HubSpot/GHL workflow plugin behavior: the guard fires identically for every caller because it sits on the shared `/api/messages` endpoint.
See also
- `api-reference/endpoint/send-message`: endpoint reference
- `features/send-messages`: feature overview
- Source: `src/lib/dedupGuard.ts`, `src/app/api/messages/route.ts` (tuco-app)