Salesforce dedupe for teams that want to inspect the mess before they merge it
The buyer problem is not subtle. Duplicate leads keep reappearing, reps merge the wrong contacts, two account records fight over ownership, and the team does not trust the duplicate warnings Salesforce shows at save time. Native duplicate rules help with some entry-time alerts. They do not give you a reviewable cluster queue, a supervised merge plan, or receipts you can verify later.
The Gremlin pattern is audit-first dedupe: preflight the org, cluster probable duplicates with blocking-first matching, route the clusters to human review, then apply only approved operations with a receipt file and a follow-up verify step.
The three things buyers actually need
Most duplicate-cleanup projects fail because the team jumps from "we have duplicates" straight to "run the merge." The safer sequence is: investigate how the duplicates cluster, decide what belongs together, then execute under guardrails.
Audit before you merge
Preflight checks org identity, converted LeadStatus values, duplicate rules, and whether lead conversion can execute cleanly.
Cluster before you decide
The planner groups duplicate candidates into clusters with anchors, confidence, a recommended action, and a suggested survivor.
Operators approve the plan
Review happens in CSV or Sheets with approval_status, reviewer_notes, and override_master_id, not in a hidden score threshold.
Every apply leaves evidence
Dry-runs and live applies can emit receipt files with plan digest, operation counts, skips, failures, and successes.
Why Salesforce duplicate rules stop early
Salesforce duplicate management is built around matching rules and duplicate rules. That is useful for spotting possible duplicates when a record is saved. It is not the same thing as clustering a messy CRM offline and preparing a supervised merge run.
The native model is record-save oriented: a user creates or updates a record, Salesforce checks matching rules, then a duplicate rule decides whether to alert or block. That catches some obvious cases, but it does not hand you a durable review queue with suggested survivors, cluster-level notes, and apply receipts.
Salesforce also makes you think in rules, not in clusters. You can have multiple rules on an object, but the engine still evaluates candidate duplicates through matching-rule logic and duplicate-record-set output. That is a different operating surface from "here are the eight records that collapse into one real customer, now approve or hold the cluster."
Gremlin takes the opposite route. It plans first. Blocking anchors shrink the search space, pair scoring creates duplicate edges, union-find turns those edges into clusters, and the human review queue becomes the real decision point.
Alert-or-block is not cluster review
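To make the "edges into clusters" step concrete, here is a minimal union-find sketch. It is an illustration of the general technique, not Gremlin's shipped code: the function name cluster_pairs, the record IDs, and the edge list are all hypothetical.

```python
from collections import defaultdict

def cluster_pairs(record_ids, edges):
    """Group records into clusters from duplicate edges using union-find."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as root so the result is deterministic.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in edges:
        union(a, b)

    clusters = defaultdict(list)
    for rid in record_ids:
        clusters[find(rid)].append(rid)
    # Records with no duplicate edge stay out of the review queue.
    return sorted(sorted(m) for m in clusters.values() if len(m) > 1)

# Hypothetical IDs: two leads and three contacts.
ids = ["L1", "L2", "C1", "C2", "C3"]
edges = [("L1", "C1"), ("C1", "L2"), ("C2", "C3")]
clusters = cluster_pairs(ids, edges)
print(clusters)
```

Note that L1 and L2 end up in the same cluster even though no edge connects them directly. That transitive behavior is why cluster-level human review matters: one weak edge can pull unrelated records together.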
The four-phase loop in shipped code
This is the actual flow in the Salesforce dedupe path: preflight, cluster, review, then merge and verify.
Audit and preflight
Check org identity, converted LeadStatus values, duplicate rules, and anonymous Apex support before planning or applying anything.
Cluster
Run g-gremlin dedup enterprise-plan to build a MergePlan/v2 with blocking-first candidate generation, cluster types, recommended action, survivor, anchors, and review notes.
Review
Use --cluster-review-output to hand operators a cluster queue with approval_status, override_master_id, reviewer_notes, member IDs, confidence, and golden-record suggestions.
Merge and verify
g-gremlin sfdc merge-apply-plan dry-runs by default. Pass --apply to execute only approved operations, with receipts, resumable state, and follow-up verify checks.
Plan the clusters
Real command names, not pseudocode.
g-gremlin dedup enterprise-plan \
--source Account=accounts.csv \
--source Contact=contacts.csv \
--source Lead=leads.csv \
--profile b2b_person \
--output plan.json \
--review-output review_rows.csv \
--cluster-review-output review_clusters.csv \
--overwrite
Dry-run, then apply
The CLI defaults to preview mode until --apply is present.
g-gremlin sfdc merge-apply-plan \
--plan plan.json \
--approval-file review_clusters.csv \
--receipt-file receipt.json

g-gremlin sfdc merge-apply-plan \
--plan plan.json \
--approval-file review_clusters.csv \
--receipt-file receipt.json \
--state-file state.json \
--apply
What blocking-first means in practice
Blocking-first means Gremlin does not start with an all-pairs comparison across the whole CRM export. It first groups records by high-signal anchors, then scores pairs within those candidate components. That keeps the review set smaller and makes the evidence easier to explain.
Exact email anchor used to seed high-confidence candidate components.
Exact phone anchor after digit normalization, typically requiring at least seven digits.
Normalized company domain plus normalized name, excluding consumer domains, to catch business-email duplicates without a full scan.
Normalized company plus normalized name, excluding noisy company values, to catch rows that share employer identity but not a clean email anchor.
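The four anchors above can be sketched as blocking keys. This is a simplified illustration under stated assumptions, not the planner's actual implementation: the field names (id, name, email, phone, company), the consumer-domain and noisy-company lists, and the helper names are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Illustrative lists only; a real deployment would maintain its own.
CONSUMER_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}
NOISY_COMPANIES = {"n/a", "none", "self", "unknown"}

def norm_text(raw):
    """Lowercase and collapse whitespace."""
    return " ".join((raw or "").lower().split())

def norm_phone(raw):
    """Keep digits only; reject values shorter than seven digits."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) >= 7 else None

def blocking_keys(rec):
    keys = []
    name = norm_text(rec.get("name"))
    email = norm_text(rec.get("email"))
    if "@" in email:
        keys.append(("email", email))
        domain = email.rsplit("@", 1)[1]
        if name and domain not in CONSUMER_DOMAINS:
            keys.append(("domain_name", f"{domain}|{name}"))
    phone = norm_phone(rec.get("phone"))
    if phone:
        keys.append(("phone", phone))
    company = norm_text(rec.get("company"))
    if name and company and company not in NOISY_COMPANIES:
        keys.append(("company_name", f"{company}|{name}"))
    return keys

def candidate_pairs(records):
    """Return {(id_a, id_b): {anchor names}} for records sharing a key."""
    blocks = defaultdict(list)
    for rec in records:
        for key in blocking_keys(rec):
            blocks[key].append(rec["id"])
    pairs = defaultdict(set)
    for (anchor, _value), ids in blocks.items():
        for a, b in combinations(sorted(ids), 2):
            pairs[(a, b)].add(anchor)
    return pairs

records = [
    {"id": "00Q1", "name": "Ana Diaz", "email": "ana@acme.com",
     "phone": "(555) 010-2000", "company": "Acme"},
    {"id": "0031", "name": "Ana Diaz", "email": "ANA@acme.com",
     "phone": "555-010-2000", "company": "Acme Inc"},
    {"id": "00Q2", "name": "Bo Li", "email": "bo@gmail.com",
     "phone": "12345", "company": "n/a"},
]
pairs = candidate_pairs(records)
```

Only records that share at least one key are ever compared, and each pair carries the anchors that produced it, which is the evidence a reviewer later sees.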
Algorithm families available in code
The enterprise planner leans on ensembles by field, while the core engine exposes a broader list of algorithm families. That is why this page talks about exact anchors plus multiple scoring algorithms, not one magic match rule.
Exact, JaroWinkler, WRatio
Exact, JaroWinkler
TokenSetRatio, WRatio, JaroWinkler
WRatio, TokenSetRatio, PartialRatio
WRatio, QRatio, Ratio, PartialRatio, TokenSetRatio, TokenSortRatio, Jaro, JaroWinkler, Soundex, Exact, Domain, ExactDomain
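A per-field ensemble can be sketched with the standard library alone. To be clear about what is swapped in: difflib.SequenceMatcher stands in for a Ratio-style scorer, and a Jaccard token overlap stands in for a token-set scorer. Neither is the WRatio or JaroWinkler implementation the engine uses, and the field-to-scorer mapping and the unweighted average are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def exact(a, b):
    return 1.0 if a and a == b else 0.0

def ratio(a, b):
    # Stand-in for an edit-distance-style Ratio scorer.
    return SequenceMatcher(None, a, b).ratio() if a and b else 0.0

def token_overlap(a, b):
    # Simplified stand-in for a token-set scorer: Jaccard overlap of words.
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0

# Hypothetical mapping of fields to scorer ensembles.
ENSEMBLES = {
    "email": [exact, ratio],
    "name": [token_overlap, ratio],
    "company": [ratio, token_overlap],
}

def field_score(field, a, b):
    """Best score any scorer in the field's ensemble gives the pair."""
    a, b = a.lower().strip(), b.lower().strip()
    return max(fn(a, b) for fn in ENSEMBLES[field])

def pair_score(left, right):
    """Return (unweighted mean, per-field scores) for two records."""
    scores = {
        f: field_score(f, left.get(f, ""), right.get(f, ""))
        for f in ENSEMBLES
    }
    return sum(scores.values()) / len(scores), scores

left = {"email": "ana@acme.com", "name": "Ana Diaz", "company": "Acme"}
right = {"email": "ana@acme.com", "name": "Diaz Ana", "company": "Acme Inc"}
total, scores = pair_score(left, right)
```

Taking the maximum across an ensemble lets a reordered name ("Diaz Ana") still score as a strong match through token overlap even when the character-level scorer rates it lower.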
What the operator actually reviews
The review surface is cluster-first. That matters because people do not approve merges one row at a time in a vacuum. They approve a cluster, inspect the anchor evidence, accept or reject the recommended action, and optionally override the survivor.
Review queue fields
cluster_id
Stable cluster key used in review queues, apply operations, skips, and receipts.
cluster_type
same_object, lead_to_contact, or cross_object.
recommended_action
merge or lead_to_contact_resolution based on object mix inside the cluster.
anchors
The blocking evidence that pulled the records together in the first place.
merge_confidence
Heuristic confidence derived from anchors such as email, phone, domain_name, or company_name.
review_notes
Human-readable cautions such as shared inbox risk, missing exact anchors, large clusters, or mixed Lead/Contact review.
approval_status
approved, hold, rejected, or pending once the operator fills out the queue.
override_master_id
Optional operator override for the survivor record.
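As a rough sketch of how a filled-out queue could gate execution, the snippet below keeps only approved clusters and lets an operator override replace the suggested survivor. The column names follow the queue fields described here; the function name approved_operations, the golden_record_id fallback, and the sample rows are assumptions, not the CLI's actual behavior.

```python
import csv
import io

def approved_operations(csv_text):
    """Return one operation per approved cluster, honoring survivor overrides."""
    ops = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["approval_status"].strip().lower() != "approved":
            continue  # hold, rejected, and pending never reach apply
        # An operator override wins over the planner's suggested survivor.
        survivor = (row["override_master_id"].strip()
                    or row["golden_record_id"].strip())
        ops.append({
            "cluster_id": row["cluster_id"],
            "action": row["recommended_action"],
            "master_id": survivor,
        })
    return ops

# Made-up rows for illustration.
queue = """\
cluster_id,recommended_action,golden_record_id,approval_status,override_master_id
cluster_a,merge,003AAA,approved,
cluster_b,merge,003BBB,approved,003CCC
cluster_c,lead_to_contact_resolution,003DDD,hold,
"""
ops = approved_operations(queue)
```

The point of the design is that the default is exclusion: a cluster nobody touched stays pending and is skipped, rather than being merged because a score cleared a threshold.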
Example cluster queue row
Gremlin exports CSV for review, then the same file can gate apply.
cluster_id,cluster_type,recommended_action,merge_confidence,golden_record_id,anchors,review_notes,approval_status,override_master_id
cluster_0042,same_object,merge,0.99,003xx00001AAA,email,"Exact email but names diverge; verify shared inbox",approved,
cluster_0049,lead_to_contact,lead_to_contact_resolution,0.95,003xx00001BBB,phone|company_name,"Mixed Lead and Contact cluster; convert only after human review",hold,
Reason surfaces you can talk about
The matched anchor used during blocking or pair scoring, surfaced on pair outputs and cluster payloads.
Pair-level evidence for exact email alignment when that signal is present.
Pair-level name similarity evidence when name similarity clears threshold.
Pair-level evidence for exact company alignment on Lead comparisons.
Lower-level pair outputs can carry a rollup score plus flags such as email_exact, name_strong, company_match, or ensemble_signals.
Receipt fields that matter
The receipt stores the command name, plan path, plan digest, and approval-file digest when approvals are used.
The receipt also records dry_run, resume, the org alias, and the optional state-file path.
All skipped, succeeded, and failed operations are serialized into the receipt payload.
Verification is separate: merge verify checks master/victim state, and convert-lead verify checks converted status and resulting records.
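One way a plan digest makes a receipt checkable later is sketched below. The hash algorithm is an assumption (SHA-256 is used here; this page does not state which digest the CLI computes), and build_receipt and plan_matches_receipt are hypothetical helpers, not shipped functions. The key names mirror the receipt fields described here.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_receipt(plan_bytes, planned, outcomes, dry_run=True):
    """Summarize a run; outcomes is a list of {"op": ..., "status": ...}."""
    def count(status):
        return sum(1 for o in outcomes if o["status"] == status)
    return {
        "version": 1,
        "plan_digest": sha256_hex(plan_bytes),  # assumed algorithm
        "dry_run": dry_run,
        "operations_planned": planned,
        "operations_skipped": count("skipped"),
        "operations_succeeded": count("succeeded"),
        "operations_failed": count("failed"),
    }

def plan_matches_receipt(receipt, plan_bytes):
    """True if the plan on disk is byte-identical to the one that was run."""
    return receipt["plan_digest"] == sha256_hex(plan_bytes)

# Hypothetical plan content and a single skipped operation.
plan_bytes = json.dumps({"operations": ["op1", "op2", "op3"]},
                        sort_keys=True).encode("utf-8")
outcomes = [{"op": "op2", "status": "skipped"}]
receipt = build_receipt(plan_bytes, planned=3, outcomes=outcomes)

unchanged = plan_matches_receipt(receipt, plan_bytes)
tampered = plan_matches_receipt(receipt, plan_bytes + b" ")
```

Because the digest covers the exact bytes of the plan, even a one-character edit after the dry-run shows up as a mismatch, which is what lets a reviewer trust that the plan they approved is the plan that ran.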
{
"version": 1,
"command": "g-gremlin sfdc merge-apply-plan",
"plan": "plan.json",
"plan_digest": "8dd3...",
"approval_file": "review_clusters.csv",
"dry_run": true,
"operations_planned": 6,
"operations_skipped": 2,
"operations_succeeded": 0,
"operations_failed": 0
}
When vendors are the right answer
Dedupely, Plauti, and Cloudingo are not the enemy here. They are often the better pick when you want broad in-org cleanup, larger admin-facing data quality programs, or native object coverage beyond a supervised plan-review-apply loop. The Gremlin advantage is not that other tools cannot dedupe. It is that the architecture starts with audit and human review.
Salesforce duplicate rules
Entry-time alerts and blocking on fields you can encode into matching rules.
Stopping some obvious duplicates before save and creating duplicate record sets for review.
They are rule-driven and record-save oriented, not offline cluster planning with explicit survivor review, receipts, and staged execution.
Dedupely
Continuous duplicate control with customizable merge rules and filters across native and custom Salesforce objects.
Admin-friendly cleanup, merge controls, and ongoing duplicate maintenance inside the Salesforce data stack.
The core mental model is in-app duplicate management, not an offline audit-first loop with plan artifacts, approval CSVs, receipts, and verify steps.
Plauti
Salesforce-native dedupe with review queues, auto-merge options, merge rules, and broader object coverage.
Accounts, Contacts, Leads, custom objects, large data volumes, and org-native operational workflows.
Plauti is built to resolve duplicates in-platform. Gremlin wins when the buyer wants an export-plan-review-apply path with explicit human gating and CLI artifacts.
Cloudingo
Admin-led Salesforce cleanup, import dedupe, and broader data hygiene tasks with no-code filters and rules.
Find, merge, prevent, import, standardize, and bulk-clean records in Salesforce and adjacent import flows.
It is a full data-cleaning toolchain, not a narrower audit-first cluster review workflow for supervised merge plans and post-run verification.
Gremlin audit-first dedupe
Operators who want to inspect duplicates before merging, review clusters in CSV or Sheets, and keep receipts on supervised apply.
Blocking-first planning, human approval queues, dry-runs, resumable execution, and Salesforce verify checks.
The public workflow is strongest for Salesforce Contact and Lead dedupe. It is not the broadest native object-cleanup platform, and I did not verify a rollback command.
Read the cluster next
Every spoke links back here. Start with the query that matches the problem the buyer typed into Google, then come back to this page for the full operating model.
Why Salesforce duplicate rules do not work
Rule-based matching, duplicate-record-set limits, and why alert-or-block is not the same as reviewable clustering.
Merge duplicate leads in Salesforce safely
How to preflight converted status, dry-run the plan, and keep lead-status side effects in view.
Fix duplicate accounts in Salesforce
A candid page on account duplicate cleanup, what native rules do, and where the public Gremlin workflow stops today.
Audit duplicates before merging in Salesforce
What to inspect before a merge run: blocking anchors, review queues, approvals, dry-runs, and receipts.
Dedupe decision code reference
The operator-facing evidence surfaces: anchors, cluster type, merge confidence, review notes, approvals, and skips.
Salesforce dedupe vs Dedupely vs Plauti
An honest comparison of native and vendor-led cleanup paths versus Gremlin audit-first review and apply.
HubSpot duplicate contacts merge
HubSpot contact and company dedupe constraints, auto-association behavior, and the exact-key Gremlin merge path that ships today.
Salesforce dedupe playbook
The applied loop: export, enterprise-plan, review queue, dry-run, and approved apply.
Salesforce lead-status audit
Lead dedupe often wakes routing and status logic, so check what Lead Status actually controls before live merges.
FAQ
How is this different from Salesforce duplicate rules?
Salesforce duplicate rules look for possible duplicates when a record is created or updated. Gremlin builds an offline merge plan first, groups duplicate rows into clusters, lets a human approve or reject each cluster, dry-runs the apply step by default, and can verify the results afterward. The difference is not just matching logic. It is the review and execution contract.
Does Gremlin auto-merge everything?
No. The shipped Salesforce flow is supervised. The planner writes review queues with approval_status and optional override_master_id. The apply command is dry-run by default and only executes live when you pass --apply. Complex mixed Lead and Contact clusters can still be skipped for manual review.
What can the Salesforce apply layer merge today?
The apply layer supports same-object Account, Contact, and Lead merges. It also supports a narrower Lead-to-Contact conversion path for approved mixed clusters when the cluster is exactly one Lead plus one Contact and a converted-status value is provided.
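The eligibility rule in that answer can be written as a small decision function. This is a sketch of the stated rule, not the apply layer's code: it infers object type from the standard Salesforce ID key prefixes (001 Account, 003 Contact, 00Q Lead), and the function name and return labels are hypothetical.

```python
# Standard Salesforce key prefixes for the three supported objects.
PREFIX_TO_OBJECT = {"001": "Account", "003": "Contact", "00Q": "Lead"}

def classify_cluster(member_ids, converted_status=None):
    """Decide how a cluster could be applied, per the rule described above."""
    objects = sorted(PREFIX_TO_OBJECT.get(rid[:3], "Unknown")
                     for rid in member_ids)
    if len(objects) < 2 or "Unknown" in objects:
        return "manual_review"
    if len(set(objects)) == 1:
        return "merge"  # same-object Account, Contact, or Lead merge
    # Mixed clusters qualify only as exactly one Lead plus one Contact,
    # and only when a converted-status value has been supplied.
    if objects == ["Contact", "Lead"] and converted_status:
        return "lead_to_contact"
    return "manual_review"

# Hypothetical shortened IDs for illustration.
same = classify_cluster(["003A", "003B", "003C"])
convertible = classify_cluster(["00QA", "003B"], converted_status="Qualified")
no_status = classify_cluster(["00QA", "003B"])
too_big = classify_cluster(["00QA", "00QB", "003C"], converted_status="Qualified")
```

Anything that falls outside the two supported shapes lands in manual review rather than being forced through, which matches the supervised posture of the rest of the flow.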
Is there an undo button?
I did not verify a shipped dedupe-specific rollback command in the Salesforce workflow. What exists today is dry-run by default, resumable state, receipts, and post-run verification. That is safer than blind bulk merge, but it is not the same thing as true undo.
When should I pick Plauti, Dedupely, or Cloudingo instead?
Pick those tools when the real job is broad in-org data cleanup, account and custom-object maintenance inside Salesforce, auto-merge rules, or admin-led bulk operations. Pick audit-first dedupe when the job is to inspect the duplicate problem first, route ambiguity to humans, and keep an explicit plan-review-apply-verify loop with artifacts outside the org.