
Salesforce dedupe for teams that want to inspect the mess before they merge it

The buyer problem is not subtle. Duplicate leads keep reappearing, reps merge the wrong contacts, two account records fight over ownership, and the team does not trust the duplicate warnings Salesforce shows at save time. Native duplicate rules help with some entry-time alerts. They do not give you a reviewable cluster queue, a supervised merge plan, or receipts you can verify later.

The Gremlin pattern is audit-first dedupe: preflight the org, cluster probable duplicates with blocking-first matching, route the clusters to human review, then apply only approved operations with a receipt file and a follow-up verify step.

The three things buyers actually need

Most duplicate-cleanup projects fail because the team jumps from "we have duplicates" straight to "run the merge." The safer sequence is: investigate how the duplicates cluster, decide what belongs together, then execute under guardrails.

Audit before you merge

Preflight checks org identity, converted LeadStatus values, duplicate rules, and whether lead conversion can execute cleanly.

Cluster before you decide

The planner groups duplicate candidates into clusters with anchors, confidence, a recommended action, and a suggested survivor.

Operators approve the plan

Review happens in CSV or Sheets with approval_status, reviewer_notes, and override_master_id, not behind a hidden score threshold.

Every apply leaves evidence

Dry-runs and live applies can emit receipt files with plan digest, operation counts, skips, failures, and successes.

Why Salesforce duplicate rules stop early

Salesforce duplicate management is built around matching rules and duplicate rules. That is useful for spotting possible duplicates when a record is saved. It is not the same thing as clustering a messy CRM offline and preparing a supervised merge run.

The native model is record-save oriented: a user creates or updates a record, Salesforce checks matching rules, then a duplicate rule decides whether to alert or block. That catches some obvious cases, but it does not hand you a durable review queue with suggested survivors, cluster-level notes, and apply receipts.

Salesforce also makes you think in rules, not in clusters. You can have multiple rules on an object, but the engine still evaluates candidate duplicates through matching-rule logic and duplicate-record-set output. That is a different operating surface from "here are the eight records that collapse into one real customer, now approve or hold the cluster."

Gremlin takes the opposite route. It plans first. Blocking anchors shrink the search space, pair scoring creates duplicate edges, union-find turns those edges into clusters, and the human review queue becomes the real decision point.
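The edges-to-clusters step in that pipeline is standard union-find. A minimal stdlib sketch of the idea (the record IDs and edge list are illustrative, not Gremlin internals):

```python
def find(parent, x):
    # Path-halving find: walk to the root, shortening the chain as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster(edges):
    """Turn scored duplicate edges into connected-component clusters."""
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra  # union the two components
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), []).append(node)
    return sorted(sorted(g) for g in groups.values())

# Three pairwise edges collapse five records into two clusters.
edges = [("003A", "003B"), ("003B", "003C"), ("00QX", "00QY")]
print(cluster(edges))  # [['003A', '003B', '003C'], ['00QX', '00QY']]
```

The point of doing this offline is that the cluster, not the pair, becomes the unit a human approves.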

Audit-first difference

Alert-or-block is not cluster review

Native duplicate rules surface possible duplicates at save time.
Gremlin surfaces clusters, survivors, anchors, review notes, and approvals.
Native rules do not give you dry-run apply receipts and verify steps.
If lead dedupe touches conversion or routing behavior, inspect Lead Status before you run merges.

The four-phase loop in shipped code

This is the actual flow in the Salesforce dedupe path: preflight, cluster, review, then merge and verify.

Audit and preflight

Check org identity, converted LeadStatus values, duplicate rules, and anonymous Apex support before planning or applying anything.

Cluster

Run g-gremlin dedup enterprise-plan to build a MergePlan/v2 with blocking-first candidate generation, cluster types, recommended action, survivor, anchors, and review notes.

Review

Use --cluster-review-output to hand operators a cluster queue with approval_status, override_master_id, reviewer_notes, member IDs, confidence, and golden-record suggestions.

Merge and verify

Dry-run g-gremlin sfdc merge-apply-plan by default, then execute only approved operations with --apply, receipts, resumable state, and follow-up verify checks.

Plan the clusters

Real command names, not pseudocode.

g-gremlin dedup enterprise-plan \
  --source Account=accounts.csv \
  --source Contact=contacts.csv \
  --source Lead=leads.csv \
  --profile b2b_person \
  --output plan.json \
  --review-output review_rows.csv \
  --cluster-review-output review_clusters.csv \
  --overwrite

Dry-run, then apply

The CLI defaults to preview mode until --apply is present.

g-gremlin sfdc merge-apply-plan \
  --plan plan.json \
  --approval-file review_clusters.csv \
  --receipt-file receipt.json
g-gremlin sfdc merge-apply-plan \
  --plan plan.json \
  --approval-file review_clusters.csv \
  --receipt-file receipt.json \
  --state-file state.json \
  --apply

What blocking-first means in practice

Blocking-first means Gremlin does not start with an all-pairs comparison across the whole CRM export. It first groups records by high-signal anchors, then scores pairs within those candidate components. That keeps the review set smaller and makes the evidence easier to explain.

email

Exact email anchor used to seed high-confidence candidate components.

phone

Exact phone anchor after digit normalization, typically requiring at least seven digits.

domain_name

Normalized company domain plus normalized name, excluding consumer domains, to catch business-email duplicates without a full scan.

company_name

Normalized company plus normalized name, excluding noisy company values, to catch rows that share employer identity but not a clean email anchor.
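Under stated assumptions (illustrative record dicts, simplified normalization, a toy consumer-domain list; not Gremlin's actual blocking code), anchor-first grouping looks like this:

```python
import re
from collections import defaultdict

CONSUMER_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}  # illustrative list

def anchor_keys(record):
    """Yield (anchor_name, key) pairs for one record dict."""
    email = (record.get("email") or "").strip().lower()
    if email:
        yield ("email", email)
    digits = re.sub(r"\D", "", record.get("phone") or "")
    if len(digits) >= 7:  # mirror the seven-digit minimum described above
        yield ("phone", digits[-10:])
    name = (record.get("name") or "").strip().lower()
    domain = email.split("@")[-1] if "@" in email else ""
    if name and domain and domain not in CONSUMER_DOMAINS:
        yield ("domain_name", f"{domain}|{name}")

def block(records):
    """Group record indexes by anchor so pair scoring stays local."""
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        for anchor, key in anchor_keys(rec):
            blocks[(anchor, key)].append(i)
    # Only blocks with 2+ members produce candidate pairs.
    return {k: v for k, v in blocks.items() if len(v) > 1}

records = [
    {"email": "pat@acme.com", "phone": "(555) 010-7788", "name": "pat lee"},
    {"email": "pat@acme.com", "phone": "555.010.7788", "name": "patricia lee"},
    {"email": "sam@gmail.com", "phone": "12345", "name": "sam"},
]
print(block(records))  # email and phone anchors each group records 0 and 1
```

The gmail record falls out of the domain_name anchor entirely, which is the design: a consumer domain plus a name is weak evidence, so it never seeds a candidate block.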

Algorithm families available in code

The enterprise planner leans on ensembles by field, while the core engine exposes a broader list of algorithm families. This is why the right framing is exact anchors plus multiple scoring algorithms, not one magic match rule.

Email ensemble

Exact, JaroWinkler, WRatio

Phone ensemble

Exact, JaroWinkler

Domain ensemble

TokenSetRatio, WRatio, JaroWinkler

Name and company ensemble

WRatio, TokenSetRatio, PartialRatio

Core engine families

WRatio, QRatio, Ratio, PartialRatio, TokenSetRatio, TokenSortRatio, Jaro, JaroWinkler, Soundex, Exact, Domain, ExactDomain
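The ensemble idea can be sketched with stdlib stand-ins (difflib for Ratio-style scoring and a toy TokenSetRatio; these are illustrative substitutes, not the planner's real scorers):

```python
import difflib

def ratio(a, b):
    """Ratio-style scorer on a 0-100 scale (difflib stand-in)."""
    return difflib.SequenceMatcher(None, a, b).ratio() * 100

def token_set_ratio(a, b):
    """Toy TokenSetRatio: compare intersection and remainder token strings."""
    ta, tb = set(a.split()), set(b.split())
    inter = " ".join(sorted(ta & tb))
    sa = (inter + " " + " ".join(sorted(ta - tb))).strip()
    sb = (inter + " " + " ".join(sorted(tb - ta))).strip()
    return max(ratio(inter, sa), ratio(inter, sb), ratio(sa, sb))

def exact(a, b):
    return 100.0 if a == b else 0.0

# One ensemble per field, echoing the families listed above.
ENSEMBLES = {
    "email": [exact, ratio],
    "name": [ratio, token_set_ratio],
}

def score_field(field, a, b):
    """Max over the field's ensemble; a real planner might weight instead."""
    a, b = a.strip().lower(), b.strip().lower()
    return max(fn(a, b) for fn in ENSEMBLES[field])

print(score_field("email", "Pat.Lee@acme.com", "pat.lee@acme.com"))  # 100.0
print(score_field("name", "patricia lee", "lee patricia"))           # 100.0
```

Note what the ensemble buys you: the name pair above fails a plain character-level Ratio because the tokens are reordered, but the token-set scorer still reports a perfect match.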

What the operator actually reviews

The review surface is cluster-first. That matters because people do not approve merges one row at a time in a vacuum. They approve a cluster, inspect the anchor evidence, accept or reject the recommended action, and optionally override the survivor.

Review queue fields

cluster_id

Stable cluster key used in review queues, apply operations, skips, and receipts.

cluster_type

same_object, lead_to_contact, or cross_object.

recommended_action

merge or lead_to_contact_resolution based on object mix inside the cluster.

anchors

The blocking evidence that pulled the records together in the first place.

merge_confidence

Heuristic confidence derived from anchors such as email, phone, domain_name, or company_name.

review_notes

Human-readable cautions such as shared inbox risk, missing exact anchors, large clusters, or mixed Lead/Contact review.

approval_status

approved, hold, rejected, or pending once the operator fills out the queue.

override_master_id

Optional operator override for the survivor record.
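The cluster_type and recommended_action fields compose into a small decision. A sketch of that mapping based on the field descriptions above (the exact rules and the cross_object default in the shipped planner are my assumptions):

```python
def classify_cluster(object_types):
    """Map the Salesforce objects in a cluster to (cluster_type, action).

    Mirrors the cluster_type / recommended_action descriptions above;
    the shipped planner's exact rules are not verified here.
    """
    kinds = set(object_types)
    if len(kinds) == 1:
        return ("same_object", "merge")
    if kinds == {"Lead", "Contact"}:
        return ("lead_to_contact", "lead_to_contact_resolution")
    return ("cross_object", "merge")  # assumption: default to supervised merge

print(classify_cluster(["Contact", "Contact"]))  # ('same_object', 'merge')
print(classify_cluster(["Lead", "Contact"]))
```

Either way, the output is advisory: the operator's approval_status and override_master_id have the final say.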

Example cluster queue row

Gremlin exports CSV for review, then the same file can gate apply.

cluster_id,cluster_type,recommended_action,merge_confidence,golden_record_id,anchors,review_notes,approval_status,override_master_id
cluster_0042,same_object,merge,0.99,003xx00001AAA,email,"Exact email but names diverge; verify shared inbox",approved,
cluster_0049,lead_to_contact,lead_to_contact_resolution,0.95,003xx00001BBB,phone|company_name,"Mixed Lead and Contact cluster; convert only after human review",hold,
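Gating apply on that file is simple to reason about. A hedged sketch of the consuming side (column names come from the example above; the filtering and survivor-precedence logic is illustrative, not Gremlin's implementation):

```python
import csv
import io

APPROVAL_CSV = """\
cluster_id,cluster_type,recommended_action,merge_confidence,golden_record_id,anchors,review_notes,approval_status,override_master_id
cluster_0042,same_object,merge,0.99,003xx00001AAA,email,"Exact email but names diverge",approved,
cluster_0049,lead_to_contact,lead_to_contact_resolution,0.95,003xx00001BBB,phone|company_name,"Convert only after review",hold,
"""

def approved_clusters(text):
    """Return {cluster_id: survivor_id} for approved rows only.

    override_master_id beats golden_record_id when the operator set one.
    """
    out = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["approval_status"].strip().lower() != "approved":
            continue  # hold / rejected / pending clusters never reach apply
        survivor = row["override_master_id"].strip() or row["golden_record_id"]
        out[row["cluster_id"]] = survivor
    return out

print(approved_clusters(APPROVAL_CSV))  # {'cluster_0042': '003xx00001AAA'}
```

The hold row silently drops out, which is the contract: anything less than an explicit approval stays untouched.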

Reason surfaces you can talk about

anchors_applied

The matched anchor used during blocking or pair scoring, surfaced on pair outputs and cluster payloads.

Email->Email

Pair-level evidence for exact email alignment when that signal is present.

Name->Name

Pair-level name similarity evidence when name similarity clears threshold.

Company->Company

Pair-level evidence for exact company alignment on Lead comparisons.

evidence_strength and evidence_flags

Lower-level pair outputs can carry a rollup score plus flags such as email_exact, name_strong, company_match, or ensemble_signals.

Receipt fields that matter

command and plan digest

The receipt stores the command name, plan path, plan digest, and approval-file digest when approvals are used.

execution posture

dry_run, resume, org alias, and optional state-file path.

operation evidence

All skipped, succeeded, and failed operations are serialized into the receipt payload.

verification path

Verification is separate: merge verify checks master/victim state, and convert-lead verify checks converted status and resulting records.

{
  "version": 1,
  "command": "g-gremlin sfdc merge-apply-plan",
  "plan": "plan.json",
  "plan_digest": "8dd3...",
  "approval_file": "review_clusters.csv",
  "dry_run": true,
  "operations_planned": 6,
  "operations_skipped": 2,
  "operations_succeeded": 0,
  "operations_failed": 0
}
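One thing the receipt enables is after-the-fact verification that the plan on disk is the plan that was applied. A sketch, assuming the digest is a hex SHA-256 of the plan file bytes (the hashing scheme and truncation handling are my assumptions, not a documented Gremlin contract):

```python
import hashlib
import json

def plan_digest(plan_path):
    """Hex SHA-256 of the plan file bytes (assumed digest scheme)."""
    with open(plan_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def receipt_matches_plan(receipt_path):
    """True when the receipt's plan_digest matches the plan on disk."""
    with open(receipt_path) as f:
        receipt = json.load(f)
    digest = plan_digest(receipt["plan"])
    stored = receipt["plan_digest"]
    # Receipts may store a truncated digest; compare on the shorter length.
    return digest.startswith(stored) or stored.startswith(digest)

# Hypothetical usage:
#   receipt_matches_plan("receipt.json")
```

If someone edits plan.json between dry-run and apply, this check fails, which is exactly the kind of drift a receipt exists to catch.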

When vendors are the right answer

Dedupely, Plauti, and Cloudingo are not the enemy here. They are often the better pick when you want broad in-org cleanup, larger admin-facing data quality programs, or native object coverage beyond a supervised plan-review-apply loop. The Gremlin advantage is not that other tools cannot dedupe. It is that the architecture starts with audit and human review.

Salesforce duplicate rules

Best for

Entry-time alerts and blocking on fields you can encode into matching rules.

Good at

Stopping some obvious duplicates before save and creating duplicate record sets for review.

Where the difference shows up

They are rule-driven and record-save oriented, not offline cluster planning with explicit survivor review, receipts, and staged execution.

Dedupely

Best for

Continuous duplicate control with customizable merge rules and filters across native and custom Salesforce objects.

Good at

Admin-friendly cleanup, merge controls, and ongoing duplicate maintenance inside the Salesforce data stack.

Where the difference shows up

The core mental model is in-app duplicate management, not an offline audit-first loop with plan artifacts, approval CSVs, receipts, and verify steps.

Plauti

Best for

Salesforce-native dedupe with review queues, auto-merge options, merge rules, and broader object coverage.

Good at

Accounts, Contacts, Leads, custom objects, large data volumes, and org-native operational workflows.

Where the difference shows up

Plauti is built to resolve duplicates in-platform. Gremlin wins when the buyer wants an export-plan-review-apply path with explicit human gating and CLI artifacts.

Cloudingo

Best for

Admin-led Salesforce cleanup, import dedupe, and broader data hygiene tasks with no-code filters and rules.

Good at

Find, merge, prevent, import, standardize, and bulk-clean records in Salesforce and adjacent import flows.

Where the difference shows up

It is a full data-cleaning toolchain, not a narrower audit-first cluster review workflow for supervised merge plans and post-run verification.

Gremlin audit-first dedupe

Best for

Operators who want to inspect duplicates before merging, review clusters in CSV or Sheets, and keep receipts on supervised apply.

Good at

Blocking-first planning, human approval queues, dry-runs, resumable execution, and Salesforce verify checks.

Where the difference shows up

The public workflow is strongest for Salesforce Contact and Lead dedupe. It is not the broadest native object-cleanup platform, and I did not verify a rollback command.

FAQ

How is this different from Salesforce duplicate rules?

Salesforce duplicate rules look for possible duplicates when a record is created or updated. Gremlin builds an offline merge plan first, groups duplicate rows into clusters, lets a human approve or reject each cluster, dry-runs the apply step by default, and can verify the results afterward. The difference is not just matching logic. It is the review and execution contract.

Does Gremlin auto-merge everything?

No. The shipped Salesforce flow is supervised. The planner writes review queues with approval_status and optional override_master_id. The apply command is dry-run by default and only executes live when you pass --apply. Complex mixed Lead and Contact clusters can still be skipped for manual review.

What can the Salesforce apply layer merge today?

The apply layer supports same-object Account, Contact, and Lead merges. It also supports a narrower Lead-to-Contact conversion path for approved mixed clusters when the cluster is exactly one Lead plus one Contact and a converted-status value is provided.

Is there an undo button?

I did not verify a shipped dedupe-specific rollback command in the Salesforce workflow. What exists today is dry-run by default, resumable state, receipts, and post-run verification. That is safer than blind bulk merge, but it is not the same thing as true undo.

When should I pick Plauti, Dedupely, or Cloudingo instead?

Pick those tools when the real job is broad in-org data cleanup, account and custom-object maintenance inside Salesforce, auto-merge rules, or admin-led bulk operations. Pick audit-first dedupe when the job is to inspect the duplicate problem first, route ambiguity to humans, and keep an explicit plan-review-apply-verify loop with artifacts outside the org.

Keep the conversation going

These pages are meant to help operators solve real problems. If you want the next guide, grab the low-friction option. If you need the implementation, not just the guide, book time.

Stay in the loop

Get the next guide when it ships

I publish architecture guides grounded in real implementations. No generic AI filler.

Use your work email so I can keep the list useful and relevant.

Book Mike directly

Need the implementation, not just the guide?

Book a 15-minute working session with Mike right on his calendar. Tooling, consulting, or a mix of both is fine.

Open Mike's calendar

If you want me to come in with context, leave your email and a short note before the call.

I'll route new requests into the internal website inquiries inbox so I can follow up fast.