
Salesforce Contact & Lead Dedupe

Maintain
~1 hour

Duplicates are piling up across Contacts and Leads. You need to find them, get RevOps to review, and merge only what they approve — with receipts for every operation.

The Slack Message

VP Revenue Operations, 9:14 AM

We've got duplicates everywhere in Salesforce — Contacts with the same email, Leads that are clearly existing customers. Can you clean this up? But I need to review everything before you merge anything. Give me a spreadsheet I can go through with the team.

The Prompt

Kicked off from the terminal with the full workflow spec.

We have ~15K Contacts and ~10K Leads in Salesforce and duplicates are everywhere.
I need to:
1. Export Contacts and Leads
2. Find all duplicate clusters
3. Get a review queue I can hand to the RevOps team in Sheets
4. After they approve, apply only the approved merges with receipts

Use the b2b_person profile. Don't merge anything without my approval.

The Review Queue Is the Product

This is not “magic merge.” Gremlin finds duplicates, recommends the survivor record, and generates a review queue your team can approve in Sheets. Export → plan → review → approve → apply. Nothing changes until your team says go.

Core Workflow

Export, plan, review, dry-run, apply. Every step is supervised.

1. Export from Salesforce: pull Contacts and Leads via a SOQL snapshot, or use existing CSVs.
2. Plan & Cluster: blocking-first candidate generation, fuzzy scoring, cluster classification, and survivor selection.
3. Review Queue: a cluster review CSV with recommended actions, confidence, and merge rationale. Open it in Sheets.
4. Dry-Run Preview: preview every merge and conversion operation before anything changes.
5. Apply Approved: execute only approved operations, with receipts, state files, and resume support.
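Blocking-first candidate generation is what keeps the Plan & Cluster step tractable: comparing all 25,001 records pairwise would mean about 312 million comparisons, so records are only compared when they share a cheap exact key. The Python sketch below illustrates the idea; it is not Gremlin's planner. The record fields, the key normalization, the `max_block` cap, and the use of `difflib.SequenceMatcher` as the fuzzy name scorer are all illustrative assumptions.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def blocking_keys(rec):
    """Yield (key_type, value) pairs; records sharing any key become candidates."""
    email = (rec.get("email") or "").strip().lower()
    if email:
        yield ("email", email)
    digits = re.sub(r"\D", "", rec.get("phone") or "")
    if len(digits) >= 7:
        yield ("phone", digits[-10:])  # assumption: compare on the last 10 digits
    company = (rec.get("company") or "").strip().lower()
    if company:
        yield ("company", company)

def candidate_pairs(records, max_block=50):
    """Pairs of record ids that share at least one blocking key.

    Oversized blocks (e.g. a very common company name) are skipped instead of
    expanded, a common guard against quadratic blow-up.
    """
    blocks = defaultdict(set)
    for rec in records:
        for key in blocking_keys(rec):
            blocks[key].add(rec["id"])
    pairs = set()
    for ids in blocks.values():
        if len(ids) > max_block:
            continue
        pairs.update(combinations(sorted(ids), 2))
    return pairs

def scored_pairs(records, threshold=0.8):
    """Keep candidate pairs whose fuzzy name similarity clears the threshold."""
    by_id = {r["id"]: r for r in records}
    kept = []
    for a, b in sorted(candidate_pairs(records)):
        score = SequenceMatcher(
            None, by_id[a]["name"].lower(), by_id[b]["name"].lower()
        ).ratio()
        if score >= threshold:
            kept.append((a, b, round(score, 2)))
    return kept

# Hypothetical sample records (a Contact, a matching Lead, and a coworker).
records = [
    {"id": "003A", "name": "Dana Lee", "email": "dana@acme.com", "company": "Acme"},
    {"id": "00QA", "name": "Dana  Lee", "email": "DANA@acme.com", "company": "Acme Inc"},
    {"id": "003B", "name": "Sam Ortiz", "email": "sam@acme.com", "company": "Acme"},
]
print(sorted(candidate_pairs(records)))  # both pairs that share a key
print(scored_pairs(records))             # only the Dana Lee pair survives scoring
```

Only pairs that survive blocking are ever scored, which is why the choice of blocking keys (and which keys a profile leaves out) matters so much.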

After you paste this prompt, Claude Code:

"15K Contacts and 10K Leads. I'll export both objects, run the enterprise planner with the b2b_person profile to avoid coworker-bridge false positives, then generate the review queue."

"Step 1: Export Contacts and Leads from Salesforce. Step 2: Run enterprise-plan with b2b_person profile. Step 3: Generate cluster review CSV. Step 4: Wait for human approval. Step 5: Dry-run, then apply."

Running
$ g-gremlin sfdc snapshot --object Contact --fields Id,Name,Email,Phone,AccountId,Account.Name,Account.Website --out contacts.csv
Exported 15,001 Contact records
Running
$ g-gremlin sfdc snapshot --object Lead --fields Id,Name,Email,Phone,Company,Website,Status --out leads.csv
Exported 10,000 Lead records

"Good. 25K total records. The b2b_person profile uses email, phone, and company_name blocking — no domain_name blocking, which avoids chaining coworkers who share a domain."
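The coworker-bridge problem comes from transitive closure: once candidate links are grouped into clusters, two records that each link to a third land in the same cluster even if they never matched each other. A shared email domain links everyone at a company. The hypothetical Python sketch below (union-find over blocking keys; the field names and key functions are assumptions, not the b2b_person profile's internals) shows how adding a `domain` key collapses three different people into one cluster.

```python
from collections import defaultdict

class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def cluster(records, key_funcs):
    """Union records sharing any blocking key; return clusters with 2+ members."""
    uf = UnionFind()
    first_seen = {}
    for rec in records:
        uf.find(rec["id"])  # register singletons too
        for key_name, key_fn in key_funcs.items():
            value = key_fn(rec)
            if not value:
                continue
            key = (key_name, value)
            if key in first_seen:
                uf.union(rec["id"], first_seen[key])
            else:
                first_seen[key] = rec["id"]
    groups = defaultdict(set)
    for rec in records:
        groups[uf.find(rec["id"])].add(rec["id"])
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

def email_key(rec):
    return rec["email"].strip().lower()

def domain_key(rec):
    return email_key(rec).rpartition("@")[2]

records = [
    {"id": "C1", "email": "alice@acme.com"},
    {"id": "L1", "email": "Alice@acme.com"},  # true duplicate of C1
    {"id": "C2", "email": "bob@acme.com"},    # coworker, different person
    {"id": "C3", "email": "carol@acme.com"},  # coworker, different person
]

print(cluster(records, {"email": email_key}))                        # real duplicate only
print(cluster(records, {"email": email_key, "domain": domain_key}))  # coworkers chained
```

Because clustering is transitive, one over-broad key is enough to bridge unrelated people; dropping it is cheaper than trying to un-chain clusters afterwards.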

Running
$ g-gremlin dedup enterprise-plan \
    --source Contact=contacts.csv \
    --source Lead=leads.csv \
    --profile b2b_person \
    --output plan.json \
    --review-output review_rows.csv \
    --cluster-review-output review_clusters.csv \
    --overwrite
Planning complete: 25,001 records → 8 clusters identified in 7.08s. Largest cluster: 4 members.

"8 clusters from 25K records — that's clean data with a few real duplicates. The review_clusters.csv has cluster_id, recommended_action, merge_confidence, golden_record_id, and review_notes. I'll hand this to the team."

Plan generated. review_clusters.csv ready for RevOps review in Sheets. 8 clusters, 0 coworker-bridge false positives.

The Review Queue

This is what your RevOps team sees. Each cluster has a recommended action, a confidence score, and a merge rationale (review_notes). Reviewers fill in approval_status and can optionally override the recommended master record (golden_record_id).

cluster_id | recommended_action         | merge_confidence | golden_record_id | review_notes                                                   | approval_status
C-001      | merge                      | 0.97             | 003A             | Exact email match                                              | approved
C-002      | lead_to_contact_resolution | 0.92             | 003B             | Lead matches existing Contact                                  | approved
C-003      | merge                      | 0.78             | 003C             | Mixed cluster: multiple Leads + Contacts; skipped during apply | rejected
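Conceptually, the approval gate is a filter on approval_status. The minimal Python sketch below uses the column names from the review queue table; the `approved_clusters` helper is hypothetical and not part of g-gremlin (which takes the sheet via `--approval-file`). It fails closed: blank cells, typos, and `rejected` are all treated as "not approved".

```python
import csv
import io

# Sample rows mirroring the review queue table; golden_record_id values are
# the shortened placeholders shown there, not real Salesforce Ids.
SAMPLE = """cluster_id,recommended_action,merge_confidence,golden_record_id,review_notes,approval_status
C-001,merge,0.97,003A,Exact email match,approved
C-002,lead_to_contact_resolution,0.92,003B,Lead matches existing Contact,approved
C-003,merge,0.78,003C,Mixed cluster: multiple Leads + Contacts,rejected
"""

def approved_clusters(fileobj):
    """Return only rows explicitly marked 'approved' (case/whitespace-insensitive)."""
    rows = []
    for row in csv.DictReader(fileobj):
        if (row.get("approval_status") or "").strip().lower() == "approved":
            rows.append(row)
    return rows

for row in approved_clusters(io.StringIO(SAMPLE)):
    print(row["cluster_id"], row["recommended_action"], row["golden_record_id"])
```

Failing closed means a reviewer who forgets to fill in a cell delays a merge instead of triggering one.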
Once RevOps returns the approved sheet, Claude Code:

"RevOps approved 6 of 8 clusters. 2 mixed clusters (multiple Leads + multiple Contacts) were rejected — those get skipped during apply. Let me dry-run first, then apply."
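Whether a cluster is a same-object merge, a Lead-to-Contact resolution, or a mixed case can be read off its member Ids, since Salesforce Contact Ids start with the key prefix `003` and Lead Ids with `00Q`. The Python sketch below is a hypothetical, deliberately conservative routing rule (any cluster with more than one Lead alongside Contacts goes to manual review); it illustrates the classification idea and is not the rule Gremlin's planner uses.

```python
from collections import Counter

# Standard Salesforce key prefixes: the first three characters of a record Id.
PREFIX_TO_OBJECT = {"003": "Contact", "00Q": "Lead"}

def object_type(record_id):
    try:
        return PREFIX_TO_OBJECT[record_id[:3]]
    except KeyError:
        raise ValueError(f"unexpected Id prefix: {record_id!r}") from None

def classify(member_ids):
    """Hypothetical routing rule for one duplicate cluster."""
    counts = Counter(object_type(i) for i in member_ids)
    leads, contacts = counts["Lead"], counts["Contact"]
    if leads == 0 or contacts == 0:
        return "merge"  # same-object merge: all Contacts or all Leads
    if leads == 1:
        return "lead_to_contact_resolution"  # one Lead matching existing Contact(s)
    return "manual_review"  # several Leads mixed with Contacts: skip during apply

for members in (["003A", "003B"], ["00QA", "003B"], ["00QA", "00QB", "003C", "003D"]):
    print(members, classify(members))
```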

Running
$ g-gremlin sfdc merge-apply-plan \
    --plan plan.json \
    --approval-file review_clusters.csv \
    --receipt-file receipt.json
Dry-run complete. 6 approved clusters: 4 same-object Contact merges, 2 Lead-to-Contact conversions. 0 errors.

"Dry-run looks clean. 4 Contact merges (2 victims each), 2 Lead conversions. Applying with state file for resume support."
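The "2 victims each" detail lines up with a Salesforce API limit: a single merge request combines one master record with at most two other records of the same object. A cluster with more members (the plan's largest had 4) therefore needs more than one request. The `merge_calls` helper below is a hypothetical Python sketch of that batching, not part of g-gremlin.

```python
def merge_calls(master_id, victim_ids, max_victims=2):
    """Split victims into merge requests of at most `max_victims` each.

    Salesforce merges one master with up to two other records of the same
    object per request, hence the default of 2.
    """
    if master_id in victim_ids:
        raise ValueError("master cannot also be a victim")
    return [
        {"master": master_id, "victims": victim_ids[i:i + max_victims]}
        for i in range(0, len(victim_ids), max_victims)
    ]

# A hypothetical 4-member cluster: one survivor plus three duplicates.
for call in merge_calls("003A", ["003B", "003C", "003D"]):
    print(call)
```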

Running
$ g-gremlin sfdc merge-apply-plan \
    --plan plan.json \
    --approval-file review_clusters.csv \
    --receipt-file receipt.json \
    --state-file state.json \
    --apply \
    --converted-status "Qualified" \
    --progress-every 1
Applied 6/6 approved operations. 4 Contact merges succeeded. 2 Lead conversions succeeded. 0 failures. Receipt written to receipt.json.

All approved merges applied. 2 clusters remain for manual review. Full receipt in receipt.json with before/after state.
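Resume support boils down to persisting which operations have finished. The minimal Python sketch below assumes a hypothetical JSON layout, `{"completed": [...]}` (g-gremlin's actual state.json format is not documented here): completed cluster Ids are written after every successful operation via an atomic rename, so an interrupted run can simply be rerun.

```python
import json
import os
import tempfile

def apply_with_state(ops, state_path, execute):
    """Run ops in order, skipping any already recorded in the state file.

    Returns receipts for the ops executed in this invocation only.
    """
    done = set()
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = set(json.load(f)["completed"])
    receipts = []
    for op in ops:
        if op["cluster_id"] in done:
            continue
        result = execute(op)
        receipts.append({"cluster_id": op["cluster_id"], "result": result})
        done.add(op["cluster_id"])
        tmp = state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"completed": sorted(done)}, f)
        os.replace(tmp, state_path)  # atomic swap: never a half-written state file
    return receipts

# Demo: simulate Ctrl-C during the second operation, then rerun.
state_file = os.path.join(tempfile.mkdtemp(), "state.json")
ops = [{"cluster_id": c} for c in ("C-001", "C-002", "C-003")]
calls = []
interrupt_once = {"C-002"}

def fake_execute(op):
    calls.append(op["cluster_id"])
    if op["cluster_id"] in interrupt_once:
        interrupt_once.discard(op["cluster_id"])
        raise KeyboardInterrupt  # stands in for Ctrl-C
    return "ok"

try:
    apply_with_state(ops, state_file, fake_execute)
except KeyboardInterrupt:
    print("interrupted; progress saved")

resumed = apply_with_state(ops, state_file, fake_execute)
print([r["cluster_id"] for r in resumed])  # C-001 is not re-run
```

A crash between a successful API call and the state write would re-attempt that one operation on resume, which is one reason per-operation receipts are worth keeping alongside the state file.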

Safety Guarantees

Matching runs locally on your machine. No CRM data leaves your environment.
Dry-run by default. Nothing changes until you pass --apply.
Receipts and resumable state files for every operation. Ctrl-C saves progress.
Review gates stay in front of execution. Complex clusters flagged for manual review.
Writes go to your Salesforce org using your own auth context.
Retries with exponential backoff on lock errors. --fail-fast available if preferred.
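Lock errors (Salesforce reports them as `UNABLE_TO_LOCK_ROW`) are usually transient, which is why retrying with growing delays works. The Python sketch below shows exponential backoff with a fail-fast switch in the spirit of `--fail-fast`; the `LockError` class, the delay schedule, and the jitter are illustrative assumptions, not g-gremlin's implementation.

```python
import random
import time

class LockError(Exception):
    """Stand-in for a Salesforce row-lock failure (UNABLE_TO_LOCK_ROW)."""

def with_backoff(fn, max_attempts=5, base_delay=1.0, fail_fast=False,
                 sleep=time.sleep, jitter=random.random):
    """Call fn(); on LockError wait base_delay * 2**(attempt-1) (+ jitter) and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except LockError:
            if fail_fast or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + delay * 0.5 * jitter())  # up to +50% jitter

attempts = {"n": 0}

def flaky_merge():
    """Fails with a lock error twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LockError("UNABLE_TO_LOCK_ROW")
    return "merged"

delays = []
result = with_backoff(flaky_merge, sleep=delays.append, jitter=lambda: 0.0)
print(result, delays)

attempts["n"] = 0
fast_delays = []
try:
    with_backoff(flaky_merge, fail_fast=True, sleep=fast_delays.append)
    fast_raised = False
except LockError:
    fast_raised = True
print("fail-fast raised:", fast_raised)
```

Injecting `sleep` and `jitter` keeps the retry policy testable without actually waiting.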

Requirements

Salesforce Connected App (required): export and merge operations.
Gremlin CLI 0.1.21+ (required): enterprise dedupe commands.
Contact and Lead exports (required): CSV or live SOQL snapshot.
Author Apex permission (required for Lead-to-Contact conversion).

Results

25,001 records scanned
8 clusters identified
7.08s planning time
6 approved operations applied (4 Contact merges, 2 Lead-to-Contact conversions)
Zero coworker-bridge false positives: the b2b_person profile eliminated coworker-bridge matches.
Full audit trail: a receipt with before/after state for every merged record.

Try This Workflow

Start with two small CSVs locally. No Salesforce connection needed to see the review queue.
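A quick way to get two small local CSVs is to write them yourself. The Python sketch below uses made-up sample rows; the headers mirror the `--fields` lists passed to `g-gremlin sfdc snapshot` in this playbook, on the assumption (not confirmed here) that the planner expects the same column names the snapshot writes.

```python
import csv

# Headers mirror the snapshot --fields lists (assumed to match the export format).
CONTACT_FIELDS = ["Id", "Name", "Email", "Phone", "AccountId", "Account.Name", "Account.Website"]
LEAD_FIELDS = ["Id", "Name", "Email", "Phone", "Company", "Website", "Status"]

# Made-up sample rows: one duplicate Contact pair, a coworker, and a matching Lead.
contacts = [
    ["003000000000001", "Dana Lee", "dana@example.com", "555-0100", "001000000000001", "Example Co", "example.com"],
    ["003000000000002", "Dana Lee", "DANA@example.com", "", "001000000000001", "Example Co", "example.com"],
    ["003000000000003", "Sam Ortiz", "sam@example.com", "555-0101", "001000000000001", "Example Co", "example.com"],
]
leads = [
    ["00Q000000000001", "Dana Lee", "dana@example.com", "555-0100", "Example Co", "example.com", "Open - Not Contacted"],
    ["00Q000000000002", "Riley Chen", "riley@another.example", "", "Another Inc", "another.example", "Open - Not Contacted"],
]

def write_csv(path, header, rows):
    """Write a header row followed by data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

write_csv("contacts.csv", CONTACT_FIELDS, contacts)
write_csv("leads.csv", LEAD_FIELDS, leads)
print("wrote contacts.csv and leads.csv")
```

From there, the same `g-gremlin dedup enterprise-plan` invocation used earlier in this playbook can be pointed at contacts.csv and leads.csv to produce a review queue without touching Salesforce.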

Playbook: Salesforce Contact & Lead Dedupe | Gremlin CLI | FoundryOps