ICP & Buyer Persona Analysis

Build
~3 days end-to-end

A Series B enterprise SaaS company needed to know exactly which companies to target and which people within those companies to engage. They had six years of HubSpot data (13,000+ deals, 500K+ contacts) but no systematic analysis of what actually predicts a win. We built the full picture in three days.

The Request

From the VP of Sales, through the fund's ops team.

We've been selling for six years. We have thousands of deals in HubSpot but no data-backed answer to two questions: which companies should we target, and who inside those companies should we talk to?

I need:
1. Full ICP analysis — industry, company size, win rates, retention rates, deal values
2. Buyer persona analysis — committee composition, which functions predict wins, seniority patterns
3. All grounded in our actual deal history, not assumptions

Our HubSpot portal has 13,000+ deals and 500K+ contacts. Build me the full picture.

Three Problems That Block a Naive Analysis

The Ghost Contact Problem

Only 61% of closed deals had any contact associated in HubSpot. Sales reps were closing deals but not logging who was involved. Any persona analysis built only on explicit CRM associations would miss nearly 40% of deals.

The 10,000-Record API Cap

HubSpot's engagement API returns a maximum of 10,000 records per type per pull, most-recent-first. A standard extraction captured only recent activity, losing five years of historical engagement data needed for pattern analysis.

Unstructured Titles

With 500K+ contacts, job titles ranged from "VP, Strategic Sourcing & Supplier Diversity" to "admin" to blank fields. No standard taxonomy existed for grouping contacts into functional buyer personas.
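One way to impose a taxonomy on titles like these is an ordered list of keyword rules with an explicit fallback bucket. The sketch below is illustrative only: the function and seniority names, the regex patterns, and `classify_title` are assumptions, not the taxonomy used in this engagement (which had 10 functions and 6 seniority levels).

```python
import re

# Hypothetical keyword rules; the real taxonomy is not listed in this write-up.
FUNCTION_RULES = [
    ("Procurement", r"\b(procurement|sourcing|purchasing|supplier)\b"),
    ("Finance", r"\b(finance|financial|accounting|controller|cfo)\b"),
    ("Legal/Compliance", r"\b(legal|counsel|compliance)\b"),
    ("IT/Security/Data", r"\b(it|security|data|engineering|cio|cto)\b"),
]
SENIORITY_RULES = [
    ("C-Level", r"\b(chief|ceo|cfo|cio|cto|coo)\b"),
    ("VP", r"\b(vp|vice president)\b"),
    ("Director", r"\b(director|head of)\b"),
    ("Manager", r"\b(manager|lead)\b"),
]


def classify_title(title):
    """Return (function, seniority); fall back to 'Unclassified'/'Unknown'."""
    text = (title or "").strip().lower()
    # First matching rule wins, so rule order encodes priority.
    function = next(
        (name for name, pattern in FUNCTION_RULES if re.search(pattern, text)),
        "Unclassified",
    )
    seniority = next(
        (name for name, pattern in SENIORITY_RULES if re.search(pattern, text)),
        "Unknown",
    )
    return function, seniority
```

Under these assumed rules, "VP, Strategic Sourcing & Supplier Diversity" maps to Procurement at VP level, while "admin" and blank titles land in the fallback bucket instead of being forced into a persona.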

The Data Is There — It Just Needs Engineering

Most companies sit on years of CRM data and never analyze it beyond pipeline dashboards. The problem isn't data science; it's data engineering: beating API caps, resolving ghost contacts, and classifying unstructured titles. Once the pipeline is built, the analysis writes itself.

The Pipeline

1. Extract: Pull all deals, contacts, and company data from HubSpot via API
2. Resolve: Build soft associations to connect contacts to deals where CRM links are missing
3. Backfill: Extract 1M+ engagements via monthly time windows to beat the 10K API cap
4. Classify: Map 500K+ job titles into 10 functional personas and 6 seniority levels
5. Analyze: Run Bayesian win-rate analysis, committee signatures, and retention correlations
6. Deliver: Scored ICP matrix, written persona narrative, supporting CSVs, and Google Sheets

After you paste this prompt, Claude Code:

"Six years of HubSpot data. 13K deals, 500K contacts. Three problems to solve before any analysis is possible: ghost contacts, the 10K API cap, and unstructured titles. Let me build this in phases..."

Running
$ g-gremlin hubspot pull deals --properties amount,closedate,dealstage,dealtype,pipeline,...
13,247 deals extracted with company associations, industry classification, and employee band mapping.

"Deals are in. But only 61% have contact associations in HubSpot. Reps are closing deals without logging who was involved. I need soft associations to fill the gaps..."

Running
$ python -m src.icp.phase5a_contact_pull --skip-engagement-pull
Loaded 521,474 contacts. Hard associations: 8,774. Building soft associations via 3-tier resolution...

Tier 1: Company match + engagement within deal window. Tier 2: associatedcompanyid link. Tier 3: Email domain match (excluding personal/internal domains). Noise gate: 1 meeting OR 2+ emails. Cap: 20 contacts per deal.

Soft associations built: 25,355 new links across 5,800+ deals. Deal-contact coverage: 61.4% → 74.2%.
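The tiering, noise gate, and cap described above could be sketched roughly as follows. This is one reading of the log output, not the actual `phase5a_contact_pull` code: the `Candidate` fields, the exclusion lists, and the exact way the tiers interact with the noise gate are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed exclusion lists; the real lists are not given in this write-up.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}
INTERNAL_DOMAINS = {"vendor.example"}  # hypothetical: the seller's own domain
MAX_CONTACTS_PER_DEAL = 20


@dataclass
class Candidate:
    contact_id: str
    email: str
    company_id: Optional[str]  # HubSpot associatedcompanyid, if set
    meetings_in_window: int    # meetings logged inside the deal window
    emails_in_window: int      # emails logged inside the deal window


def passes_noise_gate(c):
    """Noise gate from the log: 1 meeting OR 2+ emails."""
    return c.meetings_in_window >= 1 or c.emails_in_window >= 2


def resolve_tier(c, deal_company_id, deal_domain):
    """Return 1, 2, 3, or None. The tier semantics are this sketch's assumption."""
    same_company = c.company_id is not None and c.company_id == deal_company_id
    if same_company and passes_noise_gate(c):
        return 1  # company match plus qualifying engagement in the window
    if same_company:
        return 2  # associatedcompanyid link only
    domain = c.email.rsplit("@", 1)[-1].lower()
    excluded = PERSONAL_DOMAINS | INTERNAL_DOMAINS
    if domain == deal_domain.lower() and domain not in excluded and passes_noise_gate(c):
        return 3  # email-domain match, gated to limit noise
    return None


def soft_associations(candidates, deal_company_id, deal_domain):
    """Rank resolved candidates by tier (strongest first) and apply the per-deal cap."""
    tiered = [(resolve_tier(c, deal_company_id, deal_domain), c) for c in candidates]
    kept = sorted(((t, c) for t, c in tiered if t is not None), key=lambda tc: tc[0])
    return [(c.contact_id, t) for t, c in kept[:MAX_CONTACTS_PER_DEAL]]
```

Sorting by tier before applying the cap means that when a deal has more than 20 candidates, the weakest evidence (domain-only matches) is dropped first.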

"HubSpot's engagement API caps at 10,000 records per type, most-recent-first. A standard pull only gets the last few months. I need 6 years of history. Solution: iterate monthly time windows..."

Running
$ python -m src.icp.phase5a_backfill_engagements --types emails
Backfilling 74 monthly windows (2020-01 to 2026-02)... Completed: 426,233 emails extracted.
Running
$ python -m src.icp.phase5a_backfill_engagements --types calls
Completed: 240,149 calls extracted.
Running
$ python -m src.icp.phase5a_backfill_engagements --types meetings
Completed: 37,901 meetings extracted.
Running
$ python -m src.icp.phase5a_backfill_engagements --types tasks
Completed: 315,412 tasks extracted.

1,028,551 total engagements extracted (up from 48K with a naive pull). Full history back to 2020.
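The windowing idea can be sketched in a few lines. This is a minimal illustration, assuming each month stays under the cap; `fetch_window` is a hypothetical callable standing in for whatever API call returns the records for one date range, and it is not part of the actual backfill module.

```python
from datetime import date


def monthly_windows(start_year, start_month, end_year, end_month):
    """Yield (window_start, window_end) pairs, one per calendar month, end-exclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        yield date(year, month, 1), date(next_year, next_month, 1)
        year, month = next_year, next_month


def backfill(fetch_window, windows, cap=10_000):
    """Collect records window by window; fetch_window(start, end) is hypothetical."""
    records = []
    for start, end in windows:
        batch = fetch_window(start, end)
        if len(batch) >= cap:
            # A full batch suggests this window itself hit the cap,
            # so older records in the window may have been dropped.
            raise RuntimeError(f"window {start}..{end} hit the cap; use smaller windows")
        records.extend(batch)
    return records


windows = list(monthly_windows(2020, 1, 2026, 2))
print(len(windows))  # 74 monthly windows, matching the log above
```

Failing loudly when a window returns a full batch is a design choice in this sketch: it prevents a busy month from silently losing records, at the cost of needing a rerun with weekly or daily windows.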

Running
$ python -m src.icp.phase5b_persona_map
Classified 34,122 contact-deal associations into 10 functions x 6 seniority levels. 91.5% title fill rate, 62.6% classified into specific functional personas.
Running
$ python -m src.icp.phase5c_persona_analysis
Persona presence, power density, committee signatures, renewal stakeholder analysis complete. 1,816 committee signatures analyzed.
Running
$ python -m src.icp.phase5d_narrative
Written analysis generated: PERSONA_NARRATIVE.md (6 sections). ICP matrix: 163 segments scored.

Full analysis complete. ICP matrix, buyer persona narrative, and supporting data exported.

Key Findings

Finance Is the Hidden Kingmaker

When a Director+ Finance contact was involved in a deal, win rate jumped to 66% vs. 35% baseline — a 1.88x lift. Finance was present on only 5% of won deals, but when they showed up, deals closed at nearly double the rate. Strongest single-persona signal in the entire dataset.

Two Functions Together = 90% Win Rate

Deals with both a core buyer persona and a Procurement contact had a 90% win rate, though on a small sample (n=10). Deals with only one of those functions won at 22-24%. The signal is not just who is involved but the combination, which makes multi-function committees a strong predictor.
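A committee signature can be thought of as the set of functions present on a deal, independent of order and headcount. The sketch below assumes that definition and a simple list of (functions, won) pairs as input; the names are illustrative and the analysis module may define signatures differently.

```python
from collections import defaultdict


def committee_signature(functions):
    """Order-independent signature: the sorted set of functions on a deal."""
    return tuple(sorted(set(functions)))


def win_rate_by_signature(deals):
    """deals: iterable of (functions_on_deal, won). Returns {signature: (wins, n, rate)}."""
    tally = defaultdict(lambda: [0, 0])  # signature -> [wins, deal count]
    for functions, won in deals:
        signature = committee_signature(functions)
        tally[signature][0] += int(won)
        tally[signature][1] += 1
    return {sig: (wins, n, wins / n) for sig, (wins, n) in tally.items()}
```

Returning the deal count alongside the rate keeps small samples visible, so a signature with a high rate on ten deals is not read the same way as one on several hundred.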

Legal/Compliance Is an Underrated Positive Signal

Director+ Legal/Compliance presence correlated with a 58% win rate (1.65x lift). This challenges the common assumption that legal involvement slows or kills deals. In this company's sales cycle, legal engagement likely signals procurement maturity and budget commitment.

Technical Stakeholders Predict Retention

IT/Security/Data contacts on renewals showed a 1.51x retention lift (7.1% presence on retained vs. 4.7% on churned). Technical integration depth — having an IT stakeholder engaged — predicts stickiness.
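The lift figures in these findings are ratios of two rates. A minimal sketch of the arithmetic, assuming lift means the rate in the group of interest divided by the comparison rate (the function name is illustrative):

```python
def lift(rate, comparison_rate):
    """Ratio of two rates, e.g. presence on retained vs. churned renewals."""
    if comparison_rate <= 0:
        raise ValueError("comparison_rate must be positive")
    return rate / comparison_rate


# Retention finding: 7.1% presence on retained vs. 4.7% on churned.
retention_lift = round(lift(7.1, 4.7), 2)
print(retention_lift)  # 1.51
```

Recomputing the win-rate lifts from the rounded percentages shown in this write-up can differ in the last digit from the published values, which were presumably derived from unrounded rates.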

The Dominant Personas Are Not the Winning Ones

The two most common buyer functions (core buyer at 17% and Procurement at 17%) had the lowest win lifts (0.68x and 0.77x). Their presence is table stakes: they show up on won and lost deals alike. The differentiating signal comes from less common functions like Finance, Legal, and Operations.

Deliverables

ICP Matrix

163 industry x company-size segments scored on win rate, retention rate, average deal value, and cycle efficiency. Bayesian smoothing with sample-size penalties. Tiered into A/B/C/D.
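Bayesian smoothing with a sample-size penalty can be sketched as beta-binomial shrinkage toward the overall win rate. The prior strength of 20 pseudo-deals and the function name are assumptions for illustration, not the settings used in this analysis.

```python
def smoothed_win_rate(wins, deals, prior_rate, prior_strength=20.0):
    """Blend a segment's observed win rate with a prior.

    prior_strength acts like a number of pseudo-deals at prior_rate,
    so small segments are pulled harder toward the prior than large ones.
    """
    return (wins + prior_rate * prior_strength) / (deals + prior_strength)


# Same raw 90% win rate, very different evidence (35% baseline as the prior).
small_segment = smoothed_win_rate(9, 10, prior_rate=0.35)
large_segment = smoothed_win_rate(90, 100, prior_rate=0.35)
```

With these assumed settings, a 9-of-10 segment scores about 0.53 while a 90-of-100 segment scores about 0.81, so thin segments cannot top the ranking on a handful of wins.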

Buyer Persona Narrative

Six-section written analysis: Data Foundation, New Logo Committee Blueprint, Renewal Stakeholder Map, Champion vs. Signer Profiles, Engagement Patterns, Churn Signals.

Supporting Data

All analysis tables delivered as CSVs for reproducibility. Google Sheet with 6 tabs for stakeholder review. Google Doc with formatted narrative.

By the Numbers

Deals analyzed: 7,829
Contacts processed: 521K
Engagements extracted: 1.03M
ICP segments scored: 163
Contact-deal associations: 34,122
Committee signatures: 1,816
Historical coverage: 2020–2026

The RevOps Win

In three days, the sales team went from "we think our buyer is Procurement" to a data-backed committee blueprint showing that the highest-win deals involve Finance and Legal alongside the core buyer — functions they weren't systematically engaging. The renewal team learned that technical stakeholder presence predicts retention, giving CSMs a concrete expansion playbook. And leadership got a scored ICP matrix telling them exactly which industry/size segments to double down on.
