ICP & Buyer Persona Analysis
A Series B enterprise SaaS company needed to know exactly which companies to target and which people within those companies to engage. They had six years of HubSpot data (13,000+ deals, 500K+ contacts) but no systematic analysis of what actually predicts a win. We built the full picture in three days.
The Request
From the VP of Sales, through the fund's ops team.
We've been selling for six years. We have thousands of deals in HubSpot but no data-backed answer to two questions: which companies should we target, and who inside those companies should we talk to?

I need:
1. Full ICP analysis: industry, company size, win rates, retention rates, deal values
2. Buyer persona analysis: committee composition, which functions predict wins, seniority patterns
3. All grounded in our actual deal history, not assumptions

Our HubSpot portal has 13,000+ deals and 500K+ contacts. Build me the full picture.
Three Problems That Block a Naive Analysis
The Ghost Contact Problem
Only 61% of closed deals had any contact associated in HubSpot. Sales reps were closing deals but not logging who was involved. Any persona analysis built only on explicit CRM associations would miss nearly 40% of deals.
The 10,000-Record API Cap
HubSpot's engagement API returns a maximum of 10,000 records per type per pull, most-recent-first. A standard extraction captured only recent activity, losing five years of historical engagement data needed for pattern analysis.
Unstructured Titles
With 500K+ contacts, job titles ranged from "VP, Strategic Sourcing & Supplier Diversity" to "admin" to blank fields. No standard taxonomy existed for grouping contacts into functional buyer personas.
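One way to approach free-text titles like these is an ordered list of keyword rules. The sketch below is illustrative only: the persona and seniority labels, the keyword patterns, and the `classify_title` helper are assumptions, and they cover a subset of whatever taxonomy the real pipeline used.

```python
import re

# Hypothetical keyword rules, checked in order; first match wins.
PERSONA_RULES = [
    ("Procurement", r"\b(procurement|sourcing|purchasing|supplier)\b"),
    ("Finance", r"\b(finance|financial|cfo|controller|accounting)\b"),
    ("Legal/Compliance", r"\b(legal|counsel|compliance)\b"),
    ("IT/Security/Data", r"\b(it|security|data|cio|ciso)\b"),
    ("Operations", r"\b(operations|ops|coo)\b"),
]
SENIORITY_RULES = [
    ("C-Level", r"\b(chief|ceo|cfo|coo|cio|ciso)\b"),
    ("VP", r"\b(vp|svp|evp|vice president)\b"),
    ("Director", r"\b(director|head of)\b"),
    ("Manager", r"\b(manager|lead)\b"),
]


def first_match(rules, text, default):
    """Return the label of the first rule whose pattern appears in text."""
    for label, pattern in rules:
        if re.search(pattern, text):
            return label
    return default


def classify_title(title):
    """Map a raw job title to a (persona, seniority) pair."""
    text = (title or "").strip().lower()
    if not text:
        # Blank titles stay explicitly unknown rather than being guessed.
        return ("Unknown", "Unknown")
    persona = first_match(PERSONA_RULES, text, "Unclassified")
    seniority = first_match(SENIORITY_RULES, text, "Unclassified")
    return (persona, seniority)


print(classify_title("VP, Strategic Sourcing & Supplier Diversity"))
print(classify_title("admin"))
```

Keyword rules are easy to audit, but they miss unusual titles. Anything that lands in "Unclassified" would need a second pass, whether manual review or a model-based classifier.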
The Data Is There — It Just Needs Engineering
Most companies sit on years of CRM data and never analyze it beyond pipeline dashboards. The problem isn't data science; it's data engineering: beating API caps, resolving ghost contacts, and classifying unstructured titles. Once the pipeline is built, the analysis writes itself.
The Pipeline
Extract
Pull all deals, contacts, and company data from HubSpot via API
Resolve
Build soft associations to connect contacts to deals where CRM links are missing
Backfill
Extract 1M+ engagements via monthly time windows to beat the 10K API cap
Classify
Map 500K+ job titles into 10 functional personas and 6 seniority levels
Analyze
Run Bayesian win-rate analysis, committee signatures, and retention correlations
Deliver
Scored ICP matrix, written persona narrative, supporting CSVs, and Google Sheets
"Six years of HubSpot data. 13K deals, 500K contacts. Three problems to solve before any analysis is possible: ghost contacts, the 10K API cap, and unstructured titles. Let me build this in phases..."
"Deals are in. But only 61% have contact associations in HubSpot. Reps are closing deals without logging who was involved. I need soft associations to fill the gaps..."
Tier 1: Company match + engagement within deal window.
Tier 2: associatedcompanyid link.
Tier 3: Email domain match (excluding personal/internal domains).
Noise gate: 1 meeting OR 2+ emails.
Cap: 20 contacts per deal.
Soft associations built: 25,355 new links across 5,800+ deals. Deal-contact coverage: 61.4% → 74.2%.
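The tiering rules can be sketched as a small filter-and-rank function. This is one possible reading of the rules, not the pipeline's actual code: the `Candidate` fields, the domain lists, and the choice to apply the noise gate to every tier using engagements inside the deal window are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical exclusion lists; a real run would use much longer ones.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}
INTERNAL_DOMAINS = {"seller.example"}  # placeholder for the seller's own domain
MAX_CONTACTS_PER_DEAL = 20


@dataclass
class Candidate:
    contact_id: str
    email: str
    on_company_record: bool             # associated with the deal's company record
    associatedcompanyid: Optional[str]  # contact's company-id property, if set
    meetings: int                       # meetings inside the deal window
    emails: int                         # emails inside the deal window


def passes_noise_gate(c: Candidate) -> bool:
    """Noise gate from the text: 1 meeting OR 2+ emails."""
    return c.meetings >= 1 or c.emails >= 2


def link_tier(c: Candidate, deal_company_id: str, deal_domain: str) -> Optional[int]:
    """Return the strongest tier that links this contact to the deal, if any."""
    if c.on_company_record:
        return 1
    if c.associatedcompanyid is not None and c.associatedcompanyid == deal_company_id:
        return 2
    domain = c.email.rsplit("@", 1)[1].lower() if "@" in c.email else ""
    excluded = PERSONAL_DOMAINS | INTERNAL_DOMAINS
    if domain and domain == deal_domain.lower() and domain not in excluded:
        return 3
    return None


def soft_associate(candidates: List[Candidate], deal_company_id: str,
                   deal_domain: str) -> List[Tuple[str, int]]:
    """Keep gated, linked contacts; rank by tier, then activity; cap per deal."""
    linked = []
    for c in candidates:
        tier = link_tier(c, deal_company_id, deal_domain)
        if tier is not None and passes_noise_gate(c):
            linked.append((tier, -c.meetings, -c.emails, c.contact_id))
    linked.sort()
    return [(cid, tier) for tier, _, _, cid in linked[:MAX_CONTACTS_PER_DEAL]]
```

Ranking before capping means that when a deal has more than 20 qualifying contacts, the weakest links are the ones dropped.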
"HubSpot's engagement API caps at 10,000 records per type, most-recent-first. A standard pull only gets the last few months. I need 6 years of history. Solution: iterate monthly time windows..."
1,028,551 total engagements extracted (up from 48K with a naive pull). Full history back to 2020.
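A minimal sketch of the monthly-window idea follows. `fetch_window` is a hypothetical callable standing in for whatever HubSpot request the pipeline made; it is assumed to return at most `cap` records with timestamps in the half-open range `[lo, hi)`. Halving a window that comes back full is an added safeguard, not something the source describes.

```python
from datetime import datetime, timedelta

API_CAP = 10_000  # per-type record cap described in the text


def month_windows(start, end):
    """Yield (lo, hi) pairs covering [start, end) one calendar month at a time."""
    lo = start
    while lo < end:
        # First instant of the following month, keeping the same tzinfo.
        nxt = datetime(lo.year + (lo.month == 12), lo.month % 12 + 1, 1,
                       tzinfo=lo.tzinfo)
        hi = min(nxt, end)
        yield lo, hi
        lo = hi


def pull_window(fetch_window, lo, hi, cap=API_CAP):
    """Fetch one window, halving it whenever the result may be truncated."""
    batch = fetch_window(lo, hi)
    # A batch below the cap is complete; a full batch may hide more records.
    if len(batch) < cap or hi - lo <= timedelta(seconds=1):
        return list(batch)
    mid = lo + (hi - lo) / 2
    return (pull_window(fetch_window, lo, mid, cap)
            + pull_window(fetch_window, mid, hi, cap))


def backfill(fetch_window, start, end, cap=API_CAP):
    """Collect every record in [start, end) by walking monthly windows."""
    records = []
    for lo, hi in month_windows(start, end):
        records.extend(pull_window(fetch_window, lo, hi, cap))
    return records
```

Because the windows are half-open and never overlap, no record is fetched twice, so the combined result needs no de-duplication step.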
Full analysis complete. ICP matrix, buyer persona narrative, and supporting data exported.
Key Findings
Finance Is the Hidden Kingmaker
When a Director+ Finance contact was involved in a deal, win rate jumped to 66% vs. 35% baseline — a 1.88x lift. Finance was present on only 5% of won deals, but when they showed up, deals closed at nearly double the rate. Strongest single-persona signal in the entire dataset.
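Lift here is the win rate of deals where a persona is present divided by the baseline win rate across all deals. The sketch below shows that arithmetic on a toy data shape; the `persona_lift` helper and the `(personas, won)` tuple layout are assumptions for illustration.

```python
def persona_lift(deals, persona):
    """Return (win rate with persona, baseline win rate, lift).

    deals is a list of (personas_on_deal, won) pairs, where personas_on_deal
    is a set of persona labels and won is a bool.
    """
    if not deals:
        raise ValueError("no deals supplied")
    with_persona = [won for personas, won in deals if persona in personas]
    if not with_persona:
        raise ValueError(f"no deals include {persona!r}")
    baseline = sum(won for _, won in deals) / len(deals)
    if baseline == 0:
        raise ValueError("baseline win rate is zero; lift is undefined")
    rate = sum(with_persona) / len(with_persona)
    return rate, baseline, rate / baseline


# With the rounded figures quoted above, 0.66 / 0.35 is about 1.89; the
# reported 1.88x presumably comes from the unrounded rates.
print(round(0.66 / 0.35, 2))
```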
Two Functions Together = 90% Win Rate
Deals with both a core buyer persona and a Procurement contact had a 90% win rate, though on a small sample (n=10). Deals with only one of those functions won at 22-24%. Multi-function committees look like a strong predictor: what matters is not just who is involved, but the combination.
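With only ten deals in that segment, a confidence interval helps show how much weight the 90% figure can bear. The sketch below computes a Wilson score interval. It assumes the 90% corresponds to 9 wins out of 10, and it is a supplementary check rather than part of the original analysis.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(9, 10)  # assumes 9 wins out of 10 deals
print(f"{low:.2f} to {high:.2f}")
```

For 9 of 10, the interval runs from roughly 0.60 to 0.98. Its lower bound still sits well above the 22-24% single-function rates, though the exact lift is uncertain.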
Legal/Compliance Is an Underrated Positive Signal
Director+ Legal/Compliance presence correlated with a 58% win rate (1.65x lift). This challenges the common assumption that legal involvement slows or kills deals. In this company's sales process, legal engagement likely signals procurement maturity and budget commitment.
Technical Stakeholders Predict Retention
IT/Security/Data contacts on renewals showed a 1.51x retention lift (7.1% presence on retained vs. 4.7% on churned). Technical integration depth — having an IT stakeholder engaged — predicts stickiness.
The Dominant Personas Are Not the Winning Ones
The two most common buyer functions (core buyer at 17% and Procurement at 17%) had the lowest win lifts (0.68x and 0.77x). Their presence is table stakes: they appear on deals regardless of outcome. The differentiating signal comes from less common functions like Finance, Legal, and Operations.
Deliverables
ICP Matrix
163 industry x company-size segments scored on win rate, retention rate, average deal value, and cycle efficiency. Bayesian smoothing with sample-size penalties. Tiered into A/B/C/D.
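The source does not spell out the smoothing formula. A common choice is to shrink each segment's raw rate toward the overall baseline, with the prior counting as a fixed number of pseudo-deals. The sketch below shows that approach for win rate alone; `prior_strength` and the tier cutoffs are hypothetical values, and the real matrix also scores retention, deal value, and cycle efficiency.

```python
def smoothed_rate(wins, n, prior_rate, prior_strength=20.0):
    """Shrink a segment's win rate toward prior_rate.

    The prior counts as prior_strength pseudo-deals, so small segments stay
    close to the baseline and large segments keep roughly their raw rate.
    """
    if n < 0 or wins < 0 or wins > n:
        raise ValueError("need 0 <= wins <= n")
    return (wins + prior_rate * prior_strength) / (n + prior_strength)


def tier(score, cutoffs=((0.50, "A"), (0.40, "B"), (0.30, "C"))):
    """Map a smoothed score to a letter tier; cutoffs are illustrative."""
    for threshold, label in cutoffs:
        if score >= threshold:
            return label
    return "D"


# A 9-of-10 segment and a 90-of-100 segment share a 90% raw rate,
# but the smaller one is pulled much harder toward the 35% baseline.
small = smoothed_rate(9, 10, prior_rate=0.35)
large = smoothed_rate(90, 100, prior_rate=0.35)
print(round(small, 3), round(large, 3))
```

The sample-size penalty is implicit in this formulation: the fewer deals a segment has, the more the prior dominates its score.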
Buyer Persona Narrative
Six-section written analysis: Data Foundation, New Logo Committee Blueprint, Renewal Stakeholder Map, Champion vs. Signer Profiles, Engagement Patterns, Churn Signals.
Supporting Data
All analysis tables delivered as CSVs for reproducibility. Google Sheet with 6 tabs for stakeholder review. Google Doc with formatted narrative.
The RevOps Win
In three days, the sales team went from "we think our buyer is Procurement" to a data-backed committee blueprint showing that the highest-win deals involve Finance and Legal alongside the core buyer — functions they weren't systematically engaging. The renewal team learned that technical stakeholder presence predicts retention, giving CSMs a concrete expansion playbook. And leadership got a scored ICP matrix telling them exactly which industry/size segments to double down on.
Try This Workflow
Your CRM has years of answers. Let Gremlin extract them.