Topline Pro
Internal Ops Report
Pipeline Impact Analysis

Two Methods, Four Outcomes

We have two independent ways to find leads: embedding similarity (content and structure match to Pete's top leads) and a GoDaddy HTML scan (text match on GoDaddy signatures in the page HTML). Here's how the 156,142 scanned sites split across the two methods.

Feb 2026 Pete List v2 · 23-site centroid · 156,142 HTML files scanned in R2 (Jan + Feb)
Figure: how the 156,142 scanned sites split on two questions, "Embedding similar?" and "GoDaddy?"
              | GoDaddy                          | Not GoDaddy                               | Total
Similar       | ~3,166 found by both (overlap)   | 12,579 embedding only                     | 15,745 (10.1%)
Not similar   | 14,254 HTML scan only            | 126,143 neither (not actionable)          | 140,397
Total         | 17,420 (11.2%)                   | 138,722                                   | 156,142
The GoDaddy count comes from a full scan of all 156,142 HTML files in R2 (via a Cloudflare Worker): 17,420 files (11.2%) contain GoDaddy signatures. The overlap (~3,166) is an estimate, extrapolated from GoDaddy rates measured by sampling within each similarity tier.
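The four outcome counts follow from three measured numbers (total, similar, GoDaddy) plus the overlap estimate. A minimal sketch, assuming the overlap is extrapolated by scaling each similarity tier's sampled GoDaddy rate up to that tier's full size; the `Tier` fields, the function names, and the example tier sizes are hypothetical, since the report does not list per-tier sample counts.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One similarity tier and its GoDaddy sampling result (hypothetical structure)."""
    name: str
    similar_sites: int    # similar sites in this tier
    sampled: int          # sites sampled from the tier for a GoDaddy check
    sampled_godaddy: int  # sampled sites that carried a GoDaddy signature


def estimate_overlap(tiers: list[Tier]) -> float:
    """Scale each tier's sampled GoDaddy rate to the whole tier, then sum."""
    return sum(t.similar_sites * t.sampled_godaddy / t.sampled for t in tiers)


def four_way_split(total: int, similar: int, godaddy: int, overlap: int) -> dict[str, int]:
    """Derive the four outcomes from the two marginal counts and the overlap."""
    return {
        "both": overlap,
        "embedding_only": similar - overlap,
        "html_scan_only": godaddy - overlap,
        "neither": total - similar - godaddy + overlap,  # inclusion-exclusion
    }


# Hypothetical tiers, only to show the extrapolation step.
example_tiers = [
    Tier("upper (hypothetical)", similar_sites=2_000, sampled=200, sampled_godaddy=50),
    Tier("lower (hypothetical)", similar_sites=8_000, sampled=400, sampled_godaddy=20),
]
print(estimate_overlap(example_tiers))  # 900.0

# The report's own totals with its ~3,166 overlap estimate.
print(four_way_split(total=156_142, similar=15_745, godaddy=17_420, overlap=3_166))
# {'both': 3166, 'embedding_only': 12579, 'html_scan_only': 14254, 'neither': 126143}
```

Because only the overlap is estimated, any error in it shifts "both" and "neither" by the same amount in one direction and the two single-method buckets by that amount in the other; the four cells always still sum to 156,142.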
What each method found
Embedding only 12,579
Only findable by embedding
Non-GoDaddy sites with genuine content similarity to Pete's top leads, built on Wix, WordPress, Squarespace, and custom platforms. They carry no HTML signature to scan for, so only the embedding catches them.
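The report does not spell out how sites are scored against the 23-site centroid. A minimal sketch, assuming the centroid is the element-wise mean of the 23 lead embeddings and that a site counts as similar when its cosine similarity to the centroid clears a cutoff; the function names and the `threshold` value are hypothetical, and real vectors would come from whatever embedding model the pipeline uses.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of the reference embeddings (e.g. Pete's top leads)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_similar(site: list[float], center: list[float], threshold: float = 0.8) -> bool:
    """Hypothetical cutoff: similar when cosine to the centroid reaches threshold."""
    return cosine(site, center) >= threshold


# Toy 3-d embeddings standing in for the 23 reference leads.
leads = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.9, 0.1, 0.0]]
center = centroid(leads)
print(is_similar([0.95, 0.05, 0.0], center))  # True
print(is_similar([0.0, 0.0, 1.0], center))    # False
```

Scoring against a single centroid keeps the per-site cost to one comparison; the trade-off is that a platform template shared by many leads can pull the centroid toward itself, which is consistent with the inflated Elite/Strong scores noted for the overlap.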
HTML scan only 14,254
Only findable by HTML scan
GoDaddy sites that failed the similarity filters (too few words, no mobile support, low similarity score, etc.). The embedding pass dropped them, but a GoDaddy signature scan on the raw HTML surfaces them.
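A signature scan needs only a substring check over the raw HTML. A minimal sketch, assuming case-insensitive matching; the signature strings are illustrative assumptions, not the patterns the Cloudflare Worker actually matched, and `outcome` is a hypothetical helper that maps the two yes/no answers onto the report's four buckets.

```python
# Illustrative signatures only (assumption); the real scan's patterns are not listed in this report.
GODADDY_SIGNATURES = ("wsimg.com", "godaddy")


def has_godaddy_signature(html: str, signatures=GODADDY_SIGNATURES) -> bool:
    """True when any signature appears in the raw HTML (case-insensitive)."""
    text = html.lower()
    return any(sig.lower() in text for sig in signatures)


def outcome(similar: bool, godaddy: bool) -> str:
    """Map the two method results onto the report's four outcomes."""
    if similar and godaddy:
        return "both"
    if similar:
        return "embedding_only"
    if godaddy:
        return "html_scan_only"
    return "neither"


page = '<link rel="stylesheet" href="https://img1.wsimg.com/site.css">'
print(outcome(similar=False, godaddy=has_godaddy_signature(page)))  # html_scan_only
```

The scan ignores word count, mobile support, and similarity score entirely, which is why it recovers GoDaddy sites that the similarity filters dropped.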
Both methods ~3,166
Found by both embedding + HTML scan
GoDaddy sites that also ranked as similar: mostly the Elite/Strong tiers (~2,470), where template matching inflated scores, plus ~696 in lower tiers. Either method alone would find them.
Neither 126,143
Not actionable via either method
Neither GoDaddy nor similar enough to Pete's leads. These may still be valid businesses, but they don't match the current ICP signal.
Actionable leads by method
Embedding only: 12,579
Both: ~3,166 (3.2k)
HTML scan only: 14,254
Total unique actionable sites, deduplicated across both methods: 12,579 + 3,166 + 14,254 = 29,999 (~30k)
The two methods are largely complementary: only ~3.2k sites overlap. The embedding uniquely surfaces 12,579 non-GoDaddy content matches that no HTML scan would find, and the HTML scan uniquely surfaces 14,254 GoDaddy sites that fell below the similarity thresholds. Together they yield ~30k unique actionable leads from 156k scanned sites.
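Deduplicating the two lead lists amounts to a set union, so the total follows inclusion-exclusion: 15,745 + 17,420 - 3,166 = 29,999, the same figure as 12,579 + 3,166 + 14,254. A minimal sketch, assuming both methods emit site URLs and that a site is identified by its hostname without a leading `www.`; `normalize_domain` and `merge_leads` are hypothetical helpers, not part of the existing pipeline.

```python
from urllib.parse import urlsplit


def normalize_domain(url: str) -> str:
    """Reduce a URL or bare domain to a lowercase hostname without 'www.'."""
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")


def merge_leads(embedding_hits: list[str], html_hits: list[str]) -> dict[str, str]:
    """Union both lists by domain and tag each site with the method(s) that found it."""
    emb = {normalize_domain(u) for u in embedding_hits}
    gd = {normalize_domain(u) for u in html_hits}
    merged = {}
    for domain in sorted(emb | gd):
        if domain in emb and domain in gd:
            merged[domain] = "both"
        elif domain in emb:
            merged[domain] = "embedding_only"
        else:
            merged[domain] = "html_scan_only"
    return merged


leads = merge_leads(["https://www.plumber.example/", "roofer.example"],
                    ["http://plumber.example", "hvac.example"])
print(leads)
# {'hvac.example': 'html_scan_only', 'plumber.example': 'both', 'roofer.example': 'embedding_only'}
```

Keying on a normalized hostname matters: without it, `www.plumber.example` and `plumber.example` would count as two leads and inflate the deduplicated total. `str.removeprefix` requires Python 3.9 or later.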