How to Automate Keyword Research with AI: From Cluster to Published Article in 24 Hours

BP Corp Engineering
9 min read

Every morning at 8:00 AM, ORBIT pulls keyword data from Google Search Console for 13 brand sites. By 8:47 AM, it has identified 47 article opportunities, prioritized them by ranking potential, and queued the top 10 for generation.

No Ahrefs. No SEMrush. No manual keyword research.

This is how we automated the entire keyword research workflow using AI and GSC data—and how you can replicate it even if you're running a single site.

Why Traditional Keyword Research Is Obsolete

Traditional keyword research follows this workflow:

  1. Brainstorm seed keywords
  2. Plug them into Ahrefs/SEMrush/Ubersuggest
  3. Export thousands of keyword suggestions
  4. Filter by search volume and difficulty
  5. Manually cluster related keywords
  6. Create content briefs
  7. Assign to writers

This takes 4-8 hours per vertical per month. At 9 verticals per brand × 13 brands, we'd need someone doing keyword research full-time.

The breakthrough realization: You already have the perfect keyword research tool—Google Search Console.

GSC tells you:

  • Keywords where your site appears in search results (impressions)
  • Your current ranking position for each keyword
  • Which keywords get clicks vs. just impressions
  • How these metrics change over time

For this purpose, this data is more useful than third-party keyword tools because it reflects your site's actual search presence, not theoretical search volume.

The AI-Powered Keyword Research System

Here's the architecture we built into ORBIT for automated keyword research:

Step 1: GSC Data Extraction

ORBIT connects to Google Search Console API and pulls 90 days of performance data for each brand site. The query filters for:

  • Impressions > 10: Filters noise, focuses on keywords with real search volume
  • Position 5-50: Excludes already-ranking content (top 4) and impossibly competitive terms (beyond page 5)
  • Query type: Filters out navigational queries (brand name searches)
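The three filters above can be sketched as a single predicate. This is a minimal illustration, not ORBIT's actual code: the function name `is_opportunity`, the substring match on brand terms, and every sample row except the first are assumptions made for the example.

```python
def is_opportunity(row, brand_terms, min_impressions=10, min_pos=5.0, max_pos=50.0):
    """Return True if a GSC row passes the three filters described above."""
    query = row["query"].lower()
    if row["impressions"] <= min_impressions:
        return False  # too few impressions to be a reliable signal
    if not (min_pos <= row["position"] <= max_pos):
        return False  # already ranking in the top 4, or beyond page 5
    if any(term in query for term in brand_terms):
        return False  # navigational (brand name) query
    return True


# Only the first row comes from the article; the others are hypothetical.
rows = [
    {"query": "assurance emprunteur diabète", "impressions": 847, "clicks": 12, "position": 18.3},
    {"query": "papaprevoit avis", "impressions": 120, "clicks": 40, "position": 6.0},
    {"query": "assurance vie", "impressions": 5000, "clicks": 900, "position": 2.1},
    {"query": "assurence pret", "impressions": 3, "clicks": 0, "position": 31.0},
]

kept = [r["query"] for r in rows if is_opportunity(r, brand_terms=["papaprevoit"])]
print(kept)
```

A plain substring match is the simplest way to catch brand searches; a real deployment would probably need a list of brand spellings and misspellings per site.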

Example raw data from PapaPrevoit.com (French insurance lead gen):

{
  "query": "assurance emprunteur diabète",
  "impressions": 847,
  "clicks": 12,
  "position": 18.3,
  "ctr": 1.4
}

This keyword has 847 monthly impressions and we rank around position 18, but it earns only 12 clicks. That is a clear opportunity: an article targeting this keyword could capture 60-90 clicks monthly if we reach position 3-5.
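The arithmetic behind that estimate is clicks = impressions × click-through rate. The snippet below back-calculates the CTR range implied by the 60-90 click figure; the article does not state which CTR curve ORBIT uses, so the implied percentages are derived here, not quoted.

```python
impressions = 847
current_clicks = 12

# Current CTR at position ~18
current_ctr = current_clicks / impressions

# CTR range implied by the 60-90 click estimate at position 3-5
implied_low = 60 / impressions
implied_high = 90 / impressions

print(f"current CTR: {current_ctr:.1%}")
print(f"implied CTR at position 3-5: {implied_low:.1%} to {implied_high:.1%}")
```

The estimate also assumes impressions stay at 847 after the ranking improves, which is a simplification.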

ORBIT pulls 2,000-5,000 keyword rows per site depending on domain authority and content volume.

Step 2: Semantic Clustering

Raw keyword lists are hard to act on. "Assurance emprunteur diabète" and "prêt immobilier diabétique" express the same search intent in different phrasings.

ORBIT clusters keywords using sentence embeddings (OpenAI's text-embedding-3-large model). The algorithm:

  1. Generate an embedding vector for each keyword (3,072 dimensions by default for text-embedding-3-large; 1,536 if shortened with the API's dimensions parameter)
  2. Calculate cosine similarity between all keyword pairs
  3. Apply hierarchical clustering (threshold: 0.78 similarity)
  4. Group keywords into topical clusters
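The four steps above can be sketched in pure Python. This is an illustrative version only: the three-dimensional vectors are toy stand-ins for real embeddings, average linkage is an assumed choice (the article says only "hierarchical clustering"), and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster(vectors, threshold=0.78):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops when no
    pair of clusters reaches the similarity threshold.
    """
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(c1, c2):
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (a, b), best = max(
            (((i, j), avg_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


keywords = [
    "assurance emprunteur diabète",
    "prêt immobilier diabétique",
    "assurance auto jeune conducteur",
]
# Toy vectors for illustration; real embeddings come from the embedding API.
vectors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]

groups = cluster(vectors)
for group in groups:
    print([keywords[i] for i in group])
```

Comparing every pair of clusters on each pass is fine for a sketch but slow at the 2,000-5,000 keyword scale mentioned earlier; a library implementation would be the practical choice there.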

Example cluster output:

Cluster ID: INS_DIABETES_001
Theme: "Insurance for diabetics"
Keywords:
  - assurance emprunteur diabète (847 imp, pos 18.3)
  - prêt immobilier diabétique (412 imp, pos 22.1)
  - assurance crédit diabète type 2 (286 imp, pos 15.7)
  - crédit immobilier avec diabète (523 imp, pos 19.8)
Total monthly impressions: 2,068
Average position: 19.0

This cluster represents a single content opportunity, not four separate articles. One comprehensive article on "Assurance Emprunteur pour Diabétiques" targets all four keywords.

ORBIT generates 30-70 clusters per site per analysis run.

Step 3: Priority Scoring

Not all keyword clusters are equal. ORBIT scores each cluster using three factors:

Impression Volume (40% weight): Higher impression count = more traffic potential

Position Range (35% weight): Position 11-20 keywords are easier wins than position 30-50

Commercial Intent (25% weight): Buyer-focused keywords score higher than informational queries

The commercial intent classifier uses a fine-tuned GPT-4o model trained on 1,200 labeled keywords. It categorizes queries into:

  • Transactional (highest): "meilleure assurance emprunteur", "comparatif assurance vie"
  • Commercial investigation: "avis assurance X", "assurance Y ou Z"
  • Informational (lowest): "comment fonctionne assurance", "qu'est-ce que assurance"

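Example scoring for PapaPrevoit clusters (out of 100). The raw inputs are shown for reference; each has to be normalized to a common scale before the weights apply, and that normalization is not shown here: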

1. Cluster: "Assurance emprunteur diabète" — Score: 87.3
   (847 imp × 0.40) + (position 18 × 0.35) + (commercial intent 8.2/10 × 0.25)

2. Cluster: "Fiscalité assurance vie après 70 ans" — Score: 84.1
   (1,420 imp × 0.40) + (position 14 × 0.35) + (commercial intent 7.1/10 × 0.25)

3. Cluster: "Assurance vie en ligne" — Score: 79.6
   (2,180 imp × 0.40) + (position 21 × 0.35) + (commercial intent 8.9/10 × 0.25)

The top 10 clusters get queued for article generation. The rest are saved for future publishing as the content calendar fills.
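A weighted score of this kind can be sketched as follows. The 40/35/25 weights come from the article, but the normalization does not: the impression cap of 2,500 and the linear position scale from 5 to 50 are assumptions chosen for illustration, so this sketch will not reproduce the exact scores listed above.

```python
def priority_score(impressions, position, intent, imp_cap=2500):
    """Combine three sub-scores (each 0-100) using the 40/35/25 weights.

    imp_cap and the linear position scale are hypothetical normalizations.
    """
    imp_score = min(impressions / imp_cap, 1.0) * 100
    # Position 5 maps to 100, position 50 maps to 0.
    pos_score = max(0.0, min(1.0, (50 - position) / 45)) * 100
    intent_score = intent * 10  # intent is rated on a 0-10 scale
    return 0.40 * imp_score + 0.35 * pos_score + 0.25 * intent_score


clusters = {
    "Assurance emprunteur diabète": (847, 18.0, 8.2),
    "Fiscalité assurance vie après 70 ans": (1420, 14.0, 7.1),
    "Assurance vie en ligne": (2180, 21.0, 8.9),
}
scores = {name: priority_score(*values) for name, values in clusters.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{score:5.1f}  {name}")
```

Different normalizations produce different orderings, which is why the monthly weight calibration described later matters.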

Step 4: SERP Analysis

Before generating content, ORBIT analyzes the top 10 Google results for the primary keyword in each cluster. This step is critical—it tells us what Google expects to see.

The SERP scraper extracts:

  • Content structure: H2/H3 headings, article length, section count
  • Featured snippet: If present, what format (paragraph, list, table)
  • Related questions: "People Also Ask" boxes
  • SERP features: Video carousel, image pack, local results
  • Top-ranking domains: Authority level of competitors

Example SERP analysis for "assurance emprunteur diabète":

SERP Analysis Results:
- Top 10 avg article length: 2,340 words
- Featured snippet: Definition paragraph (89 words)
- Common H2 sections:
  1. "Diabète et assurance emprunteur: réglementation"
  2. "Surprimes pour diabétiques"
  3. "Alternatives et solutions"
  4. "Démarches pratiques"
- People Also Ask:
  - "Peut-on obtenir un prêt immobilier avec du diabète?"
  - "Quelle surprime pour un diabétique?"
  - "Comment ne pas déclarer son diabète?"
- SERP features: None (pure organic results)
- Top 3 domains: meilleurtaux.com (DA 72), assurland.com (DA 68), magnolia.fr (DA 54)

This data feeds directly into the outline generation prompt.
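As one piece of that analysis, extracting H2/H3 headings from a fetched page can be done with the Python standard library alone. This is a sketch of the content-structure step only; fetching the pages and the class name `HeadingExtractor` are outside what the article describes and are assumptions here.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect (tag, text) pairs for every h2 and h3 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # heading tag currently open, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._buffer).strip()))
            self._current = None


# Sample markup built from headings in the SERP analysis and outline.
html = (
    "<h1>Guide</h1>"
    "<h2>Surprimes pour diabétiques</h2><p>Texte</p>"
    "<h3>Exemples de <em>calculs</em></h3>"
    "<h2>Démarches pratiques</h2>"
)
parser = HeadingExtractor()
parser.feed(html)
print(parser.headings)
```

Counting how often each heading appears across the top 10 results would then give a "common H2 sections" list like the one in the example.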

Step 5: Outline Generation

ORBIT generates article outlines matching SERP expectations. The prompt includes:

  • Target keyword cluster
  • SERP structure analysis
  • Brand voice guidelines
  • Vertical-specific requirements
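Assembling those four inputs into a prompt might look like the sketch below. The wording of the prompt, the function name `build_outline_prompt`, and the brand voice and vertical strings are all hypothetical; the article does not publish ORBIT's actual prompt.

```python
def build_outline_prompt(cluster, serp, brand_voice, vertical_rules):
    """Combine cluster, SERP, voice and vertical inputs into one prompt string."""
    keywords = "\n".join(f"- {kw}" for kw in cluster["keywords"])
    headings = "\n".join(f"- {h}" for h in serp["common_h2"])
    return (
        f"Write an article outline for the theme: {cluster['theme']}\n\n"
        f"Target keywords:\n{keywords}\n\n"
        f"Top-ranking articles average {serp['avg_words']} words "
        f"and commonly cover:\n{headings}\n\n"
        f"Brand voice: {brand_voice}\n"
        f"Vertical requirements: {vertical_rules}\n"
    )


prompt = build_outline_prompt(
    cluster={
        "theme": "Insurance for diabetics",
        "keywords": ["assurance emprunteur diabète", "prêt immobilier diabétique"],
    },
    serp={
        "avg_words": 2340,
        "common_h2": ["Surprimes pour diabétiques", "Démarches pratiques"],
    },
    brand_voice="clear, reassuring, no jargon",       # hypothetical value
    vertical_rules="cite current French regulation",  # hypothetical value
)
print(prompt)
```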

Example generated outline for "Assurance Emprunteur pour Diabétiques: Guide Complet 2026":

H1: Assurance Emprunteur pour Diabétiques: Guide Complet 2026

Introduction (180-220 words)
- Hook: Statistics on diabetes prevalence in France
- Problem statement: Insurance challenges for diabetics
- Article promise: How to get coverage despite diabetes

H2: Diabète et Assurance Emprunteur: Ce Que Dit La Loi
  H3: Convention AERAS
  H3: Droit à l'Oubli
  H3: Évolutions Légales 2026

H2: Pourquoi les Assureurs Appliquent des Surprimes
  H3: Évaluation du Risque
  H3: Diabète Type 1 vs Type 2
  H3: Impact de l'Équilibre Glycémique

H2: Montant des Surprimes pour Diabétiques
  H3: Fourchettes de Surprime par Profil
  H3: Exemples de Calculs
  H3: Comparaison entre Assureurs

H2: Comment Obtenir la Meilleure Assurance
  H3: Préparer Son Dossier Médical
  H3: Négocier avec les Assureurs
  H3: Utiliser la Délégation d'Assurance

H2: Alternatives à l'Assurance Traditionnelle
  H3: Garanties Partielles
  H3: Assurance au Premier Euro
  H3: Nantissement et Hypothèque

H2: Démarches Pratiques
  H3: Documents à Fournir
  H3: Questionnaire de Santé
  H3: Processus de Souscription

H2: Questions Fréquentes
  - Peut-on cacher son diabète à l'assureur?
  - Le diabète gestationnel affecte-t-il l'assurance?
  - Peut-on changer d'assurance après 1 an?

Conclusion (120-150 words)
- Recap key strategies
- CTA: Free insurance comparison

Estimated length: 2,400-2,800 words
Target keywords: 4 primary, 8 long-tail variations
Internal links: 3-4 related articles
Schema: Article + FAQPage

This outline is more thorough than what a human researcher would typically create in 2 hours, and ORBIT generates it in 90 seconds.

Step 6: Content Generation

With the outline complete, ORBIT generates the full article section by section. We covered the generation process in detail in our AI SEO Content Generation guide, but the key points:

  • Each section generated independently (300-500 words)
  • Claude Opus 4.6 for French content (best grammatical accuracy)
  • Transition prompts link sections smoothly
  • Fact-checking prompts reduce hallucinations
  • Brand voice injected via system prompt

Average generation time: 1.2 hours per article (fully automated).

Step 7: Publishing and Monitoring

Once generated, ORBIT:

  1. Optimizes meta tags: Title (55-60 chars), description (150-155 chars), OG tags
  2. Adds schema markup: Article schema + FAQPage for FAQ sections
  3. Internal linking: Identifies 3-4 related articles using semantic search
  4. Publishes to CMS: Next.js API for our brand sites
  5. Notifies search engines: Sitemap update + IndexNow ping
  6. Tracks indexation: Checks GSC daily for indexation status
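For the schema step, FAQPage markup follows the schema.org structure of a `mainEntity` list of `Question` items, each with an `acceptedAnswer`. The helper below is a sketch with a hypothetical name (`faq_jsonld`) and a placeholder answer; it shows the shape of the JSON-LD, not ORBIT's own generator.

```python
import json


def faq_jsonld(pairs):
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


markup = faq_jsonld([
    # Question taken from the sample outline; the answer is a placeholder.
    ("Peut-on changer d'assurance après 1 an?", "(answer text from the article)"),
])
print(json.dumps(markup, ensure_ascii=False, indent=2))
```

The resulting JSON is embedded in the page inside a script tag of type application/ld+json.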

Within 7 days, ORBIT reports:

  • Indexed: Yes/No
  • Initial impressions (if ranking)
  • Position for target keyword

If an article doesn't index within 14 days, it gets flagged for review.

Real Workflow Example: PapaPrevoit Morning Run

Let's walk through an actual ORBIT execution from February 8, 2026:

08:00:00 - GSC API call initiated for papaprevoit.com
08:02:14 - Data retrieved: 3,847 keywords (90-day period, 10+ impressions, position 5-50)
08:05:38 - Clustering complete: 43 keyword clusters identified
08:07:21 - Priority scoring complete: Top 10 clusters selected

Top cluster:

Theme: "Assurance obsèques comparatif"
Keywords: assurance obsèques comparatif (1,240 imp, pos 12.4),
          meilleure assurance obsèques (890 imp, pos 15.1),
          comparateur assurance décès (670 imp, pos 18.7)
Priority score: 91.2
Est. monthly traffic if ranking pos 3-5: 240-310 clicks

08:09:43 - SERP analysis complete (top 10 results scraped)
08:12:17 - Outline generated: 7 H2 sections, 2,650-word target length
08:14:05 - Article generation started

[Generation runs autonomously]

09:47:23 - Article complete: "Assurance Obsèques: Comparatif 2026 des Meilleurs Contrats" (2,730 words)
09:49:56 - Meta optimization complete, schema added, internal links inserted
09:51:14 - Published to papaprevoit.com/assurance-obseques-comparatif-2026
09:51:30 - Sitemap updated, Google notified

Total time from GSC pull to published article: 1 hour 51 minutes.

Zero human involvement.

7-day results:

  • Indexed: Day 3
  • Initial position: 19
  • Impressions (days 3-7): 247
  • Clicks: 8
  • 30-day projection: Position 8-12, 80-120 monthly clicks

Cost Breakdown: $2.34 per Complete Research-to-Publish Cycle

Traditional keyword research costs:

  • SEO tool subscription: $100-200/month (Ahrefs, SEMrush)
  • Researcher time: 4 hours/month/vertical at $50/hr = $200
  • Content brief creation: 2 hours at $50/hr = $100
  • Total per vertical/month: $400-500

At 9 verticals × 13 brands = 117 content streams, that's $46,800-58,500 monthly just for research and briefs.

ORBIT automated research costs:

  • GSC API: Free (within Google Cloud free tier for our volume)
  • OpenAI embedding API: $0.13 per 1M tokens (est. $0.08/analysis run)
  • GPT-4o intent classification: $0.24/analysis run
  • SERP scraping: $0.09/analysis run (ScraperAPI)
  • Infrastructure: $0.42/analysis run (amortized)
  • Total per analysis run: $0.83

Plus article generation: $1.51 per article (Claude API costs)

Complete research-to-published article: $2.34

At 100 articles/month: $234 vs. $46,800-58,500 with human researchers.

That's a 99.5% reduction in research costs.
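The figures in this section can be checked with a few lines of arithmetic. All inputs below are the numbers quoted above; as in the article's own total, the $0.83 analysis-run cost is counted once per article.

```python
run_costs = {
    "embeddings": 0.08,
    "intent_classification": 0.24,
    "serp_scraping": 0.09,
    "infrastructure": 0.42,
}
per_run = round(sum(run_costs.values()), 2)
per_article = round(per_run + 1.51, 2)      # plus Claude generation cost
monthly_ai = round(per_article * 100, 2)    # 100 articles per month

streams = 9 * 13                            # verticals x brands
manual_low, manual_high = streams * 400, streams * 500

print(f"per run: ${per_run}, per article: ${per_article}, monthly: ${monthly_ai}")
print(f"manual: ${manual_low:,} to ${manual_high:,}")
print(f"reduction: {1 - monthly_ai / manual_low:.1%} to {1 - monthly_ai / manual_high:.1%}")
```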

Why GSC Data Beats Keyword Tools

Keyword tools have three fatal flaws:

1. They show theoretical search volume, not real opportunity

Ahrefs might say "best term life insurance" gets 12,000 monthly searches. But if your domain authority is 35, you'll never rank for it. GSC shows keywords where you actually have search presence—winnable targets.

2. They miss long-tail variations

Keyword tools show popular head terms. GSC reveals the exact long-tail phrasings people use to find your site—often with zero competition.

Example from GondosApa (Hungarian home renovation):

  • Ahrefs: "homlokzat szigetelés" (1,200 searches/mo, KD 45)
  • GSC reality: "homlokzat szigetelés ár budapest 2026" (40 searches/mo, position 8, zero competition)

We ranked #1 for the GSC variation within 11 days. The Ahrefs term remains unwinnable.

3. They cost $100-200/month for data you already have

Why pay for third-party search volume estimates when Google tells you exactly how your site performs?

Integration with Programmatic SEO

Automated keyword research becomes exponentially more powerful when combined with programmatic templates.

ORBIT can generate location-based and comparison-based templates from keyword clusters:

Location template example:

Cluster detected: "panneau solaire [CITY]" appears in GSC for 87 French cities with 50+ combined impressions

ORBIT generates:

  • Template: "Panneaux Solaires [CITY]: Prix, Aides, Installateurs 2026"
  • Variable injection: City name, regional subsidy data, local installer count
  • Output: 87 unique articles published over 30 days

Each article targets city-specific long-tail keywords GSC identified.

Comparison template example:

Cluster detected: "assurance vie [BANK_A] vs [BANK_B]" variations

ORBIT generates:

  • Template: "Assurance Vie [BANK_A] vs [BANK_B]: Comparatif Détaillé 2026"
  • Variable injection: Product features, rates, fees per institution
  • Output: 45 comparison articles (every pairing of 10 major banks: 10 × 9 / 2 = 45)
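Both templates amount to string substitution over a list of variables. In the sketch below, the two title templates come from the examples above, while the city sample and the bank names are placeholders invented for illustration.

```python
from itertools import combinations

LOCATION_TEMPLATE = "Panneaux Solaires {city}: Prix, Aides, Installateurs 2026"
COMPARISON_TEMPLATE = "Assurance Vie {a} vs {b}: Comparatif Détaillé 2026"

cities = ["Lyon", "Nantes"]                     # hypothetical sample of the 87 cities
banks = [f"Banque {i}" for i in range(1, 11)]   # placeholder names for 10 banks

location_titles = [LOCATION_TEMPLATE.format(city=c) for c in cities]

# Each unordered pair of banks yields one comparison article.
comparison_titles = [
    COMPARISON_TEMPLATE.format(a=a, b=b) for a, b in combinations(banks, 2)
]

print(location_titles[0])
print(len(comparison_titles))
```

Using unordered pairs avoids publishing both "A vs B" and "B vs A", which would compete for the same queries.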

For the full programmatic approach, see Programmatic SEO with AI.

Common Mistakes to Avoid

We made plenty of errors building this system. Learn from them:

Mistake 1: Trusting Raw GSC Data Without Filtering

Early versions pulled all keywords with 1+ impression. This created noise—thousands of irrelevant, misspelled, or ultra-low-volume queries.

Fix: Set minimum impression thresholds (10+ for small sites, 50+ for established sites) and filter out navigational queries (brand name searches).

Mistake 2: Over-Clustering

Setting the similarity threshold too low merges unrelated keywords. We once clustered "assurance auto jeune conducteur" (car insurance for young drivers) with "assurance auto pas cher" (cheap car insurance). These are different intents that need different articles.

Fix: Test clustering thresholds on sample data. We settled on 0.78 cosine similarity after testing 0.70-0.85 range.

Mistake 3: Ignoring SERP Intent Mismatches

Some keywords trigger non-organic SERP features (local packs, shopping results) that make organic ranking pointless.

Example: "assurance auto en ligne" shows 4 Google Ads, a local pack, and a price comparison widget above the organic results. Organic position 1 gets 8% CTR instead of the typical 30%.

Fix: ORBIT's SERP analyzer flags keywords with heavy ad presence or non-organic features. We deprioritize these even if impression volume is high.

Mistake 4: Generating Content for Every Cluster Immediately

Publishing 47 articles in one day is risky: a sudden spike in publishing volume can look like spam to Google.

Fix: ORBIT schedules articles across 14-30 days (1-3 articles/day maximum) to maintain natural publishing velocity.
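A simple way to enforce a daily cap is to assign publish dates by integer division. This sketch uses a fixed three per day and a hypothetical start date; ORBIT's actual scheduler is not described in detail.

```python
from collections import Counter
from datetime import date, timedelta


def schedule(articles, start, per_day=3):
    """Assign each article a publish date, at most per_day per calendar day."""
    return [
        (start + timedelta(days=i // per_day), title)
        for i, title in enumerate(articles)
    ]


# 47 placeholder articles, starting on a hypothetical date.
plan = schedule([f"article-{n}" for n in range(1, 48)], start=date(2026, 2, 9))

busiest = max(Counter(day for day, _ in plan).values())
span_days = (plan[-1][0] - plan[0][0]).days + 1
print(f"{len(plan)} articles over {span_days} days, at most {busiest} per day")
```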

Mistake 5: Not Monitoring Cannibalization

Creating multiple articles targeting similar keywords causes cannibalization—your own content competes against itself, diluting ranking power.

Fix: Before generating content, ORBIT checks existing published articles for keyword overlap. If overlap >40%, it skips the new article or merges content into an update.
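The overlap check can be sketched as a set comparison. The article does not define how overlap is measured, so this example assumes it is the share of the new cluster's keywords that an existing article already targets; the function names and sample URLs are hypothetical.

```python
def keyword_overlap(new_keywords, existing_keywords):
    """Share of the new cluster's keywords already targeted by one article."""
    new = {k.lower() for k in new_keywords}
    if not new:
        return 0.0
    existing = {k.lower() for k in existing_keywords}
    return len(new & existing) / len(new)


def should_generate(new_keywords, published, max_overlap=0.40):
    """True if no published article overlaps the new cluster by more than 40%."""
    return all(
        keyword_overlap(new_keywords, kws) <= max_overlap
        for kws in published.values()
    )


new_cluster = [
    "assurance emprunteur diabète",
    "prêt immobilier diabétique",
    "assurance crédit diabète type 2",
    "crédit immobilier avec diabète",
]
published = {
    "/assurance-pret-sante": ["assurance emprunteur diabète", "assurance prêt maladie"],
}
print(should_generate(new_cluster, published))
```

Exact string matching misses near-duplicate phrasings, so a production version would more likely compare embeddings than raw keyword strings.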

Advanced: Multi-Brand Keyword Orchestration

Running 13 brands introduces a unique challenge: Avoiding duplicate effort when keywords appear across multiple sites.

Example: "meilleure assurance vie 2026" appears in GSC for both PapaPrevoit and MamanPrevoit. Should both sites publish articles?

ORBIT's multi-brand logic:

  1. Detect cross-brand keyword overlap (same keyword in multiple GSC accounts)
  2. Evaluate existing content (does one brand already have a strong article?)
  3. Calculate domain authority differential (which site is more likely to rank?)
  4. Prioritize the stronger candidate, skip the weaker

In practice: PapaPrevoit has higher DA than MamanPrevoit. When overlapping keywords appear, PapaPrevoit gets priority unless MamanPrevoit already ranks in top 10.
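That rule can be written as a small selection function. The authority values below are invented for the example (the article gives none for these two sites), and `pick_brand` is a hypothetical name.

```python
def pick_brand(candidates):
    """Choose which brand should publish for an overlapping keyword.

    A brand already ranking in the top 10 keeps the keyword; otherwise
    the brand with the highest domain authority gets it.
    """
    ranking = [
        c for c in candidates
        if c["position"] is not None and c["position"] <= 10
    ]
    if ranking:
        return min(ranking, key=lambda c: c["position"])["brand"]
    return max(candidates, key=lambda c: c["authority"])["brand"]


# Hypothetical authority scores and positions.
neither_ranks = [
    {"brand": "PapaPrevoit", "authority": 40, "position": 14.0},
    {"brand": "MamanPrevoit", "authority": 30, "position": None},
]
weaker_ranks = [
    {"brand": "PapaPrevoit", "authority": 40, "position": 14.0},
    {"brand": "MamanPrevoit", "authority": 30, "position": 7.0},
]
print(pick_brand(neither_ranks))
print(pick_brand(weaker_ranks))
```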

This prevents wasted generation and internal competition in SERPs.

What About Keyword Research for New Sites?

GSC-based research only works for sites with existing search presence. What if you're launching a new brand?

ORBIT falls back to vertical-based seed templates:

Vertical: Insurance
Seed topics:

  • Assurance vie (life insurance)
  • Assurance auto (car insurance)
  • Assurance habitation (home insurance)
  • Assurance emprunteur (mortgage insurance)

For each seed topic, ORBIT:

  1. Generates 10-15 subtopic variations (e.g., "assurance vie fiscalité", "assurance vie succession")
  2. Analyzes competitor content (SERP analysis on each subtopic)
  3. Publishes foundational articles covering the vertical

After 30-60 days of publishing, the site gains GSC data, and ORBIT switches to GSC-based research.

We used this approach to launch GondosAnya (Hungarian insurance lead gen) in November 2025. Published 40 foundational articles using seed templates, then switched to GSC-based research in January 2026 once data accumulated.

Monitoring and Iteration

Automated keyword research isn't "set and forget." ORBIT includes monitoring to refine the system over time:

Weekly review: Which articles ranked top 10 within 30 days? Extract common patterns (keyword types, article structures, content depth).

Monthly calibration: Adjust priority scoring weights based on actual ranking success. If transactional keywords consistently outperform informational ones, increase commercial intent weight.

Quarterly audit: Review unpublished clusters (those scored below top 10). Have they gained impressions? Rescore and consider publishing.

Example calibration from Q4 2025:

Original priority weights: Impressions 50%, Position 30%, Intent 20%
Performance analysis: Articles targeting position 11-15 keywords ranked 2.3x faster than position 20-25
Adjusted weights: Impressions 40%, Position 35%, Intent 25%

This increased ranking velocity by 18% in Q1 2026.

ROI: What You Actually Get

Implementing automated keyword research delivered these measurable outcomes for BP Corp:

  • Time savings: 180 hours/month (researcher time eliminated)
  • Cost savings: $46,000+/month (tool subscriptions + researcher salaries)
  • Velocity increase: 47 article opportunities identified daily vs. 8-12 with manual research
  • Ranking improvement: 36.8% of AI-generated articles rank top 10 for target keyword within 90 days
  • Content-market fit: GSC-based topics have 2.7x higher engagement (time on page, scroll depth) than manually researched topics

The last point is critical: GSC-based research inherently targets what your audience searches for, not what you think they search for. This produces better content-market fit.

Next Steps: Build Your Own System

You don't need ORBIT to implement automated keyword research. Here's the minimum viable version:

Step 1: Connect to GSC API (use Google's Python client library)
Step 2: Pull performance data (90 days, 10+ impressions, position 5-50)
Step 3: Cluster keywords manually or with simple embedding similarity
Step 4: Score by impressions × (1 / position rank)
Step 5: Feed top clusters to ChatGPT/Claude for outline generation
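The Step 4 score is simple enough to show in full. The sketch below applies impressions × (1 / position) to the four keywords from the diabetes cluster earlier in the article; the function name `mvp_score` is a hypothetical label.

```python
def mvp_score(impressions, position):
    """MVP priority: more impressions and a better (lower) position score higher."""
    return impressions * (1 / position)


# (impressions, average position) from the cluster example above
keywords = {
    "assurance emprunteur diabète": (847, 18.3),
    "prêt immobilier diabétique": (412, 22.1),
    "assurance crédit diabète type 2": (286, 15.7),
    "crédit immobilier avec diabète": (523, 19.8),
}

ranked = sorted(keywords, key=lambda kw: mvp_score(*keywords[kw]), reverse=True)
for kw in ranked:
    print(f"{mvp_score(*keywords[kw]):6.1f}  {kw}")
```

This version ignores commercial intent entirely, which is the main thing the full three-factor score adds.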

This MVP version costs $0/month (GSC is free, ChatGPT can handle outlines) and takes 2-4 weeks to build if you're technical.

Or use ORBIT, which includes GSC integration, clustering, SERP analysis, generation, and publishing in one workflow.

Try ORBIT's Automated Keyword Research

GENESIS (including ORBIT) opens to select partners in Q2 2026.

Ideal for:

  • Lead gen brands managing 3+ sites
  • Affiliate marketers publishing 50+ articles/month
  • SEO agencies managing client content

What you get:

  • GSC-powered keyword research (no third-party tools needed)
  • Automated clustering and prioritization
  • SERP analysis and outline generation
  • Multi-language support (FR, EN, HU, ES, DE)

Request ORBIT access →