

    From Data to Decisions: Building Intelligent KPI Trees with LLMs

    How Large Language Models combined with data engineering and machine learning can automatically discover, validate, and organize KPIs into dynamic, explainable systems

    Finarb Analytics Consulting
    Creating Impact Through Data & AI
    February 3, 2025
    25 min read

    Key Takeaways

    • LLMs can understand business intent and automatically suggest relevant KPIs from natural language goals
    • Combining language understanding with statistical validation creates KPI systems that are both interpretable and data-grounded
    • Schema comprehension via LLMs enables automatic semantic mapping of messy enterprise column names
    • Quantitative validation (ML + statistics) ensures proposed KPIs actually predict business outcomes
    • Dynamic KPI trees can continuously evolve, learning from new data and commenting on metric shifts
    • AI-generated metric catalogs provide governance, auditability, and democratized analytics access
    "Most dashboards today measure everything — except what really matters."

    Organizations track hundreds of metrics, yet struggle to answer: "Which KPIs truly move our business outcomes, and how are they connected?"

    The answer lies in transforming static dashboards into LLM-driven KPI systems that can discover, validate, and continuously refine metrics based on real data.

    In this post, we'll show how Large Language Models (LLMs) — combined with data engineering and machine learning — can automatically:

    • Understand business intent and data context
    • Suggest relevant KPIs (even new ones)
    • Validate those KPIs statistically
    • Organize them into a dynamic, explainable KPI tree that evolves as the business changes

    Why LLMs belong in KPI engineering

    An LLM can:

    • Read natural-language goals ("reduce churn", "improve order fulfilment efficiency")
    • Parse schemas and data dictionaries, understanding what each field represents
    • Generate new metric formulas that match intent
    • Evaluate logical relationships between metrics ("On-Time Delivery Rate drives NPS")

    This human-like reasoning ability — when grounded in data — allows AI systems to act like digital management consultants, building metric systems that mirror how executives think.

    Architecture overview

flowchart TD
  A["Business Intent (text)"] --> B["LLM Goal Interpreter<br/>maps intent → KPI concepts"]
  B --> C["Schema Analyzer<br/>LLM reads tables & columns"]
  C --> D["KPI Hypothesis Generator<br/>LLM suggests candidate KPIs + SQL"]
  D --> E["Quantitative Validator<br/>tests predictiveness & causality"]
  E --> F["KPI Tree Builder<br/>builds weighted DAG"]
  F --> G["Registry & Governance<br/>versioned definitions"]
  G --> H["Continuous Monitor<br/>drift, decay, re-learning"]
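
Conceptually, the whole pipeline can be driven by a thin orchestration layer. The sketch below is illustrative only; the function names (interpret_goal, analyze_schema, and so on) are hypothetical placeholders for the stage implementations shown in the sections that follow, not a prescribed framework.

# Illustrative orchestration skeleton for the pipeline above.
# Each called function is a hypothetical placeholder for a later stage.
def build_kpi_tree(goal_text, tables):
    concepts   = interpret_goal(goal_text)            # Stage 1: LLM goal interpreter
    schema_map = analyze_schema(tables)               # Stage 2: LLM schema analyzer
    candidates = generate_kpis(concepts, schema_map)  # Stage 3: candidate KPIs + SQL
    validated  = validate_kpis(candidates)            # Stage 4: statistical validation
    tree       = assemble_tree(goal_text, validated)  # Stage 6: weighted DAG
    register(tree)                                    # Stage 8: governance registry
    return tree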

    Stage 1 – Understanding intent with an LLM

    We start with a user prompt in plain English:

    Goal: Improve customer satisfaction in our e-commerce business.
    Data: We have order tables, shipment logs, and support tickets.

    Using a small prompt template, the LLM translates this to structured KPI intent.

    from openai import OpenAI
    client = OpenAI()
    
    intent_prompt = """
    You are an analytics strategist. The business goal is: "Improve customer satisfaction".
    Given available data tables: orders, shipments, feedback, support.
    List 5 candidate KPIs that could measure or influence this goal.
    For each, explain: purpose, formula (pseudo-SQL), and data dependencies.
    Return JSON.
    """
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role":"user","content":intent_prompt}]
    )
    
    print(response.choices[0].message.content)

    Example LLM output:

    [
      {"kpi":"On_Time_Delivery_Rate",
       "purpose":"Measures delivery reliability",
       "formula":"AVG(CASE WHEN delivered_date <= promised_date THEN 1 ELSE 0 END)",
       "tables":["shipments","orders"]},
      {"kpi":"Support_Tickets_per_Order",
       "purpose":"Captures friction in post-purchase experience",
       "formula":"COUNT(ticket_id)/COUNT(order_id)",
       "tables":["support","orders"]},
      {"kpi":"Stockout_Rate","purpose":"Supply reliability",
       "formula":"AVG(stockout_flag)"},
      {"kpi":"Average_Cycle_Time",
       "purpose":"Operational speed",
       "formula":"AVG(delivered_date - order_date)"},
      {"kpi":"CSAT",
       "purpose":"Outcome KPI","formula":"AVG(rating)"}
    ]

    👉 The LLM has reasoned from both business intent and schema context, something deterministic code alone cannot do.
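
In practice the completion comes back as a string, so it is worth parsing and sanity-checking it before anything downstream consumes it. A minimal sketch, assuming the model honours the "Return JSON" instruction in the prompt above:

import json

# Parse the LLM's reply and keep only well-formed KPI suggestions.
raw = response.choices[0].message.content
try:
    candidates = json.loads(raw)
except json.JSONDecodeError:
    candidates = []   # in practice: retry, or ask the model to repair its output

required = {"kpi", "purpose", "formula"}
candidates = [c for c in candidates if required <= c.keys()]
print([c["kpi"] for c in candidates])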

    Stage 2 – Schema comprehension via natural-language reasoning

    In a real enterprise, column names are messy: del_date, ord_prom_dt, cust_satis_score. An LLM can read metadata or sample data and infer meaning.

    schema_prompt = """
    You are a data engineer. Given these column names:
    ['ord_dt','del_dt','prom_days','stk_flag','csat_score','sup_tkts']
    Map each to a semantic tag (e.g., order_date, delivered_date, promised_days, stockout_flag, csat, support_tickets)
    Return a JSON map.
    """
    
    schema_map = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role":"user","content":schema_prompt}]
    )
    
    print(schema_map.choices[0].message.content)

    The LLM returns:

    {"ord_dt":"order_date","del_dt":"delivered_date","prom_days":"promised_days","stk_flag":"stockout_flag","csat_score":"csat","sup_tkts":"support_tickets"}

    This automated semantic labeling becomes the foundation for dynamic KPI discovery.
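
Once the mapping comes back, it can be applied directly to the data so every downstream step works with the semantic names. A minimal sketch, where raw_df is an assumed pandas DataFrame holding the original messy columns:

import json
import pandas as pd

# Rename the messy enterprise columns to their inferred semantic tags.
mapping = json.loads(schema_map.choices[0].message.content)
df = raw_df.rename(columns=mapping)
print(df.columns.tolist())
# ['order_date', 'delivered_date', 'promised_days', 'stockout_flag', 'csat', 'support_tickets']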

    Stage 3 – KPI hypothesis generation

    With the goal and schema understood, the LLM suggests not only which KPIs to track but how to compute them.

    kpi_gen_prompt = """
    Given the goal "Improve customer satisfaction"
    and these mapped columns: order_date, delivered_date, promised_days, stockout_flag, support_tickets, csat.
    Suggest 5 KPI formulas (in SQL) that can be computed to evaluate or drive this goal.
    Return JSON list of {kpi,sql,lower_is_better}.
    """
    print(client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role":"user","content":kpi_gen_prompt}]
    ).choices[0].message.content)

    Typical LLM-generated output:

    [
     {"kpi":"Order_Cycle_Time_Days","sql":"AVG(julianday(delivered_date)-julianday(order_date))","lower_is_better":true},
     {"kpi":"On_Time_Delivery_Rate","sql":"AVG(CASE WHEN (julianday(delivered_date)-julianday(order_date)) <= promised_days THEN 1 ELSE 0 END)","lower_is_better":false},
     {"kpi":"Stockout_Rate","sql":"AVG(stockout_flag)","lower_is_better":true},
     {"kpi":"Support_Tickets_per_Order","sql":"AVG(support_tickets)","lower_is_better":true},
     {"kpi":"CSAT","sql":"AVG(csat)","lower_is_better":false}
    ]

    These now feed into the quantitative validation layer.
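
Because the generated formulas are plain SQL aggregates (the julianday() calls suggest a SQLite dialect), each candidate can be evaluated directly against the mapped data before any modelling. A rough sketch, assuming df is the renamed DataFrame from Stage 2 and kpi_candidates is the Stage 3 output parsed with json.loads:

import sqlite3
import pandas as pd

# Evaluate each candidate KPI formula against an in-memory SQLite copy of the data.
conn = sqlite3.connect(":memory:")
df.to_sql("orders", conn, index=False)

for cand in kpi_candidates:
    # In production, generated SQL should be reviewed/sandboxed before execution.
    value = conn.execute(f"SELECT {cand['sql']} FROM orders").fetchone()[0]
    print(cand["kpi"], value)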

    Stage 4 – Quantitative validation (classical ML)

    Here we ensure the proposed KPIs actually track the target outcome.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    
# assume df = dataset with the above fields, plus the derived columns
# "on_time" and "cycle_time_days" computed from the Stage 3 SQL formulas
y = (df["csat"] >= 4).astype(int)   # binary target: satisfied (rating >= 4) vs. not
features = {
  "On_Time_Delivery_Rate": ["on_time"],
  "Stockout_Rate": ["stockout_flag"],
  "Support_Tickets_per_Order": ["support_tickets"],
  "Order_Cycle_Time_Days": ["cycle_time_days"]
}
    
    def validate_kpi(k):
        X = df[features[k]]
        model = RandomForestClassifier(n_estimators=200, random_state=42)
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
        return auc
    
    validated = {k:validate_kpi(k) for k in features}
    print(validated)

    We keep KPIs with AUC ≥ 0.6 and no strong multicollinearity. This ensures the language-suggested metrics are grounded in evidence.
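
The multicollinearity screen can be as simple as a pairwise-correlation filter over the driver columns. A minimal sketch; the 0.8 cut-off is an assumed threshold, not a fixed rule:

# Flag driver pairs whose absolute correlation suggests they are redundant.
driver_cols = ["on_time", "stockout_flag", "support_tickets", "cycle_time_days"]
corr = df[driver_cols].corr().abs()

for i, a in enumerate(driver_cols):
    for b in driver_cols[i + 1:]:
        if corr.loc[a, b] > 0.8:   # assumed cut-off for "strong" collinearity
            print(f"Consider dropping one of {a} / {b} (|r| = {corr.loc[a, b]:.2f})")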

    Stage 5 – LLM-assisted interpretation

    Once validated, the LLM helps craft the narrative explaining why these KPIs matter — turning dry numbers into human-readable insight.

    insight_prompt = f"""
    We found these KPI correlations with customer satisfaction:
    {validated}
    Explain in 3 sentences what they mean for an operations manager.
    """
    print(client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role":"user","content":insight_prompt}]
    ).choices[0].message.content)

    Example response:

    "Timely delivery and fewer stockouts have the highest positive impact on customer satisfaction. Support interactions show a strong negative correlation, suggesting friction in post-delivery experience. Focusing on logistics reliability and proactive support will likely yield the greatest NPS improvement."

    This interpretive layer is where LLMs excel — contextualizing quantitative outputs into managerial action.

    Stage 6 – Building the KPI tree

    Now we connect the validated KPIs to the business goal, weighting each by its influence score.

import networkx as nx

# rescale AUC (0.5 = no signal, 1.0 = perfect) into a 0-1 influence weight
weights = {k: (v - 0.5) * 2 for k, v in validated.items()}

G = nx.DiGraph()
G.add_node("Customer_Satisfaction", kind="goal")
for k, w in weights.items():
    G.add_node(k, kind="driver")
    G.add_edge(k, "Customer_Satisfaction", weight=round(w, 2))

nx.nx_pydot.write_dot(G, "kpi_tree.dot")  # DOT export requires the pydot package

    Visualized, it forms:

    Customer_Satisfaction
     ├── On_Time_Delivery_Rate (↑ strong)
     ├── Stockout_Rate (↓ medium)
     ├── Support_Tickets_per_Order (↓ strong)
     └── Order_Cycle_Time_Days (↓ weak)

    Every edge weight is earned through statistical validation, not intuition.
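
The ↑/↓ arrows and strong/medium/weak labels can themselves be derived mechanically from the validated weights and each KPI's lower_is_better flag from Stage 3. A small sketch; the bucket boundaries are assumptions to illustrate the idea:

# Translate numeric edge weights into the arrows and strength labels shown above.
def describe(kpi, weight, lower_is_better):
    direction = "↓" if lower_is_better else "↑"
    strength = "strong" if weight >= 0.5 else "medium" if weight >= 0.3 else "weak"
    return f"{kpi} ({direction} {strength})"

lower_is_better = {"On_Time_Delivery_Rate": False, "Stockout_Rate": True,
                   "Support_Tickets_per_Order": True, "Order_Cycle_Time_Days": True}
for k, w in weights.items():
    print(" ├──", describe(k, w, lower_is_better[k]))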

    Stage 7 – Continuous learning and evolution

    Once live, this system can re-run periodically:

    • Fetch new data, recompute metrics
    • Re-validate with the latest relationships
    • Let the LLM comment on shifts ("Stockouts now drive CSAT less; maybe customers adjusted expectations")
    • Update governance registry accordingly

    Example periodic summary prompt:

    summary_prompt = """
    Compare last quarter vs previous quarter KPI correlations with CSAT:
    On_Time_Delivery 0.78→0.62
    Support_Tickets -0.72→-0.45
    Summarize insights and possible causes.
    """
    print(client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role":"user","content":summary_prompt}]
    ).choices[0].message.content)
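
In a live system the quarter-over-quarter figures would not be hard-coded; they can be recomputed and injected into the prompt on each run. A rough sketch, where df_prev and df_curr are assumed DataFrames holding the two quarters of data:

# Recompute KPI-CSAT correlations per quarter and build the comparison prompt dynamically.
driver_cols = ["on_time", "stockout_flag", "support_tickets", "cycle_time_days"]

def csat_correlations(frame):
    return frame[driver_cols].corrwith(frame["csat"]).round(2).to_dict()

prev, curr = csat_correlations(df_prev), csat_correlations(df_curr)
lines = [f"{c}: {prev[c]}→{curr[c]}" for c in driver_cols]
summary_prompt = (
    "Compare last quarter vs previous quarter KPI correlations with CSAT:\n"
    + "\n".join(lines)
    + "\nSummarize insights and possible causes."
)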

    LLMs thus enable self-commentary on metrics, bridging analytics and decision-making.

    Stage 8 – Governance through an AI-authored KPI registry

    Each KPI's metadata can be automatically written by the LLM:

    registry_prompt = """
    Draft a registry entry for the KPI "On_Time_Delivery_Rate"
    including definition, formula, owner, refresh cycle, and interpretation.
    """
    print(client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role":"user","content":registry_prompt}]
    ).choices[0].message.content)

    Result:

    On_Time_Delivery_Rate

    Definition: Percentage of orders delivered within promised time.

    Owner: Supply Chain Analytics

    Formula: AVG(CASE WHEN delivered_date <= promised_date THEN 1 ELSE 0 END)

    Refresh: Daily

    Interpretation: Indicates reliability of fulfilment; directly influences customer satisfaction and NPS.

    Such entries form the basis of a governed AI-generated metric catalog, ensuring consistency and auditability.
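
To make that catalog auditable, each AI-drafted entry can be stored as versioned, machine-readable metadata rather than free text. A minimal sketch; the field names and file layout are assumptions, not a fixed schema:

import json, datetime

# Persist one registry entry as versioned, machine-readable metadata.
entry = {
    "kpi": "On_Time_Delivery_Rate",
    "definition": "Percentage of orders delivered within promised time.",
    "formula": "AVG(CASE WHEN delivered_date <= promised_date THEN 1 ELSE 0 END)",
    "owner": "Supply Chain Analytics",
    "refresh": "daily",
    "version": 1,
    "updated_at": datetime.date.today().isoformat(),
}
with open("on_time_delivery_rate.json", "w") as f:
    json.dump(entry, f, indent=2)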

    End-to-end summary

Step                  | What the LLM does                      | What classic ML does
Intent Understanding  | Parses goal text                       | -
Schema Reasoning      | Maps column names to business meaning  | -
KPI Generation        | Creates candidate formulas             | -
Validation            | -                                      | Tests correlation, causality, drift
Explanation           | Generates human-readable insights      | -
Tree Building         | Structures relationships semantically  | Computes edge weights
Continuous Learning   | Comments on trend shifts               | Re-trains metrics periodically

    Together they create a closed loop: Language understanding → Data validation → Narrative insight → Governance.

    Why this matters

    Traditional KPI frameworks are static and human-authored. An LLM-driven system can:

    • Continuously adapt as data and priorities change
    • Propose new metrics from emerging behaviors
    • Translate raw analytics into clear executive guidance
• Democratize analytics by letting anyone ask, in natural language, "What drives revenue this quarter?"

    At Finarb Analytics, we are applying this framework across healthcare, BFSI, retail, and manufacturing — using enterprise-grade data governance, privacy-compliant LLM integration, and cloud-native deployment. The result is not just faster insight, but intelligent decision systems that think like your best analysts, at scale.

    Conclusion

    LLMs don't replace analysts — they amplify them. By blending semantic understanding (language) with statistical validation (data), we can finally build KPI systems that learn, explain, and evolve with the organization.

    In short: KPIs no longer have to be defined by humans. They can now be discovered, tested, and narrated by AI — grounded in your own data.

    KPI Engineering
    LLM
    Business Intelligence
    Data Analytics
    AI Systems
    Machine Learning
    Enterprise AI
