Thinking Like a Software Engineer: A Software-Driven Data Modelling Approach to Customer 360 for Sports Organizations

Software-driven data modelling for sports organizations

TL;DR – Software-Driven Data Modelling for Sports Customer 360

  • Most sports organizations operate multiple teams with overlapping, non-calendar-based seasons, making unified modelling a challenge.
  • Traditional column-per-season approaches lead to schema bloat, loss of history, and manual rework every year.
  • We applied software engineering principles (abstraction, data contracts, structure) and used Snowflake VARIANT fields to store nested fan engagement: { team → season → events }.
  • This structure powers:
    • Compact and easy to query Customer 360 models
    • ML-ready features for retention and lead scoring
    • Business-ready derived attributes for targeting
  • Bonus: It simplifies multi-brand modelling, reduces maintenance, and preserves full fan history across seasons.

From Columns to Chaos

Working with sports organizations has taught us that no two teams run on the same calendar. Hockey seasons span October to April. Soccer kicks off in March and wraps up in October. Basketball might start in November and end in June. When you're managing customer data across multiple brands, leagues, and engagement windows, the concept of a "season" doesn't map neatly to a calendar year.

In previous roles, we experienced firsthand how these overlapping timelines made it difficult to produce a unified Customer 360 view. Every January, we found ourselves re-aggregating data, rebuilding logic, and struggling to answer basic questions like: who attended games last season? The problem was we mapped seasons to calendar years, and the moment January 1st arrived, our reports would prematurely flip to a new "season," even though most leagues were still mid-season. We were losing active context and misclassifying fan behaviour.

This time, we wanted to approach the problem differently. We put on our software developer hats and asked: how would we solve this if this were a versioned API or evolving data contract?

In our initial Customer 360 modelling approach, we did what most data teams do. We created a clean, flattened table with one row per fan and a set of easy-to-query columns like this:

attended_{league_championship}_2022,
attended_{brand}_2023,
attended_{brand}_2024,
...

It was simple, self-explanatory, and made segmentation easy for business users.

But then came more seasons. More teams. More engagement dimensions. Suddenly, what started as a tidy schema turned into a maintenance nightmare. Every new campaign or season meant schema changes, backfills, and more logic sprawl across dbt models. We needed a better approach, one that could scale across brands, seasons, and years without sacrificing data integrity while making it easy to query.

We embraced a software-driven data modelling mindset, thinking in terms of abstraction, contracts, and structure, rather than fixed schemas.

From Table Design to Data Contracts

In software, developers often encapsulate complexity behind well-defined interfaces. Rather than expose every field directly, they define structured payloads, APIs, or objects that hold related information together.

We realized we could apply the same principles to our Customer 360 model. Instead of adding a column for every team/season combination, we would store a fan’s entire engagement history in a single VARIANT column called ticket_history. This semi-structured object holds a nested map of brands and seasons, like this:

{
  "BRAND_A": {
    "2023": ["GAME1", "GAME2"],
    "2024": ["GAME4", "GAME8"],
    "2025": ["GAME2"]
  },
  "BRAND_B": {
    "2023": ["GAME1"]
  },
  "LEAGUE_EVENTS": {
    "2025": ["CHAMPIONSHIP"]
  }
}

In this format, the data is compact, complete, and history-preserving. No schema changes are needed when a new team or season is introduced.

Why This Matters for Sports Organizations

For multi-brand sports organizations, this approach solves a very real problem: seasons don't align neatly with calendar years. Hockey may run from October to April, soccer from March to October, and basketball from November to June. Hardcoding logic that flips over on January 1st leads to confusion, incorrect aggregations, and misaligned reporting. By treating each brand and season independently inside a nested structure, we preserve fidelity to how the sports themselves actually operate.

Instead of trying to force a shared temporal structure across different leagues, we let each team define its own season keys. This approach also eliminates the need to restructure models annually, or to anticipate every combination of brand and season ahead of time.

Benefits of the Variant-Based Approach

  • Scalability: New seasons or teams don’t require schema changes.
  • Maintainability: Logic for computing engagement is centralized and reusable.
  • Easy to query: With Snowflake’s native JSON functions, you can still target fans by team and season.
  • Integrity: Since the structure mirrors the underlying fact table, there’s less risk of divergence between features.
  • Seasonal Flexibility: Avoids relying on calendar-based assumptions that don't apply in sports.

Building Business-Ready Layers

Of course, while this approach is great for engineers, business users don’t want to write something like this in a UI:

ticket_history:"BRAND_A":"2023" IS NOT NULL

That’s why we paired our base model with a derived layer: dbt models or views that extract common use cases from the VARIANT field.

SELECT
  fan_id,
  ticket_history,
  
  -- BRAND_A examples
  ticket_history:"BRAND_A":"2023" IS NOT NULL AS attended_brand_a_2023,
  ARRAY_SIZE(ticket_history:"BRAND_A":"2023") AS event_count_brand_a_2023,
  IFF(ticket_history:"BRAND_A":"2022" IS NOT NULL, TRUE, FALSE) AS attended_brand_a_2022,
 
 -- BRAND_B examples
  ticket_history:"BRAND_B":"2023" IS NOT NULL AS attended_brand_b_2023,
  ARRAY_SIZE(ticket_history:"BRAND_B":"2023") AS event_count_brand_b_2023
FROM fan_ticket_history;

These derived flags give marketing teams exactly what they need: easy segmentation, consistent naming, and precomputed logic, without duplicating the underlying data logic.

A Platform Mindset for Customer 360

A particularly strong use case has been retention modelling for season ticket holders. Using this nested ticket_history structure, we've built machine learning-ready features that track a fan's year-over-year activity across brands without needing to maintain separate tables or join-heavy models. For example we will store all events attended, a single record like this:

{
  "BRAND_A": {
    "2022": ["GAME1", "GAME2", "GAME6"],
    "2023": ["GAME2", "GAME5"]
  },
  "BRAND_B": {
    "2023": ["GAME7", "GAME8", "GAME9"]
  }
}

...allows us to calculate features such as:

  • Total events attended in the current or prior season
  • Drop-off in engagement from one season to the next
  • Loyalty across teams (multi-brand activity)
  • First season attended per brand

These features are then fed into our feature store for modelling churn risk, renewal likelihood, and cross-sell potential.

We’ve extended this to build a full feature store pipeline using our ticket_history and related fan-level data. For example, we generate season-level features that can power retention and renewal models.

These are then stored in a feature store with a schema like:

{
  "fan_id": "fan_001",
  "seasonal_features": {
    "BRAND_A": {
      "2022": {
        "attendance_rate": 0.78,
        "ticket_utilization": 0.82,
        "avg_scan_in_time_mins": 24
      },
      "2023": {
        "attendance_rate": 0.85,
        "ticket_utilization": 0.92,
        "avg_scan_in_time_mins": 27
      }
    },
    "BRAND_B": {
      "2023": {
        "attendance_rate": 0.66,
        "ticket_utilization": 0.70,
        "avg_scan_in_time_mins": 33
      }
    }
  }
}

These features are aggregated per fan per brand per season and versioned in a feature store. They can be used by downstream models to understand behavioural changes, predict likelihood of renewal, and trigger proactive outreach.

We're also finding benefits of this software-driven data modelling approach beyond season alignment. When used thoughtfully, this structure improves not just fan targeting but multi-brand analytics, retention modelling, campaign impact measurement and reporting. Because all engagement history is consolidated into one flexible, queryable structure, we're no longer limited to a single team's view or a rigid year-by-year slice. We've applied this in our work with multi-team organizations and found it dramatically reduces the number of derived models, joins, and edge-case logic we used to carry season to season.

Conclusion: Think Like a Platform Engineer

This hybrid approach, a nested base feature with derived targeting columns, frees us from schema bloat while preserving usability. It allows us to treat our Customer 360 table not just as a report, but as a product. We maintain one source of truth and can add new derived dimensions as needed, without rewriting the foundation.

So the next time your team debates whether to add something like this:

attended_brand_a_2025

Consider reaching for software design patterns instead. Your data model (and your future self) will thank you.

The shift toward software-driven data modelling isn’t just about structure. It’s structure, scalability, and usability. Get all three right, and your Customer 360 becomes a living product that grows with your organization.

Need help designing or optimizing your Customer 360 for sports or multi-brand environments? We’re here to help. Contact us to learn more.

More blog posts