What is a Metrics Store?

What is a Metrics Store?

There's been a lot of buzz in the Data Community about "Metrics Stores". It's also the central product in our platform here at Metlo. But what exactly is a metrics store? In this blog post we'll go over the problem with how organizations currently handle metrics and how a metrics store can help!

The Metrics Problem

Let’s take a look at the current way business metrics are defined in an organization. Once data's ingested into the data warehouse by a tool like Fivetran or Airbyte, it’s transformed by a series of Airflow or dbt jobs into a set of fact and dimension tables that contain granular data about your business.

However, data consumers and business stakeholders don't care about the granular tables stored in the warehouse. They care about the metrics and dimensions that measure your business!

For example, everyone from PMs and Engineers to Marketers and Sales folks at a company like Facebook care about hundreds of metrics like daily_active_users, revenue and user_churn.

They want to segment these metrics on dozens of dimensions like user_age, location, and time. Currently, data consumers have to find the right tables which contain the data needed and then write a SQL query to calculate the metrics they want to measure.

Let's walk through an example for the metric daily_active_users – assume that a user is considered active on a given day if they were online for more than 3 minutes, performed 3 or more actions, and visited at least 2 different screens.

The raw data to compute daily_active_users lives in the user_sessions table, so to compute this metric, Facebook employees would write this SQL query:

SELECT
    DATE_TRUNC('day', session_start) as day,
    count(distinct(user_id)) as daily_active_users
FROM user_sessions
WHERE
    session_seconds > 180 AND
    num_actions > 2 AND
    screens_viewed > 1
GROUP BY 1;

If they wanted to query daily_active_users segmented by age they would JOIN on the users dimension table and group by age:

SELECT
    DATE_TRUNC('day', session_start) as day,
    users.age as age,
    count(distinct(user_id)) as daily_active_users
FROM user_sessions
JOIN users ON
    user_sessions.user_uuid = users.uuid
WHERE
    session_seconds > 180 AND
    num_actions > 2 AND
    screens_viewed > 1
GROUP BY 1, 2;

The data consumer would write this query in the tool of their choice. A Business Analyst would create a computed field in Tableau. The data engineering team creates a model in LookML. ML Engineers and Data Scientists copy these SQL Queries into their Jupyter Notebooks. The query would be saved in some documentation somewhere (or in many cases spread through Slack Messages) to be used as a reference for future users.

However, daily_active_users is just one metric (and age is just one dimension)! The average company has hundreds of metrics that need to be segmented by dozens of different dimensions. Only a very small subset of these queries can be documented.

It's also extremely tedious and error prone to update the definition of a metric. What if instead of a user needing to be online for 3 minutes to be considered a daily active user, they have to be online for 5? This change would require you to update the docs and change the definition of this metric in dozens of downstream tools! Eventually, the definitions in each tool diverge, and different people will have different values for the same metric.

Both of these issues make it challenging to find out the true value of a metric, and when you can't find the source of truth values it paralyzes decision-making across the board. Instead of spending time making decisions, time is spent determining if the data can be trusted and what the correct value actually is.

To solve this problem, you could make views for all your metrics segmented on various dimensions (i.e. daily_active_users_by_age), but there are thousands of combinations metrics and dimensions that a business needs insight into. You can only make so many views! This also creates another problem, instead of trying to find the right metric definition, users spend time trying to find the right view.

Metrics can also be defined in an individual tools modeling layer like LookML, but those definitions are limited to just that tool.

A single, unified modeling layer is needed to define business metrics.

Metlo's Metrics Store

A metrics store is a modeling layer that connects to all of your tools! It sits between your data warehouse and your downstream tools to make sure every data consumer has the same metrics.

In Metlo, you can define your metrics and the different dimensions you can slice and dice by once in our modeling language:

name: user_stats
owner: akshay
datasource: bigquery
table: user_sessions

joins:
  - table: users
    sql: user_sessions.user_id = users.uuid

dimensions:
  time: session_start
  user_age: users.age
  
metrics:
  - name: active_users
    description: The total value transferred in wei
    type: count_distinct
    filters:
      - session_seconds > 180
      - num_actions > 2
      - screens_viewed > 1
        
Metlo metrics in YAML, we also allow you to make metrics in our UI!

Once you define a metric in Metlo, data consumers never have to worry about getting the right value for a metric. Metlo takes the definitions you define in our modeling layer and seamlessly sends them to downstream tools. For tools that have a robust modeling layer, Metlo plugs right in. For example, we generate LookML models in Looker and Data Sources for Tableau.

JDBC

For BI applications that use SQL, Metlo offers a JDBC driver so you can use SQL to query your metrics and segment by any dimension. This allows you to connect Metlo to any BI applications you use. No DSLs to learn, no new workflows to train your teams on. It just instantly works.

SELECT
    time_dim(time, day),
    age,
    active_users
FROM metlo.user_stats
GROUP BY 1, 2

Python SDK

Not everyone knows (or wants to write) SQL! This is especially the case for Data Scientists and ML Engineers where Python is their native language. We made a Python SDK so users can pull metrics in Jupyter Notebooks.

metlo.query(
    'active_users',
    groups=['age'],
    time_dimensions=[
        TimeDimension(
            dimension='time',
            granularity=TimeGranularity.DAY,
        )
    ],
)

Rest API

To really operationalize their data, companies build many tools that need to access metrics programatically. Engineers can power these use cases with Melto's REST API.

{
    "metrics": ["active_users"],
    "groups": ["age"],
    "time_dimensions": [
        {
            "dimension": "time",
            "granularity": "day"
        }
    ]
}

Conclusion

A metrics layer in your data stack gives data producers the ability to define metrics and consumers easy access to clean, accurate datasets. At Metlo, we believe the Metrics Store will be a critical piece in every companies data stack and we're extremely excited to share more as we build!