This is adapted from a piece I wrote for my teams at Dropbox. As context, my teams own sharing, which is a core feature. If you’re not working on core product, you probably don’t own a large domain with a subsection under active development, so you can skip those sections. Also, at Dropbox a collection of teams rolls up into an ‘Area.’
Background
One of the issues we’ve run into when talking about metrics is that everyone keeps using the word “metrics” for any piece of quantitative data, and the imprecise language has caused confusion and churn. This doc outlines the new vocabulary we will use as we become more data-driven. We basically want to stop using the word ‘metric’ every time we see a number somewhere.
As a core product team, we own a large domain, only parts of which are under active development. We track different types of metrics depending on whether a given part is under active development or just in maintenance.
Active Development
Operational Metrics
This is where we will spend the majority of our time. It is what your team is tracking on a project-by-project basis. Each project should have a set of metrics you’re tracking along with goals for those metrics. Sometimes the hardest part is picking which metric you should measure (and whether you can measure it). It’s common for projects to have multiple metrics, but you should pick one as the main metric you’re driving. The rest should be secondary or guardrail metrics. Guardrail metrics are there to make sure you didn’t make improvements at the expense of breaking something somewhere else. You don’t want to see degradation in guardrail metrics, or you want to set a specific range where degradation is acceptable.
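To make the primary/guardrail split concrete, here’s a minimal sketch of how a project might encode it. The metric names and thresholds below are invented for illustration; they are not real Sharing numbers.

```python
# Minimal sketch of a primary metric plus guardrails with acceptable ranges.
# Metric names and thresholds below are invented for illustration.

PRIMARY_METRIC = "shares_sent"  # the single metric this project is driving

# Guardrail -> maximum acceptable relative drop vs. control (0.01 = 1%).
GUARDRAILS = {
    "share_completion_rate": 0.01,   # a small dip is tolerated
    "weekly_active_sharers": 0.0,    # no regression tolerated
}

def guardrails_ok(control: dict, treatment: dict) -> bool:
    """Return True if no guardrail degraded beyond its acceptable range."""
    for metric, max_drop in GUARDRAILS.items():
        relative_change = (treatment[metric] - control[metric]) / control[metric]
        if relative_change < -max_drop:
            return False
    return True

# Example: completion rate dipped 0.5% (within range), active sharers held steady.
print(guardrails_ok(
    control={"share_completion_rate": 0.80, "weekly_active_sharers": 1000},
    treatment={"share_completion_rate": 0.796, "weekly_active_sharers": 1001},
))  # True
```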
Customer Benefit
Ideally the metric you choose here correlates tightly with some sort of ‘customer benefit.’ That means you’re trying to tie how your feature/improvement/project benefits your customer. The idealized version of this for a business productivity tool might be time saved or dollars earned. This can be very difficult to measure, so many projects might find something else to measure, like files saved, because users don’t want to lose their files. This helps you get away from a usage metric like the number of people who used a feature, or a metric that benefits the company but doesn’t benefit the user in any way. Sometimes you’re in a situation where you can’t track customer benefit, but those should hopefully be the exception.
30 day target
At minimum, your primary operational metric should come with a target at 30 days after GA launch. So if your metric is to increase the number of shares sent, with a guardrail metric of share completion rate, you should have a target that is something like 8% more shares sent 30 days after GA launch, or 5,000 more shares sent 30 days after GA launch. Whether you express this as a % improvement or a raw number is situational or sometimes stylistic. In general, with very small % improvements (smaller than 3%), I tend toward expressing the target in raw numbers.
Some people call these “post-launch metrics” but we will try to avoid using the word metric every time we measure data.
Standardizing at 30 days
We’re standardizing at 30 days just so we can have consistency across all of our metrics within the Sharing Area. It makes high-level assessment of impact a bit easier and lets us compare across projects (both against historical projects and across teams).
So you can have a 30 day target, but operationally decide you only need 14 days to make a go/no-go decision. Definitely make the call at 14 days! Just call this out.
Soft targets and hard targets
There are times where we are missing a lot of data and have a hard time setting good targets. This can be because we don’t have historical data, we have corrupted data, or we’re exploring a new area and there simply isn’t data available. In those cases we should still have a 30 day target, but we can caveat that it is a soft target where we will do a lot of learning and adjusting along the way. Lack of data shouldn’t discourage us from having targets; we just need to properly set context when we present and discuss these targets.
Hard targets are for areas where we have lots of data. Oftentimes we’ve already run similar experiments that we can use for guidance. These we will shorthand into just ‘targets.’
Proportional responses
We should make sure we have proportional responses to our success (or lack of success) in hitting our targets. As much as possible we should have plans for what to do if we hit or miss our targets. If we miss by a lot, we should be prepared to have a large response. This could mean making bigger changes in a follow-up experiment or being willing to question more of our going-in assumptions. We should also adjust how we set targets going forward.
This means if you have soft targets with lower confidence, you should be ready to potentially have a large response. If you have high confidence in an area it’s ok to spend less energy on such planning.
Company Impact
So most companies have company metrics or business metrics that they’re trying to improve. The gold standard for this is often $dollars or a metric that directly relates to $dollars like ad impressions, subscriber retention or customer support costs. You should know these off the top of your head. We might call these company impact, business impact, annual impact or most often we’ll shorten it to just “impact.”
I’ve seen these sometimes called “North Star Metrics” but again, we’re trying not to use the word metric every time we see some data.
Laddering
Your 30 day targets should ladder into company impact. Aka, your projects should help contribute towards company goals. If you tied your operational metrics to customer benefit, you’ll have a really nice straight line from helping customers to helping company goals. This has the side benefit of making sure you’re not making money by being a total jerkwad to your users.
Sometimes laddering is super straightforward. If your company goal is to increase virality, your metric around increasing shares would drive company impact by definition.
Other times it takes a bit more work to ladder up. But you can do something like files saved leads to reduced customer support tickets which saves $$$.
There are times where laddering into something like “90 day retention” is extremely difficult. Make sure there’s buy-in for your team to ladder into a metric that we’ve shown to positively influence company impact.
Forecasting impact
We’ll standardize around looking at annual impact at the company level. Going back to our files saved example, it would look something like:
Operational Metric: Files saved
30 day Target: 500 files saved
Company Impact: $1M saved per year
In general, forecasting impact will be a little difficult and take some assumptions. It isn’t uncommon to take a few “hops” as you ladder from your operational metric up to company impact. Just make sure to show your math and assumptions.
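To make “show your math” concrete, here’s a back-of-the-envelope sketch of the files saved example. Every constant is a placeholder assumption you’d replace with your own data (so it won’t reproduce the $1M figure above); the point is making each hop and its assumption explicit.

```python
# Back-of-the-envelope laddering from an operational metric to annual impact.
# Every constant here is a placeholder assumption, not real data; the point is
# to make each hop and its assumption explicit.

files_saved_per_30_days = 500        # the 30 day target
ticket_rate_per_lost_file = 0.10     # assumption: 10% of lost files become support tickets
cost_per_ticket_usd = 20.00          # assumption: fully loaded cost of one ticket

# Hop 1: files saved -> support tickets avoided (per 30 days)
tickets_avoided = files_saved_per_30_days * ticket_rate_per_lost_file

# Hop 2: tickets avoided -> dollars saved, annualized
annual_impact_usd = tickets_avoided * cost_per_ticket_usd * 12

print(f"~${annual_impact_usd:,.0f} saved per year under these assumptions")
```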
Monitoring / Diagnostics
As a core product team, we own a domain that is broader than what we have under active development. If you’re outside of core product, your domain may not be much bigger than what you’re actively developing. If you’re in growth, you often don’t have a domain at all and you’re developing on someone else’s domain. This means we’ll need to monitor a broader set of metrics than simply what is happening at the project level. This ends up looking like dashboards that we monitor on roughly a biweekly or monthly basis. We should spend the vast majority of our effort on Operational Metrics and Targets and a minority of our time on monitoring.
Domain Diagnostics
These we expect each team to set. This should be a set of metrics that helps show the health of your domain. In the past we’ve called them Hero Metrics. In this iteration we want diagnostics to:
Help us catch and diagnose the unexpected outages that may occur in our domain. Think SEVs, P0 bugs or their equivalent.
Measure the cumulative effort of our team’s projects across the year. Or conversely, if we’ve struggled, we should be able to pinpoint which areas have been least effective and formulate a plan to address those gaps.
A common pitfall is having so many diagnostics to track that it becomes too onerous to check in on them consistently. Most teams end up with about 10 per domain.
Area Diagnostics
Since Dropbox groups teams into Areas, Area Diagnostics are a rough measure of our collective effort. We’ll keep our Area diagnostics to a very small set of things to track (fewer than 5). Part of tracking at the Area level means we can deduplicate metrics. So if a user uses multiple sharing features and then upgrades, we don’t count that upgrade more than once.
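As a sketch of what that deduplication could look like in practice (the event shape and field names here are hypothetical), the idea is to count each upgrading user once, no matter how many sharing features they touched:

```python
# Illustrative dedup sketch; event shape and field names are hypothetical.
# A user who touched several sharing features and then upgraded should count
# as one upgrade at the Area level, not one per feature.

from typing import Iterable

def count_deduped_upgrades(upgrade_events: Iterable[dict]) -> int:
    """Count upgrades by distinct user, regardless of how many features
    each user used before upgrading."""
    return len({event["user_id"] for event in upgrade_events})

events = [
    {"user_id": 1, "feature": "shared_links"},
    {"user_id": 1, "feature": "shared_folders"},  # same user, second feature
    {"user_id": 2, "feature": "file_requests"},
]
assert count_deduped_upgrades(events) == 2  # user 1 is counted once
```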