100% voluntary user adoption, 10x increase in page interactions, 3x investigations launched

2019-2020




~140 people
Lightstep is an observability & tracing platform that helps developers maintain system performance across deep, distributed systems
Acquired by ServiceNow
Engineers responsible for maintaining service health need to know what changed in their system after a deployment, but they are stuck with noisy alerting systems and endless dashboards
I’m surrounded by my core users
I conducted 20+ interviews with thought leaders in the company, new engineers at Lightstep, new customers, power users, and our sales and customer success teams to understand the problem and the pain points.
Activities:
Created a document to track the research process and the interviews we would conduct.
Deliverables:
Research plan, timeline, interview scripts, scheduled sessions
Activities:
Scheduled and conducted interviews with 20+ participants over three rounds of sessions as we worked to discover the problem.
Deliverables:
Transcripts, notes, feedback
Activities:
Synthesized research and shared what we learned with the team.
Deliverables:
Research summary, insights, recommendations
Activities:
Created a research repo to import all the research data in one place, group it into common themes, tag it, and categorize it for search later.
Deliverables:
Shared findings with the team.

Deployments are stressful; that's when things break
Engineers responsible for maintaining service health need to know what has changed in their system after a deployment, but they are stuck with noisy alerting systems and spend many minutes sifting through dashboards. This breeds mistrust and wastes productivity.
Engineers need to know right away when a deployment has an issue
Alerting systems are too noisy and hard to use
Deployments don't happen all at once; they roll out a little at a time across servers in different geographic regions
Cardinality issues mean there can be thousands of metrics being monitored
Engineers don't know where to look when an issue occurs
"Just show me what changed"
How might we instill confidence in developers that, when new code is deployed, any issue will be detected and surfaced?
DevOps engineers, Site Reliability Engineers, and distributed systems engineers
Although roles vary in focus, they share the responsibility for deploying, scaling, and maintaining applications across multiple servers or environments.
Topline analyzes key operations data to help teams identify the root causes of performance issues in distributed systems.

Detailed chart interactions to kick off a root cause analysis.

Comparing latency of deployment versions

Investigating metric performance

Comparing latency across operations with changes in performance

Bird's-eye view of system performance
We tested our key design decisions with our design partners. Overall the feedback was positive, but there were still issues.
Users are seeing data they don’t care about, and having trouble finding the operations that matter
Even with the Topline view, it's hard to find what changed the most after a deployment
Users want to switch between operations without having to navigate away or hide the other operations
Help users find the operations that matter
Customers weren't seeing the operations they wanted to see. This led to a new feature, Key Operations, which lets admin users select the most important operations to monitor for each service.
Not only does this solve a major pain point for many of our biggest customers, it also gives us another pricing lever: we can sell customers on raising the number of Key Operations being monitored.

Show operations that changed the most after a deployment
"Just show me what changed." We let users choose the key operations to monitor. Not only did this solve a major pain point for many of our biggest customers, it also gave us another pricing lever to sell on.

See a detailed view without losing the context
Users should be able to interact directly with the chart and begin a root cause analysis flow without navigating away from the page.

100% voluntary user adoption, 10x increase in page interactions, 3x investigations launched
Every user with access to the beta feature chose to use it. Topline became the main jumping-off point for our Root Cause Analysis workflows.
Having your core users around you is a huge bonus. Share research insights and invite your team to participate.
Working with the in-house engineers gave me a lot of impromptu feedback and insights that would otherwise have taken significant resources to gather: finding participants, scheduling, and facilitating sessions. Inviting the team to review research findings, participate in interviews, and share their thoughts creates alignment that is invaluable for building a cohesive solution quickly.