Design lead - Reducing The Stress of Deployments

100% voluntary user adoption, 10x increase in page interactions, 3x investigations launched

My Role

  • Product Research
  • Product Design

Timeline

2019-2020

Team

Kay
Katia
Brandon
Me

Company size

~140 people

Context

Lightstep is an observability & tracing platform helping developers maintain system performance across deep distributed systems

Acquired by ServiceNow

Problem

Engineers responsible for maintaining service health need to know what changed in their system after a deployment, but they are stuck with noisy alerting systems and endless dashboards.

Research

I’m surrounded by my core users

I conducted 20+ interviews with thought leaders in the company, new engineers at Lightstep, new customers, power users, and our sales and customer success teams to understand the problem and the pain points.

1

Drafted research timeline

Activities:

Created a document to track the research process and the interviews we would conduct.

Deliverables:

Research plan, timeline, interview scripts, scheduled sessions

2

Conducted interviews

Activities:

Scheduled and conducted interviews with 20+ participants over three rounds of sessions as we worked to discover the problem.

Deliverables:

Transcripts, notes, feedback

3

Synthesized research

Activities:

Synthesized research and shared what we learned with the team.

Deliverables:

Research summary, insights, recommendations

4

Shared findings

Activities:

Created a research repo to collect all the research data in one place, group it into common themes, tag it, and categorize it so it can be searched later.

Deliverables:

Shared findings with the team.

Research insights

Research Takeaways

Deployments are stressful: that's when things break

Engineers, responsible for maintaining service health, need to know what changes have occurred in their system after a deployment - but are faced with using noisy alerting systems and spending many minutes sifting through dashboards. This creates mistrust and wasted productivity.

Engineers need to know right away when a deployment has an issue

Alerting systems are too noisy and hard to use

Deployments don't happen all at once; they roll out a little at a time across servers in different geographic regions

Cardinality issues mean that there can be thousands of metrics being monitored

Engineers don't know where to look when an issue occurs

"Just show me what changed"

- 85% of participants

Core Value Prop

Give developers confidence that when new code is deployed, any issue will be detected and surfaced.

Target Personas

DevOps engineers, Site Reliability Engineers, and distributed systems engineers

Although roles vary in focus, they share the responsibility for deploying, scaling, and maintaining applications across multiple servers or environments.

Solution

Topline analyzes key operations data to help teams identify the root causes of performance issues in distributed systems.

Detailed chart interactions to kick off a root cause analysis.

Comparing latency of deployment versions

Investigating metric performance

Comparing latency across operations with changes in performance

Bird's-eye view of system performance

Usability Testing

We tested our key design decisions with our design partners. Overall the feedback was positive, but there were still issues.

Users are seeing data they don’t care about, and having trouble finding the operations that matter

Even with the Topline view, it's hard to find what changed the most after a deployment

Users want to switch between operations without having to navigate away or hide the other operations

Iterations

Help users find the operations that matter

Customers weren't seeing the operations that they wanted to see. This led to a new feature, Key Operations, which allows admin users to select the important operations to monitor per service.

Not only does this solve a major pain point for many of our biggest customers, it also gives us another pricing lever, since we can sell and raise the number of Key Operations being monitored.

revision based on feedback - Top changes

Show operations that changed the most after a deployment

"Just show me what changed" - We allowed users to choose key operations to monitor. Not only did this solve a major pain point for many of our biggest customers, it also gave us another pricing lever to sell on.

revision based on feedback - Key operations
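
To make the idea concrete, here is a minimal sketch of how "show me what changed" could be computed: filter to the selected key operations, then rank them by how much their latency moved after a deployment. It is an illustration under assumptions, not Lightstep's implementation. The `OperationLatency` record, the `top_changes` function, the use of p95 latency, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OperationLatency:
    """Hypothetical record: p95 latency (ms) for one operation before and after a deploy."""
    name: str
    before_ms: float
    after_ms: float

    @property
    def relative_change(self) -> float:
        # Fractional change versus the pre-deployment baseline.
        # A zero baseline has no meaningful ratio, so this sketch reports no change.
        if self.before_ms == 0:
            return 0.0
        return (self.after_ms - self.before_ms) / self.before_ms


def top_changes(operations, key_operations, limit=3):
    """Keep only the selected key operations, then sort by largest absolute change."""
    selected = [op for op in operations if op.name in key_operations]
    ranked = sorted(selected, key=lambda op: abs(op.relative_change), reverse=True)
    return ranked[:limit]


# Illustrative numbers only, not real Lightstep data.
operations = [
    OperationLatency("checkout", before_ms=120.0, after_ms=300.0),
    OperationLatency("search", before_ms=80.0, after_ms=84.0),
    OperationLatency("login", before_ms=50.0, after_ms=25.0),
    OperationLatency("healthcheck", before_ms=2.0, after_ms=9.0),
]
key_operations = {"checkout", "search", "login"}

for op in top_changes(operations, key_operations):
    print(f"{op.name}: {op.relative_change:+.0%}")
```

Ranking by absolute change surfaces both regressions and improvements. Filtering first keeps operations nobody selected, such as the hypothetical `healthcheck`, out of the view even when their relative change is large.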

See a detailed view without losing the context

Users should be able to interact directly with the chart and begin a root cause analysis flow without having to navigate away from the page.

revision based on feedback - Service monitoring

Impact

100% voluntary user adoption, 10x increase in page interactions, 3x investigations launched

Every user with access to the beta feature chose to use it. Topline became the main jumping-off point for our Root Cause Analysis workflows.

Takeaways

Having your core users around you is a huge bonus. Share research insights and invite your team to participate

Working with the in-house engineers gave me a lot of impromptu feedback and insights that would otherwise have taken a lot of resources to get: finding participants, scheduling, and facilitating sessions. Inviting the team to go through research findings, participate in interviews, and share their thoughts creates alignment that is invaluable for building a cohesive solution quickly.

This site was built with ❤️ and AI by Ryan Brownlow