Introducing a smarter algorithm in Crashlytics

Find and fix the root causes of issues faster

Crashlytics is rolling out a big update with a new, smarter algorithm for grouping events from your app. You’ll begin to notice new features that make it easier to get to the root of the failures in your app. Read on to learn more!

Crashlytics helps you prioritize the most pervasive problems in your app. We do this by analyzing events from your app – like crashes, non-fatals, and ANRs – and grouping each event into an issue with similar events. Issues with larger event counts are usually impacting your app the most.

With our latest updates, when Crashlytics gets new events from your app, we’ll check whether they match an existing issue. If there’s no match, we’ll automatically apply our smarter event-grouping algorithm to the event – you don’t have to do anything to get this update! Over time, you’ll start seeing new issues and features in your Issues table generated from this algorithm.

New issues now better highlight the various root causes of failures, so it’s easier to understand what problems are impacting your app the most.

Keep reading to learn about what spurred us to improve our grouping algorithm, how the experience is better now, and what you’ll see as Crashlytics gets new events from your app.

An evolving algorithm

When Crashlytics launched over 10 years ago, our event-grouping algorithm worked well. But as app development evolved over the years, we learned from our users and from our own analysis that we had opportunities to evolve, too.

Our old algorithm based its event grouping heavily on a single line of code that we identified as common across events. However, this methodology resulted in some frustrating grouping problems. If you’re a long-time user of Crashlytics, you probably experienced one or both of them.

Under-grouping

Problem: Events with the same root cause were grouped into different issues.

When this happens, we say that Crashlytics is under-grouping – creating duplicate issues that should all be a single issue. This results in under-prioritization of a problem because events with the same root cause are spread out over multiple issues.

For long-time Crashlytics users who had line number changes between releases, we’re sure that you can relate to this under-grouping problem!
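
To make the under-grouping failure mode concrete, here is a minimal sketch. It is not Crashlytics' actual algorithm: the `Frame` type, both key functions, and the sample symbols are hypothetical, and the only assumption is that a grouping key built from a single line of code includes the line number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One stack frame (hypothetical shape, for illustration only)."""
    symbol: str
    file: str
    line: int


def line_based_key(top: Frame) -> tuple:
    # The key includes the line number, so any edit above the crash site changes it.
    return (top.file, top.symbol, top.line)


def symbol_based_key(top: Frame) -> tuple:
    # The key ignores the line number, so it survives unrelated edits to the file.
    return (top.file, top.symbol)


# The same bug seen in two releases; only the line number moved.
release_1 = Frame("CartViewModel.checkout", "CartViewModel.kt", 42)
release_2 = Frame("CartViewModel.checkout", "CartViewModel.kt", 47)

split_into_two_issues = line_based_key(release_1) != line_based_key(release_2)
kept_as_one_issue = symbol_based_key(release_1) == symbol_based_key(release_2)
```

With the line-based key, the two releases land in separate issues even though nothing about the bug changed.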

Over-grouping

Problem: Events with different root causes were grouped into the same issue in a disorganized way.

Over-grouping resulted in too many unrelated events being grouped together, with no way to tell whether all the events had the same root cause.

Again, long-time Crashlytics users… you probably ran into this grouping problem if your app funnels errors through a shared exception handler, a common setup that our old algorithm and dashboard features just didn’t handle well.
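
The exception-handler case can be sketched in a few lines. Everything here is illustrative: the event tuples, the `ErrorHandler.report` symbol, and the richer key are assumptions chosen to show the effect, not a description of how Crashlytics groups events.

```python
from collections import defaultdict

# Hypothetical events: (exception type, stack frames from top to bottom).
events = [
    ("IOException", ["ErrorHandler.report", "Network.fetch"]),
    ("IllegalStateException", ["ErrorHandler.report", "Cart.checkout"]),
    ("NullPointerException", ["ErrorHandler.report", "Profile.load"]),
]

# Grouping on the top frame alone puts every event in one bucket.
by_top_frame = defaultdict(list)
for exc_type, frames in events:
    by_top_frame[frames[0]].append(exc_type)

# Adding the exception type and the first frame outside the handler
# separates the three unrelated failures.
by_richer_key = defaultdict(list)
for exc_type, frames in events:
    caller = next(f for f in frames if not f.startswith("ErrorHandler."))
    by_richer_key[(exc_type, caller)].append(exc_type)
```

Here `by_top_frame` holds a single group containing all three exception types, while `by_richer_key` holds three groups.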

The big changes

To help fix the grouping problems described above, we’ve made two big changes.

1. Improvements to the fundamental grouping algorithm

To group events into issues, the analysis engine now looks at many aspects of the event, including the frames in the stack trace, the exception message, the error code, and other platform or error type characteristics.

We select the most relevant and actionable frames from your stack traces, based on our knowledge of platforms, common frameworks, and design patterns.
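
As a rough illustration of grouping on several aspects of an event at once, the sketch below builds a fingerprint from the exception type, a normalized message, and the first few non-framework frames. This is an assumption-heavy toy, not the Crashlytics implementation: the `Event` type, the `FRAMEWORK_PREFIXES` list, the digit-masking rule, and the three-frame limit are all invented for the example.

```python
import hashlib
import re
from dataclasses import dataclass

# Assumed list of framework prefixes to skip; a real system would know many more.
FRAMEWORK_PREFIXES = ("java.", "android.", "kotlin.", "com.android.")


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape, for illustration only."""
    exception_type: str
    message: str
    frames: tuple  # symbols, top of stack first


def relevant_frames(frames, limit=3):
    """Keep the first few frames that are not from a known framework."""
    app_frames = [f for f in frames if not f.startswith(FRAMEWORK_PREFIXES)]
    # Fall back to the raw top frames if every frame is a framework frame.
    return tuple(app_frames[:limit]) or tuple(frames[:limit])


def normalize_message(message):
    """Mask digit runs so messages that differ only by IDs or sizes match."""
    return re.sub(r"\d+", "<n>", message)


def issue_fingerprint(event):
    """Combine several aspects of the event into one short grouping key."""
    parts = (
        event.exception_type,
        normalize_message(event.message),
        *relevant_frames(event.frames),
    )
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]
```

Two events that differ only in the numbers inside their messages get the same fingerprint, while a different exception type at the same frames gets a different one.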

2. Variants within issues

Our new algorithm creates issues where all events in the issue have a common point of failure. However, within this group of events, the stack traces leading to the failure might be different. A different stack trace could mean a different root cause.

To represent this possible difference within an issue, we’re introducing variants within issues – each variant is a sub-group of events in an issue that have the same failure point and a similar stack trace. With variants, you can debug the most common stack traces within an issue and determine if different root causes are leading to the failure.
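
One way to picture variants is as a second level of grouping inside an issue. The sketch below is a simplification under stated assumptions: the sample traces are made up, and it treats "similar stack trace" as an exact match, which is stricter than what the text above describes.

```python
from collections import Counter

# Hypothetical events already grouped into one issue: each shares the failure
# point (first frame) but may have reached it through different callers.
issue_events = [
    ("Cart.itemAt", "CartActivity.onClick"),
    ("Cart.itemAt", "CartActivity.onClick"),
    ("Cart.itemAt", "SyncWorker.doWork"),
]


def variants(events):
    """Count events per distinct stack trace, most common first."""
    return Counter(events).most_common()


ranked = variants(issue_events)
```

The first entry in `ranked` is the most common path to the failure, which is the natural place to start debugging.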

What’s better now?

For the past several months, early-access partners have been testing our new grouping algorithm (including variant sub-grouping). They’ve reported seeing better-grouped issues and easier debugging of complex issues with various root causes.

With the new algorithm and variants, you’ll now spend less time trying to understand the root cause of problems in your app. Instead, we expect you to experience:

  • Revamped metadata displayed within the issue row
    It’s now easier to understand and triage issues in your app.
  • Fewer duplicate issues
    A line number change doesn’t result in a new issue.
  • Easier debugging of complex issues with various root causes
    Use variants to debug the most common stack traces within an issue.
  • More meaningful alerts and signals
    A new issue actually represents a new bug.
  • More powerful search
    Each issue contains more searchable metadata, like exception type and package name.

What’s going to happen to your Issues table?

We know that you’ve already spent valuable time triaging, managing, and debugging existing issues. So, top-of-mind for us is to ensure that we don’t interfere with this.

We won’t remove any existing issues from the Issues table. But over time, you’ll start seeing new issues pop up in your Issues table that look a bit different from what you’ve seen before – these are issues generated from our new algorithm. For example, we’re elevating the visibility of the stack trace’s most relevant symbol by moving it to the title, and we’re providing the exception type and message in the subtitle (before, they were buried in the issue’s details).

Since we’re keeping all your existing issues, we implemented the following workflow for incoming events. Your Issues table will evolve as new issues are generated from this algorithm.

A diagram explaining the flow outlined in the paragraph below

When Crashlytics receives an event from your app, we’ll run it through our old (v1) algorithm to see if it matches the events in any existing v1 issue. If so, we’ll group the incoming event with that issue. However, if an event doesn’t match any existing v1 issue, then we’ll let our new smarter algorithm handle it. Over time, you’ll start to see more v2 issues populating your Issues table. And within some of those v2 issues, you might see variants, which are sub-groups based on stack trace.
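
The routing described above can be sketched as a single function. This is only a model of the flow: `route_event`, the dictionary-based issue stores, and the `v1_key` and `v2_key` callables are hypothetical placeholders for whatever the real grouping algorithms compute.

```python
def route_event(event, v1_issues, v2_issues, v1_key, v2_key):
    """Return ("v1" or "v2", key) for the issue that receives the event.

    v1_issues and v2_issues map a grouping key to a list of events.
    v1_key and v2_key are placeholder grouping functions.
    """
    old_key = v1_key(event)
    if old_key in v1_issues:
        # Matches an existing v1 issue: keep it there so triage history is preserved.
        v1_issues[old_key].append(event)
        return ("v1", old_key)
    # No v1 match: the new algorithm groups it, creating a v2 issue if needed.
    new_key = v2_key(event)
    v2_issues.setdefault(new_key, []).append(event)
    return ("v2", new_key)
```

Note that the function never creates a new v1 issue, so the set of v1 issues can only stay the same while v2 issues accumulate.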

Everybody is getting the improvements automatically

We’re automatically rolling out all the changes described above to everyone, so you don’t need to do anything to have the new algorithm analyze and group events from your app.

We can’t wait to hear what you think of these updates! If you have feedback or encounter any issues, please let us know by filing a report.