By now, you probably already know that you can export your Firebase Analytics data to BigQuery, which lets you run all sorts of sophisticated ad hoc queries against your analytics data.
At first, the data set in BigQuery might seem confusing to work with. If you’ve worked with any of our public BigQuery data sets in the past (like the Hacker News post data, or the recent San Francisco public data that our Developer Advocate Reto Meier had fun with), it probably looked a lot like a big ol’ SQL table. Something like this:
The truth of the matter is that BigQuery can get much more sophisticated than that. The rows of a BigQuery table don’t just have to be straightforward key-value pairs. They can look more like rows of JSON objects, containing some simple data (like strings, integers, and floats), but also more complex data like arrays, structs, or even arrays of structs. Something a little more like this:
Firebase Analytics takes advantage of this format to bundle all of your users’ user properties together in the same row. Rather than have you perform some kind of join against a separate
user_properties table, all of your user properties are included in the same BigQuery row as an array of structs.
A slightly simplified version of the
user_properties struct in your BigQuery data
The same thing holds true for your events. Your event parameters are included inside your events as an array of structs. And it turns out these events themselves are stored inside of an array. One single row of data in BigQuery will often contain 2 or 3 Firebase Analytics events all bundled together.
This means a single row in your BigQuery table can contain an array of user properties, as well as an array of events, which in turn also have their own arrays of event parameters. I know combining all of that information into a data structure like this seems confusing at first, but in the long run, it actually makes your life easier because there aren’t any JOINs with other tables for you to worry about.
An important note on syntax
For all of these examples, I’m going to be using standard SQL, which is what all the cool kids are doing this days1. If you want to follow along, turn off Legacy SQL in your BigQuery options. Also, you’ll need to follow this link to access the sample Firebase Analytics data we’ll be using.
For example, I can see all of my event data at once just by calling
#standardSQL SELECT event_dim FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607` LIMIT 50
…and I’ll get back all of my event data, along with all of the event parameters, in one nice little table
And then if we want to get a list of all of my “Round completed” events, I can just write some SQL like this…
#standardSQL SELECT event_dim FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607` WHERE event_dim.name = "round_completed"
…which gives me a nice result of…
Error: Cannot access field name on a value with type ARRAY<STRUCT<date STRING, name STRING, params ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, ...>>>, ...>> at [2:17]
Oh. Oh dear.
Okay, so this won’t win any awards for “Best Error Message of 2017”2 , but if you think about it, the reason it’s barfing makes sense. You’re trying to compare a string value to “an element of a struct that’s buried inside of an array”. Sure, that element ends up being a string, but they’re fairly different objects.
So to fix this, you can use the
UNNEST function. The
UNNEST function will take an array and break it out into each of its individual elements. Let’s start with a simple example.
#standardSQL WITH data AS ( SELECT "primes under 15" AS description, [1,2,3,5,7,11,13] AS primes_array) SELECT * FROM data
…will give you back a single row consisting of a string, and that array of data.
Instead, try something like this:
#standardSQL WITH data AS ( SELECT "primes under 15" AS description, [1,2,3,5,7,11,13] AS primes_array) SELECT description, prime FROM data CROSS JOIN UNNEST (primes_array) as prime
What you’re basically saying is, “Hey, BigQuery, please break up that
primes_array into its individual members. Then join each of these members with a clone of the original row.” So you end up with a data structure that looks more like this:
The results are similar as before, but now each prime is in its own row:
You’ll notice that the original
primes_array is still included in the data structure. In some cases (as you’ll see below), this can be useful. In this particular case, I found it was a little confusing, which is why I only asked for the individual fields of
prime instead of
It’s also common convention to replace that
CROSS JOIN syntax with a comma, so you get a query that looks like this.
#standardSQL WITH data AS ( SELECT "primes under 15" AS description, [1,2,3,5,7,11,13] AS primes_array) SELECT description, prime FROM data, UNNEST (primes_array) as prime
It’s the exact same query as the previous one; it’s just a little more readable. Plus, I can now stand by my original statement that this data format means you don’t have perform any JOINs. :)
And the nice thing here is that I now have one piece of “prime” data per column that I can interact with. So I can start to do comparisons like this:
#standardSQL WITH data AS ( SELECT "primes under 15" AS description, [1,2,3,5,7,11,13] AS primes_array) SELECT description, prime FROM data, UNNEST (primes_array) as prime WHERE prime > 8
To get just that list of prime numbers between 8 and 15.
So going back to our Firebase Analytics data, I can now use the
UNNEST function to look for events that have a specific name.
#standardSQL SELECT event.name, event.timestamp_micros FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`, UNNEST(event_dim) as event WHERE event.name = "round_completed"
As you’ll recall, events have their own
params array, which contains all of the event parameters. If I were to
UNNEST those as well, I’d be able to query for specific events that contain specific event parameter values:
#standardSQL SELECT event, event.name, event.timestamp_micros FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`, UNNEST(event_dim) as event, UNNEST(event.params) as event_param WHERE event.name = "round_completed" AND event_param.key = "score" AND event_param.value.int_value > 10000
Note that in this case, I am selecting “event” as one of the fields in my query, which gives me the original array of all my event parameters nicely grouped together in my table results.
Querying against user properties works in a similar manner. Let’s say I’m curious as to what language my users prefer using for my app, something our app is tracking in a “language” user property. First, I’ll use the
UNNEST query to get just a list of each user and their preferred language.
#standardSQL SELECT user_dim.app_info.app_instance_id as unique_id, MAX(user_prop.key) as keyname, MAX(user_prop.value.value.string_value) as keyvalue FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`, UNNEST(user_dim.user_properties) AS user_prop WHERE user_prop.key = "language" GROUP BY unique_id
And then I can use that as my inner selection to grab the total number of users4 that fits into that group.
#standardSQL SELECT keyvalue, count(*) as count FROM ( SELECT user_dim.app_info.app_instance_id as unique_id, MAX(user_prop.key) as keyname, MAX(user_prop.value.value.string_value) as keyvalue FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`, UNNEST(user_dim.user_properties) AS user_prop WHERE user_prop.key = "language" GROUP BY unique_id ) GROUP BY keyvalue ORDER BY count DESC
I can also
UNNEST both my event parameters and my user properties if I want to create one great big query (no pun intended) where I want to look at events of a specific name where an event parameter matches a particular criteria, while also filtering by users who meet a certain criteria:
#standardSQL SELECT user_dim, event, event.name, event.timestamp_micros FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`, UNNEST(event_dim) as event, UNNEST(event.params) as event_param, UNNEST(user_dim.user_properties) as user_prop WHERE event.name = "round_completed" AND event_param.key = "squares_daubed" AND event_param.value.int_value > 20 AND user_prop.key = "elite_powers" AND (CAST(user_prop.value.value.string_value as int64)) > 1
Once you start playing around with the
UNNEST function, you’ll find that it’s really powerful and it can make working with Firebase Analytics data a lot more fun. If you want to find out more, you can check out the Working with Arrays section of BigQuery’s standard SQL documentation.
And don’t forget, you get 1 terabyte of usage data for free every month with BigQuery, so don’t be afraid to play around with it. Go crazy, you array expander, you!
1 ↩ The BigQuery team has asked me to inform you that this is really because standard SQL is the preferred SQL dialect for querying data stored in BigQuery. But I’m pretty sure they’re just saying that so they get invited to all the good parties.