When you have a lot of data, you might need to perform calculations on a list of results. Firebase Data Connect already supports @view
for you to write SQL to do these, but we’re excited to announce that you can now use aggregate fields to do the same calculations.
For each field in your schema, Data Connect generates corresponding aggregate fields depending on the field type. These are the supported aggregates, and this blog post dives deeper into each one:
_count
: returns the number of rows that match your query.
- All fields have a
<field>_count
field, which counts how many rows have a non-null value in that field. - Numeric fields (
int
,float
,int64
) have<field>_min
,<field>_max
,<field>_sum
, and<field>_avg
. - Date and timestamp fields have
<field>_min
and<field>_max
. distinct
argument: get all unique values for a field (or combination of fields). You can also usedistinct
on aggregate fields to instead aggregate the distinct values.- Grouped aggregates: selecting a mix of aggregate and non-aggregate fields on a type applies the aggregate fields to split by each distinct combination of values for non-aggregate fields.
where
andhaving
arguments return only groups that meet the provided criteria.where
lets you filter rows based on non-aggregate fields andhaving
lets you filter groups by aggregate fields.
Movie Example App
To see how each of these would have been written with @view
versus how much less code you can now write using aggregates, we’ll use the running movie review app example you might already know from the documentation and the Everything Data Connect deep-dive blog series. We also cover some cases in which you still need to use @view
for more complex aggregates.
As a review, these are the relevant tables:
type Movie @table(name: "movie", key: "id") {
id: UUID! @col(name: "movie_id") @default(expr: "uuidV4()")
title: String!
releaseYear: Int
genre: String @col(dataType: "varchar(20)")
rating: Int
description: String
}
type Review @table(name: "Reviews", key: ["movie", "user"]) {
user: User!
movie: Movie!
rating: Int
reviewText: String
reviewDate: Date! @default(expr: "request.time")
}
Count
For the entire table
For the movie app, let’s say you want to show the number of movie titles available in the app.
Using a view and writing SQL:
type MovieCountView @view(sql: """
SELECT COUNT(*) FROM movie AS movies_count
""") {
moviesCount: Int @col(name: "movies_count")
}
query MovieCount @auth(level: USER) {
movieCount: movieCountViews() {
moviesCount
}
}
Instead, you can use the _count
aggregate field, which returns the number of rows that match your query.
query MovieCount {
movies: {
_count
}
}
Calling the query returns the number of rows in the Movie
table, giving you the number of movies in your database.
You eliminate the need to define a @view
in the type, and you directly use an aggregate field to get the value your app needs. This goes for all aggregate fields, and just for brevity, the rest of this blog post will not cover the query needed to use the @view
defined in the type, because those queries will look very similar to the one shown in this example.
Less code and easier to read, yay!
For a specific field
Now that you know how many movies your database has, you may want to find out how many movies in your database have a rating. This way, you can gauge if it’s worth the effort to create an incentive structure for users to give ratings and write reviews for non-rated movies.
Using a view and SQL:
type RatingsCountView @view(sql: """
(SELECT COUNT(rating) FROM movie) AS ratings_count
""") {
moviesCount: Int @col(name: "ratings_count")
}
All fields have a <field>_count
field, which counts how many rows have a non-null value in that field. Using the <field>_count
field, the above code becomes:
query RatingsCountView {
movies: {
rating_count
}
}
Numeric Fields
You’re next interested in seeing the oldest and newest movies in your database by release year, along with the average release year of all of the movies. This way, you know if your database is biased towards newer or older movies.
Using a view and SQL:
type OldestAndNewestMovieView @view(sql: """
(SELECT MIN(releaseYear) FROM movie) AS min_release_year
(SELECT MAX(releaseYear) FROM movie) AS max_release_year
(SELECT AVG(releaseYear) FROM movie) AS avg_release_year
""") {
minReleaseYear: Int @col(name: "min_release_year")
maxReleaseYear: Int @col(name: "max_release_year")
avgReleaseYear: Int @col(name: "avg_release_year")
}
All fields regardless of type have <field>_count
as mentioned above. Numeric fields (int
, float
, int64
) have additional fields: <field>_min
, <field>_max
, <field>_sum
, and <field>_avg
.
For this example here, you can use <field>_min
, <field>_max
, and <field>_avg
:
query OldestAndNewestMovies {
movies: {
releaseYear_min
releaseYear_max
releaseYear_avg
}
}
Less code again, hooray!
Dates and Timestamps
You’re intrigued by oldest and newest, so next, you’re curious about finding the oldest and newest reviews.
Using a view and SQL:
type OldestAndNewestReviewView @view(sql: """
SELECT MIN(reviewDate) as min_review_date, MAX(reviewDate) as max_review_date FROM review
""") {
minReviewDate: Date @col(name: "min_review_date")
maxReviewDate: Date @col(name: "max_review_date")
}
Date and timestamp fields have <field>_min
and <field>_max
, so using those:
query DateAndTimeMinMax {
reviews: {
reviewDate_min
reviewDate_max
}
}
Distinct
List Unique Items
You’re done fiddling around with years and dates, and as a potential movie buff, you want to find all of the unique movie genres to see if there are any categories you don’t know about!
Using a view and SQL:
type UniqueGenresView @view(sql: """
SELECT DISTINCT genre FROM movies AS genres
""") {
genresList: [String] @col(name: "genres")
}
The distinct
argument gets all unique values for a field or combination of fields:
query ListDistinctGenres {
movies(distinct: true): {
genre
}
}
Less code again! You might be seeing a trend here…
Count Unique Items
You notice a long list of unique movie genres, and of course as a coder, you can’t manually count them! So of course, you write some code to see how many there are.
Using a view and SQL:
type UniqueGenresView @view(sql: """
SELECT COUNT(DISTINCT genre) FROM review AS genres_count
""") {
genresCount: Int @col(name: "genres_count")
}
You can also use distinct
on aggregate fields to instead aggregate the distinct values:
query CountDistinctGenres {
movies: {
genre_count(distinct: true)
}
}
WHERE clause You know that in public media, horror movies are often controversial. Do people like them or not like them? Well, you happen to have all of this data on movies, so you figure out the answer once and for all by finding the average rating of horror movies.
Using a view and SQL:
type RatingPerMovieView @view(sql: """
SELECT AVG(rating) as average_rating FROM movies
WHERE genre="Horror"
GROUP BY user
""") {
avgRating: [Float] @col(name: "avergae_rating")
}
You can use where
to filter rows before they’re aggregated:
query FilteredBestRated {
movies(where: {genre: {eq: "Horror"}}): {
rating_avg
}
}
Are you reading my mind? “Less code yet again” - absolutely!
Grouped Aggregates
Now that you know the average rating for horror movies, you’re wondering how that ranks in genres generally, so you look for the average rating per genre. You need to group all the genres together, then find the average per each genre.
Using a view and SQL:
type AvgRatingPerGenreView @view(sql: """
SELECT genre, AVG(rating) as average_rating FROM movies
GROUP BY genre
""") {
genre: String? @col(name: "genre")
rating_avg: Float @col(name: "average_rating")
}
Grouped aggregates let you select a mix of aggregate and non-aggregate fields on a type, and it will apply the aggregate fields to split by each distinct combination of values for non-aggregate fields. Using grouped aggregates:
query AvgRatingPerGenre {
reviews {
genre
rating_avg
}
}
# If there are genres a, b, and c, an example result is:
[{movie: {genre: a}, rating_avg: 53},
{movie: {genre: b}, rating_avg: 89},
{movie: {genre: c}, rating_avg: 36}]
HAVING clause
You want to know which genres are especially popular, so you look for genres with an average rating of above 85.
Using a view and SQL:
type HighlyRatedGenresView @view(sql: """
SELECT genre, AVG(rating) as average_rating FROM movies
HAVING(AVG(rating) > 85)
GROUP BY genre
""") {
genre: String? @col(name: "genre")
rating_avg: Float @col(name: "average_rating")
}
having
lets you filter groups by aggregate fields:
query HighlyRatedGenres {
reviews(having: {rating_avg: {ge: 85}}): {
genre
rating_avg
}
}
# If there are genres x, y, and z, an example result is:
[{genre: x, rating_avg: 85},
{genre: y, rating_avg: 100},
{genre: z, rating_avg: 93}]
We can hear you chanting “less code, less code, hip hip hooray!”
Cross-Table Aggregates
You then ponder how your favorite movie’s rating is. Do you really want to know though, what if it’s low? After much deep consideration, you decide to go ahead with finding the answer.
Using a view and SQL:
type HighlyRatedGenres @view(sql: """
SELECT m.title as title, AVG(r.rating) as average_rating
FROM review as r
JOIN movies as m on review.movie=movie.id
WHERE movie.title = "My Favorite Movie"
""") {
genre: String? @col(name: "genre")
rating_avg: Float @col(name: "average_rating")
}
query AverageReviewScore($movieId: UUID!) {
movie(id: $movieId): {
title
reviews_on_movie {
rating_avg
}
}
}
It took some time and extra brainpower to write that SQL in the view, but using aggregates, hopefully it uses less:
query AverageReviewScore($movieId: UUID!) {
movie(id: $movieId): {
title
reviews_on_movie {
rating_avg
}
}
}
Complex Aggregates
Although aggregate fields are great for performing a lot of calculations you could need in your app, you may still need to use @view
and SQL for complex or nested aggregates. For example, if you wanted to find the average number of reviews for a movie in each genre:
type AverageReviewCountByGenre @view(sql: """
SELECT genre, AVG(COUNT(r)) as average_number_of_reviews
FROM movies as m
JOIN reviews as r on r.movie = m.id
GROUP BY genre
""") {
genre: String @col(name:"genre")
averageReviewCount: Int @col(name: "average_number_of_reviews")
}
Benefits of Aggregates
Aside from the much less code you can write using aggregates over a view and SQL, aggregates are performed server side. This offers a number of benefits over calculating them client side:
- Faster app performance, since you avoid client-side calculations
- Reduced data egress costs, since you send just the aggregated results instead of all of the inputs
- Improved security, since you can give clients access to aggregated data instead of the entire data set
Try it out!
Aggregate fields help you do calculations more easily, and we hope this feature is helpful for your apps. Now that Data Connect is generally available, we hope to see you using it to build your apps, and we’re always looking for feature requests on our products, which you can submit on Firebase’s UserVoice. We hope to hear from you, and we can’t wait to see what you build!