Aggregate Fields for Data Connect

When you have a lot of data, you might need to perform calculations on a list of results. Firebase Data Connect already supports @view for you to write SQL to do these, but we’re excited to announce that you can now use aggregate fields to do the same calculations.

For each field in your schema, Data Connect generates corresponding aggregate fields depending on the field type. These are the supported aggregates, and this blog post dives deeper into each one: _count: returns the number of rows that match your query.

  • All fields have a <field>_count field, which counts how many rows have a non-null value in that field.
  • Numeric fields (int, float, int64) have <field>_min, <field>_max, <field>_sum, and <field>_avg.
  • Date and timestamp fields have <field>_min and <field>_max.
  • distinct argument: get all unique values for a field (or combination of fields). You can also use distinct on aggregate fields to instead aggregate the distinct values.
  • Grouped aggregates: selecting a mix of aggregate and non-aggregate fields on a type applies the aggregate fields to split by each distinct combination of values for non-aggregate fields. where and having arguments return only groups that meet the provided criteria. where lets you filter rows based on non-aggregate fields and having lets you filter groups by aggregate fields.

Movie Example App

To see how each of these would have been written with @view versus how much less code you can now write using aggregates, we’ll use the running movie review app example you might already know from the documentation and the Everything Data Connect deep-dive blog series. We also cover some cases in which you still need to use @view for more complex aggregates.

As a review, these are the relevant tables:

type Movie @table(name: "movie", key: "id") {
  id: UUID! @col(name: "movie_id") @default(expr: "uuidV4()")
  title: String!
  releaseYear: Int
  genre: String @col(dataType: "varchar(20)")
  rating: Int
  description: String
}

type Review @table(name: "Reviews", key: ["movie", "user"]) {
  user: User!
  movie: Movie!
  rating: Int
  reviewText: String
  reviewDate: Date! @default(expr: "request.time")
}

Count

For the entire table

For the movie app, let’s say you want to show the number of movie titles available in the app.

Using a view and writing SQL:

type MovieCountView @view(sql: """
    SELECT COUNT(*) FROM movie AS movies_count
""") {
  moviesCount: Int @col(name: "movies_count")
}

query MovieCount @auth(level: USER) {
    movieCount: movieCountViews() {
        moviesCount
    }
 }

Instead, you can use the _count aggregate field, which returns the number of rows that match your query.

query MovieCount {
  movies: {
    _count
  }
}

Calling the query returns the number of rows in the Movie table, giving you the number of movies in your database.

You eliminate the need to define a @view in the type, and you directly use an aggregate field to get the value your app needs. This goes for all aggregate fields, and just for brevity, the rest of this blog post will not cover the query needed to use the @view defined in the type, because those queries will look very similar to the one shown in this example.

Less code and easier to read, yay!

For a specific field

Now that you know how many movies your database has, you may want to find out how many movies in your database have a rating. This way, you can gauge if it’s worth the effort to create an incentive structure for users to give ratings and write reviews for non-rated movies.

Using a view and SQL:

type RatingsCountView @view(sql: """
    (SELECT COUNT(rating) FROM movie) AS ratings_count
""") {
  moviesCount: Int @col(name: "ratings_count")
}

All fields have a <field>_count field, which counts how many rows have a non-null value in that field. Using the <field>_count field, the above code becomes:

query RatingsCountView {
  movies: {
    rating_count
  }
}

Numeric Fields

You’re next interested in seeing the oldest and newest movies in your database by release year, along with the average release year of all of the movies. This way, you know if your database is biased towards newer or older movies.

Using a view and SQL:

type OldestAndNewestMovieView @view(sql: """
    (SELECT MIN(releaseYear) FROM movie) AS min_release_year
    (SELECT MAX(releaseYear) FROM movie) AS max_release_year
    (SELECT AVG(releaseYear) FROM movie) AS avg_release_year
""") {
  minReleaseYear: Int @col(name: "min_release_year")
  maxReleaseYear: Int @col(name: "max_release_year")
  avgReleaseYear: Int @col(name: "avg_release_year")
}

All fields regardless of type have <field>_count as mentioned above. Numeric fields (int, float, int64) have additional fields: <field>_min, <field>_max, <field>_sum, and <field>_avg.

For this example here, you can use <field>_min, <field>_max, and <field>_avg:

query OldestAndNewestMovies {
  movies: {
   releaseYear_min
   releaseYear_max
   releaseYear_avg
  }
}

Less code again, hooray!

Dates and Timestamps

You’re intrigued by oldest and newest, so next, you’re curious about finding the oldest and newest reviews.

Using a view and SQL:

type OldestAndNewestReviewView @view(sql: """
    SELECT MIN(reviewDate) as min_review_date, MAX(reviewDate) as max_review_date FROM review
""") {
  minReviewDate: Date @col(name: "min_review_date")
  maxReviewDate: Date @col(name: "max_review_date")
}

Date and timestamp fields have <field>_min and <field>_max, so using those:

query DateAndTimeMinMax {
  reviews: {
   reviewDate_min
   reviewDate_max
  }
}

Distinct

List Unique Items

You’re done fiddling around with years and dates, and as a potential movie buff, you want to find all of the unique movie genres to see if there are any categories you don’t know about!

Using a view and SQL:

type UniqueGenresView @view(sql: """
    SELECT DISTINCT genre FROM movies AS genres
""") {
  genresList: [String] @col(name: "genres")
}

The distinct argument gets all unique values for a field or combination of fields:

query ListDistinctGenres {
  movies(distinct: true): {
    genre
  }
}

Less code again! You might be seeing a trend here…

Count Unique Items

You notice a long list of unique movie genres, and of course as a coder, you can’t manually count them! So of course, you write some code to see how many there are.

Using a view and SQL:

type UniqueGenresView @view(sql: """
    SELECT COUNT(DISTINCT genre) FROM review AS genres_count
""") {
  genresCount: Int @col(name: "genres_count")
}

You can also use distinct on aggregate fields to instead aggregate the distinct values:

query CountDistinctGenres {
  movies: {
    genre_count(distinct: true)
  }
}

WHERE clause You know that in public media, horror movies are often controversial. Do people like them or not like them? Well, you happen to have all of this data on movies, so you figure out the answer once and for all by finding the average rating of horror movies.

Using a view and SQL:

type RatingPerMovieView @view(sql: """
    SELECT AVG(rating) as average_rating FROM movies 
    WHERE genre="Horror"
    GROUP BY user
""") {
  avgRating: [Float] @col(name: "avergae_rating")
}

You can use where to filter rows before they’re aggregated:

query FilteredBestRated {
  movies(where: {genre: {eq: "Horror"}}): {
   rating_avg
  }
}

Are you reading my mind? “Less code yet again” - absolutely!

Grouped Aggregates

Now that you know the average rating for horror movies, you’re wondering how that ranks in genres generally, so you look for the average rating per genre. You need to group all the genres together, then find the average per each genre.

Using a view and SQL:

type AvgRatingPerGenreView @view(sql: """
    SELECT genre, AVG(rating) as average_rating FROM movies
    GROUP BY genre
""") {
  genre: String? @col(name: "genre")
  rating_avg: Float @col(name: "average_rating")
}

Grouped aggregates let you select a mix of aggregate and non-aggregate fields on a type, and it will apply the aggregate fields to split by each distinct combination of values for non-aggregate fields. Using grouped aggregates:

query AvgRatingPerGenre {
  reviews {
   genre
   rating_avg
  }
}

# If there are genres a, b, and c, an example result is: 
[{movie: {genre: a}, rating_avg: 53}, 
 {movie: {genre: b}, rating_avg: 89},
 {movie: {genre: c}, rating_avg: 36}]

HAVING clause

You want to know which genres are especially popular, so you look for genres with an average rating of above 85.

Using a view and SQL:

type HighlyRatedGenresView @view(sql: """
    SELECT genre, AVG(rating) as average_rating FROM movies 
    HAVING(AVG(rating) > 85) 
    GROUP BY genre
""") {
  genre: String? @col(name: "genre")
  rating_avg: Float @col(name: "average_rating")
}

having lets you filter groups by aggregate fields:

query HighlyRatedGenres {
  reviews(having: {rating_avg: {ge: 85}}): {
   genre
   rating_avg
  }
}

# If there are genres x, y, and z, an example result is: 
[{genre: x, rating_avg: 85}, 
 {genre: y, rating_avg: 100},
 {genre: z, rating_avg: 93}]

We can hear you chanting “less code, less code, hip hip hooray!”

Cross-Table Aggregates

You then ponder how your favorite movie’s rating is. Do you really want to know though, what if it’s low? After much deep consideration, you decide to go ahead with finding the answer.

Using a view and SQL:

type HighlyRatedGenres @view(sql: """
   SELECT m.title as title, AVG(r.rating) as average_rating
   FROM review as r
   JOIN movies as m on review.movie=movie.id
   WHERE movie.title = "My Favorite Movie"
""") {
  genre: String? @col(name: "genre")
  rating_avg: Float @col(name: "average_rating")
}

query AverageReviewScore($movieId: UUID!) {
    movie(id: $movieId): {
        title
        reviews_on_movie {
        rating_avg
        }
    }
}

It took some time and extra brainpower to write that SQL in the view, but using aggregates, hopefully it uses less:

query AverageReviewScore($movieId: UUID!) {
    movie(id: $movieId): {
        title
        reviews_on_movie {
            rating_avg
        }
    }
}

Complex Aggregates

Although aggregate fields are great for performing a lot of calculations you could need in your app, you may still need to use @view and SQL for complex or nested aggregates. For example, if you wanted to find the average number of reviews for a movie in each genre:

type AverageReviewCountByGenre @view(sql: """
    SELECT genre, AVG(COUNT(r)) as average_number_of_reviews
    FROM movies as m
    JOIN reviews as r on r.movie = m.id
    GROUP BY genre 
""") {
  genre: String @col(name:"genre")
  averageReviewCount: Int @col(name: "average_number_of_reviews")
}

Benefits of Aggregates

Aside from the much less code you can write using aggregates over a view and SQL, aggregates are performed server side. This offers a number of benefits over calculating them client side:

  • Faster app performance, since you avoid client-side calculations
  • Reduced data egress costs, since you send just the aggregated results instead of all of the inputs
  • Improved security, since you can give clients access to aggregated data instead of the entire data set

Try it out!

Aggregate fields help you do calculations more easily, and we hope this feature is helpful for your apps. Now that Data Connect is generally available, we hope to see you using it to build your apps, and we’re always looking for feature requests on our products, which you can submit on Firebase’s UserVoice. We hope to hear from you, and we can’t wait to see what you build!