Sunday, January 19, 2014

Using for-comprehensions to build Query-APIs

I have been developing Java with various frameworks like J2EE, spring, hibernate or others for a couple of years now, but a few months ago I got the chance to switch to scala, akka and play. I really had to learn a lot in the beginning and of course I am still gaining new insights into these tools and libraries every day. To share some of the aha effects I experienced during the last months I am starting a series of blog posts that might be helpful to others who are thinking about a similar switch or are even already in the middle of such a change.

Using for-comprehensions to build Query-APIs

When you start working with scala, you will quickly be confronted with its for-comprehensions or for-expressions. If you loop for example over two collections, you can do this with a for-expression like this:

1       val as = List(1,2,3)
2       val bs = List(4,5,6)
3       val cs: List[Int] = for {
4         a <- as
5         b <- bs
6       } yield a*b

This computes the products of all numbers in a with all numbers in b resulting in a list of 9 Ints. The scala compiler actually translates a for-expression into a cascaded set of calls to flatMap, map, withFilter. So the example above actually becomes:

1       val ds: List[Int] = as.flatMap(a => => a*b))
3       ds must be (cs)

withFilter comes into play as soon as you add conditions to the for-expression, so:

 1       val as = List(1,2,3)
 2       val bs = List(4,5,6)
 3       val cs: List[Int] = for {
 4         a <- as
 5         if a % 2 == 1
 6         b <- bs
 7         if b % 2 == 0
 8       } yield a*b


 1       val ds: List[Int] = as.
 2         withFilter(a => a % 2 == 1).
 3         flatMap(a => bs.
 4           withFilter(b => b % 2 == 0).
 5           map(b => a*b))
 7       ds must be (cs)

The interesting thing here is that the compiler first translates the for-expression into the cascaded form and checks afterwards, if the provided expressions actually support the required method calls, e.g. in our case as' and bs' common type (List[Int]) has to have all of the three methods: flatMap, map and withFilter. If you use your type only in a simple for-expression like this:

1       case class Container[A](a: A) {
2         def map[B](f: A => B): Container[B] = Container(f(a))
3       }
5       val as = Container(1)
6       val b = for(a <- as) yield a+1

you do not need to implement flatMap or withFilter at all. You do not even have to be generic at all. If your class Container only supports Ints and the expression after the yield works with Ints that will compile as well. So for the example above Container could be implemented as:

1       case class Container(a: Int) {
2         def map(f: Int => Int): Container = Container(f(a))
3       }

So far we basically simplified our custom class and made it just implement what is actually required. In addition to that one can also change the signature of these methods in a way that may look surprising. For example withFilter takes a function as argument that maps whatever is filtered to Booleans. While the Boolean seems to be a very natural fit for the if, there is no requirement at all, that the expression after the if evaluates to a Boolean (i.e. that withFilter takes an argument of type A => Boolean). Let's check out a little example that basically exchanges the meanings of map and withFilter:

 1       case class Container(list: List[Int]) {
 2         def map(filter: Int => Boolean): Container =
 3           Container(list.withFilter(filter).map(identity))
 4         def withFilter(f: Int => Int): Container = Container(
 5       }
 7       val as = Container(List(1,2,3))
 8       val bs = for {
 9         a <- as
10         if a+2
11         if a*3
12       } yield a % 2 == 0

The for-expression translates to:

1       val cs = as.
2         withFilter(a => a + 2).
3         withFilter(a => a * 3).
4         map(a => a % 2 == 0)

Even if it might look like, this does not return a Container with a list of Booleans (even though the expression after yield evaluates to a Boolean). It rather first adds 2 to each element of the list (yielding List(3,4,5)), then multiplies each element with 3 (yielding List(9,12,15)) and after filtering only even elements remain (yielding List(12)).

While this seems to be a pretty useless example, you can definitely utilize this in a more meaningful way. The container could for example store the elements in a sorted set and answer filter-requests for ranges directly without actually looping through the whole set:

 1       class Container(sortedElements: SortedSet[Int]) {
 2         def this(elements: Int*) = this(TreeSet(elements: _*))
 3         def withFilter(filter: Int => RangeFilter): Container = {
 4           val rangeFilter = filter(0)
 5           new Container(sortedElements.rangeImpl(rangeFilter.from,
 6         }
 7         def map(id: Int => Int): Container = this
 8         def toList: List[Int] = sortedElements.toList
 9       }
11       class RangeFilter(val from: Option[Int], val to: Option[Int])
12       case class Within(min: Int, max: Int) extends RangeFilter(Some(min), Some(max))
13       case class LessOrEqual(max: Int) extends RangeFilter(None, Some(max))
14       case class GreaterOrEqual(min: Int) extends RangeFilter(Some(min), None)
16       val as = new Container(6, 10, 3, 5, 15)
17       val bs = for {
18         a <- as
19         if Within(5, 10)
20         if LessOrEqual(9)
21       } yield a

The for-expression translates to:

1       val cs = as.
2         withFilter(_ => Within(5,10)).
3         withFilter(_ => LessOrEqual(9)).
4         map(identity)

As the function that is passed to map is basically ignored, any function can be passed.

In this example the Container keeps its elements in a SortedSet. The method withFilter takes a function that maps the container-elements (Int) to a RangeFilter. Instead of looping through all elements and passing them to the provided function filter we simply want to retrieve the RangeFilter instance that is provided after the if in the for-comprehension. As the expressions after the ifs do not depend on a, filter returns for all integers the same RangeFilter, so we can pick any integer, e.g. 0 (see line 4). Having the RangeFilter-instance we can get the from/to values to pass them to SortedSet.rangeImpl and compute the filtered result without looping through the set.

With an implementation like this you can provide the user with an elegant and efficient way to query your container with for-comprehensions.

Another use case for this kind of surprising implementation for withFilter and map is to use the for-comprehension to build a query-object that is afterwards applied to a dataset. Slick uses this approach to build SQL-queries in a typesafe manner. I will give a simple example mirroring the code from the previous example (with the same definitions of the RangeFilter class and its decsendents):

 1       class QueryBuilder(filters: List[RangeFilter]= Nil) {
 2         def withFilter(filter: Int => RangeFilter): QueryBuilder =
 3           new QueryBuilder(filters :+ filter(0))
 4         def map(id: Int => Int): QueryBuilder = this
 5         def apply(set: SortedSet[Int]): List[Int] = {
 6           filters.foldLeft(set) { (filteredSet, filter) =>
 7             filteredSet.rangeImpl(filter.from,
 8           }.toList
 9         }
10       }
12       val query = for {
13         t <- new QueryBuilder()
14         if Within(5, 10)
15         if LessOrEqual(9)
16       } yield t
18       val result = query(TreeSet(6, 10, 3, 5, 15))

In this case the class that is used in the for-comprehension as source (QueryBuilder) is not a container any more. Having no elements to filter withFilter just collects the passed RangeFilters. Output of the for-comprehension is a QueryBuilder-instance that can be applied to arbitrary SortedSets to perform the previously recorded filtering.

As you can see for-comprehensions are even more powerful as you may think in the beginning. They are not only useful as generic query-language for all kinds of containers, but you have all options to implement filtering (or mapping) in a more efficient way than simply looping over all container elements. In addition to this you can even split building the query from executing it and use them as generic query-builder whose output is applied to arbitrary datasets.

No comments:

Post a Comment