Filter Formula Performance Question

When using the filter formula on large lists with multiple expressions, which is better for performance?

Dataset.filter(exp1 AND exp2)

vs

Dataset.filter(exp1).filter(exp2)

Thanks!

The first one, pretty much always

1 Like

Thanks for the prompt response!

In example 1, does the order you write the expressions affect perf?

Disclaimer: I’m not a Codan and my answer relies on: 1. my theoretical understanding of how software is built, 2. pieces of conversations I’ve had with engineers and read in the Community, and 3. some empirical observations.

@alexdeneui to the rescue


The order will matter if you have significantly complicated expressions there.

For simple operations like checks for equality / inequality / blankness, Coda will create a composite index that it will use to check for matches much more efficiently. Theoretically the order shouldn’t matter for that as the index will be reduced to some cryptic hash anyways.

For complicated expressions such index won’t be created and checks will recalculate entirely. Then the order of operations will matter because AND can short-circuit: checks are made first to last, and whenever a check fails, the rest of the calculation doesn’t have to happen because the result is already false. Same with OR but another way around: if any check is true then the formula resolves to true without performing the rest of the checks.

P.S. I used to be under an impression that doing Table.Filter(expr) into a column and then further doing ThatColumn.Filter(otherExpr) would be more performant than doing Table.Filter(expr) and Table.Filter(expr AND otherExpr). Turned out I was wrong. If both expressions are simple, it is actually more performant to filter on the Table directly and not on the intermediate filter result even if I need both. The reason is the same: Coda will create a composite index on Table with expr1 AND expr2 but won’t create one on the ThatColumn. Hence it will calculate Table.Filter(expr1 AND expr2) faster.

P.S. It can depend though, so it’s always worth to test both ways when you have a significant data set, measure the time and compare.

3 Likes

Extremely helpful! Thanks!