When using the filter formula on large lists with multiple expressions, which is better for performance?

Dataset.filter(exp1 AND exp2)

vs

Dataset.filter(exp1).filter(exp2)

Thanks!

When using the filter formula on large lists with multiple expressions, which is better for performance?

Dataset.filter(exp1 AND exp2)

vs

Dataset.filter(exp1).filter(exp2)

Thanks!

The first one, pretty much always

1 Like

Thanks for the prompt response!

In example 1, does the order you write the expressions affect perf?

**Disclaimer:** I’m not a Codan and my answer relies on: 1. my theoretical understanding of how software is built, 2. pieces of conversations I’ve had with engineers and read in the Community, and 3. some empirical observations.

@alexdeneui to the rescue

The order will matter if you have significantly complicated expressions there.

For simple operations like checks for equality / inequality / blankness, Coda will create a composite index that it will use to check for matches much more efficiently. Theoretically the order shouldn’t matter for that as the index will be reduced to some cryptic hash anyways.

For complicated expressions such index won’t be created and checks will recalculate entirely. Then the order of operations will matter because `AND`

can short-circuit: checks are made first to last, and whenever a check fails, the rest of the calculation doesn’t have to happen because the result is already `false`

. Same with `OR`

but another way around: if any check is `true`

then the formula resolves to `true`

without performing the rest of the checks.

P.S. I used to be under an impression that doing `Table.Filter(expr)`

into a column and then further doing `ThatColumn.Filter(otherExpr)`

would be more performant than doing `Table.Filter(expr)`

and `Table.Filter(expr AND otherExpr)`

. Turned out I was wrong. If both expressions are simple, it is actually more performant to filter on the `Table`

directly and not on the intermediate filter result even if I need both. The reason is the same: Coda will create a composite index on Table with `expr1 AND expr2`

but won’t create one on the `ThatColumn`

. Hence it will calculate `Table.Filter(expr1 AND expr2)`

faster.

P.S. It can depend though, so it’s always worth to test both ways when you have a significant data set, measure the time and compare.

4 Likes

Extremely helpful! Thanks!

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.