Atomic Formulas: Do these not speed up docs like the documentation says?

I was reading through “Optimizing Formulas” and saw two possible improvements I could make on a slow query:
(A) Atomic Formulas:
(Optimizing slow formulas in your doc | Coda Help Center)

This “Atomic Formula” concept is also mentioned in: (Strategy three · Good habits: aggregate and optimize)

(B) Making sure column output matches column type

I decided to implement both in my doc, but got worse, not improved, results.

I have a table t_Users. I also have a table t_TimeLog. t_Users has several columns which summarize durations per user from the t_TimeLog table. t_TimeLog has a Start Time, End Time, and calculated f_Duration based on those two datetimes.

The user summary columns look something like:
t_TimeLog.Filter(User = thisRow AND DateTime = AAA).f_Duration.Sum().Round(1)
t_TimeLog.Filter(User = thisRow AND DateTime = BBB).f_Duration.Sum().Round(1)
t_TimeLog.Filter(User = thisRow AND DateTime = CCC).f_Duration.Sum().Round(1)
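To make the pattern concrete, here is a rough Python model of what each summary column computes (my own analogy, not how Coda runs it internally; the `LogRow` fields and sample data are made up for illustration):

```python
# Hypothetical Python model of the Coda summary-column pattern above.
# Each summary column scans the whole time log, filters by user and
# period, then sums and rounds the matching durations.
from dataclasses import dataclass

@dataclass
class LogRow:
    user: str
    period: str      # stands in for the DateTime bucket (AAA, BBB, ...)
    duration: float  # f_Duration = End Time - Start Time, in hours

time_log = [
    LogRow("alice", "AAA", 1.25),
    LogRow("alice", "AAA", 2.5),
    LogRow("alice", "BBB", 0.75),
    LogRow("bob",   "AAA", 3.0),
]

def summary(user: str, period: str) -> float:
    """t_TimeLog.Filter(User = user AND DateTime = period).f_Duration.Sum().Round(1)"""
    return round(sum(r.duration for r in time_log
                     if r.user == user and r.period == period), 1)

print(summary("alice", "AAA"))  # 3.8
```

Note that every summary column repeats the full scan of `time_log`, which is what motivated pulling the shared user filter out into its own column.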

(1) Because the output is a number, I set the column type of the summaries to Number.

(2) Also, because I reuse “t_TimeLog.Filter(User = thisRow)”, I thought I could pull that out into its own column that filters by user once, and then use that pre-filtered result to filter for the different times.

I assumed both of the above would speed up my columns. But they both did the opposite.

In the snapshot below, you can see I labeled “Hrs Worked This Week” by whether I set the column as a Text type or Num type, and by whether I grab the durations from the raw, unfiltered table or from the shared filtered-by-user column.

You can see that (C) is fastest: the version where I used the incorrect Text type and went straight to the original table to filter by user, instead of using the pre-filtered column (which I can reuse across the rest of the summary columns, so each column doesn’t need to grab the entire t_TimeLog table and can just use the column pre-filtered by thisRow.User).

Is consolidating the filter this way always a bad idea? And why is using Text for this field faster than using Num?

Hey there - I’m an engineer at Coda working on calculation performance and I’ve worked on a few things related to this, so I have a few ideas as to what could be happening here.

We have an optimization which automatically does some “Atomic Formula” work behind the scenes - this is useful when a user repeats the same calculations multiple times in one session, but is less effective when a user does those calculations for the first time. This is part of why we still recommend users do this for themselves, and splitting out shared formulas also makes tables easier to understand and maintain over time.

Assuming that this optimization is being used here, it would explain why the “non-atomic” formulas would be performing similarly to or better than the atomic ones.

As for the Num/Text comparison, the times both look short enough that the difference between the two could be attributed to noise, but it’s hard to tell without the full timing data (i.e. exactly how many milliseconds each took).

To make sure I’m not missing something, could you provide a screenshot of the calculation timings with the full timing info, and also let us know how exactly you’re triggering these calculations? (i.e. are you adding/modifying a row, adding a formula, etc.)


Did I get it right that

t_TimeLog.Filter(User = thisRow)

as a separate column, and then

thatColumn.Filter(DateTime = AAA).Whatever.Sum()

performed worse than

t_TimeLog.Filter(User = thisRow AND DateTime = AAA).Whatever.Sum()


For composite filters (i.e. ones of the form .Filter(expr1 AND expr2 AND expr3 ...)), if the expressions are simple enough (equality/inequality checks, Contains, IfBlank, etc.), Coda will create a single index so that matching all the rows is fast. This only works when you filter on the table directly. If you call .Filter() on a list of rows (the result of the first Filter() formula), the index cannot kick in, and Coda just checks all rows one by one to filter them out. So even if those lists are short, it can still be a longer operation than directly filtering the base table with a composite index.
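A small Python analogy of that difference (my own sketch, not Coda internals: a dict plays the role of the composite index, and the sample rows are made up):

```python
# Sketch of why a composite index on the base table can beat
# re-filtering an already-filtered list of rows.
from collections import defaultdict

rows = [("alice", "AAA", 1.5), ("alice", "BBB", 2.0), ("bob", "AAA", 0.5)]

# Composite index: one dict lookup returns exactly the matching rows.
index = defaultdict(list)
for user, period, dur in rows:
    index[(user, period)].append(dur)

def indexed_sum(user, period):
    return sum(index[(user, period)])  # O(1) lookup, then sum only the matches

# Two-step filter: the second Filter() runs on a plain list, so every
# surviving row is checked one by one -- no index can help here.
def two_step_sum(user, period):
    by_user = [r for r in rows if r[0] == user]          # first Filter()
    return sum(d for u, p, d in by_user if p == period)  # linear re-scan

print(indexed_sum("alice", "AAA"), two_step_sum("alice", "AAA"))  # 1.5 1.5
```

Both return the same answer; the difference is only in how much work is done per lookup, which matters as the base table grows.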

I’m not a Codan and I’m not 100% sure this is what is happening, but the behavior I observe is consistent with it, so empirically this could well be true.


There are any number of ways we can mix up variables here to get various results. The main lesson I wanted to get across in my article was to think about how to simplify workflows for better performance and for better ease of use.

In the example you listed, you did get a result that counters what I had in the article, but it also depends on which variable changed as to what actually needs to recalculate. In your example, if a new row is added with a new user, all formulas have to recalculate, because every formula depends on the user column as a variable. But if only a DateTime changes, just the smaller formula needs to run. If you keep the whole formula together, the whole thing runs any time anything changes; if you split it apart, only the part whose variable changed needs to run.
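A toy sketch of that recalculation argument (an assumption on my part about how a dependency-tracking engine behaves, not Coda's actual implementation; the counters just record how often each piece runs):

```python
# Splitting a formula means each piece only reruns when *its* inputs change.
recalcs = {"by_user": 0, "by_period": 0}

def filter_by_user(rows, user):
    recalcs["by_user"] += 1
    return [r for r in rows if r[0] == user]

def filter_by_period(user_rows, period):
    recalcs["by_period"] += 1
    return [r for r in user_rows if r[1] == period]

rows = [("alice", "AAA", 1.0), ("alice", "BBB", 2.0)]
cached_user_rows = filter_by_user(rows, "alice")  # shared "atomic" column, computed once

# A DateTime value changes: only the small downstream formula reruns,
# reusing the cached by-user result as-is.
changed = filter_by_period(cached_user_rows, "BBB")
print(recalcs)  # {'by_user': 1, 'by_period': 1}
```

With the combined single formula, that same DateTime change would force the full user-plus-period filter to run again from the base table.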

So the goal is to find what is a best fit for you, both in performance and in how your formulas are written. For me, I like splitting things when possible because most examples I run into aren’t just two rules in the filter, but several to many rules in the filter.

