Batch API requests for Packs

I have a table with 100 rows and a column from my pack (Delta Date Pack, extend Coda with Delta Date | Coda)

When I refresh the column it takes around 6 minutes to finish, and it's hitting a bunch of Too Many Requests errors.

What about giving the Pack maker an option to have Coda collect all the column values into an array and send a single API request? (It could be a batch parameter that batches, say, 100 rows at a time.)

Or am I doing something wrong, is there an existing solution?

Hi @Rickard_Abraham - Congrats on the new Pack! While there are some known performance issues running Pack formulas in large tables, that sounds a bit more extreme. Did you by any chance add cacheTtlSecs: 0 to your formula? If so, it could be causing you to hit a different rate limit.


Thank you! Oh yeah, cacheTtlSecs was 0. I set it that way because otherwise nothing happens when the column is refreshed.

I removed the cacheTtlSecs definition to let it fall back to the default value, which is 5 minutes according to the docs.
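
For context, the formula definition looks roughly like this. It's a trimmed-down sketch rather than my actual Pack code; the formula name, parameter, and phrasing logic here are placeholders:

    import * as coda from "@codahq/packs-sdk";

    export const pack = coda.newPack();

    pack.addFormula({
        name: "DeltaDate",
        description: "Returns a phrase describing how long ago a date was.",
        parameters: [
            coda.makeParameter({
                type: coda.ParameterType.Date,
                name: "date",
                description: "The date to describe.",
            }),
        ],
        resultType: coda.ValueType.String,
        // With cacheTtlSecs omitted, the platform default applies (5 minutes per
        // the docs). Setting cacheTtlSecs: 0 disables caching, which is what was
        // causing the Too Many Requests errors above.
        execute: async function ([date], context) {
            const deltaSeconds = Math.round((Date.now() - date.getTime()) / 1000);
            return `${deltaSeconds} seconds ago`;
        },
    });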

I waited 8 minutes and then refreshed the 100 rows with the cache setting removed:

  • It took 26 seconds, much better!
  • It had around 50% cache hits.
    • I expected 0% cache hits.
    • Their only input is thisRow.Modified(); is there some dynamic cache TTL going on in the background, or what is happening?

I ran another test straight after that one:

  • It took 13 seconds.
  • It had 100% cache hits.

That was as expected.

"If so, it could be causing you to hit a different rate limit."

Is there a different request limit for when it's 0? Are there any docs on how the rate limits are implemented?

I feel like there is way too much focus on caching. It seems like an XY problem to me. Caching feels like a band-aid over the root cause, which is having an API call for every single cell.

This whole ordeal is a bit frustrating, and I'm hoping it will improve. In my inexperienced opinion, there are some possible solutions to this problem:

For Caching:

  • Allow Pack makers to return a dynamic cache TTL for each formula call.
  • Have an option to refresh the column with invalidated caches, i.e. like chrome does with ctrl+shift+R.

For Pack Formulas:

  • The batching I mentioned above (sketched just after this list).
  • Client-side formulas defined by Packs.
    • At the moment it feels like it's always preferable not to install a Pack if you can spend some time making the formula run client-side, which is a shame.
    • I haven't tried it myself, but it doesn't seem too complicated to sandbox.
    • Is there a discussion for this somewhere?
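
To make the batching idea a bit more concrete, here's a purely hypothetical sketch of what it could look like for a Pack maker. Nothing like this exists in the Packs SDK today; batchSize, executeBatch, and formatDelta are all made-up names:

    pack.addFormula({
        name: "DeltaDate",
        // ...parameters and resultType as usual...

        // Hypothetical option: let Coda group up to 100 cells into one execution.
        batchSize: 100,

        // Hypothetical batched entry point: receives one parameter tuple per cell
        // and returns the results in the same order, so the per-request overhead
        // is paid once per batch instead of once per cell.
        executeBatch: async function (paramsList, context) {
            return paramsList.map(([date]) => formatDelta(date));
        },
    });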

The formula engine sometimes calculates the Pack formula more than once. For instance, currently the formula is calculated once for your client (browser) and again by the server when it’s making a snapshot of your doc. This is a technical detail that can change over time, but in general the formula engine works in mysterious ways and you can’t assume that one user action equals one formula calculation.

Yes, we have a rate limit in place for when the same formula is run with the same input multiple times and isn’t cached. I believe this is in place to prevent runaway calculations from burning up a lot of resources, but when you disable formula caching you are more likely to run into it.

In general we don’t document the exact rate limits, since they are complicated and subject to change. I think I will add a warning about cacheTtlSecs: 0 though, since it seems to cause more hard than good in most cases.

I’m sorry for the frustration, and thank you for the great feedback. Coda docs are certainly quite complicated under the hood, but we should make it so that Pack makers can contribute without needing to be aware of how the machinery works.

Can you provide an example of how you would use this? What signals would you use to change the TTL?

That’s a great idea. Is that something you’d imagine using just as a Pack maker, or something Pack users would want as well? Something you’d want as a manual option, or something you could use in a button, etc.?

I didn’t reply to the batching idea directly in my first response, but I think it is an interesting idea. There would need to be some thinking about how to handle the 60-second timeout for Pack executions, and I’m not even sure the underlying formula engine could easily support it.

This has been tossed around since the Alpha days, but there are no plans for it. Architecturally this is a major change, like converting a car from gas to electric. It would be a great place to get to, but it’s such a large departure from the current system that it’s unlikely to happen anytime soon.


Wow thank you so much for your detailed answer! Really appreciate it :+1:

I was thinking something like this, where cacheTtlSecs can be a function instead of a static number:

    cacheTtlSecs: async function ([param], context, execute_result) {
        // highest_unit is a hypothetical helper that returns the largest
        // time unit present in the formula's result.
        switch (highest_unit(execute_result)) {
            case "seconds":
                return 1;
            case "minutes":
                return 60;
            default:
                return 3600;
        }
    },

Or possibly allow execute to optionally return an object instead, containing both the result (as normal) and cacheTtlSecs. This would also make a nice platform for any future formula configuration that isn’t necessarily static.

    execute: async function ([param], context) {
        const result = "Hello " + param;
        // calculateCacheTTL is a hypothetical helper that picks a TTL based
        // on the result.
        const cacheTtlSecs = calculateCacheTTL(result);

        return {
            result: result,
            cacheTtlSecs: cacheTtlSecs
        };
    },

I was thinking an option in the dropdown here: [screenshot of the column options menu]

I didn’t think about RefreshColumn(), but it would make sense to add an optional parameter there as well.

Oh alright, I was unaware of the timeout, but I don’t think it would get close to that in most cases. My thought was to pay all the API overhead once per batch instead of once per cell. My Pack does one cell calculation in about 4 microseconds, for example.

Disregarding the cache lookup time and whatnot, for my Pack the time taken in seconds, with x being the number of cells, could be:
Currently: 0.26x
Batched: 0.26 + 0.0000041875x
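
As a sanity check on those numbers, here's a tiny plain-JavaScript model; the 0.26 s request overhead and ~4.19 µs per-cell figures are just my rough measurements from above:

    // Rough per-execution request overhead and per-cell computation time,
    // taken from the measurements above.
    const REQUEST_OVERHEAD_S = 0.26;
    const PER_CELL_COMPUTE_S = 0.0000041875;

    // Today: one Pack execution (and its overhead) per cell.
    const currentSeconds = (cells) => REQUEST_OVERHEAD_S * cells;

    // Proposed: one execution per batch, then cheap per-cell work.
    const batchedSeconds = (cells) => REQUEST_OVERHEAD_S + PER_CELL_COMPUTE_S * cells;

    console.log(currentSeconds(100));  // 26 seconds for the 100-row table above
    console.log(batchedSeconds(100));  // ≈ 0.26 seconds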

Thanks for the detailed response. Regarding the dynamic cache TTL idea, I meant: can you provide some more context on a use case where you would want variable caching? I’m having trouble thinking of a case where the data itself would influence the caching.

Oh gotcha! So the example I tried to show previously would be for my Delta Date Pack.
It produces a phrase such as “24 seconds ago”; in that case I’d like a TTL of 1 second.
Other times it might be “2 weeks ago”, in which case I’d set the TTL to the max of 24 hours.

A general use case would be to let advanced users decide the TTL with an additional parameter when they call the formula. That would make it easier to suit every user’s needs.

Ah, thanks for clarifying. It seems like the issue in your case is that when the user doesn’t pass in the second date you default to the current date and time, which makes the formula not deterministic. The formula engine certainly prefers deterministic formulas, where the same inputs result in the same outputs.

The other common case we see is pulling data from an API, but in that case it’s hard to imagine why you’d want to cache one item longer than another. I think the variable cache time use case may be pretty specific to your Pack, but I’ll keep an ear out for others.

You’re right, I had a suspicion I was going about this a bit backwards.

I’d say it is deterministic though, if you include the default parameter values defined on the “JS” level.
Do you think it’d be possible for the formula engine to include those values when creating the hash key?

Which in my case would include new Date():

    execute: async function ([date1 = undefined, date2 = new Date()], context) {

I, too, have often struggled with the right approach to be kind to both users and Coda’s underbelly. I spend a fair bit of time designing tables that force conditional Pack requests, and while this approach does help, it adds a bit more complexity to table formulas.

No, I don’t think that would be possible with the current architecture. The formula engine isn’t aware of those defaults, and I don’t think it would even be possible to get the exact value used by a particular execution.

The better approach would be to make both dates required parameters, and have users pass in an explicit Now() formula if they want the current time. This would be a breaking change of course, but perhaps it’s early enough that you could do so with minimal user disruption.
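
Roughly something like this, assuming the usual import and coda.newPack() setup; it's just a sketch of the parameter declarations, not your actual Pack code:

    pack.addFormula({
        name: "DeltaDate",
        description: "Returns a phrase describing the time between two dates.",
        parameters: [
            coda.makeParameter({
                type: coda.ParameterType.Date,
                name: "date1",
                description: "The starting date.",
            }),
            // Also required, with no JS-level default: users who want "until now"
            // pass Now() explicitly, so the same inputs always give the same output.
            coda.makeParameter({
                type: coda.ParameterType.Date,
                name: "date2",
                description: "The ending date, e.g. Now().",
            }),
        ],
        resultType: coda.ValueType.String,
        execute: async function ([date1, date2], context) {
            // Placeholder for the existing delta calculation.
            const deltaSeconds = Math.round((date2.getTime() - date1.getTime()) / 1000);
            return `${deltaSeconds} seconds`;
        },
    });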