I have built a simple pack to extract readable text from a PDF.
The user passes a file to the formula, and the pack downloads it via the fetcher:
execute: async function ([fileURL], context) {
  let response = await context.fetcher.fetch({
    method: "GET",
    url: fileURL,
    isBinaryResponse: true, // Required when fetching binary content.
    cacheTtlSecs: 86400, // 24 hours - maximum allowed
  });
  let buffer = Buffer.from(response.body);
  ...
The pack works fine, but the issue I’m running into is that the formula is constantly being rerun in a table with about 300 rows. The files haven’t changed, and even with cacheTtlSecs set, the results are rarely returned from the cache. This makes the document slow, and most of the time it shows as “syncing”.
As I understood from the documentation, a pack formula should only rerun when its inputs change. If the files are not changing, why is it being constantly triggered? And why is it not fetching from the cache?
To narrow down the issue I switched to the PDF Extract pack from the marketplace, and the issue persists.
When a new row is added, the document starts to sync and takes a really long time to finish.
Since the attachment for the new row was an image instead of a PDF, the pack formula was not even triggered for it, because I check for that. So to me it looks like Coda re-triggers the formula for the whole column, not just the new row.
If that is the case, the document will only get slower and slower as more rows are added.
Currently an alternative seems to be to do this with an automation instead of a formula, but that has its own drawbacks. It would be great to get some input from someone at Coda.
Hi @Nikola_Lajic - I’m not aware of any issues with formulas being run too often, and in general adding a new row shouldn’t cause all previous rows to recalculate.
In addition to a cacheTtlSecs on the fetcher call, make sure you also have it set on the formula itself. If you set cacheTtlSecs: 0 on the formula it can lead to some performance issues.
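As a rough sketch of what setting the cache at both levels could look like, assuming the formula is named Extract (as in the doc formulas in this thread) and takes a single file parameter. The parameter name, the descriptions, and the 24-hour formula-level value are assumptions for illustration, and the PDF parsing step is left out because the original snippet elides it:

```typescript
import * as coda from "@codahq/packs-sdk";

export const pack = coda.newPack();

pack.addFormula({
  name: "Extract",
  description: "Extracts readable text from a PDF file.",
  parameters: [
    coda.makeParameter({
      type: coda.ParameterType.File,
      name: "file", // assumed parameter name
      description: "The PDF file to read.",
    }),
  ],
  resultType: coda.ValueType.String,
  // Formula-level cache: results for identical inputs may be reused for this long.
  // The value is an assumption that mirrors the fetcher setting below.
  cacheTtlSecs: 86400,
  execute: async function ([fileURL], context) {
    let response = await context.fetcher.fetch({
      method: "GET",
      url: fileURL,
      isBinaryResponse: true, // Required when fetching binary content.
      cacheTtlSecs: 86400, // Fetcher-level cache, as in the original snippet.
    });
    let buffer = Buffer.from(response.body);
    // PDF parsing is elided in the original post; the byte count stands in for the result.
    return `Fetched ${buffer.length} bytes`;
  },
});
```

The two settings are independent: the fetcher-level value covers the HTTP response, while the formula-level value covers the formula result for a given set of inputs.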
If that doesn’t work it might be best to reach out to Coda support since they can look into the specifics of your doc and narrow down the issue.
Thank you for the input @Eric_Koleda.
I think I found the issue while preparing a test document.
This works no problem:
Extract(Attachment.First()),
But as soon as some logic is added, Coda no longer seems to understand what exactly the dependency is, and it triggers a refresh of all rows whenever a new row is added.
If(
  thisRow.Attachment.Count() > 0 AND
  thisRow.Attachment.First().Name.ContainsText(".pdf"),
  Extract(Attachment.First()),
  "Not a pdf"
)
You can test it out for yourself in this doc.
Thanks for digging in and providing a repro case. Looking even deeper, it appears the issue is specifically related to your formula depending on File.Name. If you don’t use the name in your formula then the entire column won’t recalculate when you add a new file. Unfortunately the new row won’t calculate correctly either, as your setup was actually relying on that full-column recalculation to work around another known issue.
The formula engine has challenges working with files, since there is an async upload and scan taking place. The full-column recalculation you’ve found could have been a workaround added to deal with that in the case of file names.
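One way to avoid referencing File.Name in the doc formula is to move the PDF check into the pack and inspect the downloaded bytes instead. The sketch below relies on the fact that PDF files begin with the ASCII signature "%PDF-"; the helper name looksLikePdf is hypothetical. Note that this only removes the name dependency. It does not address the upload and scan timing issue described above, so a newly added row may still not calculate correctly on its own.

```typescript
// PDF files start with the ASCII bytes for "%PDF-".
const PDF_SIGNATURE: number[] = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical helper: returns true when the bytes begin with the PDF signature.
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_SIGNATURE.length) {
    return false;
  }
  return PDF_SIGNATURE.every((byte, i) => bytes[i] === byte);
}

// Possible use inside execute, after the fetch (a Node Buffer is a Uint8Array):
//   let buffer = Buffer.from(response.body);
//   if (!looksLikePdf(buffer)) {
//     return "Not a pdf";
//   }
```

With a check like this in the pack, the doc formula could stay as the plain Extract(Attachment.First()) call, at the cost of downloading non-PDF attachments before rejecting them.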
Our general recommendation for dealing with file and image parameters in Packs is to make them into action formulas triggered by a button. It provides a better UX when the file is still being uploaded and scanned (you can click again in a few seconds) and avoids duplicative calculations.
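A minimal sketch of the action-formula variant, assuming the same fetch logic as the original snippet. The formula name ExtractAction, the parameter name, and the descriptions are hypothetical, and the PDF parsing step is again left out because the original post elides it:

```typescript
import * as coda from "@codahq/packs-sdk";

export const pack = coda.newPack();

pack.addFormula({
  name: "ExtractAction", // hypothetical name
  description: "Extracts readable text from a PDF file when triggered by a button.",
  parameters: [
    coda.makeParameter({
      type: coda.ParameterType.File,
      name: "file", // assumed parameter name
      description: "The PDF file to read.",
    }),
  ],
  resultType: coda.ValueType.String,
  // Marks this as an action formula: it runs when a button or automation
  // triggers it, not as part of regular column recalculation.
  isAction: true,
  execute: async function ([fileURL], context) {
    let response = await context.fetcher.fetch({
      method: "GET",
      url: fileURL,
      isBinaryResponse: true, // Required when fetching binary content.
    });
    let buffer = Buffer.from(response.body);
    // PDF parsing is elided in the original post; the byte count stands in for the result.
    return `Fetched ${buffer.length} bytes`;
  },
});
```

In the doc, a button column could then call the action and write the returned text into a regular column, for example with ModifyRows. The exact button formula depends on the table's column names.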
Thank you for clarifying @Eric_Koleda.
For now I have switched to parsing the file in an automation, which works much faster.
Hopefully in the future there will be time for the Coda team to make file handling more robust.