Released: OCR Pack

Rickard_Abraham · January 19, 2025, 9:46am

I’m surprised the change affected you. It shouldn’t have if you wrapped the formula in a ModifyRows() action like you said.

And why are you on a different account? I’m afraid it looks like this might not be the real Shishir. Please make a comment with @shishir to prove me wrong.

shishir · January 20, 2025, 4:04pm

Ha, that’s me still working through the process of merging new accounts. It’s me

Re the breaking change - it’s unlikely it broke others, I was doing something a bit fancy!

Rickard_Abraham · January 20, 2025, 4:30pm

Ah fair enough, good to hear!

Chris_Williams · February 1, 2025, 6:19pm

@Rickard_Abraham I love the pack! It’s made our lives a lot easier when it comes to managing contracts. We often receive scanned copies back from customers and it can be a pain to extract information, especially if there are special or unique terms that we should keep in mind.

Right now, I’m using the OCR pack in combination with your Google Drive Extra Pack and it’s helped us build several powerful workflows. As an example, we’ve used those two packs along with a custom AI pack to create easy-to-consume contract summaries for our team. We have a few button columns that extract text and then call an LLM to summarize it.

Unfortunately, we’ve run into some issues keeping up a consistent workflow because Coda’s “push all buttons” automation stops completely when it encounters its first error, rather than continuing to process the remaining buttons in the sequence.
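
As an illustration of the behavior being asked for (not something Coda offers today, and not Coda formula code), here is a minimal Python sketch of a "press everything, collect the failures" loop. The function name `press_all` and the sample actions are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def press_all(buttons: Iterable[Tuple[str, Callable[[], object]]]):
    """Run every action; record failures instead of stopping at the first one."""
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for name, action in buttons:
        try:
            action()
            succeeded.append(name)
        except Exception as exc:  # keep going, but remember what went wrong
            failed.append((name, str(exc)))
    return succeeded, failed


def too_large():
    # Hypothetical failure, standing in for a download that hits the size limit.
    raise ValueError("file exceeds 4 MB")


ok, errors = press_all([
    ("row 1", lambda: "summary"),
    ("row 2", too_large),
    ("row 3", lambda: "summary"),
])
```

With this shape, the failing row ends up in `errors` for someone to look at later, while rows 1 and 3 are still processed.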

With that in mind, I have two feature requests or rather a ‘wishlist’ of sorts:

File size metadata: For the Google Drive Extra pack, could you add a function that allows the users to extract the file size?

I’d like a function that accepts a Google Drive link and returns the file size. The output should be in a standard unit like bytes, as the Drive API’s size representation (e.g., “2 MB”, “2 KB”) isn’t consistently usable.

One of the common issues I run into is that we’ll try to download a file from Drive that is over the 4 MB limit, which throws an error. Because of that, our automations or “push all buttons” buttons fail from time to time and I have to manually tick an “exclude from filter” checkbox.
Unfortunately, Coda’s Google Drive Pack doesn’t provide file size despite providing some metadata.
Today I have an n8n workflow that pastes the file size into my document, but I’d love to drop that workflow entirely and keep everything in Coda.
All I’d need is a function that takes the Drive link and exports the file size in bytes or some common format, since the Drive API returns file sizes as “2 MB” or “2 KB”.
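
To make the unit problem concrete, here is a small Python sketch that turns a display string such as “2 MB” into bytes and compares it with the 4 MB limit. It is only an illustration: the helper names are made up, and it assumes binary multiples (1 KB = 1024 bytes) and an inclusive limit, neither of which is confirmed in this thread.

```python
import re

# Assumption: binary multiples. Use 1000-based factors if the source is decimal.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

# The 4 MB download limit mentioned in the post, expressed in bytes.
LIMIT_BYTES = 4 * 1024**2


def size_to_bytes(text: str) -> int:
    """Convert a string like '2 MB' or '1.5 KB' into a whole number of bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMG]?B)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def fits_download_limit(text: str) -> bool:
    """True when the file is small enough to download (limit treated as inclusive)."""
    return size_to_bytes(text) <= LIMIT_BYTES
```

A plain byte count from the pack would make the parsing step unnecessary, which is the point of the request.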

Page Count: Similar to my ask above, I’d love it if there was a function that could output the number of pages in a document. This could be part of your OCR pack, PDF Pack, or Google Drive Extra - I don’t mind.

The other issue I run into often is that a document has too many pages and throws an error with the OCR pack, even if the file size is under 4 MB. I’m not sure why this happens, but on occasion we’ll have a document with 40+ pages. It’s rare, but again the automations (pushing all buttons) stop the first time they encounter an issue.
I view this as a way to pre-filter potentially problematic rows for someone from our team to investigate separately.
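
The pre-filter idea could look roughly like the Python sketch below once both values are available on each row. The `Contract` type, the field names, and the 40-page cut-off are illustrative assumptions; only the 4 MB figure comes from the post.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_BYTES = 4 * 1024**2  # the 4 MB limit mentioned in the post
MAX_PAGES = 40           # hypothetical cut-off based on the "40+ pages" remark


@dataclass
class Contract:
    name: str
    size_bytes: int
    page_count: int


def needs_review(doc: Contract) -> bool:
    """Flag rows likely to fail so a person can handle them separately."""
    return doc.size_bytes > MAX_BYTES or doc.page_count >= MAX_PAGES


def split_rows(docs: List[Contract]) -> Tuple[List[Contract], List[Contract]]:
    """Return (rows safe to automate, rows to investigate manually)."""
    auto = [d for d in docs if not needs_review(d)]
    manual = [d for d in docs if needs_review(d)]
    return auto, manual


auto, manual = split_rows([
    Contract("short.pdf", 1_000_000, 3),
    Contract("huge.pdf", 6_000_000, 5),
    Contract("long.pdf", 2_000_000, 45),
])
```

Only the rows in `auto` would then be sent through the button automation.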

Happy to expand on these more. We’re definitely happy with the existing features, but I’d imagine these would also be useful for other users of your pack(s).

Rickard_Abraham · February 1, 2025, 7:12pm

Hey @Chris_Williams,
I’m excited to read your post, and it sounds like you’ve made some awesome workflows!

Google Drive file size

This is a great idea, I’ll take a look this week to see if I can add it!

Page Count

My PDF pack already has PageCountPDFs()! I’m afraid you might’ve missed it

Allowing Errors

I don’t know of any good ways to allow errors either, would be nice to have

OCR Pack AI

Have you also tried the AI features inside the pack?

Entering the prompt parameter in the Scan() action, or
ScanToRows(), which can create Coda rows directly

But since you’re having issues with too many pages sometimes, this could get a little tricky.
A possible solution would be to split the PDF with PageSelectPDF() in the PDF pack, but then you might lose valuable context if split on an unfortunate page.

The Google Document AI v2.0 engine will handle 40 pages with ease though! Downside is you can’t give it a prompt

Have you set the output parameter when you scan those 40+ pages that error sometimes, and to what, if so?

Chris_Williams · February 2, 2025, 2:45pm

Great, that sounds good!

Wow, I’m not sure how that happened. That should definitely work for us and help solve one of our challenges.

Ah, so yes, we’ve looked into the AI features in the pack. The page count issue was the primary reason we created a separate custom pack to make the calls to the LLM. We’ll revisit this though, because our preference is always to use fewer packs and steps where we can. It makes it much easier to troubleshoot.

I appreciate the quick response. The Coda team should really highlight your pack(s) in the Docket or another marketing channel, given the practical applications for many businesses and of course the praise from @Shishir_Mehrotra.

Rickard_Abraham · February 3, 2025, 10:49am

I’ve updated Google Drive Extra, read more here!
I also asked Coda if we can increase the size limit from 4 MB

Thank you again @Chris_Williams!

Chris_Williams · February 3, 2025, 1:17pm

@Rickard_Abraham Fantastic, thank you!

Rickard_Abraham · February 5, 2025, 7:41am

New OpenAI engines!

o1 (Vision functionality)
o3-mini (Only text)

Also, o1-mini (Only text) is now a third the cost

Rickard_Abraham · February 11, 2025, 2:26pm

New Google Gemini engines!

Gemini 2.0 Flash
Gemini 2.0 Flash-Lite Preview
Gemini 1.5 Pro

These engines are beasts at handling large media inputs - Cheap, fast, and decent quality!

103 page PDF example

Engine: Gemini 2.0 Flash
Prompt: Summarize this please
Credits: 11
Seconds: 15
Response: Irrelevant to post here – it looked good ^^

Videos

These engines can also look at videos. Here’s a cherry-picked example, using the top video here

Engine: Gemini 1.5 Pro
Prompt: How many vehicles do you count in the video, and what do they look like?
Credits: 1
Seconds: 4
Response:

Here are the vehicles visible in this clip, along with their descriptions:

1. A white box truck.
2. A blue van with several windows.
3. A light blue sedan.
4. A gray sedan.
5. Another box truck or van (it’s partially obscured)
6. A silver sedan.
7. A pale green bus.
8. A blue bus, either behind or in front of the pale green bus.
9. A silver van or truck.

There may be other vehicles on the road that aren’t clearly visible in the footage.

Super cool stuff!

Christiaan_Huizer · February 11, 2025, 2:29pm

that is really amazing and I am so happy you added this, will test it out soon and come back to you!

Blaise_d_Estais · February 16, 2025, 3:40pm

hey! i’m selecting o1-mini or o3-mini for image OCR… and it works!

but officially they don’t support image input…!

so why?

do you have a backup if there’s an error → which model are you using then? (i like that it works ok, but if it’s not the one i selected i’d be happy to know which one actually does the job!),

or did you find a hack to make it work? (if yes, please tell me)

Rickard_Abraham · February 16, 2025, 5:39pm

Hey Blaise, thanks for your questions!

Oh, that sounds odd, you should get this error if you try to give those models an image:

However, you can use text-only models with Scan() as long as you only give a prompt and no media input, could this be what happened for you?

There’s no backup model, if you chose a specific one then that’s the only one it’ll try

Magic-wise there’s a PDF pre-processor for the OpenAI models as they don’t take PDFs, only images. Gemini takes them natively which is pretty nice

There’s also the chaining of models, so you could set engines: List("g-doc-ai-v2", "o1-mini"), which would make Google’s Document AI do the heavy OCR extraction and then send that to o1-mini along with the prompt, for example. This is not that useful given the current, very capable AI models though. Perhaps it’ll be more useful in the future!
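
The data flow of such a chain can be pictured with the Python sketch below. It is a guess at the general shape only, not the pack's actual implementation: the two engine functions are stand-ins that return placeholder strings, and `run_chain` is a hypothetical name.

```python
from typing import Callable, Dict, List


# Stand-ins for real engines: each takes (input, prompt) and returns text.
def fake_ocr(document: str, prompt: str) -> str:
    # An OCR engine ignores the prompt and just extracts text.
    return f"TEXT[{document}]"


def fake_llm(text: str, prompt: str) -> str:
    # A text-only model receives the extracted text plus the prompt.
    return f"{prompt} -> {text}"


ENGINES: Dict[str, Callable[[str, str], str]] = {
    "g-doc-ai-v2": fake_ocr,
    "o1-mini": fake_llm,
}


def run_chain(document: str, prompt: str, engines: List[str]) -> str:
    """Feed each engine's output into the next one, in the order given."""
    result = document
    for name in engines:
        result = ENGINES[name](result, prompt)
    return result


summary = run_chain("invoice.pdf", "Summarize", ["g-doc-ai-v2", "o1-mini"])
```

The order of the list matters here: the OCR step has to run first so the text-only model never sees the raw file.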

Rickard_Abraham · April 14, 2025, 11:25am

Tool Support: Search

You can now tell the AI to search the web before responding!

Added a “Tools” column to the Engines sync table
Added tool “search” to these engines
- OpenAI
  - GPT-4o
  - GPT-4o mini
- Google
  - Gemini 2.5 Pro
  - Gemini 2.0 Flash
  - Gemini 1.5 Pro
Added parameter “tools” to the Scan() formula

Example

Scan(Invoice, prompt: "Find out who the CEO of the invoicing company is", tools: "search")

Rickard_Abraham · April 18, 2025, 3:11pm

New engines

OpenAI o3
OpenAI GPT-4.1
OpenAI GPT-4.1 mini
OpenAI GPT-4.1 nano

Updated ScanToRows()

Google AI engines are now supported for this formula!
New default engine Gemini 2.0 Flash which has these average results for ScanToRows():

Time taken: 15 seconds
Cost: 3 credits

Engine Outputs explanation

See which engines are supported when using ScanToRows() by looking for the row value in the Outputs column in the Engines sync table (see below).

Removed row output from “OpenAI o1”. It’s too slow, often hitting the 1 minute timeout (rows are still created but it’s annoying, and we have much better options available now)
Tools are still unsupported for structured outputs + vision with both AI providers, so we can’t enable search for ScanToRows() yet

Updated Engines sync table

Token limits (context windows) are now shown
Cost and speed are now clearer.
