Released: OCR Pack

Rickard_Abraham · May 19, 2024, 8:20am

Updated Doc

Added a Security & Privacy page to the showcase doc
It can also be found in the top nav.

Updated Privacy Policy

The Privacy Policy has been updated to reflect that I now store names and emails

The information is provided by Coda when you request a free token in the form
This is necessary in order to provide the automated personal credit system

Rickard_Abraham · June 13, 2024, 4:32pm

@Christiaan_Huizer has provided a helpful perspective for the development of this pack, and he just released a nice blog post about it

Doug_Loud · June 14, 2024, 6:45pm

This pack is incredible! I have just spent a bit of time testing this pack with a will and a report on a mining company. Fabulous results - saved me a ton of time finding out what I needed to know. Check it out. Wizardry!

Rickard_Abraham · August 1, 2024, 10:41am

The server is now using multiprocessing, giving us ~3x speed for bigger documents!

A 66-page PDF used to take 43 seconds. It’s now done in 14 seconds

This will allow processing of even larger documents within the 100-second timeout
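
For a rough sense of what those timings imply, here is a small Python sketch. It only uses the figures from this post (66 pages, 43 s before, 14 s after, 100 s timeout). The function names are made up for illustration, and the assumption that processing time scales linearly with page count is a simplification, not something the pack guarantees.

```python
import math

def speedup(old_seconds: float, new_seconds: float) -> float:
    """Ratio of the old processing time to the new one."""
    return old_seconds / new_seconds

def max_pages_within_timeout(pages: int, seconds: float, timeout_s: float = 100) -> int:
    """Extrapolate how many pages fit in the timeout.

    ASSUMPTION: time grows linearly with page count, which real
    documents (images, dense scans) may not follow.
    """
    pages_per_second = pages / seconds
    return math.floor(pages_per_second * timeout_s)

# Figures quoted in the post: 66 pages, 43 s before, 14 s after.
observed_speedup = speedup(43, 14)                   # roughly 3x
pages_before = max_pages_within_timeout(66, 43)      # linear estimate, old server
pages_after = max_pages_within_timeout(66, 14)       # linear estimate, new server
```

Under that linear assumption the estimate rises from about 153 pages to about 471 pages per request, so treat it as an upper bound to test against rather than a promise.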

Keep in mind, if you expect more than:

40,000 characters: change your output format, see post
300,000 characters: be careful how you use ReadTextFile() on the Text File output. In the worst case, Coda will automatically cancel your formula if you try to display the result directly
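
The two thresholds above can be read as a simple decision rule. Here is a minimal Python sketch of it. The function name and the returned advice strings are hypothetical, and only the 40,000 and 300,000 character limits come from this post.

```python
# Limits quoted in the post; everything else here is illustrative.
FORMAT_CHANGE_LIMIT = 40_000
DISPLAY_RISK_LIMIT = 300_000

def output_advice(expected_chars: int) -> str:
    """Return a short hint for the expected size of the scanned text."""
    if expected_chars > DISPLAY_RISK_LIMIT:
        # Very large output: reading the text file is fine, but
        # displaying the whole result directly may get the formula cancelled.
        return "use Text File output; avoid displaying ReadTextFile() directly"
    if expected_chars > FORMAT_CHANGE_LIMIT:
        return "change the output format"
    return "default output is fine"
```

For example, `output_advice(120_000)` returns the "change the output format" hint, since that size sits between the two limits.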

Rickard_Abraham · August 21, 2024, 11:59am

You can now optionally send the scanned text to an AI along with a prompt!

Example: Summarize a 66-page PDF

Updated `Scan()`

// Get text like normal
Scan(PDF)

// Send the text to the default GPT-4o mini
Scan(PDF, prompt: "Summarize this please")

// Use the smarter but costly GPT-4o
Scan(PDF, prompt: "Summarize this please", engine: "gpt-4o")

New sync table: Engines

Updated data in sync table: Requests

You can see separate AI info for each request:

{"ai": "gpt-4o-mini", "pages": 2, "ai_cost": 1, "file_count": 2}

Rickard_Abraham · September 26, 2024, 12:26pm

New Formula: ScanToRows

I’m excited to announce a much-requested feature!

Automatically create rows from images and PDFs in any structure you choose

Here’s a showcase video! (5 mins)

I’ve worked hard on a general solution. For example, I asked it to process my profile picture:

Or some shapes with no prompt:

Updated Scan Formula

The Scan() formula now supports more engine options, like engine: "gpt-4o-mini", which allows AI-based OCR. This means it can capture extra details like colors, text positions, and more.

Other

I’ve raised the amount of free credits from 50 to 100
Added Path column to Requests sync table
Updated Engines sync table

Christiaan_Huizer · September 26, 2024, 4:19pm

Impressive work and contribution. Extracting a PDF and getting intelligent outcomes has never been so easy!

Doug_Loud · September 26, 2024, 7:58pm

Fantastic! Thank you so much!

Sam · October 3, 2024, 12:26am

Do you have a suggestion for best practice for designing a system that utilises this pack for the following scenario:

Extracting summary and row data from invoices from 20+ suppliers, where the formatting of invoices varies from supplier to supplier?

To be a bit more specific about my question, do you have a suggestion for how to scale up the design you demonstrated in this YouTube video,

https://www.youtube.com/watch?v=oe3sKeo5B0I&t=4s (Convert PDFs to Coda.io Rows Instantly with OCR Pack!)

such that a wide range of invoice formats can be processed?

Thank you for creating this pack, and for the fantastic explainers, they are making the pack so much more accessible for me!

Rickard_Abraham · October 3, 2024, 6:40am

Hi Sam, thanks for your question!
That setup in the video should be able to handle it!
I’d recommend considering adding some columns to the Invoice table such as

Currency (Text)
Creation Date (Date)
Expiration Date (Date)

as well as any other data points that commonly occur in your invoices. Otherwise the AI will probably put them in the Other Info column, if you have one

Imagine it’s a person you’re sending a message to, asking them to fill out your Coda rows: is there enough information to understand what values each cell should have, and how many rows per table?
If not, make your table names and column names more descriptive, or simply give additional instructions with the prompt parameter

I’ve worked hard to build a general, dynamic solution. It’s an involved process under the hood, but I’ve tried to make it as intuitive as possible for the user. If the AI has issues at any point, it will abort the process and create a nice error message for the user, hopefully explaining what actions they can take to make it work

Sam · October 3, 2024, 7:52am

Fantastic!
You’ve given me plenty of info to keep going on with. I’ll report back with any progress updates.

Rickard_Abraham · October 9, 2024, 1:31pm

Check out the new quick-start with 10 simple steps!

Flowchart

Here’s a flowchart for my OCR and PDF packs, which I understand are confusingly similar

And thanks to @Christiaan_Huizer for his continuous support and latest post!

Sam · October 10, 2024, 1:36am

This is SO helpful. Thank you for taking the time to put it together.

shishir · January 5, 2025, 5:24pm

For anyone who has used this pack in the past and gotten mediocre OCR results – I highly recommend trying again, and be sure to try one of the new models. I’m finding great results with both gpt-4o and gpt-4o-mini. Totally amazing.

Thank you @Rickard_Abraham!

Christiaan_Huizer · January 5, 2025, 5:52pm

Indeed it is a fantastic pack with great capabilities, I wrote about it on LinkedIn.

This pack means no more manual data entry, no more tedious searches, and no more missed insights.

Rickard_Abraham · January 5, 2025, 8:27pm

I’m super honored you tried my pack
Thanks a lot for the feedback!

shishir · January 6, 2025, 2:15pm

A few small suggestions:

Change the default model to one of the better ones, and also publish a bit more instructions on how to change the model. I suspect many people will have better results that way.
Add a pack action for Scan in addition to the formula entrypoint. I was initially baffled by the fact that only ScanToRows was available as an action. Obviously more proficient Coda users know that you can wrap any Pack formula in a ModifyRows() action and turn it into a PackAction, but I suspect many will not know that.

Thanks!

Rickard_Abraham · January 6, 2025, 4:10pm

This is great! I’m always looking to improve

That makes sense. I’ve tried my best to plan for the future by making the default engine “Balanced”. New models will continuously appear, and this allows me to accommodate that. It’s a very interesting problem though, as all models vary in Cost, Speed, Quality, and I/O.
For the cost and speed I’ve done some linear regression to get estimates
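
For readers curious what such an estimate involves, here is a minimal ordinary least squares sketch in Python. It is not the pack's actual code. The function names are hypothetical, and which inputs were really used (for example page count versus seconds or cost) is an assumption.

```python
from typing import Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x: float, slope: float, intercept: float) -> float:
    """Estimate y (e.g. seconds or cost) for a new x (e.g. page count)."""
    return slope * x + intercept
```

Feeding it measured (pages, seconds) pairs per engine would give a per-page slope and a fixed overhead, which is one plausible way to compare engines on speed or cost.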


But, I’ll improve the instructions and have a look at the default models/engines!
Oh, making Scan an action would have been much better, I honestly didn’t know actions could return data to the user, would you look at that


I’ll add a pack action for it like you suggested, and probably begin deprecating the formula alternative

Rickard_Abraham · January 15, 2025, 4:06pm

Alright got a nice little update!

OpenAI costs are reduced by 20%
gpt-4o now points to an updated model which is ~50% cheaper
Scan() formula is now an action
Default engine “Balanced” will now:
- Use gpt-4o-mini if a prompt is provided
- Continue to use g-doc-ai-v2 if no prompt is provided
Engine choice instructions are clearer
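
The new “Balanced” behavior amounts to a small decision rule. Here is a minimal Python sketch of it, assuming the engine name in the second bullet reads g-doc-ai-v2 and that an empty prompt counts as no prompt. `resolve_engine` is a hypothetical helper, not the pack's actual code.

```python
from typing import Optional

def resolve_engine(engine: str = "Balanced", prompt: Optional[str] = None) -> str:
    """Pick the concrete engine for a scan request."""
    if engine != "Balanced":
        # An explicit engine choice (e.g. "gpt-4o") is used as given.
        return engine
    # ASSUMPTION: an empty or whitespace-only prompt is treated as "no prompt".
    has_prompt = bool(prompt and prompt.strip())
    return "gpt-4o-mini" if has_prompt else "g-doc-ai-v2"
```

So `resolve_engine(prompt="Summarize this please")` yields the AI model, while a plain scan with no prompt stays on the document OCR engine.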

Thanks again for the feedback @shishir

Note for Scan() change

Changing Scan() to an action is technically a breaking change, but it will only affect workflows where it’s been used outside a button.
Using Scan() as a formula is strongly discouraged as it will consume your credits with every automatic recalculation.
And also, as this pack isn’t seeing too much traffic, this felt like the best path forward

Shishir_Mehrotra · January 19, 2025, 6:26am

Ah the breaking change did trip me up a bit - but agreed it is much better for this to be an action instead of a formula. Thanks for the updates!
