Released: OCR Pack

Updated Doc

Added a Security & Privacy page to the showcase doc
It can also be found in the top nav.

Updated Privacy Policy

The Privacy Policy has been updated to reflect that I now store names and emails

  • The information is provided by Coda when you request a free token in the form
  • This is necessary in order to provide the automated personal credit system
2 Likes

@Christiaan_Huizer has provided a helpful perspective for the development of this pack, and he just released a nice blog post about it :pray:

5 Likes

This pack is incredible! I have just spent a bit of time testing this pack with a will and a report on a mining company. Fabulous results - saved me a ton of time finding out what I needed to know. Check it out. Wizardry!

3 Likes

The server is now using multiprocessing, giving us ~3x speed for bigger documents!

A 66-page PDF used to take 43 seconds; it now takes 14 seconds.

This will allow processing of even larger documents within the 100-second timeout
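
As a rough illustration of the idea (the server's actual code isn't shown in this thread), per-page work can be spread across processes with Python's `multiprocessing.Pool`. The `ocr_page` function below is a hypothetical placeholder for the real OCR call, and the worker count is an assumption:

```python
from multiprocessing import Pool


def ocr_page(page_number: int) -> str:
    """Hypothetical stand-in for the real per-page OCR call."""
    return f"text of page {page_number}"


def scan_sequential(page_numbers):
    """Process pages one after another in a single process."""
    return [ocr_page(n) for n in page_numbers]


def scan_parallel(page_numbers, workers=4):
    """Process pages in several worker processes.

    Pool.map returns results in the same order as the input,
    so the pages stay in document order.
    """
    with Pool(processes=workers) as pool:
        return pool.map(ocr_page, page_numbers)


if __name__ == "__main__":
    pages = list(range(1, 67))  # a 66-page document
    assert scan_parallel(pages) == scan_sequential(pages)
    # Speedup implied by the timings in this post: 43 s before, 14 s after
    print(f"speedup: {43 / 14:.1f}x")
```

The real gain depends on how many CPU cores are available and how much of the work is per-page, so the ~3x figure is specific to this server.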


Keep in mind, if you expect more than:

  • 40,000 characters: change your output format (see post)
  • 300,000 characters: be careful how you use ReadTextFile() on the Text File output. In the worst case, Coda will automatically cancel your formula if you try to display the result directly
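
The two thresholds above can be read as a simple decision rule. Here is a minimal sketch of it; the function name and the returned wording are my own illustration, not part of the pack:

```python
# Thresholds quoted in this post
OUTPUT_FORMAT_LIMIT = 40_000   # above this, change the output format
DISPLAY_LIMIT = 300_000        # above this, avoid displaying the text directly


def output_advice(char_count: int) -> str:
    """Return a hint for how to handle a scan of the given size (illustrative only)."""
    if char_count > DISPLAY_LIMIT:
        return "use the Text File output and avoid displaying ReadTextFile() directly"
    if char_count > OUTPUT_FORMAT_LIMIT:
        return "change the output format"
    return "default output is fine"
```

For example, `output_advice(50_000)` returns `"change the output format"`.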
4 Likes

You can now optionally send the scanned text to an AI along with a prompt!

Example: Summarize a 66-page PDF

Updated Scan()

// Get text like normal
Scan(PDF)
// Send the text to the default GPT-4o mini
Scan(PDF, prompt: "Summarize this please")
// Use the smarter but costly GPT-4o
Scan(PDF, prompt: "Summarize this please", engine: "gpt-4o")

New sync table: Engines

Updated data in sync table: Requests

You can see separate AI info for each request:

{"ai": "gpt-4o-mini", "pages": 2, "ai_cost": 1, "file_count": 2}
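
If you export that value and want to work with it outside Coda, it parses as plain JSON. A small sketch, assuming the field names shown above; the `summarize` helper is hypothetical and the unit of `ai_cost` is not stated in this post:

```python
import json

# The example value from the Requests sync table, as a JSON string
raw = '{"ai": "gpt-4o-mini", "pages": 2, "ai_cost": 1, "file_count": 2}'


def summarize(request_info: dict) -> str:
    """Build a one-line summary from the fields shown in the example."""
    return (
        f"{request_info['ai']}: {request_info['pages']} pages, "
        f"{request_info['file_count']} files, ai_cost {request_info['ai_cost']}"
    )


info = json.loads(raw)
summary = summarize(info)
```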
2 Likes

New Formula: ScanToRows

I’m excited to announce a much-requested feature!

Automatically create rows from images and PDFs in any structure you choose

Here’s a showcase video! (5 mins)


I’ve worked hard on a general solution. For example, I asked it to process my profile picture:

Or some shapes with no prompt:

Updated Scan Formula

The Scan() formula now supports more engine options, like engine: "gpt-4o-mini", which allows AI-based OCR. This means it can capture extra details like colors, text positions, and more.

Other

  • I’ve raised the amount of free credits from 50 to 100
  • Added Path column to Requests sync table
  • Updated Engines sync table
9 Likes

Impressive work and contribution. Extracting a PDF and getting intelligent outcomes has never been so easy!

1 Like

Fantastic! Thank you so much!

1 Like

Do you have a suggestion for best practice when designing a system that uses this pack for the following scenario:

Extracting summary and row data from invoices from 20+ suppliers, where the formatting of invoices varies from supplier to supplier?

To be a bit more specific about my question, do you have a suggestion for how to scale up the design you demonstrated in this YouTube video,

https://www.youtube.com/watch?v=oe3sKeo5B0I&t=4s (Convert PDFs to Coda.io Rows Instantly with OCR Pack!)

such that a wide range of invoice formats can be processed?

Thank you for creating this pack, and for the fantastic explainers, they are making the pack so much more accessible for me!

1 Like

Hi Sam, thanks for your question!
The setup in the video should be able to handle it!
I’d recommend adding some columns to the Invoice table, such as

  • Currency (Text)
  • Creation Date (Date)
  • Expiration Date (Date)

as well as any other data points that generally occur in your invoices. Otherwise the AI will probably put them in the Other Info column, if you have one.

Imagine you’re sending a message to a person who will fill out your Coda rows: is there enough information to understand what value each cell should have, and how many rows each table needs?
If not, make your table names and column names more descriptive, or simply give additional instructions with the prompt parameter.

I’ve worked hard to build a general, dynamic solution. It’s an involved process under the hood, but I’ve tried to make it as intuitive as possible for the user. If the AI has issues at any point, it will abort the process and create a clear error message, hopefully telling the user what actions they can take to make it work.

3 Likes

Fantastic!
You’ve given me plenty of info to keep going on with. I’ll report back with any progress updates.

2 Likes

Check out the new quick-start with 10 simple steps!

Flowchart

Here’s a flowchart for my OCR and PDF packs, which I understand are confusingly similar :sweat_smile:

And thanks to @Christiaan_Huizer for his continuous support and latest post!

3 Likes

This is SO helpful. Thank you for taking the time to put it together.

1 Like

For anyone who has used this pack in the past and gotten mediocre OCR results – I highly recommend trying again, and be sure to try one of the new models. I’m finding great results with both gpt-4o and gpt-4o-mini. Totally amazing.

Thank you @Rickard_Abraham!

8 Likes

Indeed it is a fantastic pack with great capabilities, I wrote about it on LinkedIn.

This pack means no more manual data entry, no more tedious searches, and no more missed insights.

1 Like

I’m super honored you tried my pack :pray:
Thanks a lot for the feedback!

2 Likes

A few small suggestions:

  1. Change the default model to one of the better ones, and also publish a bit more instructions on how to change the model. I suspect many people will have better results that way.
  2. Add a pack action for Scan in addition to the formula entrypoint. I was initially baffled by the fact that only ScanToRows was available as an action. Obviously more proficient Coda users know that you can wrap any Pack formula in a ModifyRows() action and turn it into a PackAction, but I suspect many will not know that.

Thanks!

4 Likes

This is great! I’m always looking to improve :clap:

  1. That makes sense. I’ve tried my best to plan for the future by making the default engine “Balanced”: new models will continuously appear, and this lets me accommodate them. It’s a very interesting problem though, as all models vary in Cost, Speed, Quality, and I/O.
    For the cost and speed I’ve done some linear regression to get estimates


    But, I’ll improve the instructions and have a look at the default models/engines!

  2. Oh, making Scan an action would have been much better. I honestly didn’t know actions could return data to the user, would you look at that!


    I’ll add a pack action for it like you suggested, and probably begin deprecating the formula alternative
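
For readers curious about the linear regression mentioned in point 1, here is a minimal sketch of an ordinary least-squares line fit. It is not the pack's actual estimator, and the sample numbers are made up purely for illustration; they are not measurements from the pack:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate(slope, intercept, x):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# MADE-UP sample data: page counts and processing times in seconds
pages = [10, 20, 30, 40]
seconds = [4.0, 6.0, 8.0, 10.0]

slope, intercept = fit_line(pages, seconds)
predicted = estimate(slope, intercept, 66)  # estimated time for a 66-page file
```

The same fit can be repeated per engine, once with time and once with cost as `ys`, to compare engines on a given document size.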

2 Likes

Alright, got a nice little update!

  • OpenAI costs are reduced by 20%
  • gpt-4o now points to an updated model which is ~50% cheaper
  • Scan() formula is now an action
  • Default engine “Balanced” will now:
    • Use gpt-4o-mini if a prompt is provided
    • Continue to use g-doc-ai-v2 if no prompt is provided
  • Engine choice instructions are clearer
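
The “Balanced” rule above can be sketched as a small function. This only illustrates the described behavior, not the pack's source; the function name is hypothetical and the engine names are taken from this post as I read it:

```python
from typing import Optional


def resolve_engine(engine: str = "Balanced", prompt: Optional[str] = None) -> str:
    """Pick the concrete engine for a scan (illustrative only)."""
    if engine != "Balanced":
        # An explicitly chosen engine is used as-is
        return engine
    # "Balanced": AI model when a prompt is given, document OCR otherwise
    return "gpt-4o-mini" if prompt else "g-doc-ai-v2"
```

So `resolve_engine(prompt="Summarize this please")` gives `"gpt-4o-mini"`, while `resolve_engine()` gives `"g-doc-ai-v2"`.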

Thanks again for the feedback @shishir :pray:


Note for Scan() change

Changing Scan() to an action is technically a breaking change, but it will only affect workflows where it has been used outside a button.
Using Scan() as a formula is strongly discouraged, as it consumes your credits with every automatic recalculation.
Also, as this pack isn’t seeing too much traffic, this felt like the best path forward.

1 Like

Ah, the breaking change did trip me up a bit, but agreed, it is much better for this to be an action instead of a formula. Thanks for the updates!