Released: OCR Pack

Updated Doc

Added a Security & Privacy page to the showcase doc
It can also be found in the top nav.

Updated Privacy Policy

The Privacy Policy has been updated to reflect that I now store names and emails

  • The information is provided by Coda when you request a free token in the form
  • This is necessary in order to provide the automated personal credit system

@Christiaan_Huizer has provided a helpful perspective on the development of this pack, and he just released a nice blog post about it :pray:


This pack is incredible! I have just spent a bit of time testing this pack with a will and a report on a mining company. Fabulous results - saved me a ton of time finding out what I needed to know. Check it out. Wizardry!


The server is now using multiprocessing, giving us ~3x speed for bigger documents!

A 66-page PDF used to take 43 seconds; it’s now done in 14 seconds.

This will allow processing of even larger documents within the 100-second timeout
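As a rough back-of-the-envelope check of the numbers above, here is a small Python sketch. The function names are hypothetical, and the page estimate assumes processing time scales linearly with page count, which is only an assumption for illustration and not something measured on the server.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the old processing time to the new one."""
    return before_s / after_s


def pages_within_timeout(pages: int, seconds: float, timeout_s: float = 100.0) -> int:
    """Estimate how many pages fit in the timeout, assuming linear scaling."""
    pages_per_second = pages / seconds
    return int(pages_per_second * timeout_s)


# Figures from the post: 66 pages, 43 s before, 14 s after, 100 s timeout.
print(round(speedup(43, 14), 2))      # 3.07
print(pages_within_timeout(66, 43))   # 153
print(pages_within_timeout(66, 14))   # 471
```

Under that linear assumption, the ceiling moves from roughly 150 pages to roughly 470 pages per request. Real documents will vary with image density and content.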


Keep in mind, if you expect more than:

  • 40,000 characters: change your output format (see post)
  • 300,000 characters: be careful how you use ReadTextFile() on the Text File output. In the worst case, Coda will automatically cancel your formula if you try to display the text directly
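The two thresholds above can be read as a simple decision rule. The Python sketch below only illustrates that rule: the constant names, the function name, and the advice strings are hypothetical, and only the two character counts come from the post.

```python
# Thresholds taken from the post; everything else here is illustrative.
OUTPUT_FORMAT_LIMIT = 40_000   # above this, change the output format
DISPLAY_LIMIT = 300_000        # above this, displaying the text directly is risky


def output_advice(expected_chars: int) -> str:
    """Return a short recommendation for the expected amount of scanned text."""
    if expected_chars > DISPLAY_LIMIT:
        return "use the Text File output and avoid displaying ReadTextFile() directly"
    if expected_chars > OUTPUT_FORMAT_LIMIT:
        return "change the output format"
    return "default output is fine"


print(output_advice(10_000))
print(output_advice(120_000))
print(output_advice(500_000))
```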

You can now optionally send the scanned text to an AI along with a prompt!

Example: Summarize a 66-page PDF

Updated Scan()

// Get text like normal
Scan(PDF)
// Send the text to the default GPT-4o mini
Scan(PDF, prompt: "Summarize this please")
// Use the smarter but costly GPT-4o
Scan(PDF, prompt: "Summarize this please", engine: "gpt-4o")

New sync table: Engines

Updated data in sync table: Requests

You can see separate AI info for each request:

{"ai": "gpt-4o-mini", "pages": 2, "ai_cost": 1, "file_count": 2}
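If you copy this info out of Coda and want to inspect it elsewhere, it parses as ordinary JSON. Below is a minimal Python sketch using the exact sample above. The `summarize` helper is hypothetical, and the unit of `ai_cost` is left unlabeled because the post does not state it.

```python
import json

# Sample request info, copied verbatim from the post.
raw = '{"ai": "gpt-4o-mini", "pages": 2, "ai_cost": 1, "file_count": 2}'


def summarize(info: dict) -> str:
    """Build a one-line summary from the fields shown in the sample."""
    return (
        f"{info['ai']}: {info['pages']} pages in "
        f"{info['file_count']} files, ai_cost={info['ai_cost']}"
    )


info = json.loads(raw)
print(summarize(info))  # gpt-4o-mini: 2 pages in 2 files, ai_cost=1
```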

New Formula: ScanToRows

I’m excited to announce a much-requested feature!

Automatically create rows from images and PDFs in any structure you choose

Showcase Video (5 mins)

I’ve worked hard on a general solution. For example, I asked it to process my profile picture:
Profile Picture

Or some shapes with no prompt:
Shapes

Updated Scan Formula

The Scan() formula now supports more engine options, like engine: "gpt-4o-mini", which allows AI-based OCR. This means it can capture extra details like colors, text positions, and more.

Other

  • I’ve raised the amount of free credits from 50 to 100
  • Added Path column to Requests sync table
  • Updated Engines sync table

Impressive work and contribution. Extracting a PDF and getting intelligent outcomes has never been so easy!


Fantastic! Thank you so much!


Do you have a suggestion for best practice for designing a system that utilises this pack for the following scenario:

Extracting summary and row data from invoices from 20+ suppliers, where the formatting of invoices varies from supplier to supplier?

To be a bit more specific about my question, do you have a suggestion for how to scale up the design you demonstrated in this YouTube video,

https://www.youtube.com/watch?v=oe3sKeo5B0I&t=4s (Convert PDFs to Coda.io Rows Instantly with OCR Pack!)

such that a wide range of invoice formats can be processed?

Thank you for creating this pack, and for the fantastic explainers, they are making the pack so much more accessible for me!


Hi Sam, thanks for your question!
The setup in the video should be able to handle it!
I’d recommend adding some columns to the Invoice table, such as

  • Currency (Text)
  • Creation Date (Date)
  • Expiration Date (Date)

as well as any other data points that commonly occur in your invoices. Otherwise the AI will probably put them in the Other Info column, if you have one.

Imagine you’re messaging a person and asking them to fill out your Coda rows: is there enough information to understand what value each cell should have, and how many rows each table needs?
If not, make your table and column names more descriptive, or simply give additional instructions with the prompt parameter.

I’ve worked hard to build a general, dynamic solution. It’s an involved process under the hood, but I’ve tried to make it as intuitive as possible for the user. If the AI runs into issues at any point, it aborts the process and shows an error message that hopefully tells the user what they can do to make it work.


Fantastic!
You’ve given me plenty of info to keep going on with. I’ll report back with any progress updates.


Check out the new quick-start with 10 simple steps!

Flowchart

Here’s a flowchart for my OCR and PDF packs, which I understand are confusingly similar :sweat_smile:

And thanks to @Christiaan_Huizer for his continuous support and latest post!


This is SO helpful. Thank you for taking the time to put it together.
