Released: PDF Pack

Hi everyone!
Today I’m releasing the PDF Pack!

  • Merge PDFs into one
  • Read the text from PDFs
  • Get an image of a PDF’s first page

It has a lot more potential, but for now this is the functionality. I’ve seen a lot of requests for these features, so I’m hoping it will be of some help :orange_heart:

Here’s the interactive showcase doc for it: PDF Pack Doc

@Rickard_Abraham there are a lot of people looking for this.

Try posting in this thread: https://community.coda.io/t/search-within-pdf-and-images-ocr/36840/6

Your solution is awesome, thanks!

Update notes for Release #2

  • Attempting to process PDFs that Coda hasn’t processed yet now shows a friendly message.

  • Trying to process nothing no longer gives an error!


@Eduardo_Nunes Thank you, that’s exciting to hear! I also made a post in the thread you suggested :slight_smile:

I’ve added a privacy policy for CCPA and GDPR compliance!

Added Google Drive support!

Update notes for Release #3


  • New Formula: Select pages of a PDF
    • Creates a new PDF with only the selected pages. You can even reverse the page order!
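To illustrate the page-selection idea, here is a minimal sketch of how a list of 1-based page numbers could be turned into the order in which pages get copied, optionally reversed. The `select_pages(page_count, pages, reverse)` helper is hypothetical; the Pack’s actual formula parameters aren’t described in this thread.

```python
from typing import List


def select_pages(page_count: int, pages: List[int], reverse: bool = False) -> List[int]:
    """Map 1-based page numbers to 0-based indices, optionally in reverse order.

    Hypothetical helper: an empty `pages` list means "all pages".
    """
    if not pages:
        pages = list(range(1, page_count + 1))
    indices = []
    for number in pages:
        # Reject page numbers that don't exist in the source PDF
        if not 1 <= number <= page_count:
            raise ValueError(f"Page {number} is outside 1..{page_count}")
        indices.append(number - 1)
    return indices[::-1] if reverse else indices
```

For example, `select_pages(4, [], reverse=True)` yields `[3, 2, 1, 0]`, i.e. the whole PDF backwards. The resulting indices could then be fed to a PDF library, for instance pypdf’s `PdfWriter.add_page(reader.pages[i])`, to write the new PDF.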

@Rickard_Abraham Are there plans to allow the pack to read restricted-access PDF files in Drive?

I’m not sure if that’s possible with the way Drive works, but it would unlock new use cases for us.

Thank you for the request! This should be possible; stay tuned :+1:

Alright, I’ve got some news for you!
It was too messy to add Google auth to the PDF Pack, so I made a new pack instead, which you can use in tandem! Here’s the forum post with some instructions:

Great, this looks perfect for what I need. Thanks!

@Rickard_Abraham Can you address privacy concerns with this solution?

  1. Where does the upload data live?
  2. Can the data read from the PDF be used for training a model?
  3. This document states it doesn’t process sensitive information. I’m not sure if that’s about my account or the data in the PDF: https://mandera.pythonanywhere.com/privacy
Hi @Omar_Torresola, thank you for your questions!

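  1. The uploaded files live temporarily on the server within a Python context manager, which ensures that the files are always deleted once processed:
    from contextlib import contextmanager

    @contextmanager
    def pdf_files():
        ...  # setup that produces `filenames`
        try:
            yield filenames  # the caller processes the files inside its `with` block
        finally:
            # Runs even if processing raises, so the files are always deleted
            remove_files(*filenames)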
    
  2. I suppose anything’s possible, but then it wouldn’t comply with the privacy policy anymore, which I take very seriously.
  3. I couldn’t see personally sensitive information even if I wanted to: Coda acts as the intermediary and doesn’t relay such info. Please see this article.
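For anyone who wants to try the delete-after-processing pattern locally, here is a self-contained sketch built on `tempfile`. It is not the Pack’s server code: the `uploads` parameter and the temporary-file handling are assumptions, and `remove_files` is re-implemented here so the example runs on its own.

```python
import os
import tempfile
from contextlib import contextmanager


def remove_files(*filenames: str) -> None:
    """Delete each file, ignoring any that are already gone."""
    for name in filenames:
        try:
            os.remove(name)
        except FileNotFoundError:
            pass


@contextmanager
def pdf_files(uploads: list):
    """Write uploaded PDF bytes to temp files, yield their paths, then delete them."""
    filenames = []
    try:
        for data in uploads:
            # delete=False keeps the file after close() so it can be reopened by path
            with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
                filenames.append(tmp.name)  # record first, so cleanup always sees it
                tmp.write(data)
        yield filenames
    finally:
        # Runs on normal exit and when the caller's code raises
        remove_files(*filenames)
```

Inside `with pdf_files(uploads) as paths:` the caller can process the files by path; once the block exits, normally or with an exception, they are gone. Unlike the snippet above, the setup also sits inside `try`, so a failure while writing one file still cleans up the ones already written.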

This post talks a little more about the service I use for the server. (Not super relevant since I’m not storing anything but still!)

Added OpenPDFUrl, which gives you a URL to open a PDF in a separate tab without downloading it!
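The thread doesn’t say how OpenPDFUrl works internally. One common way for a server to get a browser to show a PDF in a tab rather than download it is to serve it with `Content-Type: application/pdf` and `Content-Disposition: inline`. The sketch below, using a hypothetical `inline_pdf_headers` helper, only illustrates those headers; browsers without a built-in PDF viewer may still download the file.

```python
from typing import Dict


def inline_pdf_headers(filename: str) -> Dict[str, str]:
    """Response headers that ask a browser to display a PDF instead of saving it."""
    # Escape backslashes and double quotes so the name stays a valid quoted string
    safe_name = filename.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "Content-Type": "application/pdf",
        # "inline" requests in-browser display; "attachment" would trigger a download
        "Content-Disposition": f'inline; filename="{safe_name}"',
    }
```

Non-ASCII file names would additionally need the `filename*` parameter defined in RFC 6266.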

Hi, I just found that I can’t combine PDFs from different columns. Can you solve this problem?

Hey Ivan! Thank you for reporting this issue.
I believe it’s now resolved. Sorry for any inconvenience caused!

Hi, I just found that I can’t merge PDFs larger than 4 MB. Is it possible to merge PDFs larger than 4 MB?

Hi there!
I’m afraid this is a general Pack limitation: it’s Coda’s single-request limit, discussed here and also here.

I think the best solution would be to store the PDF temporarily on the server and expose it through a private URL that Coda could ingest, but for security and privacy reasons, nothing is currently stored on the server.

So while this is possible, I’m afraid it’d be a pretty major update that won’t happen anytime soon. Sorry about that! In the meantime, I’ll apply for an increased single-request limit for the PDF Pack, which could help a little.
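The temporary-storage approach described above isn’t implemented in the Pack. As a rough sketch of the idea, assuming an in-memory store and a hypothetical `TemporaryPdfStore` class, a server could hand out an unguessable token (to be embedded in a private URL), keep the PDF only for a short time, and forget it as soon as it’s fetched:

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class TemporaryPdfStore:
    """Hypothetical in-memory store for short-lived, unguessable PDF links."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        # token -> (expiry time on the monotonic clock, PDF bytes)
        self._items: Dict[str, Tuple[float, bytes]] = {}

    def _purge_expired(self) -> None:
        now = time.monotonic()
        expired = [t for t, (expires_at, _) in self._items.items() if expires_at <= now]
        for token in expired:
            del self._items[token]

    def put(self, pdf_bytes: bytes) -> str:
        """Store a PDF and return a random URL-safe token for it."""
        self._purge_expired()  # drop links that expired without being fetched
        token = secrets.token_urlsafe(32)  # 32 random bytes: infeasible to guess
        self._items[token] = (time.monotonic() + self.ttl_seconds, pdf_bytes)
        return token

    def take(self, token: str) -> Optional[bytes]:
        """Return the PDF once and forget it; None if unknown or expired."""
        entry = self._items.pop(token, None)
        if entry is None:
            return None
        expires_at, pdf_bytes = entry
        return pdf_bytes if time.monotonic() < expires_at else None
```

A server would put the token into a URL such as `https://<your-server>/pdf/<token>` (a placeholder) for Coda to fetch. Because everything lives in one process’s memory, this sketch only suits a single server process and loses all pending PDFs on restart.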

Hi @Ivan1, good news!
The PDF Pack can now return PDFs up to 10 MB instead of 4 MB :clap:
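A minimal sketch of checking a result against the response limit, so users get a readable message instead of a failed request. It assumes the limit applies to the raw PDF bytes and reads 10 MB as 10 × 1024 × 1024 bytes; neither detail is confirmed in this thread, and `check_pdf_size` is a hypothetical name.

```python
# Assumption: "10 MB" means 10 MiB of raw PDF bytes
MAX_PDF_BYTES = 10 * 1024 * 1024


def check_pdf_size(pdf_bytes: bytes, limit: int = MAX_PDF_BYTES) -> bytes:
    """Return the PDF unchanged, or raise a readable error if it exceeds the limit."""
    size = len(pdf_bytes)
    if size > limit:
        mb = 1024 * 1024
        raise ValueError(
            f"The merged PDF is {size / mb:.1f} MB, which exceeds the {limit / mb:.0f} MB limit."
        )
    return pdf_bytes
```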
