How to handle large docs (Archive Docs)

Check out this doc for an explainer on how to build an Archive Doc.

An archive doc backs up old rows from your large docs so they can be deleted. That way your current doc runs faster, but you still have access to your data for building charts or running queries.

Included is a video of how it works so that you can implement it on your own:

Archive Doc - Watch Video

Thank you to @Kyleigh_Johnson, @Coda_Zac, and @Lena_Webster for help figuring this out!

12 Likes

Woo Woo Woo! Arc-high-five!! :clap:

2 Likes

Hi @Connor_McCormick1 , very well done, and thank you!

  • _Merge() is still an unsupported function, right? Do you feel okay about this?
  • What I like about the classic AddOrModifyRows() logic is that it prints the data as plain text, while in your solution you have to unpack every column with a formula. In your scenario you automatically get every column, and I love this idea, whereas the AddOrModifyRows() approach breaks once a column that is part of the setup is deleted. So I do see the serious advantage of your solution, but I also notice that the archive table becomes a bit heavier in this scenario due to the many unpack formulas you have to deploy. How do you feel about this?

I asked for access to the source doc to check out the details.

thanks again, great work!
Christiaan

1 Like

Dear @Connor_McCormick1 ,

I like the idea of color coding each step in the rules/code so it's clearer what should happen where, and when troubleshooting it's easier to know where to look :mag::mag_right:

1 Like

@Jean_Pierre_Traets, @Christiaan_Huizer, @Jono_Bouwmeester, sharing settings on the Source Doc are now updated. Let me know if you still have any problems.

@Christiaan_Huizer re your questions:

  1. Since I’m unpacking the data with _Merge(), I’m not overly concerned about it changing. All I’m grabbing is the underlying JSON data structure, which will always be useful. If the formula were ever to break, it would just cause my archive automation to fail, which would just mean I’d have to go change it. However, no data would be lost, and the retrieval function (ParseJSON()) is definitely supported, so I’ll always be able to grab my data (see the sketch after this list).
  2. Re the overhead of the ParseJSON formulas, I’m not sure how much slower it is. Presumably this is similar to what Coda is doing under the hood when it looks up the contents of a row in a Cross Doc table anyway. Either way, this is a tradeoff in exchange for way less maintenance on my part as columns are added and removed from the source doc.
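For reference, here’s a minimal sketch of the pattern. Archive and Data are placeholder names, and I’m assuming _Merge() keeps serializing the row to JSON the way it does today.

To archive a row, from a button or automation:

    Archive.AddRow(Archive.Data, thisRow._Merge())

To unpack a column later with the fully supported ParseJSON():

    thisRow.Data.ParseJSON("$.Name")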

What would be really great is if Coda supported a way to copy the row object that Cross Doc returns into a new table as an object. Then you could do this without _Merge() and could reference with just Row.Field like usual.

Or if Coda could cross doc more than 10k rows, then you could do this without archiving to a separate table.

Or if Coda could run quickly with more than 100k rows and complex lookup formulas, then you would never have to cross doc.

:point_up: All of this is worth pointing out just to show that the root of the problem is performance, and all of these are workarounds. And of course, Coda’s working hard on performance, and these sorts of workarounds may always be necessary at some level of doc size and complexity :slight_smile:

1 Like

Update! Don’t use this approach. For reasons I haven’t yet figured out, this doesn’t work. My syncs constantly fail, and so do my automations. I don’t know why. I’ll update this thread once I figure out what’s going on.

Fixed!

2 Likes

Update! Nevermind, it works! There was a performance issue because of a formula I was using. That is now fixed.

Here’s what changed.

Before, we were detecting whether the sync table was up-to-date with the archive table by looking at all of the rows of the archive table and trying to find one that matches. That’s silly! So many comparisons.

Now, when we archive a row, we cache the archived contents in a column of the sync table called Archived Contents. The column that checks whether the content is archived can now just compare the previously archived contents against the current row’s data. If they’re the same, the row is up to date. If they’re different, the row has changed and needs to be updated, so mark it as not archived. Duh!
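In formula terms, the check boils down to a single comparison per row. Something like this, where the column names are placeholders and I’m again assuming _Merge() serializes the row:

    thisRow.[Archived Contents] = thisRow._Merge()

One cached-text comparison per row instead of scanning the whole archive table for a match.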

We went from so slow that everything was frozen to blazing fast! Thank you @Shitong_Stone_Shou on the Coda team!

The video needs to be updated to match the new formula, but I’ll have to do that later.

2 Likes

Thank you, Coda team! I don’t know if this is new, but just noticed it and it’s really helpful!
[screenshot of the new JSON option]

I don’t see this within my docs. Where do you get this JSON option?
Thanks @Connor_McCormick1

Might need to turn developer mode on?

coda.io/account

I’m not entirely sure about this :point_up: (I would have said the same thing though :blush:), but you need some text Coda will be able to interpret as JSON in your column :blush:.
(It seems to work like Separate as date, which is only available in the menu when right-clicking on a Date field, AFAIK :blush:)

Ahh, good call! Try getting some JSON and pasting it in the cell.

1 Like

Hey all on this thread! (@Christiaan_Huizer , @Pch , @Connor_McCormick1 , @Jean_Pierre_Traets ) - Looking through some different archiving options and wanted to stay away from _Merge().

The way I’m approaching it now is:

  • Holding X archived rows in a single row in CSV format
  • Using ParseCSV(), FormulaMap(), and AddRow() to recover the rows (sketch below)
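Roughly, the recovery in a button looks like this. Archive, Archived Data, and Restored are placeholder names, and I’m assuming one CSV line per archived row with the name and date as the two cells:

    thisRow.[Archived Data].Split(LineBreak()).FormulaMap(
      Restored.AddRow(
        Restored.Name, CurrentValue.ParseCSV().Nth(1),
        Restored.Date, CurrentValue.ParseCSV().Nth(2).ToDate()
      )
    )

Each archived line is split back out, ParseCSV() turns it into cells, and AddRow() rebuilds the row in the target table.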

It’s seeming to work really well, but I just wanted to see if y’all had any opinions on limitations. I don’t want to go for this method if I’m going to run into obstacles down the road. (In this instance, I won’t need the data for queries, charts, etc. after it’s been archived. I just want to store the data and not lose it.)

Here’s a random example doc just as a proof of concept:

Edit: The only real limit I’m noticing is cell length (the number of characters you can hold in a single cell):

  • Cell with rich text: ~7,000 character limit
  • Cell with no rich text: ~43,000 character limit

You can store rich values (dates, for example) as plain text and they are still recoverable. The run I did for the 43,000 character limit was:

  • 1156 rows
  • 4 columns of data per row

That works out to roughly 37 characters per row across the 4 columns. If you had to archive more than 43,000 characters you could easily do it in two runs…

7 Likes

that is insanely cool and clever, wow!

Compared to the archiving approach this is incredibly simple. However, it will still grow the size of the source doc and you obviously can’t use the data in the archived rows for computations (e.g. you can’t offer summary charts).

I’ll definitely be using this in the future though. Have you noticed a speedup?

Hi @Scott_Collier-Weir , like @Connor_McCormick1 I like the simplicity of the approach. Like you, I try to stay away from unsupported functions.

I wonder why we should store so much data in a cell; what is the use case you have in mind?

The layout with the grey boxes is clever, I like it!

I noticed you applied LineBreak(), the function to be released soon :wink:

Hey Christiaan! Yeah - I tried using both Character(10) and LineBreak() - there was no difference in terms of speed or the amount of data stored, so I went with LineBreak() as, when it is released, it will be more accessible to the average user (b/c seriously, Character(10) is so arbitrary).

I’m storing data in a cell because I can essentially hold over 1,000 rows’ worth of data in a single archived row, thereby saving room and hopefully increasing the speed of a doc by archiving data in a simple way.

I have clients with large docs and plenty of rows, and this would allow the doc to scale more easily, work faster, etc.

Running some tests, I think we can actually use data in the archived rows for computation:

See below:

I can pull out all dates, all names, etc. from the archived rows and then run different queries on them. Not yet sure if I can pull charts from them.

This pulls out all dates from the archived rows:
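Something along these lines, using the same placeholder names as above and assuming the date sits in the second CSV cell:

    thisRow.[Archived Data].Split(LineBreak()).FormulaMap(
      CurrentValue.ParseCSV().Nth(2).ToDate()
    )

Wrap that in ListCombine() across all the archive rows and you get one flat list of dates to query.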

I guess you could combine cell data with main table data pretty easily, too.

Like if you want to answer the question, “how many sales have we made this year?” Is there an idiomatic way to simultaneously use non-archived and archived data?

There are definitely easy ways to run queries on archived and non-archived data at the same time:

But you always gotta use ListCombine() - not as natural or intuitive as querying a table directly, but definitely possible.
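For the “sales this year” question above, the pattern could look something like this, where Sales is the live table and Archive.Dates is a hypothetical helper column holding the dates unpacked from each archived blob:

    ListCombine(Sales.Date, Archive.Dates.ListCombine())
      .Filter(CurrentValue.Year() = Today().Year())
      .Count()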

Yeah that’s a real nice pattern. Great find!