Realistic number of rows in a table

For people watching this thread, we just made some announcements related to performance:

Hey @Angad, @Jason_Tamulonis and Codans!

A client asks me how many tables x rows can a Coda doc safely support, assuming we’re following best performance practices as explained in that article?

We’re not doing many aggregations, but we’re doing lots of lookups based on “ID” columns (mirroring our MySQL DB)

Thanks

1 Like

This topic got bumped up for me now, so here’s some sort of an answer.

Coda API and cross-doc stops working when the doc’s JSON goes over 125 MB limit. You can see the limit in developer tools — that would be the largest file loading. Usually it’s much smaller though in compressed state — 125 MB is the limit of uncompressed one.

Got this info from Coda support. That said, the doc would still load in UI even if it’s over that limit.

1 Like

Hi @Angad-- What determines “expensive formulas” ?

  • number operations </> text operations
  • lookups </> counts
  • how many “.” in the formula?
  • how many nested methods per formula?

Also what is considered an “expensive schema”?

  • first assumption is too many columns… but usually database tables are set up to handle lots of columns easily
  • columns refer to each other within the table?
  • columns refer to columns in other tables?

The maker community is filled with optimizers. If you guys could moderate a thread collecting all the best doc-optimization hacks… I know myself and (perhaps) other community members would appreciate.

If I’m just too new and haven’t found this holy grail of tips, please link me :slight_smile:

3 Likes

Short answers; will expand sometime later in a blog post

Expensive formulas:

  • Operations that become exponentially slower as the data set grows. Example: in Classes table, a column that looks up all previous classes by the same student. Every time a class is added, it recalculates all rows, and every time it’s comparing each row with every other row of the data set that grows (hence N^2 operations)

  • Along with the previous one: operations that unnecessarily recalculate data that is known to not change. Same example as above: looking up previous classes. When you’re adding new rows which are classes in the future, you know that this new row will not affect the calculation result of the past 1000 of classes, yet Coda would recalculate the whole column anyway. Solution in this case: to not use the formula column but calculate once on demand instead (e.g. with a button, and write the filter result in an editable cell)

  • Operations that result in multiple recalculation cycles. These happen when you have circular dependencies in your data. Example: you have tables Posts and Tags where each row on Tags is linked to a Post, and the table Posts has a lookup column that looks up all tags for this post. You have a button that adds 10 tags to a post, but each time it looks up into yourPost.Tags to see if it hasn’t been added already. Normally Coda is capable of performing many AddRows in one quick burst, but in this case there’ll be a delay between each AddRow because Coda will add a row, then recalculate Post.Tags column, only then add another row, then again recalculate Post.Tags column etc. Solution: rewrite the formula to not depend on Post.Tags on each iteration, but e.g. pre-filter the list of tags to add.

  • Everything else laid out in that optimizing help article, like avoiding using filters within filters.

Expensive schema:

  • Elements that Coda stores inefficiently, e.g. Buttons and conditional formatting, contribute to the doc size a lot. More: This doc is too big to sync
    Workaround for buttons in large tables (500+ rows): instead of adding those in such tables directly, make a small helper table that would host that button and perform the desired operation on another table. On the big table you can have a checkbox to select on which rows to perform that operation. Example: we have a huge (2000+ rows and counting) Classes table that is meant to scale for as long as possible, but for each added class we need to send out an email notification and calendar invite. Instead of having 2000+ buttons that are stored individually on the schema, I rewrote the logic so that an external button is sending that out for recently added rows.
  • Unnecessarily storing plain text as formatted text. More: Here's a doc size riddle for the community
  • Cross-doc also takes more space than tables within the same doc. So if you’re thinking of splitting the doc into smaller docs but still importing all the data in, that’ll only do harm in terms of doc size.

The holy grail of tips is gonna be here eventually: codatricks.com, but that’s after I’ll have beaten my depression current apathetic mood towards anything.

6 Likes

Don’t be depressed Paul :cyclone::ok_hand::expressionless::ok_hand::cyclone:

The best day to launch is always today; never when perfect.

Coda needs you!

4f0Y

1 Like

@Paul_Danyliuk What would be really helpful is a good formula/bad formula example for each of your three bullet points

  • Operations that become exponentially slower as the data set grows. :arrow_right: this sounds like a lookup formula for a lookup column in the same table, yes?
  • operations that unnecessarily recalculate data that is known to not change :arrow_right: so previous_classes is a column of the Students table? Totally lost you on the button-filter-result-in-editable-cell example
  • Operations that result in multiple recalculation cycles :arrow_right: this seems obvious, but the difficult thing is not knowing how Coda handles the recalculation cycles. Sounds like your example solution is to use a SELECT column instead of a LOOKUP column; with the SELECT column formula pre-filtering options to select from??

Clearly there are some loops and extra round trips with certain types of formulas.

Anyway… a good formula/bad formula table would be really awesome. I wonder if this itself would be a cool community coda doc…

Column 1: doc size optimization tactic
Column 2: bad formula that most beginners use when building codas
Column 3: good formula that reduces doc space and reduces expensive formulas

2 Likes

That’s why I’ve been trying to get a hold of a Coda engineer to talk to about the under the hood stuff. Haven’t had a chance yet, sadly. They say everyone’s busy with building features. So the only way I can learn is through my own trial and error (good thing I have a lot of opportunity to do so, working for clients and building ever more complicated docs).

I hear you about the good doc / bad doc examples. Need to make those docs from scratch though, because all the good examples are in the docs I cannot share.

Imagine this scenario that hits all three points. There’s a table of sales that calculates a salesperson’s cut as a % of the sale amount. But the % for this month is itself dependent on the sum of sales they did in the previous month (circular dependency of data). Adding a new sale could trigger a recalculation of all % for all months (including previous ones, unnecessarily), which could then trigger a recalculation of all salespeople’s cuts for all sales (possibly triggering the recalculation of % again, until the system is in equilibrium).

One of the solutions to get out of this cyclic dependency is to NOT use a formula somewhere but instead store the once-calculated value, e.g. in the end of the month capture the sum $ of all sales by that person and store that in a cell that’s not a formula. This way if a sale row is added in the future, it doesn’t trigger recalculation of the percentages table.


Fun thing about these issues is that it’s very hard to think of one when you need a simple example, yet in real projects they happen all the time :slight_smile:

They should take a page out of Stripe’s playbook and hire superstar community members (such as yourself) as contractor / internal engineers. That’s how Stripe came out of “seemingly” nowhere to dominate online payments: Stripe paid community members to come in and fix the issues they were seeing as they integrated Stripe’s API into their clients’ websites.

@shishir @maria
What will kill Coda is very simple: your users want to build bigger, better, more bad-ass apps-- but they can’t because internal engineers are racing with Notion and Airtable for feature parity instead of DevOps and backend scaling.

Coda is so much stronger than Notion and Airtable for a very niche but extremely valuable use case that penetrates every business. All you need to do is scale these docs to handle 10k, 100k rows in less than 200ms to maintain the lead. Notion will be stuck holding the bag of “well, we power the world’s internet of wikis” use cases, while high-performance teams have all switched to Coda to run their project/product management workflows.

SCALE THE EXISTING FEATURES, DON’T BUILD NEW FEATURES

5 Likes

Haha, thanks :slight_smile: I’m fine though; I run my own little practice as a Coda expert (and occasional business consultant) for hire, building the “bigger, better, more bad-ass apps” for end clients :slight_smile: We’re in touch with Coda and I pass on any feedback I get, but I’m not involved in development or planning of any sort. And, frankly, I don’t think I even could help with actual coding (I’m a has-been coder who burnt out, now I enjoy tinkering with Coda and quickly building ad-hoc solutions that provide value quickly, without long development cycles.)

I agree with your sentiment. Although I also believe Coda has good reasons to build what they’re building now. I don’t think Shishir and co. are bad strategists, and I believe their decisions are driven by sheer data (i.e. people using Coda for “normie” Notion-level stuff vs those few who build badass apps). Coda has been for like what, 6 years in development now? and got its paid tier only recently. Sure they need to think of the business side of things.

Regarding the “DevOps and backend scaling” part — well technically there’s no “backend” behind the docs. These are files, fully loaded into your browser and calculated by Javascript. Occasionally a backend-side runtime would recalculate your doc (API access, automations, cross-doc), but the overall concept is just that — a file that’s processed on the frontend. It could never be as scalable as an e.g. real indexed lazy-loaded database on the backend (which I thought Coda was for a long time before I dug into the doc files). Some improvement may come when they update the way they pack data in those files, but that’s just gonna make them const times smaller and allow for const times more rows, after which the file would become of the same size. That’s why I’m saying that if one picks Coda, they must be well aware that it’s just a file, with all the limitations that come out of it. That’s not good or bad, that’s just how it’s designed.

3 Likes

This is fascinating. So despite the absurdly many rings around the orange loading circle… Coda actually does work offline? (once the app successfully loads the doc file)

Didn’t mean to imply they were bad strategists. Clearly the team has vision, and I myself am in love with that vision.

I suppose if the best we can hope for is reduced doc size, my most immediate frustration is your discovery of how column buttons are stored in tables.

I made a doc that felt very much like an app with ~8 column buttons; was quite proud of the flow. It the task dump doc I’ve been dreaming for but never made time to build… It was awesome, until it got really really slow after maybe 300 rows. After your last breakdown describing the column buttons, I cut my losses and started from scratch with as few column buttons as possible.

But now I’m thinking,

“The main reason I switched from Notion to Coda was for the buttons. Now I’m not using buttons, what’s the point.”

Of course I was able to think of a few prominent reasons to stay, for sure, but it does point out a UX frustration.

RE the “it is an actual document” revelation-- I am surprised there hasn’t been a stronger emphasis on offline for the mobile app.

  • Google docs works offline with no loading screens (if you tag the file for offline mode ahead of time).
  • Slack works offline with no loading screens.

If the maximum file size is in fact only 125 MB, then it stands to reason we could keep at least 8 x 125 MB = 1 GB worth of coda docs stored in “offline mode”; pure, no-loading-screens, offline mode is the dream (a la Superhuman) @rahulvohra #suggestion-box

Hi @Charles_Jing,

This is a good discussion here, I just want to clarify a couple things.

There isn’t a doc size limit. The limit mention of 125 MB is a benchmark we’re using at the moment for whether or not we allow the API to run. The is a little bit of an arbitrary limit to prevent abuse. As far as the doc itself, it can be bigger.

The other thing I want to mention is that I would be very surprised to see a doc of 300 rows around the 100 MB mark. Just for reference, I have a doc that I currently use and it’s around 9,000 rows and at 15 MB for the doc size, so I would tend to think any slowness you’re seeing there has to do with formulas and schema. Generally, when we see this, there is a formula that is constantly being run or some sort of circular reference. Using Now() can catch up with you at this number of rows and also many-to-many relationships can catch up at this point.

You can write into Coda Support, using the “?” icon at the bottom right of any doc, and include the link to the doc you’re having issues with and we can check it out there. Feel free to mention me on the ticket and they can route it to me if you’d like.

For working offline, the catch is that the doc has to already be opened and that you can’t refresh until your connection picks back up and the edits are updates at the server level. But yes, you can use Coda offline. Before a flight I’d open the tabs I needed to work on, then I’d be able work on the plan, and the doc would update when I got to the hotel and was connected to the internet.

Good feedback on this topic and I appreciate the discussion!

Ben

1 Like

Are you sure you’re looking at the uncompressed size and not the compressed one? You want the sum of these two numbers:


@Charles_Jing now back to you :slight_smile:

Coda actually does work offline? (once the app successfully loads the doc file)

Once the app loads yes. And I wish it supported true offline, but it’s a bit more complicated than I originally said. Here’s what happens in details:

  • When you load “the file”, you actually load the latest snapshot saved on the server.
  • When you make changes to the doc, these are stored in your browser’s IndexedDB as a list of uncommitted operations.
  • When you’re online (or as soon as you’re online), these uncommitted operations get sent to the Coda servers and get applied on the server side. It’s not like your browser edits “the file” and sends it back completely. That simply wouldn’t work because Coda needs to support simultaneous editing by many users. Guess Docs work the same way.
  • Also if you’re using automations, that’s also going to be server-side runtime committing operations into your doc’s queue, and you’ll be receiving the diffs only when you’re online.
  • When you’re offline, these uncommitted operations stockpile in your browser and tend to slow down Coda. There were moments when Coda had issues and couldn’t store my edits, so what I saw the next time I loaded the doc was an old snapshot and a minute or two of all the ~6000 operations being applied to it in my tab one by one, to get the doc to the state I left it before the tab refresh.

So yeah, Coda not writing directly back to 0.json/1.json and reliance on ops that are applied to the file on server side -> means no efficient offline-only support.

Yeah, I’m sure:

Row count: 9,654
Doc size: 15.71 MB

1 Like

Hey Ben :wave::wave: thanks for jumping in!

After digging around, I realized the issue was I tried to pull one of your community templates and jam my workflow into it. Round peg, square hole situation; noob move, but it was important for learning. Had such clutter with columns, column buttons, and circular lookup formulas that I decided to start from scratch (with all the knowledge I built up about how Coda works) and everything is running much more smoothly on the fresh doc :slight_smile:

I sincerely hope I can get to that 9k row size with this new, more efficient schema.

I don’t agree that there’s “no efficient offline-only support” with that workflow. Github and Google Docs are two massively scaled platforms which solved this very problem a years ago. I could see an engineering group making the argument that “coda’s new primitives introduce engineering challenges that Git and Docs don’t have to deal with”, but I would make an argument that Coda’s new primitives mean it can handle merge issues MUCH more efficiently because of one truth: Github and Docs have to manage document level merge issues, while Coda can manage block-level merge issues.

I would guess that 90% of merge issues could be resolved automatically because people usually work on different pages/sections of the same doc. The trickier stuff would be edits to rows in tables because one table can have so many views and different users would operate on the same objects in their different views.

But imagine a workflow where many Users in a doc (or a single user on offline mobile and offline desktop) can merge block edits the way developers deal with merging branches on GitHub. Maybe this could be a responsibility reserved for the Doc Creator? #suggestionbox

This article from github’s blog in 2016 has an illustrative diagram:

I could see some calling this inelegant, some saying that Coda is designed for the non-coders, but that discussion could be resolved by segmenting Users with some sort of “technical achievements” skill tree the way video games hide “super features” from new players until they’ve mastered the basics. Coda already does this to some degree by masking use of API support via admin panel.

The alternative to never resolving this is never having true offline support. Which sucks because something will come out with true offline support from the get-go… and the brilliant thing here is Coda is a doc. I wonder what the overlap would look like of Users/Teams who “desperately crave offline support” and “could not handle the technical or mental responsibility of managing merge issues”

With great power… comes great responsibility… :fist::sunglasses:

Right, but there’s still a difference between version control systems and Coda. In simplified form:

  • With VCS, you have previous file snapshot, you do the changes that get saved into your “working set” file, and VCS calculates a diff at the time of commit. Commits happen once in a while (usually ~once in an hour to once in a few days).
  • With Coda (and Docs too, I believe), the process is the other way. There’s no local working set file to save changes to. Every change is its own small patch. There can be hundreds of such patches (operations) per minute. When you’re offline, these patches accumulate in your browser, and when you’re back online, they get sent to the server. This is how e.g. Firebase Firestore works (that’s why I originally thought Coda was powered by a Firebase-like cloud database; the row IDs also resemble Firebase row IDs a lot, and the mechanism is identical). And while Coda could technically work offline indefinitely, that would mean that you’ll always have an old snapshot and a huge set of patches to apply each time you open it. TL;DR I can totally see how data integrity could be compromised with this approach.

Also mind that the files VCS’ work with are primarily small (several KB) plaintext source code files. Binary files (icons etc) are usually just replaced entirely. Coda is a mix of plaintext and binary data, meaning resolving diffs would be pretty hard. A row is encoded as binary data entirely, meaning a patch from User A would overwrite the changes User B just made to the same row but different column.

And you cannot just swap one design with another that easily. The Git approach (working set + calculating diff at commit time) wouldn’t allow for robust collaboration. It is actually quite often that multiple people work on the same section at the same time (e.g. fill in neighboring cells of the same table).

Hi everyone! We’ve made a bunch of performance improvements lately, including a host of calculation improvements that should help your tables and formulas load much faster. You can learn more here: Improvements to Calculation Performance

@Angad
any idea if your performance has improved for tables with 10+ rows? i have a doc with table 10+ rows and lots of formulas with Filters connecting other tables. the overall doc has become almost unresponsive, (takes many minutes to respond or sometime doesn’t respond at all)

Hi @Manoj_Gupta_Engrg-SW_IN,

First question is do you mean 10,000+ rows or is this really an issue that you’re seeing with 10 rows?

If it’s with just 10 rows, I would think there must be a fairly inefficient iterative formula somewhere in the doc causing this. Even known calculation intensive solutions, like previous row formulas, don’t show themselves as slow at just 10 rows.

There is a screenshot in this post that will show how you can run a calculations analyzer to see what formulas may be taking a good bit of time.

If that doesn’t give you any hints, it might be worth writing into the Support Team and sharing a link to the doc so we can check it out there.

As for improvements to performance, we have see big dips in both doc size and time it takes to run calculations across nearly all docs on our system. That doesn’t mean that this affects every single formula though. Coda has infinite possibilities and it’s still possible to write formulas that are time consuming.