Size Limits & Doc Design Best Practices

Hey @James_Eades I empathize with what you have written and have myself butted up against many of these limits. At the end of the day you have to realize that while that seems like a huge limitation for the app, and in a sense it is, there is still so much that can be done in the sub-10k-row space. Take a look at the template gallery. People can organize their household, small business, meetings, processes, etc.

You, like many of us in the community here, see the massive potential of no-code apps to let us focus on the 'building' of our creative pursuit rather than dealing with the infrastructure (servers, security, etc.). However, as we write this in May 2020, if you want a massively scalable app (and I know that your 5 employees @ 40 notes a day doesn't seem 'massively scalable', but in this sense it is), you still have to deal with all that stuff that none of us here want to deal with (servers, security, etc.).

I was somewhat early to Coda and dealt with many of these issues, and I can say Coda has been very conscientious and made a huge amount of progress. Things just work better as they slipstream improvements into the app, and it will keep getting better.

There is a LOT you can do with Cross-doc to address the limitations, but there's a high cost in planning, testing and implementation. I too built a custom to-do manager for my team, and once we hit ~3,000 rows it wasn't much fun to use any more. So I built an archiver that keeps 30 days of data (for reporting) in the functional app and every night moves anything older out, and the app is snappy and fun again. It was tedious to develop and test the Cross-doc setup, but definitely worth it, and it made me better at computer science and Coda.
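If it helps to picture it, the core of such an archiver is just a button that an automation presses every night. Here's a rough sketch of the button formula with made-up table and column names (Tasks, Archive, Done On); my real setup pushes rows to a separate doc via Cross-doc rather than to an in-doc Archive table, but the logic is the same:

    Tasks
      .Filter([Done On] < Today() - 30)
      .FormulaMap(
        RunActions(
          AddRow(Archive,
            Archive.Name, CurrentValue.Name,
            Archive.[Done On], CurrentValue.[Done On]),
          DeleteRows(CurrentValue)))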

So, if you're creative and patient you can develop workarounds until Coda builds the infrastructure to scale (and probably the accompanying business model), which I am confident they will. Coda has confirmed to me several times that they are interested in allowing Makers to sell/license their docs/apps to third parties, which will inherently bring the scaling issues, so it is de facto on the roadmap to eventually address.

My advice would be to work around it for now (it will also make you focus on efficiency and leanness, which is a great practice to have anyway) and eventually, when your app is nice and polished and ready for the masses, hopefully Coda will be too!

Hey @James_Eades, welcome to the community and thanks for the mention!

I don't know your whole picture, but I believe there are ways to make Coda work bearably with over 20,000 rows (I have to believe - otherwise my work for my recent client would be a flop).

Here's what you need to aim for:

  • Mostly scalar inputs in columns (i.e. numbers, strings, checkboxes, arrays of strings - NOT references or formulas)
  • No formulas that filter over the same table (directly or indirectly)
  • No buttons in this table. Build a separate single-row table to host a button and iterate instead
  • No conditional formatting - use something else to distinguish rows, e.g. a dedicated Status formula column with emoji
  • No unnecessary formatted text - flatten Format()s and Concatenate()s with .ToText()

Basically, these large tables should be plain fact tables (i.e. where one row registers one fact, e.g. one transaction of X money from user with ID=Y to user with ID=Z)
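To make that concrete: a flattened "transactions" fact table would hold only scalar columns - say From User ID, To User ID, Amount, Date, all plain numbers/text with no lookups - and any button that processes it would live on a separate one-row helper table and iterate over the big table from the outside. A hypothetical sketch of such a helper button (table and column names invented):

    Transactions
      .Filter(Processed = false)
      .FormulaMap(
        ModifyRows(CurrentValue, Transactions.Processed, true))

This way the big table itself carries no per-row button instances and no recalculating formulas.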


But yeah, sometimes it could just be better to use something else, like a traditional database, when you expect your app to work at scale.

What exactly do you mean?

For example, in table Tasks those would be things like:

  • Finding a previous/next task for each task (see the sketch below)
  • Linking to a Project, which looks up all Tasks for that project, and then something else depending on that lookup
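The previous/next task case, for instance, typically looks something like the following (a hypothetical sketch with invented column names) - a formula that filters the very table it lives in, so every row depends on every other row and a single edit can trigger a table-wide recalculation:

    Tasks
      .Filter(Project = thisRow.Project AND [Due Date] < thisRow.[Due Date])
      .Sort(true, Tasks.[Due Date])
      .Last()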

There are lots of scenarios where I don't have a logical explanation, but they resulted in doc size blowing up. E.g. I had two tables, Lessons and Syllabi; each Lesson would link to a Syllabus, and each Syllabus would aggregate all Lessons via a simple lookup formula. Yet these two turned out to 1) not cross-doc at all, even with the column hidden, and 2) take up an unreasonable amount of space in the doc file. Still not sure why, but the circular dependency caused it somehow (as Coda support told me). This is purely empirical for now; maybe one day I'll get to the bottom of it. I suspected it was because references to rows from other tables might've pulled related data such as references back to this table, which in turn pulled in references to that other table, and so on for quite a few levels of nesting, but I checked for that and it doesn't seem to be the case.

Well, certainly Coda can handle 20,000 records and do it quite well. I referenced 20,000 because that was the number of records I had added when I finally crashed my doc... and it crashed pretty hard. I was attempting to use Coda as a simpler tool than a self-hosted Vtiger CRM, so I was looking to store companies, contacts, sales rep notes, quotes, orders, invoices, and a little more. When I saw "unlimited" on the pricing screen I took it to mean... unlimited.

Alas, once my doc crashed after a couple of weeks of working on it, I dug under the hood and found your posts about how Coda actually works. I had assumed there was a bit more of a beefy infrastructure to support the claim of unlimited, but I was misled like so many others. Unfortunately, what I was working on would have many circular dependencies in the data:

Prospects (20k)
Customers (5k)
Contacts - related to customers and prospects (30k)
Quotes/Orders/Invoices - 900 orders might have 4,500 product lines
Products - related to Quotes/Orders/Invoices (15k)
Activities - related to customers (48k)

I am very doubtful Coda will be able to handle this amount of data based on what I am seeing from the community here. Unfortunately a lot of the data is dependent on other parts - orders have customers and products. I could certainly prune certain parts, but then it may no longer be worth switching the company to Coda from the slow but working CRM with a MySQL database.

I am fascinated by this though and wish I knew more about the actual structure behind the scenes. I am a bit confused by some decisions and the way certain aspects are implemented - I program for fun and have generally dealt with massive amounts of data, so I was really surprised by the limitation. If it is a limitation in the size of a .json, I wonder why each page isn't its own .json, or why there is no MongoDB implementation in the background to better handle things.

@Paul_Danyliuk thank you very much for the clarification - I will see what I can do with those tips (some have been implemented already on my rework). I cannot wait to see you launch your site!

misled like so many others

Well, nothing is truly unlimited :slight_smile: Conventional databases have some constraints too.

Coda is catering to a wide audience. Only a fraction of it is power users. For many people and the majority of use cases, "unlimited" indeed feels like unlimited.

That's why I said in some other thread that before any commitment, you MUST fully realize the fact that Coda is a doc. Not an app builder, not a cloud database - a doc. More user-friendly and stricter about data structure than spreadsheets, but still technically in the same category. So if your project is too big to track in a single Excel file, then it's very likely also too big for Coda.

The lack of a database behind the scenes must have been a conscious design choice. I'm not on the Coda team, so I can only speculate based on my experience in software engineering, and I'm not ready to say why I think they went this way. Heck, I even kinda understand why they did buttons the way they did (i.e. so that each row has its own button instance and not one declaration for the entire column).


Regarding your dataset numbers: are those all required to be available at any time? Or is the majority of it historical data that can be tucked away, with no need to recalculate it constantly? Because, frankly, if you're dealing with 5k ongoing customers and processing lots of orders, an effort to hire a team of coders and build a bespoke piece of software sounds like a viable option to me. Not meaning to overstep my boundaries - just an outsider's perspective on pain (value) vs cost to solve.


P.P.S. If I were hired to solve the kind of problem that you have with these volumes, my first instinct would be just that - figure out if we could split the data into a "historical" part (where nothing is linked, everything is scalar like I described above, and daily/monthly summaries for statistics are also stored as scalars in some Snapshots table) and a "working set" where live formulas and references are okay. With the capability to move data between the two sets, of course (i.e. archive a customer, then "resurrect" the customer back into the working set when required).
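To make the Snapshots idea a bit more concrete, the nightly snapshot action would compute each day's aggregates once and store them as plain numbers, so nothing in the historical set ever needs to recalculate. A rough sketch with made-up table and column names (Snapshots, Orders, Total):

    AddRow(Snapshots,
      Snapshots.Date, Today(),
      Snapshots.[Order Count], Orders.Filter(Date = Today()).Count(),
      Snapshots.Revenue, Orders.Filter(Date = Today()).Total.Sum())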


That's why I said in some other thread that before any commitment, you MUST fully realize the fact that Coda is a doc. Not an app builder, not a cloud database - a doc.

This was the most helpful piece of advice on how to look at Coda at the moment, Paul. You had given it to me earlier as well.

Instead of looking to replace things like Jira, big CRMs, etc., I now look at it primarily to replace other documents we would normally write in something like Google Docs.

And often we do things like very simple to-do lists or trackers in Excel or a Google Spreadsheet. For small projects it can be much better than all the overhead of a heavy enterprise product.

We're still using Coda for CRM and project tracking, but on a small scale. If our CRM needs grew to enterprise levels, we would likely switch to a product focused on that.

Then you certainly must understand the nature of tradeoffs.

In every product, there are advantages that lie ahead down the pathway of each design choice. I cannot speak for the Codans, but based on my tests and evaluations, the development team has made some very clear decisions that remain true to the compass heading of their product vision. Sometimes these design choices are suboptimal in the true essence of cutting-edge "database technology". Indeed, I can find a half-dozen suboptimal design choices - don't get me started on data visualizations. :slight_smile:

But these choices have allowed them to carve out a niche that clearly advances the science of "documents", a realm that needed a little disruption, since there have been almost no advances in the concept of digital documents since the advent of early pioneers Wang, WordPerfect, WordStar, and Microsoft Word.

Data, seamlessly embedded and perhaps piped into documents, filtered, rendered, ordered, and charted, is certainly part of that disruptive equation. However, an embedded "database" is not likely central to the spirit of their objective.

My assessment is that Coda is a platform for building document-centric point solutions, not a platform for building "apps" in the most common sense of the term.

Yes - that is the pill I have now swallowed. :sweat_smile: Unfortunately the marketing side of Coda does make some rather lofty claims regarding capabilities, which is what led me to this post. In all honesty I believe the wording on the pricing page should be changed for legal reasons... Looking at my original desired use case, Coda will not be the right tool, so now I am exploring some other options.

Most of this data will be moved over to a big-name enterprise system in the future, but that is on hold because of the impacts of COVID-19 - I was looking for a band-aid in the beginning.

I do have a few other (much smaller) uses for the app so I will most likely continue to use it. If the Codans could solve the riddle of doc size I think it would open up a whole new market, and I am sure there would be a lot more money to be had with those capabilities. Similar services also have hard limits on data complexity and size. I would love to see this change with Coda because it really is a very easy-to-use and well-designed tool outside of its limitations. We will see!

Yes, it does, and there have been others who have pointed this out. But... all companies make lofty claims; it's our duty to scratch the surface and peer into the truest capacity of any tech.


Glad I read this thread before I spent too much time working on a new "app" idea instead of a "doc".

I would have likely hit doc limitations since I was hoping to use Coda to replace other "apps" I use.

Any suggestions from the experts in this forum on similar product(s) other than Airtable that I can use to create replacement "app(s)" [CRM, PM, etc.]?

Everything with a similar feel to Coda has similar limitations. Unless it allows you to work with a traditional database, you will probably run into these types of barriers. Even with traditional databases you can start to have performance issues if things aren't well maintained and your code isn't efficient.

As @Paul_Danyliuk mentioned, it would likely be best to get something custom made for your use case if the size limitation is a potential issue. There are some no/low-code app-building sites out there, but I imagine they have their own limitations.

Hi Mike, I saw that your line of business is internet marketing. I created this playbook with a few digital agencies who are using Coda for parts (or most) of their business: https://coda.io/@john-scrugham/digital-agency-playbook.

The areas mostly correlate with the apps that they would replace. For example, Clients shows how to bring your CRM into Coda (though many agencies will gravitate towards HubSpot as it's free and pre-built).


Hi, @Paul_Danyliuk.
Could you please elaborate on how you archive and "resurrect" data?
I'm trying to use Cross-doc for that, but for me it's a one-way street, especially with references. We can't get them back.
In broad terms, what is your schema like?

I never link cross-doc items by references, but by Row IDs, Row UIDs, or your internal record IDs instead. Those are simple scalar values (numbers or text), and I re-link back to actual references on the receiving side.

E.g. in this Cross-doc table I'm importing Student UID and Class UID columns (these are read from the Row object). Separately I'm importing a table of Students and a table of Classes. Then in the actual Class and _Student columns I'm doing [IN Students].Filter(UID = thisRow.[Student UID]).First(), and the same for the class.

This has multiple benefits:

  1. I'm not hard-dependent on row references, so if I e.g. have to delete and re-add a row and it's a different row now, I can more easily restore its old ID (e.g. via a manual override column - sketched after this list) and everything will be linked correctly again.

  2. Most importantly, I don't have to import base tables from the same doc like with row references. Here's what I mean:

    • There's a doc with a Students table.
    • There's a doc with a Classes table where the Students table is imported. Classes are linked to Students.
    • There's now a third doc where I want to import Classes along with Students.

    With reference linking, I'd have to import BOTH the Classes and the Students tables from the Classes doc. That's because each time you import a table, it becomes like a new separate table on its own, with its own table and row IDs (and that's logical because you can add more columns onto it). Now if this had to propagate even further, that'd mean that eventually in some terminal doc you'd have a cross-doc table that's a cross-doc of a cross-doc of a cross-doc of the original one.

    With scalar ID linking though, I can always import tables from their respective source-of-truth docs and link them together on the receiving side. And those don't have to be base tables (as cross-doc references currently enforce) - they can be views.
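And for the manual override mentioned in point 1, the re-linking formula simply prefers the override value whenever it's filled in. A sketch with a hypothetical [UID Override] column:

    [IN Students]
      .Filter(UID = If(thisRow.[UID Override].IsNotBlank(),
                       thisRow.[UID Override],
                       thisRow.[Student UID]))
      .First()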


Wow. This is really cool. Thank you!
I'm trying to replicate your schema here. I hope I'm getting it right.

Can anyone working with large-ish datasets in Coda share their current experiences? Coda boasts performance improvements in their emails, and I wonder how this translates to these kinds of workflows.

@Danila_Sentyabov If your dataset takes more than 5 seconds to copy from Excel to Coda, it's not a dataset to use in Coda. From what I've seen so far, anything over 5,000 rows becomes just unusable compared to a smaller dataset.
Some say rows are not the critical limit, but I disagree; with 10 columns the doc also becomes unusable...

Of course I'm using those in Coda anyway, but I would not recommend that anyone work under those conditions...
For now Coda is fine with small, non-growing datasets :slight_smile:

And by small I mean mini: what we consider a small dataset in Excel becomes huge in Coda more or less all of the time. I've stopped counting rows and columns because it's really not what matters: if you apply grouping it gets 5x slower, if you filter it, 3x slower, all of this while pressing the "Wait" button in Chrome because the doc is not responding...

So my answer, as of April 2021, is no: Coda is not designed to manage a normal-sized dataset, and if you start with a small one there is not a lot of room to grow it.

For anyone who thinks differently, I have some simple datasets you can have fun trying to work with - I can send you pure frustration in a .csv :slight_smile:

P.S. Maybe, if performance improves 50x (not 0.3% faster - a lovely result, but not a game changer), we will be able to use those, but that would mean a complete rewrite of how Coda works, so it's probably easier to build another one...


Thanks! That's what I thought.
Coda is very promising, and still immensely useful as a prototyping tool and for publishing docs with a little bit of interactivity, but not as even a half-serious process management solution.

In this video Lane talks about opening up Coda for Pack makers. They can only do this if performance goes way beyond what we have today, so I am rather optimistic on this issue.
