Hej there,
just wanted to share and gather some input on enormous tables.
What started as an experiment on “when Coda would break” might become a tool we’ll use in our workflow.
I embarked on the journey to calculate solar PV yields for our construction projects. To do that, I had to download gigantic CSV tables of solar radiation levels from a website:
For every hour of a year
For 18 different directions into the sky
For 5 different tilt angles
That’s 788,400 rows (8,760 hours × 18 × 5)…
I have already imported over 140k rows with 5 numeric columns and a text column, which I’m converting into a date-time using a formula. In addition, there are formula columns for the time, hour, date, weekday and week number, plus two select columns for angle and tilt.
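For reference, the same conversion outside Coda would look roughly like this in pandas (the file name, text-column name and timestamp format below are just placeholders for illustration; the real export is formatted differently):

```python
import pandas as pd

# Rough sketch of the same derived columns outside Coda. The file name,
# text-column name and timestamp format are placeholders, not the real export.
df = pd.read_csv("radiation.csv")

# Text column -> date-time, then the columns I build with formulas in Coda
df["datetime"]   = pd.to_datetime(df["time_text"], format="%Y%m%d:%H%M")
df["date"]       = df["datetime"].dt.date
df["hour"]       = df["datetime"].dt.hour
df["weekday"]    = df["datetime"].dt.day_name()
df["weeknumber"] = df["datetime"].dt.isocalendar().week
```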
I absolutely did not expect Coda to handle that. But besides some reloading of that doc, I am very surprised at how well it works. Even conditional formatting works well.
I want to continue that experiment until I have a tool or something breaks.
Does anyone else have some tips and experiences to share with tables that go beyond the recommended 20k row-count?
What’s your current doc size? (use the Doc Map). When it comes to big tables, you may find that if you try to create a view of that big table, the doc may crash and get stuck in a reload loop.
You should be able to see the size there. I’m asking because I have some docs with a lot of rows (300k rows) but very few columns, and the column types are mostly text, which is very light in terms of total MB.
I would do this in Excel… Coda works, then works slowly, until it suddenly doesn’t…
You may end up with an unrecoverable crashed Doc.
I understand the Coda team has been working on improving performance and handling larger databases (Excel has handled slightly over 1 million rows for more than 17 years now…).
If everything works, I want it to be an interactive doc in which you can adjust roof angle, PV type, battery storage, primary heating source, … and it will calculate the respective ROI. I feel like coding that in Excel would be a nightmare.
It’s currently a separate doc - if it ends up unrecoverable, I really don’t mind, since I don’t rely on it.
I’ll also check whether interpolating the data is accurate enough so that I don’t have to import so many different radiation angles.
I see. @Konrad_Mertner, another idea would be to push all the heavy-lifting calculations to a proper DB system. You could check if you can get away with Supabase’s PostgreSQL free tier (and connect it to Coda to sync a results table, or pull it straight into Excel). LLMs have gotten pretty good at writing SQL for you, so you could basically push all the lookups and formulas you are doing in Coda to a set of queries in Supabase.
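As a rough illustration of what I mean (the connection string, table and column names below are all placeholders, not anything Supabase gives you out of the box): let Postgres do the aggregation over the 788,400 rows and only bring the small result back into Coda or Excel.

```python
import psycopg2  # or use Supabase's own Python client / REST API

# Placeholder connection string, table and column names - adjust to your schema.
conn = psycopg2.connect("postgresql://user:password@db.example.supabase.co:5432/postgres")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT angle, tilt, date_trunc('month', ts) AS month,
               sum(radiation)                       AS monthly_radiation
        FROM   radiation_hours
        GROUP  BY angle, tilt, month
        ORDER  BY angle, tilt, month;
    """)
    rows = cur.fetchall()  # 18 angles x 5 tilts x 12 months = 1,080 rows instead of 788,400
```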
I wonder what’s the most efficient way to use formulas and data entries in Coda.
Regarding formulas:
I like to break down complex operations into smaller formulas and reference them later in a bigger formula. I’m wondering if it has any performance implications. Is it more efficient to have all the smaller formulas inside that bigger formula or does it have no effect?
Regarding data entries:
Is it more efficient to store data in a column or in a row? In my example above, the question is whether to store all the different radiation values for every angle inside one row that represents an hour, or to store every radiation value for every hour in a separate row with the angle attached as an additional value. In other words: is it more efficient to store data in one dimension or in two?
Formulas: if your concern is performance, you “theoretically” minimize overhead with a single formula. In practice, the difference is typically not very important or even perceptible - breaking things down into easier-to-understand and easier-to-debug substeps usually pays off big time! What you do have to keep in mind is:
Different approaches to the same result (or a good enough result) may be computationally much more efficient (see Coda calculation and Coda Doc performance optimization considerations)
If your formula contains queries (i.e. filters on tables in Coda) or you are executing SQL, a little bit of query optimization can go a long way (quick illustration below).
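To make the “different approaches, same result” point concrete, here is a toy Python sketch with made-up data; the same logic applies to a Coda formula that re-filters a big table once per row versus grouping everything in a single pass:

```python
from collections import defaultdict

# Made-up data: one record per (angle, hour of the year)
hours = [{"angle": a, "radiation": 0.1 * a + h % 24}
         for a in range(0, 360, 20) for h in range(8760)]
angles = range(0, 360, 20)

# Slow pattern: re-scan the whole list once per angle (18 full scans)
totals_slow = {a: sum(r["radiation"] for r in hours if r["angle"] == a) for a in angles}

# Fast pattern: one scan total, grouping as you go - same result
totals_fast = defaultdict(float)
for r in hours:
    totals_fast[r["angle"]] += r["radiation"]
```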
Data entries: I personally prefer many rows over many columns, and I think it usually allows for much more efficient formulas and more flexibility. If I am understanding you correctly, my personal preference would be a table with the following columns:
Day of the year (I guess angles will vary by season!)
Time of the day
Angle
Radiation
With the above table columns you can store whatever combination of angle and radiation for a given hour of a given date (and it would easily accommodate whatever time increments you decide on, now or later).
If you start creating 1 column per time of the day for example, you lose that flexibility.
Also, say you want to calculate the cumulative radiation for a certain angle after a certain hour till midnight. Good luck writing that formula with one column per hour of the day, especially if time increments are less than 1h.
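With the long table shape, that calculation is a single filter plus a sum (in Coda, roughly a Filter(…) followed by a Sum()). A tiny pandas sketch with made-up numbers:

```python
import pandas as pd

# Tiny made-up example of the "one row per (date, time, angle)" shape
df = pd.DataFrame({
    "date":      ["2024-06-21"] * 6,
    "time":      ["18:00", "19:00", "20:00", "18:00", "19:00", "20:00"],
    "angle":     [180, 180, 180, 90, 90, 90],
    "radiation": [420.0, 250.0, 60.0, 300.0, 180.0, 40.0],
})

# Cumulative radiation for angle 180 from 18:00 until midnight: one filter + one sum
evening_south = df[(df["angle"] == 180) & (df["time"] >= "18:00")]
print(evening_south["radiation"].sum())  # 730.0
```

With one column per hour, the same calculation means hand-picking a different set of columns for every start time.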
Ask your LLM of choice these questions and request it to explain with illustrative examples.
I feel that if one is used to working in Excel without being a pivot table user, one would generally choose to organize the data in many columns (because a formula to add up many adjacent columns in Excel is easy).
Working with pivot tables is the “intermediate” step between Excel and working with a database, because it gets you thinking of rows as records and of columns as attributes of those records (which is VERY handy when working with a pivot table).
Example: track avg sunlight hours every month per location
Table structure 1 columns (very intuitive and easy for anyone to understand):
Location, JanAvgSunlightHrs, FebAvgSunlightHrs, MarAvgSunlightHrs, etc… (13 columns)
but that’s good for 1 year… Should you keep adding 12 columns every year?
You could add a column for the year and repeat a new row for every year…
But, then, why not do the same for the months?
Table structure 2 columns (MUCH more flexible, but requires a bit more of a “database mindset” to query):
Location, Date, AvgSunlightHrs (3 columns)
Working with the above table in Excel becomes more complicated in terms of formulas, because now you have to do lookups and conditional logic… For the majority of real-life operations it is so much simpler to keep things sorted by date and apply simple operations on ranges of adjacent cells, hence most people will tend to organize the data like table structure #1. Excel also nudges you to mix data organization with how you want to see the data: you can’t create editable “views” of your data (only read-only views).
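The two structures hold the same information and are mechanically interchangeable; here is a small pandas sketch with invented numbers, going from structure #1 to structure #2 and back (the month label stands in for the Date column):

```python
import pandas as pd

# Structure 1: one column per month (wide), invented numbers
wide = pd.DataFrame({
    "Location": ["Hamburg", "Munich"],
    "JanAvgSunlightHrs": [1.4, 2.1],
    "FebAvgSunlightHrs": [2.5, 3.0],
})

# Structure 2: Location, Month, AvgSunlightHrs (long)
long = wide.melt(id_vars="Location", var_name="Month", value_name="AvgSunlightHrs")

# ...and back again - essentially what a pivot table does
back = long.pivot(index="Location", columns="Month", values="AvgSunlightHrs")
```

The long shape is the one that keeps working when you add a year, a new location, or sub-monthly data; the wide shape needs new columns for each of those.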
In my humble experience, someone who has ever (seriously) worked in Excel with pivot tables and slicers, and is comfortable with XLOOKUP, FILTER, SORT and GROUPBY (note that Excel has added most of these functions only relatively recently), will almost always prefer table structure #2… and have an easier time learning the Coda Formula Language.