Row and Column Analysis

I have seen and participated in discussions about similar topics in the past, but want to hit this question directly. When you receive a classic excel-styled table where you want to run an analysis on the rows and the columns, what’s the most straightforward way to do so in Coda?

For example, if I have a table of fruits and people, and each person scores each fruit, what is the easiest way to run an analysis on the scores of each individual fruit as well as each individual person.

When importing such a fruit-person table into Coda, I may have this:

Running analysis on any particular fruit is straightforward. e.g. =FruitScore.Apple.Average() tells me that the average score for Apples is 3.6.

The less-straightforward part is if I want to also analyze how, for example, Bill scored the Fruits. There are of course ways to analyze Bill’s scoring, but they are not as straightforward as a column analysis. The formulas become significantly longer and/or more complex.

I recognize that this is rooted in the design of Coda, and one of the distinctions between Coda and Excel/Sheets. I’ve read many polite and condescending comments to this end. But, it would be great to have an intuitive solution.

The most obvious solution that comes to my mind would be a transpose; i.e. make a new (non-view/non-connected) table with the columns and rows swapped. It would open the door to data integrity issues, but baby steps.

You can currently transpose data by exporting the data to Excel/Sheets and transposing there, but I advocate for a native solution. You can also manually create a transposed table, but again that seems like something you shouldn’t have to do.

Another approach is to create a new column that averages out the values in each row. For example:

While this approach is helpful, it does not account for any new fruit columns that may be added later.

Big fan and advocate of Coda, and interested to hear other thoughts and approaches.

1 Like

Hello @Joseph_B,

I totally understand the frustration! I work with a lot of financial schedules and estimates which are typically (from legacy activities) setup in similar data schema.

As part of my work to make past information more flexible, I have “de-pivoted” / “flattened” a lot of excel tables into something that is less table and more row-focused.

This can be done automatically within excel prior to data import, and doing so allows for a lot more flexibility once you are inside Coda.

I know this isn’t exactly what you are asking, but it is one route that has made my life easier when approaching the same problem, thanks!

2 Likes

Thanks for the share. Appreciate the alternate approaches to this issue.

@Al_Chen_Coda If you don’t mind, tagging you for your expert insight into the intersection of Coda and Excel/Sheets.

1 Like

@mallika or any Codans have thoughts on this? Would love insights on approaches and best practices.

1 Like

Hello. Checking in again for best practices or insights on this issue. Appreciate any thoughts.

Hi @Joseph_B, apologies for the delayed reply here and missing the notification when you tagged me in this thread. The short answer is that there is no “right” way of doing this in Coda, and the reasons for this (as you pointed out) is due to the way Coda was designed.

Potential Solution #1 Pros/Cons
Using the “FruitScore” table, the simplest way to get Bill’s score on `Banana` using a formula on the canvas would be: `=FruitScore.filter(Name="Bill").Banana`. You could then give this formula a name and then reference this score throughout the doc.

This does not feel very “scalable” because what if you want to average that score with Carrie and Daniel’s scores? You would then have to write individual formulas for them and have another formula that perhaps averages all those scores.

Compared with Excel/Google Sheets, the analysis feels a bit simpler since I can create my own list of names and use a `VLOOKUP` formula to grab the scores like so:

Potential Solution #2 with Pros/Cons
If we want to get the average score of all fruits for Bill, the `Average Fruit Score` column would not be “scalable” because, as you mentioned, it would not account for any new columns. As an aside, I think if you know the number of columns will stay relatively fixed or will not grow by much over the coming weeks/months, then I think manually fixing the formula here is acceptable, but still not “scalable” in terms of building an error-free model.

In Google Sheets, formulas will automatically detect new columns/rows inserted and the cell reference will automatically stretch and shrink to properly account for the new columns added:

This definitely feels more “scalable” if you find yourself adding many columns to the table whereas the Coda solution requires you to manually add in the column to the formula (side note: this is something we are looking to improve for future releases).

Potential Solution #3 Pros/Cons
The transposing solution would work if you knew that there wouldn’t be many more people you want to add to the list and you are only interested in doing analysis on the average scores of the fruits. I would argue that this is not a scalable solution given the data set may change, but could be a completely fine solution if the list of names will stay fixed throughout the analysis.

The database approach
As @Bobby_Ritter pointed out with his example, flattening your data is the right approach. Perhaps there should be a native feature to flatten your data out like @Bobby_Ritter has done with his embedded doc (will add this to the product backlog!). The difficulty here is that your list of data may require you to do a lot of copying/pasting to get the proper flattened structure you’re looking for. If you’re coming from Excel, the Get and Transform feature is one way to flatten or you could use a custom macro like this to get the data the way you need.

Flattening your data can be cumbersome and time-consuming, but I think the underlying reasoning for doing this is important to call out.

Analysis vs. Tools/Models
Coming from Excel, Coda definitely challenges your pre-conceptions about what it means to do analysis. It blurs the line in that the platform feels like it could be used for analysis, but other times (as in your original post) it feels too cumbersome to pull simple numbers like the average score across all fruits.

When you are doing analysis in Excel, it’s typically very messy and you have raw data in one sheet, lookups in another sheet, and hard-coded assumptions in another sheet. The goal of the analysis is simply to find the answer to a business question and you may not care about building an error-free spreadsheet since it’s a one-time analysis. The sole goal of that Excel file is to give you a number and you most likely wont revisit that file ever again.

Then you have models that may give you options to do a sensitivity analysis, and formulas are built in such a way that they account for new data (whether it’s from new rows or columns). It’s one step above doing an analysis since you know you’ll be revisiting that Excel file every week or month and adding inputs to see what the model outputs. You might make the model less error-prone by putting in checks to ensure numbers balance correctly, or friendly error messages if your teammates enters in a date instead of a number in a cell.

Then we have tools where you care about the data model and the reusability of the tool for anyone on your team or even the public. Unlike the Excel file used for doing data analysis, the tool cannot be “messy” and have random data strewn around the file. The integrity of the data and the structure of the tables are important which is why Coda’s tables push for the database structure rather than arbitrary columns. Building a tool in Coda is also the most scalable solution since doing aggregate analysis across
fruits and names is only possible if you have flattened data.

I know I didn’t answer your original question directly, but the question hints at some underlying core principles about why Coda was designed this way. I would also take a look at @Matthew_Hudson’s blog post here for even more detail.

3 Likes

Thank you for the fantastic breakdown and insight @Al_Chen_Coda.

Reading through, I’m intrigued by future potential of Coda automatically recognizing a new column being added to a set of existing columns.

Until that time, will continue with the approaches discussed in this thread.

1 Like