How do I create a bulleted list from a table view that emulates SQL "Select Max .. GroupBy .. Having .."?

Hi, I’m hoping that there is a way to create a bulleted list of something that can be done easily in SQL:

select name, max(delay_days) max_delay_days
from view 
group by name
having max_delay_days > 14
order by name

And have this rendered as a bulleted list.

The raw data is here:

(Edited: Jun 7, 2020)

Projects

Name
Foo
Bar
Biz

Project Change Log

Id Name Original Date ModifiedTime DelayInDays =NetWorkingDays(thisRow.[Original Date],thisRow.ModifiedTime )
1 Foo 5/5/2020 1:00 PM 6/7/2020 7:53:08 AM 24
2 Bar 5/19/2020 1:00 PM 6/6/2020 1:04:25 PM 14
3 Biz 5/29/2020 1:00 PM 6/6/2020 1:04:27 PM 6
5 Bar 5/19/2020 1:00 PM 6/7/2020 7:52:20 PM 15

This is what I can do today but I’m hoping there is a more powerful way to use table views and lists:

=[Project Change Log].Sort(true,[Project Change Log].Name).FormulaMap(Concatenate(Name, " delayed by ", DelayInDays, " days")).BulletedList()

This produces:

* Bar delayed by 14 days
* Bar delayed by 15 days
* Biz delayed by 6 days
* Foo delayed by 24 days

You see Bar appears twice here with 14 and 15 days. I wanted to be able to show only the max for each name which would’ve been Bar: 15.

Thanks!

Hi @niwhsa_jayaprakash what isn’t Coda doing? I’m not sure what output you’re looking for and your final formula looks like it has a sophisticated output.

Hi, I’ve updated the description to hopefully, explain better what I was trying to do.

OK cool now I understand what you’re looking for. So this is a a powerful trick once you understand it. What you need to do is determine if the row in the Project Change Log is over 14 days. I do that by creating an array for each project based on the RowID() for each row, then look in that array to see if it is > 14:

if(thisRow.Delay>14,true,false)

I created this example to show you:

Hey @niwhsa_jayaprakash, welcome to the Community.

It is possible to do what you want with a single formula, but this will not be the Coda way. One key difference between SQL and Coda is that:

  • In SQL, you “make relations” of your data at query time (i.e. join the data, group the data etc. — or don’t.) At rest, your tables are simply collections of records with scalar data that are not related to each other. Yes, you can have foreign key constraints, but you still need to explicitly JOIN on those foreign keys to declare a relation at query time.

  • In Coda, you build relations at design time. This means you need e.g. to link Project Change Log entries to Projects, then calculate Project-level things in the Project table ahead of time, not at the moment of writing your final “query” formula.

So here’s what you need to do, Coda way:

  1. In your table Project Change Log, make “Name” not a text field but a lookup to your table Projects. Rename it to Project

  2. In your Projects table, make a column “Change log entries” where you’ll look up all entries with a formula:

    [Project Change Log].Filter(Project = thisRow)
    

    This is basically your “joining” at design time.

  3. Now in your Projects table, you can calculate max delay per project: make a new column with a formula:

    thisRow.[Change log entries].DelayInDays.Max()
    
  4. Now to output Projects whose delay is >= 14 days, you query the Projects table with a filter (and optionally render as a bulleted list):

    Projects.Filter(CurrentValue.[Delay max] >= 14).BulletedList()
    

I’m not going to put a single query formula approach here because I encourage learning good design, and also it’s inefficient. I discourage @Johg_Ananda’s approach as well: it’s useful and powerful in some scenarios, but I believe you should strive to do it the proper way with Projects and Log entries as two separate levels of data with their respective calculations (especially if you already have the Projects table that you keep up to date)

1 Like

I agree with @Paul_Danyliuk that his approach is ‘better design’, since your example @niwhsa_jayaprakash included a Projects table. There are cases where abstracting into another table may not make sense and my approach is useful as well. Maybe @Paul_Danyliuk could elaborate a bit from the computer science perspective of why his approach is more efficient :slight_smile:

It’s not so much from computer science perspective (although that as well) as from “proper data in proper places” perspective.

The trick to pull all associated records in a column of the same table, like here:


is useful when you cannot or don’t want to make a separate table of Projects, e.g. because of anticipated human error (e.g. there’s a person filling in this log by mindlessly copying data from Excel, and you anticipate they’ll forget to add missing Projects to another table). But with this approach you get:

  • quite some data duplication: lists will grow exponentially with each entry added with the same project name (+1 item in all rows of the same project; +1 row with as many items)
  • recalculation of the whole column each time a row is added, which will get exponentially slower over time

It’s okay for a small data set but it’s not scalable.

If you totally need to go with this approach, at least make it so that associated records are only collected once per group, since the end result will ignore repeated records anyway. Here’s an example where I used this approach:

(see the data table)

1 Like

Thanks @Paul_Danyliuk, I had already done the lookup between the 2 tables but did not show it in the example. I understand your approach and agree with your distinction between SQL and Coda.

However, I don’t think I understood how the Max() function does a max per Name group. Since the project log can contain the same name multiple times, I want the one with the “max delay per project”. Or have I missed something in your solution?

When you look up all log lines for each project, you can select Duration.Max() out of those, individually per project, based on looked up subsets.

Your project Bar will have two rows from the Log looked up. thisRow.[Change log entries].DelayInDays for that row will first dereference DelayInDays from those two rows (14 and 15), then .Max() will select the larger of those.