Filtering out all "doublons" from a table

Hey all,

this is my first post, sorry if it is a bit dumb. I tried to find if a similar question was posted, but couldn’t find any… so, here it goes.

I have a table representing tasks. A task have a sprint reference (“text”), can be assigned to someone (lookup from the “people” table), and finally can have an estimated effort in days (positive number). eg.
Task 1 / Sprint alpha / Laurent / 2
Task 2 / Sprint alpha / Laurent / 3
Task 3 / Sprint alpha / Nick / 0.5
Task 4 / Sprint alpha 2 / Laurent / 1
Task 5 / Sprint alpha 2 / Chris / 5
Task 6 / Sprint alpha 2 / Chris / 3
Task 7 / Sprint beta / Chris / 2

I’d like to display the sum of each people’s estimated effort on each sprint. eg.
Sprint alpha / Laurent / 5
Sprint alpha / Nick / 0.5
Sprint alpha 2 / Laurent / 1
Sprint alpha 2 / Chris / 8
Sprint beta / Chris / 2

I successfully computed the sum: I added a new column, and now I have the sum for each row, but my issue is that now I have multiple rows that are identical except for the task name (that I don’t need anymore).
Sprint alpha / Laurent / 5 (hidden columns = Task 1 / effort 2)
Sprint alpha / Laurent / 5 (hidden columns = Task 2 / effort 3)
Here I want to have only one of them. That seems like a nightmare, or I’m simply missing something … how do you filter out “doublons” within a table (going from 7 rows down to 5 in my example)? Instead of a view, should I use a second table that would be “populated” from my task table? But I have no clue how to do that :frowning:

edit: found out about formulamap + addrow. Thats seems a bit like overkill, and will force users to click on a button instead of having an always up-to-date view. Would that be the only way to achieve my goals?

Help on that matter would be much appreciated!

Laurent

Hey! Welcome to the community!

Multiple ways to approach this: the correct one, the quick one, and the solid one :slight_smile:

The correct one: introduce a separate table specifically for the per sprint, per user level of data (i.e. each row holds a summary for one user and one sprint). Cons: you’ll have to ensure that this table always lists all combinations of people and sprints, i.e. have some logic that would add missing rows, e.g. a button to click once in a while. Pros: you’ll have a fully usable data table that you can further calculate on without jumping through hoops.

The quick one: just make a view on your original table where you group on the left on sprint and on users. Set up the view to show a summary for the effort column as “Sum”. Collapse all groups — now you have a view that lists all the sums for unique sprint-user combinations (grouping takes care of that):
image
^ imagine those are sprints

Pros: no formulas to write. Cons: cannot copy-paste this into Sheets etc, and you cannot use these sums in any further formulas. You can only look at it.

The solid one: in your original table for each task determine whether it’s the first occurrence of this sprint and person. Then filter based on that. Pros: no need to maintain a separate table; won’t risk running into inconsistency. Cons: there’s still going to be duplicated data, just hidden. You can further improve the formula to only calculate the sum if it’s the first occurrence, else output blank value. Also keep in mind that calculating whether it’s a “doublon” is a relatively expensive formula and will become increasingly slower as your table of tasks grows, so that’s a con too.

I did this here:

See the doc attached in the first message, page DB Leads, for implementation.

Personally, in your case I’d go with option three unless you expect your table to grow to thousands of tasks over time. In that case you’ll have to resort to stamping (clicking a button to capture values rather than calculating them with a live formula over the whole column) eventually.

Cheers!

P.S. Not dumb at all :slight_smile: A super valid use case with multiple approaches to consider, each with its own pros and cons.

Thank you SO MUCH for that :wink: