I often find myself wanting a single button that will delete all columns with duplicate values.
For example, imagine I have this table:
I want to remove either “Ben H” or “Ben Howit” because they have the same email address. But I don’t want to remove both.
How can I do this in Coda? Well, I’m not the first to ask:
- Removing Duplicates by Email using a button
- Highlight duplicates in a column using a conditional format
- How to find out duplicate value on the row and delete it. Need help! - #4 by Paul_Danyliuk
To varying degrees, these don’t work. One of my criteria is I don’t want to have to add a column to my existing table. So no use of rowNumbers like in Paul’s answer
Note: Why avoid table based checks?
Searching for duplicates in a table is rather computation intensive. It can really slow down a doc if you have tons of rows.
The main issue is that each row has to search all previous rows to see if it has a duplicate value. (In computer science speak we’d call this a O(n^2)
algorithm, which means basically “slow”)
There’s no way to avoid running the algorithm if you’re trying to get rid of duplicates, but the downside of the table approach is that each time you add a row to your table your table has to rerun the entire calculation.
That can really slow you down. By putting it in a separate button, the calculation is only triggered when you hit the button. Temporarily slowing you down, sure, but only when the button is pressed, not all the time.
Thanks to WithName, it’s now possible to do all of this in one formula.
Formula
Here’s the formula. Just replace all instances of
-
Table
with your table name (shows up in line 2 and 3). In our case this isEmails
-
ColumnWithDuplicateValues
with the name of the column that contains duplicates you want to delete (shows up in row 2, 3, 6, and 8). In our case this isEmail
WithName(
Table.Filter(WithName([ColumnWithDuplicateValues], Value,
Table.CountIf([ColumnWithDuplicateValues] = Value) > 1
)), AllDuplicates,
WithName(
AllDuplicates.[ColumnWithDuplicateValues].Unique().FormulaMap(
WithName(CurrentValue, UniqueValue,
AllDuplicates.Filter(CurrentValue.[ColumnWithDuplicateValues]=UniqueValue).First()
)
),
FirstInstances,
AllDuplicates.Filter(FirstInstances.Contains(CurrentValue).Not())
)
).DeleteRows()
If you just want to list all the duplicate rows after the first occurring duplicate row, remove the .DeleteRows()
from the end.
Here’s what that output looks like:
It’s pulling out the second instance of benh@example.com and the second and third instance of test@test.com
By putting this formula in a button, and leaving the .DeleteRows()
at the end, we would delete these three rows.
How Does it Work?
Glad you asked! I’ll put this in a hidden section, since many people might be satisfied with having the formula.
AllDuplicates
First we want to find all the duplicate rows in our table. This formula does that.
WithName(
Table.Filter(WithName([ColumnWithDuplicateValues], Value,
Table.CountIf([ColumnWithDuplicateValues] = Value) > 1
)), AllDuplicates
...
Here’s what AllDuplicates
contains:
This is a nice formula to have in your back pocket anyway.
FirstInstances
Next we need to identify all the places where a value shows up for the first time. Remember, we don’t want to delete the first instances, just all the rest!
...
WithName(
AllDuplicates.[ColumnWithDuplicateValues].Unique().FormulaMap(
WithName(CurrentValue, UniqueValue,
AllDuplicates.Filter(CurrentValue.[ColumnWithDuplicateValues]=UniqueValue).First()
)
),
FirstInstances,
...
Here’s what FirstInstances
contains:
Only the Duplicates
Finally, we want to remove those first instances (the ones in FirstInstances
) from the list of duplicates we found (the ones in AllDuplicates
). Here’s how we do that:
...
AllDuplicates.Filter(FirstInstances.Contains(CurrentValue).Not())
And that’s it! That’s how we get our beautiful result: