Button to Delete Duplicates

I often find myself wanting a single button that will delete all columns with duplicate values.

For example, imagine I have this table:
image

I want to remove either “Ben H” or “Ben Howit” because they have the same email address. But I don’t want to remove both.

How can I do this in Coda? Well, I’m not the first to ask:

To varying degrees, these don’t work. One of my criteria is I don’t want to have to add a column to my existing table. So no use of rowNumbers like in Paul’s answer

Note: Why avoid table based checks?

Searching for duplicates in a table is rather computation intensive. It can really slow down a doc if you have tons of rows.

The main issue is that each row has to search all previous rows to see if it has a duplicate value. (In computer science speak we’d call this a O(n^2) algorithm, which means basically “slow”)

There’s no way to avoid running the algorithm if you’re trying to get rid of duplicates, but the downside of the table approach is that each time you add a row to your table your table has to rerun the entire calculation.

That can really slow you down. By putting it in a separate button, the calculation is only triggered when you hit the button. Temporarily slowing you down, sure, but only when the button is pressed, not all the time.

Thanks to WithName, it’s now possible to do all of this in one formula.

Formula

Here’s the formula. Just replace all instances of

  1. Table with your table name (shows up in line 2 and 3). In our case this is Emails
  2. ColumnWithDuplicateValues with the name of the column that contains duplicates you want to delete (shows up in row 2, 3, 6, and 8). In our case this is Email
WithName(
  Table.Filter(WithName([ColumnWithDuplicateValues], Value,
    Table.CountIf([ColumnWithDuplicateValues] = Value) > 1
  )), AllDuplicates,
  WithName(
    AllDuplicates.[ColumnWithDuplicateValues].Unique().FormulaMap(
      WithName(CurrentValue, UniqueValue, 
        AllDuplicates.Filter(CurrentValue.[ColumnWithDuplicateValues]=UniqueValue).First()
      )
    ),
    FirstInstances,
    AllDuplicates.Filter(FirstInstances.Contains(CurrentValue).Not())
  )
).DeleteRows()

If you just want to list all the duplicate rows after the first occurring duplicate row, remove the .DeleteRows() from the end.

Here’s what that output looks like:

It’s pulling out the second instance of benh@example.com and the second and third instance of test@test.com

By putting this formula in a button, and leaving the .DeleteRows() at the end, we would delete these three rows.

How Does it Work?

Glad you asked! I’ll put this in a hidden section, since many people might be satisfied with having the formula.

AllDuplicates

First we want to find all the duplicate rows in our table. This formula does that.

WithName(
  Table.Filter(WithName([ColumnWithDuplicateValues], Value,
    Table.CountIf([ColumnWithDuplicateValues] = Value) > 1
  )), AllDuplicates
...

Here’s what AllDuplicates contains:

This is a nice formula to have in your back pocket anyway.

FirstInstances

Next we need to identify all the places where a value shows up for the first time. Remember, we don’t want to delete the first instances, just all the rest!

...
  WithName(
    AllDuplicates.[ColumnWithDuplicateValues].Unique().FormulaMap(
      WithName(CurrentValue, UniqueValue, 
        AllDuplicates.Filter(CurrentValue.[ColumnWithDuplicateValues]=UniqueValue).First()
      )
    ),
    FirstInstances,
...

Here’s what FirstInstances contains:

Only the Duplicates

Finally, we want to remove those first instances (the ones in FirstInstances) from the list of duplicates we found (the ones in AllDuplicates). Here’s how we do that:

...
  AllDuplicates.Filter(FirstInstances.Contains(CurrentValue).Not())

And that’s it! That’s how we get our beautiful result:
image

9 Likes

Wow. Really thorough Post! So great how you not only give a Solution but explain all the backgrounds.
Thank you

1 Like