Button to Delete Duplicates

P.S. @Connor_McCormick good call on the column formula performance. Frankly, I don’t remember why my answers in those posts you linked to were specifically that. Something about the context, I guess — maybe I was just trying to help those people fix whatever they already had.

You fell into this tunnel vision yourself btw :wink: I’d argue with the idea that it should be a button to delete duplicates. IMO it shouldn’t be just a button; it should be a process. Because in real scenarios there’s more to it than just deleting repeated entries:

  • How do we determine which ones to keep and which ones to delete? Earlier one wins? Or the one that’s edited the most recently?
  • What if there’s valuable info in the duplicate we’re about to delete, e.g. fresher data or missing values from the original record?
  • What about row activity and comments? How would that affect which one we keep?
  • How do we deal with broken references? Some records in our DB could be linked to one copy and some to another.
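
To make those questions concrete, here is a rough sketch of one possible merge policy, in plain Python rather than Coda formulas. Everything in it is an assumption for illustration: the `Contact` shape, the field names, and the two `keep` policies. It picks a survivor, fills the survivor's blank fields from the other copies, and returns a map for re-pointing references. It doesn't touch row activity or comments.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Contact:
    # Hypothetical record shape; all field names are assumptions.
    id: int
    name: str
    email: Optional[str]
    phone: Optional[str]
    created: datetime
    modified: datetime


def merge_duplicates(records, keep="earliest"):
    """Pick one survivor and fill its blank fields from the other copies.

    Returns (survivor, redirect). `redirect` maps each id that would be
    deleted to the survivor's id, so references can be re-pointed.
    Note: the survivor object is modified in place.
    """
    if keep == "earliest":
        ordered = sorted(records, key=lambda r: r.created)
    elif keep == "latest_edit":
        ordered = sorted(records, key=lambda r: r.modified, reverse=True)
    else:
        raise ValueError(f"unknown keep policy: {keep}")

    survivor, others = ordered[0], ordered[1:]

    # Fill gaps from the most recently edited copies first,
    # so fresher data wins when several copies have a value.
    for other in sorted(others, key=lambda r: r.modified, reverse=True):
        for attr in ("email", "phone"):
            if getattr(survivor, attr) is None and getattr(other, attr) is not None:
                setattr(survivor, attr, getattr(other, attr))

    redirect = {other.id: survivor.id for other in others}
    return survivor, redirect
```

The `redirect` map is the part that addresses broken references: anything linked to a deleted copy gets re-pointed to the survivor before the delete happens.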

But yes, having a column that actively recalculates an “is duplicate” flag on any change is indeed gonna get horrendously slow. Three better ways:

  1. Have a button on each row that, on click, searches for duplicates of that row only.
  2. Have an external button like @Connor_McCormick suggests.
  3. Implement an Iterator table like I usually do for this and like you see above.

The first solution is best suited if you want to check for duplicates as you’re entering data into an already big table. I did this for a client that had a table of 8000+ contacts and only wanted to check whether any of the recently added bunch could be overlapping with the existing ones. They would enter a name and an email and press a button inline in that table. It would find whether anything overlapped with the current row, and they could decide whether to keep filling out the row or discard it and move on to the next one.
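
Roughly what that per-row button does, sketched in plain Python (not the actual Coda formula I used). The row shape with `id`, `name` and `email` keys, and the matching rule "same name or same email, ignoring case and surrounding spaces", are assumptions for illustration.

```python
def normalize(value):
    """Lowercase and trim a value so 'Ann ' and 'ann' compare equal."""
    return (value or "").strip().lower()


def find_overlaps(current, table):
    """Return ids of rows in `table` that share a name or email with `current`.

    Only the clicked row is compared against the table, so the cost is
    one pass over the table instead of comparing every pair of rows.
    Blank names and emails never count as a match.
    """
    name = normalize(current.get("name"))
    email = normalize(current.get("email"))
    matches = []
    for row in table:
        if row["id"] == current["id"]:
            continue  # don't report the row as a duplicate of itself
        same_name = bool(name) and normalize(row.get("name")) == name
        same_email = bool(email) and normalize(row.get("email")) == email
        if same_name or same_email:
            matches.append(row["id"])
    return matches
```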

The second solution is the fastest because it runs the whole check in memory in a single formula. The big disadvantage though is that you cannot debug what it’s doing: essentially if you see something calculated wrongly, you’ll have to guess-fix the formula and rerun the whole check again.
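
For comparison, the "whole check in one go" idea looks something like this in plain Python. It's a sketch under assumptions (rows are dicts with an `id`, and duplicates are defined by a single normalized key such as `email`), not what the external button's formula actually contains.

```python
from collections import defaultdict


def find_duplicate_groups(rows, key="email"):
    """Group row ids by a normalized key in a single pass.

    Returns only the groups with more than one row, i.e. the duplicates.
    Rows with a blank key are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        value = (row.get(key) or "").strip().lower()
        if value:
            groups[value].append(row["id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}
```

It's fast because nothing is stored between steps, and that's also why it's hard to debug: you only ever see the final result.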

I often advocate for the third solution because it’s the most flexible:

  • You can split your process down into separate clear steps (but you don’t have to)
  • If split into steps, you can manually test each step (e.g. unit-test your merge button on two arbitrary contacts that you’ve picked, instead of running the whole process)
  • You can build extra indication, e.g. a progress bar to show you how dupe checking is going
  • It’s good practice to separate logic from data
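
A loose Python analogy of the iterator approach, just to show the shape of it. This is not how the Coda Iterator table is built; the function names and the email-only matching rule are assumptions. The point is that the per-row step is its own function you can test on hand-picked rows, and the loop reports progress as it goes.

```python
def email_key(row):
    """Normalized email used for matching; blank if missing."""
    return (row.get("email") or "").strip().lower()


def check_row(row, rows):
    """One step: ids of the other rows that share this row's email."""
    key = email_key(row)
    if not key:
        return []
    return [r["id"] for r in rows if r["id"] != row["id"] and email_key(r) == key]


def run_checks(rows):
    """Process one row per step, yielding (fraction_done, row_id, duplicate_ids).

    Because it is a generator, the caller can stop midway, inspect a
    step's result, or draw a progress bar from fraction_done.
    """
    total = len(rows)
    for done, row in enumerate(rows, start=1):
        yield done / total, row["id"], check_row(row, rows)
```

You can call `check_row` on any two rows you like without running the whole thing, which is the same benefit as testing a single step button by hand.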

