Button to Delete Duplicates

Paul_Danyliuk · November 16, 2021, 2:33pm

P.S. @Connor_McCormick, good call on the column formula performance. Frankly, I don’t remember why my answers in those posts you linked to took that approach. Something about the context, I guess; maybe I was just trying to help those people fix whatever they already had.

You fell into this tunnel vision yourself, btw, when you wrote “I’d argue that it should be a button to delete duplicates.” IMO it shouldn’t be just a button; it should be a process, because in real scenarios there’s more to it than just deleting repeated entries:

How do we determine which ones to keep and which ones to delete? Earlier one wins? Or the one that’s edited the most recently?
What if there’s valuable info in the duplicate we’re about to delete, e.g. fresher data or missing values from the original record?
What about row activity and comments? How would that affect which one we keep?
How do we deal with broken references? Some records in our DB could be linked to one copy and some to another.
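To make those questions concrete, here is a minimal Python sketch (not Coda formula language) of one possible policy. It assumes a hypothetical `Contact` record, treats contacts with the same normalized email as duplicates, keeps the earliest-created one, fills the keeper’s empty fields from its duplicates, and returns a redirect map so references to deleted rows could be repointed. Row activity and comments have no equivalent here; those are exactly the parts that need human judgment.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Contact:
    # Hypothetical record shape for illustration; a real table has its own columns.
    id: int
    email: str
    name: Optional[str]
    phone: Optional[str]
    created: datetime


def dedupe(contacts):
    """Group by normalized email, keep the earliest record, merge missing values,
    and map each removed id to its keeper id (for fixing references)."""
    groups = {}
    for c in contacts:
        groups.setdefault(c.email.strip().lower(), []).append(c)

    keepers, redirect = [], {}
    for group in groups.values():
        group.sort(key=lambda c: c.created)  # assumed policy: earliest one wins
        keeper = group[0]
        for dupe in group[1:]:
            # Only fill values the keeper is missing; never overwrite existing ones.
            keeper = replace(
                keeper,
                name=keeper.name or dupe.name,
                phone=keeper.phone or dupe.phone,
            )
            redirect[dupe.id] = keeper.id  # rows linked to dupe should point here
        keepers.append(keeper)
    return keepers, redirect


rows = [
    Contact(1, "ann@example.com", "Ann", None, datetime(2021, 1, 5)),
    Contact(2, "Ann@Example.com ", None, "555-0100", datetime(2021, 3, 2)),
    Contact(3, "bob@example.com", "Bob", None, datetime(2021, 2, 1)),
]
keepers, redirect = dedupe(rows)
```

Switching to “the most recently edited one wins” would mean sorting on a modification timestamp instead, which is the kind of decision the post argues should be made explicitly rather than buried in a button.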

But yes, having a column that actively recalculates an “is duplicate” flag on any change is indeed gonna get horrendously slow. Three better ways:

Have a button on each row that, when clicked, searches for duplicates of that row only
Have an external button like @Connor_McCormick suggests.
Implement an Iterator table like I usually do for this and like you see above.

The first solution is best suited if you want to check for duplicates as you’re entering data into an already big table. I did this for a client that had a table of 8000+ contacts and only wanted to check whether any of a recently added batch overlapped with the existing ones. They would enter a name and an email and press a button inline in that table. It would find whether anything overlapped with the current row, and they could decide whether to keep filling out the row or discard it and move on to the next one.
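As a rough Python analogue of that per-row button (not the actual Coda implementation), the function below only does work when called for one row, so its cost is a single scan of the table per click rather than a flag recomputed for every row. Rows are assumed to be plain dicts, and matching on either name or email, ignoring case and surrounding whitespace, is an assumption for illustration.

```python
def find_overlaps(new_row, existing, fields=("name", "email")):
    """Return rows in `existing` that match `new_row` on any of `fields`.

    Values are compared case-insensitively with surrounding whitespace ignored;
    empty values never count as a match.
    """
    def norm(value):
        return (value or "").strip().lower()

    matches = []
    for row in existing:
        if row is new_row:
            continue  # don't report the row being checked as its own duplicate
        if any(norm(new_row.get(f)) and norm(new_row.get(f)) == norm(row.get(f))
               for f in fields):
            matches.append(row)
    return matches
```

The user then decides what to do with the matches, which mirrors the client workflow described above: keep filling out the row, or discard it.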

The second solution is the fastest because it runs the whole check in memory in a single formula. The big disadvantage, though, is that you cannot debug what it’s doing: if you see something calculated wrongly, you’ll have to guess-fix the formula and rerun the whole check.
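A Python sketch of the “everything in one pass” idea, assuming duplicates are identified by a normalized email key and that the first occurrence is kept. A naive per-row flag compares each row against every other row (roughly n² comparisons), while a single pass with a set of seen keys is linear. Note that it only returns the final list of row positions to delete and keeps no intermediate state, which is why this style is hard to debug.

```python
def default_key(row):
    # Assumed duplicate key: email, ignoring case and surrounding whitespace.
    return row["email"].strip().lower()


def rows_to_delete(rows, key=default_key):
    """Single pass: return indices of every row whose key was already seen."""
    seen = set()
    doomed = []
    for i, row in enumerate(rows):
        k = key(row)
        if k in seen:
            doomed.append(i)  # a later copy; the first occurrence is kept
        else:
            seen.add(k)
    return doomed
```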

I often advocate for the third solution because it’s the most flexible:

You can split your process into separate, clear steps (but you don’t have to)
If split into steps, you can manually test each step (e.g. unit-test the button that merges two contacts on two arbitrary contacts you’ve picked, instead of running the whole process)
You can build extra indication, e.g. a progress bar to show you how dupe checking is going
It’s good practice to separate logic from data
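The iterator-table idea can be sketched in Python as a small state object that advances one row per step, keeps its cursor and a per-step log (the “row” an iterator table would store), and exposes progress. This is a hypothetical model of the concept, not how Coda stores it. The `merge` step is a separate function, so it can be tested on two hand-picked records without running the whole process. Rows are assumed to be dicts keyed by `email`.

```python
from dataclasses import dataclass, field


def merge(keeper, dupe):
    """Merge step, testable on its own: fill the keeper's empty fields from dupe."""
    return {k: keeper.get(k) or dupe.get(k) for k in keeper.keys() | dupe.keys()}


@dataclass
class DedupeIterator:
    rows: list                                    # dicts with an "email" key (assumed schema)
    position: int = 0                             # cursor an iterator table would persist
    keepers: dict = field(default_factory=dict)   # normalized email -> merged row
    log: list = field(default_factory=list)       # one entry per step, for debugging

    @property
    def progress(self):
        # Fraction of rows processed, e.g. to drive a progress bar.
        return self.position / len(self.rows) if self.rows else 1.0

    def done(self):
        return self.position >= len(self.rows)

    def step(self):
        row = self.rows[self.position]
        key = row["email"].strip().lower()
        if key in self.keepers:
            self.keepers[key] = merge(self.keepers[key], row)
            self.log.append((self.position, "merged", key))
        else:
            self.keepers[key] = row
            self.log.append((self.position, "kept", key))
        self.position += 1


it = DedupeIterator([
    {"email": "ann@example.com", "phone": None},
    {"email": "ANN@example.com", "phone": "555-0100"},
])
while not it.done():
    it.step()
    print(f"{it.progress:.0%}", it.log[-1])
```

Because the state lives outside the logic, you can stop after any step, inspect `log` and `keepers`, and resume, which is the flexibility the post attributes to this approach.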

