Highlight duplicates in a column using a conditional format

mallika · February 14, 2018, 11:39pm

Quick little conditional format that I often use to find the duplicates in a table -

The formula I have used here is
Table.Filter(Column.Contains(thisrow.Column).Count() >1

evan · February 15, 2018, 11:41pm

Nice @mallika! I definitely use this all the time for deduping (but usually put this into the filter directly so I have only the remaining results that need to be scrubbed; filter ends up being lookup([thistable], Name, thisrow.Name).count() >1.

cnr · February 17, 2018, 2:55am

The real question is what in the world are you making that requires that much Absinthe?

mallika · February 18, 2018, 9:58pm

Wouldn’t you like to know
-spending the weekend working on sanitized examples-

cnr · February 19, 2018, 3:28am

The good news is I’m pretty sure most alcohols can be used as antibacterials

tom · May 1, 2018, 12:38am

wondering if you can expand on this…

I’m trying to do this as a formula in the document that counts all the different names that are list more than once.

I can’t quite figure it out… can it be done @mallika?

mallika · May 3, 2018, 8:29pm

Hi Tom, sorry I was traveling for a bit. Did you figure this one out?

tom · May 3, 2018, 8:45pm

sure didn’t! have been working on other things! If you have any insight that would be great though! hope you had an awesome trip

mallika · May 3, 2018, 9:46pm

The formula above should work for you - give you the number of times a a name has been listed and you can filter to remove the ones that are less than 1. Or use the filter Evan used.
However, I am happy to take a look at your particular document - share it with support@coda.io or mallika@coda.io and I can take a look.

(I did have a wonderful time traveling and am back ready to look at docs and more docs! )

Nick_HE · January 6, 2019, 6:35pm

Maybe because of some changes in Coda, I think there a couple things to update about this:

missing a close bracket here after “thisrow.Column)”
when trying to use this formula in my doc, it keeps autocorrecting out the “thisrow.”, and works without it

The new formula should read:
Table.Filter(Column.Contains(Column)).Count()>1

@mallika would mind updating the original post for others’ later reference?

Michael_RnW · February 28, 2019, 5:21pm

Thank you that was great help! It works for me but with a quirk.

I’d like to add some more detail for others that might come across this situation.
Seems like Table.Column and thisRow.Column are shown in the same way currently in Coda. But you have to make sure that you choose the correct version.

thisRow.Column for the first instance

And Table.Column for the second instance

Joshua · April 11, 2019, 7:29pm

How would I tweak this to only highlight duplicates within the same group?

Guttorm_Flatabo · April 24, 2019, 12:46pm

Only thing that worked for me was to use @evan’s filter in the conditional formatting.

mallika · April 24, 2019, 7:00pm

@Guttorm_Flatabo Welcome!
I am glad you got the filter working. Do you want to elaborate on what did not work for you?

Brittany_Dufresne · October 28, 2020, 6:12pm

Does anyone know how to edit the formula so that it doesn’t recognize blank cells as “duplicates”?

Andrei_Kharlanov · October 29, 2020, 10:36am

Hi, @Brittany_Dufresne!

Add AND thisRow.Column.IsNotBlank() inside the filter.

This formula checks if value of the column “Column” for the currently evaluated row is not blank (that it has any value).

Full formula:

Table.Filter(Column.Contains(thisRow.Column) AND thisRow.Column.IsNotBlank()).Count() > 1

Gregg_Stebben · February 15, 2021, 1:51am

This is awesome, thank you!

Mata-i · January 27, 2022, 5:56pm

Hello,

This formula causes big performance problems on a 825 rows table (5 minutes 37 seconds loading time). Is there a more optimal way to proceed?

I tried moving the formula to a checkbox column, then applying conditional formatting based on that column, since I understood that it was better to put formulas in columns to optimize performance. But it doesn’t seem to change anything.

Thanks in advance!

Matthew_Strax-Haber · February 20, 2022, 9:43pm

Hi @Mata-i ,

Since performance matters to you, I suggest using an alternative approach to finding duplicates. The ones involving Filter and Count/CountIf will give you the correct answer, but they have to loop through the whole data set fully for each item in the data set (in computational complexity terms, this is an O(n^2) algorithm because it touches each item n^2 times. This is VERY slow!

Instead, first sort the data. Then, compare each value (besides the last) to the one after. If equal, it’s a duplicate. If not, it’s not. This results in an O(n*log(n)) algorithm (the slowest part is the sorting), which is much faster than an O(n^2) algorithm.

For comparison, on a ~20k record data set, the first approach (using CountIf and Filter) takes about 40 minutes. The second approach takes less than one second. Both are equally correct.

Here is a sample canvas formula (it ignores duplicate blanks, but that would be easy to change):

WithName([Table With Duplicates].FieldToMatch.Sort(), SortedValues,
  Sequence(1,SortedValues.Count()-1).FormulaMap(
    WithName(SortedValues.Nth(CurrentValue), curVal,
    WithName(SortedValues.Nth(CurrentValue+1), nextVal,
    If(curVal!=nextVal,"",curVal)))))
.Unique().Filter(CurrentValue.IsNotBlank())

If you require an “Is Duplicate?” column or a conditional formula, you can name the canvas formula above “Duplicates” and then use a formula like: thisRow.FieldToMatch.In(Duplicates), which runs in O(n) time (i.e., very quickly).

If you want to find duplicate table rows based on some of the values of multiple columns, rather than duplicates within a single column, you can also use this more complex version of the formula:

WithName(
  [Table With Duplicates]
    .Sort(True,[Table With Duplicates].Field1)
    .Sort(True,[Table With Duplicates].Field2)
    .Sort(True,[Table With Duplicates].Field3),
  SortedValues,
  Sequence(1,SortedValues.Count()-1).FormulaMap(
    WithName(SortedValues.Nth(CurrentValue), curVal,
    WithName(SortedValues.Nth(CurrentValue+1), nextVal,
    If(Or(
      curVal.Field1 != nextVal.Field1,
      curVal.Field2 != nextVal.Field2,
      curVal.Field3 != nextVal.Field3),
      "", List(curVal,nextVal))))))
.ListCombine().Unique().Filter(CurrentValue.IsNotBlank())

and thisRow.In(Duplicates) for the column / conditional format formula.

Jean_Pierre_Traets · February 21, 2022, 8:49am

Dear @Matthew_Strax-Haber, thanks a lot sharing this very valuable perspective
I like also your very practical/clear naming in “WithName()”, so it’s very easy to understand the code.

Always eager to learn

Topic		Replies	Views
Find and mark duplicates automatically	10	396	January 12, 2024
How to find out duplicate value on the row and delete it. Need help!	4	5448	December 2, 2019
Button to Delete Duplicates Tips and Hacks	15	5292	September 5, 2022
Aligning Rows based on duplicates	1	683	March 25, 2019
Sequential Number To Duplicate Or Unique Values	15	507	July 16, 2022

Highlight duplicates in a column using a conditional format

Related topics