Highlight duplicates in a column using a conditional format

Quick little conditional format that I often use to find the duplicates in a table -

The formula I have used here is
Table.Filter(Column.Contains(thisrow.Column).Count() >1

27 Likes

Nice @mallika! I definitely use this all the time for deduping (but usually put this into the filter directly so I have only the remaining results that need to be scrubbed; filter ends up being lookup([thistable], Name, thisrow.Name).count() >1.

9 Likes

The real question is what in the world are you making that requires that much Absinthe?

8 Likes

Wouldn’t you like to know :smiley:
-spending the weekend working on sanitized examples-

The good news is I’m pretty sure most alcohols can be used as antibacterials

2 Likes

wondering if you can expand on this…

I’m trying to do this as a formula in the document that counts all the different names that are list more than once.

I can’t quite figure it out… can it be done @mallika?

Hi Tom, sorry I was traveling for a bit. Did you figure this one out?

sure didn’t! have been working on other things! If you have any insight that would be great though! hope you had an awesome trip :slight_smile:

The formula above should work for you - give you the number of times a a name has been listed and you can filter to remove the ones that are less than 1. Or use the filter Evan used.
However, I am happy to take a look at your particular document - share it with support@coda.io or mallika@coda.io and I can take a look.

(I did have a wonderful time traveling and am back ready to look at docs and more docs! )

Maybe because of some changes in Coda, I think there a couple things to update about this:

  • missing a close bracket here after “thisrow.Column)”
  • when trying to use this formula in my doc, it keeps autocorrecting out the “thisrow.”, and works without it

The new formula should read:
Table.Filter(Column.Contains(Column)).Count()>1

@mallika would mind updating the original post for others’ later reference?

2 Likes

Thank you that was great help! It works for me but with a quirk.

I’d like to add some more detail for others that might come across this situation.
Seems like Table.Column and thisRow.Column are shown in the same way currently in Coda. But you have to make sure that you choose the correct version.

thisRow.Column for the first instance
image

And Table.Column for the second instance
image

1 Like

How would I tweak this to only highlight duplicates within the same group?

1 Like

Only thing that worked for me was to use @evan’s filter in the conditional formatting.

1 Like

@Guttorm_Flatabo Welcome!
I am glad you got the filter working. Do you want to elaborate on what did not work for you?

Does anyone know how to edit the formula so that it doesn’t recognize blank cells as “duplicates”?

Hi, @Brittany_Dufresne!

Add AND thisRow.Column.IsNotBlank() inside the filter.

This formula checks if value of the column “Column” for the currently evaluated row is not blank (that it has any value).

Full formula:

Table.Filter(Column.Contains(thisRow.Column) AND thisRow.Column.IsNotBlank()).Count() > 1

5 Likes

This is awesome, thank you!

Hello,

This formula causes big performance problems on a 825 rows table (5 minutes 37 seconds loading time). Is there a more optimal way to proceed?

I tried moving the formula to a checkbox column, then applying conditional formatting based on that column, since I understood that it was better to put formulas in columns to optimize performance. But it doesn’t seem to change anything.

Thanks in advance!

1 Like

Hi @Mata-i ,

Since performance matters to you, I suggest using an alternative approach to finding duplicates. The ones involving Filter and Count/CountIf will give you the correct answer, but they have to loop through the whole data set fully for each item in the data set (in computational complexity terms, this is an O(n^2) algorithm because it touches each item n^2 times. This is VERY slow!

Instead, first sort the data. Then, compare each value (besides the last) to the one after. If equal, it’s a duplicate. If not, it’s not. This results in an O(n*log(n)) algorithm (the slowest part is the sorting), which is much faster than an O(n^2) algorithm.

For comparison, on a ~20k record data set, the first approach (using CountIf and Filter) takes about 40 minutes. The second approach takes less than one second. Both are equally correct.

Here is a sample canvas formula (it ignores duplicate blanks, but that would be easy to change):

WithName([Table With Duplicates].FieldToMatch.Sort(), SortedValues,
  Sequence(1,SortedValues.Count()-1).FormulaMap(
    WithName(SortedValues.Nth(CurrentValue), curVal,
    WithName(SortedValues.Nth(CurrentValue+1), nextVal,
    If(curVal!=nextVal,"",curVal)))))
.Unique().Filter(CurrentValue.IsNotBlank())

If you require an “Is Duplicate?” column or a conditional formula, you can name the canvas formula above “Duplicates” and then use a formula like: thisRow.FieldToMatch.In(Duplicates), which runs in O(n) time (i.e., very quickly).

If you want to find duplicate table rows based on some of the values of multiple columns, rather than duplicates within a single column, you can also use this more complex version of the formula:

WithName(
  [Table With Duplicates]
    .Sort(True,[Table With Duplicates].Field1)
    .Sort(True,[Table With Duplicates].Field2)
    .Sort(True,[Table With Duplicates].Field3),
  SortedValues,
  Sequence(1,SortedValues.Count()-1).FormulaMap(
    WithName(SortedValues.Nth(CurrentValue), curVal,
    WithName(SortedValues.Nth(CurrentValue+1), nextVal,
    If(Or(
      curVal.Field1 != nextVal.Field1,
      curVal.Field2 != nextVal.Field2,
      curVal.Field3 != nextVal.Field3),
      "", List(curVal,nextVal))))))
.ListCombine().Unique().Filter(CurrentValue.IsNotBlank())

and thisRow.In(Duplicates) for the column / conditional format formula.

9 Likes

Dear @Matthew_Strax-Haber, thanks a lot sharing this very valuable perspective :trophy:
I like also your very practical/clear naming in “WithName()”, so it’s very easy to understand the code.

Always eager to learn :student: