Quick little conditional format that I often use to find the duplicates in a table -

The formula I have used here is

Table.Filter(Column.Contains(thisrow.Column).Count() >1

Quick little conditional format that I often use to find the duplicates in a table -

The formula I have used here is

Table.Filter(Column.Contains(thisrow.Column).Count() >1

26 Likes

Nice @mallika! I definitely use this all the time for deduping (but usually put this into the filter directly so I have only the remaining results that need to be scrubbed; filter ends up being `lookup([thistable], Name, thisrow.Name).count() >1`

.

6 Likes

The real question is what in the world are you making that requires that much Absinthe?

7 Likes

Wouldnâ€™t you like to know

-spending the weekend working on sanitized examples-

The good news is Iâ€™m pretty sure most alcohols can be used as antibacterials

2 Likes

wondering if you can expand on thisâ€¦

Iâ€™m trying to do this as a formula in the document that counts all the different names that are list more than once.

I canâ€™t quite figure it outâ€¦ can it be done @mallika?

Hi Tom, sorry I was traveling for a bit. Did you figure this one out?

sure didnâ€™t! have been working on other things! If you have any insight that would be great though! hope you had an awesome trip

The formula above should work for you - give you the number of times a a name has been listed and you can filter to remove the ones that are less than 1. Or use the filter Evan used.

However, I am happy to take a look at your particular document - share it with support@coda.io or mallika@coda.io and I can take a look.

(I did have a wonderful time traveling and am back ready to look at docs and more docs! )

Maybe because of some changes in Coda, I think there a couple things to update about this:

- missing a close bracket here after â€śthisrow.Column)â€ť
- when trying to use this formula in my doc, it keeps autocorrecting out the â€śthisrow.â€ť, and works without it

The new formula should read:

Table.Filter(Column.Contains(Column)).Count()>1

@mallika would mind updating the original post for othersâ€™ later reference?

2 Likes

Thank you that was great help! It works for me but with a quirk.

Iâ€™d like to add some more detail for others that might come across this situation.

Seems like Table.Column and thisRow.Column are shown in the same way currently in Coda. But you have to make sure that you choose the correct version.

thisRow.Column for the first instance

And Table.Column for the second instance

1 Like

How would I tweak this to only highlight duplicates within the same group?

1 Like

@Guttorm_Flatabo Welcome!

I am glad you got the filter working. Do you want to elaborate on what did not work for you?

Does anyone know how to edit the formula so that it doesnâ€™t recognize blank cells as â€śduplicatesâ€ť?

Hi, @Brittany_Dufresne!

Add `AND thisRow.Column.IsNotBlank()`

inside the filter.

This formula checks if value of the column â€śColumnâ€ť for the currently evaluated row is not blank (that it has any value).

**Full formula:**

`Table.Filter(Column.Contains(thisRow.Column) AND thisRow.Column.IsNotBlank()).Count() > 1`

5 Likes

This is awesome, thank you!

Hello,

This formula causes big performance problems on a 825 rows table (5 minutes 37 seconds loading time). Is there a more optimal way to proceed?

I tried moving the formula to a checkbox column, then applying conditional formatting based on that column, since I understood that it was better to put formulas in columns to optimize performance. But it doesnâ€™t seem to change anything.

Thanks in advance!

1 Like

Hi @Mata-i ,

Since performance matters to you, I suggest using an alternative approach to finding duplicates. The ones involving Filter and Count/CountIf will give you the correct answer, but they have to loop through the whole data set fully for *each* item in the data set (in computational complexity terms, this is an O(n^2) algorithm because it touches each item n^2 times. This is VERY slow!

Instead, first sort the data. Then, compare each value (besides the last) to the one after. If equal, itâ€™s a duplicate. If not, itâ€™s not. This results in an O(n*log(n)) algorithm (the slowest part is the sorting), which is *much* faster than an O(n^2) algorithm.

For comparison, on a ~20k record data set, the first approach (using CountIf and Filter) takes about 40 minutes. The second approach takes less than one second. Both are equally correct.

Here is a sample canvas formula (it ignores duplicate blanks, but that would be easy to change):

```
WithName([Table With Duplicates].FieldToMatch.Sort(), SortedValues,
Sequence(1,SortedValues.Count()-1).FormulaMap(
WithName(SortedValues.Nth(CurrentValue), curVal,
WithName(SortedValues.Nth(CurrentValue+1), nextVal,
If(curVal!=nextVal,"",curVal)))))
.Unique().Filter(CurrentValue.IsNotBlank())
```

If you require an â€śIs Duplicate?â€ť column or a conditional formula, you can name the canvas formula above â€śDuplicatesâ€ť and then use a formula like: `thisRow.FieldToMatch.In(Duplicates)`

, which runs in O(n) time (i.e., very quickly).

If you want to find duplicate table rows based on some of the values of multiple columns, rather than duplicates within a single column, you can also use this more complex version of the formula:

```
WithName(
[Table With Duplicates]
.Sort(True,[Table With Duplicates].Field1)
.Sort(True,[Table With Duplicates].Field2)
.Sort(True,[Table With Duplicates].Field3),
SortedValues,
Sequence(1,SortedValues.Count()-1).FormulaMap(
WithName(SortedValues.Nth(CurrentValue), curVal,
WithName(SortedValues.Nth(CurrentValue+1), nextVal,
If(Or(
curVal.Field1 != nextVal.Field1,
curVal.Field2 != nextVal.Field2,
curVal.Field3 != nextVal.Field3),
"", List(curVal,nextVal))))))
.ListCombine().Unique().Filter(CurrentValue.IsNotBlank())
```

and `thisRow.In(Duplicates)`

for the column / conditional format formula.

9 Likes

Dear @Matthew_Strax-Haber, thanks a lot sharing this very valuable perspective

I like also your very practical/clear naming in â€śWithName()â€ť, so itâ€™s very easy to understand the code.

Always eager to learn