Dedupe an array of arrays based on a common key/object with formulamap()

Paul_Danyliuk · May 16, 2020, 4:03pm

Alright, here’s an explanation (also noticed a problem with my solution while writing this post, lol, so fixed and also simplified).

Normally the solution would look like this:

Get the list of unique emails as
```
[List of lists].FormulaMap(
  CurrentValue.Last()
)
```
given that emails are last in each list.
Run a simple formula map:
```
[Unique emails].FormulaMap(
  [List of lists].Filter(CurrentValue.Last() = uhm… CurrentValue).First()
)
```
i.e. for each email (CurrentValue) filter the list of lists down to only those entries that correspond to this email and out of those matched rows select the first one.

But I guess you can see the problem already. Since the CurrentValue variable is overwritten within Filter() with a record from the list of lists, there’s no way to access the outer CurrentValue that would mean the current email to find records for.

So the trick here is to make a full product of M emails and N records and then filter/slice based on that. This way we can iterate over all combinations of inner and outer CurrentValue (because our CurrentValue will essentially be a List(OuterCurrentValue, InnerCurrentValue). And because of the cyclic nature of this new list (items 1…N correspond to 1st email, items N+1…2N correspond to 2nd email etc) it’s easy to extract and only use the portion of the list we’re interested in within this iteration.

So here’s what I’m doing:

Inner and outer count. This is straightforward:
I’m composing a full product of two lists, i.e. all combinations of items from the right and items from the left here:

image1189×214 30 KB

The math is like this:
- For M x N times (starting at zero for convenience)
- Make a list out of two elements
- Where the first element will be taken from the list of emails, but for the first N rows it will be the first email, for the second N rows the 2nd email and so on. That’s the RoundDown(CurrentValue / [Inner count]) plus 1 to get index back to 1-based.
- And the 2nd parameter is a corresponding item from the list of records, repeated with a cycle of N. That’s the Remainder() formula.
In the same cell (because why not) I’m replacing items where the join fails (i.e. email CurrentValue.First() doesn’t match the one in the record CurrentValue.Last().Last()) with blanks, and unwrapping the rest:

image1026×369 29.4 KB

Now we got our records aligned in an MxN matrix in such way that we know:
- the first N items (records or blanks) correspond to the first email
- the second N items correspond to the second email, and so on.
So all that’s left is to select one record from each sequence of N items:
- For each of M emails (and we don’t care about the emails anymore — we just know to take the next N items for M times) slice the corresponding sublist:
  
  image786×268 20.1 KB
  
  i.e., items 1–3 for the first email, items 2–6 for the 2nd email etc
- filter out the blanks
- and finally take the First() one of those remained (could’ve taken Last(), whatever).

Hope this explains the trick. There’s not much genius in this — I think any person with some CS and competitive programming background would’ve considered this approach.

Topic		Replies	Views
Formula for unique combinations	10	1698	October 13, 2021
FormulaMap() help: Not use any formula or button column	9	883	June 21, 2022
Removing Duplicates by Email using a button	3	966	March 8, 2021
FormulaMap: From textfield >> list >> adding the single items to a new table	4	906	March 31, 2020
Lookup different values from list using formula map?	1	1115	August 6, 2020

Dedupe an array of arrays based on a common key/object with formulamap()

Related topics