Missing numbers in a sequence

Breno_Nunes · September 25, 2021, 11:38pm

I’ve been trying to implement something in Coda that I thought would be very easy but I just can’t get it done.
Imagine you have a sequence 1,2,3,4,6,7,8,10,11 . I would like to find out the numbers that are missing . In this case, 5 and 9.

Scott_Collier-Weir · September 26, 2021, 2:38am

Here you go!

The short answer is this:

I don’t know about you but those long formulas are always so difficult for me to read, so if thats the case for you - I made an example doc that explains the thought process as to how to find it in depth, splitting the formula into 4 digestable pieces.

Example Doc

It can take as inputs any list of numbers in any order (does not have to be sequential) and then outputs the missing numbers. If your exact use case is a bit different you should be able to still use the concept!

Christiaan_Huizer · September 26, 2021, 7:20am

hi @Scott_Collier-Weir and @Breno_Nunes

this one is written a bit shorter and follows the same logic:

sequence(1,thisRow.name.Split(",").Max()).Filter(CurrentValue.Contains(thisRow.name.Split(",")).Not())

cheers, Christiaan

Paul_Danyliuk · September 26, 2021, 9:49am

I’d like to once again reiterate on what I tried to teach you just a few days ago.

Do the answers above work? Yes, absolutely.
Are they the most optimal? Not always.
Is this a problem for short sequences? Not at all, but it’s still a good exercise to try and find an alternative solution that could be better in certain situations.

What the above algorithm does:

For each of N items
- Check if the data set contains that item (by performing at least one, at most N checks of the item against each item from the set)

Therefore the complexity is O(N^2), i.e. the larger is the sequence, the more time it would take quadratically to calculate this. I.e. twice the numbers would take 4 times longer to calculate, 4 times the numbers would take 16x longer and so on.

Since the incoming dataset is already sorted, we could optimize the algorithm by only comparing N+1th with the Nth item and collecting the missing numbers between. The algorithm is more complicated and most likely will take longer time to calculate on small data sets, but on larger datasets it would win because it is linear, i.e. the number of operations (checks or additions to the list) and the time to calculate would grow proportionately to the number of items, i.e. 4 times the items would take not 16x the time but only 4x the time.

Sequence(2, Items.Count()).FormulaMap(
  If(
    Items.Nth(CurrentValue) - Items.Nth(CurrentValue - 1) != 1,
    Sequence(Items.Nth(CurrentValue - 1) + 1, Items.Nth(CurrentValue) - 1),
    List()
  )
).ListCombine()

I guess it’s time to record my own course on Coda performance and optimizations. Who’d like one?

Pch · September 26, 2021, 4:16pm

!

Breno_Nunes · September 26, 2021, 9:32pm

Thank you @Scott_Collier-Weir , @Christiaan_Huizer for your help. I really appreciate it.
Hi, @Paul_Danyliuk. Thank you for your help too. I didn’t get your formula to work, though. It seems there is something missing before list().
As my sequence has more than 1000 numbers, I believe, for my case, it’s more appropriate.

Christiaan_Huizer · September 27, 2021, 3:30am

hi @Breno_Nunes ,

This one works, the last argument of ‘if’ was not executed due to a missing comma:

Sequence(2, thisRow.items.Count()).FormulaMap(If(thisRow.items.Nth(CurrentValue) - thisRow.items.Nth(CurrentValue - 1) != 1,Sequence(thisRow.items.Nth(CurrentValue - 1) + 1,thisRow.items.Nth(CurrentValue) - 1),List())).ListCombine()

Your questions brought a great @Paul_Danyliuk insight to the community, thx

Ander · September 27, 2021, 2:39pm

@Breno_Nunes

I use this approach from time to time:

thisRow.[base list].FormulaMap(
  If(
    CurrentValue.In(thisRow.[actual list]),
    "",
    CurrentValue
  ) 
)
.Filter(CurrentValue.IsNotBlank())

Paul_Danyliuk · September 27, 2021, 9:11pm

Your approach is basically filter with extra steps What you have is logically equal to:

thisRow.[base list].Filter(
  CurrentValue.In(thisRow.[actual list]) != true
)

Again, to put my money where my mouth is, I created a simple benchmark doc. You can select the number of items (sequence from 1 to N) and fill % (how much of the sequence exists, e.g. 20% would mean that at average only 2 out of 10 numbers will exist and the rest 8 will be missing).

I improved both algorithms:

In the naive one I replaced .Max() with .Last() because the last is known to be already the max one, and getting Last() is 1 operation while calculating Max() is N operations.
In my algo I replaced the creation of empty list within an If() with returning a blank value, and added a .Filter(IsNotBlank()) instead. I have a feeling that merging and then filtering blank items could be a bit faster than initializing empty lists (I’m not so sure Coda optimized it under the hood by returning singleton empty lists).

Anyway, here’s the quite interesting observations:

On very sparse lists (only ~20% of items present) the linear algorithm wins on data sets over 100 items. The simple one wins on smaller ones.
On lists where only certain items are missing, the simple algorithm wins all the way. I guess the slowdown probably comes from the inner Sequence(start, end) generation that creates too many single item lists, and then ListCombine() has to unpack these. Ideally we could append to a single list instead of creating hundreds of single-element lists, but we don’t have that in Coda unfortunately.

The doc:

I’ll probably keep digging. For now actually consider using @Scott_Collier-Weir’s / @Christiaan_Huizer’s algo but replace .Max() with .Last() and remove .Sort() if you know your set is already sorted. Anyway this is a great illustration on how optimization is a fine craft and there’s no silver bullet to work under any circumstances — each client’s situation is different and calls for a different solution.

Ander · September 28, 2021, 12:51am

True.

Clients should weigh what these kinds of 2.5 second optimizations are worth to their business.

They may be worth a lot, depending on other factors like frequency, dependencies, etc.

They may be nice to have, but the business has bigger fish to fry.

Or they may have no practical value at all.

Each client’s situation is different.

Scott_Collier-Weir · September 28, 2021, 1:23am

True! Opportunity cost - overall, is the work of optimizing in these cases worth the extra speed?

If it’s just 2.5 seconds, and to gain back those 2.5 is a lot of work, then probably not.

But more often than not I’d say it’s definitely worth it. Even if just 1 second. So often clients come to me with documents that they are ready to give up using, despite the function of the doc because of the way those 1 second optimizations stack up over an entire doc.

Overall, and in almost every case, I would say doing the work to optimize is worth it. At the very least, when building out systems that eventually run crucial aspects of a business these questions should be a part of every decision and every formula (even if the optimal formula isn’t written every time, it should still be end goal)

@Paul_Danyliuk has some good points - no silver bullet to optimization, but no good doc should be built without optimization as a foundational goal.

If it a doc is slow and clunky, no one is going to use it.

Anyways - this is way off topic from missing numbers in a sequence. Wonder if we can move the thread elsewhere??

Topic		Replies	Views
Comparison - Not Contained	19	1111	March 17, 2023
Looking for insights to suppress some redundancy in a "long" formula community-champions	6	935	October 20, 2021
No help needed - Just a question to ask about : Concatenate() a list of sequential and/or non-sequential numbers?	5	1837	October 6, 2020
[SOLVED] Difference between two lists	9	2966	April 7, 2019
Concatenate sub sequences	3	225	October 31, 2023

Missing numbers in a sequence

Related topics