The Purge! Removing Orphaned Rows From A Multi-Tiered Database

Most real-world workflows will involved ‘Multi-Tiered’ databases of linked tables.
But, over time (especially during development), we can end up with rows in lower tiers that are no longer linked to valid ‘parent’ rows in the higher tiers. We call these ‘disconnected’ rows “orphans”.

For example, we might have CUSTOMERS who place many ORDERS which contain several ITEMS.
So the ITEMS table has a lookup-column, MyOrder, that links it to its order in the ORDERS table.
And the ORDERS table has a lookup-column, MyCustomer, that links it to its customer in the CUSTOMERS table.

We often find that, over time, we end up with ORDERS that are no longer linked to valid CUSTOMERS, and many ITEMS that are no longer linked to valid ORDERS. They are caused by people deleting CUSTOMERS and leaving their ORDERS behind or by adding rows that are not properly linked into the rest of the data-model. Often this also happens during development while we are testing buttons and automations.

So we need a way to carefully and safely remove all these “Orphans” from our database to leave only those rows that are properly linked to live parents etc.

To do this, we create a canvas-button, PurgeOrphans!, on a hidden page that has the following action-code (for our CUSTOMERS->ORDERS->ITEMS example)…

RunActions(
    DeleteRows(Filter(ORDERS,Not(MyCustomer.In(CUSTOMERS)))),
    DeleteRows(Filter(ITEMS,Not(MyOrder.In(ORDERS))))
)

We need to start at the top tier and work our way down through the tiers for the purge to work properly.

How It Works (using the ORDERS as an example)…

  • the MyCustomer.In(Customers) selects all ORDERS that are correctly linked to existing CUSTOMERS
  • the Not() reverses that selection - selecting all ORDERS that are NOT linked to existing CUSTOMERS
  • so the Filter(ORDERS,Not(MyCustomer.In(CUSTOMERS))) returns a list of all the orphaned ORDERS rows
  • which is then used by the DeleteRows() function to remove those orphaned rows

Having done this for the ORDERS table, the next action does the same for the orphans in the ITEMS table.

In programming terms, this is known as “Garbage Collection”, an automatic way to identify and delete anything in memory that can no longer be ‘reached’ by the valid objects in the system. But I prefer “Purge Orphans” as being a better description of what it does.

No human-orphans were harmed during the production of this post :wink:

Max

UPDATE: this process is efficient and compact BUT if you enter the formula incorrectly, it could delete a LOT of the wrong rows - so please see my SAFER “2-Phase Process” in the replies below…

13 Likes

I appreciate your compassion towards human orphans…

2 Likes

Thank you Max for the clear explanation of how the action actually works instead of just sharing the formula for it.

I find this to be very helpful because I always come up with similar cases in which people didn’t do the correct procedure to eliminate records in the database and we end up with orphan records.

I’ll definitely consider using this approach as a generic automation in all my docs!

1 Like

Ok, my conscience has been bothering me about the Purge code I explained above.

While it is efficient, effective and very compact; it is a bit UNSAFE, especially for novice makers.

It is fine if you enter the formula CORRECTLY; ie if the correct LOOKUP columns and their corresponding parent TABLES are listed correctly - it works a treat.

BUT

If you were to enter the wrong details, then there is no way to TEST it first, and it COULD end up deleting a LOT of the wrong rows in an instant. And if you detect that immediately, you can roll-back the changes with Control_Z.

But if you DONT detect the issue immediately, then recovery is still possible (using the document history) but not at all easy!

SO

I am changing my proposed process to a TWO-PHASE approach as follows

  • Phase 1 Button; will MARK all the orphaned rows for deletion
  • Phase 2 Button; after you have checked the results, the marked rows are deleted

This requires you to add an extra column, a check-box, Delete?, to every table

Phase 1 code

RunActions(
    ModifyRows( Filter( ORDERS, Not( MyCustomer.In( CUSTOMERS ))), Delete?, TRUE),
    ModifyRows( Filter( ITEMS, Not( MyOrder.In( ORDERS))), Delete?, TRUE)
)

Then you check that the Delete? flag is set ONLY for those cases where rows are orphaned.
Only then do you execute the Phase 2 code below

Phase 2 code…

RunActions(
    DeleteRows(Filter(ORDERS, Delete?)),
    DeleteRows(Filter(ITEMS, Delete?))
)

This is not as elegant, but is a much safer process.

Max

5 Likes

This post is very helpful to me Max. Thank you!