Migrate large nested dataset from one Doc to another

Hi all, as the title suggests, I’m wondering what the best approach would be for me here.

I have a Doc containing a main “Trades” table with a deeply nested relational structure. Trades contain references to items in many other tables:

  • Instrument
  • Screenshots
  • Transactions (which references another table called Accounts)
  • Field Groups → Field Data (a relational structure that lets users add custom fields to their Trades)

And several other tables which I won’t go into here.

My problem is this:

  • I want to publish and share my Doc, but first remove all my personal data and fill it with some example data
  • I want to continue to develop and maintain my Doc after release, without having to duplicate the work across my personal version and the published version (error prone)
  • I want a (reasonably simple) process, for myself and others, to migrate existing data into the latest version of the published Doc.

My Doc contains potentially hundreds (if not thousands) of rows in the “Trades” table, and many of the relationships are one-to-many (e.g. a single “Trade” in my use case usually has 3 or more related entries in the Screenshots table).

So whatever process I use needs to be fairly robust, as I will not be able to go through each individual item by hand whenever I want to migrate my data into the latest Doc version.

My assumption is that a simple copy/paste of the rows from each individual table will permanently break all the references to related rows.

Without a solution, publishing a Doc feels like releasing software on a compact disc - it must be completely “finished” before release, because once a user makes a copy of it, they will never be able to update to the latest version if their dataset has grown past the point where they can meticulously transfer everything by hand.

For now, let’s assume that updates to the Doc will be “compatible”, i.e. no major changes to the core data structure - the changes will be more superficial in nature (e.g. adding some new charts or calculations to my dashboard or similar).

I figured I can’t be the only person out there facing this problem :slight_smile:

I’m not averse to the idea of developing some sort of Pack that could migrate the data for me, even if the code was tightly coupled to the Doc version - it just depends on how much work is involved.

So, what do you think would be the best way for me to approach this?

Thanks in advance.

Hi, Traderunner

I wouldn’t simply make that assumption (that copy/paste will permanently break all the references to related rows) - I have been pleasantly surprised in the past. So try things out a bit.

I am hoping that Coda will implement upgrade/version management for docs, in addition to what they are doing for Packs.

The ideas that I am playing with unfortunately involve a fair amount of functionality change, at least in the early days.

If you are only interested in data exchange, there is a pack that allows export to Excel/CSV - I THINK import as well. What you could then consider is making a pack with functions that export all of the tables per version, functions that import into the new version, and, where necessary, functions that translate between different versions.

What is then left, is anything additional that users might have added to their docs.

Thanks for this,

It seems for copying and pasting, I have 2 options:

  • Copy rows
    This seems the same as the CSV export. References to rows in other tables are copied as plain text (using the display column). When pasting into the other Doc, Coda does not know what to do with this plain text reference, even if a related row with the same display text exists. Makes sense.

  • Copy entire tables from the page level
    Pasting an entire table in gives me a new table, “Trades 2”, with an entirely new reference. Any formulas that referenced the old Trades table now need to be nudged to reference the new one (the difficulty of doing this without mistakes is high, as I have tons of complex formulas that reference it).

But really, as you say, the thing I need is versioning. I can easily maintain the example data in my personal Doc and just remove my personal data when publishing. The problem is that the published Doc does not sit on its own “branch”, so to speak. Instead it will always mirror my personal version.

So the only way is to copy the entire Doc, remove my personal data, and publish that. The big disadvantage is that I will have to create a brand new URL and start over when it comes to SEO.

I came up with a solution to my problem that I’m happy with, so I thought I’d share. Feel free to copy:

I tried a bunch of methods to get data out of Coda and back in again. At first I had a basic Import/Export proof of concept working, using Copy to Clipboard to get the data, then pasting it as JSON into a cell and uploading that object via the API. But I came across a lot of problems, like cell character limits. Every time I hit a limitation, I engineered a solution to get larger quantities of data in or out. In the end I can safely say that I can export/import as much data as I like :slight_smile: The only limitation I have now is whatever limits Coda itself has (in terms of Doc size).

The data model for this example consists of Trades, each of which can have one currency “Pairs” lookup, and multiple Tags. There are also Transactions - each of which has a link to a single Account and a single Trade. In effect, a Trade item can have many Transactions assigned to it (so a reverse lookup).

I have introduced the concept of a “Database” to my Doc - another table to which all tables in the app are linked. This way I can switch the “Database” to view an entirely separate dataset:

  • All tables are filtered by database using the “DatabaseSelect” interactive filter.
  • The “Database” field is set as the default value for new rows on all tables.
  • All lookup select columns filter the available options by the currently selected database:
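
To make that concrete, the two kinds of filter might look roughly like this (a minimal sketch; Database, DatabaseSelect and the Pairs table are just the names used in this post):

    // Filter on every data table view, driven by the DatabaseSelect control
    thisRow.Database = DatabaseSelect

    // Selectable items for a lookup column (e.g. Trades.Pair): only offer rows
    // that belong to the same database as the row being edited
    Pairs.Filter(Database = thisRow.Database)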

This gives us seamless switching between databases. From the user’s perspective, creating a new row in the Database table and selecting it will effectively give them a completely empty app to start filling with data.

Pretty simple so far. However my end goal was to be able to export an entire database and then re-import it (either as a new database in the same doc, or to migrate to another version of the Doc - possibly a future update with extra features a user would like to take advantage of).

Export
The goal with the export is to get a .json file of all data from all tables in one hit, which can then be imported. My first approach to export a row was to create an “Object” column on each table, and have a formula build an object for each row which can later be put into an array. In the end I settled on creating a dedicated “Export” table, containing a row for each Database, and a column for each table to be exported. This way my object definitions are all in one place and it’s much easier to manage:

In each column, I grab all of the rows from the table I’m exporting, filter by the Database of the current row, and construct my row object using the Object() function. The requirement for import is that each exportable table must have a Row ID column (thisRow.RowId()) and an import ID column (which is just a blank cell, for use later). I also include the gridReference string using the hidden ROW_KEY formula so that my import pack can extract the table ID for the API request.

With the required fields out of the way, I then add all of the columns I want to export, being sure to only grab user input (non-formulaic) values. When adding a lookup column, I take the Row ID of the lookup value, so that the relationships between the tables can be paired properly when imported:

[Screenshot 2022-12-09 at 16.12.40: the export column formula]
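
As a rough illustration (the actual formula is in the screenshot above; Name, Pair, Tags, [Row ID], [Import ID] and [Row Key] are placeholder columns based on the example data model), the Trades column on the Export table might be built like this:

    Trades
      .Filter(Database = thisRow.Database)
      .ForEach(
        Object(
          // required for import: the source Row ID, the (still blank) import ID,
          // and the grid reference from the hidden ROW_KEY formula
          "rowId", CurrentValue.[Row ID],
          "importId", CurrentValue.[Import ID],
          "gridReference", CurrentValue.[Row Key],
          // user-entered columns only; lookup columns are exported as Row IDs
          "name", CurrentValue.Name,
          "pair", CurrentValue.Pair.[Row ID],
          "tags", CurrentValue.Tags.[Row ID]
        )
      )

Transactions would get the same treatment, with its Trade and Account lookups exported as Row IDs.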

I then combine the list of rows from each table into a monolithic array, the output of which looks something like this:
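
The sample output itself isn’t reproduced here, but to give an idea of the shape (the values below are made up, following the sketch above):

    [
      { "gridReference": "grid-…", "rowId": 101, "importId": "", "name": "EURUSD long", "pair": 3, "tags": [1, 4] },
      { "gridReference": "grid-…", "rowId": 550, "importId": "", "trade": 101, "account": 2 }
    ]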

On my Databases table, I have an Export button that runs the whole export (sketched below). I am using the free Temp File pack to force the browser to download my JSON array as a file. Great.

The only problem is, the pack can only process a certain amount of text before failing (everything has its limit). To get around this, I create an array sequence to split my monolithic array into smaller chunks. The button action then runs in a loop, triggering the Temp File pack each time - so I get multiple files (database1.json, database2.json, etc.), with the array length of each file depending on my “export chunk length” setting - until the entire array is downloaded.
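
A sketch of what that Export button’s loop might look like (not the actual formula: CreateTempFile() stands in for whatever the Temp File pack’s action is really called, and [Export Data] / [Export Chunk Length] are placeholder columns on the Databases table):

    RunActions(
      Sequence(1, RoundUp(thisRow.[Export Data].Count() / thisRow.[Export Chunk Length]))
        .ForEach(
          WithName(CurrentValue, chunkIndex,
            // CreateTempFile() is a placeholder for the Temp File pack's download action
            CreateTempFile(
              thisRow.[Export Data]
                .Slice(
                  (chunkIndex - 1) * thisRow.[Export Chunk Length] + 1,
                  chunkIndex * thisRow.[Export Chunk Length]
                )
                .ToText(),
              Format("database{1}.json", chunkIndex)
            )
          )
        )
    )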

Import

I use a file field to accept the user’s files. Then I run multiple nested RunActions() and ForEach() loops within my import button action, which performs the following tasks (there’s a rough sketch of this loop right after the list):

  1. Loop through and download all of the JSON files we just uploaded into memory, and reconstruct my monolithic array. For this I developed a simple Pack function to just retrieve a file URL as a JSON string.
  2. Split the array returned into chunks (the length of which is based on another configurable setting) and run a sequence similar to what I have on the export button. But this time, we dump each chunk of array data into a hidden cell. The reason I split it into chunks here is because, again, I found a limitation - a cell can only handle around 40,000 characters of text before it gets upset. So we need to split it up for large datasets.
  3. Once the chunk cell has been populated, my loop hits the “Import chunk” action to actually perform the API request via my pack (this is another action button, which I hid from view).
  4. I then add a delay using the hidden _DELAY() feature before the loop goes around again - just so I don’t end up firing off a million requests at once :slight_smile: - I also added a rate limit setting for this but actually, the API and the Pack Studio seem to be able to handle A LOT of requests (trust me, I’ve been hammering it!), so I’m not sure I really needed it in the end.
  5. At each step, my formula updates various cells in the Database table, so that I can feed back to the user what’s happening with a progress bar and some status messages. The last thing you want when you are performing large, scary file operations is no feedback!
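
Stripped down, the import button is just a couple of nested RunActions()/ForEach() loops along these lines. This is a sketch of the shape only, not the real formula: GetFileAsJson() stands in for the pack formula from step 1, [Chunks] is a hypothetical helper column holding the re-chunked data, and the hidden _Delay() signature is assumed to be (action, milliseconds):

    RunActions(
      // Step 1: pull each uploaded file back in (the pack receives the file as a URL)
      thisRow.ModifyRows(
        thisRow.[Import Data],
        thisRow.[Import Files].ForEach(GetFileAsJson(CurrentValue))
      ),
      // Steps 2-5: stage each chunk, press the hidden Import Chunk button,
      // wait a moment, and report progress back to the user
      Sequence(1, thisRow.Chunks.Count()).ForEach(
        RunActions(
          thisRow.ModifyRows(
            thisRow.[Current Chunk], thisRow.Chunks.Nth(CurrentValue),
            thisRow.Status, Format("Importing chunk {1} of {2}", CurrentValue, thisRow.Chunks.Count())
          ),
          thisRow.[Import Chunk],
          _Delay(
            thisRow.ModifyRows(thisRow.Progress, CurrentValue / thisRow.Chunks.Count()),
            500
          )
        )
      )
    )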

Here’s what the import Chunk button action looks like. Simple:

[Screenshot 2022-12-09 at 16.32.16: the Import Chunk button action]
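
Going by the description, the hidden button boils down to a single pack call, something like this (ImportChunk and both column names are placeholders, matching the Pack sketch further down):

    ImportChunk(
      thisRow.[Doc ID],        // hypothetical column holding the target doc ID
      thisRow.[Current Chunk]  // the JSON chunk staged by the import loop above
    )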

My Pack function then takes the data chunk, transforms it into something the Coda API understands, and inserts the rows. But before it calls the API, it also copies the Row ID to the empty Import ID column. This way, when the data is received by the API, Coda can link all of the row relationships back together. The only problem is that for this to work, the user has to set the “Import ID” column as the display column on each table before they run the import. I wish there was a way to automate this, or some better way to link the rows without this manual step (I can’t tell you how many times I forgot to do it while developing this feature!), but I can live with this for now.
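
For what it’s worth, here is a minimal sketch of what such a Pack could look like, written against the public Packs SDK and the Coda REST API’s rows endpoint. It is not the author’s code: the ImportChunk name, its parameters, the chunk format, and the way the table ID is pulled out of the gridReference are all assumptions carried over from the sketches above.

    import * as coda from "@codahq/packs-sdk";
    export const pack = coda.newPack();

    // The pack talks to the Coda REST API, so it needs the domain and an API token.
    pack.addNetworkDomain("coda.io");
    pack.setUserAuthentication({type: coda.AuthenticationType.HeaderBearerToken});

    pack.addFormula({
      name: "ImportChunk",
      description: "Insert a chunk of exported rows into the target doc via the Coda API.",
      isAction: true,
      parameters: [
        coda.makeParameter({type: coda.ParameterType.String, name: "docId", description: "Target doc ID."}),
        coda.makeParameter({type: coda.ParameterType.String, name: "chunk", description: "JSON array of exported row objects."}),
      ],
      resultType: coda.ValueType.String,
      execute: async function ([docId, chunk], context) {
        const rows: any[] = JSON.parse(chunk);

        // Group the exported rows by table, using the table ID embedded in the
        // gridReference (the exact shape of ROW_KEY's output is an assumption here).
        const byTable = new Map<string, any[]>();
        for (const row of rows) {
          const tableId = row.gridReference.split("/")[0];
          if (!byTable.has(tableId)) byTable.set(tableId, []);
          byTable.get(tableId)!.push(row);
        }

        for (const [tableId, tableRows] of byTable) {
          const body = {
            rows: tableRows.map(r => ({
              cells: Object.entries(r)
                // drop the bookkeeping fields; assume every other key matches a column name
                .filter(([key]) => !["rowId", "importId", "gridReference"].includes(key))
                .map(([column, value]) => ({column, value}))
                // copy the exported Row ID into the Import ID column so that, with
                // Import ID set as the display column, Coda re-links the lookups
                .concat([{column: "Import ID", value: r.rowId}]),
            })),
          };
          await context.fetcher.fetch({
            method: "POST",
            url: `https://coda.io/apis/v1/docs/${docId}/tables/${tableId}/rows`,
            headers: {"Content-Type": "application/json"},
            body: JSON.stringify(body),
          });
        }
        return "Imported " + rows.length + " rows";
      },
    });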

Once the data is imported via the API, and Coda has completed its task of re-linking the rows and bringing them into the Doc, we end up with a bunch of non-empty “Import ID” fields that we don’t want. If we want to import another dataset, those leftover fields will cause problems - data could get overwritten or linked incorrectly. So I needed a “Clean” button that lights up when the import completes - it simply erases all of the temporary import IDs in all tables, “cleaning” my database.
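
A minimal sketch of what that “Clean” button could do, using the example tables from this post (the real doc would list every imported table):

    RunActions(
      // blank out the temporary Import IDs left over from the last import
      Trades.Filter([Import ID].IsNotBlank()).ModifyRows(Trades.[Import ID], ""),
      Transactions.Filter([Import ID].IsNotBlank()).ModifyRows(Transactions.[Import ID], ""),
      // ...and so on for the remaining tables...
      thisRow.ModifyRows(thisRow.Status, "Database cleaned")
    )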

After all is said and done, we end up with the following user experience, which is not bad at all in my opinion (I’m using silly small file sizes to demonstrate the chunking method):

[Animation: import-export-demo]

I have tested this method with around 40,000 rows - and while Coda definitely slows down and the import takes a while, the process runs smoothly and reliably with basically unlimited amounts of data, thanks to the chunking/splitting approach.

I learned a lot about Coda’s limitations while developing this feature. But I have to say I am very impressed with the sheer scope of creativity this platform provides, and with how powerful RunActions() is and its sequential nature (it can be used like a promise chain, which I hadn’t realised previously).

The only downside is, I am relying heavily on hidden formulas throughout the entire process. But I can live with it :slight_smile:

Hopefully this helps someone out.

Enjoy your weekends :beers:


This is so convoluted lol but it makes me really happy. Nice work.


:joy: I know right. I spent a lot of time trying to simplify things and get around the limitations in the most basic way I possibly could.

I also don’t mind if the code is a bit complicated, so long as the user experience and the maintenance of the output objects stay easy and simple.

One thing that would make a huge difference is if I were able to access the Column ID reference through some sort of hidden formula or hack, instead of hard-coding the column names/IDs - but it doesn’t seem to be possible at this point.

That and the rather annoying step of having to adjust the display column on every table before importing. But it is what it is.
