I decided to answer your question inline in this thread because it’s more related to the supposition that we might be able to create vast data scale under a Coda document. And despite your interpretation of my earlier comment where I said…
… I am not doing this in a Coda app because it is [presently] not possible (more on that later). But I am doing it with very weak clients such as Google Sheets.
As you know, spreadsheets are ideal user interfaces for many things, including FinTech apps. One of my customers has 800,000 clients, each with an average of 2,200 transactions. That’s a big number when you factor in the complete data set across all clients.
Workers using this dataset need a spreadsheet to perform the analyses and other research required for day-to-day operations. Furthermore, they need to perform certain data science activities such as running machine learning models against individual client data and sometimes comparing client metrics to the entire data set.
These are steep challenges given the depth and breadth of the data and the relatively limited operating ceilings for spreadsheets.
Something’s Gotta Give
And that something is simply a caching approach that allows users to (i) easily select and (ii) near-instantly begin working on a client’s entire dataset. This is achieved with two key capabilities:
- The ability to find a client’s data quickly and painlessly;
- The ability to extract and instantiate the client’s data in a collection of spreadsheet tabs quickly.
#1 Search
As you can imagine, a picklist of 800,000 client names or IDs is out of the question, especially in a spreadsheet app. Search - and a very forgiving search - is the only way to create effortless access to specific client data sets. I do this by maintaining a deeply tokenized search index over the fields likely to contain search terms. The index architecture is essentially that of the open-source Lucene engine (the foundation of Elasticsearch).
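My index lives outside the spreadsheet, but the core Lucene-style idea is easy to sketch: tokenize the searchable fields into leading fragments and map each fragment to the client IDs that contain it, so even a partial word resolves instantly. A minimal sketch (the field names and the edge n-gram scheme here are illustrative assumptions, not my production schema):

```python
from collections import defaultdict

def edge_ngrams(term, min_len=2):
    """Yield leading fragments of a term, e.g. 'smith' -> 'sm', 'smi', 'smit', 'smith'."""
    term = term.lower()
    return [term[:i] for i in range(min_len, len(term) + 1)]

def build_index(records, fields=("name",)):
    """Map every token fragment to the set of client IDs containing it."""
    index = defaultdict(set)
    for rec in records:
        for field in fields:
            for word in str(rec.get(field, "")).split():
                for frag in edge_ngrams(word):
                    index[frag].add(rec["id"])
    return index

def search(index, query):
    """Intersect the posting sets for each query word (all words must match)."""
    sets = [index.get(w.lower(), set()) for w in query.split()]
    return set.intersection(*sets) if sets else set()

clients = [
    {"id": "c1", "name": "Alice Smith"},
    {"id": "c2", "name": "Bob Smithers"},
    {"id": "c3", "name": "Carol Jones"},
]
idx = build_index(clients)
partial = search(idx, "smi")      # matches both Smith and Smithers
narrowed = search(idx, "al smith")  # only Alice Smith
```

The forgiving part comes from the fragments: a user can type two or three characters of any word in any indexed field and still land on the right client.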
Using this approach in Coda is not a big leap, but it does require the ability to integrate a search UI into the Coda framework. A search UI is no different from a web form, and we all know this is not really possible in the current release. Another requirement for a seamless search experience into large data sets is the need for the form to interact with the search index over HTTP. Again, this is not presently possible in Coda, and it’s why I specifically asked about it during the maker webinar.
#2 Extract and Instantiate
The integrated scripting engine (Google Apps Script), which is provided as a containerized application server behind every Google doc, is ideal for processing requests over HTTP to services such as Firebase and Firestore. But the Google Apps Script model also supports inline HTML panels that run in the sidebar of all Google documents.
This makes it possible to run dynamic HTML apps in the context of Google’s server-side containers for automation. Quite literally, an HTML app integrated into a spreadsheet instance can use the real-time features of Firebase. By setting the client ID context in JavaScript in the app, Firebase can almost instantly synchronize the spreadsheet with thousands of transactions. It’s able to do this because (i) a WebSocket connection is zippy fast, and (ii) Google Apps Script handles arrays quite efficiently.
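The production version of this runs in Google Apps Script, where the Firebase payload is painted into a tab with a single batch call (Range.setValues) rather than cell-by-cell writes. The shaping step is the part worth sketching: flatten the transaction objects into a header row plus data rows so one batch write can instantiate thousands of rows. Here it is in Python, with field names that are illustrative assumptions rather than the real schema:

```python
# The headers below are assumed for illustration; the real schema differs.
HEADERS = ["date", "type", "amount"]

def to_rows(transactions, headers=HEADERS):
    """Flatten a list of transaction dicts into a header row plus data rows,
    ready for a single setValues-style batch write into a spreadsheet tab."""
    rows = [list(headers)]
    for txn in transactions:
        rows.append([txn.get(h, "") for h in headers])
    return rows

payload = [
    {"date": "2020-01-03", "type": "buy", "amount": 125.00},
    {"date": "2020-01-09", "type": "sell", "amount": 90.50},
]
grid = to_rows(payload)
# grid[0] is the header row; every following row lines up with HEADERS.
```

The design point is the batching: one rectangular array handed to the sheet in a single call is what makes a few thousand transactions feel near-instant.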
Updating Big Data
Once again, record locking and updates back into the Firebase environment are fully supported because it’s a real-time architecture; changes propagate in real time, which ostensibly eliminates collisions and other concurrent-editing issues. I don’t want to trivialize this, but for most applications, a real-time architecture comes with big advantages that allow me to be lazy.
Data Science Operations
Using 800,000 client records in any sort of analytical process is not easy. You need ways to extract the data and run processes against various aggregations that are maintained as the data changes. But there are some tools that make this practical.
I use Streamlit as the rendering layer for all Python-based data science apps. It includes a very advanced data-caching approach that I do not fully understand, but I certainly enjoy its advantages. It allows me to create a client web app that can load a million items in a few seconds and lets users begin slicing and dicing the data immediately.
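In Streamlit that caching is a decorator on the expensive loader (st.cache_data in current releases), so script reruns slice an in-memory copy instead of refetching the source. A rough stand-in using only the standard library shows the pattern; the loader and record shape are assumptions for illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def load_client_records():
    """Stand-in for the expensive fetch (database pull, file parse, etc.).
    With the cache, the body runs once; later calls return the same list."""
    return [{"id": i, "amount": i * 1.5} for i in range(100_000)]

def large_amounts(threshold):
    """Slicing and dicing happens against the cached copy, not the source."""
    return [r for r in load_client_records() if r["amount"] > threshold]

first = large_amounts(149_990.0)   # triggers the one-time load
second = large_amounts(149_995.0)  # served entirely from the cache
```

Streamlit’s version goes further (hash-based invalidation, per-argument entries), but the user-visible effect is the same: the first interaction pays for the load, and everything after it is a fast in-memory filter.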
Firebase, Firestore, Google Docs, Google Apps Script, Streamlit - I stand on the shoulders of giants.