Refresh a table with 6,000 rows

I’m making a survey for my company and want all users to enter their company username, which is unique to each person. To avoid errors, I made it a dropdown that references a Coda table of all the acceptable usernames in the company (~6,000). That list changes quite often, and its source is our Redshift data warehouse. I want to refresh the list programmatically, but with Python it looks like I have to delete it row by row and then add rows in batches of 10… by the time it’s done it’ll be time to run it again!

Is there an easier way to do data validation in my survey? Or is there a faster way to drop and refill a table?

PS, I’ve tried collecting the submitter’s email in the form, since I can usually extract their username from it (e.g. ‘sthompson@mycompany.com’), but occasionally people have customized their email address, so it’s not a perfect match.

import time

import pandas as pd


# Delete rows from Coda table
def delete_rows(coda, doc_id, table_id):

    # Read in the current rows
    rows = coda.list_rows(doc_id=doc_id, table_id_or_name=table_id)

    # Extract table values
    table_values = rows['items']

    # Convert table values into a dataframe
    output = pd.DataFrame()
    for item in table_values:
        row = {k: item[k] for k in ('id', 'index', 'createdAt', 'updatedAt', 'browserLink')}
        row.update(item['values'])
        output = pd.concat([output, pd.DataFrame(row, index=[0])], ignore_index=True)

    # Delete rows one at a time
    for i in range(output.shape[0]):
        coda.delete_row(
            doc_id=doc_id,
            table_id_or_name=table_id,
            row_id_or_name=output['id'][i]
        )

        # Introduce a delay between row deletions to respect rate limits
        time.sleep(2)  # Adjust the delay as needed

# Generate Coda-friendly payload
def generate_payload(df, columns):
    import json

    # Build one {'cells': [...]} entry per dataframe row
    payload = {'rows': []}
    for _, row in df.iterrows():
        cells = [{'column': col, 'value': str(row[col])} for col in columns]
        payload['rows'].append({'cells': cells})
    return json.dumps(payload, indent=2)

# Upload to coda
def coda_upload(catalog, token, doc_id, table_id):
    import json

    # Connect to Coda
    coda = coda_connect(token)

    # Delete rows
    delete_rows(coda, doc_id, table_id)

    # Generate payload
    payload = generate_payload(catalog, list(catalog.columns))
    payload = json.loads(payload)

    # Upsert rows in batches
    batch_size = 10  # Number of rows to include in each batch
    total_rows = len(payload['rows'])
    for i in range(0, total_rows, batch_size):
        batch_payload = {
            'rows': payload['rows'][i:i+batch_size]
        }
        coda.upsert_row(doc_id=doc_id, table_id_or_name=table_id, data=batch_payload)

        # Introduce a delay between batches to respect rate limits
        time.sleep(3)

Hi @Matthew_Adams - Welcome to the Coda community! I can think of a few ways you could try to improve the performance of your script:

  • Rather than delete and re-add all rows on every sync, instead fetch the current state of the Coda table and then calculate the delta between it and your source of truth. Then only add missing rows and delete outdated rows.
  • There is a deleteRows endpoint that allows you to delete multiple rows in one API call.
  • The upsertRow endpoint allows for many more than 10 at a time, and for a simple table I was able to upsert 10,000 rows in a single API call. A rough sketch combining these three ideas is below.
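To make those points concrete, here is a rough sketch of a delta-based refresh that talks to the REST API directly with requests (rather than the client object in your script), so the deleteRows endpoint is visible. It assumes the Coda table has a single “Username” column and that source_usernames is whatever list your Redshift query returns; those names are placeholders, not taken from your code.

import requests

CODA_API = 'https://coda.io/apis/v1'

def sync_usernames(token, doc_id, table_id, source_usernames):
    headers = {'Authorization': f'Bearer {token}'}
    rows_url = f'{CODA_API}/docs/{doc_id}/tables/{table_id}/rows'

    # 1. Fetch the current state of the Coda table (paginated)
    coda_rows = {}  # username -> row id
    params = {'useColumnNames': 'true', 'limit': 500}
    while True:
        resp = requests.get(rows_url, headers=headers, params=params)
        resp.raise_for_status()
        data = resp.json()
        for item in data['items']:
            coda_rows[item['values']['Username']] = item['id']
        if not data.get('nextPageToken'):
            break
        params = {'pageToken': data['nextPageToken']}

    # 2. Calculate the delta between Coda and the source of truth
    source = set(source_usernames)
    to_add = sorted(source - coda_rows.keys())
    to_delete = [row_id for name, row_id in coda_rows.items() if name not in source]

    # 3. Delete outdated rows in a single deleteRows call
    if to_delete:
        resp = requests.delete(rows_url, headers=headers, json={'rowIds': to_delete})
        resp.raise_for_status()

    # 4. Add missing rows in a single call; keyColumns makes it an upsert
    if to_add:
        payload = {
            'rows': [{'cells': [{'column': 'Username', 'value': name}]} for name in to_add],
            'keyColumns': ['Username'],
        }
        resp = requests.post(rows_url, headers=headers, json=payload)
        resp.raise_for_status()

On most runs the delta is only a handful of usernames, so the per-row deletes, the small batches, and the sleeps in the original script largely disappear.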

Lastly, syncing use cases like this can alternatively be solved by building a Pack. Your Pack can include a sync table, which handles all of the syncing logic and scheduling, and all you have to do is write the code to fetch the current set of records from Redshift.

Hey Eric, that sounds great, thanks for the insights!

Can you make a GET call to your database of company usernames?

If you are willing to house this inside a doc rather than posting as an externally accessible form you can make a pack that “validates” their input.

Basically, make a GET call to check whether the user’s input is an acceptable choice.

Or something very similar!

One word - webhooks.

Why don’t you integrate your survey data through webhooks? As the data changes, push live updates to Coda. We do this for CyberLandr survey data - about 10,000 rows so far - all in real time. If a customer modifies an answer, Coda has it in ~10 seconds. There are no polling or batch processes to cause latency or update pressure.

Event-driven solutions scale, and I’m pretty sure this is possible with Redshift.
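For example, the push side can be as small as a single upsert per change event. This is just a rough sketch (not our exact setup): it assumes you can hook into whatever job updates usernames in Redshift, and that the Coda table has a “Username” column; both are assumptions about your environment.

import requests

def push_username(token, doc_id, table_id, username):
    url = f'https://coda.io/apis/v1/docs/{doc_id}/tables/{table_id}/rows'
    headers = {'Authorization': f'Bearer {token}'}

    # keyColumns turns the insert into an upsert, so repeated events for the
    # same user update the existing row instead of creating duplicates
    payload = {
        'rows': [{'cells': [{'column': 'Username', 'value': username}]}],
        'keyColumns': ['Username'],
    }
    requests.post(url, headers=headers, json=payload).raise_for_status()

Removed usernames can be handled the same way with the deleteRows endpoint, or cleaned up by an occasional full reconciliation like the delta sketch earlier in the thread.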
