One-way sync (handle pagination for syncing source tables with >500 items)

I was following the steps in “Building a one-way sync” API guide and thanks to the great documentation my first two docs are almost in sync.

The only problem is that just the first 500 items of the source table are synced. I found this comment in the script that I copied from the guide but I couldn’t figure it out what should I do:

// TODO: handle pagination for syncing source tables with >500 items.

I copied the script from the guide, replaced the highlighted IDs, set the table names and didn’t change anything else.

Do you have any advice what should I change or how to resolve this issue to snyc all the data from the source table not just the first 500 items? Thanks!

2 Likes

Hey Zsolt! We were in touch directly but I figured I’d add my reply here as well to help anyone else who comes across this.

In order to handle syncing more than 500 rows, you have to update two lines in that script: the one reading source rows, and the one writing to the target table. I haven’t fully tested this, but you would replace the CodaAPI.listRows() line with this:

  var sourceRows = [];
  var pageToken;
  do {
    response = CodaAPI.listRows(source.doc, source.table, {limit: 500, pageToken: pageToken, useColumnNames: true});
    sourceRows = sourceRows.concat(response.items);
    pageToken = response.nextPageToken;
  } while (pageToken);

and then replace the CodaAPI.upsertRows() line with:

for (var i = 0; i < upsertBodyRows.length; i += 500) {
  CodaAPI.upsertRows(target.doc, target.table, {rows: upsertBodyRows.slice(i, i + 500), keyColumns: [TARGET_TABLE_SOURCE_ROW_COLUMN]});
}

One thing to be aware of at this time, more so for large tables, is that anyone making some changes during the sync could potentially lead to some missing data. It’s usually a very small chance (you pull first 500 rows, then someone deletes row #500, so you end up syncing the 500th row twice). There are things we could do on our end with nextPageToken to have it take into account the time of the original query, but we punted on that for now since most API usage we’ve seen for this scenario was syncing <500 rows.

2 Likes

Oleg, thanks a lot!

I just replaced the two lines with the snippets from above as you described and it works perfectly for me. :metal:

2 Likes