How to extract @People and other @-references from a Text cell

A client asked me today how he could take a text field like this and extract people from it:

image

The “straightforward” ideas like using Split didn’t work. Links and @-references in formatted text are unlike the rest of the text: technically they occupy just one empty character and instead have an extended encoded descriptor telling what should be displayed in place of that empty character. Running a formatted text field through just strips those descriptors and that’s what you get:

While splitting by space didn’t work, splitting by empty string to extract individual characters actually did:

I tried to find a way to Filter() out the regular characters and keep only the @-refs and came up with this solution: add and remove a character from each letter and see if it stays the same:

This actually worked and I had @-references listed in my cell. There was just one problem: it was still just formatted text values, not actual People objects or so. Besides these formatted text values retained paragraph formatting (bullets) of the original text, which is not very nice.

I knew I had to go and dissect the rich text object (aka slate) and extract references from its descriptor. Here’s ultimately how each formatted text value is encoded (see children of different types):

I extracted portions of the descriptor marked in red: objects of type InlineStructuredValue or actually their value properties. For that I used the infamous Something._Merge() + "" idiom to get the descriptor as JSON, and then ParseJson(descriptor, path) to extract interesting pieces from that descriptor as objects. TIL that the path parameter now supports JsonPath, which allows writing complex queries like “Get all values of children whose type is this or that, and additionally the ObjectID of those values is the People table”.

This is how I got all @-references:

And this is how I got just the People and formatted the column as multi-select People column:

Here’s the known catch. If you directly parse control objects (the values containing references) as top-level objects, Coda simply treats them as regular plain objects with no special meaning (i.e. doesn’t turn them into proper references):

It will, however, properly “unpack” objects at least one level deeper than the root. So the catch is to use ParseJson not on the values but on objects one level higher (in our case the children) and then extract values through _Deref_Object() function.

In the above example I extracted all @-refs. To only limit @-refs to People, I had to add a filter clause on the JsonPath to only limit found children to those where value.ObjectId is Global-Document-People, a hidden system table containing the list of Coda users in the doc:


This is a rather Pro hack but I decided to post it anyway. My goal here was to tell you about JsonPath, the thing I just discovered myself today :slight_smile: and show how flexible it was with regard to extracting just the right items from a complicated JSON tree.

:bird: Follow me at https://twitter.com/codatricks
:email: Subscribe to the newsletter at codatricks.com
:money_mouth_face: There’s Patreon, too

:mortar_board: I’m building a Coda Fundamentals course, Maker Funded by Coda. Watch either of the above to know when it launches. After my course you’ll be prepared to discover solutions like this one yourself.

4 Likes

Very cool. How brittle do you think are the dependencies on these internal data structures? Looking for you or Codans to comment. I love the approach; just wondering if this is going to cause regret in the future.

Well, they used to have a different object there earlier (Ranges) and they changed it to Slate in that Editor upgrade they were talking about. Slate is now a common model for both rich text values and canvases.

I don’t think they’re changing it in the foreseeable future. But then again, this is not official so Coda is under no obligation to keep this working as is. It’s your own responsibility to adjust the code if you rely on hacks like this.

1 Like