This is more of a lesson in efficiency. Inspired by this post:
Task: there’s a JSON with extremely nested structure. Need to extract all values for the key "text"
anywhere in the JSON tree.
The first instinct would be to parse the JSON into objects and try to traverse the tree somehow. However, in this very scenario, "text"
entries don’t all appear at the same level of nesting. So traversing and collecting them would require composing some complicated logic in Coda.
Efficient solution: treat JSON as plain text and simply extract pieces of data with regular expressions:
MyJSON.RegexExtract('(?<="text"\ *:\ *")(?:\\"|[^"])*', "g")
Explaining regular expressions is well beyond the scope of this topic. But knowing them is a very useful skill. There are lots of tutorials on regular expressions, and websites like regex101.com that let you compose and test regular expressions online with explanations.
What this regex above means is:
- Read the text until you find a spot just after
"text" : "
(spaces are optional) - Now read the text until you find the closing quote
"
(and don’t stop at escaped quotes\"
).
This will be the string we want to extract (i.e. the value after the"text"
key) - Repeat this for the rest of the text (as instructed by the ‘Global’
g
flag)
RegexExtract
is an experimental hidden formula. It is handy because it returns a List of strings, which you can then process, e.g. unescape newlines and special chars, and join all strings together with newlines:
thisRow.[My JSON].RegexExtract('(?<="text"\ *:\ *")(?:\\"|[^"])*', "g").FormulaMap(
CurrentValue.RegexReplace("\\n", Character(10)).RegexReplace("\\(.)", '$1')
).Join(Character(10))
It is possible to achieve the same logic with RegexReplace
to avoid using an experimental formula. But that would be a bit trickier and less of a presentation in how to use outside-the-box approaches to solve one’s problems more efficiently.
Example (with a very basic JSON really — the above would work with a much more nested tree too):