Regex Capturing Groups

I have a table with many similar Body texts, as shown in this image, and I want to pull specific groups out of the text. Coda doesn’t seem to be able to output the regex matches themselves, so what I have done in the past is match everything except what I want with RegexReplace() and replace it with " ".

In this case my body texts often contain multiple matches, and I have extracted these with regex capturing groups in Python. The green highlights would be the capturing groups. Coda doesn’t seem to work with capturing groups, which is making this difficult for me.
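For reference, here is a minimal Python sketch of the capturing-group behaviour described above. The sample body text and the exact pattern are assumptions made up for illustration, since the real data is only visible in the image.

```python
import re

# Hypothetical body text; the real rows are only shown in the screenshot.
body = "Ordered Apple Cider 2023 last week, then Apple Cider 2021 and Pear Juice 2020."

# Two capturing groups: the product name and the four-digit year.
pattern = re.compile(r"(Apple Cider) (\d{4})")

# With several groups, findall returns one tuple of group texts per match.
groups = pattern.findall(body)
print(groups)  # [('Apple Cider', '2023'), ('Apple Cider', '2021')]
```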

Not sure of the best way to solve this! Thanks for the help!

Edit: Here are some of the regex cases I’ve tried using in the regexreplace() function. Note they may not all work as regex expressions hahaha
.?(Apple Cider \d{4}|$)
(?:(?!Apple Cider \d{4}).)

.?Apple Cider\d{4}.
^(?:(?!Apple Cider\d{4}).)*$

do you need the RegexExtract() function?

see https://coda.io/d/CodaRegexTool_dOwRXfmW5Iu

hi @Lin_Silver ,

I am not that into Regex; it is powerful, but its syntax is quite different from Coda’s.

Below is the outcome I generated with the Coda Formula Language.

First we get the items that fit
Second we add the numbers
Third we apply a rule to get the desired outcome:

thisRow.item.Lower().Split(" ").WithName(theItems,
  thisRow.body.Lower().Split(" ").Filter(
    CurrentValue.Contains(theItems) or
    CurrentValue.IsNumber()
  ).WithName(Outcome,
    Sequence(1, Outcome.Count()).ForEach(
      SwitchIf(
        Outcome.Nth(CurrentValue).IsNumber().Not() and
        Outcome.Nth(CurrentValue + 1).IsNumber().Not() and
        Outcome.Nth(CurrentValue + 2).IsNumber(),
        Format("{1} {2} {3}",
          Outcome.Nth(CurrentValue),
          Outcome.Nth(CurrentValue + 1),
          Outcome.Nth(CurrentValue + 2))
      ).Filter(CurrentValue.IsNotBlank()).ToText()
    )
  )
)

As you can see, I introduced Lower() to take care of capitals that may have been forgotten. You can remove this part if it does not serve you.
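For readers more at home in Python, the three steps above can be sketched roughly as follows. This is only an approximation of the Coda formula: the function name and sample text are hypothetical, and `str.isdigit()` stands in for Coda's IsNumber(), which may treat some inputs differently.

```python
def extract_item_years(item: str, body: str) -> list:
    """Rough Python analogue of the Coda formula (an assumption, not a port)."""
    # Step 1: lower-case the item name and split it into words.
    items = item.lower().split(" ")
    # Step 2: keep only body words that belong to the item, plus the numbers.
    outcome = [w for w in body.lower().split(" ") if w in items or w.isdigit()]
    # Step 3: emit every window of word, word, number.
    results = []
    for i in range(len(outcome) - 2):
        a, b, c = outcome[i:i + 3]
        if not a.isdigit() and not b.isdigit() and c.isdigit():
            results.append(f"{a} {b} {c}")
    return results


found = extract_item_years(
    "Apple Cider", "We sold Apple Cider 2023 and Apple Cider 2021 today"
)
print(found)  # ['apple cider 2023', 'apple cider 2021']
```

Like the formula, this only handles the two-words-plus-one-number shape, and a year followed directly by punctuation (for example "2023,") would not count as a number.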

Hope it helps, Cheers, Christiaan

Like Max said, there is one more (experimental) regex function, but it only returns the first match

RegexExtract(Body, [Wanted text] + " \d{4}") → Apple Cider 2023

@Xyzor_Max is right, there is a hidden RegexExtract function that can return all regex matches when you use it with the g flag.

There’s no way to extract capturing groups separately, though; this formula always returns the whole match (either the first one without the global flag, or a list of matches when "g" is specified). So unless you need backreferences, there is no need to carefully set up capturing and non-capturing groups. You can simply use parentheses () for groups, without ?:, ?<name>, etc.
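To make the distinction concrete, here is a small Python analogue (not Coda code) of "first match", "all whole matches", and the per-group extraction that is described as unavailable. The sample text is hypothetical.

```python
import re

# Hypothetical body text for illustration.
body = "Apple Cider 2023 arrived; Apple Cider 2021 is sold out."
pattern = re.compile(r"(Apple Cider) (\d{4})")

# Comparable to extracting without a global flag: only the first whole match.
first = pattern.search(body)
first_match = first.group(0) if first else None

# Comparable to the "g" flag: a list of whole matches, groups ignored.
all_matches = [m.group(0) for m in pattern.finditer(body)]

# What the thread says cannot be done directly: pulling out one group.
years = [m.group(2) for m in pattern.finditer(body)]

print(first_match)  # Apple Cider 2023
print(all_matches)  # ['Apple Cider 2023', 'Apple Cider 2021']
print(years)        # ['2023', '2021']
```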

There are ways to do it with only the “official” RegexReplace function with some tricky lookaheads/lookbehinds, but that would be a lot of bother. I’d say just use RegexExtract — it’s been working for years already and I don’t think they are changing it any time soon.
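As a rough illustration of the replace-only idea, here is a Python sketch that keeps the wanted text by deleting everything around it with a single substitution. It relies on a group reference in the replacement string; whether Coda's RegexReplace accepts the same syntax is an assumption not confirmed in this thread, and the sample text is hypothetical.

```python
import re

# Hypothetical body text for illustration.
body = "We sold Apple Cider 2023 and Apple Cider 2021 today"

# Alternative 1: lazily skip text up to a wanted match and capture the match.
# Alternative 2: swallow whatever is left once no further match exists.
pattern = re.compile(r".*?(Apple Cider \d{4})|.+", re.DOTALL)

# Replace each hit with just the captured text plus a newline separator.
# In Python 3.5+, a group that did not take part becomes an empty string.
replaced = pattern.sub(r"\1\n", body)

# Drop the empty line left behind by the trailing leftover text.
kept = [line for line in replaced.splitlines() if line]
print(kept)  # ['Apple Cider 2023', 'Apple Cider 2021']
```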

EDIT: @Lin_Silver, sorry I didn’t read your question well and my below reply doesn’t really answer your question.

The simplest approach will be the Coda way, @Lin_Silver.

As you can see, there is more than one road to Rome.

also…

check out @Eric_Koleda’s pack

that is a most interesting contribution, merci @Xyzor_Max !

This summer we would have used AI to find and replace similar passages in large texts, but that is no longer really feasible.

i may be old-fashioned, but i have a bad feeling about using AI to do things that can be done with a few lines of code or a formula.

for now, there is the cost aspect. but over time, that will drop towards zero.

but then there is the deterministic issue; my formula or few-lines-of-code approach can be designed and tested to prove (to my satisfaction or a client’s) that it will ALWAYS return a suitable result.

but AI LLMs are way less deterministic, especially as implemented in Coda. i cannot set the temperature to zero (so there is always randomness in the response) and i cannot even define the exact model version to be used (responses change as you move to newer model releases). in fact, i suspect that Coda even uses DIFFERENT LLM models from one instance to another.

i LOVE to use AI (especially Coda AI) for those tasks that benefit from the unique strengths of LLMs. and in the future i hope Coda will provide embedding features as well.

but some day (not too far off), general AI interfaces will overcome this, and i will be able to define the level of determinism i need, and the cost will be negligible.

I would not say you are old-fashioned in this matter; what can be done easily with code is the way to go. However, only a few people master Regex, so for most users AI would be a good alternative. My proposal works as well, but it is a bit more complicated than the Regex and it only works in this specific setting (two words followed by one number), which makes it weak.

I guess that settings like temperature (there are more) will be added over time, and I hope they allow us to use our own keys one day. That said, I am not sure that AI will become cheap; I actually doubt it.

I also believe that AIs will become far better than they are today, so the prompts that fail us today may help us out three months from now.

Over time, we may find a group of users ready to accept small mistakes in outcomes when the costs are very low. Those who want reliable outcomes will keep using formulas in many cases, even when not strictly required and even when the costs are high.

the pack will do the job I guess in this case!

