Split Text in half or by word count

Is there a way to write a formula that will break a block of text into chunks by word count? If not, maybe splitting it in half?

I'm not quite sure what you mean, but:

To get a list of words in a string you can split it on every space:
Split("This is a sentence.", " ") → List("This", "is", "a", "sentence.")

To split a string in two halves you could do:

WithName(
  "Hi there",
  text,
  WithName(
    Floor(text.Length() / 2),
    middle,
    List(text.Left(middle), text.Right(text.Length() - middle))
  )
)

This returns List("Hi t", "here"), which doesn’t make much sense though :sweat_smile:
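
Splitting by character count cuts words apart, as "Hi t" / "here" shows. For comparison, here is a minimal sketch of halving at a word boundary instead. It is written in Python rather than as a Coda formula, and the function name `split_in_half_by_words` is made up for illustration:

```python
def split_in_half_by_words(text: str) -> tuple[str, str]:
    """Split text into two halves at a word boundary near the middle."""
    words = text.split()  # split on any run of whitespace
    middle = (len(words) + 1) // 2  # first half gets the extra word if the count is odd
    return " ".join(words[:middle]), " ".join(words[middle:])


print(split_in_half_by_words("Hi there"))
print(split_in_half_by_words("This is a sentence here"))
```

Note that joining with single spaces discards the original line breaks, so this only suits text where whitespace does not matter.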

Can you provide an example of what you want to achieve?

Thanks for your reply.

I have a canvas that has a 5000 word article.

My ideal goal would be to have a button that will split that article into two new canvases in two new rows with each being no more than 3000 words. If I had a 20,000 word article it would make 7 new rows/canvases. Then delete the original.
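
To make the arithmetic concrete: with a hard cap of 3000 words per piece, a 20,000-word article needs 7 pieces, six full ones and a final one of 2000 words. A rough sketch of that rule in Python (not a Coda formula; `chunk_words` and `max_words` are names invented for this example):

```python
def chunk_words(text: str, max_words: int = 3000) -> list[str]:
    """Cut text into consecutive pieces of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Stand-in for a 20,000-word article.
article = " ".join(["word"] * 20000)
chunks = chunk_words(article)
print(len(chunks))                 # 7
print(len(chunks[-1].split()))     # 2000
```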

If that’s not possible then doing the same thing but with each new canvas being half of the original. Then I could split them again if needed.

Does that make sense? (I’m trying to build a csv to fine-tune GPT3)

Alright, so what should the rules be for where the splits can be?
Is the article divided into paragraphs with two consecutive newline characters perhaps?
Otherwise maybe it could split between the end and start of two sentences?
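
If the article does use blank lines between paragraphs, one possible rule is to pack whole paragraphs into a chunk until the next one would push it over the word limit. A sketch in Python (outside Coda; `chunk_by_paragraphs` is a hypothetical helper, and it assumes paragraphs are separated by exactly two newline characters):

```python
def chunk_by_paragraphs(text: str, max_words: int = 3000) -> list[str]:
    """Greedily group paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept whole in its own
    chunk, so that chunk will exceed the limit.
    """
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        n = len(para.split())
        if n == 0:
            continue  # skip empty paragraphs
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


print(chunk_by_paragraphs("a b\n\nc d e\n\nf", max_words=5))
```

The same approach would work for sentence boundaries by changing what the text is split on, though detecting sentence ends reliably is harder than detecting blank lines.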

If possible I would like to split every 3000 words.

Alright! This splits a given string into a list of words, then returns a list of sub-lists with Chunk words each. Any leftover words are added to the last sub-list, so the last one can be longer than Chunk, but no sub-list ends up shorter than Chunk:

WithName(
  Split("This is a sentence here", " "),
  Words,
  WithName(
    2,
    Chunk,
    WithName(
      Floor(Words.Count() / Chunk) - 1,
      [List Count],
      Sequence(0, [List Count])
        .FormulaMap(
          Words
            .Slice(
              CurrentValue * Chunk + 1,
              If(
                CurrentValue = [List Count], -1, (CurrentValue + 1) * Chunk
              )
            )
        )
    )
  )
)

This example would result in [["This", "is"], ["a", "sentence", "here"]]
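
For anyone who wants to check the logic outside Coda, here is a rough Python mirror of the formula above. It is a sketch of the same idea, not a translation that is guaranteed to match Coda's behavior in every edge case; `chunk_with_long_tail` is a made-up name, and the indexing is 0-based where the formula's Slice is 1-based:

```python
def chunk_with_long_tail(words: list[str], chunk: int) -> list[list[str]]:
    """Group words into sub-lists of `chunk` words; the last takes the remainder."""
    last = len(words) // chunk - 1  # index of the final sub-list, like [List Count]
    result = []
    for i in range(last + 1):
        start = i * chunk
        # The final sub-list runs to the end of the list, like Slice(..., -1).
        end = None if i == last else (i + 1) * chunk
        result.append(words[start:end])
    return result


print(chunk_with_long_tail("This is a sentence here".split(), 2))
# [['This', 'is'], ['a', 'sentence', 'here']]
```

One consequence worth knowing: because the remainder goes into the last sub-list, that sub-list can hold up to 2 × chunk − 1 words. With 20,000 words and a chunk of 3000, this approach gives 6 sub-lists, the last one with 5000 words, which is over a strict 3000-word limit. In this Python version, an input with fewer words than `chunk` returns an empty list.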

Got a working example for you. It was a bit trickier than anticipated, but it works nicely now! :ok_hand:

Cool! Thank you so much!!
