How to analyze word frequency in Coda or specific Docs?

I have created more than 30 docs containing a lot of text, e.g.:

XXX doc name: level 1
---- 1st folder name: level 2
----- 10 docs: level 3
---- 2nd folder name
----- 8 docs
---- 3rd folder name
----- 15 docs

Suddenly an idea came to me: how can I analyze word frequency in Coda?
Or should I export specific Docs and analyze them elsewhere?

Funny, I just wrote about this yesterday:

Text analytics is an important idea and this gives us a good on-ramp to complain about the lack of API methods concerning text objects in sections. Ideally, we should be able to access text much the way we access data in the API. With the API complaint on the table, let’s focus on what can be done today.

Export Text, Analyze, Update Table

This approach, while simple, is not theoretical; I have two clients using this exact approach to perform certain NLP processing. It does, however, require significant development effort.

Imagine you set up a Coda action that exports a section via the Gmail pack every two hours. The Gmail account (using Google Apps Script) receives the exported section content, parses it as plain text, and then performs the text analysis in JavaScript or with the help of AI tools such as Dialogflow. With the text analytics in hand, the same process writes the data back into the Coda doc as rows in a table. It’s a quick, sustainable integration process that runs silently to keep the text analytics in the Coda document up to date.
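The post describes the parsing step in Google Apps Script (JavaScript). As a rough Python illustration of the same idea, assuming the exported section arrives as an HTML email body, the standard-library `html.parser` can reduce it to plain text before any counting happens. The names `TextExtractor` and `html_to_text`, and the set of block tags treated as line breaks, are illustrative choices, not part of Coda or the Gmail pack.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML fragment, one line per block element."""

    BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: "&amp;" arrives as "&"
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace inside each line and drop blank lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    """Hypothetical helper: HTML email body in, plain text out."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Keeping one line per paragraph preserves enough structure for later phrase counting while discarding markup that would otherwise pollute the word counts.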

Your requirements suggest the need for text analytics across multiple documents, and this approach supports that because the API operates at the document level. It’s not unlike building a full-text search index across a collection of documents where you need to search for terms both in the text within sections and in table rows and fields.

Coda-based Formulas and Filters

This [hypothetical] approach would be ideal (because it’s self-contained) but there are likely some challenges. I’ll bet someone like @Paul_Danyliuk has actually thought about this or has some ideas about it.

Text Analytics Pack?

If given the opportunity (i.e., if Pack development were opened up to Makers), this is one of the first Packs I would develop myself. The Pack architecture seems perfectly suited to this type of plugin.


Thank you!
It seems you are describing the technique, that is, the "how", which fits since my question is "How to".

I tried text analysis tools (e.g., http://textalyser.net/, https://tagcrowd.com/), but I found the results worse than what a human would produce.

For example, I need to know how many times "How are you" appears in the texts,
but the software only gives me counts for every single word:

how: 20
are: 15
you: 10

I need the count for "How are you" as a whole phrase.

So I think what matters is the algorithm and the ability to define custom words and phrases, not whether it happens in Coda.
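The gap described here, single-word counts versus one count for "How are you", is the difference between unigram and n-gram counting. A minimal Python sketch, assuming plain English text that has already been extracted: counting every run of n consecutive words surfaces frequent phrases without having to list them in advance. The function name and the simple `[a-z']+` tokenizer are illustrative assumptions; Chinese text would first need word segmentation, for example with a module like the one mentioned later in the thread.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive words, case-insensitively.

    The [a-z']+ tokenizer is a simplification for English: it drops
    punctuation, so n-grams can span sentence boundaries ("you i said").
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


text = "How are you? I said how are you, and you said: HOW ARE YOU."
print(ngram_counts(text, 1)["you"])          # 4
print(ngram_counts(text, 3).most_common(1))  # [('how are you', 3)]
```

With n=1 this reproduces what the online tools report; with n=3 the phrase "how are you" is counted as a unit and rises to the top.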

I understand the requirements quite clearly. Since there is no apparent way to use native formulas to access Section texts, and there is also no API support for accessing text sections, you have to do pretty much what I said earlier:

Note: while there is certainly a Pack for Google’s NLP service, there’s no [apparent] way to use this Pack with section [document] text; as far as I know, formulas are limited to accessing data tables and their fields.

Hi @Bill_French, actually Google Natural Language Pack APIs are not limited to fields only: they can process any text (https://coda.io/formulas#GoogleNaturalLanguage::AnalyzeEntities, for instance).
However, I think that this still doesn’t address @Steve_Yang’s requirement.

And please explain - how would you inject Section text into this analyzer?

Aside from table fields (as you correctly suggested), there are several options: from simple copy/paste, to the doc’s named formulas, to Google Apps Script.

I still don’t follow - what is a doc-named formula?

Hi @Bill_French,
I have the feeling this is going quite off-topic with respect to @Steve_Yang’s post.

Anyway, here is an example:


and:

If you believe we are on a constructive path, I’m more than happy to open a specific topic.

Cheers.

@Federico_Stefanato:

I don’t think we’re off-topic at all. His original ask was…

I’m almost certain he is referring to “many texts” that are in sections. Ergo, Coda document content.

As such, my point was that given the current NLP Pack provided, there is no way to inject Coda document texts that people write in sections into the NLP methods to perform analysis. And the reason is that section texts – i.e., the collection of paragraphs within a section – are not addressable by Coda formulas or by Coda’s own API, or by the NLP Pack. Coda text “content” is presently a cul-de-sac.

So, when I saw this next comment from you, I thought you had some clever way of using NLP to process section text content.

I fully understand named formulas; I just didn’t recognize the context since I was expecting to see some way of accessing section text content.

But taking this a little deeper: indeed, Google Natural Language Pack APIs are not limited to fields only. However, aside from hard-coded text values and named formulas that access text in tables (neither of which can reach section text content), the limitations are still severe: the NLP Pack cannot be used with Coda document text as written in section paragraphs.

Are we on the same page now?

From Steve’s perspective, he wants to create text analytics, and by "text" he means Coda document content. Presently, I know of only one way to do that: export the content via the Gmail pack and process it behind whichever inbox it is sent to; then push the analytics back into a Coda document table (using the Coda API) and/or into section content via named formulas.
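The "push the analytics back into a Coda document table" step can be sketched against the Coda REST API's row upsert endpoint (`POST /docs/{docId}/tables/{tableIdOrName}/rows`). This is a minimal Python sketch, not the author's implementation: the table name "Phrase Stats", the columns "Phrase" and "Count", and the doc ID and token are hypothetical placeholders, and column IDs are generally more robust than column names in real integrations.

```python
import json
import urllib.parse
import urllib.request

CODA_API = "https://coda.io/apis/v1"


def build_upsert_body(counts: dict[str, int]) -> dict:
    """One row per phrase. Listing "Phrase" in keyColumns makes the call an
    upsert: rerunning it updates matching rows instead of adding duplicates."""
    rows = [
        {"cells": [{"column": "Phrase", "value": phrase},
                   {"column": "Count", "value": n}]}
        for phrase, n in counts.items()
    ]
    return {"rows": rows, "keyColumns": ["Phrase"]}


def build_upsert_request(doc_id: str, table: str, token: str,
                         counts: dict[str, int]) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Coda API rows endpoint."""
    url = f"{CODA_API}/docs/{doc_id}/tables/{urllib.parse.quote(table, safe='')}/rows"
    body = json.dumps(build_upsert_body(counts)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Usage (placeholders; sending requires a real API token and doc ID):
# req = build_upsert_request("YOUR_DOC_ID", "Phrase Stats", "YOUR_API_TOKEN",
#                            {"how are you": 3, "why": 12})
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Building the request separately from sending it keeps the payload easy to inspect before anything touches a live doc.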

Thanks @Federico_Stefanato and @Bill_French, great discussion.
I don’t mind it going off-topic; it doesn’t matter. I can be a bystander. I’m not a coder and don’t really understand code.

I don’t much care whether it happens in Coda; another place is OK.

My requirement is simply to get statistics for specific words and sentences (e.g., sentences like "how are you" and "my name is steve", and words like "why" and "what") across many specific doc files (TXT files or any other format).
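For the requirement stated here, a fixed list of words and sentences counted across many files, a minimal Python sketch could look like the following. It assumes the docs have already been exported as UTF-8 `.txt` files under one folder (the thread has not settled how to do that export), and `count_in_text` / `count_in_folder` are hypothetical names. ASCII phrases such as English are matched on whole-word boundaries; non-ASCII phrases such as Chinese are counted as plain substrings, since Chinese does not put spaces between words.

```python
import re
from collections import Counter
from pathlib import Path


def count_in_text(text: str, phrase: str) -> int:
    """Count non-overlapping occurrences of `phrase`, case-insensitively.

    ASCII phrases must match whole words, so "why" does not match inside
    "whyever"; any whitespace may separate the phrase's words. Other
    phrases (e.g. Chinese) are counted as plain substrings.
    """
    if phrase.isascii():
        words = [re.escape(w) for w in phrase.split()]
        pattern = r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"
        return len(re.findall(pattern, text, flags=re.IGNORECASE))
    return text.casefold().count(phrase.casefold())


def count_in_folder(folder: str, phrases: list[str]) -> tuple[Counter, dict[str, Counter]]:
    """Return (totals across all files, per-file counts) for every *.txt under `folder`."""
    totals: Counter = Counter()
    per_file: dict[str, Counter] = {}
    for path in sorted(Path(folder).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        file_counts = Counter({p: count_in_text(text, p) for p in phrases})
        per_file[str(path.relative_to(folder))] = file_counts
        totals.update(file_counts)
    return totals, per_file
```

For example, `count_in_folder("exported_docs", ["how are you", "my name is steve", "why", "what"])` would return the overall totals first and a per-file breakdown second; "exported_docs" is a placeholder folder name.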

Give us an example text in an actual Coda doc. I’d like to see how the text is stored.

Haha. It’s all in Chinese. I’m not sure whether you’ll understand it. :slight_smile:

By the way, I found the best Python module for Chinese word segmentation.

But I don’t know how to use that code in Coda. Whatever.

First I’d have to export all those docs, 40+ sections. Wow! And I don’t know how.

Dear @Steve_Yang,

Based on other posts about non-Latin characters, it may be that Coda is not ready (yet) to deal with them.

I recommend checking with the Codans at support@coda.io before going further.

Otherwise, the request is great :trophy:. Please keep us updated :pray:

Thank you for the kind recommendation.

I have no problem with Chinese in Coda. Everything works fine.

I won’t use Python in Coda. I’m not a coder.

So I don’t want to cause trouble for the support team.
