I’m trying to parse through some raw tweet text, which contains the Twitter t.co shortened URLs, and replace them with the non-shortened links, based on a JSON payload. Currently I’m…stuck. The one path I thought to explore was just pulling the JSON payload apart into other columns, and then trying to do a replace, but I get stuck when there are multiple entities (URLs).
Here is the tweet text (ignore the super spammy crap, this was just a good example).
@MartiniGuyYT Sappchat Token (APP)
The world’s first Decentralized Messaging App.
💎Sapp-Chat App✅
💎Sapp-DeFi✅
💎Sapp-Wallet✅
💎Sapp-Swap✅
Download app now on Googleplay;https://t.co/IViZwnbixz
Applestore;https://t.co/QJmkaNr6Sp
Cmc: https://t.co/rWw42hTZKz
TG : https://t.co/PzxEKUi7sp
The Tweet entities payload straight from the Twitter API for this tweet:
{
"hashtags": [],
"symbols": [],
"user_mentions": [
{
"screen_name": "MartiniGuyYT",
"name": "That Martini Guy ₿",
"id_str": "782946231551131648",
"indices": [
0,
13
]
}
],
"urls": [
{
"url": "https://t.co/IViZwnbixz",
"expanded_url": "https://play.google.com/store/apps/details?id=com.sappchat",
"display_url": "play.google.com/store/apps/det…",
"indices": [
169,
192
]
},
{
"url": "https://t.co/QJmkaNr6Sp",
"expanded_url": "https://apps.apple.com/fi/app/sappchat/id1569053748",
"display_url": "apps.apple.com/fi/app/sappcha…",
"indices": [
204,
227
]
},
{
"url": "https://t.co/rWw42hTZKz",
"expanded_url": "https://coinmarketcap.com/currencies/sappchat/",
"display_url": "coinmarketcap.com/currencies/sap…",
"indices": [
233,
256
]
},
{
"url": "https://t.co/PzxEKUi7sp",
"expanded_url": "https://t.me/sappchat",
"display_url": "t.me/sappchat",
"indices": [
262,
285
]
}
]
}
Right now I have the JSON going into a series of other columns and parsing out some of the arrays with:
ParseJSON(thisRow.[Tweet entities],path: "$.urls[*]")
But I’m finding this is pretty tedious, and I still haven’t solved for how to do the replacement with each instance of the short URL.
The desired outcome is to replace the t.co
and link to the expanded_url
but use the display_url
as the …well…display.
Any thoughts on the best way to go about this?