Graphics

  • 1.  Remove HTML tags from RSS Feeds

    Posted 03-23-2020 18:38

    Hello, thanks for looking at this. I work for a news org and often create tickers from RSS feeds using roll/crawls and infograph boxes. Most of the time the RSS feeds I use from our pool feed subscriptions are pretty clean.

    Now I'm trying to set up an RSS roll/crawl playout of the CDC's 2019 Novel Coronavirus RSS feed. Unfortunately, the #CDATA for the Summary rows sometimes comes back with HTML tags like <span>, <em>, or <a href="www....>, etc.

    Here is the specific RSS Feed: http://tools.cdc.gov/podcasts/feed.asp?feedid=183

    Here are the types of tags I want to remove/eliminate altogether: https://www.w3schools.com/TAGS/default.ASP

    I'm looking for a way to strip out all HTML tags, either with a script or, ideally, with Visual Logic. Basically, if I could find a formula that removes the < and > symbols and anything between them, that would solve it.

    Alternatively, is there something in DataLinq that would translate these tags into something airable?


    Your help is much appreciated! Thanks!



  • 2.  RE: Remove HTML tags from RSS Feeds

    Posted 03-25-2020 08:58

    Hi Rob,

    In Visual Logic (VL) you can try something like this:

    [attached image not preserved]

    I don't know if it's the best solution, but in simple cases it seems to work.



    #XPression


  • 3.  RE: Remove HTML tags from RSS Feeds

    Posted 04-16-2020 14:15

    If you prefer the coding side, try putting this into the OnSetText of the text object.  Works for me.  
    Good luck!

     

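    If Text.Contains("<") Then
        ' True while we are outside a tag and should copy characters.
        Dim doRec As Boolean = True
        Dim textOut As String = ""

        For i As Integer = 1 To Text.Length
            ' Mid is 1-based: take the single character at position i.
            Dim tmp As String = Mid(Text, i, 1)

            If tmp = ">" Then
                ' End of a tag: resume copying and leave a space where the tag was.
                doRec = True
                textOut &= " "
            ElseIf tmp = "<" Then
                ' Start of a tag: stop copying until the next ">".
                doRec = False
            ElseIf doRec Then
                textOut &= tmp
            End If
        Next

        Text = textOut
    End If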
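
    The loop above can probably be shortened to a regular expression. This is only a sketch, and it assumes the script engine gives you access to the .NET System.Text.RegularExpressions namespace, which nobody in this thread has confirmed for XPression. The module name TagStripper and the helper StripTags are made up for illustration. Like the loop, it leaves a space where each tag was, then collapses runs of whitespace so you don't get double spaces on air.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TagStripper
    ' Hypothetical helper (not part of XPression): removes every "<...>" span.
    Function StripTags(ByVal input As String) As String
        ' Replace each tag with a space so adjacent words do not run together.
        Dim noTags As String = Regex.Replace(input, "<[^>]*>", " ")
        ' Collapse repeated whitespace and trim the ends.
        Return Regex.Replace(noTags, "\s+", " ").Trim()
    End Function

    Sub Main()
        ' Prints: First item Second item
        Console.WriteLine(StripTags("<p>First item</p><p>Second <em>item</em></p>"))
    End Sub
End Module
```

    If that namespace is available inside OnSetText, the body would come down to something like Text = StripTags(Text). Note that neither version touches HTML entities such as &amp;, so those would still need separate handling if the feed contains them.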


    #XPression