Tags: javascript, html, angular, rxjs, rss

How do I parse HTML elements from an observable array of objects in RxJS?


In Angular 8 I am parsing a WordPress RSS feed and using one of its properties, 'content', to build a news scroller. The RSS feed is processed into a JavaScript object using the rss-parser package for Node.js.

I need to parse out an HTTP link, an image and a few characters of text from between the paragraph (p) elements. My problem is that the data I need is contained within the 'content' property, and I don't know the encoding or how to parse out the link, image and text and place them into variables I can use within the observable.

Using Angular and RxJS I am able to derive an array of objects that includes each article and the property I need. The source observable comes from the Angular api.service.ts, which gets the feed and returns an observable:

    const http$ = this.api.rssSource$();

Then I map it down to the array of objects using this code:

    this.newsItems$ = http$.pipe(
      map(res => Object.values(res['items']))
    );

and I get this array of 20 items:

(20) [{…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}]

Each object within the array above looks like this:

{content: "<a href="https://example.com/"><img width="300" height="200" src="https://example.com/some-image-300x200.jpeg" alt="blah blah blah" /></a><p>A lot of text about something and then something else</p><br /><p>jabber jabber and more jabber</p>↵<p><a href="https://example.com/example.html/" rel="nofollow">...Read More About Blah And Jabber</a></p>↵"}

Using <div [innerHTML]="item.content"></div> in the Angular template I can render the HTML with an image and a lot of text. However, it is not in the format I want; it needs to be shortened and rearranged. I only need the complete link (a href="https://xxx..."), the image (img src="http://xxx...") and a single paragraph (p xxxx /p).

How can I access the object so that I can then further parse it to populate variables for newsLink, newsImg, shortDes?


Solution

  • If you want to manipulate each object in the emitted array, you can add an Array map() call inside the RxJS map operator:

    this.newsItems$ = http$.pipe(
      map(res => Object.values(res['items']).map(item => {
        // do item modification here, then return the result;
        // without a return, every element would become undefined
        return item;
      }))
    );
    

    This emits the modified array. Alternatively, you can split the array so that each item is emitted as an individual value, and then use the RxJS map operator to modify each one:

    // needs from (rxjs) and map, switchMap (rxjs/operators)
    this.newsItems$ = http$.pipe(
      switchMap(res => from(Object.values(res['items']))),
      map(item => {
        // manipulate individual items here
        return item;
      })
    );
    

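    As for the actual parsing itself, that can be achieved using regular expressions and the match() function. With the g flag, match() returns an array of the complete matched strings (here, whole anchor tags) rather than the capture groups, or null if nothing matches:

    const arrayOfAnchorTags = item.content.match(/<a\b[^>]*>(.*?)<\/a\s*>/g);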
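
To tie the regular-expression idea back to the three variables named in the question, here is a minimal sketch. The names extractNewsFields, NewsFields and maxChars are made up for illustration and do not come from the question or from any library. It assumes attribute values are wrapped in double quotes, as in the sample 'content' string, and it only looks at the first link, image and paragraph. Regular expressions are fragile on arbitrary HTML, so treat this as a starting point rather than a robust parser.

```typescript
// Hypothetical shape of the fields the question asks for.
interface NewsFields {
  newsLink: string | null;
  newsImg: string | null;
  shortDes: string | null;
}

// Hypothetical helper: pulls the first link, the first image and a shortened
// first paragraph out of an HTML string using regular expressions.
function extractNewsFields(content: string, maxChars = 80): NewsFields {
  // Value of the href attribute on the first <a> tag.
  const link = content.match(/<a\b[^>]*\bhref="([^"]*)"/i);
  // Value of the src attribute on the first <img> tag.
  const img = content.match(/<img\b[^>]*\bsrc="([^"]*)"/i);
  // Inner HTML of the first <p>...</p>; [\s\S] lets the match span newlines.
  const para = content.match(/<p\b[^>]*>([\s\S]*?)<\/p>/i);

  // Strip any nested tags from the paragraph and trim surrounding whitespace.
  const text = para ? para[1].replace(/<[^>]*>/g, '').trim() : null;

  return {
    newsLink: link ? link[1] : null,
    newsImg: img ? img[1] : null,
    shortDes: text === null ? null : text.slice(0, maxChars),
  };
}

// Sample shaped like the 'content' value shown in the question.
const sample =
  '<a href="https://example.com/"><img width="300" height="200" ' +
  'src="https://example.com/some-image-300x200.jpeg" alt="blah blah blah" /></a>' +
  '<p>A lot of text about something and then something else</p><br />' +
  '<p>jabber jabber and more jabber</p>';

// Keep only the first 20 characters of the first paragraph.
const fields = extractNewsFields(sample, 20);
console.log(fields);
```

Inside either pipeline above, the helper could be called where the item is modified, for example by returning { ...item, ...extractNewsFields(item.content) } from the callback, so that each emitted object carries newsLink, newsImg and shortDes alongside its original properties.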