Can't create appropriate selector to grab names


How can I grab all the names under the "Followed by" category from the elements below? The way the names are laid out is unfamiliar to me, which is why I can't get all of them. I already have a selector that grabs the first one. However, I want all the names under the "Followed by" title, up to the "edited_into" title. Thanks in advance.

Here is a link to the elements within which the required names are: https://www.dropbox.com/s/nzmvfc75szlgyyn/elements%20for%20expression.txt?dl=0

The selector I've tried with:

a#followed_by+h4.li_group+div.odd a

The result I get is only the first name:

Star Trek V: The Final Frontier

Btw, my only intention is to parse the names using this selector, not to style anything.


Solution

  • The selector you have is almost correct.

    a#followed_by+h4.li_group ~ div.soda a
    

    The ~ combinator differs from + in that it selects every matching sibling that comes anywhere after the first part of the selector (within the same parent), whereas + selects only a matching element that immediately follows the first part. By "first part" I mean a#followed_by+h4.li_group.

    I've also changed the selector to find div.soda rather than div.odd so you get all relevant elements, rather than just the odd ones.
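
    To make the difference between + and ~ concrete, here is a minimal sketch that models the siblings as a plain array instead of a real DOM. The array contents and the titles "Title A" to "Title C" are hypothetical placeholders, not the actual page markup, and the two helper functions only imitate what the combinators do.

    ```javascript
    // Hypothetical flat model of sibling elements (not the real page markup).
    var siblings = [
        { tag: 'a', id: 'followed_by' },
        { tag: 'h4', cls: ['li_group'] },
        { tag: 'div', cls: ['soda', 'odd'], text: 'Title A' },
        { tag: 'div', cls: ['soda', 'even'], text: 'Title B' },
        { tag: 'div', cls: ['soda', 'odd'], text: 'Title C' }
    ];

    // Position of the h4 that both selectors start from.
    var anchor = siblings.findIndex(function (el) { return el.tag === 'h4'; });

    function hasClass(el, cls) {
        return (el.cls || []).indexOf(cls) !== -1;
    }

    // Imitates `+`: only the element directly after the anchor, if it matches.
    function adjacent(cls) {
        var next = siblings[anchor + 1];
        return next && hasClass(next, cls) ? [next.text] : [];
    }

    // Imitates `~`: every later sibling that matches.
    function general(cls) {
        return siblings.slice(anchor + 1)
            .filter(function (el) { return hasClass(el, cls); })
            .map(function (el) { return el.text; });
    }

    var plusOdd = adjacent('odd');    // ['Title A']
    var tildeOdd = general('odd');    // ['Title A', 'Title C']
    var tildeSoda = general('soda');  // ['Title A', 'Title B', 'Title C']
    ```

    In this model, switching from + to ~ alone still skips the "even" row, which is why the class was changed to soda as well.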


    Because of the way CSS selectors work, we can't ask for "only elements up until edited_into". We can, however, resolve this using JavaScript.

    A for loop with a conditional break is the simplest method.

    var titles = [];
    // All links inside div.soda rows that follow the "Followed by" heading.
    var items = document.querySelectorAll('a#followed_by+h4.li_group ~ div.soda a');
    // firstEditedIntoItem is the first link in the section that comes next
    // (`edited_into` or `spin_off_from`). If it is null, there is no such
    // section and nothing needs to be cut off.
    // Note: a selector list returns the first element that matches any of
    // the listed selectors. To cover another section, add a selector for it
    // in the same way, using that section's id.
    var firstEditedIntoItem = document.querySelector(
        'a#edited_into+h4.li_group+div.soda a, a#spin_off_from+h4.li_group+div.soda a'
    );
    for (var i = 0; i < items.length; i++) {
        var it = items[i];
        // Stop at the first link of the next section. If firstEditedIntoItem
        // is null, no element equals it, so the loop runs to the end.
        if (it === firstEditedIntoItem) break;
        titles.push(it.textContent);
    }
    console.info(titles.join('\n'));
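
    If you would rather not list every possible following section by id, one alternative is to walk the siblings yourself and stop at the next section anchor. This is only a sketch under an assumption taken from the selectors above: each section starts with an a element that has an id, followed by its h4 and div.soda rows as flat siblings. The function name collectSection is made up for this example.

    ```javascript
    // Walk forward from a section anchor (for example the a#followed_by
    // element) and collect the link text of each div.soda row until the
    // next section anchor is reached.
    // Assumption: sections are flat siblings, each introduced by <a id="...">.
    function collectSection(start) {
        var titles = [];
        for (var el = start.nextElementSibling; el; el = el.nextElementSibling) {
            // Another <a> with an id marks the start of the next section.
            if (el.tagName.toLowerCase() === 'a' && el.id) break;
            if (el.classList.contains('soda')) {
                var link = el.querySelector('a');
                if (link) titles.push(link.textContent.trim());
            }
        }
        return titles;
    }
    ```

    On the page itself you would call it as collectSection(document.querySelector('a#followed_by')) and print the result the same way as above. If the page nests the rows differently, the walk would need adjusting.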