r, stringr

Strange answer from stringr function


I'm learning about regular expressions with the stringr package. As an exercise, I tried to count the strings that match a given pattern (here, words ending in "ing"). One correct way to do this is

library(stringr)
length(str_subset(words, 'ing$'))
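An equivalent way to get the same count is to sum the logical vector returned by str_detect(), which skips building the subset. The sketch below assumes stringr is installed; `words` is the sample word list that ships with stringr.

```r
library(stringr)

# str_detect() returns one TRUE/FALSE per element of `words`
ends_in_ing <- str_detect(words, "ing$")

# TRUE is treated as 1 when summed, so this counts the matching words
n_detect <- sum(ends_in_ing)

# Same count as the str_subset() approach
n_subset <- length(str_subset(words, "ing$"))
n_detect == n_subset
```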

Along the way I incorrectly tried

length(str_view(words,'ing$'))

That second version returned 8, which is wrong. Worse, as I experimented, it always returned 8, no matter which strings I searched or which regex I tried to match.

Why was I always getting the answer 8 in the second case? What is it finding the length of?

I tried several different sets of strings and always got the same answer. I figured out how to do it correctly, but I was surprised that the wrong approach always gave the same number, 8.


Solution

  • str_view doesn't return the matched strings; it returns an htmlwidget object, which is a list with 8 fixed fields that it uses to draw the diagram. length() counts those fields, so you get 8 whatever the pattern is. names(str_view(...)) lists them:

    [1] "x"             "width"         "height"        "sizingPolicy" 
    [5] "dependencies"  "elementId"     "preRenderHook" "jsHooks" 
    

    You can see which strings matched in the str_view(...)$x$html value:

    str_view(letters[1:3], 'a')$x$html
    <ul>
      <li><span class='match'>a</span></li>
      <li>b</li>
      <li>c</li>
    </ul>
    

    Hope that helps :)
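To check this yourself, here is a small sketch, assuming stringr and htmlwidgets are installed. The helper view_widget() is purely illustrative, not part of stringr. It requests the HTML widget explicitly on stringr 1.5.0 and later, where str_view() prints to the console by default instead of returning a widget. Comparing two unrelated patterns shows that length() reports the widget's field count, not the number of matches:

```r
library(stringr)

# Illustrative helper (not a stringr function): build the HTML widget that
# str_view() draws. stringr >= 1.5.0 only returns a widget when html = TRUE;
# older versions return it by default and have no `html` argument.
view_widget <- function(pattern) {
  if (packageVersion("stringr") >= "1.5.0") {
    str_view(words, pattern, html = TRUE)
  } else {
    str_view(words, pattern)
  }
}

w_ing <- view_widget("ing$")
w_a   <- view_widget("^a")

# The widget is a plain list underneath, with a fixed set of named fields
class(w_ing)
is.list(w_ing)
names(w_ing)

# length() counts those fields, so it is the same for both patterns even
# though a different number of words matches each one
length(w_ing) == length(w_a)
```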