What are the parameters to self._search_regex?


youtube-dl has the following example in its CONTRIBUTING documentation:

description = self._search_regex(
    r'<span[^>]+id="title"[^>]*>([^<]+)<',
    webpage, 'description', fatal=False)

What are the parameters to _search_regex? The documentation doesn't explain what 'description' is. Is it an HTML attribute?


Solution

  • Because _search_regex is an internal function (its name starts with an underscore), it is not well documented, but you can find its definition in the source code.

    _search_regex is a utility function that essentially calls re.search, but handles the case where the regular expression does not match in one uniform way. This is important because many extractors use regular expressions, and it would be tiresome (not to mention a huge amount of code duplication) to replicate the error handling all over the place.

    Here are its parameters:

    • pattern: The regular expression to search for, for instance something like r'(?:foo|href)\s*=\s*(http://[^"]*)'. Usually, the first captured group (i.e. the part in parentheses that does not begin with ?:) is returned. For more information on regular expressions, consult the Python standard library documentation.
    • string: The string to search in (i.e. the haystack), downloaded from the service you are connecting to.
    • name: A name you choose; this is presented to the user if something fails. It should be unique within your extractor. Examples are 'manifest URL' or 'content section'. That way, you know immediately where the problem lies if a user posts an error message without the stack trace.
    • default=NO_DEFAULT: Default value. Sometimes, there is a default in case the regexp doesn't match. If so, pass it in here.
    • fatal=True: If no default is given, this determines the behavior when the regular expression fails to match. True: abort extraction and raise a detailed error, for instance if extracting the video URL fails. False: only emit a warning and go on, for instance if searching for an optional field (e.g. description) fails.
    • flags=0: Explicit regular expression flags. Rarely used; see the Python standard library documentation for more information.
    • group=None: Return a group other than the first one. Rarely used, and only sensible if your regular expression contains named groups. Refer to the Python standard library documentation (keyword: named groups) for more details.
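To make the interplay of default, fatal and group concrete, here is a minimal standalone sketch of the behavior described in the list above. It is not youtube-dl's actual code: the function name search_regex (no underscore, no self), the RegexNotFoundError stand-in, the use of warnings.warn for the non-fatal case and the sample webpage string are all assumptions made for illustration.

```python
import re
import warnings

# Sentinel so that "no default given" can be told apart from default=None.
NO_DEFAULT = object()


class RegexNotFoundError(Exception):
    """Hypothetical stand-in for the error raised when a required pattern is missing."""


def search_regex(pattern, string, name, default=NO_DEFAULT, fatal=True,
                 flags=0, group=None):
    """Simplified illustration of the parameters described above."""
    match = re.search(pattern, string, flags)
    if match:
        if group is None:
            # Return the first capturing group that took part in the match.
            return next(g for g in match.groups() if g is not None)
        # Otherwise return the explicitly requested (e.g. named) group.
        return match.group(group)
    if default is not NO_DEFAULT:
        # A default was supplied, so a failed match is not an error.
        return default
    if fatal:
        # Required field: abort, using `name` to say what was missing.
        raise RegexNotFoundError('Unable to extract %s' % name)
    # Optional field: warn and carry on.
    warnings.warn('unable to extract %s' % name)
    return None


# Hypothetical page content, mirroring the example from the question.
webpage = '<span class="x" id="title">My video</span>'

title = search_regex(
    r'<span[^>]+id="title"[^>]*>([^<]+)<',
    webpage, 'description', fatal=False)  # 'My video'

span_id = search_regex(
    r'<span[^>]+id="(?P<id>[^"]+)"',
    webpage, 'span id', group='id')  # 'title'
```

So in the question's snippet, 'description' is neither an HTML attribute nor part of the pattern; it is only the label that appears in the warning or error message when the pattern does not match.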