youtube-dl has the following in its CONTRIBUTING documentation:
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', fatal=False)
What are the parameters to _search_regex? The documentation doesn't show what 'description' is. Is that an HTML attribute?
As an internal function (it starts with an underscore), it is not well-documented, but you can find its definition in the source code.
_search_regex is a utility function that basically calls re.search, but unifies the handling of the case where the regular expression does not match. This matters because many extractors use regular expressions, and it would be tiresome (not to mention a huge amount of code duplication) to replicate the error handling all over the place.
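To illustrate the duplication that _search_regex avoids, here is a rough sketch of what each lookup might look like with plain re.search. The HTML snippet, the "uploader" field, and the use of ValueError are made up for this example and are not taken from youtube-dl.

```python
import re

# Made-up page content, for illustration only.
webpage = '<span id="title">An example description</span>'

# A required field: every call site has to check for "no match" itself.
mobj = re.search(r'<span[^>]+id="title"[^>]*>([^<]+)<', webpage)
if mobj is None:
    # ValueError is a stand-in; a real extractor would raise its own error type.
    raise ValueError('Unable to extract description')
description = mobj.group(1)

# An optional field (hypothetical): a second, slightly different branch.
mobj = re.search(r'<span[^>]+id="uploader"[^>]*>([^<]+)<', webpage)
uploader = mobj.group(1) if mobj else None
```

With _search_regex, both of these branches collapse into a single call each, and the error or warning message is produced in one place.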
Here are its parameters:
pattern: The regular expression being searched for, for instance something like r'(?:foo|href)\s*=\s*(http://[^"]*)"'. Usually, the first captured group (i.e. the stuff in parentheses, but not beginning with ?:) is returned. For more information on regular expressions, consult the Python standard library documentation.
string: The string to search in (i.e. the haystack), downloaded from the service you are connecting to.
name: A name you choose; this is presented to the user if something fails. It should be unique within your extractor. Examples are 'manifest URL' or 'content section'. That way, you know immediately where the problem lies if a user posts an error message without the stack trace.
default=NO_DEFAULT: Default value. Sometimes there is a sensible fallback in case the regexp doesn't match. If so, pass it in here.
fatal=True: If no default is given, this determines the behavior if the regular expression fails to match.
    True: Abort extraction and throw a detailed error; for instance if extracting the video URL fails.
    False: Only emit a warning and go on; for instance if searching for an optional field (e.g. description) fails.
flags=0: Explicit regular expression flags. Rarely used; see the Python standard library documentation for more information.
group=None: Match a group other than the first one. Rarely used; only sensible if your regular expression contains named groups. Refer to the Python standard library documentation (keyword: named groups) for more details.
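The parameters above can be tried out with a simplified, stand-alone sketch of the behavior. This is not the actual youtube-dl implementation: the real method lives on the extractor class, raises its own error type, and reports warnings through the downloader. The function name search_regex, the ValueError, the stderr warning, and the sample HTML are all assumptions made for this example.

```python
import re
import sys

NO_DEFAULT = object()  # sentinel meaning "caller supplied no default"


def search_regex(pattern, string, name, default=NO_DEFAULT, fatal=True,
                 flags=0, group=None):
    """Hypothetical stand-in that mimics the parameters described above."""
    mobj = re.search(pattern, string, flags)
    if mobj:
        # Simplification: return group 1 unless another group is requested.
        return mobj.group(1) if group is None else mobj.group(group)
    if default is not NO_DEFAULT:
        return default
    if fatal:
        # Stand-in error type; the message uses the caller-chosen name.
        raise ValueError('Unable to extract %s' % name)
    print('WARNING: unable to extract %s' % name, file=sys.stderr)
    return None


# Made-up page content, for illustration only.
webpage = '<a href="http://example.com/v.mp4">video</a>'

# Required field: (?:...) is non-capturing, so the URL is the first group.
url = search_regex(r'(?:foo|href)\s*=\s*"(http://[^"]*)"', webpage, 'video URL')

# Optional field: no match, so a warning is printed and None is returned.
description = search_regex(r'<span[^>]+id="title"[^>]*>([^<]+)<',
                           webpage, 'description', fatal=False)

# Fallback value instead of an error or a warning.
thumbnail = search_regex(r'data-thumb="([^"]+)"', webpage, 'thumbnail',
                         default='none')

# Named group and explicit flags.
label = search_regex(r'<a[^>]*>(?P<label>[^<]+)<', webpage, 'link label',
                     group='label')
link = search_regex(r'HREF="([^"]+)"', webpage, 'link', flags=re.IGNORECASE)
```

So in the snippet from the question, 'description' is the name argument: a human-readable label used in error and warning messages, not an HTML attribute.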