What is the purpose of the <figure> element?

I have been trying to understand the <figure> element; take a look at this from w3.org:

Self-contained in this context does not necessarily mean independent. For example,each sentence in a paragraph is self-contained; an image that is part of a sentence would be inappropriate for figure, but an entire sentence made of images would be fitting.

How can an image can be part of a sentence? What is this talking about? I have read many explanations, but yet I don't understand why I would want to use this element. What is the purpose of this tag?

Solution

HTML5 semantic tags are cryptic for everyone. Since the concepts are kept abstract and scientific in purpose, I will use a bunch of over-simplification to make it understandable, in spite of HTML5 gurus telling me how wrong I am.

Remember that semantic tags are created to be read by computers, not humans. They exist so that google's scripts (and everyone's scripts, but mainly google and hacker bots) can quickly search for figures and index them. To understand semantic tags, think like a script, not as a human.

"If I was a bot programmed by a hacker, how would I mine figures when crawling my website?"

If we code a script that looks for IMG tags it will end up gathering a bunch of garbage (i.e. icons). What we're interested in is real content, that's what "self-contained content" means: you cut & paste this element from your website into my gallery of mined images in your HTML, and it's useful.

Garbage: icons, decorative images, images created by javascript plugins to prettify things, smileys, etc.

Good finds: charts, photos, diagrams, maps, drawings, etc.

The "Good finds" make sense even if you steal them from your webpage with no context whatsoever, except things like the accompanying "caption" tag. These tags allow crawlers to associate your images with text tags, making categorization easier So you don't want to miss this content.

So we're interested in the figure caption, title, subtitle, whatever, and the figure should wrap all of this. The figure tag is not limited to images; it can contain text, video, audio, code blocks, anything as long as it's part of that "entity" in your document. So the following is a single figure:

Now, in documents like scientific papers you will often find stuff like this:

Although each image makes sense by itself, they make more sense as a group as they are connected by some kind of relationship; sometimes even the order matters (i.e. series of steps, or photos of states something goes through). That's a "sentence of figures" and you want to mine them together or you'll lose valuable information.

Our mining algorithm would have to understand your website's content with some kind of natural-language-processing AI, or it could just gather the FIGURE tags you provide for it. The later is good semantics.

Summarizing, imagine you're the Google AI algorithm mining figures in your code. Write your figure tags to make this script's job easy.