thread_local at block scope

What is the use of a thread_local variable at block scope?

If a compilable sample helps to illustrate the question, here it is:

#include <thread>
#include <iostream>

namespace My {
    void f(int *const p) {++*p;}
}

int main()
{
    thread_local int n {42};
    std::thread t(My::f, &n);
    t.join();
    std::cout << n << "\n";
    return 0;
}

Output: 43

In the sample, the new thread gets its own n but (as far as I know) can do nothing interesting with it, so why bother? Does the new thread's own n have any use? And if it has no use, then what is the point?

Naturally, I assume that there is a point. I just do not know what the point might be. This is why I ask.

If the new thread's own n requires (as I suppose) special handling at runtime, perhaps because at the machine-code level one cannot reach it in the normal way via a precalculated offset from the base pointer of the new thread's stack, then are we not merely wasting machine cycles and electricity for no gain? And even if no special handling were required, I still see no gain.

So why thread_local at block scope, please?

References

Cppreference on thread_local and other storage classes
An earlier question: when exactly is a thread_local variable declared at global scope initialized?
Another earlier question: thread_local variables initialization
Yet another earlier question: the cost of thread_local

Solution

I find thread_local is only useful in three cases:

1. Each thread needs its own instance of a resource, so that threads don't have to share it or guard it with a mutex. Even then, this is only worthwhile if the resource is large or expensive to create, or needs to persist across function invocations (i.e. an ordinary local variable inside the function will not suffice).
2. An offshoot of (1): you may need special logic to run when a calling thread eventually terminates. For this, you can use the destructor of the thread_local object created in the function. That destructor is called once for each thread that entered the code block containing the thread_local declaration, at the end of that thread's lifetime.
3. You may need some other logic to run exactly once for each distinct thread that calls the function. For instance, you could write a function that registers each unique thread that has called it. This may sound bizarre, but I've found uses for it in managing garbage-collected resources in a library I'm developing. This usage is closely related to (1), but the object isn't used after its construction; it is effectively a sentry object for the thread's entire lifetime.