And even if you search for p or P (Latin), you won't find р or Р (Cyrillic).
Why does Unicode use different code points for, say, a/A (Latin) and а/А (Cyrillic)? What is the relevance of this from the software-design and i18n standpoint?
Let me explain where my curiosity lies:
a/A and а/А have identical shapes and not entirely different pronunciations, so why are they not the same code point (well, the same two code points, one for upper case and one for lower case)?
The only reason I can think of, which occurred to me only as I was writing this question, is that they belong to different alphabets, and it is convenient for the characters of a given alphabet to be laid out sequentially, e.g. (in C++) assert(u'a' + 1 == u'b') but assert(u'а' + 1 == u'б').
Is that the only real reason? Having each alphabet occupy sequential code points?
This is all explained in Unicode Technical Note #26. In short:
Latin and Cyrillic are different scripts, even if some of their letters look very similar to one another. Visual appearance is not the only factor that defines a character’s identity, and letters are generally never unified across scripts. It would just make the Unicode Standard harder to use for everybody.
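To make the "different scripts" point a bit more concrete, here is a minimal C++ sketch (assuming C++14 or later). The helper to_lower_sketch is hypothetical and written only for this illustration: it handles just Latin A-Z and Cyrillic А-Я, and real code should rely on a proper Unicode library instead. It shows one practical consequence of keeping the scripts apart: Latin B and Cyrillic В look alike, but they lowercase to b and в respectively, which a single shared code point could not express.

```cpp
// Hypothetical helper, for illustration only: lowercases the basic Latin
// range A-Z (U+0041..U+005A) and the basic Cyrillic range U+0410..U+042F.
// Everything else is returned unchanged.
constexpr char32_t to_lower_sketch(char32_t c) {
    if (c >= 0x0041 && c <= 0x005A) {
        return static_cast<char32_t>(c + 0x20);  // A-Z -> a-z
    }
    if (c >= 0x0410 && c <= 0x042F) {
        return static_cast<char32_t>(c + 0x20);  // U+0410..U+042F -> U+0430..U+044F
    }
    return c;
}

// Look-alike letters live at different code points.
static_assert(U'a' != U'\u0430', "Latin a (U+0061) vs Cyrillic a (U+0430)");

// Each alphabet is laid out sequentially within its own block.
static_assert(U'a' + 1 == U'b', "Latin a + 1 is Latin b");
static_assert(U'\u0430' + 1 == U'\u0431', "Cyrillic a + 1 is Cyrillic be");

// Same shape in upper case, different letters in lower case.
static_assert(to_lower_sketch(U'B') == U'b', "Latin B lowercases to b");
static_assert(to_lower_sketch(U'\u0412') == U'\u0432',
              "Cyrillic Ve (looks like B) lowercases to U+0432");
```

The universal character names (\u0430 and so on) are used here only so that the sketch does not depend on the source file's encoding.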