I saw this passage in the textbook Computer System (published by Tsinghua University Press, ISBN: 978-7-302-53021-3, page 25):
the larger the radix of the mantissa of floating-point number, the more numbers it can represent.
I find this a little difficult to understand. My idea is that no matter how large the radix of the mantissa is, the number of numbers that floating-point numbers can represent is determined by the mantissa. If the mantissa part can represent 2¹⁰ numbers, then no matter how the radix of the mantissa changes, it just makes those 2¹⁰ numbers bigger or smaller together, and does not change the quantity.
How should I understand this sentence?
The page you linked to does not appear to be from a textbook. It appears to be somebody asking a question and two people answering. One of them says that the higher a radix is (using 1, 2, 4, and 8 as examples), the more significands (the preferred term instead of “mantissa”) are normal instead of denormalized, given a fixed width for the significand. This is sort of true, but it does not mean more numbers can be represented.
An issue with denormal significands is that they overlap representations with other exponents, except at the low end of the exponent range. For example, in base four, 0.123•4¹ equals 1.230•4⁰, so there are two representations of that number. Given 2⁸ = 256 significands (four digits of two bits each) and an exponent range of, say, 100 values, we have 256•100 = 25,600 representations of numbers. However, in 99 of the exponent values, all the significands with a first digit of 0 overlap a representation with a lower exponent. So there are 2⁶•99 = 6,336 duplicates, leaving only 25,600−6,336 = 19,264 unique representable numbers.
If we use base two, there are still 2⁸ = 256 significands (now grouped as eight digits of one bit each), so still 25,600 representations of numbers. In 99 of the exponent values, all the significands with a first digit of 0 overlap a representation with a lower exponent. But now that first digit is just one bit, so there are 2⁷•99 = 12,672 duplicates, leaving 25,600−12,672 = 12,928 unique representable numbers.
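These two counts can be checked by brute force. The following is only a sketch: the function name `count_unique` and the choice of exponents 0 through 99 are my own assumptions for illustration, and every value is scaled by a common power of the radix so that plain integers suffice.

```python
def count_unique(radix, digits, exponent_count):
    """Count distinct values of (significand * radix**e).

    The significand is `digits` digits in base `radix`, read as an integer.
    Placing the radix point after the first digit only rescales every value
    by the same constant, so it does not change how many are distinct.
    """
    values = set()
    for e in range(exponent_count):              # assumed exponents: 0 .. exponent_count-1
        for significand in range(radix ** digits):
            values.add(significand * radix ** e)
    return len(values)


print(count_unique(4, 4, 100))  # base four, four 2-bit digits: 19264
print(count_unique(2, 8, 100))  # base two, eight 1-bit digits: 12928
```

Both formats have 25,600 bit patterns; only the number of patterns that collide with another pattern differs.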
So, yes, in general, if we allow any values in the significands and have a fixed number of bits for the significands and a fixed exponent range, then higher radixes allow us to represent more numbers. Maximum efficiency would be reached when the radix equals or exceeds the number of values the significand can have, at which point the only duplicates would be significands of zero. Then the floating-point number would be a single digit in some huge base multiplied by that base to a power. One problem with this format is that it loses precision over much of the interval covered by a single exponent. For example, in base 100, significands would range from 00 to 99. We could represent 12 as 12•100⁰. But, when we want to multiply that by 10, we cannot represent 120. The closest we could get is 01•100¹ = 100.
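To make the base-100 example concrete, here is a small sketch that enumerates every value of a single-digit base-100 format and finds the one closest to a target. The helper name `nearest_representable` and the three-exponent range (0, 1, 2) are assumptions chosen only to keep the illustration small.

```python
def nearest_representable(target, radix, exponent_count):
    """Return the value d * radix**e (one digit d, 0 <= d < radix) closest to target."""
    candidates = {
        d * radix ** e
        for e in range(exponent_count)   # assumed exponents: 0 .. exponent_count-1
        for d in range(radix)
    }
    return min(candidates, key=lambda v: abs(v - target))


print(nearest_representable(12, 100, 3))   # 12 is exactly 12 * 100**0
print(nearest_representable(120, 100, 3))  # 100, i.e. 01 * 100**1; 120 is not representable
```

Every value below 100 is exact, but between 100 and 9,900 the representable values are 100 apart, which is the loss of precision described above.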
However, with base two, there is a trick we can use: We do not use any representations with a first digit of zero, except that we set aside one exponent code to handle tiny numbers. Whenever the exponent code is “normal”, the first digit of the significand must be 1, and we do not store it separately. (It is known from the exponent code.) When the exponent code is the special set-aside value, we instead treat it as the lowest exponent, and the code indicates that the first digit of the significand is zero. Then there is no overlap between representable numbers. Continuing the example with 100 exponent values, for 99 exponent codes, there are 256•99 = 25,344 representable values, all different from each other, and, for one exponent code, there are 256•1 = 256 representable values, all different from each other and from the former values. So there are 25,344+256 = 25,600 unique representable values, which is perfect efficiency.
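The hidden-bit count can also be verified by enumeration. This sketch assumes 8 stored fraction bits, exponent codes 0 through 99 with code 0 as the set-aside code, and the set-aside code sharing the exponent of code 1; the name `count_hidden_bit_values` is hypothetical, and all values are scaled by 2⁸ so they stay integers.

```python
def count_hidden_bit_values(fraction_bits=8, exponent_codes=100):
    """Count distinct values in a base-two format with an implicit leading bit."""
    implicit_one = 1 << fraction_bits          # the unstored leading 1, scaled to an integer
    values = set()
    for code in range(exponent_codes):
        for fraction in range(1 << fraction_bits):
            if code == 0:
                # Set-aside code: leading digit is 0, exponent is that of code 1.
                values.add(fraction << 1)
            else:
                # Normal code: leading digit is an implicit 1.
                values.add((implicit_one + fraction) << code)
    return len(values)


print(count_hidden_bit_values())  # 25600: every bit pattern is a distinct number
```

No two patterns collide, so all 25,600 encodings represent different numbers.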