How have/would you design an function that on each call returns the next value in a nominated numeric range in lexicographical order of string representation...?
Example: range 8..203 --> 10, 100..109, 11, 110..119, 12, 120..129, 13, 130..139, ..., 19, 190..199, 20, 200..203, 30..79, 8, 80..89, 9, 90..99.
Constraints: indices 0..~INT_MAX, fixed space, O(range-length) performance, preferably "lazy" so if you stop iterating mid way you haven't wasted processing effort. Please don't post brute force "solutions" iterating numerically while generating strings that are then sorted.
Utility: if you're generating data that ultimately needs to be lexicographically presented or processed, a lexicographical series promises lazy generation as needed, reduces memory requirements and eliminates a sort.
Background: when answering this question today, my solution gave output in numeric order (i.e. 8, 9, 10, 11, 12), not lexicographical order (10, 11, 12, 8, 8) as illustrated in the question. I imagined it would be easy to write or find a solution, but my Google-foo let me down and it was trickier than I expected, so I figured I'd collect/contribute here....
(Tagged C++ as it's my main language and I'm personally particularly interested in C++ solutions, but anything's welcome)
Somebody voted to close this because I either didn't demonstrate a minimal understanding of the problem being solved (hmmmm!?! ;-P), or an attempted solution. My solution is posted as an answer as I'm happy for it to be commented on and regailed in the brutal winds of Stack Overflow wisdom.... O_o
This is a bit of a mess, so I'm curious to see how other people tackle it. There are so many edge cases explicitly handled in the increment operator!
For range low
to high
:
high
are always followed by 0-appended versions (e.g. 12->120)high
that end in 0-8 are followed by the next integerlow
has as many digits as high
, you finish after high
(return sentinel high + 1
)
999...
with one less digit than high
low
template <typename T>
std::string str(const T& t)
{
std::ostringstream oss; oss << t; return oss.str();
}
template <typename T>
class Lex_Counter
{
public:
typedef T value_type;
Lex_Counter(T from, T to, T first = -1)
: from_(from), to_(to),
min_size_(str(from).size()), max_size_(str(to).size()),
n_(first != -1 ? first : get_first()),
max_unit_(pow(10, max_size_ - 1)), min_unit_(pow(10, min_size_ - 1))
{ }
operator T() { return n_; }
T& operator++()
{
if (n_ == 0)
return n_ = 1;
if (n_ < max_unit_ && n_ * 10 <= to_)
return n_ = n_ * 10; // e.g. 10 -> 100, 89 -> 890
if (n_ % 10 < 9 && n_ + 1 <= to_)
return ++n_; // e.g. 108 -> 109
if (min_size_ == max_size_
? n_ == to_
: (n_ == max_unit_ - 1 && to_ < 10 * max_unit_ - 10 || // 99/989
n_ == to_ && to_ >= 10 * max_unit_ - 10)) // eg. 993
return n_ = to_ + 1;
// increment the right-most non-9 digit
// note: all-9s case handled above (n_ == max_unit_ - 1 etc.)
// e.g. 109 -> 11, 19 -> 2, 239999->24, 2999->3
// comments below explain 230099 -> 230100
// search from the right until we have exactly non-9 digit
for (int k = 100; ; k *= 10)
if (n_ % k != k - 1)
{
int l = k / 10; // n_ 230099, k 1000, l 100
int r = ((n_ / l) + 1) * l; // 230100
if (r > to_ && r / 10 < from_)
return n_ = from_; // e.g. from_ 8, r 20...
while (r / 10 >= from_ && r % 10 == 0)
r /= 10; // e.g. 230100 -> 2301
return n_ = r <= from_ ? from_ : r;
}
assert(false);
}
private:
T get_first() const
{
if (min_size_ == max_size_ ||
from_ / min_unit_ < 2 && from_ % min_unit_ == 0)
return from_;
// can "fall" from e.g. 321 to 1000
return min_unit_ * 10;
}
T pow(T n, int exp)
{ return exp == 0 ? 1 : exp == 1 ? n : 10 * pow(n, exp - 1); }
T from_, to_;
size_t min_size_, max_size_;
T n_;
T max_unit_, min_unit_;
};
I can count from 0 to 1 billion in under a second on a standard Intel machine / single threaded, MS compiler at -O2.
The same machine / harness running my attempt at Shahbaz's solution - below - takes over 3.5 second to count to 100,000. Maybe the std::set
isn't a good heap/heap-substitute, or there's a better way to use it? Any optimisation suggestions welcome.
template <typename T>
struct Shahbaz
{
std::set<std::string> s;
Shahbaz(T from, T to)
: to_(to)
{
s.insert(str(from));
for (int n = 10; n < to_; n *= 10)
if (n > from) s.insert(str(n));
n_ = atoi(s.begin()->c_str());
}
operator T() const { return n_; }
Shahbaz& operator++()
{
if (s.empty())
n_ = to_ + 1;
else
{
s.erase(s.begin());
if (n_ + 1 <= to_)
{
s.insert(str(n_ + 1));
n_ = atoi(s.begin()->c_str());
}
}
return *this;
}
private:
T n_, to_;
};
Perf code for reference...
void perf()
{
DWORD start = GetTickCount();
int to = 1000 *1000;
// Lex_Counter<int> counter(0, to);
Shahbaz<int> counter(0, to);
while (counter <= to)
++counter;
DWORD elapsed = GetTickCount() - start;
std::cout << '~' << elapsed << "ms\n";
}