Question:
I’ve got some example Python code that I need to mimic in C++. I do not require any specific solution (such as coroutine-based yield solutions, although they would be acceptable answers as well); I simply need to reproduce the semantics in some manner.
Python
This is a basic sequence generator, clearly too large to store in materialized form.
def pair_sequence():
    for i in range(2**32):
        for j in range(2**32):
            yield (i, j)
The goal is to maintain two instances of the sequence above and iterate over them in semi-lockstep, but in chunks. In the example below, first_pass uses the sequence of pairs to initialize the buffer, and second_pass regenerates the exact same sequence and processes the buffer again.
def run():
    seq1 = pair_sequence()
    seq2 = pair_sequence()
    buffer = [0] * 1000
    first_pass(seq1, buffer)
    second_pass(seq2, buffer)
    # ... repeat ...
C++
The only solution I can find in C++ is to mimic yield with C++ coroutines, but I haven’t found any good reference on how to do this. I’m also interested in alternative (non-general) solutions for this problem. I do not have enough memory budget to keep a copy of the sequence between passes.
Answer #1:
Generators exist in C++, just under another name: Input Iterators. For example, reading from std::cin is similar to having a generator of char.
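To make the analogy concrete, here is a small sketch (the function name take is made up for illustration) that pulls characters from a stream only on demand through std::istreambuf_iterator. Each call picks up where the previous one stopped, because the read position lives in the stream buffer rather than in the iterator:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Pulls at most n characters from `in`, one at a time, like calling next()
// on a generator n times. Unrequested characters stay in the stream for
// the next call.
std::string take(std::istream& in, std::size_t n) {
    std::string out;
    std::istreambuf_iterator<char> it(in), end;  // default-constructed = end of stream
    for (std::size_t k = 0; k < n && it != end; ++k, ++it) {
        out += *it;  // current element
    }
    return out;
}
```

For example, on a stream holding "abcdef", take(s, 2) returns "ab" and a following take(s, 10) returns "cdef".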
You simply need to understand what a generator does:
- there is a blob of data: the local variables define a state
- there is an init method
- there is a “next” method
- there is a way to signal termination
In your trivial example, it’s easy enough. Conceptually:
struct State { unsigned i, j; };
State make();
void next(State&);
bool isDone(State const&);
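As an illustration only, the four declarations above can be filled in like this. The limit member, the parameter to make(), and the collect() helper are assumptions added so the sequence can be shortened; 64-bit counters leave room for the question's limit of 2**32:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical concrete state: the generator's "local variables" plus a limit.
struct State { unsigned long long i, j, limit; };

// init
State make(unsigned long long limit) { return State{0, 0, limit}; }

// next: advance the inner index, carrying into the outer one when it wraps
void next(State& s) {
    if (++s.j == s.limit) {
        s.j = 0;
        ++s.i;
    }
}

// termination signal
bool isDone(State const& s) { return s.i == s.limit; }

// The generator loop: init, test for termination, consume, advance.
std::vector<std::pair<unsigned long long, unsigned long long>>
collect(unsigned long long limit) {
    std::vector<std::pair<unsigned long long, unsigned long long>> out;
    for (State s = make(limit); !isDone(s); next(s)) {
        out.emplace_back(s.i, s.j);
    }
    return out;
}
```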
Of course, we wrap this as a proper class:
#include <cassert>
#include <cstddef>
#include <iterator>
#include <limits>
#include <utility>

class PairSequence
{
public:
    // Iterator traits. (Pre-C++17 code could inherit them from
    // std::iterator<std::input_iterator_tag, std::pair<unsigned, unsigned>>,
    // which is now deprecated; C++03 code spells them as typedefs.)
    using iterator_category = std::input_iterator_tag;
    using value_type = std::pair<unsigned, unsigned>;
    using reference = value_type const&;
    using pointer = value_type const*;
    using difference_type = std::ptrdiff_t;

    PairSequence(): done(false), ij(0u, 0u) {}

    // C++11: true while the sequence is not exhausted.
    // C++03 has no explicit conversion operators; use the Safe Bool idiom instead:
    //   typedef void (PairSequence::*BoolLike)();
    //   void non_comparable();
    //   operator BoolLike() const {
    //       return done ? 0 : &PairSequence::non_comparable;
    //   }
    explicit operator bool() const { return !done; }

    reference operator*() const { return ij; }
    pointer operator->() const { return &ij; }

    PairSequence& operator++() {
        static unsigned const Max = std::numeric_limits<unsigned>::max();
        assert(!done);
        // Advance the inner index first (the inner Python loop)...
        if (ij.second != Max) { ++ij.second; return *this; }
        // ...then wrap it and advance the outer index...
        if (ij.first != Max) { ij.second = 0; ++ij.first; return *this; }
        // ...and once both are at Max, the sequence is exhausted.
        done = true;
        return *this;
    }

    PairSequence operator++(int) {
        PairSequence const tmp(*this);
        ++*this;
        return tmp;
    }

private:
    bool done;
    value_type ij;
};
So hum yeah… might be that C++ is a tad more verbose 🙂
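A hedged sketch of how an iterator of this shape could drive the question's run(): two independent instances, one filling a buffer chunk by chunk and one regenerating the same pairs to process it. SmallPairSequence is a hypothetical, shrunk-down stand-in for PairSequence with a configurable limit, and the two inner loops merely stand in for first_pass and second_pass:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical shrunk-down PairSequence: same iterator-style interface
// (explicit operator bool, operator*, prefix ++) plus a configurable limit.
class SmallPairSequence {
public:
    explicit SmallPairSequence(unsigned limit)
        : limit_(limit), ij_(0u, 0u), done_(limit == 0) {}
    explicit operator bool() const { return !done_; }
    std::pair<unsigned, unsigned> const& operator*() const { return ij_; }
    SmallPairSequence& operator++() {
        if (++ij_.second == limit_) {
            ij_.second = 0;
            if (++ij_.first == limit_) done_ = true;
        }
        return *this;
    }
private:
    unsigned limit_;
    std::pair<unsigned, unsigned> ij_;
    bool done_;
};

// Sketch of run(): seq1 fills the buffer one chunk at a time, then seq2
// regenerates the identical pairs and checks them against the buffer.
// Returns how many regenerated pairs matched the buffered ones.
std::size_t run(unsigned limit, std::size_t chunk) {
    assert(chunk > 0);
    SmallPairSequence seq1(limit), seq2(limit);
    std::vector<std::pair<unsigned, unsigned>> buffer(chunk);
    std::size_t matched = 0;
    while (seq1) {
        std::size_t n = 0;
        for (; n < chunk && seq1; ++n, ++seq1) buffer[n] = *seq1;  // "first pass"
        for (std::size_t k = 0; k < n; ++k, ++seq2)                // "second pass"
            if (buffer[k] == *seq2) ++matched;
    }
    return matched;
}
```

With a limit of 3 and chunks of 4, the buffer is filled three times (4, 4 and 1 pairs), and all 9 regenerated pairs match; only the current chunk is ever held in memory.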
Answer #2:
In C++ there are iterators, but implementing an iterator isn’t straightforward: one has to consult the iterator concepts and carefully design the new iterator class to implement them. Thankfully, Boost has an iterator_facade template, which helps with implementing iterators and iterator-compatible generators.
Sometimes a stackless coroutine can be used to implement an iterator.
P.S. See also this article, which mentions both a switch hack by Christopher M. Kohlhoff and Boost.Coroutine by Oliver Kowalke. Oliver Kowalke’s work is a follow-up to Boost.Coroutine by Giovanni P. Deretta.
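The switch hack can also be written out by hand. This is a sketch of the underlying technique (a hypothetical SwitchPairSequence, not Kohlhoff's actual macros): the resume point is stored in a data member, and a case label placed inside the nested loops lets next() jump straight back to where the previous call returned. The jump is legal because i_ and j_ are data members rather than locals, so no variable initialization is skipped:

```cpp
#include <cassert>
#include <utility>

// Hypothetical stackless-coroutine generator built on a switch statement.
class SwitchPairSequence {
public:
    explicit SwitchPairSequence(unsigned limit) : limit_(limit) {}

    // Stores the next pair in `out` and returns true, or returns false once
    // the sequence is exhausted (leaving `out` untouched).
    bool next(std::pair<unsigned, unsigned>& out) {
        switch (state_) {
        case 0:                                  // first call: start from the top
            for (i_ = 0; i_ < limit_; ++i_) {
                for (j_ = 0; j_ < limit_; ++j_) {
                    out = std::make_pair(i_, j_);
                    state_ = 1;                  // remember the resume point
                    return true;                 // "yield"
        case 1:;                                 // later calls resume here
                }
            }
            state_ = 2;                          // both loops finished
        }
        return false;                            // state 2: exhausted
    }

private:
    unsigned limit_;
    unsigned i_ = 0, j_ = 0;
    int state_ = 0;
};
```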
P.S. I think you can also write a kind of generator with lambdas:
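#include <functional>
#include <iostream>

int main() {
    std::function<int()> generator = []{
        int i = 0;  // captured by copy; `mutable` lets the closure update its copy
        return [=]() mutable {
            return i < 10 ? i++ : -1;  // -1 signals "exhausted"
        };
    }();
    int ret = 0;
    while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;
}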
Or with a functor:
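#include <iostream>

struct generator_t {
    int i = 0;  // the generator's state lives in a data member
    int operator() () {
        return i < 10 ? i++ : -1;  // -1 signals "exhausted"
    }
} generator;

int main() {
    int ret = 0; while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;
}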
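The same closure idea carries over to the question's pair sequence. In this sketch (make_pair_generator and its limit parameter are hypothetical) the captured i and j play the role of the generator's local variables. Since every unsigned pair is a legitimate value, there is no spare value like -1 to use as a sentinel, so the end of the sequence is reported through the bool return value instead:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical factory: returns a closure producing (i, j) for
// 0 <= i, j < limit, in the same order as the question's pair_sequence().
std::function<bool(std::pair<unsigned, unsigned>&)> make_pair_generator(unsigned limit) {
    // The captured i and j are the generator's state; `mutable` lets each
    // call update the closure's own copies.
    return [limit, i = 0u, j = 0u](std::pair<unsigned, unsigned>& out) mutable {
        if (i >= limit) return false;        // exhausted: report it out-of-band
        out = std::make_pair(i, j);
        if (++j == limit) { j = 0; ++i; }    // advance, carrying into i
        return true;
    };
}
```

Calling make_pair_generator twice gives two independent sequences, which is what the question's seq1 and seq2 need.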
P.S. Here’s a generator implemented with the Mordor coroutines:
#include <iostream>
using std::cout; using std::endl;
#include <mordor/coroutine.h>
using Mordor::Coroutine; using Mordor::Fiber;

void testMordor() {
    Coroutine<int> coro ([](Coroutine<int>& self) {
        int i = 0; while (i < 9) self.yield (i++);
    });
    for (int i = coro.call(); coro.state() != Fiber::TERM; i = coro.call()) cout << i << endl;
}
Answer #3:
Since Boost.Coroutine2 now supports it very well (I found it because I wanted to solve exactly the same yield problem), I am posting the C++ code that matches your original intention:
#include <stdint.h>
#include <iostream>
#include <utility>
#include <boost/coroutine2/all.hpp>

typedef boost::coroutines2::coroutine<std::pair<uint16_t, uint16_t>> coro_t;

void pair_sequence(coro_t::push_type& yield)
{
    uint16_t i = 0;
    uint16_t j = 0;
    for (;;) {
        for (;;) {
            yield(std::make_pair(i, j));
            if (++j == 0)  // j wrapped around: inner loop done
                break;
        }
        if (++i == 0)      // i wrapped around: whole sequence done
            break;
    }
}

// Printing helper (not part of the original answer); any function will do.
void print_pair(std::pair<uint16_t, uint16_t> const& p)
{
    std::cout << p.first << ", " << p.second << '\n';
}

int main()
{
    coro_t::pull_type seq(boost::coroutines2::fixedsize_stack(),
                          pair_sequence);
    for (auto pair : seq) {
        print_pair(pair);
    }
    //while (seq) {
    //    print_pair(seq.get());
    //    seq();
    //}
}
In this example, pair_sequence does not take additional arguments. If it needs to, std::bind or a lambda should be used to generate a function object that takes only one argument (of type push_type) when it is passed to the coro_t::pull_type constructor.
Answer #4:
All answers that involve writing your own iterator are completely wrong. Such answers entirely miss the point of Python generators (one of the language’s greatest and most distinctive features). The most important thing about generators is that execution picks up where it left off. This does not happen with iterators. Instead, you must manually store state information such that when operator++ or operator* is called anew, the right information is in place at the very beginning of the next function call. This is why writing your own C++ iterator is a gigantic pain, whereas generators are elegant and easy to read and write.
I don’t think there is a good analog for Python generators in native C++, at least not yet (there is a rumor that yield will land in C++17). You can get something similar by resorting to third-party libraries (e.g. Yongwei’s Boost suggestion), or by rolling your own.
I would say the closest thing in native C++ is threads. A thread can maintain a suspended set of local variables, and can continue execution where it left off, very much like generators, but you need to roll a little bit of additional infrastructure to support communication between the generator object and its caller. E.g.
// Infrastructure
template <typename Element>
class Channel { ... };

// Application
using IntPair = std::pair<int, int>;

void yield_pairs(int end_i, int end_j, Channel<IntPair>* out) {
    for (int i = 0; i < end_i; ++i) {
        for (int j = 0; j < end_j; ++j) {
            out->send(IntPair{i, j});  // "yield"
        }
    }
    out->close();
}

void MyApp() {
    Channel<IntPair> pairs;
    std::thread generator(yield_pairs, 32, 32, &pairs);
    for (IntPair pair : pairs) {
        UsePair(pair);
    }
    generator.join();
}
This solution has several downsides, though:
1. Threads are “expensive”. Most people would consider this an “extravagant” use of threads, especially when your generator is so simple.
2. There are a couple of clean-up actions that you need to remember. These could be automated, but you’d need even more infrastructure, which, again, is likely to be seen as “too extravagant”. Anyway, the clean-ups that you need are:
   - out->close()
   - generator.join()
3. This does not allow you to stop the generator. You could make some modifications to add that ability, but it adds clutter to the code. It would never be as clean as Python’s yield statement.
4. In addition to 2, there are other bits of boilerplate that are needed each time you want to “instantiate” a generator object:
   - Channel* out parameter
   - Additional variables in main: pairs, generator
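The Channel infrastructure is elided in the thread-based answer. One possible minimal version is sketched below; it is a hypothetical implementation, not the author's, and it exposes a blocking receive() loop instead of the begin()/end() interface that MyApp() iterates over. A single slot makes send() wait until the consumer has taken the previous element, so the producer thread stays at most one pair ahead, much like a suspended generator. yield_pairs is the answer's function; collect_pairs is a made-up consumer:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical one-slot channel: send() blocks until the previous element
// has been received; close() marks the end of the stream.
template <typename Element>
class Channel {
public:
    void send(Element value) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !full_; });
        slot_ = std::move(value);
        full_ = true;
        cv_.notify_all();
    }
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        cv_.notify_all();
    }
    // Blocks for the next element; returns false once closed and drained.
    bool receive(Element& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return full_ || closed_; });
        if (!full_) return false;
        out = std::move(slot_);
        full_ = false;
        cv_.notify_all();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    Element slot_{};
    bool full_ = false;
    bool closed_ = false;
};

using IntPair = std::pair<int, int>;

void yield_pairs(int end_i, int end_j, Channel<IntPair>* out) {
    for (int i = 0; i < end_i; ++i)
        for (int j = 0; j < end_j; ++j)
            out->send(IntPair{i, j});  // "yield"
    out->close();
}

// Consumer side: a receive loop stands in for the range-for in MyApp().
std::vector<IntPair> collect_pairs(int end_i, int end_j) {
    Channel<IntPair> pairs;
    std::thread generator(yield_pairs, end_i, end_j, &pairs);
    std::vector<IntPair> result;
    IntPair p;
    while (pairs.receive(p)) result.push_back(p);
    generator.join();
    return result;
}
```

collect_pairs(3, 2) returns the six pairs (0, 0), (0, 1), ..., (2, 1) in order. Building it may require enabling thread support (e.g. -pthread with GCC or Clang).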
Answer #5:
You should probably check out the generators in std::experimental in Visual Studio 2015, e.g.: https://blogs.msdn.microsoft.com/vcblog/2014/11/12/resumable-functions-in-c/
I think it’s exactly what you are looking for. For now this is only an experimental Microsoft VC++ feature, but generators should become part of standard C++ (coroutines were ultimately standardized in C++20, and std::generator followed in C++23).
Answer #6:
Using range-v3:
#include <iostream>
#include <tuple>
#include <range/v3/all.hpp>

using namespace std;
using namespace ranges;

auto generator = [x = view::iota(0) | view::take(3)] {
    return view::cartesian_product(x, x);
};

int main () {
    for (auto x : generator()) {
        cout << get<0>(x) << ", " << get<1>(x) << endl;
    }
    return 0;
}
Answer #7:
If you only need to do this for a relatively small number of specific generators, you can implement each as a class, where the member data is equivalent to the local variables of the Python generator function. Then you have a next function that returns the next thing the generator would yield, updating the internal state as it does so.
This is basically similar to how Python generators are implemented, I believe. The major difference is that they can remember an offset into the bytecode for the generator function as part of the “internal state”, which means the generators can be written as loops containing yields. You would instead have to calculate the next value from the previous one. In the case of your pair_sequence, that’s pretty trivial. It may not be for complex generators.
You also need some way of indicating termination. If what you’re returning is “pointer-like”, and NULL should not be a valid yieldable value, you could use a NULL pointer as a termination indicator. Otherwise you need an out-of-band signal.
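A sketch of such a class for the pair sequence, assuming C++17: the members i_ and j_ mirror the Python generator's locals, next() computes each value from the previous state, and std::optional provides the out-of-band termination signal (std::nullopt), so every pair remains a valid value to yield. PairGenerator and its limit parameter are hypothetical:

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical class-based generator: member data replaces the locals,
// std::nullopt replaces a NULL-pointer end marker.
class PairGenerator {
public:
    explicit PairGenerator(unsigned limit) : limit_(limit) {}

    std::optional<std::pair<unsigned, unsigned>> next() {
        if (i_ >= limit_) return std::nullopt;   // exhausted
        std::pair<unsigned, unsigned> result(i_, j_);
        if (++j_ == limit_) { j_ = 0; ++i_; }    // compute the next state from this one
        return result;
    }

private:
    unsigned limit_;
    unsigned i_ = 0, j_ = 0;
};
```

A caller can then loop with while (auto p = gen.next()) { use(*p); }, where use stands for whatever consumes the pair.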
Answer #8:
Something like this is very similar:
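#include <limits>
#include <stdexcept>
#include <utility>

struct pair_sequence
{
    typedef std::pair<unsigned int, unsigned int> result_type;
    static const unsigned int limit = std::numeric_limits<unsigned int>::max();

    pair_sequence() : i(0), j(0), done(false) {}

    // Returns the current pair and advances; throws once the sequence is exhausted.
    result_type operator()()
    {
        if (done) throw std::out_of_range("end of iteration");
        result_type r(i, j);
        if (j < limit) j++;
        else if (i < limit)
        {
            j = 0;
            i++;
        }
        else done = true;  // (limit, limit) was the last pair
        return r;
    }
private:
    unsigned int i;
    unsigned int j;
    bool done;
};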
Using operator() is only a question of what you want to do with this generator; you could also build it as a stream and make sure it adapts to an istream_iterator, for example.