August 23, 2020

Standard library development made easy with C++20

When he created Go as a knee-jerk Reaction to C++11, Rob Pike proclaimed Less is exponentially more. While it is anyone’s guess what that is supposed to mean, in this blog post we will see how C++20 is less verbose, yet more expressive than prior C++ versions.

I am a big fan of concepts-driven designs. The use of concepts leads to less coupled, more generic interfaces that work with more types. Take std::tuple for example. A tuple can be constructed from a std::pair. But it can’t be constructed from a QPair or a boost::tuple. Nor can it be constructed from a std::array, even if std::array follows the tuple protocol. Isn’t that a bit arbitrary?

Shouldn’t instead construction and comparison support all objects following the tuple protocol? Especially for comparison, if a different part of my system uses a different tuple type (of the same arity), why should I need to do an expensive conversion to compare their value ?

A better question, of course, would be why is std::pair a distinct type from std::tuple?, the answer to that being legacy, as always. Could we fix it? Probably not.

But anyway, this led me to propose P2165 - Comparing pair and tuples. This, in turn, led me to look at libstdc++’s tuple implementation. That is the topic of today’s blog post.

std::tuple is a C++11 addition, and like many standard types, its goal of being a universal vocabulary type makes it quite challenging to implement.

You can also checkout this article which explain how std::tuple is implemented in C++17.

But since then the language changed a lot, and many new features target easier library development. Let see how C++20 can help us simplify std::tuple’s implementation.

Down with EBO

std::tuple can be implemented as a recursive tuple.

template <typename T, int index>
struct Wrapper {
    T type;
};

template <int index, typename T, typename... Tail>
struct Tuple
    : Wrapper<T, index>
    , Tuple<index + 1, Tail...>
{
};

Unfortunately, we do need to explicitly define constructors to be able to use that type, which ends up with 14 or so constructors and a few assignment operators. Half of those being for allocator support.

On top of that, libstdc++ optimises for empty types:

struct empty{};
static_assert(sizeof(std::tuple<empty, empty>) == 1);

To that end, it implements empty base class optimization

template <typename T, int index, bool /* is the class empty */>
struct Wrapper;

template <typename T, int index>
struct Wrapper<T, index, std::is_empty_v<T>> {
    // LOTS OF CONSTRUCTORS
};

template <typename T, int index>
struct Wrapper<T, index, !std::is_empty_v<T>> {
    // LOTS OF CONSTRUCTORS
    T type;
};

template <int index, typename T, typename... Tail>
struct Tuple
    : Wrapper<T, index>
    , TupleImpl<index + 1, Tail...>
{
    // LOTS OF CONSTRUCTORS
};

The actual libstdc++ implementation is a bit more complicated still. Overall, about 70 lines of code - mostly duplicated - are there to support empty objects. This small optimization is, therefore, anything but free in terms of maintenance. It also has some compilation cost, as more templates need to be instantiated. Is it even worth it?

In C++20 all of that complexity goes away:

template <typename T, int index>
struct Wrapper {
    [[no_unique_address]] T type;
};

template <int index, typename T, typename... Tail>
struct TupleImpl
    : Wrapper<T, index>
    , TupleImpl<index + 1, Tail...>
{
};

[[no_unique_address]] gives us the same optimization as Empty-Base-Class-Optimization, without adding any complexity to our design. Less code, more expressiveness, less maintenance.

C++20 is more implicit about being explicit.

Like watermelons are mostly water, std::tuple is mostly constructors. I counted 24 constructors in libstdc++’s implementation. Let’s remove some.

The standard specifies that a tuple constructor is explicit if any of its elements would have an explicit constructor.

struct S {};
struct E { explicit E(S); };
struct I { I(S); };

std::tuple<E> x = S{};  // ill-formed
std::tuple<I> x = S{};  // OK

Pretty neat. But how does it work ?

Simply by having defining 2 constructors, one explicit and one not, and SFINAE away the one that should not be considered:

template<bool _Cond, typename... _Args>
using _ImplicitCtor =
    __enable_if_t<_TCC<_Cond>::template __is_implicitly_constructible<_Args...>(), bool>;

template<bool _Cond, typename... _Args>
using _ExplicitCtor =
    __enable_if_t<_TCC<_Cond>::template __is_explicitly_constructible<_Args...>(), bool>;


template<bool _NotEmpty = (sizeof...(_Elements) >= 1),
    _ImplicitCtor<_NotEmpty, const _Elements&...> = true>
constexpr tuple(const _Elements&... __elements);

template<bool _NotEmpty = (sizeof...(_Elements) >= 1),
    _ExplicitCtor<_NotEmpty, const _Elements&...> = false>
explicit // This is the explicit constructor
constexpr
tuple(const _Elements&... __elements);

And this is for every single constructor! Don’t worry, this is not the way, for in C++20 explicit was amended to support another form: explicit(boolean-expression). As such, explicit(true) is equivalent to explicit while explicit(false) is not explicit. We can then rewrite the above code as:

template<bool _NotEmpty = (sizeof...(_Elements) >= 1)>
explicit(is_explicit_v<const _Elements&...>)
constexpr
tuple(const _Elements&... __elements);

This little trick lets us remove about 10 constructors and about 150 lines of codes. Don’t reinvent yourself!

Constrained copy-pasta

With so many constructors, std::tuple is a template SFINAE bonanza. So it is not surprising that the best way to improve std::tuple is to go all-in on C++ concepts. Or rather, on requires clauses.

This is the code in libstdc++ for the default constructor:

template<typename _Dummy = void, _ImplicitDefaultCtor<is_void<_Dummy>::value> = true>
constexpr
tuple()
noexcept(__and_<is_nothrow_default_constructible<_Elements>...>::value);

It can be rewritten using a requires clause easily: No std::enable_if wizardry.

explicit(!(__and_<__is_implicitly_default_constructible<_Elements>::value...>::value))
constexpr tuple()
noexcept((__and_<is_nothrow_default_constructible<_Elements>::value...>::value))
requires((__and_<std::is_default_constructible<_Elements>::value>...>::value));

If we look at the standard, the code above is a direct translation of the wording! Each Constraint in the wording draft can be converted to a C++ constraint!

Example image

One other thing we should do is to use fold expressions and template variables to cut down on the clutter. Both these are actually C++17 features!

explicit(!(__is_implicitly_default_constructible_v<_Elements> &&...))
constexpr tuple()
noexcept((std::is_nothrow_default_constructible_v<_Elements> &&...))
requires((std::is_default_constructible<_Elements> &&...));

Because requires clauses make it a lot easier to express constraint, and allow us to implement the standard line by line, we can remove a lot of code while increasing readability!

You could wonder about the cost of all these concepts on compile-time, that would be very legitimate indeed. Concepts are in fact a bit taxing on compilers. They are however not as costly than the equivalent enable_if machinery, so these transformations end up having no negative impact on compile times.

A pair by any other name would compile just as fast

Because a tuple is constructible from a std::pair, libstdc++ elects to specialize tuple for 2 elements, for the sake of these additional constructors. But what is a pair, if not a tuple? Following this logic, we could remove that entire specialization, a whopping 360 lines of code. So satisfying!

We could also keep the constructors for pair by simply adding a requires (sizeof...(_Elements) == 2) to these constructors, no need for specialization!

🛸 Nothing as futuristic as a spaceship!

Tuple provides all the comparison operators which are implemented in about 100 lines of code. We can replace them by 2 functions, operator== and operator<=>:

template <typename...Elem, typename...OElem, template <typename...> typename B>
requires (tuple_like<B<OElem...>>  && sizeof...(Elem) == sizeof...(OElem))
bool operator==(const tuple<Elem...> & a, const B<OElem...> & b) {
    return [&]<std::size_t... I>(index_sequence<I...>)
    {
        return  ((std::get<I>(a) == std::get<I>(b)) && ...);
    }
    (std::make_index_sequence<sizeof...(Elem)>{});
}

template <typename...Elem, typename...OElem, template <typename...> typename B>
requires (tuple_like<B<OElem...>>  && sizeof...(Elem) == sizeof...(OElem))
auto operator<=>(const tuple<Elem...> & a, const B<OElem...> & b) {
    using ret = common_comparison_category_t<__detail::__synth3way_t<Elem, OElem>...>;
    return [&]<std::size_t... I>(index_sequence<I...>)
    {
        ret c = ret::equivalent;
        // http://eel.is/c++draft/expos.only.func#2
        return  ((c = synth3way(std::get<I>(a), std::get<I>(b)), c != 0) || ...), c;
    }
    (std::make_index_sequence<sizeof...(Elem)>{});
}

Both these functions will support comparing a tuple to any tuple-like object. They could be hidden friends, which would improve compile times, if not for the dependency on std::get.

To implement them, I have used a few C++20 features. Notably, generic lambdas, which are of the form: []<_template-parameters_>(){}. These allow us to unpack the tuple to make use of fold expressions directly in the functions. Notice also the requires clause. In the definition of operator<=> a nasty comma expression is used to return when 2 elements compare equal. I understand this style might appear novel, maybe even scary, but properly documented it is easier to document than something which relies on a lot of enable_if and other meta-programming heavy approaches.

Interestingly these functions are not constrained by the standard.

We can’t have nice things

Using these tricks, I shaved over 700 lines from <tuple>; Almost a third. And I am sure tons of improvements could still be made. I probably introduced a few bugs doing this experiment, but I feel like I made the code easier to read and to maintain. It might even compile faster!

But here is the thing. Both libstd++ and libc++ are designed to work in many languages modes. As such <tuple> is designed to compile with a C++11 compiler.

So none of the features presented here are likely to be used by actual standard library maintainers. And when they are, they are conditionally supported leading to even more code!

The backward compatibility constraints standard library maintainers put on themselves have many costs: code that is harder to maintain, slower to compile and hard to evolve. This, in turn, might impede new features as we often hear in the committee that some classes are too complex to modify.

Does it need to be that way? Should there be a standard library implementation which tracks the standard and doesn’t support multiple C++ versions in the same branch? Can we imagine an implementation where _Ugly names would not be used, instead of relying on non exported symbols name in modules? This might be a pipe dream, a fresh standard library would be a herculean effort nobody is likely to entertain.

In some way, C++ is becoming a more accessible, more expressive language. And as we keep making complex library types easier to implement, standard libraries might appear increasingly complex and obscure.

But at least they support C++03.