April 19, 2025

A patchwork of Clang patches

Clang 20 was released in early March. But there is no rest for the wicked; Clang 21 is poised for a release in September.

While we are hard at work on C++23, C++26, and C23 features, I am also trying to find some time to work on miscellaneous bits and bobs.

I figured I should talk about some of them even if it’s all listed in the in-progress changelog, which we do our best to keep as accurate as possible.

Looonger fold expressions

Fold expressions in Clang are tied to -fbracket-depth, which kinda-sorta lets us put a cap on recursion in Clang’s parser.

That limit defaulted to 256, but ever since the introduction of fold expressions, there has been a desire to increase that limit. But letting the compiler do unbounded recursion caused some instabilities on some platforms, simply because Clang was running out of stack space. And maybe we could have gotten away with adding another flag or ignoring this limit for fold expressions. But that would have put an additional burden on the user, so instead, I went hunting for some stack space to reclaim and was able to find a few hundred bytes here and there. It doesn’t seem like a lot, but over a large number of recursions, it adds up.

And so, we raised the default limit to 2048.

std::size_t a = [] {
    return []<std::size_t...I>(std::index_sequence<I...>) {
        return (I + ...);
    }(std::make_index_sequence<2048>{});
}();

[Compiler Explorer]

It doesn’t mean the stack won’t blow up in some instances, but in trivial cases, Clang is more frugal in its consumption of stack space.

I hope the new default will cover the needs of most of our users. Given that fold expressions are always preferable to a solution based on recursive template instantiation, it’s nice to be able to use them without moderation!
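To make the comparison with recursive template instantiation concrete, here is a minimal sketch that computes the same sum both ways. The names sum_fold, sum_recursive and sum_rec are made up for this illustration, and the pack size of 100 is chosen only so the snippet stays under the old default bracket depth of 256.

```cpp
#include <cstddef>
#include <utility>

// Fold-expression version: the whole pack is summed in one expression.
template <std::size_t... I>
constexpr std::size_t sum_fold(std::index_sequence<I...>) {
    return (I + ... + 0);
}

// Recursive version (hypothetical helper): one class template
// instantiation per element of the pack.
template <std::size_t... I>
struct sum_recursive;

template <>
struct sum_recursive<> {
    static constexpr std::size_t value = 0;
};

template <std::size_t Head, std::size_t... Tail>
struct sum_recursive<Head, Tail...> {
    static constexpr std::size_t value = Head + sum_recursive<Tail...>::value;
};

template <std::size_t... I>
constexpr std::size_t sum_rec(std::index_sequence<I...>) {
    return sum_recursive<I...>::value;
}

// 0 + 1 + ... + 99 == 4950 in both cases.
static_assert(sum_fold(std::make_index_sequence<100>{}) == 4950);
static_assert(sum_rec(std::make_index_sequence<100>{}) == 4950);
```

Both produce the same value, but the recursive version asks the compiler to create a distinct specialization for every suffix of the pack, while the fold needs none.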

Constexpr all the things, asm edition.

When they implemented user-generated static_assert messages (P2741R3), the good folks working on GCC figured they could also support these constexpr strings in asm statements. And so they did.

I had a few users asking me if Clang could do the same. Yes, we could.

void foo() {
    asm((std::string_view("nop")) ::: (std::string_view("memory")));
}

I was told this would be useful for people writing SIMD libraries. You could also use it to write a C++ compiler in C++.

Note that neither GCC nor Clang supports constexpr strings in asm labels, so you cannot write your own constexpr mangler. Which is probably a good thing!

int foo() asm((std::string_view("bar"))); // *NOT* supported

[Compiler Explorer].

You can check Clang’s and GCC’s documentation for this feature. I’m looking forward to seeing how it gets used!

It was also a nice opportunity to refactor our constexpr string evaluation code, which will make it easier to adopt Extending User-Generated Diagnostic Messages if that ends up being adopted by the C++ committee.

__builtin_structured_binding_size

Clang 21 will support packs in structured bindings (thanks to a contribution from Jason Rice).

One could think that having the ability to deduce the number of bindings would let us detect whether a type can be decomposed.

template <typename T>
concept decomposable = requires(const T & t) {
    auto [...p] = t; // ❌ error
};

Alas, neither SFINAE nor constraints allow us to check if a declaration would be valid - and that would be a fairly difficult undertaking.

However, knowing if a type can be used with structured bindings and how many bindings would be created is quite useful.

So I added a new builtin for that: __builtin_structured_binding_size.

If T can be decomposed, __builtin_structured_binding_size(T) returns the number of bindings as a size_t, and if it can’t, it produces SFINAE-friendly errors.

The decomposable concept can be written as:

template <typename T>
concept decomposable = requires {
 {__builtin_structured_binding_size(T)};
};

The combination of __builtin_structured_binding_size and packs in structured binding can even let us generically access members of aggregates:

template<std::size_t Idx, typename T>
requires (Idx < __builtin_structured_binding_size(T))
decltype(auto) constexpr get_binding(T&& obj) {
    auto && [...p] = std::forward<T>(obj);
    return p...[Idx];
}
struct S { int a = 0, b = 42; };
static_assert(__builtin_structured_binding_size(S) == 2);
static_assert(get_binding<1>(S{}) == 42);

Adding this builtin was necessary to make sender/receivers implementable:

template <typename T>
requires (__builtin_structured_binding_size(T) >= 2)
consteval auto tag_of_impl(T& t) {
    auto && [tag, ..._] = t;
    return std::type_identity<decltype(auto(tag))>{};
}

template <typename T>
using tag_of_t = decltype(tag_of_impl(std::declval<T&>()))::type;

Faster subsumption

Subsumption (the process by which we determine which viable candidate is the most constrained and, therefore, the best match) requires us to produce a Conjunctive Normal Form and Disjunctive Normal Form of the constraints.

Unfortunately, this process has exponential worst-case complexity, and some (somewhat artificial) test cases would cause clang to crash, hang, or consume unreasonable amounts of RAM.

Not well versed in boolean algebra, I fell deep into the rabbit holes of Tseytin (not applicable, which is a shame because in the cases where it works, it is very efficient!), Tarjan’s, and SAT (overkill). It doesn’t help that what C++ calls subsumption is really an implication relationship rather than subsumption.

It was fun but quite counterproductive. I managed to crawl out of that hole, and having re-learned that I know nothing, I opted for a simple solution of eliminating redundant sub-clauses, which seems to be enough.
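To give an idea of what eliminating redundant clauses can mean, here is a minimal sketch. It is not Clang’s implementation. It assumes atomic constraints are identified by hypothetical integer ids and relies on the absorption law: in a DNF, A || (A && B) is just A, and in a CNF, A && (A || B) is just A, so any clause that contains all the atoms of another clause can be dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical representation: each atomic constraint is an integer id,
// a clause is a set of atoms, a normal form is a list of clauses.
using Clause = std::set<int>;
using NormalForm = std::vector<Clause>;

// Drops every clause that is a duplicate or a superset of another clause.
NormalForm remove_redundant_clauses(NormalForm form) {
    // Visit smaller clauses first so a clause is only ever compared
    // against clauses that could be subsets of it.
    std::sort(form.begin(), form.end(), [](const Clause& a, const Clause& b) {
        if (a.size() != b.size()) return a.size() < b.size();
        return a < b;
    });

    NormalForm result;
    for (const Clause& candidate : form) {
        // std::includes is true when every atom of `kept` is in `candidate`.
        const bool redundant = std::any_of(
            result.begin(), result.end(), [&](const Clause& kept) {
                return std::includes(candidate.begin(), candidate.end(),
                                     kept.begin(), kept.end());
            });
        if (!redundant) result.push_back(candidate);
    }
    return result;
}
```

For example, the clauses {1}, {1, 2}, {2, 3}, {2, 3} and {2, 3, 4} reduce to {1} and {2, 3}. This sketch is quadratic in the number of clauses, which is acceptable for an illustration.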

I also rewrote that whole process to be more space efficient. It does not make a lot of difference for most code, but pathological cases have a chance of compiling.

We could still do a few things to remove more subclauses, sort them by size, etc. I even investigated this great paper from Microsoft, but realistically, that’s probably not worth it until the formula has thousands of terms, and that’s not a reasonable scenario in the context of C++ constraints.

We could throw a SAT solver at it. After all, Clang Tidy comes with z3. But that would not do anything good for performance in the common case.

After all, constraint subsumption in C++ is designed to avoid the need for a SAT solver (which is why negations are atomic constraints) at the cost of the need for both a CNF and a DNF transformation per constraint (which, in the common case of a C++ program, should still be the faster option).

[Compiler Explorer]

Faster overload resolution

Last but not least, I implemented P3606R0 On Overload Resolution, Exact Matches and Clever Implementations because it’s always nice to implement your own papers!

During overload resolution, if we know that a non-template candidate will always be picked because a specialization of a template candidate cannot be better, we no longer instantiate the template candidate.
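A simplified sketch of the situation, with a made-up overload set named choose: when the non-template candidate is an exact match, no specialization of the by-value template candidate can be better, so the template does not need to be looked at to pick the winner. When the argument needs a conversion, the template can win.

```cpp
// Non-template candidate: an exact match for an int argument.
constexpr int choose(int) { return 1; }

// Template candidate: can only tie with the exact match above, and a
// non-template is preferred over a template specialization on a tie.
template <typename T>
constexpr int choose(T) { return 2; }

// Exact match on the non-template: it is picked.
static_assert(choose(42) == 1);
// long -> int needs a conversion, so the template (T = long) is better.
static_assert(choose(42L) == 2);
```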

This fixes a few issues where Clang complained about concepts depending on themselves. This was correct from the point of view of the standard, but confusing because it diverged from other implementations.

[Compiler Explorer]

This should let us implement the resolution to CWG2369, which quite a few users have been asking about.

It also makes Clang 4% faster at compiling itself, which is a nice bonus!

That’s all folks!

This is what I’ve been up to over the past few weeks. Many people (not enough) contribute to Clang, and Clang 21 will have a lot of exciting features and bug fixes.

If you are wondering what other toolchains are up to, GCC is making improvements to its diagnostics, and MSVC is working on C++23.
