One of the common pitfalls I've seen over the years is someone writing in the style of a language they are familiar with, in a language where that style just doesn't fit: applying idioms that flow well in one language to another language where they're just not a good way to achieve the same ends.
An example I've seen a lot is a C thinker writing C++ classes with an init() function; sure, it works, but the C++ way is to do that in constructors. (All those about to start listing exceptions to that C++ idiom, please save it to the end, thanks!) The C thinker is still thinking about objects as "allocate memory, then set values" rather than the C++ way where allocation and initialisation are wrapped together into a single action (from the programmer's point of view).
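To make the contrast concrete, here is a minimal C++ sketch. `Connection`, `ConnectionTwoPhase` and their members are hypothetical names chosen for illustration: the first class shows the C-style init() approach, the second establishes its invariant in the constructor.

```cpp
#include <string>
#include <utility>

// C-style two-phase initialization: the object exists before it is usable.
class ConnectionTwoPhase {
public:
    ConnectionTwoPhase() = default;           // members hold placeholder values
    void init(std::string host, int port) {   // must be called before any real use
        host_ = std::move(host);
        port_ = port;
        ready_ = true;
    }
    bool ready() const { return ready_; }

private:
    std::string host_;
    int port_ = 0;
    bool ready_ = false;
};

// Idiomatic C++: the constructor establishes the invariant in a single step,
// so there is no "constructed but not initialized" state to track.
class Connection {
public:
    Connection(std::string host, int port) : host_(std::move(host)), port_(port) {}
    const std::string& host() const { return host_; }
    int port() const { return port_; }

private:
    std::string host_;
    int port_;
};
```

With the constructor version there is no window in which a Connection exists but isn't ready to use, so there is no ready() flag to check.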
So what are these pitfalls for a C++ thinker writing Rust? This "phrasebook" embraces the idea of taking a C++ way of thinking and applying it to Rust, which I'm sure will be fine in many situations, but which C++ phrases are just the wrong way to do things in Rust?
To be fair, there's a reason for the pattern with init methods you're describing.
C++ constructors can't return values. If construction is fallible, the only way to communicate the error is via C++ exceptions. If you're in a code base that embraces exceptions, that's fine. But (C++) exceptions kind of suck, so many code bases don't use them, and then you have to find alternatives.
Over the years, I've increasingly adopted a pattern where the constructor is private in this case and the object construction can be done with static methods - which is a bit more like Rust, actually.
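A minimal sketch of the private-constructor-plus-static-factory pattern, assuming C++17's std::optional as the error channel (a real code base might return an error code or its own result type instead). `Port` and `create` are hypothetical names:

```cpp
#include <optional>

// Hypothetical type whose construction can fail (out-of-range value).
class Port {
public:
    // Static factory: the only public way to obtain a Port. Failure is
    // reported as an empty optional rather than as an exception.
    static std::optional<Port> create(int value) {
        if (value < 1 || value > 65535) return std::nullopt;
        return Port(value);
    }
    int value() const { return value_; }

private:
    // Private, so callers cannot bypass the validation in create().
    explicit Port(int value) : value_(value) {}
    int value_;
};
```

This is roughly the shape of the Rust habit of writing an associated constructor function that returns `Option<Self>` or `Result<Self, E>`.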
> the object construction can be done with static methods
I've done that a lot too, but I found that free functions are much better for this than static member functions, because you can't get class template argument deduction (CTAD) from static member functions. For example, with constructors we could write:
vector{1, 2, 3}; // deduces vector<int>
And with a static member, we would need: vector<int>::init(1, 2, 3);
With a free function, we could write: make_vector(1, 2, 3); // returns vector<int>
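A sketch of the free-function approach. This `make_vector` is a hypothetical helper written for illustration, not a standard function, and the static_assert requiring all arguments to share one type is an assumption about what you'd want:

```cpp
#include <type_traits>
#include <vector>

// Hypothetical free function: template argument deduction on the parameters
// picks T from the first argument, much as CTAD does for std::vector{1, 2, 3}.
template <typename T, typename... Rest>
std::vector<T> make_vector(T first, Rest... rest) {
    static_assert((std::is_same_v<T, Rest> && ...),
                  "all elements must have the same type");
    return std::vector<T>{first, rest...};
}
```

A static member function can't do this because the class template (and therefore T) has to be named before the member can be found.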
This is fine if the method you're talking about is static (as you point out, that's really all Rust has), but it is absolutely a design mistake if it is not, which I think is what the poster above is referring to. It's a common anti-pattern: after you call the constructor but before you call the init member function, you have an object that is at best useless and at worst completely broken.
Two-phase initialization also has the added benefit that the object usually gets a constexpr constructor (typically the default constructor), which makes it eligible for constinit.
That said, std::construct_at also exists.
Nothing prevents std::vector from having a `constexpr` default constructor, except that it's not considered useful if you can't follow it up by initializing its data in a constant context. For instance, this isn't very useful:
constinit vector<int> v;
But this would be more useful: constinit vector<int> v(16, 1); // Fill with 16 1's.
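And the reason we can't do this (memory allocated during constant evaluation must be freed before that evaluation ends, so it can't persist into a constinit variable) wouldn't be solved by splitting it into multiple functions.
EDIT: Actually, come to think of it, C++20's vector already supports the first example. It's just not used much that way because it's not very helpful.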
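A sketch of the two-phase pattern paired with constinit. `Logger` is a hypothetical type. Its constexpr default constructor is what makes the global eligible for constinit, and the `__cpp_constinit` feature-test guard is only there so the sketch also builds before C++20:

```cpp
// Hypothetical two-phase type with a constexpr default constructor.
class Logger {
public:
    constexpr Logger() = default;                  // usable for constant initialization
    void init(const char* name) { name_ = name; }  // second phase, run at startup
    bool ready() const { return name_ != nullptr; }
    const char* name() const { return name_; }

private:
    const char* name_ = nullptr;
};

#if defined(__cpp_constinit)
// C++20: the compiler rejects this line if the variable is not constant-initialized.
constinit Logger g_logger;
#else
// Pre-C++20: still constant-initialized (the constructor is constexpr), just not checked.
Logger g_logger;
#endif
```

The trade-off is the one discussed above: between program start and the init() call, g_logger exists but isn't ready.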
Right, when construction is fallible you need a factory.
Constructors are called in surprising places when running a C++ program. For example, think of a copy constructor failing somewhere far away from the code you are writing. If C++ constructors could signal failure through return values, propagating those errors by hand would be tedious and invasive.
Hence exceptions are the only way for a constructor to fail.
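A small sketch of the "copy constructor failing far away" scenario. `Handle` and its copy budget are hypothetical, standing in for a type whose copy must acquire some resource. Nothing in copy_all() names a constructor, yet std::vector's copy constructor invokes Handle's copy constructor for each element, and the exception is the only way the failure gets out:

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical type whose copy can fail. The budget makes the failure deterministic.
struct Handle {
    static inline int copies_left = 1;  // only one more copy will succeed
    Handle() = default;
    Handle(const Handle&) {
        if (copies_left-- <= 0) throw std::runtime_error("copy failed");
    }
};

// No constructor is named here, yet one copy constructor runs per element.
bool copy_all(const std::vector<Handle>& src) {
    try {
        std::vector<Handle> dst = src;
        return true;
    } catch (const std::runtime_error&) {
        // The exception is the only channel out of the failed copy;
        // std::vector cleans up the elements it had already copied.
        return false;
    }
}
```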
> To be fair, there's a reason for the pattern with init methods you're describing.
Without prejudice to any other reasons, the most common reason for this pattern I've seen is people who think in languages that basically don't have constructors, yet write C++. It's not a good reason.
How would you deal with fallible construction of objects while avoiding exceptions in idiomatic C++?
The standard idiom is to have a sentinel state for the object indicating that it is invalid. For objects without trivial destructors, or which may be read after being moved-from (a valid behavior in some systems code contexts), you need a sentinel state anyway, because moves in C++ are non-destructive.
C++ uses deferred destruction as a standard tool to solve a variety of problems.
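A minimal sketch of a sentinel state combined with C++'s non-destructive moves. `FileDescriptor` is a hypothetical wrapper that uses -1 (the POSIX convention for "no descriptor") as its sentinel; it makes no real system calls:

```cpp
#include <utility>

// Hypothetical owner of an OS-style handle; -1 means "holds nothing".
class FileDescriptor {
public:
    explicit FileDescriptor(int fd) : fd_(fd) {}
    // Moving leaves the source in the sentinel state rather than destroying it.
    FileDescriptor(FileDescriptor&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
    FileDescriptor& operator=(FileDescriptor&& other) noexcept {
        if (this != &other) {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }
    FileDescriptor(const FileDescriptor&) = delete;
    FileDescriptor& operator=(const FileDescriptor&) = delete;
    ~FileDescriptor() { reset(); }  // still runs for moved-from objects

    bool valid() const { return fd_ >= 0; }
    int get() const { return fd_; }

private:
    void reset() {
        if (fd_ >= 0) {
            // A real type would release the resource here (e.g. close(fd_)).
            fd_ = -1;
        }
    }
    int fd_;
};
```

Because the moved-from object is still destroyed later, its destructor has to recognise the sentinel and do nothing, which is why the sentinel exists even when construction cannot fail.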
> which may be read after being moved-from (a valid behaviour in some systems code contexts)
Moving from an object of a standard library type (e.g. via std::move) leaves it in a "valid but unspecified state".[1] If you're leaving the object in an invalid state (one where the invariants are broken), you're not writing idiomatic C++.
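As a small illustration of "valid but unspecified" for std::string: after the move you must not rely on the moved-from string's contents, but member functions without preconditions, such as clear() and appending, remain safe to call. `move_then_reuse` is a hypothetical helper:

```cpp
#include <string>
#include <utility>

// Returns {moved-to string, reused moved-from string}.
std::pair<std::string, std::string> move_then_reuse() {
    std::string a = "hello";
    std::string b = std::move(a);  // b takes over a's contents
    // a's contents are unspecified here, so don't read them yet.
    a.clear();                     // no preconditions, so fine on a moved-from string
    a += "reused";                 // a is back in a known state
    return {b, a};
}
```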
I am using “invalid” here in the semantic sense of not containing a meaningful value. It is not invalid in a structural sense.
This makes sense for objects that can enter an equivalent invalid state after successful construction as the result of a method call (e.g. a file or stream).
For objects that don't have that property, you're just exchanging one kind of badness in the design for a different but ultimately equivalent badness.
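For the stream case, a sketch with std::istringstream: the stream is constructed successfully, but a failed extraction later puts it into a fail state that callers have to check at runtime. `read_two_ints` is a hypothetical helper:

```cpp
#include <sstream>
#include <string>

// The stream can become "invalid" after construction, as the result of an operation.
bool read_two_ints(const std::string& text, int& a, int& b) {
    std::istringstream in(text);
    in >> a >> b;  // a failed extraction sets failbit; later extractions do nothing
    return !in.fail();
}
```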
So the existence of an object of the type does not act as a static proof that the state is valid?
This is correct (and I am using "invalid" here in a semantic sense; it is still structurally valid). There are a number of contexts in low-level systems code where a static proof is not possible even in theory, so there needs to be a way for code to inspect object validity at runtime. Process address space isn't entirely private: external actors that your process doesn't entirely control can modify it, e.g. via DMA.
The C++ compiler largely assumes that such static proof is possible by default and has no way of knowing if it is not. To address this, the C++ language has added features for annotating objects to indicate that static proofs of state are not possible at compile-time (e.g. std::launder).
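For readers who haven't met std::launder: a minimal sketch of its textbook use case, where a new object is created in reused storage and the code then needs a pointer to that new object. `Header` and `replace_header` are hypothetical, and this shows only the mechanics, not a database-kernel design:

```cpp
#include <new>

// A const member lets the compiler assume `version` never changes
// for the lifetime of the object it belongs to.
struct Header {
    const int version;
};

int replace_header() {
    alignas(Header) unsigned char storage[sizeof(Header)];
    Header* first = new (storage) Header{1};
    int old_version = first->version;
    // Header is trivially destructible, so its storage can be reused directly.
    new (storage) Header{2};
    // reinterpret_cast alone only gives a pointer to the bytes of `storage`;
    // std::launder yields a pointer to the Header object now living there.
    Header* current = std::launder(reinterpret_cast<Header*>(storage));
    return old_version * 10 + current->version;
}
```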
Database kernels are the most extreme example of this, because most objects in the address space don't own their memory address, and the mechanism that temporarily puts an object at a particular memory address is not visible at compile time. Consequently, object location and state have to be resolved dynamically at runtime.
Definitely agree that there's plenty of cases in systems code where static proofs are impossible. That makes it all the worse when you give up on static proofs in places where they are possible.
This was one of the best decisions that Rust and Go made: not having constructors. In C# this is super annoying too, especially when you need an async operation to construct a type. This is usually done by having a private constructor and then using a public static method to create the type.
Rust and Go have no form of implicit conversion operator (not even one that isn't a constructor), which makes scripting a type system essentially impossible. Numeric libraries in both of those languages are extremely cumbersome, largely for this reason.
I don't understand this comment at all.
Rust not only has the 'as' operator for this exact purpose, but it also has the suite of traits From, Into, TryFrom and TryInto for the infallible and fallible conversions respectively.
The 'as' keyword is an infix operator, so it can never be invoked implicitly to draw these relationships between different types.
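To make the C++ side of this exchange concrete, a sketch of what implicit conversion buys a numeric type. `Rational` is hypothetical and does no normalization, for brevity. Its non-explicit constructor from long is a converting constructor, so an integer can appear wherever a Rational is expected without any call-site syntax:

```cpp
// Hypothetical rational type (not normalized, for brevity).
struct Rational {
    long num;
    long den;
    // Converting constructor: lets a plain integer become a Rational implicitly.
    Rational(long n, long d = 1) : num(n), den(d) {}
    // Conversion operator, marked explicit to avoid ambiguity with built-in arithmetic.
    explicit operator double() const { return static_cast<double>(num) / den; }
};

// A non-member operator allows the implicit conversion on either operand.
Rational operator+(Rational a, Rational b) {
    return Rational(a.num * b.den + b.num * a.den, a.den * b.den);
}
```

The conversion to double is explicit on purpose: if it were implicit, an expression like 1 + Rational(1, 2) would be ambiguous between the user-defined operator+ and built-in double addition.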
Idiomatic C++ uses exceptions.
The standard doesn't allow disabling language features.
Anyone who goes over to the dark side of disabling language features is writing unidiomatic C++ with compiler-specific extensions.
Can you think of good reasons why an organization would hesitate to use C++ exceptions?
Legacy code written as if it were C compiled with a C++ compiler, or that predates the C++98 standard (the 1980s and 1990s, when the C++ ARM, The Annotated C++ Reference Manual, was the only guidance); the Orthodox C++ folks; and people claiming that exceptions are too slow or bloated (mostly based on hearsay rather than profiling), on embedded computers more powerful than everything I owned from the 1980s to the 2000s put together.
The same folks won't think twice about distributing statically linked binaries three times the size when using languages that don't do exceptions, but then it isn't bloat. Talk about being consistent.
I think another reason is that C++ mutexes typically aren't poisoned when an exception is thrown while they're held.
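A sketch of what the lack of poisoning means in practice. `Accounts` and `transfer` are hypothetical. An exception thrown between two updates leaves the data half-modified, and std::lock_guard unlocks the mutex during unwinding without recording that anything went wrong. By contrast, Rust's std::sync::Mutex is marked as poisoned when a thread panics while holding its guard, and later lock() calls return an error:

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical shared state with the invariant: from + to == 100.
struct Accounts {
    std::mutex m;
    int from = 100;
    int to = 0;
};

void transfer(Accounts& acc, int amount, bool fail_midway) {
    std::lock_guard<std::mutex> lock(acc.m);
    acc.from -= amount;
    if (fail_midway) throw std::runtime_error("failure between the two updates");
    acc.to += amount;
}  // lock_guard unlocks during unwinding; nothing marks the data as suspect
```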
That's a bold statement, considering that many of the largest C++ code bases - including at least one of the few remaining C++ compilers! - don't use exceptions.
I love bold statements. And if you're thinking of LLVM or Chrome, it isn't as if Google's C++ style guide is a work of art.
Anyone who bothers to make such claims should be aware of what it actually says about exceptions:
"On their face, the benefits of using exceptions outweigh the costs, especially in new projects. However, for existing code, the introduction of exceptions has implications on all dependent code. If exceptions can be propagated beyond a new project, it also becomes problematic to integrate the new project into existing exception-free code. Because most existing C++ code at Google is not prepared to deal with exceptions, it is comparatively difficult to adopt new code that generates exceptions."
They don't use exceptions because they started off on the wrong foot, and, as with any legacy code base of Titanic size, there is no turning around now.
What does the Bible of idiomatic C++, aka the C++ Core Guidelines, say?
Its E section (error handling) has plenty of advice on best practices for coding with exceptions.
The rationale is probably that it's better for C++ devs to write non-idiomatic Rust than to keep writing unsafe C++. Unless they use unsafe and completely circumvent the borrow checker, it's still gonna be safer. Not letting perfect be the enemy of good and all.
Plus, idiomatic Rust isn't that strictly defined. Clippy will guide you on most of the simple stuff, and the rest isn't always worth following. For example, people who try to do stuff "correctly" with traits often end up with way more complexity than it's worth.
I passionately hate two-phase initialisation, C++ libraries that are bare-bones C libraries wrapped in an extern "C" { }, malloc()/free(), C-style coding and such.
A resource like this is a good place to discuss where the two languages are near and far. Of course there are going to be styles within each language that differ as much as the languages themselves.
The worst pitfall is treating Rust references as if they were pointers.
They are implemented as pointers, but their role is to give temporary (often exclusive) access that is restricted to a statically known scope, which is pretty specific and fits only some uses of some pointers/C++ references. In C++, pointers typically mean avoiding copying, but Rust references mean avoiding storing/keeping the data. When these goals don't overlap, people get stuck in a dreadful "does not live long enough" whack-a-mole.
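A C++ sketch of the distinction, with hypothetical names: a parameter borrowed for the duration of a call maps naturally onto a Rust reference, while a long-lived struct holding a non-owning view just to avoid a copy does not:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Fits a Rust reference well: temporary access for the duration of one call.
std::size_t count_words(std::string_view text) {
    std::size_t count = 0;
    bool in_word = false;
    for (char c : text) {
        bool space = (c == ' ');
        if (!space && !in_word) ++count;  // a new word starts here
        in_word = !space;
    }
    return count;
}

// Common in C++, but a poor fit for a Rust reference: a struct that keeps a
// non-owning view to avoid a copy. Nothing ties `title` to the lifetime of the
// string it points into. In Rust this needs a lifetime parameter, roughly
// `struct DocumentView<'a> { title: &'a str }`, and the simpler translation is
// usually to store an owned String instead.
struct DocumentView {
    std::string_view title;
};

// Owning version: safe to keep around for as long as you like.
struct Document {
    std::string title;
};
```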
> their role is to give temporary (often exclusive) access that is restricted to a statically known scope, which is pretty specific and fits only some uses of some pointers/C++ references
You could have a vector of references to heap-allocated data, as long as the references were parameterized by the same lifetime. You might do this if implementing a tree iterator using a vector as a stack, for instance. That goes beyond a statically known scope. But implementing a mutable iterator the same way would require a stack of mutable pointers (and therefore unsafe code whenever you dereference them), since mutable references have to be unique. That does seem like a bad limitation.
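A C++ sketch of the iterator described above: a pre-order traversal that uses a std::vector as an explicit stack of node pointers. `Node` and `PreorderIter` are hypothetical. C++ imposes no uniqueness rule on these pointers, so the same iterator hands out mutable access, which is the case the comment says is awkward with Rust's `&mut` references:

```cpp
#include <vector>

// Hypothetical binary tree node; children are non-owning pointers for brevity.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Pre-order iterator backed by a vector used as a stack.
class PreorderIter {
public:
    explicit PreorderIter(Node* root) {
        if (root) stack_.push_back(root);
    }
    // Returns the next node, or nullptr when the traversal is finished.
    Node* next() {
        if (stack_.empty()) return nullptr;
        Node* n = stack_.back();
        stack_.pop_back();
        if (n->right) stack_.push_back(n->right);  // pushed first, so left is visited first
        if (n->left) stack_.push_back(n->left);
        return n;
    }

private:
    std::vector<Node*> stack_;
};
```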