Item 44000516 - HN

shadowdev1 • 1 day ago

Heh, low comments on C++ posts now. A sign of the times. My two cents anyway.

I've been using C++ for a decade. Of all the warts, they all pale in comparison to the default initialization behavior. After seeing thousands of bugs, the worst have essentially been caused by cascading surprises from initialization UB from newbies. The easiest, simplest fix is simply to default initialize with a value. That's what everyone expects anyway. Use Python mentality here. Make UB initialization an EXPLICIT choice with a keyword. If you want garbage in your variable and you think that's okay for a tiny performance improvement, then you should have to say it with a keyword. Don't just leave it up to some tiny invisible visual detail no one looks at when they skim code (the missing parens). It really is that easy for the language designers. When thinking about backward compatibility... keep in mind that the old code was arguably already broken. There's not a good reason to keep letting it compile. Add a flag for --unsafe-initialization-i-cause-trouble if you really want to keep it.

C++, I still love you. We're still friends.
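For readers skimming past the "missing parens" remark, here is a minimal sketch of the difference being described. The `Settings` type and both helper functions are hypothetical names made up for illustration; nothing here comes from a real codebase.

```cpp
#include <cassert>

// Hypothetical aggregate used only for illustration.
struct Settings {
    int retries;   // no default member initializer
    bool verbose;  // no default member initializer
};

// Value-initialization: the empty braces zero every member.
inline Settings make_zeroed() {
    Settings s{};
    return s;
}

// Default-initialization: without braces nothing is zeroed, and the members
// hold indeterminate values. Reading them before assignment is undefined
// behaviour (before C++26), so this function writes every member first.
inline Settings make_assigned(int retries, bool verbose) {
    Settings s;  // the only visual difference from make_zeroed() is "{}"
    s.retries = retries;
    s.verbose = verbose;
    return s;
}
```

The two declarations differ by two characters, which is the "tiny invisible visual detail" the comment complains about.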


juliangmp • 1 day ago

> When thinking about backward compatibility... keep in mind that the old code was arguably already broken. There's not a good reason to keep letting it compile.

Oh how I wish the C++ committee and compiler authors would adopt this way of thinking... Sadly we're dealing with an ecosystem where you have to curate your compiler options and also use clang-tidy to avoid even the simplest mistakes :/

Like, it's insane to me how -Wconversion is not the default behavior.

motorest • 1 day ago

> Oh how I wish the C++ committee and compiler authors would adopt this way of thinking...

I disagree. If you expect anyone to adopt your new standard revision, the very least you need to do is ensure their code won't break just by flipping a flag. You're talking about production software, much of which has decades' worth of commit history, where you simply cannot spend time going through each and every single line of code of your >1M LoC codebase. That's the difference between managing production-grade infrastructure and hobbyist projects.

dwattttt • 1 day ago

> If you expect anyone to adopt your new standard revision, the very least you need to do is ensure their code won't break just by flipping a flag.

Why would you expect that a new revision can't cause existing code to fail to compile? It means that "new" revisions can't fix old problems, and one thing you always get more of over time is perspective.

If you don't want your code "broken", don't migrate to a new standard. That's the point of supporting old standards. Don't hobble new standards because you both want new things, but don't want old things to change.

motorest • 13 hours ago

> Why would you expect that a new revision can't cause existing code to fail to compile?

For starters, because that would violate the goals and priorities of C++ as established by the C++ standardization committee.

I could go on and on, but it's you who should provide any semblance of rationale: why do you believe existing software should break? What value do you see in it? Does it have any value at all?

dwattttt • 11 hours ago

> why do you believe existing software should break? What value do you see in it? Does it have any value at all?

How exactly can a _new_ standard cause existing software compiling on an old standard to break?

EDIT: As to the value, I've mentioned elsewhere in this thread; we know a lot more about the practice of programming than when C++ was created and standardised. Why would we say those design decisions can never be questioned or changed? It locks into place choices we never knew were bad, until they caused decades of problems.

> that would violate the goals and priorities of C++ as established by the C++ standardization committee.

Correct. It's these goals and priorities that are being criticised.

nickysielicki • 1 day ago

> You're talking about production software, many of which has decades worth of commit history, which you simply cannot spend time going through each and every single line of code of your >1M LoC codebase.

They can keep using the old standard.

motorest • 1 day ago

> They can keep using the old standard.

But they cannot upgrade. Ever. At least without requiring major maintenance work. Which means never. Do you understand what that means?

nickysielicki • 1 day ago

Every compiler continues to support old standards. What risk am I missing? This feels like a perfectly acceptable outcome for icky legacy code that is not essential enough to maintain.

johannes1234321 • 1 day ago

The option there is better tooling, for which the foundation exists which can do such maintenance somewhat automatically, in the simplest case by just adding the Keywords to request old behavior.

But the annoyance comes when dealing with multiple compilers and versions. Then you have to add more compatibility macros all over. Say, when being a library vendor trying to support broad range of customers.

motorest • 1 day ago

> The option there is better tooling (...)

The tooling already exists. The bulk of the criticism in this thread is clearly made from a position of ignorance. For example, all major compilers already provide flags to enable checks for uninitialized variables being used. Onboarding a static code analysis tool nowadays requires setting a flag in CMake.

These discussions would be constructive if those engaging in them had any experience at all with the language and tooling. But no, it seems the goal is to parrot cliches out of ignorance. Complaining that they don't know what a reserved word means and using that as an argument to rewrite software in other languages is somehow something worth stating.

mystified5016 • 1 day ago

Python seems to still be pretty popular despite breaking most extant code with language updates

monkeyelite • 1 day ago

And the cost of this is that every time I open a project in another language it’s broken and I have to make changes to fix all their little breaking changes.

zahlman • 1 day ago

>Oh how I wish the C++ committee and compiler authors would adopt this way of thinking

Many different committees, organizations etc. could benefit, IMO.

josefx • 1 day ago

> keep in mind that the old code was arguably already broken

The code is only broken if the data is used before anything is written to it. A lot of uninitialized data is wrapped by APIs that prevent reading before something was written to it, for example the capacity of a standard vector; buffers for IO should only access bytes that were already stored in them. I have also worked with a significant number of APIs that expect a large array of POD types and then tell you how many entries they filled.

> for a tiny performance improvement

Given how Linux allocates memory pages only if they are touched and many containers intentionally grow faster than they are used? It reduces the number of page faults and memory use significantly if only the used objects get touched at all.
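The "large array of POD types plus a fill count" pattern mentioned above can be sketched as follows. `Sample`, `read_samples` and `sum_filled` are hypothetical names invented for this illustration, and the producer is a stand-in for whatever real API would fill the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical POD type used only for illustration.
struct Sample {
    int id;
    double value;
};

// Hypothetical producer: writes at most `capacity` entries into `out`
// and returns how many entries it actually filled.
inline std::size_t read_samples(Sample* out, std::size_t capacity) {
    const std::size_t available = 3;  // pretend only three samples exist
    const std::size_t n = available < capacity ? available : capacity;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = Sample{static_cast<int>(i), static_cast<double>(i) * 0.5};
    }
    return n;
}

// Sums only the filled prefix; the unfilled tail is never read.
inline double sum_filled(std::size_t capacity) {
    // `new Sample[capacity]` default-initializes, so no element is zeroed.
    std::unique_ptr<Sample[]> buf(new Sample[capacity]);
    const std::size_t filled = read_samples(buf.get(), capacity);
    double total = 0.0;
    for (std::size_t i = 0; i < filled; ++i) {
        total += buf[i].value;
    }
    return total;
}
```

The code is correct only because the caller respects the returned count; nothing in the language enforces that, which is the part the rest of the thread argues about.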

riehwvfbk • 1 day ago

You are very very unlikely to trigger Linux overcommit behavior by not initializing a member variable. It's even more unlikely for this to be a good thing.

In effect, you are assuming that your uninitialized and initialized variables straddle a page boundary. This is obviously not going to be a common occurrence. In the common case you are allocating something on the heap. That heap chunk descriptor before your block has to be written, triggering a page fault.

Besides: taking a page fault, entering the kernel, modifying the page table page (possibly merging some VMAs in the process) and exiting back to userspace is going to be A LOT slower than writing that variable.

OK you say, but what if I have a giant array of these things that spans many pages. In that case your performance and memory usage are going to be highly unpredictable (after all, initializing a single thing in a page would materialize that whole page).

OK, but vectors. They double in size, right? Well, the default allocator for vectors will actually zero-initialize the new elements. You could write a non-initializing allocator and use it for your vectors - and this is in line with "you have to say it explicitly to get dangerous behavior".

josefx • 1 day ago

> In effect, you are assuming that your uninitialized and initialized variables straddle a page boundary

You are assuming that I am working with small data structures, don't use arrays of data, don't have large amounts of POD members, ... .

> That heap chunk descriptor before your block has to be written, triggering a page fault.

So you allocate one out of hundreds of pages? The cost is significantly less than the alternative.

> In that case your performance and memory usage are going to be highly unpredictable (after all, initializing a single thing in a page would materialize that whole page).

As opposed to initializing thousands of pages you will never use at once? Or allocating single pages when they are needed?

> Well, the default allocator for vectors will actually zero-initialize the new elements.

I reliably get garbage data after the first reserve/shrink_to_fit calls. Not sure why the first one returns all zero, I wouldn't rely on it.

jchw • 1 day ago

> You are assuming that I am working with small data structures, don't use arrays of data, don't have large amounts of POD members, ... .

Sounds like a great set of use cases for explicit syntax to opt out of automatic initialization.

riehwvfbk • 22 hours ago

Reserve vs resize, yes.

Reserve will not initialize, but then you have to keep track of the real vector size on the side, inevitably leading to bugs. Alternatively, something like this https://stackoverflow.com/questions/15967293/how-to-make-my-... will make resize() leave the elements uninitialized.
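The link above is truncated, so the following is not necessarily what it shows. One widely circulated shape of the idea is an allocator adaptor that default-initializes elements instead of value-initializing them. The name `default_init_allocator` is an assumption chosen for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Sketch of an allocator adaptor: construct() with no arguments performs
// default-initialization, so trivial types such as int are left unwritten.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // reuse the base allocator's constructors

    // Called by resize(n): note "U" without parentheses or braces.
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // Every other construction is forwarded to the base allocator unchanged.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Opting in is visible at the declaration site.
using raw_int_vector = std::vector<int, default_init_allocator<int>>;
```

With this in place, `raw_int_vector v; v.resize(n);` changes the size without writing the new elements, so they must be assigned before they are read. `v.resize(n, value)` still fills as usual.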

josefx • 13 hours ago

> Reserve will not initialize, but then you have to keep track of the real vector size on the side

You already do that when you use push_back. It tracks the size for you, overallocates to amortize the cost of growing and most importantly does not initialize the overallocated memory before it is used, meaning pages will not be touched / mapped by the OS unless you actually end up using it. Giving you the benefit of amortizing vector growth without paying for the uninitialized memory it allocates behind the scenes for future use.

Directly accessing reserved memory instead of using resize was to check if the allocator zero initialized that overallocated memory. That the parts that are used end up initialized at a later point is entirely irrelevant to my point.

So your previous point:

> They double in size, right? Well, the default allocator for vectors will actually zero-initialize the new elements.

They double in capacity, not size when used with push_back. Which means exactly one new element will be initialized no matter how much uninitialized/unused/unmapped capacity the vector allocates for future use.
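The size-versus-capacity distinction argued over in this subthread can be checked directly. This is a small sketch using only `std::vector<int>` with the default allocator; the `Observed` struct and `observe` function are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of what the vector reports at each step.
struct Observed {
    std::size_t size_after_reserve;
    std::size_t capacity_after_reserve;
    std::size_t size_after_push;
    std::size_t size_after_resize;
    bool resized_elements_are_zero;
};

inline Observed observe() {
    Observed o{};
    std::vector<int> v;

    v.reserve(1000);  // allocates storage, constructs no elements
    o.size_after_reserve = v.size();
    o.capacity_after_reserve = v.capacity();

    v.push_back(42);  // constructs exactly one element
    o.size_after_push = v.size();

    v.resize(10);  // value-initializes the nine new elements
    o.size_after_resize = v.size();
    o.resized_elements_are_zero = true;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] != 0) o.resized_elements_are_zero = false;
    }
    return o;
}
```

Elements in the range from `size()` up to `capacity()` do not exist as objects, so reading them through `data()` or `operator[]` is undefined behaviour regardless of what bytes happen to be there.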

motorest • 1 day ago

> You are very very unlikely to trigger Linux overcommit behavior by not initializing a member variable.

The problem with your assumption is that you're just arguing that it's ok for code to be needlessly buggy if you believe the odds this bug is triggered are low. OP points out a known failure mode and explains how a feature eliminates it. You intentionally ignore it for no reason.

This assumption is baffling when, in the exact same thread, you see people whining about C++ for allowing memory-related bugs to exist.

yorwba • 1 day ago

Linux overcommit is not a bug, it's a feature. The argument isn't that it's okay for code to be buggy if the odds of triggering the bug are low, it's that it's okay for code to not make use of a feature if the odds of benefiting from that feature are low.

motorest • 1 day ago

> Linux overcommit is not a bug, it's a feature.

You failed to read what I wrote. I referred to why clients would choose to not initialize early to avoid scenarios such as Linux over committing, not that Linux had a bug.

yorwba • 1 day ago

Overcommit is an optimization where virtual memory that is allocated but unused is not mapped to physical memory. If you want to avoid this (for some reason), choosing not to initialize early is not going to have the intended effect.

motorest • 1 day ago

> Overcommit is an optimization where virtual memory that is allocated but unused is not mapped to physical memory.

Either you're replying without bothering to read the messages you're replying to, or you're failing to understand what is being written.

> If you want to avoid this (for some reason), choosing not to initialize early is not going to have the intended effect.

Read PP's comment.

fooker • 1 day ago

> keep in mind that the old code was arguably already broken.

Reminder that compiler devs are usually paid by trillion dollar companies that make billions with 'old code'.

tails4e • 1 day ago

Especially when doing the right/safe thing by default is at worst a minor performance hit. They could change the default to be sane and provide a backwards compatible switch or pragma to revert to the less safe version. They could, but for some reason never seem to make such positive changes.

redandblack • 1 day ago

Stupid question, as I have not touched C++ since the 90s: can the IDEs not do this, with all these now almost universal linters and AI assists? Maybe something that prompts before a commit and autoprompts before/after fixes to only the initialization. Maybe as simple as a choice in the refactoring menu? Rust - where are you for proposing this fix to C++ or, is it javascript?

vrighter • 1 day ago

that's the undefined keyword in zig. I love it. It makes UB opt-in and explicit

loeg • 1 day ago

Compilers should add this as a non-standard extension, right? -ftrivial-auto-var-init=zero is a partial solution to a related problem, but it seems like they could just... not have UB here. It can't be that helpful for optimization.

Matheus28 • 1 day ago

Yes but it’s not portable. If zero initialization were the default and you had to opt-in with [[uninitialized]] for each declaration it’d be a lot safer. Unfortunately I don’t think that will happen any time soon.

tialaramex • 1 day ago

You probably don't want zero initialization if you can help it.

Ideally, what you want is what Rust and many modern languages do: programs which don't explain what they wanted don't compile, so, when you forget to initialize that won't compile. A Rust programmer can write "Don't initialize this 1024 byte buffer" and get the same (absence of) code but it's a hell of a mouthful - so they won't do it by mistake.

The next best option, which is what C++ 26 will ship, is what they called "Erroneous Behaviour". Under EB it's defined as an error not to initialize something you use, but it is also defined what happens, so you can't have awful UB problems. Typically it's something like: the vendor specifies which bit pattern is written to an "uninitialized" object, and that's the pattern you will observe.

Why not zero? Unfortunately zero is too often a "magic" value in C and C++. It's the Unix root user, it's often an invalid or reserved state for things. So while zero may be faster in some cases, it's usually a bad choice and should be avoided.

motorest • 1 day ago

> Ideally, what you want is what Rust and many modern languages do: programs which don't explain what they wanted don't compile, so, when you forget to initialize that won't compile.

I think you're confusing things. You're arguing about static code analysis being able to identify uninitialized var reads. All C++ compilers already provide support for flags such as -Wuninitialized.

bluGill • 1 day ago

Uninitialized variables reads can only sometimes be detected statically. -Wuninitialized is still good, but it will miss a lot of cases when the read is in a different translation unit. Whole program analysis could get more cases, but with large programs (multi-million lines of code) it is unlikely we can analyze everything (everything being more than just variables) before the universe ends - see the halting problem.
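A sketch of the cross-translation-unit case described above. `try_parse_int` and `parse_or` are hypothetical names. The definition is shown in the same block only so the example is self-contained; imagine it lives in a separate source file, so the caller sees just the declaration. Whether a particular compiler warns here depends on its version, optimization level and inlining, so no claim is made about actual diagnostics.

```cpp
#include <cassert>
#include <cstdlib>

// Declaration only: this is all the caller's translation unit can see.
// Nothing in the signature says whether `out` is written on every path.
bool try_parse_int(const char* text, int& out);

// Caller: `value` starts uninitialized and is read only after the callee
// reports success. A local analysis cannot verify that contract.
inline int parse_or(const char* text, int fallback) {
    int value;  // deliberately left uninitialized
    if (!try_parse_int(text, value)) {
        return fallback;
    }
    return value;
}

// Definition: imagine this in another file. It leaves `out` untouched
// when parsing fails, which is why the caller must check the result.
bool try_parse_int(const char* text, int& out) {
    char* end = nullptr;
    const long parsed = std::strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        return false;
    }
    out = static_cast<int>(parsed);
    return true;
}
```

If the caller forgot the `if`, the bug would exist only on the failure path, and spotting it requires knowing what the other file does.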

motorest • 13 hours ago

> Uninitialized variables reads can only sometimes be detected statically. -Wuninitialized is still good, but it will miss a lot of cases when the read is in a different translation unit.

From my experience, you're exaggerating the number of false negatives, which is more a factor of how you write your code than what static code analyzers do.

Also, your comment reads like an attempt at moving the goalposts. We start this discussion being very adamant in accusing C++ of being impossible to detect uninitialized var reads. Once that assertion is thoroughly proven to be false and resulting from clueless ignorance, now we try to reframe it as being... Imperfect in some hypothetical scenarios? So what's supposed to be the actual complaint?

The main problem with C++ is that some people somehow are personally invested in criticizing it from a position of complete ignorance. The problem is not technical, it's social.

bluGill • 6 hours ago

I agree false negatives are rare - I was intending to temper expectations because perfection is impossible.

dwattttt • 1 day ago

> You're arguing about static code analysis being able to identify uninitialized var reads.

(Safe) Rust does guarantee to identify uninitialised variable reads, but I believe the point is that you can get the optimisation of not forcing early initialisation in Rust, you just have to be explicit that that's what you want (you use the MaybeUninit type); you're forced to be clear that that's what you meant, not just by forgetting parens.

tialaramex • 1 day ago

You can even write e.g. this:

  let mut jim: Goat;
  // Potentially much later ...
  if some_reason {
    jim = make_a_new_goat();
  } else {
    jim = get_existing_goat();
  }
  use_goat(jim); // In some way we use that goat now (`use` itself is a reserved keyword in Rust)

The compiler can see OK, we eventually initialized this variable before we used it, there's no way we didn't initialize it so that's fine, this compiles.

But, if we screw up and make it unclear whether jim is initialized, probably because in some cases it wouldn't be - that doesn't compile.

This is the usual "avoid early initialization" C++ programmers are often thinking of and it doesn't need MaybeUninit, since it's definitely fine if you're correct, it's just that the C++ compiler is happy (before C++ 26) with just having Undefined Behaviour if you make any mistakes and the Rust compiler will reject that.

[Idiomatically this isn't good Rust, Rust is an expression language so we can just write all that conditional if-else block in the initializer itself and that's nicer, but if you're new to this the above works fine.]
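For comparison on the C++ side, the closest analogue of "put the if-else in the initializer" is a conditional expression or an immediately-invoked lambda, so the variable never exists in an uninitialized state. The `Goat` type and the two factory functions below are hypothetical stand-ins that mirror the names in the Rust snippet.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in type and factories mirroring the Rust snippet.
struct Goat {
    std::string name;
};

inline Goat make_a_new_goat() { return Goat{"new"}; }
inline Goat get_existing_goat() { return Goat{"existing"}; }

inline Goat pick_goat(bool some_reason) {
    // Immediately-invoked lambda: the branching happens inside the
    // initializer, so `jim` is initialized at its point of declaration
    // and can also be const.
    const Goat jim = [&] {
        if (some_reason) {
            return make_a_new_goat();
        }
        return get_existing_goat();
    }();
    return jim;
}
```

For a two-way choice like this one, `some_reason ? make_a_new_goat() : get_existing_goat()` does the same job; the lambda form scales to longer branches.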

motorest • 1 day ago

> (Safe) Rust does guarantee to identify uninitialised variable reads (...)

That's great. You can get that check on C++ projects by flipping a compiler flag.

Aren't we discussing C++?

loeg • 22 hours ago

What flag do you have in mind? (To the best of my knowledge, no such flag exists -- unless you're talking about one of the heavily penalized sanitizer modes.)

motorest • 13 hours ago

> To the best of my knowledge, no such flag exists

Your knowledge doesn't seem to even reach the point of having googled the topic. If you googled it once, you'd not be commenting it doesn't exist. Hell, you don't even seem to have read the thread, let alone the discussion.

loeg • 2 hours ago

Again with the personal attacks.

I'll note that you have failed to name this "obvious" flag that I'm missing.

leni536 • 1 day ago

Something like that is heading into C++26 actually. Except the initialization is not to zero, but to some unspecified value (with explicit intention of not allowing leaking garbage) and allowing to trap. It's called "erroneous values".

loeg • 1 day ago

I don't really care if it isn't portable. I only have to work with Clang, personally.

> If zero initialization were the default and you had to opt-in with [[uninitialized]] for each declaration it’d be a lot safer.

I support that, too. Just seems harder than getting a flag into Clang or GCC.

motorest • 1 day ago

> I don't really care if it isn't portable.

You don't care because your job is not to ensure that a new release of C++ doesn't break production code. You gaze at your navel and pretend that's the universe everyone is bound to. But there are others using C++, and using it in production software. Some of them care, and your subjective opinions don't have an impact in everyone else's requirements.

> I only have to work with Clang, personally.

Read Clang's manual and check what compiler flags you need to flip to get that behavior. It's already there.

loeg • 22 hours ago

Lmao. You've misread both of my upthread comments and have somehow arrived at the conclusion that this justifies personal attacks. There's just no discussion to be had here.

ryandrake • 1 day ago

Portability is always for the other guy’s sake, not your own. That’s why so many people don’t care about it.

loeg • 1 day ago

Again, I'm not opposed to the idea, it just seems more challenging logistically.

TuxSH • 1 day ago

GCC already has [[gnu::uninitialized]] (clang doesn't, AFAIK), as well as -ftrivial-auto-var-init=pattern which exactly matches the new C++26 semantics, if I'm not mistaken.

loeg • 22 hours ago

> -ftrivial-auto-var-init=pattern

I believe this only helps for trivial automatic variables; not non-trivial automatic variables (structs/classes) that contain uninitialized trivial members.

MichaelRo • 1 day ago

>> Of all the warts, they all pale in comparison to the default initialization behavior.

Come on. That's nothing compared to the horrors that lie in manual memory management. Like I've never worked with a C++ based application that doesn't have crashes lurking all around, so bad that even a core dump leaves you clueless as to what's happening. Couple OOP involving hundreds of classes and 50 levels deep calls with 100s of threads and you're hating your life when trying to find the cause for yet another crash.

bluGill • 1 day ago

I can write bad code in rust too. Rust makes it more difficult, but if you try hard you can abuse it to get the same hundreds of classes and 50 level deep calls, and 100s of threads. You can even do manual memory management in Rust - it isn't built into the language but you can call system APIs to allocate memory if you really want to be stupid. Don't do that is the answer.

Good programmers have long ago written best practices guides based on hard learned experience. Newer languages (like Rust) were designed by people who read those guides and made a language that made using those features hard.

kaashif • 1 day ago

50 levels deep? With some of the template metaprogramming I've seen, looking at just the types for just one level will not only fill your screen, but take up megabytes on disk...

motorest • 1 day ago

> Come on. That's nothing compared to the horrors that lie in manual memory management. Like I've never worked with a C++ based application that doesn't have crashes lurking all around, so bad that even a core dump leaves you clueless as to what's happening.

Have you tried fixing the bugs in your code?

That strategy has been followed by people writing code in every single language, and when used (even with C++) you do drive down the number of these crashes to a residual/purely theoretical frequency.

Scenarios such as those you've described are rare. There should be more to them than the tool you're using to do your job. So why blame the tool?