riehwvfbk 1 day ago

You are very very unlikely to trigger Linux overcommit behavior by not initializing a member variable. It's even more unlikely for this to be a good thing.

In effect, you are assuming that your uninitialized and initialized variables straddle a page boundary, which is obviously not going to be a common occurrence. In the common case you are allocating something on the heap, and the heap chunk descriptor just before your block has to be written, which already triggers a page fault.

Besides: taking a page fault, entering the kernel, updating the page tables (possibly merging some VMAs in the process), and exiting back to userspace is going to be A LOT slower than just writing that variable.

OK, you say, but what if I have a giant array of these things that spans many pages? In that case your performance and memory usage are going to be highly unpredictable (after all, initializing a single thing in a page would materialize that whole page).

OK, but vectors. They double in size, right? Well, the default allocator for vectors will actually zero-initialize the new elements. You could write a non-initializing allocator and use it for your vectors - and this is in line with "you have to say it explicitly to get dangerous behavior".

josefx 1 day ago

> In effect, you are assuming that your uninitialized and initialized variables straddle a page boundary

You are assuming that I am working with small data structures, don't use arrays of data, don't have large numbers of POD members, and so on.

> That heap chunk descriptor before your block has to be written, triggering a page fault.

So you fault in one page out of hundreds? That cost is significantly less than the alternative.

> In that case your performance and memory usage are going to be highly unpredictable (after all, initializing a single thing in a page would materialize that whole page).

As opposed to initializing thousands of pages you will never use at once? Or allocating single pages when they are needed?

> Well, the default allocator for vectors will actually zero-initialize the new elements.

I reliably get garbage data after the first reserve/shrink_to_fit cycle. I'm not sure why the first one returns all zeros; I wouldn't rely on it.

jchw 1 day ago

> You are assuming that I am working with small data structures, don't use arrays of data, don't have large numbers of POD members, and so on.

Sounds like a great set of use cases for explicit syntax to opt out of automatic initialization.

riehwvfbk 21 hours ago

Reserve vs resize, yes.

Reserve will not initialize, but then you have to keep track of the real vector size on the side, inevitably leading to bugs. Alternatively, something like this https://stackoverflow.com/questions/15967293/how-to-make-my-... will make resize() leave the elements uninitialized.

josefx 12 hours ago

> Reserve will not initialize, but then you have to keep track of the real vector size on the side

You already do that when you use push_back. It tracks the size for you, overallocates to amortize the cost of growing, and, most importantly, does not initialize the overallocated memory before it is used, meaning pages will not be touched/mapped by the OS unless you actually end up using them. That gives you the benefit of amortized vector growth without paying for the uninitialized memory it allocates behind the scenes for future use.

Directly accessing reserved memory instead of using resize was a way to check whether the allocator zero-initializes that overallocated memory. That the parts that are used end up initialized at a later point is entirely irrelevant to my point.

So your previous point:

> They double in size, right? Well, the default allocator for vectors will actually zero-initialize the new elements.

They double in capacity, not size, when used with push_back, which means exactly one new element will be initialized no matter how much uninitialized/unused/unmapped capacity the vector allocates for future use.

motorest 1 day ago

> You are very very unlikely to trigger Linux overcommit behavior by not initializing a member variable.

The problem with your assumption is that you're arguing it's OK for code to be needlessly buggy as long as you believe the odds of the bug being triggered are low. OP points out a known failure mode and explains how a feature eliminates it. You intentionally ignore it for no reason.

This assumption is baffling when, in the exact same thread, you see people whining about C++ for allowing memory-related bugs to exist.

yorwba 1 day ago

Linux overcommit is not a bug, it's a feature. The argument isn't that it's okay for code to be buggy if the odds of triggering the bug are low, it's that it's okay for code to not make use of a feature if the odds of benefiting from that feature are low.

motorest 1 day ago

> Linux overcommit is not a bug, it's a feature.

You failed to read what I wrote. I referred to why clients would choose not to initialize early to avoid scenarios such as Linux overcommitting, not that Linux had a bug.

yorwba 1 day ago

Overcommit is an optimization where virtual memory that is allocated but unused is not mapped to physical memory. If you want to avoid this (for some reason), choosing not to initialize early is not going to have the intended effect.

motorest 1 day ago

> Overcommit is an optimization where virtual memory that is allocated but unused is not mapped to physical memory.

Either you're replying without bothering to read the messages you're replying to, or you're failing to understand what is being written.

> If you want to avoid this (for some reason), choosing not to initialize early is not going to have the intended effect.

Read PP's comment.