The problem isn't relying on inode numbers; it's inode numbers being too short. Make them GUIDs and the problems of uniqueness disappear. As for stability: that's just a matter of filesystem durability in general.
> The problem isn't relying on inode numbers; it's inode numbers being too short.
It's a bit of both. Inode numbers conflate two things: they are used by the file system to identify a record, but they are _also_ exposed in APIs that are really cross-file-system (and it comes to a head in the case of network file systems or overlayfs).
A more realistic path is to make inode numbers a purely filesystem-internal thing, let each filesystem do its thing, and create a set of APIs that doesn't rely on inode numbers as much. Linux, for instance, is trying to move towards file handles as that API layer.
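That layer already exists on Linux as name_to_handle_at()/open_by_handle_at(). A minimal sketch of fetching a handle (error handling trimmed; turning a handle back into an open fd with open_by_handle_at() normally requires CAP_DAC_READ_SEARCH):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }

    /* Allocate room for the opaque handle and ask the kernel to fill it in. */
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    fh->handle_bytes = MAX_HANDLE_SZ;
    int mount_id;
    if (name_to_handle_at(AT_FDCWD, argv[1], fh, &mount_id, 0) == -1) {
        perror("name_to_handle_at");
        return 1;
    }

    printf("handle_type=%d handle_bytes=%u mount_id=%d\n",
           fh->handle_type, fh->handle_bytes, mount_id);

    /* Later (possibly from another process), open_by_handle_at() can turn
     * the handle back into an open fd without re-resolving the path. */
    free(fh);
    return 0;
}
```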
You could make them bigger, but then your inode table gets pretty big. If an inode number is 32 bits today, then UUIDs would take up four times the space. I'd also guess that the cost of hashing the UUIDs is significant enough that you'd see a user-visible performance hit.
And really, it's not even super necessary. 64-bit inode numbers already exist in modern file systems. You don't need UUIDs to have unique IDs forever: you'll never run out of 64-bit integers. But the problem was never really that you'd run out; the problem is in the way they're handled.
> You could make it bigger, but then your inode table gets pretty big.
You could do it like Java's Object.identityHashCode() and allocate durable IDs only on demand.
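Roughly like the sketch below (names made up for illustration): a durable ID is only allocated and persisted the first time something external asks for one, so records that are never observed through that API pay nothing.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of on-demand durable IDs, in the spirit of
 * identityHashCode() lazily stashing the hash in the object header. */
struct inode_like {
    uint64_t durable_id;   /* 0 = "never asked for" */
    /* ... the rest of the record ... */
};

static uint64_t next_durable_id = 1;  /* would be persisted, e.g. in the superblock */

uint64_t get_durable_id(struct inode_like *ino)
{
    if (ino->durable_id == 0)
        ino->durable_id = next_durable_id++;  /* allocate and persist on first use */
    return ino->durable_id;
}

int main(void)
{
    struct inode_like a = {0}, b = {0};
    uint64_t id1 = get_durable_id(&a);   /* allocated now */
    uint64_t id2 = get_durable_id(&a);   /* same value, no new allocation */
    uint64_t id3 = get_durable_id(&b);
    printf("%llu %llu %llu\n", (unsigned long long)id1,
           (unsigned long long)id2, (unsigned long long)id3);
    return 0;
}
```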
> If an inode number is 32 bits today, then UUIDs would take up four times the space.
We probably waste more space on filesystems that lack tail-packing.
> I'd also guess that the cost of hashing the UUIDs is significant enough that you'd see a user-visible performance hit.
We're hashing filenames for H-tree indexing anyway, aren't we?
> you'll never run out of 64-bit integers
Yeah, but with 128-bit ones you'll additionally never collide.
> You could do it like Java's Object.identityHashCode() and allocate durable IDs only on demand
The two real issues here are 1) inode numbers are no longer sequentially allocated the way they were back in the 1970s and '80s, and 2) an inode number in a modern file system usually carries extra significant information.
In XFS, the inode number is derived from the disk address (block address) of the inode within the file system. Specifically, the inode number encodes:
• The allocation group (AG) number.
• The block offset within the AG.
• The inode’s index within its block.
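Decoding one of those numbers looks roughly like this. The shift widths are hard-coded here purely for illustration; the real ones (sb_inopblog, sb_agblklog) come from the on-disk superblock:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative values only: e.g. 4 KiB blocks with 512-byte inodes
 * give 8 inodes per block => INOPBLOG = 3. */
#define INOPBLOG   3    /* log2(inodes per block)        -- assumed */
#define AGBLKLOG  23    /* log2(blocks per alloc group)  -- assumed */

int main(void)
{
    uint64_t ino    = 0x2000000a3ULL;                    /* example inode number */
    uint64_t offset = ino & ((1ULL << INOPBLOG) - 1);    /* index within block   */
    uint64_t agbno  = (ino >> INOPBLOG) & ((1ULL << AGBLKLOG) - 1);
    uint64_t agno   = ino >> (INOPBLOG + AGBLKLOG);      /* allocation group     */

    printf("ag=%llu agbno=%llu offset=%llu\n",
           (unsigned long long)agno, (unsigned long long)agbno,
           (unsigned long long)offset);
    return 0;
}
```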
It is not unreasonable to assume that other modern file systems apply similar schemes to support dynamic allocation of their on-disk data structures. So switching to pseudorandom hash values is not going to work – not easily, anyway, and not without a redesign of the on-disk data structures.

Then there is the issue of the finite size (or width) of the inode field (64 bits, 128 bits, etc.). On an exascale file system with very high file turnover and a huge number of files, at some point a previously allocated inode number will have to be recycled, irrespective of whether it was a pseudorandom or a calculated value. It is not a problem for most installations, as exascale is not there yet, but I don't think the problem can be solved using non-esoteric approaches.