IshKebab 5 days ago

You don't need to use a proprietary binary format. There are plenty of existing options; Protobuf is probably the most obvious.

In my experience zipped JSON is not a magical panacea. It usually isn't as small as a binary format (especially a compressed one), and you usually need to decompress the whole thing in memory before you can use any of it. It's a lazy bodge, not a proper solution.

crazygringo 5 days ago

Working with protobuf has been a huge pain for me, personally -- sites that make API data available only in older versions of protobuf, with no available library to decode them in popular languages. JSON and zip are easy, universal, accessible standards in ways that protobuf simply isn't.

So that's why I'm saying, there's really something to be said for zipped JSON. You point out that you "usually need to decompress the whole thing in memory", and that's precisely what most of my comment was about -- handling it efficiently so you don't.
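Concretely, here's a rough sketch of one way to do that with nothing but Python's standard library. The archive and member names are made up, and it assumes one JSON record per line inside a single member:

    import io
    import json
    import zipfile

    # Illustrative layout (an assumption, not a fixed scheme): one member,
    # "records.jsonl", holding one JSON object per line.
    with zipfile.ZipFile("data.zip") as archive:
        # ZipFile.open() returns a file-like object that inflates on demand,
        # so the decompressed data never has to sit in memory all at once.
        with archive.open("records.jsonl") as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                record = json.loads(line)
                # ... process one record at a time ...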

And it's not like protobuf is inherently any better at that anyways -- if you want to access a record in the middle of the file, you've got to stream the whole file up to that point to get at it. It doesn't support random access in any native way.

So I'm kind of bummed out my original comment is being downvoted -- I'm making an actual serious proposal in it. I think zipped JSON should actually be taken seriously as a file format in its own right, especially with much more memory-efficient decoding.

IshKebab 4 days ago

> And it's not like protobuf is inherently any better at that anyways -- if you want to access a record in the middle of the file, you've got to stream the whole file up to that point to get at it.

Not true. Libraries may not commonly expose the functionality, but Protobuf uses tag-length-value encoding, so you can skip very quickly to the part of a file you want. In the worst case it will still be orders of magnitude faster than JSON.
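For illustration, a hand-rolled sketch of that skipping logic (this is not a real protobuf library, and the group wire types 3/4 are omitted):

    def read_varint(buf, pos):
        """Decode a base-128 varint starting at pos; return (value, new_pos)."""
        result = shift = 0
        while True:
            byte = buf[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not (byte & 0x80):
                return result, pos
            shift += 7

    def skip_field(buf, pos):
        """Skip one field without decoding its payload; return new_pos."""
        key, pos = read_varint(buf, pos)      # key = (field_number << 3) | wire_type
        wire_type = key & 0x7
        if wire_type == 0:                    # varint: scan to its final byte
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:                  # fixed 64-bit
            pos += 8
        elif wire_type == 2:                  # length-delimited: the length
            size, pos = read_varint(buf, pos) # says exactly how far to jump
            pos += size
        elif wire_type == 5:                  # fixed 32-bit
            pos += 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        return pos

Skipping a length-delimited field costs one varint read plus a jump, whereas finding the same boundary in JSON means lexing every intervening byte.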

Your proposal sounds very complicated and fragile, and it doesn't solve significant shortcomings of JSON, like the lack of zero-copy access or of a way to store numbers and byte arrays directly.

crazygringo 4 days ago

Protobuf doesn't include any kind of indexing. You can't just jump to the 5,000th record without traversing the previous 4,999. So no, you can't generally skip to a specific part. You've got to stream from the beginning, just like with JSON.

And my proposal is literally the opposite of complicated and fragile. It just combines two extremely robust and relatively simple existing standards.

It's definitely not perfect, but the entire point is its simplicity, robustness, and broad support, which protobuf lacks. And a library could actually make it much more performant to use, without sacrificing any compatibility with zip or JSON.
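For example, if each record is stored as its own zip member (the layout and names here are illustrative assumptions), the zip central directory already acts as the index that protobuf lacks:

    import json
    import zipfile

    # Illustrative layout: each record is its own member, named by index.
    with zipfile.ZipFile("data.zip") as archive:
        # The central directory is read once, up front, so this seeks
        # straight to the member's offset; records 0..4999 are never touched.
        with archive.open("records/5000.json") as member:
            record = json.load(member)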

jlouis 5 days ago

Zipped JSON is just a massive waste of battery energy. We don't want that.

crazygringo 4 days ago

Of all the drains on batteries, I think this is close to the bottom...

codedokode 4 days ago

I don't think protobuf is good, because it allows fields to appear in any order (which means the decoder has to handle that too). A good binary format is one that maps directly to a C struct, has no field names (or ids or tags), and requires zero decoding.
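For illustration, a sketch of that idea in Python via ctypes, which overlays a C struct layout on raw bytes (the Record fields here are made up):

    import ctypes

    class Record(ctypes.Structure):
        _pack_ = 1                      # no padding: the layout IS the format
        _fields_ = [
            ("id",    ctypes.c_uint32),
            ("score", ctypes.c_double),
        ]

    RECORD_SIZE = ctypes.sizeof(Record)  # 12 bytes with _pack_ = 1

    def nth_record(buf, n):
        """'Decode' record n by overlaying the struct on the raw bytes --
        a fixed offset computation, with no tags or field names to parse."""
        return Record.from_buffer_copy(buf, n * RECORD_SIZE)

    blob = bytes(Record(id=7, score=0.5)) * 3  # three identical records
    print(nth_record(blob, 2).id)              # -> 7

(Record.from_buffer, the no-copy variant, works on writable buffers such as an mmap, which is where the actual zero-decode access comes in.)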