crazygringo 5 days ago

On the one hand, I understand your disappointment with JSON due to its gigantic size.

On the other hand, I wouldn't want another proprietary binary format to deal with.

And it seems like zipped JSON is almost the best of both worlds -- text-based so debugging and manual editing in a pinch is easy, but around as small as a binary format would be.

Sometimes I wonder if there shouldn't be a kind of optimized library that translates directly between 1) an in-memory structure based on e.g. class attributes rather than dictionary keys, so repeated keys aren't using up all your memory, and 2) a zipped JSON format on disk, with zip dictionary entries created automatically for keys, syntax like braces, numbers, repeated values, etc. It could be vastly faster than regular zip compression because it wouldn't be building a dynamic dictionary on the fly -- it would know the repeated bits and their frequencies in advance. (I.e. by default it wouldn't compress English text within JSON values, but you could also enable the regular, slower, global zip compression if you wanted.)

So the file format would still just be zipped JSON that any tool can read. But you could use this optimized library to convert directly between a much smaller size on disk and a small size in memory, without ever having to e.g. hold the entire uncompressed JSON string in memory at any stage.

Maybe something like this already exists? I haven't come across it though.
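
To make it concrete, here's a rough sketch of the kind of thing I mean, in Python, using zlib's preset-dictionary (zdict) support -- all the class and key names here are made up for illustration:

    import json, zlib

    # Substrings we already know will repeat (keys, braces, separators) go into a
    # preset deflate dictionary, so the compressor doesn't have to rediscover them.
    PRESET = b'{"time": , "x": , "y": }, {"time": , "x": , "y": }'

    class Keyframe:
        __slots__ = ("time", "x", "y")   # class attributes instead of per-object dicts
        def __init__(self, time, x, y):
            self.time, self.x, self.y = time, x, y

    def dump_compressed(keyframes):
        co = zlib.compressobj(level=6, zdict=PRESET)
        payload = json.dumps(
            [{"time": k.time, "x": k.x, "y": k.y} for k in keyframes]
        ).encode()
        return co.compress(payload) + co.flush()

    def load_compressed(blob):
        do = zlib.decompressobj(zdict=PRESET)   # the reader needs the same dictionary
        return [Keyframe(d["time"], d["x"], d["y"])
                for d in json.loads(do.decompress(blob))]

A real version would stream records instead of building one big string, but the preset dictionary and the slots-based objects are the two halves of the idea.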

IshKebab 5 days ago

You don't need to use a proprietary binary format. There are plenty of existing options, Protobuf is probably the most obvious.

In my experience zipped JSON is not a magical panacea. It usually isn't as small as a binary format (especially a compressed one), and you usually need to decompress the whole thing in memory before you can use any of it. It's a lazy bodge, not a proper solution.

crazygringo 5 days ago

Working with protobuf has been a huge pain for me, personally -- sites that make API data available only in older versions of protobuf, with no available library to decode them in a popular language. JSON and zip are easy, universal, accessible standards in ways that protobuf simply isn't.

So that's why I'm saying, there's really something to be said for zipped JSON. You point out that you "usually need to decompress the whole thing in memory", and that's precisely what most of my comment was about -- handling it efficiently so you don't.

And it's not like protobuf is inherently any better at that anyways -- if you want to access a record in the middle of the file, you've got to stream the whole file up to that point to get at it. It doesn't support random access in any native way.

So I'm kind of bummed out my original comment is being downvoted -- I'm making an actual serious proposal in it. I think zipped JSON should actually be taken seriously as a file format in its own right, especially with much more memory-efficient decoding.

IshKebab 4 days ago

> And it's not like protobuf is inherently any better at that anyways -- if you want to access a record in the middle of the file, you've got to stream the whole file up to that point to get at it.

Not true. Libraries may not commonly expose the functionality, but Protobuf uses tag-length-value encoding, so you can very quickly skip to the part of a file you want. In the worst case it will still be orders of magnitude faster than JSON.
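
Roughly, a skip looks like this -- a hand-rolled Python sketch of the wire format, not any particular library's API:

    def read_varint(buf, pos):
        # Decode a base-128 varint starting at pos; return (value, new_pos).
        result = shift = 0
        while True:
            b = buf[pos]
            result |= (b & 0x7F) << shift
            pos += 1
            if not (b & 0x80):
                return result, pos
            shift += 7

    def find_field(buf, wanted_field):
        # Scan a serialized message, skipping fields until wanted_field is found.
        # Returns (wire_type, payload_offset) or None.
        pos = 0
        while pos < len(buf):
            key, pos = read_varint(buf, pos)
            field, wire_type = key >> 3, key & 0x7
            if field == wanted_field:
                return wire_type, pos
            if wire_type == 0:        # varint: skip by decoding it
                _, pos = read_varint(buf, pos)
            elif wire_type == 1:      # fixed 64-bit
                pos += 8
            elif wire_type == 2:      # length-delimited: the length tells us how far to jump
                length, pos = read_varint(buf, pos)
                pos += length
            elif wire_type == 5:      # fixed 32-bit
                pos += 4
            else:
                raise ValueError(f"unsupported wire type {wire_type}")
        return None

Every field starts with a varint key, and length-delimited fields carry their own length, so the scanner never has to decode payloads it doesn't care about.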

Your proposal sounds very complicated and fragile, and it doesn't solve JSON's real shortcomings, like the lack of zero-copy access or a way to store numbers and byte arrays directly, etc.

crazygringo 4 days ago

Protobuf doesn't include any kind of indexing. You can't just jump to the 5,000th record without traversing the previous 4,999. So no, you can't generally skip to some specific part. You've got to stream from the beginning, just like JSON.

And my proposal is literally the opposite of complicated and fragile. It just combines two extremely robust and relatively simple existing standards.

It's definitely not perfect, but the entire point is its simplicity, robustness, and broad support -- unlike protobuf. And a library could actually make it much more performant to use, without sacrificing any compatibility with zip or JSON.

jlouis 5 days ago

Zipped JSON is just a massive waste of energy in a battery. We don't want that.

crazygringo 4 days ago

Of all of the drains on batteries, I think it's close to the bottom...

codedokode 4 days ago

I don't think protobuf is good, because it allows fields to be reordered (which means the decoder has to handle that too). A good binary format is one that maps to a C struct, doesn't have field names (or ids or tags), and requires zero decoding.
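
For example, something like this toy Python/ctypes sketch (field names made up; a real format would also pin the byte order, e.g. with LittleEndianStructure):

    import ctypes

    class Keyframe(ctypes.Structure):
        # No field names, ids, or tags on the wire: the layout IS the format.
        _pack_ = 1
        _fields_ = [("time", ctypes.c_float),
                    ("x", ctypes.c_float),
                    ("y", ctypes.c_float)]

    # "Encoding" is just taking the raw bytes of the struct...
    blob = bytearray(bytes(Keyframe(0.5, 10.0, 20.0)))

    # ...and "decoding" is overlaying the struct on the buffer: no copies, no parsing.
    kf = Keyframe.from_buffer(blob)
    print(kf.time, kf.x, kf.y)   # 0.5 10.0 20.0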

panstromek 4 days ago

Zipped JSON is basically what every website uses; the zip is just handled by the network layer.

Now, I understand the general argument for JSON, but for lottie there's not much to read or edit in it. It's just an incomprehensible bag of numbers, or even worse, base64 strings. You need to process it with a program anyway. It's not better than any binary format, just bigger.

JSON itself is the least of lottie's problems, though. The semantics of the format is the real issue, because it imposes a lot of work on the implementation.

youngtaff 4 days ago

CBOR (https://cbor.io/) seems the obvious data format

But we really need browsers to expose their current CBOR decoders/encoders via a web-accessible API.