Retr0id 5 days ago

gzip (and zstandard) decompression is inherently serial, whereas many bloom filter lookups can be as parallel as you like.

hxtk 4 days ago

The bloom filter lookups here are fundamentally not parallel because they are coupled to a serial iteration.

Ultimately, the data being stored is tuples of (x, y, r, g, b), which means it is not sufficient to ask "does (x1, y1) pass the bloom filter?" (that part would be embarrassingly parallel); you also have to be able to match each hit to an index in your list of (r, g, b) data.

The way this is done is to iterate over all (x, y) in a consistent order during both encoding and decoding; for each (x1, y1) that passes the bloom filter, the next (r1, g1, b1) tuple in the stream is the pixel data for that pixel.
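
A minimal sketch of that coupling in Python (the BloomFilter class and the key encoding here are hypothetical stand-ins, not the actual format):

    import hashlib

    class BloomFilter:
        # Toy Bloom filter over byte strings: k hash probes into m bits.
        def __init__(self, bits, m, k):
            self.bits, self.m, self.k = bits, m, k

        def __contains__(self, key):
            for i in range(self.k):
                digest = hashlib.sha256(bytes([i]) + key).digest()
                h = int.from_bytes(digest[:8], "big") % self.m
                if not (self.bits[h // 8] >> (h % 8)) & 1:
                    return False
            return True

    def decode(bloom, rgb_stream, width, height, background=(0, 0, 0)):
        # Walk (x, y) in the same fixed order the encoder used; each
        # Bloom hit consumes the next (r, g, b) tuple in the stream.
        it = iter(rgb_stream)
        pixels = {}
        for y in range(height):
            for x in range(width):
                key = f"{x},{y}".encode()
                pixels[(x, y)] = next(it) if key in bloom else background
        return pixels

The serial coupling is the next(it): you can't know which (r, g, b) belongs to a given (x1, y1) without counting every hit before it.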

thesz 4 days ago

You can perform n parallel lookups into the Bloom filter, then use a state-machine table to perform the resulting k actions (most often k < n), often in parallel (SIMD) as well, because it is just a byte shuffle.
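
Roughly, reusing the toy BloomFilter from the sketch above (the batch width n is arbitrary, and a real implementation would vectorize the probes and do the copy phase as an actual shuffle rather than a Python loop):

    def decode_batched(bloom, rgb_stream, width, height, n=64,
                       background=(0, 0, 0)):
        keys = [f"{x},{y}".encode() for y in range(height) for x in range(width)]
        it = iter(rgb_stream)
        pixels = []
        for start in range(0, len(keys), n):
            # Phase 1: n independent membership checks -- the parallel part.
            hits = [k in bloom for k in keys[start:start + n]]
            # Phase 2: consume one (r, g, b) per hit, in encoding order --
            # the short serial part SIMD could do as a table-driven shuffle.
            pixels.extend(next(it) if hit else background for hit in hits)
        return pixels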

hxtk 4 days ago

Hmm, that's a good point. Since the bloom filter check costs more CPU than the action taken on a hit, you could do all the lookups in parallel and then do all of the copies separately.

alexjurkiewicz 5 days ago

You can de/compress multiple independent streams in parallel, at minimal cost to overall compression efficiency. For gzip, look at pigz. For video, look at how x264/x265 slice up the video frame.
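
The idea, sketched in Python (this is just the chunked-streams concept, not pigz's actual container format; chunk_size and the naive concatenation are made up for illustration):

    import zlib
    from concurrent.futures import ThreadPoolExecutor

    def compress_chunks(data, chunk_size=1 << 20):
        # Each chunk becomes an independent deflate stream, so chunks can
        # later be decompressed in any order. Cross-chunk matches are lost,
        # which is the small hit to overall compression efficiency.
        return [zlib.compress(data[i:i + chunk_size])
                for i in range(0, len(data), chunk_size)]

    def decompress_parallel(streams):
        # CPython's zlib releases the GIL on large buffers, so threads help.
        with ThreadPoolExecutor() as pool:
            return b"".join(pool.map(zlib.decompress, streams))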

croemer 5 days ago

Parallelism isn't speed. Zstandard decompression is insanely fast; there's no need to occupy multiple cores.

Retr0id 5 days ago

Bloom filter lookups are embarrassingly parallel to the point that you could occupy 0 CPU cores and do it in a GPU fragment shader.

There are probably other reasons why this is a bad idea, but I'm curious to try it.