hxtk 4 days ago

The bloom filter lookups here are fundamentally not parallel because they are coupled to a serial iteration.

Ultimately the data they are storing is tuples of (x,y,r,g,b), which means that it is not sufficient to ask "does (x1,y1) pass the bloom filter?" which would be embarrassingly parallel; you have tobe able to match it to an index in your list of (r,g,b) data.

The way this is done is to iterate over all (x,y) in a consistent order for encoding and decoding, and for each (x1,y1), if it passes the bloom filter, then the next (r1,g1,b1) tuple is the pixel data for that pixel.

1
thesz 4 days ago

You may perform n parallel lookups into Bloom filter then use state machine table to perform k (most often k < n) actions, oftentimes in parallel (SIMD) too because it is just byte shuffle.

hxtk 4 days ago

Hmm, that's a good point, since the bloom filter check is CPU-harder than the action taken on lookup, you could just do all the lookups in parallel and then do all of the copies separately.