I have trouble following the motivation here.
Isn't the idea of lossless pretty meaningless for all consumer grade purposes, especially if the input example was already mangled through YouTube's transcoders?
Besides possibly going from MPEG-1 or older to lossless transcoding in something H.264-like, I see no benefit in lossless methods here.
From my personal experience: I tried archiving some of my old DVDs, for which no better sources will be available for purchase for the foreseeable future, by transcoding them even to H.265 at absurdly high bitrates, but they all looked worse than simply re-containerized MPEG-2 for media-server streaming.
All you do is transcode the output of an existing compressor, wasting bits on preserving the artifacts of a prior reductive step.
4:4:4 log footage or something similar would be the target to benchmark here.
Even "lossy" Apple ProRes already starts at 275 Mbit/s for ProRes 4444 at HD 25p.
I think you are indeed misunderstanding this.
This is not about lossless becoming practical with this, nor about re-encoding YouTube videos with it providing any practical utility. It's just about using Bloom filters for compression; the motivation was purely technical interest in Bloom filters. They say as much in the readme:
> This project explores an unconventional approach: repurposing Bloom filters—typically used for membership testing—as a lossless video compression mechanism.
The video source is practically irrelevant as long as it's an actual video.
I just don't get why it's reinventing the whole wheel here – try using an off-the-shelf codec and tack on a sparse correction. For example, encode frames with a modern lossless/near-lossless codec (AV1 with QP=0) and then append a tiny bitmask+delta residual for perfect reconstruction. These codecs already exploit motion compensation, intra-prediction and DCT-like transforms to minimize frame-to-frame deltas.
In practice you’d likely get better compression (and faster progress) by piggybacking on AV1's strengths – then use the Bloom-filter trick just on the leftover sparse differences – rather than building a new codec from scratch.
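Here's a minimal sketch of that bitmask+delta idea in Python/NumPy, assuming you already have the source frame and its near-lossless decode as arrays (the actual codec call is left out, and all names here are made up for illustration):

```python
import numpy as np

def make_residual(original, decoded):
    """Keep only the pixels where the near-lossless decode differs from the source."""
    diff = original.astype(np.int16) - decoded.astype(np.int16)
    idx = np.flatnonzero(diff)       # sparse positions (the "bitmask")
    deltas = diff.ravel()[idx]       # signed corrections (the "delta")
    return idx, deltas

def apply_residual(decoded, idx, deltas):
    """Perfect reconstruction: near-lossless decode plus sparse corrections."""
    out = decoded.astype(np.int16).ravel()
    out[idx] += deltas
    return out.reshape(decoded.shape).astype(np.uint8)

# Stand-in frames (in reality: the source frame vs. its AV1/near-lossless decode).
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(1080, 1920), dtype=np.uint8)
decoded = original.copy()
flat = decoded.reshape(-1)
flat[rng.choice(original.size, size=5000, replace=False)] ^= 1  # a few "wrong" pixels

idx, deltas = make_residual(original, decoded)
assert np.array_equal(apply_residual(decoded, idx, deltas), original)
```

The sparse (idx, deltas) pairs are then what you'd hand to the Bloom-filter scheme or to a general-purpose entropy coder.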
The residuals can be expected to be quite noisy and will likely not profit from any intra-frame predictability anymore.
JPEG XL’s lossless engine already improves on PNG by roughly 35%; that, and other general-purpose compression methods, would then be the per-frame benchmark to beat on the residuals.
In short: use the proven gear (motion-compensated blocks + modern transforms) to do the heavy lifting, then let the Bloom filter chase the hopefully comparatively small residual.
As a showcase of what Bloom filters are, this would still be worthwhile, but I don't see any practical benefit here yet.
Not to forget, there is a reason visually lossless is the de facto norm now, even in production-grade environments: storage space is still not free, and an average uncompressed display stream easily reaches well north of 5 Gbit/s these days, so there is only so much lossless can reasonably do here.
Yes, they could do a lot of other things, but those other things would not be this. I think your expectations are a bit misplaced. Maybe try giving all this a read again a day later?
As mentioned, OP is not expecting people to use the compression algo on production stuff. You can think of it as an experiment to see how well bloom filters would apply to video compression.
Are the results impressive? Probably not, but it's still an interesting idea.
Of course every round of lossy encoding discards further data. RemoveData(RemoveData(source)) is always going to look worse than just RemoveData(source). Newer encoders manage to remove less visual data per byte of storage used, but there’s no way re-encoding is ever going to look better.
If I understand it right, some lossy codecs can be implemented in an idempotent (and still standard-compliant) way, so there would be no generational loss in select specific cases, e.g. with matched settings. I'm also aware that e.g. JPEG XL can re-encode JPEGs without generational loss, while still improving compression efficiency a bit. But I never looked too deep into the math.
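As a toy illustration of the matched-settings case (not any real codec's math): plain quantize-and-reconstruct is idempotent, so a second pass with the same step size changes nothing.

```python
import numpy as np

def lossy_round_trip(frame, step=16):
    """Toy 'codec': snap samples to a coarse grid (loses information the first time)."""
    return np.round(frame / step) * step

rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, size=(64, 64))

gen1 = lossy_round_trip(frame)   # first generation: the real loss happens here
gen2 = lossy_round_trip(gen1)    # second generation with matched settings

assert np.array_equal(gen1, gen2)  # idempotent: no further generational loss
```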
Lossless isn't meaningless. Re-encoding introduces a lot of artifacts. Imagine your comment in 2001, when someone stored their movies in MJPEG or whatever. Then they moved to MP4, then to H.264, then perhaps to HEVC. You realize how shit that movie would look after all those re-encode cycles?
That's exactly what I'm talking about.
Re-containerizing MPEG-TS as-is into something like MKV, vs. transcoding, is exactly what I'm talking about here.
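For reference, re-containerizing without touching the bitstreams is basically a one-liner around ffmpeg; here wrapped in Python (assumes ffmpeg is on PATH, file names are placeholders):

```python
import subprocess

# Remux an MPEG-2 transport stream into MKV without re-encoding:
# "-c copy" copies the existing video/audio bitstreams into the new container.
subprocess.run(
    ["ffmpeg", "-i", "recording.ts", "-c", "copy", "recording.mkv"],
    check=True,
)
```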
There are currently no meaningful ways known to me to make MPEG-2 files significantly smaller with far more modern and advanced codecs without losing perceived quality, even at the same bitrate.
Not even talking about interlacing issues here.
So for anything MPEG-2 and newer, lossless re-encoding seems like quite a futile exercise to me, from my personal experience.
If there is a promising way, I'm all here for it, but this sadly doesn't look like it.