I don't believe the document does a great job of explaining what is otherwise a very simple idea (assuming I understood it well):
1. It creates a bitmap where each bit is a pixel in the image: if a given pixel changed from frame 0 to frame 1, the corresponding bit is 1, otherwise it is 0.
2. All the 1s are added to the bloom filter, hashing their offsets. Now the bloom filter will be positive for all such indexes plus a percentage of false positive indexes.
3. We query the bloom filter to see all the indexes that are positive, and for all such pixels we store the raw pixel data of what changed. So we can reconstruct the next frame easily.
You can think of this as storing the delta between two frames as the x,y,r,g,b of all the pixels that changed, but compressing the x,y part a lot at the cost of storing a bit more r,g,b than needed.
I have the feeling that since the pixels that change from frame 0 to frame 1 are often similar (in their location) to those that will change from frame 1 to frame 2, there is the possibility of compressing that further as well, by setting the right flags in the next frame and storing verbatim only the offsets that changed relative to the previous frame, or something like that.
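In code, a minimal sketch of how I understand it (my own illustration in Python, with made-up filter parameters, not the actual implementation in the repo):

```python
# Sketch of steps 1-3: hash the offsets of changed pixels into a Bloom filter,
# then store raw pixel data for every offset the filter reports as positive.
# (Illustrative only; filter size and hash scheme are arbitrary choices here.)
import hashlib

class Bloom:
    def __init__(self, nbits, nhashes):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, item):
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def encode_delta(prev, curr, nbits=4096, nhashes=7):
    """prev/curr: flat lists of pixels (e.g. (r, g, b) tuples) of equal length."""
    bloom = Bloom(nbits, nhashes)
    for offset, (a, b) in enumerate(zip(prev, curr)):
        if a != b:
            bloom.add(offset)
    # Changed pixels plus a small number of false positives, in offset order.
    payload = [curr[offset] for offset in range(len(curr)) if offset in bloom]
    return bloom, payload

def decode_delta(prev, bloom, payload):
    out, it = list(prev), iter(payload)
    for offset in range(len(prev)):
        if offset in bloom:          # same scan order as the encoder
            out[offset] = next(it)
    return out
```

Note that false positives only cost space: the encoder stores the (unchanged) pixel value for them anyway, so decoding is still exact.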
This comment is why I go to the comments first.
Oh hey you're the guy who made kilo. Good job.
[edit] lol he edited it... they always edit it
I always love when people recognize me for kilo or dump1090 or hping and not for Redis :D Side projects for the win. Thanks for your comment!
I've literally never even used Redis, let alone know what it is or does. I dunno how I was able to make money in software since 2008 without figuring that out... or learning SQL, or C++. There's far more that I don't know than I do know. But anyway if you wrote Redis or something then congrats, I've definitely heard of it.
I have a theory, which I call "the equivalence of modern software systems", that says a lot about how unimportant Redis and other technologies are: modern computing is so developed that you can pick any random language, kernel, and database from among the top ones available, and I can build any project without too much trouble. PHP / Win32 / SQLite? Ok, I can make it work. Ruby / Linux / Redis? Well, fine as well.
I've noticed that too, LAMP stack vs MEAN stack etc.
Part of it seems to be that software languages have "flavors" like natural spoken language does, so that one seems natural to one person and foreign to another, much like how DHH took TypeScript out of Rails, and I can't read Rails code or live without static types.
Also college kids are always learning what the last generation already knew (like Lisp) and reinvent everything in it with the vigor of their youth and it gets popular, which I imagine is how Rails and Redis both started and caught on.
But sometimes there are genuine innovations, and we can't build on them until someone comes up with them, much like how we couldn't invent machines until someone figured out that we could just use the lever to make gears to transport energy, which Archimedes didn't think of. And the more we learn collectively, the more these things spread.
I wonder if this means it'll all stabilize into a JavaScript-like toolchain one day. Maybe Gary Bernhardt was right all along.
I still use CFML / MySQL (more specifically Lucee / MariaDB). Aside from not being one of the cool kids, I haven’t seen anything sufficiently compelling to justify changing my stack.
Heh. Me too! Throw in Coldbox and its ecosystem and you have most of what you need.
Software engineers have known for decades that the choice of implementation language is one of the smaller costs in a large project. Also, I personally never use a language that is less than 20 years old. Life cycle costs:
post-release fixes: 50%
pre-release testing and debugging: 25%
requirements, design, implementation, etc.: 25%
Redis is designed for scaling, so if you don't have a large project you don't need it.
Ain't it cute 'splaining what redis is designed for to the person who designed it
One of my CS professors told the story about when Microsoft demonstrated their implementation of the Korn shell at a USENIX conference. Gasbag on stage tells the guy in the question line that he (the questioner) has misinterpreted something in the software's original intention. Gasbag figures out something is up when half the room cracks up. The guy at the mic was David Korn, author of the Korn shell.
You win the "someone on HN made me laugh out loud" award. The last person to win it was like a month ago so good job.
Honestly, the relatively high probability of this happening is part of what makes HN great :)
Redis is not designed for scaling. The 'default' version is a single-core app without any focus on availability.
Yes there's Redis Cluster, but that's a separate thing, and most Redis installations don't use it.
Redis is absolutely not designed for scaling. Maybe Valkey is, but I've yet to use it.
A.k.a. all you need is PG and something to serve your app over HTTPS. :joy:
I think PG might insist on using a lisp too.
Does Postgres support Lisp?
I think u/messe was punning. I used "PG" to mean PostgreSQL, and u/messe probably knew that and probably meant PG as in Paul Graham, who is a huge fan of Lisp. I thought it was funny.
All you need to build a great app server is PG, certbot and a great app server.
Yeah. Data-driven model-syncing machines will be what kills languages and software stacks.
All the chip-world talk of single-function machines, and of how tech is now an energy-constrained industry, means pruning the state we store; most software is software to deliver software.
Single-function pipeline hardware platforms will act as little more than sync engines from models.
I mean, it's 2025 and we still write software like it's the 1970s. So high tech.
Edit: https://arxiv.org/abs/2309.10668
From LLMs to energy models that transform electromagnetic geometry. Saving what’s needed to recreate cat pics and emails to mom. Pruning the tail as interest dips with generational churn.
Also think about it this way:
Redis will eventually become obsolete. It may take 10 or 50 years, but it will happen.
But kilo taught many developers about C, editors, terminals, and how simple it is to do syntax highlighting. This epiphany inspired me and many others, and the things we make because of that will last much longer than Redis.
Redis does a lot of things most people don't know about, and it's all super optimised... so I am not so sure. I wouldn't want that to happen, simply because I would be really bored using one way to do everything (probably SQL).
Most people use it as a cache, but you can build a lot of things with it that are far outside this narrow view (say, a bidding system or something like that).
And as for the other comment: scaling is possible via the commercial offering, as far as I know.
I never got to persuade anyone to let me use it for that... I really wanted to.
Super optimized? Veto. It could use much better modern thread-safe hashmaps. With at least twice the performance.
But it has fantastic ease of use.
dump1090 (but not Redis) user here! Hi! (We're about to publish a paper that depends on data taken with dump1090, funny enough...).
Awesome! Thanks! I have in mind doing a V2 of dump1090 soon (time permitting). I have a few ideas that may improve the performance very significantly.
Is there any plan to have a JSON/JSONlines TCP endpoint in the V2? That’s not currently there, right? I want to process data in real time but then I have to decode the raw messages on port 30002 myself.
I wonder how good the compression ratios really are here...
It reminds me of experiments I did with wavelets for image compression some 22 years ago.
The inverse transform starts with a small pixel image and then uses just as many coefficients to transform it to an image twice as large (in width or height). This is done over and over again.
The point is that most of your data consists of these coefficients, and most of them are close to zero, so you can flush them to zero. Then the problem is mostly a matter of how to encode where you don't have zeroes: i.e., you have something like a bitmap and an array of the non-zeroes. There were different algorithms for encoding the non-zeroes, more or less conservatively, but they often exploited the property that these non-zeroes were typically quite clustered, which is the opposite of what a typical hash function used for a bloom filter produces.
Image compression this way obviously had very poor locality, first for the transform and then for the coefficient compression, so it was quite slow. It therefore felt like a dead end.
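For the curious, a toy 1-D Haar step shows the shape of it (my own illustration in Python, not the code I had back then):

```python
# One level of a 1-D Haar transform: averages halve the signal, and the same
# number of detail coefficients lets the inverse double it back. On smooth
# data the details sit near zero and can be "flushed" to zero.
def haar_forward(signal):
    pairs = list(zip(signal[0::2], signal[1::2]))
    avgs = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return avgs, details

def haar_inverse(avgs, details):
    out = []
    for s, d in zip(avgs, details):
        out += [s + d, s - d]
    return out

smooth = [100, 101, 102, 103, 104, 105, 106, 107]
avgs, details = haar_forward(smooth)
print(details)                              # all near zero on smooth data
print(haar_inverse(avgs, [0] * len(avgs)))  # still close to the original
```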
Thing is, if you store the delta from one frame to the next, the pixels which haven't changed are just zeroes. Compressing zero sequences is the most trivial exercise in lossless compression, and unlike the bloom filter it has no false positives.
I can see bloom filters as part of a complicated hybrid strategy compressor. The more tools on the belt of such a compressor, the better, but I don't think it'll improve things much on average.
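To make the zero-run point concrete, a toy sketch (my own illustration, not anything from the linked repo):

```python
# A frame delta is mostly zeros; run-length encode the zero runs and keep the
# non-zero values verbatim. Exact round trip, no false positives.
def rle_encode(delta):
    out, run = [], 0
    for v in delta:
        if v == 0:
            run += 1
        else:
            out.append((run, v))     # (zeros before this value, the value)
            run = 0
    out.append((run, None))          # trailing zeros
    return out

def rle_decode(encoded):
    delta = []
    for run, v in encoded:
        delta += [0] * run
        if v is not None:
            delta.append(v)
    return delta

delta = [0, 0, 0, 5, 0, 0, -3, 0, 0, 0, 0]
assert rle_decode(rle_encode(delta)) == delta
```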
The comparison is: for your use case, which is better? Decompress and matrix sum or grab keys and increment individual values?
There's some flexibility in how one would do that second increment, but of course naively it might look like constructing a sparse matrix and summing, so that feels pretty similar. But the flexibility might be nice in some cases, and bloom filters are an extremely space-efficient representation with an arbitrarily low false-positive rate.
So how does the bloom filter help vs something like a hash table?
It sounds like your instinct here is that we need to store (x,y,r,g,b) tuples for everything that changed, and leave other pixels unchanged from the previous frame.
You could store all of that in a single map from (x,y) to (r,g,b), which would cost you, say, 4 bytes per pixel for the (x,y) data and then, say, 30 bits or (rounding up) 4 bytes per pixel for the (r,g,b) with 10-bit color. A hash map would optimize lookup, but for efficient storage you really just want an array of 8-byte chunks where the first two bytes are x, next 2 are y, next 10 bits are r, then g, then b. To decode, you just start with the previous frame and then iterate down the list and apply each diff in the list.
If you iterate over (x,y) in a consistent order, you avoid the need to store it because you can simply say that the next (r,g,b) tuple in your list is the pixel data for the next (x,y) value that passes the bloom filter. This effectively cuts the amount of data you have to store per pixel in half, but increases the number of pixels you have to store by a probabilistically small amount because bloom filters have false-positives.
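To make the sizes concrete, here's that 8-byte chunk as actual packing code (a hypothetical layout, just for illustration); the bloom-filter version then drops the x,y half and keeps only the 4-byte (r,g,b) word per stored pixel:

```python
import struct

# 16-bit x, 16-bit y, then 10 bits each of r, g, b packed into a 32-bit word
# (2 bits spare): 8 bytes per changed pixel with explicit coordinates.
def pack_diff(x, y, r, g, b):
    rgb = (r << 20) | (g << 10) | b          # each channel 0..1023
    return struct.pack(">HHI", x, y, rgb)

def unpack_diff(chunk):
    x, y, rgb = struct.unpack(">HHI", chunk)
    return x, y, (rgb >> 20) & 0x3FF, (rgb >> 10) & 0x3FF, rgb & 0x3FF

assert unpack_diff(pack_diff(640, 360, 1023, 512, 7)) == (640, 360, 1023, 512, 7)
```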
You start out with the positions of all the 1s (the positions of changed pixels). A hashtable would be great for speeding up query time, but it does not compress anything. The bloom filter takes up much less space. The price is that you can get false positives.
You don't need to store the diffs for the coordinates that are not in the bloom filter. If the number of coordinates that changed is small, the size of the bloom filter will be small, and this is a significant optimization.
So it's just compressing consecutive 0s? Like most compression algorithms do...?
It's more consistent.
As a rule of thumb, Bloom filters require about 10 bits per element for a 1% false-positive rate. Say 100 pixels changed between the frames; that's 1000 bits, or ~125 bytes, to store the which-pixel-changed map.
Run-length encoding of the (in)active bits can use anything from ~10 to 200 bytes for 100 pixels (say 1 byte per run, with 200 runs of active vs. inactive in the worst case).
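Checking that arithmetic with the standard optimal-size formula (my own back-of-the-envelope):

```python
from math import ceil, log

# Optimal Bloom filter size: m = -n * ln(p) / (ln 2)^2 bits.
n, p = 100, 0.01                    # 100 changed pixels, 1% false positives
m_bits = ceil(-n * log(p) / log(2) ** 2)
print(m_bits, ceil(m_bits / 8))     # ~959 bits, ~120 bytes (the "10 bits/element" rule)

# Run-length encoding at 1 byte per run: from a handful of bytes for one big
# run up to ~200 bytes if changed/unchanged pixels alternate 200 times.
```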
A lot of video compression is about motion. How do you handle the same pixels sliding two pixels to the left due to a pan?
You could definitely layer block-level motion prediction in before this step, though.
How do you go from frame n+1 to frame n+2?
Similar to other compression methods, it uses keyframes (full frames saved at points throughout the video) as the starting point, after which each following frame is a diff of that until you reach the next keyframe. It seems to generate a keyframe when the compression drops below a certain level (at which point you might as well just store the compressed frame instead of the bloom bitmap).
https://github.com/ross39/new_bloom_filter_repo/blob/4798d90...
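In rough Python, the loop I imagine from that description (a sketch only; `frame_delta` stands in for the bloom-filter encoding and the 0.5 threshold is an assumed value, neither taken from the linked code):

```python
# Keyframe logic as described above: start from a stored keyframe, emit diffs
# against it, and emit a fresh keyframe once the diff stops compressing well.
def frame_delta(keyframe, frame):
    """Stand-in for the bloom-filter encoding: just the changed pixels."""
    return [(i, p) for i, (k, p) in enumerate(zip(keyframe, frame)) if k != p]

def encode_video(frames, ratio_threshold=0.5):
    chunks, keyframe = [], None
    for frame in frames:
        if keyframe is None:
            chunks.append(("key", frame))
            keyframe = frame
            continue
        delta = frame_delta(keyframe, frame)
        if len(delta) > ratio_threshold * len(frame):
            chunks.append(("key", frame))    # diff too large: new keyframe
            keyframe = frame
        else:
            chunks.append(("delta", delta))
    return chunks
```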