I don't believe the document does a great job of explaining what is otherwise a very simple idea (assuming I understood it well):
1. It creates a bitmap where each bit is a pixel in the image: if a given pixel changed from frame 0 to frame 1, the corresponding bit is 1, otherwise it is 0.
2. All the 1s are added to the bloom filter, hashing their offsets. Now the bloom filter will be positive for all such indexes plus a percentage of false positive indexes.
3. We query the bloom filter to see all the indexes that are positive, and for all such pixels we store the raw pixel data of what changed. So we can reconstruct the next frame easily.
You can think of this as storing the delta between two frames as the x,y,r,g,b of all the pixels that changed, but compressing the x,y part a lot at the cost of storing a bit more r,g,b than needed.
I have the feeling that since the pixels that change from frame 0 to frame 1 are often similar (in their location) to those that will change from frame 1 to frame 2, there is the possibility of compressing that further as well, by setting the right flags in the next frame and storing verbatim only the offsets that changed relative to the previous frame, or something like that.
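In code, a minimal sketch of how I understand it (my own illustration in Python, with made-up filter parameters, not the actual implementation in the repo):

```python
# Sketch of steps 1-3: hash the offsets of changed pixels into a Bloom filter,
# then store raw pixel data for every offset the filter reports as positive.
# (Illustrative only; filter size and hash scheme are arbitrary choices here.)
import hashlib

class Bloom:
    def __init__(self, nbits, nhashes):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, item):
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def encode_delta(prev, curr, nbits=4096, nhashes=7):
    """prev/curr: flat lists of pixels (e.g. (r, g, b) tuples) of equal length."""
    bloom = Bloom(nbits, nhashes)
    for offset, (a, b) in enumerate(zip(prev, curr)):
        if a != b:
            bloom.add(offset)
    # Changed pixels plus a small number of false positives, in offset order.
    payload = [curr[offset] for offset in range(len(curr)) if offset in bloom]
    return bloom, payload

def decode_delta(prev, bloom, payload):
    out, it = list(prev), iter(payload)
    for offset in range(len(prev)):
        if offset in bloom:          # same scan order as the encoder
            out[offset] = next(it)
    return out
```

Note that false positives only cost space: the encoder stores the (unchanged) pixel value for them anyway, so decoding is still exact.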
This comment is why I go to the comments first.
Oh hey you're the guy who made kilo. Good job.
[edit] lol he edited it... they always edit it
I always love when people recognize me for kilo or dump1090 or hping and not for Redis :D Side projects for the win. Thanks for your comment!
I've literally never even used Redis, let alone know what it is or does. I dunno how I was able to make money in software since 2008 without figuring that out... or learning SQL, or C++. There's far more that I don't know than I do know. But anyway if you wrote Redis or something then congrats, I've definitely heard of it.
I have a theory, which I call "the equivalence of modern software systems", that says a lot about how unimportant Redis and other technologies are: modern computing is so developed that you can pick any random language, kernel, and database from among the top ones available, and I can build any project without too much trouble. PHP / Win32 / SQLite? Ok, I can make it work. Ruby / Linux / Redis? Well, fine as well.
I've noticed that too, LAMP stack vs MEAN stack etc.
Part of it seems to be that software languages have "flavors" like natural spoken language does, so that one seems natural to one person and foreign to another, much like how DHH took TypeScript out of Rails, and I can't read Rails code or live without static types.
Also college kids are always learning what the last generation already knew (like Lisp) and reinvent everything in it with the vigor of their youth and it gets popular, which I imagine is how Rails and Redis both started and caught on.
But sometimes there are genuine innovations, and we can't build on them until someone comes up with them, much like how we couldn't invent machines until someone figured out that we could just use the lever to make gears to transport energy, which Archimedes didn't think of. And the more we learn collectively, the more these things spread.
I wonder if this means it'll all stabilize into a JavaScript-like toolchain one day. Maybe Gary Bernhardt was right all along.
I still use CFML / MySQL (more specifically Lucee / MariaDB). Aside from not being one of the cool kids, I haven’t seen anything sufficiently compelling to justify changing my stack.
Heh. Me too! Throw in Coldbox and its ecosystem and you have most of what you need.
Software engineers have known for decades that the choice of implementation language is one of the smaller costs in a large project. Also, I personally never use a language that is less than 20 years old. Life cycle costs:
post-release fixes: 50%
pre-release testing and debugging: 25%
requirements, design, implementation, etc.: 25%
Redis is designed for scaling, so if you don't have a large project you don't need it.
Ain't it cute 'splaining what redis is designed for to the person who designed it
One of my CS professors told the story about when Microsoft demonstrated their implementation of the Korn shell at a USENIX conference. Gasbag on stage tells the guy in the question line that he (the questioner) has misinterpreted something in the software's original intention. Gasbag figures out something is up when half the room cracks up. The guy at the mic was David Korn, author of the Korn shell.
You win the "someone on HN made me laugh out loud" award. The last person to win it was like a month ago so good job.
Honestly, the relatively high probability of this happening is part of what makes HN great :)
Redis is not designed for scaling. The 'default' version is a single-core app without any focus on availability.
Yes there's Redis Cluster, but that's a separate thing, and most Redis installations don't use it.
Redis is absolutely not designed for scaling. Maybe Valkey is, but I've yet to use it.
A.k.a. all you need is PG and something to serve your app over HTTPS. :joy:
I think PG might insist on using a lisp too.
Does Postgres support Lisp?
I think u/messe was punning. I used "PG" to mean PostgreSQL, and u/messe probably knew that and probably meant PG as in Paul Graham, who is a huge fan of Lisp. I thought it was funny.
All you need to build a great app server is PG, certbot and a great app server.
Yeah. Data-driven model-syncing machines will be what kills languages and software stacks.
All the chip-world talk of single-function machines, and of how tech is now an energy-constrained industry, means pruning the state we store; most software is software to deliver software.
Single-function pipeline hardware platforms will act as little more than sync engines from models.
I mean, it's 2025 and we still write software like it's the 1970s. So high tech.
Edit: https://arxiv.org/abs/2309.10668
From LLMs to energy models that transform electromagnetic geometry. Saving what’s needed to recreate cat pics and emails to mom. Pruning the tail as interest dips with generational churn.
Also think about it this way:
Redis will eventually become obsolete. It may take 10 or 50 years, but it will happen.
But kilo taught many developers about C, editors, terminals, and how simple it is to do syntax highlighting. This epiphany inspired me and many others, and the things we make because of that will last much longer than Redis.
Redis does a lot of things most people don't know about, and it's all super optimised... so I am not so sure. I wouldn't want that to happen, simply because I would be really bored using one way to do everything (probably SQL).
Most people use it as a cache, but you can build a lot of things with it that are far outside this narrow view (say, a bidding system or something like that).
And as for the other comment: scaling is possible via the commercial offering, as far as I know.
I never got to persuade anyone to let me use it for that... I really wanted to.
Super optimized? Veto. It could use much better modern thread-safe hashmaps. With at least twice the performance.
But it has fantastic ease of use.
dump1090 (but not Redis) user here! Hi! (We're about to publish a paper that depends on data taken with dump1090, funny enough...).
Awesome! Thanks! I have in mind doing a V2 of dump1090 soon (time permitting). I have a few ideas that may improve the performance very significantly.
Is there any plan to have a JSON/JSONlines TCP endpoint in the V2? That’s not currently there, right? I want to process data in real time but then I have to decode the raw messages on port 30002 myself.
I wonder how good the compression ratios really are here...
It reminds me of experiments I did with wavelets for image compression some 22 years ago.
The inverse transform starts with a small pixel image and then uses just as many coefficients to transform it to an image twice as large (in width or height). This is done over and over again.
The point is that most of your data consists of these coefficients, and most of them are close to zero, so you can flush them to zero. Then the problem is mostly a matter of how to encode where you don't have zeroes: i.e., you have something like a bitmap and an array of the non-zeroes. There were different algorithms for encoding the non-zeroes, more or less conservatively, but they often exploited the property that these non-zeroes were typically quite clustered, which is the opposite of what a typical hash function used for a bloom filter produces.
Image compression this way obviously had very poor locality, first for the transform and then for the coefficient compression, so it was quite slow. It therefore felt like a dead end.
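For the curious, a toy 1-D Haar step shows the shape of it (my own illustration in Python, not the code I had back then):

```python
# One level of a 1-D Haar transform: averages halve the signal, and the same
# number of detail coefficients lets the inverse double it back. On smooth
# data the details sit near zero and can be "flushed" to zero.
def haar_forward(signal):
    pairs = list(zip(signal[0::2], signal[1::2]))
    avgs = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return avgs, details

def haar_inverse(avgs, details):
    out = []
    for s, d in zip(avgs, details):
        out += [s + d, s - d]
    return out

smooth = [100, 101, 102, 103, 104, 105, 106, 107]
avgs, details = haar_forward(smooth)
print(details)                              # all near zero on smooth data
print(haar_inverse(avgs, [0] * len(avgs)))  # still close to the original
```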
Thing is, if you store the delta from one frame to the next, the pixels which haven't changed are just zeroes. Compressing zero sequences is the most trivial exercise in lossless compression, and unlike the bloom filter it has no false positives.
I can see bloom filters as part of a complicated hybrid strategy compressor. The more tools on the belt of such a compressor, the better, but I don't think it'll improve things much on average.
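To make the zero-run point concrete, a toy sketch (my own illustration, not anything from the linked repo):

```python
# A frame delta is mostly zeros; run-length encode the zero runs and keep the
# non-zero values verbatim. Exact round trip, no false positives.
def rle_encode(delta):
    out, run = [], 0
    for v in delta:
        if v == 0:
            run += 1
        else:
            out.append((run, v))     # (zeros before this value, the value)
            run = 0
    out.append((run, None))          # trailing zeros
    return out

def rle_decode(encoded):
    delta = []
    for run, v in encoded:
        delta += [0] * run
        if v is not None:
            delta.append(v)
    return delta

delta = [0, 0, 0, 5, 0, 0, -3, 0, 0, 0, 0]
assert rle_decode(rle_encode(delta)) == delta
```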
The comparison is: for your use case, which is better? Decompress and matrix sum or grab keys and increment individual values?
There's some flexibility in how one would do that second increment, but of course naively it might look like constructing a sparse matrix and summing, so that feels pretty similar. But the flexibility might be nice in some cases, and bloom filters are an extremely space-efficient representation with an arbitrarily low false-positive rate.
So how does the bloom filter help vs something like a hash table?
It sounds like your instinct here is that we need to store (x,y,r,g,b) tuples for everything that changed, and leave other pixels unchanged from the previous frame.
You could store all of that in a single map from (x,y) to (r,g,b), which would cost you, say, 4 bytes per pixel for the (x,y) data and then, say, 30 bits or (rounding up) 4 bytes per pixel for the (r,g,b) with 10-bit color. A hash map would optimize lookup, but for efficient storage you really just want an array of 8-byte chunks where the first two bytes are x, next 2 are y, next 10 bits are r, then g, then b. To decode, you just start with the previous frame and then iterate down the list and apply each diff in the list.
If you iterate over (x,y) in a consistent order, you avoid the need to store it because you can simply say that the next (r,g,b) tuple in your list is the pixel data for the next (x,y) value that passes the bloom filter. This effectively cuts the amount of data you have to store per pixel in half, but increases the number of pixels you have to store by a probabilistically small amount because bloom filters have false-positives.
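To make the sizes concrete, here's that 8-byte chunk as actual packing code (a hypothetical layout, just for illustration); the bloom-filter version then drops the x,y half and keeps only the 4-byte (r,g,b) word per stored pixel:

```python
import struct

# 16-bit x, 16-bit y, then 10 bits each of r, g, b packed into a 32-bit word
# (2 bits spare): 8 bytes per changed pixel with explicit coordinates.
def pack_diff(x, y, r, g, b):
    rgb = (r << 20) | (g << 10) | b          # each channel 0..1023
    return struct.pack(">HHI", x, y, rgb)

def unpack_diff(chunk):
    x, y, rgb = struct.unpack(">HHI", chunk)
    return x, y, (rgb >> 20) & 0x3FF, (rgb >> 10) & 0x3FF, rgb & 0x3FF

assert unpack_diff(pack_diff(640, 360, 1023, 512, 7)) == (640, 360, 1023, 512, 7)
```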
You start out with the positions of all the 1s (the positions of changed pixels). A hashtable would be great for speeding up query time, but it does not compress anything. The bloom filter takes up much less space. The price is that you can get false positives.
You don't need to store the diffs for the coordinates that are not in the bloom filter. If the number of coordinates that changed is small, the size of the bloom filter will be small, and this is a significant optimization.
So it's just compressing consecutive 0s? Like most compression algorithms do...?
It's more consistent.
As a rule of thumb, Bloom filters require about 10 bits per element for a 1% false-positive rate. Say 100 pixels changed between the frames; that's 1000 bits, or ~125 bytes, to store the which-pixel-changed map.
Run-length encoding of the (in)active bits can use anything from ~10 to 200 bytes for 100 pixels (say 1 byte per run, with 200 runs of active vs. inactive in the worst case).
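Checking that arithmetic with the standard optimal-size formula (my own back-of-the-envelope):

```python
from math import ceil, log

# Optimal Bloom filter size: m = -n * ln(p) / (ln 2)^2 bits.
n, p = 100, 0.01                    # 100 changed pixels, 1% false positives
m_bits = ceil(-n * log(p) / log(2) ** 2)
print(m_bits, ceil(m_bits / 8))     # ~959 bits, ~120 bytes (the "10 bits/element" rule)

# Run-length encoding at 1 byte per run: from a handful of bytes for one big
# run up to ~200 bytes if changed/unchanged pixels alternate 200 times.
```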
A lot of video compression is about motion. How do you handle the same pixels sliding two pixels to the left due to a pan?
You could definitely layer block-level motion prediction in before this step, though.
How do you go from frame n+1 to frame n+2?
Similar to other compression methods, it uses keyframes (full frames saved at points throughout the video) as the starting point, after which each following frame is a diff of that until you reach the next keyframe. It seems to generate a keyframe when the compression drops below a certain level (at which point you might as well just store the compressed frame instead of the bloom bitmap).
https://github.com/ross39/new_bloom_filter_repo/blob/4798d90...
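In rough Python, the loop I imagine from that description (a sketch only; `frame_delta` stands in for the bloom-filter encoding and the 0.5 threshold is an assumed value, neither taken from the linked code):

```python
# Keyframe logic as described above: start from a stored keyframe, emit diffs
# against it, and emit a fresh keyframe once the diff stops compressing well.
def frame_delta(keyframe, frame):
    """Stand-in for the bloom-filter encoding: just the changed pixels."""
    return [(i, p) for i, (k, p) in enumerate(zip(keyframe, frame)) if k != p]

def encode_video(frames, ratio_threshold=0.5):
    chunks, keyframe = [], None
    for frame in frames:
        if keyframe is None:
            chunks.append(("key", frame))
            keyframe = frame
            continue
        delta = frame_delta(keyframe, frame)
        if len(delta) > ratio_threshold * len(frame):
            chunks.append(("key", frame))    # diff too large: new keyframe
            keyframe = frame
        else:
            chunks.append(("delta", delta))
    return chunks
```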