They run Leta on diskless servers, just like the VPN:
>We run the Leta servers on STBooted RAM only servers, the same as our VPN servers. These servers run the latest Ubuntu LTS, with our own stripped down custom Mullvad VPN kernel which we tune in-house to remove anything unnecessary for the running system.
>
>The cached search results are stored in an in-memory Redis key / value store.
This is surprising given that they try to cache results for 30 days:
>Each search that has not already been cached is saved in RAM for 30 days. The idea is that the more searches performed, the larger and more substantial the cached results become, therefore aiding with privacy.
That's surprising because presumably they lose all results if they have to reboot the server.
With a VPN service, there's not much they have to store past the lifetime of the VPN session, but if they're storing search results for 30 days, I wonder how they deal with this? Maybe best effort is fine because they don't strictly need to cache the results, as it just provides marginal privacy improvements.
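FWIW, what they describe is plain cache-aside with a TTL. A minimal sketch with redis-py, where `fetch_from_upstream()` is a hypothetical stand-in for the paid API call:

```python
import json

import redis

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds, matching the FAQ's 30-day window

r = redis.Redis(host="localhost", port=6379)

def fetch_from_upstream(query: str) -> list:
    """Hypothetical stand-in for the paid upstream API (Google/Brave)."""
    raise NotImplementedError

def search(query: str) -> list:
    cached = r.get(query)
    if cached is not None:
        return json.loads(cached)         # hit: no upstream cost
    results = fetch_from_upstream(query)  # miss: paid upstream call
    r.setex(query, THIRTY_DAYS, json.dumps(results))
    return results
```

Since it all lives in an in-memory Redis, a reboot wipes it and the loop just starts over.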
"That's surprising because presumably they lose all results if they have to reboot the server."
Strictly speaking, they only lose all results for sure if they have to reboot all the servers at the same time. If they implemented a system where the cached results are shared and replicated among all their servers, the cache could in theory be kept alive indefinitely.
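A rough sketch of that idea, with made-up peer hostnames: on a local miss, ask the surviving peers and backfill the freshly rebooted node with whatever lifetime the entry has left, before falling through to the upstream API:

```python
import json

import redis

LOCAL = redis.Redis(host="localhost")
# Hypothetical peer hostnames; in reality this would be service discovery.
PEERS = [redis.Redis(host=h) for h in ("leta-cache-2", "leta-cache-3")]

def lookup(query: str):
    hit = LOCAL.get(query)
    if hit is not None:
        return json.loads(hit)
    for peer in PEERS:
        try:
            hit = peer.get(query)
        except redis.ConnectionError:
            continue                          # that peer may itself be rebooting
        if hit is not None:
            ttl = peer.ttl(query)             # keep the remaining lifetime
            if ttl > 0:
                LOCAL.setex(query, ttl, hit)  # backfill the rebooted node
            return json.loads(hit)
    return None  # genuine miss: caller falls through to the upstream API
```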
From the FAQ:
> Each time the Leta application is restarted (due to an upgrade, or new version) server side, a new secret hash is generated, meaning that all previous search queries are no longer visible to Leta.
If I read this correctly, the cached data is per-instance; there would be no way to share it among instances if each one has its own secret hash and the hashes are cycled on every start.
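That reading implies the cache key is a keyed hash of the query. The exact construction isn't documented, but a per-boot HMAC (an assumption on my part; the FAQ only says "secret hash") would behave exactly as described:

```python
import hashlib
import hmac
import os

# Regenerated on every application start; once the old secret is gone,
# keys written under it can never be recomputed, so those cache entries
# are unreachable even though they may sit in Redis until they expire.
BOOT_SECRET = os.urandom(32)

def cache_key(query: str) -> str:
    # HMAC-SHA256 is an assumption; the FAQ does not name the algorithm.
    return hmac.new(BOOT_SECRET, query.encode(), hashlib.sha256).hexdigest()
```

Entries written under the old secret would linger until their TTL expires, but nothing could address them anymore.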
Yes, they state in the FAQ that any update to the system clears the cache. The caching is there because of the query cost.
And that cost is external, too: Brave or Google are behind the results, so every cache miss is a paid upstream call. Things would be terrible without the cache... but that doesn't mean every request needs to be served from cache; they can't avoid sourcing each result at least once anyway.
Wouldn't want to hang onto things too long, either; current events run out of currency :)
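There's a real point under the joke: a flat 30-day TTL treats evergreen queries and breaking news the same. A toy sketch of per-query TTLs (the heuristic and term list are entirely made up):

```python
from datetime import timedelta

# Entirely made-up heuristic: time-sensitive queries expire quickly,
# everything else keeps the full 30 days.
NEWSY_TERMS = ("news", "score", "weather", "election")

def ttl_for(query: str) -> timedelta:
    if any(term in query.lower() for term in NEWSY_TERMS):
        return timedelta(hours=6)
    return timedelta(days=30)

# Usage with redis-py: r.setex(key, ttl_for(query), payload)
```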
> This is surprising ... as it just provides marginal privacy improvements.
Diskless does not mean SSH-less or network-less. The "data" can be pulled or pushed just the same, which is to say that diskless, in this case, is no better than verifiably read-only partitions (like on ChromeOS and Android, for example).
Sorry, I don't know what you mean. When I said it provides marginal privacy improvements, I meant the caching, not the disklessness.
Diskless does provide privacy improvements, as it drastically reduces the odds of something accidentally persisting to storage.
Diskless (edit: with the OS in an initramfs) is indeed a gold standard against local persistence, but it requires quite a bit of extra RAM - a few GB for the "latest Ubuntu LTS".
As far as preventing accidental persistence goes, a disk with only dm-verity partitions is just as good, with the extra advantage of adding only a little RAM usage (/tmp, /var/run, ...).
For that matter, even something as sloppy as booting with a rootfs which can't be remounted rw (iso9660, squashfs, etc.) as the only mounted fs is also perfectly good against accidental persistence.
You could run from NFS and not need much extra RAM. Plus you save like $25/node by not having a local disk.
You could go to the extreme and boot off Google Drive (or any other FUSE FS).
If they are running in a VM, they could live-migrate it to a different machine when they need to reboot the host. That, or a cluster of Redis caches.
So running a diskless host OS for a hypervisor and then diskless VMs on top of that? Sounds like a nightmare even before considering live migration. Also, what if they need to reboot the VM itself?
The cache is per-instance. A replicated cluster of Redis caches would also limit the whole cache to the RAM of a single machine, so that is a non-starter.