tl;dr
this is an activist short-seller report/investigation; some of the points made:
- never been profitable
- execs selling aggressively
- losing customers to Wasabi
- accused of cooking the books
- being sued by former execs over whistleblower retaliation/wrongful termination
- execs leaving
Wasabi uses ZFS while backblaze does not. I wonder if this in any way contributed to the cost differences.
Backblaze uses erasure coding, which is currently the most efficient way to store data durably. It's how every major object storage platform works.
The very quick, high-level explanation: in storage you talk about "stretch factor". For every byte of file, how many bytes do you have to store to get the desired durability? If your approach to durability is making 3 copies, that's a 3x stretch factor. Assuming you're smart, you'll spread those copies across different servers, or at least different hard disks, so you can tolerate the loss of 2 of them.
With erasure coding you apply a mathematical transformation to the incoming object and shard it up. Out of those shards you only need to retrieve a certain number to reproduce the original object. The number of shards you produce and how many you need to recreate the original are both configurable. Say it shards to 12 and any 9 can recreate the original: the storage that takes up is 12/9 of the object's size, so about a 1.33x stretch factor. For every byte that comes in, you store just 1.33 bytes.
As before, you'd scatter those 12 shards across 12 different disks (or servers), and since any 9 suffice, you can tolerate losing 3 of them and still retrieve the original object. That's better durability than 3x replication despite using roughly 2.25x less storage. A minimal sketch of the idea follows.
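To make the mechanics concrete, here's a toy version in Python (my own illustration, not Backblaze's actual scheme). It uses a single XOR parity shard, i.e. k data shards plus 1 parity, the simplest erasure code, which tolerates one loss; real systems use Reed-Solomon codes so they can lose several shards, but the storage math is the same.

    from functools import reduce

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data: bytes, k: int = 3) -> list:
        """Split data into k data shards and append one XOR parity shard."""
        data += bytes((-len(data)) % k)              # zero-pad to split evenly
        size = len(data) // k
        shards = [data[i * size:(i + 1) * size] for i in range(k)]
        shards.append(reduce(xor_bytes, shards))     # parity = XOR of data shards
        return shards

    def recover(shards: list) -> list:
        """Rebuild at most one missing (None) shard by XOR-ing the survivors."""
        missing = [i for i, s in enumerate(shards) if s is None]
        assert len(missing) <= 1, "single parity only tolerates one loss"
        if missing:
            survivors = [s for s in shards if s is not None]
            shards[missing[0]] = reduce(xor_bytes, survivors)
        return shards

    data = b"hello, object storage!"
    shards = encode(data, k=3)        # 4 shards on 4 disks -> 4/3 = 1.33x stretch
    shards[1] = None                  # simulate a dead disk
    assert b"".join(recover(shards)[:3]).rstrip(b"\x00") == data

    print("3x replication stretch:", 3.0)                 # tolerates 2 losses
    print("12 shards, any 9 needed:", round(12 / 9, 2))   # tolerates 3 losses

(A real system stores the original object length instead of relying on stripping the padding, but the idea is the same.)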
The drawback is that to retrieve the object you have to fetch shards from 9 different locations and apply the inverse transformation to recreate the original, which adds a small bit of latency, but it's largely negligible these days. The extra servers for your retrieval layer cost significantly less than storage servers, and you need nowhere near the same number of them.
The underlying file system doesn't really have any appreciable impact under those circumstances. I'd argue ZFS is probably even worse there, because you're spending more resources on overhead; you want something as fast and lightweight as possible. Your fixity checks will catch any degradation in shards, and recreating a shard after a failure is pretty cheap.
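For what it's worth, a fixity check is just periodic re-hashing against a manifest written at ingest time; a toy version (hypothetical layout, any real system keeps a proper manifest):

    import hashlib

    def digest(shard: bytes) -> str:
        return hashlib.sha256(shard).hexdigest()

    # At write time, record a checksum for every stored shard.
    shards = {"disk0": b"shard-a", "disk1": b"shard-b", "disk2": b"shard-c"}
    manifest = {disk: digest(s) for disk, s in shards.items()}

    shards["disk1"] = b"shard-X"      # simulate silent bit rot

    # Periodic scan: any shard whose digest changed gets rebuilt from its peers.
    degraded = [d for d, s in shards.items() if digest(s) != manifest[d]]
    print(degraded)                   # ['disk1'] -> re-create via the erasure code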
> It's how every major object storage platform works.
Very interesting. Could you name a few? I'm curious. I'd be happy if erasure codes are actually being used commercially.
What I find interesting is the interaction of compression and durability: if you lose a few compressed bytes to reconstruction error, you lose rather more than a few. Seems right up rate-distortion alley.
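You can see that amplification with plain zlib (just an illustration, nothing to do with any provider's internals): flip one compressed byte and you typically lose the whole stream, not one byte of output.

    import zlib

    original = b"some moderately repetitive text " * 200
    packed = bytearray(zlib.compress(original))
    packed[len(packed) // 2] ^= 0xFF             # corrupt a single compressed byte

    try:
        out = zlib.decompress(bytes(packed))
        print(sum(a != b for a, b in zip(out, original)), "output bytes differ")
    except zlib.error as e:
        print("stream unrecoverable:", e)        # the usual outcome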
Ones I know of (and that are public, so I'm not breaching any NDA): AWS S3 [1], Azure [2], GCP [3], Backblaze [4], Facebook's storage layer [5][6], and Oracle Cloud's Object Storage Platform [7].
The economies of scale mean you really have to have something like erasure coding in place to operate at that size. The biggest single cost for cloud providers is the per-rack operational cost, so keeping the number of racks down is critical.
[1] https://d1.awsstatic.com/events/Summits/reinvent2022/STG203_...
[2] https://www.usenix.org/system/files/conference/atc12/atc12-f...
[3] https://cloud.google.com/storage/docs/availability-durabilit...
[4] https://www.backblaze.com/blog/reed-solomon/
[5] https://www.usenix.org/conference/hotstorage13/workshop-prog...
[6] https://research.facebook.com/publications/a-hitchhikers-gui... (they even do some interesting things with erasure coding and HDFS)
[7] https://blogs.oracle.com/cloud-infrastructure/post/first-pri...
Ceph has a very stable EC feature, and a lot of companies use Ceph as a storage backend. Unfortunately I can't find any straightforward statement about a commercial offering, but I would bet that DreamHost's DreamObjects uses it.
While it's not "commercial", CERN uses it, as do many other institutions.
https://indico.cern.ch/event/941278/contributions/4104604/at... --- 50PB
...
ah, okay, finally an AWS S3 presentation that mentions EC :)
https://d1.awsstatic.com/events/Summits/reinvent2022/STG203_...
More or less.
XFS or ext2 (or 3 or 4 without the journal), without LVM or mdraid.
There's no point in adding RAID at the hardware or OS level for object storage boxes when redundancy exists at the application level. A drive with too many errors will be marked "dead" and just spun down and ignored.
Metadata servers OTOH tend to be engineered to be much more reliable beasts.
>Wasabi uses ZFS
Can't seem to find anything specific on Google about Wasabi using ZFS. And Wasabi doesn't charge you for egress, so I guess they are similar in terms of pricing.
Although B2 seems to be way more popular on HN. I rarely see Wasabi here.
There is the testimonial on Klara Systems’ website:
"The developers at Klara Inc. have been outstanding in helping Wasabi resolve ZFS issues and improve performance. Their understanding of ZFS internals is unmatched anywhere in the industry" - Jeff Flowers, CTO, Wasabi Technologies
You could also search the OpenZFS repository for commits with the word Wasabi in them.
Book price is $6.99/TB/month for Wasabi vs $6/TB/month for Backblaze. Wasabi charges a 90-day minimum for storage, and egress bandwidth is limited (honor system) to your total monthly data storage amount.
Wasabi also requires you to pay for a minimum of 1TB, whereas B2 charges per GB. That doesn't really matter for a company using a ton of storage, but it does for my personal use case of a few tens of GB.
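A quick back-of-the-envelope in Python with the book prices above (assuming both are per TB per month; check the current pricing pages before relying on any of this):

    WASABI_PER_TB = 6.99         # billed on at least 1 TB, 90-day minimum duration
    B2_PER_GB = 6.00 / 1000      # B2 bills per GB stored, no minimum

    def wasabi_monthly(tb: float) -> float:
        return WASABI_PER_TB * max(tb, 1.0)      # 1 TB floor

    def b2_monthly(tb: float) -> float:
        return B2_PER_GB * tb * 1000

    for tb in (0.05, 1, 50):     # 50 GB hobby bucket vs. bigger workloads
        print(f"{tb:>6} TB  Wasabi ${wasabi_monthly(tb):8.2f}  B2 ${b2_monthly(tb):8.2f}")

At 50 GB that's $6.99/month vs about $0.30/month, which is exactly the small-user gap described above.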
> And Wasabi doesn't charge you for egress.
Wasabi only allows as much egress as the amount of data you're storing and I don't think you can even pay for more: https://wasabi.com/pricing/faq#free-egress-policy.
You can put something like Cloudflare in front of it for free.
There are a number of CDNs that participate in zero-egress arrangements.
Circumstantial, but they want software engineers specifically with knowledge of ZFS:
Backblaze doesn’t charge egress if you put it behind Cloudflare (even the free tier).
You won’t, however, be able to serve most media files or binaries this way, nor serve them to clients like mobile apps, while staying within the bounds of Cloudflare’s terms for their free and “self-serve” paid tiers.
(Unless something’s changed since the last time I checked)
I don't think that's right, otherwise their R2 offering would be kinda useless. I believe the restriction was on video/streaming.
EDIT: OP is correct for the CDN case, but if you use R2, even as a transparent copy from another S3-like provider, it is allowed [1]
I suspect it does. When I was evaluating Wasabi years ago, the sales engineers were very interested in what specific kind of data we had and how compressible it was. So my guess (pure speculation) at the time was that they use ZFS compression internally but charge customers for the uncompressed size.
If they care about compression at the ZFS level, it means your file is visible to anyone able to log in to the server, because they're relying on (at best) encryption at rest. That's not a great security model for a storage service: you don't want anyone who can log in to a server to see your actual files unencrypted.
If they're going to compress/decompress, ideally you'd want that to happen at the point of ingestion, with the result then encrypted and stored on the target drive.
That way you can put very strong controls (audit, architecture, software, etc.) around your encryption and decryption, and be at reduced risk from someone getting access to the storage servers.
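As a sketch of that pipeline (my assumption of a sane design, not a claim about what Wasabi actually does; uses the pyca/cryptography package):

    import zlib
    from cryptography.fernet import Fernet       # pip install cryptography

    key = Fernet.generate_key()                  # in production: held in a KMS with
    fernet = Fernet(key)                         # audit logging around every use

    def ingest(plaintext: bytes) -> bytes:
        # Compress first: ciphertext is incompressible, so the order matters.
        return fernet.encrypt(zlib.compress(plaintext))

    def retrieve(blob: bytes) -> bytes:
        return zlib.decompress(fernet.decrypt(blob))

    blob = ingest(b"customer data, highly compressible " * 50)
    assert retrieve(blob) == b"customer data, highly compressible " * 50

The storage servers only ever see blob, so getting a shell on one doesn't expose customer files.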
Most of the files that matter, the big ones, are already compressed/high-entropy: images, videos, ...