bawolff 8 days ago

98% sounds good enough for the use case suggested here.

pastage 8 days ago

Writing good validators for data is hard. You can be 100% sure that there will be bad data in those 98%. From my own experience, I thought I had 50% of the books converted correctly, and then I found I still had junk data and gave up. It is not an impossible problem; I just was not motivated to fix it on my own. Working with your own copies is fine, but when you try to share them you run into legal issues that I just do not find interesting to solve.

Edit: my point is that I would like to share my work but that is hard to do in a legal way. That is the main reason I gave up.

landl0rd 8 days ago

2% garbage, if some of that garbage falls out the right way, is more than enough to seriously degrade search result quality.

carlosjobim 8 days ago

It's better than nothing, and nothing is what we currently have.