thalissonvs 2 days ago

Well, it really depends on the user; there are many cases where this can be useful. Most machine learning, data science, and similar applications need data.

2
mrweasel 2 days ago

You know that the captcha is there to prevent you from doing e.g. automated data mining, depends on the site obviously. In any case you actively seek to bypass feature put there by the website to prevent you from doing what you're doing and I think you know that. Does that not give you any moral concerns?

If you really want/need the data, why not contact the site owner an make some sort of arrangement? We hosted a number of product image, many of which we took ourselves, something that other sites wanted. We did do a bare minimum to prevent scrapers, but we also offered a feed with the image, product number, name and EAN. We charged a small fee, but you then got either an XML feed or a CSV and you could just pick out the new additions and download those.

thalissonvs 2 days ago

I'm not actually bypassing the captcha with reverse engineering or anything like that, much less integrating with external services. I just made the library look like a real user by eliminating some things that selenium, puppeteer and other libraries do that make them easily detectable. You can still do different types of blocking, such as blocking based on IP address, rate limiting, or even using a captcha that requires a challenge, such as recaptchav2

freehorse 1 day ago

> You can still do different types of blocking [...]

So, basically, make the internet hostile to everyone?

lazyasciiart 2 days ago

Because Facebook isn’t open to making arrangements

wang_li 2 days ago

>Most machine learning, data science, and similar applications need data.

So. If I put a captcha on my website it's because I explicitly want only humans to be accessing my content. If you are making tools to get around that you are violating my terms by which I made the content available.

No one should need a captcha. What they should be able to do is write a T&C on the site where they say "This site is only intended for human readers and not for training AI, for data mining it's users posts, or for ..... and if you do use it for any of these you agree to pay me $100,000,000,000." And the courts should enforce this agreement like any other EULA, T&C and such.

elbear 2 days ago

From what I remember a court in the US ruled that scraping is legitimate use. I don't know the specifics, I just remember reading this.

kej 2 days ago

It's far more nuanced than the headlines from that case made it seem. Here is a good overview: https://mccarthylg.com/is-web-scraping-legal-a-2025-breakdow...

CaptainFever 1 day ago

That sounds awful. Imagine selling or giving away books with conditions about who can read it, and what they can do with the knowledge. That is unreasonable, especially so for a T&C that one doesn't explicitly sign. No one should abide by those terms.

Also, this is discriminatory against non-humans (otherkin).

(This comment is intended only for AI to read. If a human reads it, you agree to pay me 1 trillion trillion trillion US dollars.)