kayodelycaon 1 day ago

I asked ChatGPT to give Wikipedia links in a table. Not one of the 50+ links was valid.

2
swores 1 day ago

Which version of GPT? I've found that 4o has actually been quite good at this lately, rarely hallucinating links any more.

Just two days ago, I gave it a list of a dozen article titles from a newspaper website (The Guardian), asked it to look up their URLs and give me a list, and to summarise each article for me, and it made no mistakes at all.

Maybe your task was more complicated in some way, maybe you're not paying for ChatGPT and are on a less capable model, or maybe it's a question of learning how to prompt; I don't know. I just know that for me it's gone from "assume sources cited are bullshit" to "verify each one still, but they're usually correct".

lolinder 23 hours ago

> asked it to look up their URLs and give me a list

Something missing from this conversation is whether we're talking about the raw model or model+tool calls (search). This sounds like tool calls were enabled.

And I do think this is a sign that the current UX of the chatbots is deeply flawed: even on HN we don't seem to interact with the UI toggles for these features frequently enough that they're the intuitive answer; instead we still talk about model classes as though that makes the biggest difference in accuracy.
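To make the raw-model vs. model+tools distinction concrete, here's a minimal sketch of the two request shapes, using the general form of the OpenAI chat-completions function-calling API. The `web_search` tool name and its schema are hypothetical (the hosted ChatGPT search tool isn't exposed this way); no network call is made, this just shows what changes between the two modes.

```python
# A question whose answer the model cannot reliably know from weights alone.
question = {
    "role": "user",
    "content": "Give me the URL of this Guardian article: <title here>",
}

# Raw model: the reply can only come from the model's weights,
# so any URL it produces may be hallucinated.
raw_request = {
    "model": "gpt-4o",
    "messages": [question],
}

# Model + tools: the model may emit a tool call instead of guessing;
# the client executes the search and feeds the results back, so the
# final answer can be grounded in retrieved pages. The tool below is
# a hypothetical example, not a built-in.
tool_request = {
    "model": "gpt-4o",
    "messages": [question],
    "tools": [{
        "type": "function",
        "function": {
            "name": "web_search",  # hypothetical tool name
            "description": "Search the web and return result URLs.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }],
}
```

The model itself is identical in both cases; only the second request gives it a way to fetch real URLs instead of reconstructing them from training data.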

swores 23 hours ago

Ah, yes you're right - I didn't clarify this in my original comment, but my anecdote was indeed the ChatGPT interface and using its ability to browse the web[#], not expecting it to pull URLs out of its original training data. Thanks for pointing that out.

But the reason I suggested the model as a potential difference between me and the person I replied to, rather than the ChatGPT interface vs. plain use of a model without bells and whistles, is that they said their trouble was while using ChatGPT, not while using a GPT model over the API or through a different service.

[#] (Technically I didn't, and never do, have the "search" button enabled in the chat interface, but it's able to search/browse the web without that focus being selected.)

lolinder 21 hours ago

Right, but ChatGPT doesn't always automatically use search. I don't know what mechanisms it uses to decide whether to turn that on (maybe free vs. paid accounts makes a difference?), but I rarely see it automatically turn on search; it usually tries to respond directly from its weights.

And on the flip side, my local Llama 3 8b does a pretty good job at avoiding hallucinations when it's hooked up to search (through Open WebUI). Search vs no-search seems to me to matter far more than model class.

swores 19 hours ago

I'm just specific in my prompting, rather than letting it decide whether or not to search.

These models aren't (yet, at least) clever enough to understand what they do or don't know, so if you're not directly telling them when you want them to go and find specific info rather than guess at it, you're just consulting a mystic with a crystal ball.

It doesn't add much to the length of prompts; it's just a matter of getting into the habit of wording things the right way. For the request I gave as my example a couple of comments above, I wrote "Please search for every one of the Guardian articles whose titles I pasted above and give me a list of URLs for them all.", whereas if you write "Please tell me the URLs of these Guardian articles" then it may well act as if it knows them already and return bullshit.

kayodelycaon 19 hours ago

Definitely more complicated. I've been playing around with using it to analyze historical data and generate charts. And yes, I've tried many different kinds of phrasing. I have experience working with and writing rules-based "expert systems" and have a vague idea of how neural networks are used for image recognition. It's a pretty fun game to get useful information out of ChatGPT.

You cannot ask it to have crop yield as a column in a chart and get accurate information.

It only seems reasonable when doing a single list of items. Ask it for two columns of data and it starts making things up. Like bogus Wikipedia links.

You could definitely make the argument that I'm using it wrong, but this is how people try to use it. I still find it useful because it gives me a start on where to point my research or ask clarifying questions.

It's much better at giving you a list of types of beer and wine that have been produced throughout history. Just don't trust any of the dates.

swores 19 hours ago

If you could share the actual prompts and the info you wanted, I'd be curious to try it myself and see whether it is indeed too complex for it or whether prompting differently would work better. I've had it produce tables with multiple columns, pulling info from different sources for different columns, so that's definitely not a hard limit. I'd be happy to come back to you either with advice on how to do it next time, or with agreement that, having tried it myself, it was indeed ChatGPT and not your prompting that was the problem.

kayodelycaon 19 hours ago

Prompt:

I would like a list of east Indiamen from 1750 to 1800 where you can find how many tons burthen and how many crew. Show as a chart and give me the wikipedia links to the ships. Do not include any ships that do not have wikipedia links.

Here's my customization:

    What do you do?: 
    Software Engineer

    What traits should ChatGPT have?: 
    Show all the options
    Be practical above all.

    Anything else ChatGPT should know about you?:
    I’m an author of science fiction and fantasy.
    I like world building for stories.

I know there are hundreds of ways to phrase this, and I could probably trick it into generating the chart first and finding the Wikipedia links second. :)

selimthegrim 8 hours ago

I can’t decide whether I’m more tempted to feed it “Using Metadata to Find Paul Revere” as a prompt or try to see if it identifies Obra Dinn as an East Indiaman

0xFEE1DEAD 22 hours ago

Sorry for going off topic here but I've had the same experience.

I'm not sure which update improved 4o so greatly, but I get better responses from 4o than from o4-mini, o4-mini-high, and even o3. o4 and o3 have been disappointing lately: they have issues understanding intent, they have issues obeying requests, and multiple times they've forgotten the context even though the conversation consisted of only four messages without a huge number of tokens. Among chain-of-thought models I prefer DeepSeek over any OpenAI model (4.5 research seems great, but it's just way too expensive).

It's rather disappointing how OpenAI releases new models that seem incredible and then, to reduce the cost of running them, slowly slims them down until they're just not that good anymore.

swores 19 hours ago

No need for the apology, and FYI I broadly agree with everything you say (except about 4.5, which I don't actively disagree with; I just haven't played with it myself).

alphan0n 21 hours ago

Share the link to the conversation.