Buttons840 7 days ago

LLMs aren't my rubber duck, they're my wrong answer.

You know that saying that the best way to get an answer online is to post a wrong answer? That's what LLMs do for me.

I ask the LLM to do something simple but tedious, it does it spectacularly wrong, and then I get pissed off enough that I have the rage-induced energy to do it myself.

Buttons840 7 days ago

I'm probably suffering from undiagnosed ADHD, and will get stuck and spend minutes picking a function name and then writing a docstring. LLMs do help with this even if they get the code wrong, because I usually won't bother to fix their variable names or docstrings unless needed. LLMs can reliably solve the problem of a blank page.

linotype 7 days ago

This. I have ADHD and starting is the hardest part for me. With an LLM it gets me from 0 to 20% (or more) and I can nail it for the rest. It’s way less stressful for me to start now.

raihansaputra 7 days ago

very much agree. although lately, with how good it is, i get hyperfocused and spend more time than i allocated, because i end up wanting to implement more than i planned.

linotype 7 days ago

It’s a struggle right? First world LLM problems.

bayesianbot 7 days ago

Been suffering the same; I'm used to having so many days (weeks/months) when I just don't get that much done. With LLMs I can take these days and hack around / watch videos / play games while the LLM is working in the background, and just check the work. Best part is it often leads to some problematic situation that gets me involved, and I'll often end up getting a real day of work out of it after I get started.

albrewer 7 days ago

> LLMs can reliably solve the problem of a blank page.

This has been the biggest boost for me. The number of choices available when facing a blank page is staggering. Even a bad/wrong implementation helps collapse those possibilities into a countable few that take far less time to think about.

msgodel 7 days ago

Yeah, keeping me in the flow when I hit one of those silly tasks my brain just randomly says "no, let's do something else" to has been the main productivity-improving feature of LLMs.

mystified5016 7 days ago

Yes! So many times my brain just skips right over some tasks because it takes too much effort to start. The LLM can give you something to latch onto and work with. It can lay down the starting shape of a function or program and even when it's the wrong shape, you still have something to mold into the correct shape.

The thing about ADHD is that taking a task from nothing to something is often harder than turning that something into the finished product. It's really weird and extremely not fun.

pipes 7 days ago

This is the complete opposite for me! I really like a blank page, the thought of writing a prompt destroys my motivation as does reviewing the code that an LLM produces.

As an aside, I'm seeing more and more crap in PRs: nonsensical use of language features, really poorly structured code. But that is a different story.

I'm not anti LLMs for coding. I use them too. Especially for unit tests.

carlmr 7 days ago

So much this: the blank-page problem is almost gone, even if the output is riddled with errors.

materiallie 7 days ago

This is my experience, too. As a concrete example, I'll need to write a mapper function to convert between a protobuf type and Go type. The types are mirror reflections of each other, and I feed the complete APIs of both in my prompt.

I've yet to find an LLM that can reliably generate mapping code from proto.Foo{ID string} to gomodel.Foo{ID string}.
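Concretely, the mappers I mean are nothing fancier than this (a sketch; the import paths and the Name field are invented for illustration):

  package mapper

  import (
      "example.com/app/gomodel" // hypothetical import paths standing
      "example.com/gen/proto"   // in for the real generated packages
  )

  // FooFromProto copies the wire type into the domain type, field by field.
  func FooFromProto(in *proto.Foo) *gomodel.Foo {
      if in == nil {
          return nil
      }
      return &gomodel.Foo{
          ID:   in.ID,
          Name: in.Name, // Name is an invented second field
      }
  }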

It still saves me time, because even 50% accuracy is still half the code I don't have to write myself.

But it makes me feel like I'm taking crazy pills whenever I read about AI hype. I'm open to the idea that I'm prompting wrong, need a better workflow, etc. But I'm not a luddite, I've "reached up and put in the work" and am always trying to learn new tools.

lazyasciiart 7 days ago

An LLM's ability to do a task is roughly correlated with the number of times that task has been done on the internet before. If you want to see the hype version, you need to write a todo web app in typescript or similar. So it's probably not something you can fix with prompts, but having a model with more focus on relevant training data might help.

KTibow 6 days ago

These days, they'll sometimes also do RL on a task if it's easy to validate outputs and it seems worth the effort.

akoboldfrying 7 days ago

This honestly seems like something that could be better handled with pre-LLM technology, like a 15-line Perl script that reads one on stdin, applies some crufty regexes, and writes the other to stdout. Are there complexities I'm not seeing?
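Something in that spirit, sketched in Go rather than Perl to match the thread (the field regex is naive and the type names are made up): read one struct's fields on stdin, print the mapper to stdout.

  package main

  import (
      "bufio"
      "fmt"
      "os"
      "regexp"
  )

  func main() {
      // Matches exported field lines like "ID string" or "Name string".
      field := regexp.MustCompile(`^\s*([A-Z]\w*)\s+\S+`)
      fmt.Println("func fooFromProto(in *proto.Foo) *gomodel.Foo {")
      fmt.Println("\tout := &gomodel.Foo{}")
      sc := bufio.NewScanner(os.Stdin)
      for sc.Scan() {
          if m := field.FindStringSubmatch(sc.Text()); m != nil {
              fmt.Printf("\tout.%s = in.%s\n", m[1], m[1])
          }
      }
      fmt.Println("\treturn out")
      fmt.Println("}")
  }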

bsder 7 days ago

LLMs are a decent search engine a la Google circa 2005.

It's been 20 years since then, so I think people have simply forgotten that a search engine can actually be useful, as opposed to ad-infested SEO sewage sludge.

The problem is that the conversational interface, for some reason, seems to turn off the natural skepticism that people have when they use a search engine.

AdieuToLogic 7 days ago

> LLMs are a decent search engine a la Google circa 2005.

Statistical text (token) generation made from an unknown (to the user) training data set is not the same as a keyword/faceted search of arbitrary content acquired from web crawlers.

> The problem is that the conversational interface, for some reason, seems to turn off the natural skepticism that people have when they use a search engine.

For me, my skepticism of using a statistical text generation algorithm as if it were a search engine is because a statistical text generation algorithm is not a search engine.

pixl97 7 days ago

Search engines can still be really good if you have a good idea of what you're looking for in the domain you're searching.

Search engines can suck when you don't know exactly what you're looking for and the phrases you're using have invited spammers to fill up the first 10 pages.

magicalhippo 7 days ago

They also suck if you want to find something that's almost exactly like a very common thing, but different in some key aspect.

For example, I wanted to find some texts on solving a partial differential equation numerically using 6th-order or higher finite differences, as I wanted to know how to handle boundary conditions (the interior is simple enough).
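(For reference, by "the interior is simple enough" I mean stencils like the standard 6th-order central difference for the second derivative:

  f''(x_i) \approx \frac{2 f_{i-3} - 27 f_{i-2} + 270 f_{i-1} - 490 f_i + 270 f_{i+1} - 27 f_{i+2} + 2 f_{i+3}}{180 h^2}

It's the outermost three grid points on each side, where this 7-point stencil runs off the edge, that need special treatment.)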

Searching only turned up the usual low-order methods that I already knew.

Asking some LLMs, I got some decent answers and could proceed.

Back in the day you could force search engines to restrict their search scope, but these days they all seem so eager to return results at any cost, making them useless for niche topics.

notsydonia 7 days ago

I agree completely. Personally, I actually like the list of links, because I like to compare different takes on a topic. It's also fascinating to see how a scientific study propagates through the media, or how the same news story is treated over time as trends change. I don't want a single mashed-up answer to a question, and maybe that makes me weird. More worrying: whenever I've asked an LLM a question on a topic I happen to know a LOT about, the response has been either incorrect or inadequate ("there is currently no information collected on that topic"). I do like Perplexity for questions like "without any preamble whatsoever, what is the fastest way to remove a <whatever> stain from X material?"

wvenable 7 days ago

I almost never bother using Google anymore. When I search for something, I'm usually looking for the answer to a question. Now I can just ask the question and get the answer without all the other stuff.

I will often ask the LLM to give me web pages to look at when I want to do further reading.

As LLMs get better, I can't see myself going back to Google as it is or even as it was.

codr7 7 days ago

You get an answer.

Whether that's the answer, or even the best answer, is impossible to tell without doing the research you're trying to avoid.

wvenable 7 days ago

If I do research, I get an answer. Whether that's the answer, or even the best answer, is still impossible to tell. When do I stop looking for the best answer?

If ChatGPT needs to, it will actually do the search for me and then collate the results.

lazyasciiart 7 days ago

By that logic, it's barely worth reading a newspaper or a book. You don't know if they're giving you accurate information without doing all the research you're trying to avoid.

lores 7 days ago

Recognised newspapers will curate by hiring smart, knowledgeable reporters and funding them to get reliable information. Recognised books will be written by a reliably informed author, and reviewed by other reliably informed people. There are no recognised LLMs, and their method of working precludes reliability.

lazyasciiart 6 days ago

Malcolm Gladwell, Jonah Lehrer, Daniel Kahneman, Matthew Walker, Stephen Glass? The New York Times, featuring Judith Miller on the existence of WMD, or their award-winning podcast "Caliphate"? (Award returned when it became known the whole thing was made up, in case you haven't heard of that one).

lores 6 days ago

As opposed to an LLM trained on all the Sh1tL0rd69s of the web?

mystified5016 7 days ago

Not anymore, not for a long time. There are very few truly reliable and trustworthy sources these days. More and more "recognized" publications are using LLMs. If a "recognized" authority gives you LLM slop, that doesn't make it any more trustworthy.

drob518 7 days ago

It’s only a matter of time before Google merges search with Gemini. I don’t think you’ll have to wait long.

johnb231 7 days ago

Already happened.

Google search includes an AI generated response.

Gemini prompts return Google search results.

drob518 6 days ago

See. They saw my comment and got it done. Dang, that was quick.

codr7 7 days ago

Once search engines merge fully with AI, the Internet is over.

otabdeveloper4 7 days ago

> Statistical text (token) generation made from an unknown (to the user) training data set is not the same as a keyword/faceted search of arbitrary content acquired from web crawlers.

Well, it's roughly the same under the hood, mathematically.

johnb231 7 days ago

All of the current models have access to Google and will do a search (or multiple searches), filter and analyze the results, then present a summary of results with links.

pjmlp 7 days ago

Except a search engine isn't voice controlled, nor able to write code for me.

Recently I did some tests with coding agents, and being able to translate a full application from AT&T Assembly into Intel Assembly compatible with NASM, in about half an hour of talking with the agent, and having the end result actually work with minor tweaks, isn't something a "decent search engine a la Google circa 2005" would ever have been able to achieve.

In the past I would have given such a task to a junior dev or intern, to keep them busy somehow; with a bit more tool maturity I'll have no reason to do that in the future.

And this is the point many developers haven't yet grasped about their future in the job market.

skydhash 7 days ago

> being able to translate a full application from AT&T Assembly into Intel Assembly compatible with NASM, [...] isn't something a "decent search engine a la Google circa 2005" would ever have been able to achieve

No, you would have searched for "difference between at&t assembly and intel assembly" and, failing that, read the manuals for both and compiled the differences yourself. Then you'd write an awk or perl script to get it done. If you happen to be good at both assembly dialects and awk, I believe that could have been done in less than an hour. Or you could use some vim macros.
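A toy sketch of such a script, say (in Go for illustration; a real translator would need to handle size suffixes properly, immediates, and memory operands, and this only flips simple register-to-register instructions):

  package main

  import (
      "bufio"
      "fmt"
      "os"
      "regexp"
  )

  func main() {
      // AT&T "movq %rax, %rbx" (src, dst) -> Intel "mov rbx, rax" (dst, src).
      // Strips % sigils and naively drops an optional b/w/l/q size suffix
      // (mnemonics that happen to end in those letters, like imul, need care).
      rr := regexp.MustCompile(`^\s*(\w+?)[bwlq]?\s+%(\w+),\s*%(\w+)\s*$`)
      sc := bufio.NewScanner(os.Stdin)
      for sc.Scan() {
          line := sc.Text()
          if m := rr.FindStringSubmatch(line); m != nil {
              fmt.Printf("%s %s, %s\n", m[1], m[3], m[2])
          } else {
              fmt.Println(line) // anything fancier passes through untouched
          }
      }
  }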

> In the past I would have given such a task to a junior dev or intern, to keep them busy somehow, with a bit more tool maturity I have no reason to do it in the future.

The reason to give tasks to a junior is to get them to learn more, or because the task needs to be done but isn't critical. Unless it takes less time to do it yourself than to delegate it, or you have no junior to guide, handing the task to a junior is a good idea if it will help them grow.

pjmlp 7 days ago

Except that awk or Perl script is something that would take me more than half an hour from idea to production.

There might not exist a junior to give tasks to, if the number of available juniors keeps decreasing.

andrekandre 7 days ago

> the conversational interface, for some reason, seems to turn off the natural skepticism that people have

n=1, but after having chatgpt "lie" to me more than once i am very skeptical of it and always double-check it, whereas with tv or yt videos i still find myself being click-baited or grifted (iow, less skeptical) much more easily... any large studies about this would be very interesting...

myvoiceismypass 7 days ago

I get irrationally frustrated when ChatGPT hallucinates npm packages / libraries that simply do not exist.

This happens… weekly for me.

protocolture 7 days ago

"Hey chatgpt I want to integrate a slidepot into this project"

>from PiicoDev_SlidePot import PiicoDev_SlidePot

Weird how these guys used exactly my terminology when they usually say "Potentiometer"

Went and looked it up, found a resource outlining that it uses the same class as the dial potentiometer.

"Hey chatgpt, I just looked it up and the slidepots actually use the same Potentiometer class as the dialpots."

scurries to fix its stupid mistake

wvenable 7 days ago

Weird. I used to have that happen when it first came out but I haven't experienced anything like that in a long time. Worst case it's out of date rather than making stuff up.

mhast 7 days ago

My experience with this is that it is vital to have a setup where the model can iterate on its own.

Ideally by having a test or endpoint you can call to actually run the code you want to build.

Then you ask the system to implement the function and run the test. If it hallucinates anything it will find that and fix it.

IME OpenAI is below Claude and Gemini for code.

bdangubic 7 days ago

just ask it to write and publish them and you good :)

gessha 7 days ago

Jia Tan will have to work 24/7 :)

floydnoel 7 days ago

tell it that you won’t accept any new installed packages, use language features only. i have that in my coding prompt i made.

seattle_spring 7 days ago

This has been my experience as well. The biggest problem is that the answers look plausible, and only after implementation and experimentation do you find them to be wrong. If this happened every once in a while then it wouldn't be a big deal, but I'd guess that more than half of the answers and tutorials I've received through ChatGPT have ended up being plain wrong.

God help us if companies start relying on LLMs for life-or-death stuff like insurance claim decisions.

dabraham1248 7 days ago

I'm not sure if you're being sarcastic, but in case you're not... From https://arstechnica.com/health/2023/11/ai-with-90-error-rate...

"UnitedHealth uses AI model with 90% error rate to deny care, lawsuit alleges" Also "The use of faulty AI is not new for the health care industry."

pwdisswordfishz 7 days ago

> If this happened every once in a while then it wouldn't be a big deal, but I'd guess that more than half of the answers and tutorials I've received through ChatGPT have ended up being plain wrong.

It would actually have been more pernicious that way, since it would lull people into a false sense of security.

Affric 7 days ago

Yep.

I like maths, I hate graphing. Tedious work even with state of the art libraries and wrappers.

LLMs do it for me. Praise be.

lanstin 7 days ago

Yeah, I write a lot of little data analysis scripts and stuff, and I am happy just to read the numbers, but now I get nice PNGs of the distributions and so on from the LLM, and people like that.

xarope 7 days ago

I have to upvote this, because this is how I felt after the three times I consciously decided to give an LLM a try (versus having it shoved down my throat by google/ms/meta/etc) and gave up (for now).

therealpygon 7 days ago

LLMs follow instructions. Garbage in = garbage out, generally. When attention is managed, the problem is well defined, and the necessary materials are available to it, they can perform rather well. On the other hand, I find a lot of the loosey-goosey vibe-coding approach to be useless; it gives a lot of false impressions about how useful LLMs can be, both too positive and too negative.

GiorgioG 7 days ago

So what you’re saying is you need to be very specific and detailed when writing your specifications for the LLM to spit out the code you want. Sounds like I can just skip the middle man and code it myself.

AndrewKemendo 7 days ago

Not in 10 seconds

Zamaamiro 7 days ago

You probably didn’t write up a detailed prompt with perfect specifications in 10 seconds, either.

In my experience, it doesn’t matter how good or detailed the prompt is—after enough lines of code, the LLM starts making design decisions for you.

This is why I don’t accept LLM completions for anything that isn’t short enough to quickly verify that it is implemented exactly as I would have myself. Usually, that’s boilerplate code.

abalashov 7 days ago

> This is why I don’t accept LLM completions for anything that isn’t short enough to quickly verify that it is implemented exactly as I would have myself. Usually, that’s boilerplate code.

^ This. This is where I've landed as far as the extent of LLM coding assistants for me.

dingnuts 7 days ago

I've seen prompts as long as a school essay, and those didn't take ten seconds either.

darkwater 7 days ago

To some extent those fall into the same category as cheaters who put way more effort into cheating on an exam than it would take to do it properly. Or people paying 10/15 bucks a month for a private Usenet server to download pirated content.

anonzzzies 7 days ago

The advantage of an llm in that case is that you can skip a lot of syntax: a spec with a LOT of typos, or even pseudocode, will still result in a working program. Not so with code. Small logical mistakes (messing up left/right, x/y, etc.) are auto-fixed too, maybe to your frustration if they weren't mistakes, but often they are and you won't notice, as they are indeed just repaired for you.

therealpygon 6 days ago

No, but the better specifications you provide to your “development team”, the more likely you are to get what you expected… like always.

troupo 7 days ago

> LLMs follow instructions.

They don't

> Garbage in = garbage out generally.

Generally, this statement is false

> When attention is managed and a problem is well defined and necessary materials are available to it, they can perform rather well.

Keyword: can.

They can also not perform really well despite all the management and materials.

They can also work really well with loosey-goosey approach.

The reason is that they are non-deterministic systems whose performance is affected more by compute availability than by your unscientific random attempts at reverse engineering their behavior https://dmitriid.com/prompting-llms-is-not-engineering

AndrewKemendo 7 days ago

This seems to be what's happened.

People are expecting perfection from a bad spec.

Isn't that what engineers are (rightfully) always complaining about to BD?

darkwater 7 days ago

Indeed. But that's the price an automated tool has to pay to take a job out of human hands: it has to do better under the same conditions. The same applies to self-driving cars: you don't want an accident rate equal to that of human drivers, you want two or three orders of magnitude better.

gpm 7 days ago

This hasn't been my experience (using the latest Claude and Gemini models). They'll produce poor code even when given a well-defined, easily achievable task with specific instructions. The code will usually more or less work with today's models, but it will do things like call a function to recreate a value that is already stored in a local variable... (and worse issues crop up the more design work you leave to the LLM, even dead-simple design work with really only one good answer)

I've definitely also found that the poor code can sometimes be a nice starting place. One thing it does for me is make me fix the code up until it's actually good, instead of writing the first thing that comes to mind and declaring it good enough (after all, my poorly written first draft is of course perfect). In contrast to the usual view of AI-assisted coding, I think this style of programming for tedious tasks makes me "less productive" (I take longer) but produces better code.

geraneum 7 days ago

> LLMs follow instructions.

Not really, not always. To anyone who’s used the latest LLMs extensively, it’s clear that this is not something you can reliably assume even with the constraints you mentioned.

myvoiceismypass 7 days ago

They should maybe have a verifiable specification for said instructions. Kinda like a programming language maybe!

otabdeveloper4 7 days ago

> LLMs follow instructions.

No they don't, they generate a statistically plausible text response given a sequence of tokens.

AndrewKemendo 7 days ago

Out of curiosity, can you give me an example prompt (or prompts) you've used and been disappointed by?

I see these comments all the time and they don't reflect my experience, so I'm curious what yours has been.

anonzzzies 7 days ago

There are so many examples where all the current top models will just loop forever, even if you literally hand them the code. We know many of them; for instance, in a tailwind react project with some degree of complexity (nested components), if you ask for something to scroll within its own space, it will never figure out min-h-0, even if you tell it. It will just loop forever rewriting the code, adding and removing things, to the point of putting in comments like "This will add overflow" and writing js to force scrolling, and it will never work even if you literally tell it what to do. Don't know why; all the big and small models have this, and I found Gemini is currently the only model that sometimes randomly has the right idea, but then it still cannot resolve it. For this we went back from tailwind to global vanilla css, which, I never thought I would say this, is rather nice.

auggierose 7 days ago

This is probably not so much an indictment of the AI, as of that garbage called Tailwind. As somebody here said before, garbage in, garbage out.

anonzzzies 7 days ago

Yeah, guess so, but we like garbage these days in the industry: nextjs, prisma, npm, react, ts, js, tailwind, babel; the list of inefficient and badly written shite goes on and on. As a commercial person it's impossible to avoid, though, as shadcn is the only thing "the youth" makes apps with now.

Buttons840 7 days ago

I asked ChatGPT 4o to write an Emacs function to highlight a line. This involves setting the "mark" at the beginning and the "point" at the end. It would only set the point, so I corrected it: "no, you have to set both". But even after the correction it would move the point to the beginning and then move the point again to the end, without ever touching the mark.

DevDesmond 7 days ago

From my experience (and to borrow terminology from an HN thread not long ago), I've found that once a chat goes bad, your context is "poisoned": it's autocompleting from previous text that is nonsense, so further text generation from there exists in the world of nonexistent nonsense as well. It's much better to edit your message and try again.

I also think that language matters: an Emacs function is much more esoteric than, say, JavaScript, Python, or Java. If I ever find myself looking for help with something that's not in the standard library, I like to provide extra context, such as examples from the documentation.