Don’t use AI to summarize documents — it’s worse than humans in every way

David Gerard@awful.systems · 2 months ago

swlabr@awful.systems · 2 months ago

LLMs, and everyone who uses them to process information:

swlabr@awful.systems · 2 months ago

ATTN: If you’re coming into this thread to say, “The output of AI is bad because your prompts suck,” I’m just proud that you managed to figure out how to use the internet at all. Good job, you!

froztbyte@awful.systems · 2 months ago

remember remember, eternal september

(not that I much agree with the classist overtones of the original, but fuck me does it come to mind often)

Lvxferre@mander.xyz · edit-2 2 months ago

You could use them to know what the text is about, and if it’s worth your reading time. In this situation, it’s fine if the AI makes shit up, as you aren’t reading its output for the information itself anyway; and the distinction between summary and shortened version becomes moot.

However, here’s the catch. If the text is long enough to warrant the question “should I spend my time reading this?”, it should contain an introduction for that very purpose. In other words, if the text is well-written, you don’t need this sort of “Gemini/ChatGPT, tell me what this text is about” in the first place.

EDIT: I’m not addressing documents in this. My bad, I know. [In my defence I’m reading shit on a screen the size of an ant.]

queermunist she/her@lemmy.ml · edit-2 2 months ago

ChatGPT gives you a bad summary full of hallucinations and, as a result, you choose not to read the text based on that summary.

Lvxferre@mander.xyz · 2 months ago

(For clarity I’ll re-emphasise that my top comment is the result of misreading the word “documents” out, so I’m speaking on general grounds about AI “summaries”, not just about AI “summaries” of documents.)

The key here is that the LLM is likely to hallucinate the claims of the text being shortened, but not the topic. So provided that you care about the latter but not the former, in order to decide if you’re going to read the whole thing, it’s good enough.

And that is useful in a few situations. For example, if you have a metaphorical pile of a hundred or so scientific papers, and you only need the ones about a specific topic (like “Indo-European urheimat” or “Argiope spiders” or “banana bonds”).

That backtracks to the OP. The issue with using AI summaries for documents is that you typically know the topic at hand, and you want the content instead. That’s bad because then the hallucinations won’t be “harmless”.

queermunist she/her@lemmy.ml · 2 months ago

But the claims of the text are often why you read it in the first place! If you have a hundred scientific papers you’re going to read the ones that make claims either supporting or contradicting your research.

You might as well just skim the titles and guess.

Lvxferre@mander.xyz · 2 months ago

But the claims of the text are often why you read it in the first place!

By “not caring about the former” [claims], I mean in the LLM output, because you know that the LLM will fuck them up. But it’ll still somewhat accurately represent the topic of the text, and you can use this to your advantage.

You might as well just skim the titles and guess.

Nirvana fallacy.

queermunist she/her@lemmy.ml · 2 months ago

Unless it doesn’t accurately represent the topic, which happens, and then a researcher chooses not to read the text based on the chatbot’s summary.

Nirvana fallacy.

All these chatbots do is guess. I’m just saying a researcher might as well cut out the hallucinating middleman.

self@awful.systems · 2 months ago

not reading the fucking sidebar and thinking this is high school debate club fallacy

Lvxferre@mander.xyz · 2 months ago

not reading the fucking sidebar

Yeah, I get that this is a place to vent. And I get why people vent about this. LLMs and other A"I" systems (with quotation marks because this shite is not intelligent!) are being shoved down every bloody where, regardless of actual usefulness, safety, or user desire. Telling you to put glue on your pizza, to eat poisonous mushrooms, that “cherish” has five letters, that Latin had no [w], that the Chinese are inferior to Westerners.

While a crowd of irrationals tell you “it is intelligent, you can’t prove otherwise! TRUST IT YOU DIRTY SCEPTIC/INFIDEL/LUDDITE REEEE! LALALA I’M PRETENDING TO NOT SEE THE HALLUCINATION LALALA”.

I also get the privacy nightmare that this shit is. And the whole deal behind “we’re using your content as training data, and then selling the result back to you”. Or that it’s eating electricity like there’s no tomorrow, on a planet where global warming is a present issue.

I get it. I get it all. That’s why I’m here. And if you (or anyone else) think that I’m here for any other reason, by all means, check my profile - you’ll find plenty of pieces of criticism against those stupid corporate AI takes from vulture capital. (And plenty of instances of me calling HN “Redditors LARPing as Hax0rz”.)

However. Pretending that there’s no use case ever for LLMs is the wrong way to go.

and thinking this is high school debate club fallacy

If calling it “nirvana fallacy” rubs you the wrong way, here’s an alternative: “this argument is fucking stupid, in a very specific way: it pretends that either something is perfect or it’s useless, with no middle ground.”

The other user, however, does not deserve the unnecessary abrasiveness, so I’ll keep simply calling it “nirvana fallacy”.

self@awful.systems · 2 months ago

holy shit, imagine getting a second chance to not be a fucking debatelord and doubling down this hard

off you fuck

Home Taping Is Killing Music@mastodon.me.uk · 2 months ago

@lvxferre @dgerard have you bumped your head?

Lvxferre@mander.xyz · 2 months ago

No, it’s just rambling. My bad.

I focused too much on using AI to summarise and ended up not talking about it summarising documents, even if the text is about the latter.

And… well, the latter is such a dumb idea that I don’t feel like telling people “the text is right, don’t do that”, it’s obvious.

David Gerard@awful.systems · 2 months ago

You’d think so, but guess what precise use case LLMs are being pushed hard for.

RagnarokOnline@programming.dev · 2 months ago

I had GPT 3.5 break down 6x 45-minute verbatim interviews into bulleted summaries and it did great. I even asked it to anonymize people’s names and it did that too. I did re-read the summaries to make sure no duplicate info or hallucinations existed and it only needed a couple of corrections.

Beats manually summarizing that info myself.

Maybe their prompt sucks?

Home Taping Is Killing Music@mastodon.me.uk · 2 months ago

@RagnarokOnline @dgerard “They failed to say the magic spells correctly”

froztbyte@awful.systems · 2 months ago

“Are you sure you’re holding it correctly?”

christ, every damn time

Jakeroxs@sh.itjust.works · 2 months ago

That is how tools tend to work, yes.

David Gerard@awful.systems · 2 months ago

we find they tend to post here, though not for long

froztbyte@awful.systems · 2 months ago

it makes me feel fucking ancient to find that this dipshit didn’t seem to get the remark, and it wasn’t even that long ago

istewart@awful.systems · 2 months ago

Jobs is Tech Jesus, but Antennagate is only recorded in one of the apocryphal books

V0ldek@awful.systems · 2 months ago

Said like a person who wouldn’t be able to correctly hold a hammer on first try

Steve@awful.systems · 2 months ago

“tools” doesn’t mean “good”

good tools are designed well enough so it’s clear how they are used, held, or what-fucking-ever.

fuck these simpleton takes are a pain in the arse. They’re always pushed by these idiots that have based their whole world view on fortune cookie aphorisms

David Gerard@awful.systems · 2 months ago

I got AcausalRobotGPT to summarise your post and it said “I’m not saying it’s always programming.dev, but”

TexasDrunk@lemmy.world · 2 months ago

I also use it for that pretty often. I always double check and usually it’s pretty good. Once in a great while it turns the summary into a complete shitshow but I always catch it on a reread, ask a second time, and it fixes things up. My biggest problem is that I’m dragged into too many useless meetings every week and this saves a ton of time over rereading entire transcripts and doing a poor job of summarizing because I have real work to get back to.

I also use it as a rubber duck. It works pretty well if you tell it what it’s doing and tell it to ask questions.

YourNetworkIsHaunted@awful.systems · 2 months ago

Isn’t the whole point of rubber duck debugging that the method works when talking to a literal rubber duck?

self@awful.systems · 2 months ago

what if your rubber duck released just an entire fuckton of CO2 into the environment constantly, even when you weren’t talking to it? surely that means it’s better

z00s@lemmy.world · edit-2 2 months ago

The problem is not the LLMs, but what people are trying to do with them.

They are currently spoons, but people are desperately wishing they were katanas.

They work really well for soup, but they can’t cut steak. But they’re being hyped as super ninja steak knives, and people are getting pissed when they can’t cut steak.

If you give them watery, soupy tasks they can do successfully, they can lighten your workload, as long as you’re aware of what they are and aren’t good at.

What people want LLMs to be able to do, i.e. “Steak” tasks:

write complex documents
apply complex knowledge/rules to a situation
write complex code and create entire programs based on a vague description

What LLMs can currently do, i.e. “Soup” tasks:

check this document and fix all spelling, punctuation and grammatical errors
summarise this paragraph as dot points
write a python program that sorts my photographs into folders based on the year they were taken

Half of Lemmy is hyping katanas, the other half is yelling “Why won’t my spoon cut this steak?!! AI is so dumb!!!”

Update: wow, the pure vitriol pouring out of the replies is just stunning. Seems there are a lot of you out there who have, in one way or another, tied your ego very strongly to either the success or failure of AI.

Take a step back, friends, and go outside for a while.

blakestacey@awful.systems · 2 months ago

I’d offer congratulations on obfuscating a bad claim with a poor analogy, but you didn’t even do that very well.

David Gerard@awful.systems · 2 months ago

more of a Trabant analogy than a Corvette analogy

sc_griffith@awful.systems · 2 months ago

“spoons and katanas” has got to be the most baby brained analogy. are you a child

fuzzzerd@programming.dev · 2 months ago

Who cares? It paints the correct picture and adds useful context.

froztbyte@awful.systems · 2 months ago

you do realize steaks arriving purple or green are bad things, right

self@awful.systems · 2 months ago

they don’t do any of that soup shit reliably either and reading the article might have told you that

z00s@lemmy.world · 2 months ago

They absolutely do, and I have no idea why you’re so angry

self@awful.systems · 2 months ago

hahaha ok fuck off now

FredFig@awful.systems · edit-2 2 months ago

Food analogy

This level of discourse wouldn’t fly on 4chan, how is it so popular with LLM fans?

David Gerard@awful.systems · 2 months ago

needs to be a car analogy

What people want LLMs to do, i.e. Corvette tasks
What LLMs actually do, i.e. Trabant tasks

lightnsfw@reddthat.com · 2 months ago

Ok? I don’t have another human available to skim a shitload of documents for me to find answers I need and I don’t have time to do it myself. AI is my best option.

s3p5r@lemm.ee · 2 months ago

So long as you don’t care about whether they’re the right or relevant answers, you do you, I guess. Did you use AI to read the linked post too?

jaemo@sh.itjust.works · 2 months ago

Yep. Go ahead and ignore all the cases where it’s getting answers correct and actually helping. We’re all just hallucinating, it’s in no way my lived experience. Your reality is the prime reality and we’re the NPCs.

fruitdealer@lemmy.world · 2 months ago

And I wish only my good grades counted in school too.

V0ldek@awful.systems · edit-2 2 months ago

Go ahead and ignore all the cases where it’s getting answers correct

Sir, half of the patients are dead!

Ye sure, just ignore the half that survived then!

YourNetworkIsHaunted@awful.systems · 1 month ago

Only it’s even worse because without redoing all the work yourself you can’t even tell which ones are dead or alive.

David Gerard@awful.systems · 2 months ago

sir has failed to achieve the reading comprehension level for this sub

lightnsfw@reddthat.com · 2 months ago

I didn’t read the post at all because its premise is irrelevant to my situation. If I had another human to read documentation for me I would do that. I don’t so the next best thing is AI. I have to double check its findings but it gets me 95% of the way there and saves hours of work. It’s a useful tool.

V0ldek@awful.systems · 2 months ago

This is hall of fame shit right here, someone should study the way you use the internet sir

ebu@awful.systems · 2 months ago

I didn’t read the post at all

rather refreshing to have someone come out and just say it. thank you for the chuckle

self@awful.systems · 2 months ago

we really do need “my source is that I made it the fuck up” for people who aggressively don’t want to read any of the text they’re allegedly commenting on

sc_griffith@awful.systems · 2 months ago

absolutely superb posting, thank you

David Gerard@awful.systems · 2 months ago

everyone, we have a new worst poster