clumma's blurblog

BullshitBench v2 just came out, and unlike most evals where every new release “wins,” this one says a lot of models are basically not improving at detecting confident nonsense.

What changed in v2:

- 100 new questions

- domain split: coding (40), medical (15), legal (15), finance (15), physics (15)

- 70+ model variants tested

- fully open: questions, scripts, responses, judgments

Main takeaways:

- Anthropic’s latest models are crushing it

- Qwen is also very strong

- OpenAI + Google models reportedly still struggling here

- domain barely changes outcomes (BS detection is similarly hard across fields)

- reasoning mode doesn’t help much, maybe even hurts

- newer model ≠ better model on this task

The data explorer is honestly the best part: you can inspect results question by question and see where models confidently hallucinate.

Links:

- https://petergpt.github.io/bullshit-benchmark/viewer/index.v2.html

- https://github.com/petergpt/bullshit-benchmark

Curious what people think this is actually measuring: calibration? epistemic humility? something else?

Because whatever it is, most models still look shaky.

submitted by /u/snakemas to r/CompetitiveAI

clumma

5 days ago

Berkeley, CA

France to increase nuclear arsenal, stop sharing warhead numbers, and potentially deploy weapons across Europe by /u/Afrogthatribbits
Wednesday March 4th, 2026 at 11:07 AM

upvoted by clumma

In a speech at the Île Longue SSBN base, French President Macron said that, due to "an increasing risk of conflicts globally crossing the nuclear threshold," France would increase its nuclear arsenal and would "no longer communicate the number of nuclear warheads."

France is also considering deploying French nuclear forces in other countries and has invited Germany, Greece, Poland, the Netherlands, Belgium, and Denmark to participate in nuclear drills. The US already deploys nuclear weapons in several European countries under the so-called nuclear umbrella.

France currently has an estimated 290 warheads, the UK ~225, while the US and Russia both have well over 5,000.

https://www.reuters.com/world/europe/macron-says-france-will-increase-size-its-nuclear-arsenal-2026-03-02/

https://www.wsj.com/world/europe/france-floats-nuclear-deployment-across-europe-056a5cbc

submitted by /u/Afrogthatribbits to r/nuclearweapons

clumma

5 days ago

Berkeley, CA

NRC issues first commercial reactor construction approval in 10 years [pdf] by Anon84 Saturday March 7th, 2026 at 7:56 PM

Apple Studio Display and Studio Display XDR by victorbjorklund Saturday March 7th, 2026 at 7:56 PM

AI-generated art can’t be copyrighted after Supreme Court declines review by duggan Saturday March 7th, 2026 at 7:56 PM

Claude’s Cycles - Don Knuth by /u/mttd Wednesday March 4th, 2026 at 11:07 AM

BullshitBench v2 dropped and… most models still can’t smell BS (Claude mostly can) by /u/snakemas Wednesday March 4th, 2026 at 11:07 AM

France to increase nuclear arsenal, stop sharing warhead numbers, and potentially deploy weapons across Europe by /u/Afrogthatribbits Wednesday March 4th, 2026 at 11:07 AM
