1-dan master of the unyielding fist of Bayesian inference
6228 stories · 1 follower

Claude’s Cycles - Don Knuth

submitted by /u/mttd to r/compsci
clumma
3 hours ago
Berkeley, CA

BullshitBench v2 dropped and… most models still can’t smell BS (Claude mostly can)


This benchmark is kind of brutal.

BullshitBench v2 just came out, and unlike most evals where every new release “wins,” this one says a lot of models are basically not improving at detecting confident nonsense.

What changed in v2:

- 100 new questions

- domain split: coding (40), medical (15), legal (15), finance (15), physics (15)

- 70+ model variants tested

- fully open: questions, scripts, responses, judgments

Main takeaways:

- Anthropic’s latest models are crushing it

- Qwen is also very strong

- OpenAI + Google models reportedly still struggling here

- domain barely changes outcomes (BS detection is similarly hard across fields)

- reasoning mode doesn’t help much, maybe even hurts

- newer model ≠ better model on this task

The data explorer is honestly the best part: you can inspect results question by question and see where models confidently hallucinate.

Links:

- https://petergpt.github.io/bullshit-benchmark/viewer/index.v2.html

- https://github.com/petergpt/bullshit-benchmark

Curious what people think this is actually measuring: calibration? epistemic humility? something else?

Because whatever it is, most models still look shaky.

submitted by /u/snakemas to r/CompetitiveAI
clumma
3 hours ago
Berkeley, CA

France to increase nuclear arsenal, stop sharing warhead numbers, and potentially deploy weapons across Europe


In a speech at the SSBN base in Ile Longue, French President Macron said that, due to "an increasing risk of conflicts globally crossing the nuclear threshold", France would increase its nuclear arsenal and would "no longer communicate the number of nuclear warheads."

France also plans to potentially deploy French nuclear forces in other countries and has invited Germany, Greece, Poland, the Netherlands, Belgium, and Denmark to participate in nuclear drills. The US already deploys weapons across several European countries under a so-called nuclear umbrella.

France currently has an estimated 290 warheads, the UK ~225, while the US and Russia both have well over 5,000.

https://www.reuters.com/world/europe/macron-says-france-will-increase-size-its-nuclear-arsenal-2026-03-02/

https://www.wsj.com/world/europe/france-floats-nuclear-deployment-across-europe-056a5cbc

submitted by /u/Afrogthatribbits to r/nuclearweapons
clumma
3 hours ago
Berkeley, CA

OpenClaw Surpasses React to Become the Most-Starred Software Project on GitHub


Article URL: https://www.star-history.com/blog/openclaw-surpasses-react-most-starred-software

Comments URL: https://news.ycombinator.com/item?id=47217812

Points: 192

# Comments: 192

clumma
1 day ago
Berkeley, CA

Iran's Ayatollah Ali Khamenei is killed in Israeli strike, ending 36-year rule


Article URL: https://www.npr.org/2026/02/28/1123499337/iran-israel-ayatollah-ali-khamenei-killed

Comments URL: https://news.ycombinator.com/item?id=47200879

Points: 192

# Comments: 229

clumma
3 days ago
Berkeley, CA

OpenAI raises $110B on $730B pre-money valuation


https://openai.com/index/scaling-ai-for-everyone/

https://x.com/sama/status/2027386252555919386

https://xcancel.com/sama/status/2027386252555919386


Comments URL: https://news.ycombinator.com/item?id=47181211

Points: 553

# Comments: 578

clumma
3 days ago
Berkeley, CA