1-dan master of the unyielding fist of Bayesian inference
5944 stories
·
1 follower

[R] Continuous Thought Machines: neural dynamics as representation.

1 Share
[R] Continuous Thought Machines: neural dynamics as representation.

Try our interactive maze-solving demo: https://pub.sakana.ai/ctm/

Continuous Thought Machines

Hey r/MachineLearning!

We're excited to share our new research on Continuous Thought Machines (CTMs), a novel approach aiming to bridge the gap between computational efficiency and biological plausibility in artificial intelligence. We're sharing this work openly with the community and would love to hear your thoughts and feedback!

What are Continuous Thought Machines?

Most deep learning architectures simplify neural activity by abstracting away temporal dynamics. In our paper, we challenge that paradigm by reintroducing neural timing as a foundational element. The Continuous Thought Machine (CTM) is a model designed to leverage neural dynamics as its core representation.

Core Innovations:

The CTM has two main innovations:

  1. Neuron-Level Temporal Processing: Each neuron uses unique weight parameters to process a history of incoming signals. This moves beyond static activation functions to cultivate richer neuron dynamics.
  2. Neural Synchronization as a Latent Representation: The CTM employs neural synchronization as a direct latent representation for observing data (e.g., through attention) and making predictions. This is a fundamentally new type of representation distinct from traditional activation vectors.

Why is this exciting?

Our research demonstrates that this approach allows the CTM to:

  • Perform a diverse range of challenging tasks: Including image classification, solving 2D mazes, sorting, parity computation, question-answering, and RL tasks.
  • Exhibit rich internal representations: Offering a natural avenue for interpretation due to its internal process.
  • Perform tasks requirin sequential reasoning.
  • Leverage adaptive compute: The CTM can stop earlier for simpler tasks or continue computing for more challenging instances, without needing additional complex loss functions.
  • Build internal maps: For example, when solving 2D mazes, the CTM can attend to specific input data without positional embeddings by forming rich internal maps.
  • Store and retrieve memories: It learns to synchronize neural dynamics to store and retrieve memories beyond its immediate activation history.
  • Achieve strong calibration: For instance, in classification tasks, the CTM showed surprisingly strong calibration, a feature that wasn't explicitly designed for.

Our Goal:

It is crucial to note that our approach advocates for borrowing concepts from biology rather than insisting on strict, literal plausibility. We took inspiration from a critical aspect of biological intelligence: that thought takes time.

The aim of this work is to share the CTM and its associated innovations, rather than solely pushing for new state-of-the-art results. We believe the CTM represents a significant step toward developing more biologically plausible and powerful artificial intelligence systems. We are committed to continuing work on the CTM, given the potential avenues of future work we think it enables.

We encourage you to check out the paper, interactive demos on our project page, and the open-source code repository. We're keen to see what the community builds with it and to discuss the potential of neural dynamics in AI!

submitted by /u/Gramious to r/MachineLearning
[link] [comments]
Read the whole story
clumma
19 hours ago
reply
Berkeley, CA
Share this story
Delete

The Fastest Way Yet to Color Graphs

2 Shares

Here’s a scary scenario: You’ve been put in charge of air traffic control at Newark airport near New York. You need to make sure every plane can taxi between the runway and its gate without hitting any other planes. Let’s bring the power of mathematics to bear on your problem. First, create a big, abstract map of your airport. For each runway, taxiway and gate, mark a point. Then…

Source



Read the whole story
clumma
19 hours ago
reply
Berkeley, CA
Share this story
Delete

Absolute Zero Reasoner

1 Share

Article URL: https://andrewzh112.github.io/absolute-zero-reasoner/

Comments URL: https://news.ycombinator.com/item?id=43922341

Points: 69

# Comments: 14

Read the whole story
clumma
20 hours ago
reply
Berkeley, CA
Share this story
Delete

First concrete for US advanced reactor

1 Share
Read the whole story
clumma
4 days ago
reply
Berkeley, CA
Share this story
Delete

[P] Introducing the Intelligent Document Processing (IDP) Leaderboard – A Unified Benchmark for OCR, KIE, VQA, Table Extraction, and More

1 Share

The most comprehensive benchmark to date for evaluating document understanding capabilities of Vision-Language Models (VLMs).

What is it?
A unified evaluation suite covering 6 core IDP tasks across 16 datasets and 9,229 documents:

  • Key Information Extraction (KIE)
  • Visual Question Answering (VQA)
  • Optical Character Recognition (OCR)
  • Document Classification
  • Table Extraction
  • Long Document Processing (LongDocBench)
  • (Coming soon: Confidence Score Calibration)

Each task uses multiple datasets, including real-world, synthetic, and newly annotated ones.

Highlights from the Benchmark

  • Gemini 2.5 Flash leads overall, but surprisingly underperforms its predecessor on OCR and classification.
  • All models struggled with long document understanding – top score was just 69.08%.
  • Table extraction remains a bottleneck — especially for long, sparse, or unstructured tables.
  • Surprisingly, GPT-4o's performance decreased in the latest version (gpt-4o-2024-11-20) compared to its earlier release (gpt-4o-2024-08-06).
  • Token usage (and thus cost) varies dramatically across models — GPT-4o-mini was the most expensive per request due to high token usage.

Why does this matter?
There’s currently no unified benchmark that evaluates all IDP tasks together — most leaderboards (e.g., OpenVLM, Chatbot Arena) don’t deeply assess document understanding.

Document Variety
We evaluated models on a wide range of documents: Invoices, forms, receipts, charts, tables (structured + unstructured), handwritten docs, and even diacritics texts.

Get Involved
We’re actively updating the benchmark with new models and datasets.

This is developed with collaboration from IIT Indore and Nanonets.

Leaderboard: https://idp-leaderboard.org/
Release blog: https://idp-leaderboard.org/details/
GithHub: https://github.com/NanoNets/docext/tree/main/docext/benchmark

Feel free to share your feedback!

submitted by /u/SouvikMandal to r/MachineLearning
[link] [comments]
Read the whole story
clumma
4 days ago
reply
Berkeley, CA
Share this story
Delete

WeightWatchers files bankruptcy

1 Share

Article URL: https://www.wsj.com/articles/weightwatchers-files-bankruptcy-to-adapt-to-chemically-induced-weight-loss-future-a63aa8ac

Comments URL: https://news.ycombinator.com/item?id=43916411

Points: 60

# Comments: 216

Read the whole story
clumma
5 days ago
reply
Berkeley, CA
Share this story
Delete
Next Page of Stories