1-dan master of the unyielding fist of Bayesian inference

First concrete for US advanced reactor

clumma
10 hours ago
Berkeley, CA

[P] Introducing the Intelligent Document Processing (IDP) Leaderboard – A Unified Benchmark for OCR, KIE, VQA, Table Extraction, and More

The most comprehensive benchmark to date for evaluating document understanding capabilities of Vision-Language Models (VLMs).

What is it?
A unified evaluation suite covering 6 core IDP tasks across 16 datasets and 9,229 documents:

  • Key Information Extraction (KIE)
  • Visual Question Answering (VQA)
  • Optical Character Recognition (OCR)
  • Document Classification
  • Table Extraction
  • Long Document Processing (LongDocBench)
  • (Coming soon: Confidence Score Calibration)

Each task uses multiple datasets, including real-world, synthetic, and newly annotated ones.

Highlights from the Benchmark

  • Gemini 2.5 Flash leads overall, but surprisingly underperforms its predecessor on OCR and classification.
  • All models struggled with long document understanding – top score was just 69.08%.
  • Table extraction remains a bottleneck — especially for long, sparse, or unstructured tables.
  • Surprisingly, GPT-4o's performance decreased in the latest version (gpt-4o-2024-11-20) compared to its earlier release (gpt-4o-2024-08-06).
  • Token usage (and thus cost) varies dramatically across models — GPT-4o-mini was the most expensive per request due to high token usage.
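The last highlight can look counterintuitive, since a "mini" model usually has a lower per-token price. A minimal sketch of the arithmetic follows. Every number in it is a hypothetical placeholder (not a real price and not a measurement from the benchmark), and the names `Pricing` and `cost_per_request` are invented for illustration. It only shows that a model with a low per-token price can still cost more per request if it consumes far more tokens for the same document.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical prices in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def cost_per_request(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request: tokens times price, scaled from per-million pricing."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Hypothetical placeholder prices: the "small" model is 10x cheaper per token.
small_model = Pricing(input_per_million=0.20, output_per_million=0.80)
large_model = Pricing(input_per_million=2.00, output_per_million=8.00)

# Hypothetical placeholder token counts for the same document image:
# the small model is assumed to encode the image into far more input tokens.
small_cost = cost_per_request(input_tokens=40_000, output_tokens=500, pricing=small_model)
large_cost = cost_per_request(input_tokens=1_500, output_tokens=500, pricing=large_model)

print(f"small model: ${small_cost:.4f} per request")
print(f"large model: ${large_cost:.4f} per request")
print("small model costs more per request:", small_cost > large_cost)
```

To compare real models, substitute the provider's published prices and the token counts reported in the API responses.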

Why does this matter?
There’s currently no unified benchmark that evaluates all IDP tasks together — most leaderboards (e.g., OpenVLM, Chatbot Arena) don’t deeply assess document understanding.

Document Variety
We evaluated models on a wide range of documents: invoices, forms, receipts, charts, tables (structured and unstructured), handwritten documents, and even text with diacritics.

Get Involved
We’re actively updating the benchmark with new models and datasets.

This is a collaboration between IIT Indore and Nanonets.

Leaderboard: https://idp-leaderboard.org/
Release blog: https://idp-leaderboard.org/details/
GitHub: https://github.com/NanoNets/docext/tree/main/docext/benchmark

Feel free to share your feedback!

submitted by /u/SouvikMandal to r/MachineLearning
clumma
20 hours ago
Berkeley, CA

WeightWatchers files bankruptcy

Article URL: https://www.wsj.com/articles/weightwatchers-files-bankruptcy-to-adapt-to-chemically-induced-weight-loss-future-a63aa8ac

Comments URL: https://news.ycombinator.com/item?id=43916411

Points: 60

# Comments: 216

clumma
1 day ago
Berkeley, CA

Buffett to step down following six-decade run atop Berkshire

Article URL: https://www.bloomberg.com/news/articles/2025-05-03/warren-buffett-to-step-down-from-berkshire-hathaway-at-year-end

Comments URL: https://news.ycombinator.com/item?id=43880973

Points: 341

# Comments: 265

clumma
4 days ago
Berkeley, CA

Starbase votes in favor of incorporation. 173 ballots were in favor, 4 were against.

Submitted by /u/avboden to r/SpaceXLounge
clumma
5 days ago
Berkeley, CA

Waymo and Toyota outline partnership to advance autonomous driving deployment

Article URL: https://waymo.com/blog/2025/04/waymo-and-toyota-outline-strategic-partnership

Comments URL: https://news.ycombinator.com/item?id=43839123

Points: 381

# Comments: 356

clumma
8 days ago
Berkeley, CA