GLAM Document Extraction — Leaderboard (POC)

📊Leaderboard GLAM extractionPOC · silver Task: index cards + forms · score = typed-KIE F1 ×100

Parameters Size

#	Model	Size	Score
🥇	datalab-to/lift	9B	88.5
🥈	numind/NuExtract3	4B	84.5
🥉	MiniMaxAI/MiniMax-M3	MoE	82.4
4	meta-llama/Llama-4-Scout-17B-16E	109B	81.9
5	Qwen/Qwen2.5-VL-72B	72B	71.3
6	Qwen/Qwen3-VL-235B-A22B	235B	65.0
7	Qwen/Qwen3-VL-8B	8B	64.7
8	google/gemma-3-4b-it	4B	45.8

There is no best model. The top two (● green = purpose-built schema-native extractors, lift 9B and NuExtract 4B) beat much larger general VLMs on these cards. Size decouples from quality — an 8B matches a 235B. And every model here is open-weight and self-hostable, so an institution can run extraction on its own hardware and sensitive or licensed collections never leave the building. Use the size filter to see.

Illustrative POC — 7 typed items, silver labels (NuExtract-3 + one reviewer agent; not human-verified, so NuExtract is advantaged). Models run via schema-native Spaces, the HF Inference Providers router, and HF jobs-serving. Not a ranking — a demonstration of the shape. github.com/davanstrien/glam-extract-bench