
Conversation

@hanouticelina (Contributor)

Add Evaluation Results module to support the Hub's new decentralized evaluation results system: https://huggingface.co/docs/hub/eval-results

This PR introduces:

  • EvalResultEntry dataclass representing evaluation scores stored in .eval_results/*.yaml files.
  • eval_result_entries_to_yaml() to serialize entries to the YAML format.
  • parse_eval_result_entries() to parse YAML data back into EvalResultEntry objects.

This lives in a new module, separate from the existing repocard_data.py, which handles the (legacy?) model-index format in README metadata. Backward compatibility is maintained for now.
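
For context, a minimal usage sketch of the three pieces listed above. Only the names come from this description; the import path, exact signatures, and the sample values are assumptions, not taken from the diff.

# Hypothetical usage sketch -- names from the PR description, import path and
# signatures assumed.
from huggingface_hub.eval_results import (  # assumed module path
    EvalResultEntry,
    eval_result_entries_to_yaml,
    parse_eval_result_entries,
)

# Field names follow the parsing code shown later in this PR.
entry = EvalResultEntry(
    dataset_id="openai/gsm8k",     # dataset the score was measured on
    value=0.87,                    # the metric value
    task_id="text-generation",     # optional task identifier
)

# Serialize entries to the YAML stored under .eval_results/*.yaml ...
yaml_str = eval_result_entries_to_yaml([entry])

# ... and parse that YAML back into EvalResultEntry objects.
entries = parse_eval_result_entries(yaml_str)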

@bot-ci-comment

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin (Contributor) left a comment

Made a first pass and the parsing logic looks good to me 👍

I would also add a simple HfApi.get_eval_results method that takes a repo as input (plus token and revision) and returns a list of eval result entries taken from the README and the .eval_results folder. A bit similar to get_safetensors_metadata, which parses high-level info from a repo. I don't think we need a method to upload eval results though.

(I know it's only a draft, happy to review again when ready^^)
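
Purely as an illustration of the suggestion above, a rough standalone sketch of what such a helper could look like. Only the name get_eval_results comes from the comment; the signature, return type, and the use of parse_eval_result_entries on downloaded YAML files are assumptions (and a later comment in this thread questions whether a client-side method is needed at all).

# Illustrative sketch only -- not the PR's implementation.
from typing import List, Optional

from huggingface_hub import HfApi, hf_hub_download
# Assumed import path for the new module introduced in this PR.
from huggingface_hub.eval_results import EvalResultEntry, parse_eval_result_entries

def get_eval_results(
    api: HfApi,
    repo_id: str,
    *,
    revision: Optional[str] = None,
    token: Optional[str] = None,
) -> List[EvalResultEntry]:
    entries: List[EvalResultEntry] = []
    # Fetch every YAML file under .eval_results/ at the requested revision.
    for path in api.list_repo_files(repo_id, revision=revision, token=token):
        if path.startswith(".eval_results/") and path.endswith((".yaml", ".yml")):
            local_path = hf_hub_download(repo_id, path, revision=revision, token=token)
            with open(local_path) as f:
                # Assumes parse_eval_result_entries accepts a YAML string.
                entries.extend(parse_eval_result_entries(f.read()))
    # The suggestion also covers merging legacy model-index entries parsed
    # from the README metadata; that part is omitted here.
    return entries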

@hanouticelina marked this pull request as ready for review on December 19, 2025 14:53
Comment on lines 196 to 215
    entry = EvalResultEntry(
        dataset_id=dataset["id"],
        value=item["value"],
        task_id=dataset.get("task_id"),
        dataset_revision=dataset.get("revision"),
        verify_token=item.get("verifyToken"),
        date=item.get("date"),
        source_url=source.get("url") if source else None,
        source_name=source.get("name") if source else None,
        source_user=source.get("user") if source else None,
    )
    entries.append(entry)
else:
    # https://github.com/huggingface/hub-docs/blob/434609e6d09f7c1203ea59fcc32c7ff4d308a68e/modelcard.md?plain=1#L23 format
    source = item.get("source", {})
    for metric in item.get("metrics", []):
        entry = EvalResultEntry(
            dataset_id=dataset["type"],
            value=metric["value"],
            task_id=dataset.get("config"),
Member

maybe those should be two different types (legacy and new)? no strong opinion though

Member

or just keep using EvalResult (the previous type)

on the Hub at least we have two different types
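
For context on the two formats discussed here, the rough shape of one item in each, written as Python dicts. The key names are taken from the parsing code above (and from the model-index spec for the legacy case), but the exact nesting is inferred and the values are made up.

# New .eval_results/*.yaml item (shape inferred from the keys read above).
new_format_item = {
    "dataset": {"id": "openai/gsm8k", "task_id": "text-generation", "revision": "abc123"},
    "value": 0.87,
    "verifyToken": None,
    "date": "2025-12-01",
    "source": {"url": "https://example.com/run/1", "name": "my-eval-harness", "user": "someuser"},
}

# Legacy model-index item from README metadata: dataset identified by
# `type`/`config`, one or more metrics per result.
legacy_format_item = {
    "dataset": {"type": "openai/gsm8k", "config": "main"},
    "metrics": [{"type": "accuracy", "value": 0.87}],
    "source": {"url": "https://example.com/run/1"},
}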

)

@validate_hf_hub_args
def get_eval_results(
Member

i'm not sure we need this, because this will be exposed by the Hub API

@hanouticelina (Contributor, Author)

ok nice, we will have to update ModelInfo (and model_info) then

@hanouticelina marked this pull request as draft on January 6, 2026 16:37
@hanouticelina (Contributor, Author) left a comment

Added an eval_results property to ModelInfo and updated the expand docstring in model_info and list_models.
We need the server-side PR (private) to be merged first.
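
A hedged sketch of how this might be consumed once the server-side change ships. The eval_results property on ModelInfo comes from the comment above; the expand key name ("evalResults") and the assumption that entries expose EvalResultEntry-style fields are guesses, not confirmed in this thread.

# Hypothetical usage -- the expand key name is a guess.
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("some-user/some-model", expand=["evalResults"])
for entry in info.eval_results or []:
    print(entry.dataset_id, entry.value)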
