
Conversation

@hanouticelina (Contributor)

Add Evaluation Results module to support the Hub's new decentralized evaluation results system: https://huggingface.co/docs/hub/eval-results

This PR introduces:

  • EvalResultEntry dataclass representing evaluation scores stored in .eval_results/*.yaml files.
  • eval_result_entries_to_yaml() to serialize entries to the YAML format.
  • parse_eval_result_entries() to parse YAML data back into EvalResultEntry objects.

This lives in a new module, separate from the existing repocard_data.py, which handles the (legacy?) model-index format in README metadata. Backward compatibility is maintained for now.
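
For context, a minimal usage sketch of the three pieces listed above. Only the names come from this description; the import path, exact signatures, and the sample values are assumptions, not taken from the diff.

# Hypothetical usage sketch -- names from the PR description, import path and
# signatures assumed.
from huggingface_hub.eval_results import (  # assumed module path
    EvalResultEntry,
    eval_result_entries_to_yaml,
    parse_eval_result_entries,
)

# Field names follow the parsing code shown later in this PR.
entry = EvalResultEntry(
    dataset_id="openai/gsm8k",     # dataset the score was measured on
    value=0.87,                    # the metric value
    task_id="text-generation",     # optional task identifier
)

# Serialize entries to the YAML stored under .eval_results/*.yaml ...
yaml_str = eval_result_entries_to_yaml([entry])

# ... and parse that YAML back into EvalResultEntry objects.
entries = parse_eval_result_entries(yaml_str)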

@bot-ci-comment

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin (Contributor) left a comment

Made a first pass and the parsing logic looks good to me 👍

I would also add a simple HfApi.get_eval_results method that takes a repo as input (plus token and revision) and returns a list of eval result entries taken from the README and the .eval_results folder. A bit similar to get_safetensors_metadata, which parses high-level info from a repo. I don't think we need a method to upload eval results though.

(I know it's only a draft, happy to review again when ready^^)
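
Purely as an illustration of the suggestion above, a rough standalone sketch of what such a helper could look like. Only the name get_eval_results comes from the comment; the signature, return type, and the use of parse_eval_result_entries on downloaded YAML files are assumptions (and a later comment in this thread questions whether a client-side method is needed at all).

# Illustrative sketch only -- not the PR's implementation.
from typing import List, Optional

from huggingface_hub import HfApi, hf_hub_download
# Assumed import path for the new module introduced in this PR.
from huggingface_hub.eval_results import EvalResultEntry, parse_eval_result_entries

def get_eval_results(
    api: HfApi,
    repo_id: str,
    *,
    revision: Optional[str] = None,
    token: Optional[str] = None,
) -> List[EvalResultEntry]:
    entries: List[EvalResultEntry] = []
    # Fetch every YAML file under .eval_results/ at the requested revision.
    for path in api.list_repo_files(repo_id, revision=revision, token=token):
        if path.startswith(".eval_results/") and path.endswith((".yaml", ".yml")):
            local_path = hf_hub_download(repo_id, path, revision=revision, token=token)
            with open(local_path) as f:
                # Assumes parse_eval_result_entries accepts a YAML string.
                entries.extend(parse_eval_result_entries(f.read()))
    # The suggestion also covers merging legacy model-index entries parsed
    # from the README metadata; that part is omitted here.
    return entries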

@hanouticelina marked this pull request as ready for review on December 19, 2025 14:53
Comment on lines 196 to 215
    entry = EvalResultEntry(
        dataset_id=dataset["id"],
        value=item["value"],
        task_id=dataset.get("task_id"),
        dataset_revision=dataset.get("revision"),
        verify_token=item.get("verifyToken"),
        date=item.get("date"),
        source_url=source.get("url") if source else None,
        source_name=source.get("name") if source else None,
        source_user=source.get("user") if source else None,
    )
    entries.append(entry)
else:
    # https://github.com/huggingface/hub-docs/blob/434609e6d09f7c1203ea59fcc32c7ff4d308a68e/modelcard.md?plain=1#L23 format
    source = item.get("source", {})
    for metric in item.get("metrics", []):
        entry = EvalResultEntry(
            dataset_id=dataset["type"],
            value=metric["value"],
            task_id=dataset.get("config"),
Member

maybe those should be two different types (legacy and new)? no strong opinion though

Member

or just keep using EvalResult (the previous type)

on the Hub at least we have two different types
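
For context on the two formats discussed here, the rough shape of one item in each, written as Python dicts. The key names are taken from the parsing code above (and from the model-index spec for the legacy case), but the exact nesting is inferred and the values are made up.

# New .eval_results/*.yaml item (shape inferred from the keys read above).
new_format_item = {
    "dataset": {"id": "openai/gsm8k", "task_id": "text-generation", "revision": "abc123"},
    "value": 0.87,
    "verifyToken": None,
    "date": "2025-12-01",
    "source": {"url": "https://example.com/run/1", "name": "my-eval-harness", "user": "someuser"},
}

# Legacy model-index item from README metadata: dataset identified by
# `type`/`config`, one or more metrics per result.
legacy_format_item = {
    "dataset": {"type": "openai/gsm8k", "config": "main"},
    "metrics": [{"type": "accuracy", "value": 0.87}],
    "source": {"url": "https://example.com/run/1"},
}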

)

@validate_hf_hub_args
def get_eval_results(
Member

i'm not sure we need this, because this will be exposed by the Hub API

@hanouticelina (Contributor, Author)

ok nice, we will have to update ModelInfo (and model_info) then

@hanouticelina marked this pull request as draft on January 6, 2026 16:37
@hanouticelina (Contributor, Author) left a comment

Added an eval_results property to ModelInfo and updated the expand docstring in model_info and list_models.
We need the server-side PR (private) to be merged first.
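
A hedged sketch of how this might be consumed once the server-side change ships. The eval_results property on ModelInfo comes from the comment above; the expand key name ("evalResults") and the assumption that entries expose EvalResultEntry-style fields are guesses, not confirmed in this thread.

# Hypothetical usage -- the expand key name is a guess.
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("some-user/some-model", expand=["evalResults"])
for entry in info.eval_results or []:
    print(entry.dataset_id, entry.value)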
