prototype supporting extra fields in llmobs evaluator return #16007
base: main
Performance SLOsComparing candidate christopher.fox/llmobs-evaluator-return-extra-fields (e6e98b3) with baseline main (624ef30) 📈 Performance Regressions (3 suites)📈 iastaspects - 118/118✅ add_aspectTime: ✅ 103.379µs (SLO: <130.000µs 📉 -20.5%) vs baseline: +2.2% Memory: ✅ 42.684MB (SLO: <43.250MB 🟡 -1.3%) vs baseline: +4.8% ✅ add_inplace_aspectTime: ✅ 101.855µs (SLO: <130.000µs 📉 -21.6%) vs baseline: +0.8% Memory: ✅ 42.703MB (SLO: <43.250MB 🟡 -1.3%) vs baseline: +5.0% ✅ add_inplace_noaspectTime: ✅ 28.500µs (SLO: <40.000µs 📉 -28.8%) vs baseline: +0.6% Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.9% ✅ add_noaspectTime: ✅ 48.585µs (SLO: <70.000µs 📉 -30.6%) vs baseline: -0.1% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.7% ✅ bytearray_aspectTime: ✅ 260.052µs (SLO: <400.000µs 📉 -35.0%) vs baseline: +0.8% Memory: ✅ 42.743MB (SLO: <43.500MB 🟡 -1.7%) vs baseline: +5.3% ✅ bytearray_extend_aspectTime: ✅ 650.687µs (SLO: <800.000µs 📉 -18.7%) vs baseline: -0.4% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.8% ✅ bytearray_extend_noaspectTime: ✅ 267.993µs (SLO: <400.000µs 📉 -33.0%) vs baseline: -0.6% Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.1% ✅ bytearray_noaspectTime: ✅ 140.841µs (SLO: <300.000µs 📉 -53.1%) vs baseline: +1.1% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.1% ✅ bytes_aspectTime: ✅ 224.373µs (SLO: <300.000µs 📉 -25.2%) vs baseline: +0.7% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.1% ✅ bytes_noaspectTime: ✅ 134.773µs (SLO: <200.000µs 📉 -32.6%) vs baseline: +0.5% Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +4.9% ✅ bytesio_aspectTime: ✅ 3.849ms (SLO: <5.000ms 📉 -23.0%) vs baseline: +0.4% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.5% ✅ bytesio_noaspectTime: ✅ 322.885µs (SLO: <420.000µs 📉 -23.1%) vs baseline: +0.4% Memory: ✅ 42.703MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.1% ✅ capitalize_aspectTime: ✅ 89.774µs (SLO: <300.000µs 📉 -70.1%) vs baseline: -0.6% 
Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% ✅ capitalize_noaspectTime: ✅ 249.514µs (SLO: <300.000µs 📉 -16.8%) vs baseline: -0.8% Memory: ✅ 42.743MB (SLO: <43.500MB 🟡 -1.7%) vs baseline: +5.2% ✅ casefold_aspectTime: ✅ 93.003µs (SLO: <500.000µs 📉 -81.4%) vs baseline: +3.0% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% ✅ casefold_noaspectTime: ✅ 310.407µs (SLO: <500.000µs 📉 -37.9%) vs baseline: +0.8% Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +4.8% ✅ decode_aspectTime: ✅ 87.591µs (SLO: <100.000µs 📉 -12.4%) vs baseline: ~same Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +4.9% ✅ decode_noaspectTime: ✅ 153.608µs (SLO: <210.000µs 📉 -26.9%) vs baseline: +0.5% Memory: ✅ 42.723MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.1% ✅ encode_aspectTime: ✅ 84.999µs (SLO: <200.000µs 📉 -57.5%) vs baseline: -0.2% Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.8% ✅ encode_noaspectTime: ✅ 138.151µs (SLO: <200.000µs 📉 -30.9%) vs baseline: +0.5% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +4.7% ✅ format_aspectTime: ✅ 14.739ms (SLO: <19.200ms 📉 -23.2%) vs baseline: +0.3% Memory: ✅ 42.880MB (SLO: <43.250MB 🟡 -0.9%) vs baseline: +5.0% ✅ format_map_aspectTime: ✅ 16.477ms (SLO: <21.500ms 📉 -23.4%) vs baseline: ~same Memory: ✅ 42.900MB (SLO: <43.500MB 🟡 -1.4%) vs baseline: +4.9% ✅ format_map_noaspectTime: ✅ 368.539µs (SLO: <500.000µs 📉 -26.3%) vs baseline: +0.6% Memory: ✅ 42.566MB (SLO: <43.250MB 🟡 -1.6%) vs baseline: +4.9% ✅ format_noaspectTime: ✅ 305.213µs (SLO: <500.000µs 📉 -39.0%) vs baseline: -0.4% Memory: ✅ 42.625MB (SLO: <43.250MB 🟡 -1.4%) vs baseline: +5.0% ✅ index_aspectTime: ✅ 124.048µs (SLO: <300.000µs 📉 -58.7%) vs baseline: +0.1% Memory: ✅ 42.684MB (SLO: <43.250MB 🟡 -1.3%) vs baseline: +4.9% ✅ index_noaspectTime: ✅ 40.331µs (SLO: <300.000µs 📉 -86.6%) vs baseline: -0.7% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% ✅ join_aspectTime: ✅ 220.212µs (SLO: <300.000µs 📉 -26.6%) vs baseline: +0.9% 
Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% ✅ join_noaspectTime: ✅ 148.815µs (SLO: <300.000µs 📉 -50.4%) vs baseline: +0.4% Memory: ✅ 42.625MB (SLO: <43.250MB 🟡 -1.4%) vs baseline: +4.9% ✅ ljust_aspectTime: ✅ 501.516µs (SLO: <700.000µs 📉 -28.4%) vs baseline: +0.9% Memory: ✅ 42.644MB (SLO: <43.250MB 🟡 -1.4%) vs baseline: +4.6% ✅ ljust_noaspectTime: ✅ 257.331µs (SLO: <300.000µs 📉 -14.2%) vs baseline: ~same Memory: ✅ 42.723MB (SLO: <43.250MB 🟡 -1.2%) vs baseline: +5.1% ✅ lower_aspectTime: ✅ 305.229µs (SLO: <500.000µs 📉 -39.0%) vs baseline: ~same Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +4.8% ✅ lower_noaspectTime: ✅ 234.944µs (SLO: <300.000µs 📉 -21.7%) vs baseline: ~same Memory: ✅ 42.684MB (SLO: <43.250MB 🟡 -1.3%) vs baseline: +5.1% ✅ lstrip_aspectTime: ✅ 0.336ms (SLO: <3.000ms 📉 -88.8%) vs baseline: 📈 +22.5% Memory: ✅ 42.605MB (SLO: <43.250MB 🟡 -1.5%) vs baseline: +4.8% ✅ lstrip_noaspectTime: ✅ 0.179ms (SLO: <3.000ms 📉 -94.0%) vs baseline: -0.8% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.6% ✅ modulo_aspectTime: ✅ 14.353ms (SLO: <18.750ms 📉 -23.4%) vs baseline: +0.3% Memory: ✅ 42.743MB (SLO: <43.500MB 🟡 -1.7%) vs baseline: +4.7% ✅ modulo_aspect_for_bytearray_bytearrayTime: ✅ 14.885ms (SLO: <19.350ms 📉 -23.1%) vs baseline: +0.3% Memory: ✅ 42.821MB (SLO: <43.500MB 🟡 -1.6%) vs baseline: +4.9% ✅ modulo_aspect_for_bytesTime: ✅ 14.491ms (SLO: <18.900ms 📉 -23.3%) vs baseline: -0.2% Memory: ✅ 42.762MB (SLO: <43.500MB 🟡 -1.7%) vs baseline: +5.1% ✅ modulo_aspect_for_bytes_bytearrayTime: ✅ 14.669ms (SLO: <19.150ms 📉 -23.4%) vs baseline: ~same Memory: ✅ 42.841MB (SLO: <43.500MB 🟡 -1.5%) vs baseline: +4.8% ✅ modulo_noaspectTime: ✅ 0.359ms (SLO: <3.000ms 📉 -88.0%) vs baseline: +0.3% Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.1% ✅ replace_aspectTime: ✅ 19.068ms (SLO: <24.000ms 📉 -20.6%) vs baseline: +2.9% Memory: ✅ 42.900MB (SLO: <44.000MB -2.5%) vs baseline: +5.1% ✅ replace_noaspectTime: ✅ 286.722µs (SLO: <300.000µs 
-4.4%) vs baseline: +0.7% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.9% ✅ repr_aspectTime: ✅ 317.765µs (SLO: <420.000µs 📉 -24.3%) vs baseline: -0.4% Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% ✅ repr_noaspectTime: ✅ 47.105µs (SLO: <90.000µs 📉 -47.7%) vs baseline: +1.0% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +5.0% ✅ rstrip_aspectTime: ✅ 379.581µs (SLO: <500.000µs 📉 -24.1%) vs baseline: -0.2% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.6% ✅ rstrip_noaspectTime: ✅ 182.455µs (SLO: <300.000µs 📉 -39.2%) vs baseline: +0.6% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +5.0% ✅ slice_aspectTime: ✅ 181.575µs (SLO: <300.000µs 📉 -39.5%) vs baseline: ~same Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% ✅ slice_noaspectTime: ✅ 53.984µs (SLO: <90.000µs 📉 -40.0%) vs baseline: ~same Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.8% ✅ stringio_aspectTime: ✅ 3.885ms (SLO: <5.000ms 📉 -22.3%) vs baseline: -0.5% Memory: ✅ 42.743MB (SLO: <43.500MB 🟡 -1.7%) vs baseline: +5.1% ✅ stringio_noaspectTime: ✅ 356.809µs (SLO: <500.000µs 📉 -28.6%) vs baseline: -0.1% Memory: ✅ 42.585MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% ✅ strip_aspectTime: ✅ 271.377µs (SLO: <350.000µs 📉 -22.5%) vs baseline: ~same Memory: ✅ 42.684MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +4.7% ✅ strip_noaspectTime: ✅ 177.595µs (SLO: <240.000µs 📉 -26.0%) vs baseline: -0.2% Memory: ✅ 42.703MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.1% ✅ swapcase_aspectTime: ✅ 342.432µs (SLO: <500.000µs 📉 -31.5%) vs baseline: -1.4% Memory: ✅ 42.802MB (SLO: <43.500MB 🟡 -1.6%) vs baseline: +5.3% ✅ swapcase_noaspectTime: ✅ 276.117µs (SLO: <400.000µs 📉 -31.0%) vs baseline: +1.2% Memory: ✅ 42.625MB (SLO: <43.500MB -2.0%) vs baseline: +4.8% ✅ title_aspectTime: ✅ 330.272µs (SLO: <500.000µs 📉 -33.9%) vs baseline: -0.3% Memory: ✅ 42.684MB (SLO: <43.000MB 🟡 -0.7%) vs baseline: +4.9% ✅ title_noaspectTime: ✅ 258.337µs (SLO: <400.000µs 📉 -35.4%) vs baseline: 
+0.1% Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.8% ✅ translate_aspectTime: ✅ 549.282µs (SLO: <700.000µs 📉 -21.5%) vs baseline: +9.4% Memory: ✅ 42.664MB (SLO: <43.500MB 🟡 -1.9%) vs baseline: +5.2% ✅ translate_noaspectTime: ✅ 423.499µs (SLO: <500.000µs 📉 -15.3%) vs baseline: -2.2% Memory: ✅ 42.644MB (SLO: <43.500MB 🟡 -2.0%) vs baseline: +4.9% ✅ upper_aspectTime: ✅ 304.913µs (SLO: <500.000µs 📉 -39.0%) vs baseline: -0.4% Memory: ✅ 42.703MB (SLO: <43.500MB 🟡 -1.8%) vs baseline: +5.1% ✅ upper_noaspectTime: ✅ 233.446µs (SLO: <400.000µs 📉 -41.6%) vs baseline: -0.3% Memory: ✅ 42.605MB (SLO: <43.500MB -2.1%) vs baseline: +4.7% 📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 503.973µs (SLO: <700.000µs 📉 -28.0%) vs baseline: 📈 +19.2% Memory: ✅ 42.507MB (SLO: <43.500MB -2.3%) vs baseline: +5.0% ✅ ospathbasename_noaspectTime: ✅ 423.566µs (SLO: <700.000µs 📉 -39.5%) vs baseline: -1.0% Memory: ✅ 42.566MB (SLO: <43.500MB -2.1%) vs baseline: +5.1% ✅ ospathjoin_aspectTime: ✅ 624.459µs (SLO: <700.000µs 📉 -10.8%) vs baseline: ~same Memory: ✅ 42.389MB (SLO: <43.500MB -2.6%) vs baseline: +4.9% ✅ ospathjoin_noaspectTime: ✅ 632.603µs (SLO: <700.000µs -9.6%) vs baseline: ~same Memory: ✅ 42.330MB (SLO: <43.500MB -2.7%) vs baseline: +4.6% ✅ ospathnormcase_aspectTime: ✅ 349.809µs (SLO: <700.000µs 📉 -50.0%) vs baseline: -0.7% Memory: ✅ 42.330MB (SLO: <43.500MB -2.7%) vs baseline: +5.1% ✅ ospathnormcase_noaspectTime: ✅ 358.646µs (SLO: <700.000µs 📉 -48.8%) vs baseline: -0.6% Memory: ✅ 42.448MB (SLO: <43.500MB -2.4%) vs baseline: +4.8% ✅ ospathsplit_aspectTime: ✅ 485.818µs (SLO: <700.000µs 📉 -30.6%) vs baseline: -0.7% Memory: ✅ 42.369MB (SLO: <43.500MB -2.6%) vs baseline: +4.8% ✅ ospathsplit_noaspectTime: ✅ 497.177µs (SLO: <700.000µs 📉 -29.0%) vs baseline: -0.4% Memory: ✅ 42.349MB (SLO: <43.500MB -2.6%) vs baseline: +4.7% ✅ ospathsplitdrive_aspectTime: ✅ 373.431µs (SLO: <700.000µs 📉 -46.7%) vs baseline: +0.8% Memory: ✅ 42.389MB (SLO: <43.500MB -2.6%) vs baseline: +5.0% ✅ 
ospathsplitdrive_noaspectTime: ✅ 72.912µs (SLO: <700.000µs 📉 -89.6%) vs baseline: -1.6% Memory: ✅ 42.408MB (SLO: <43.500MB -2.5%) vs baseline: +5.1% ✅ ospathsplitext_aspectTime: ✅ 456.477µs (SLO: <700.000µs 📉 -34.8%) vs baseline: -0.8% Memory: ✅ 42.369MB (SLO: <43.500MB -2.6%) vs baseline: +5.1% ✅ ospathsplitext_noaspectTime: ✅ 462.082µs (SLO: <700.000µs 📉 -34.0%) vs baseline: -0.6% Memory: ✅ 42.349MB (SLO: <43.500MB -2.6%) vs baseline: +4.4% 📈 telemetryaddmetric - 30/30✅ 1-count-metric-1-timesTime: ✅ 3.439µs (SLO: <20.000µs 📉 -82.8%) vs baseline: 📈 +15.2% Memory: ✅ 34.898MB (SLO: <35.500MB 🟡 -1.7%) vs baseline: +4.8% ✅ 1-count-metrics-100-timesTime: ✅ 199.819µs (SLO: <220.000µs -9.2%) vs baseline: +0.2% Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +5.0% ✅ 1-distribution-metric-1-timesTime: ✅ 3.359µs (SLO: <20.000µs 📉 -83.2%) vs baseline: +0.6% Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +5.0% ✅ 1-distribution-metrics-100-timesTime: ✅ 215.544µs (SLO: <230.000µs -6.3%) vs baseline: +1.6% Memory: ✅ 34.977MB (SLO: <35.500MB 🟡 -1.5%) vs baseline: +4.9% ✅ 1-gauge-metric-1-timesTime: ✅ 2.207µs (SLO: <20.000µs 📉 -89.0%) vs baseline: +0.6% Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.4% ✅ 1-gauge-metrics-100-timesTime: ✅ 136.889µs (SLO: <150.000µs -8.7%) vs baseline: -0.8% Memory: ✅ 34.957MB (SLO: <35.500MB 🟡 -1.5%) vs baseline: +4.9% ✅ 1-rate-metric-1-timesTime: ✅ 3.169µs (SLO: <20.000µs 📉 -84.2%) vs baseline: +0.4% Memory: ✅ 34.937MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +5.0% ✅ 1-rate-metrics-100-timesTime: ✅ 211.887µs (SLO: <250.000µs 📉 -15.2%) vs baseline: +0.1% Memory: ✅ 35.016MB (SLO: <35.500MB 🟡 -1.4%) vs baseline: +4.9% ✅ 100-count-metrics-100-timesTime: ✅ 20.041ms (SLO: <22.000ms -8.9%) vs baseline: +0.6% Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.8% ✅ 100-distribution-metrics-100-timesTime: ✅ 2.213ms (SLO: <2.550ms 📉 -13.2%) vs baseline: -0.6% Memory: ✅ 34.898MB (SLO: <35.500MB 🟡 -1.7%) vs baseline: +4.9% 
✅ 100-gauge-metrics-100-timesTime: ✅ 1.415ms (SLO: <1.550ms -8.7%) vs baseline: +0.9% Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.9% ✅ 100-rate-metrics-100-timesTime: ✅ 2.178ms (SLO: <2.550ms 📉 -14.6%) vs baseline: -0.4% Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.6% ✅ flush-1-metricTime: ✅ 4.505µs (SLO: <20.000µs 📉 -77.5%) vs baseline: ~same Memory: ✅ 35.212MB (SLO: <35.500MB 🟡 -0.8%) vs baseline: +4.9% ✅ flush-100-metricsTime: ✅ 173.132µs (SLO: <250.000µs 📉 -30.7%) vs baseline: -0.9% Memory: ✅ 35.212MB (SLO: <35.500MB 🟡 -0.8%) vs baseline: +4.7% ✅ flush-1000-metricsTime: ✅ 2.186ms (SLO: <2.500ms 📉 -12.6%) vs baseline: -0.2% Memory: ✅ 36.038MB (SLO: <36.500MB 🟡 -1.3%) vs baseline: +4.8% 🟡 Near SLO Breach (15 suites)🟡 coreapiscenario - 10/10 (1 unstable)
def dummy_evaluator_with_extra_return_values(input_data, output_data, expected_output):
    return {
Why a dict instead of a class (where we could check the type to control the logic instead of using try_dict_access)?
I noticed that we seem to be keeping the number of "public" things that we intend users to import to a minimum, e.g. https://github.com/DataDog/dd-trace-py/blob/main/ddtrace/llmobs/__init__.py
But yeah, a class would be more explicit... and Fouad is already going that way.
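For reference, a minimal sketch of the class-based alternative raised in this thread. Everything here is hypothetical and for illustration only: `EvaluatorResult`, `normalize_result`, and the two example evaluators are not part of `ddtrace.llmobs`, and this is not the code in the PR. It only shows how checking the return type could replace probing a dict for known keys.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class EvaluatorResult:
    """Hypothetical wrapper type (an assumption, not a ddtrace.llmobs class)."""

    value: Any
    extra: Dict[str, Any] = field(default_factory=dict)


def normalize_result(raw: Any) -> Tuple[Any, Dict[str, Any]]:
    """Split an evaluator's return into (value, extra fields)."""
    # The type itself signals that extra fields are present,
    # so no dict-key probing is needed.
    if isinstance(raw, EvaluatorResult):
        return raw.value, dict(raw.extra)
    # Anything else, including a plain dict, is treated as the bare value.
    return raw, {}


def evaluator_plain(input_data, output_data, expected_output):
    # Existing style: return only the evaluation value.
    return output_data == expected_output


def evaluator_with_extras(input_data, output_data, expected_output):
    # New style: return the value plus extra fields via the wrapper.
    return EvaluatorResult(
        value=output_data == expected_output,
        extra={"reasoning": "exact match comparison"},
    )
```

One trade-off this makes visible: with a wrapper type, an evaluator whose actual value is a dict stays unambiguous, at the cost of one more name that users would have to import.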
Description
Testing
Risks
Additional Notes