Currently all metric events are defined and handled (and hard-coded) in observability.py, and I am not sure whether there is a better way to emit OT events/metrics through MAF rather than re-inventing ...
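For context, here is a minimal sketch of what emitting metrics through the OpenTelemetry Python SDK directly looks like, as an alternative to hard-coding event handling. The meter and counter names below are illustrative, not taken from observability.py or MAF.

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Wire a provider that periodically exports to the console
# (swap in an OTLP exporter for real deployments).
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

# Hypothetical meter/instrument names, chosen for illustration only.
meter = metrics.get_meter("agent.observability")
request_counter = meter.create_counter(
    "agent.requests",
    description="Number of agent requests handled",
)

# Record a data point with attributes instead of a hard-coded event type.
request_counter.add(1, {"agent": "planner", "status": "ok"})
```

The advantage of going through the SDK is that new instruments are just `create_counter`/`create_histogram` calls with attributes, rather than new hard-coded event definitions.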
Testing AI systems is hard: responses are non-deterministic, tool usage needs to be validated, and semantic meaning matters more than exact text matching.
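One way to assert on meaning rather than exact text is to compare embeddings. This is a sketch of that idea, assuming the sentence-transformers package is available; the model choice and the 0.8 similarity threshold are arbitrary and should be tuned for your use case.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def assert_semantically_similar(actual: str, expected: str, threshold: float = 0.8) -> None:
    """Fail if the response does not carry the expected meaning."""
    emb = model.encode([actual, expected], convert_to_tensor=True)
    score = util.cos_sim(emb[0], emb[1]).item()
    assert score >= threshold, f"similarity {score:.2f} below {threshold}"

# An exact string comparison would fail here; the semantic check passes.
assert_semantically_similar(
    "The capital of France is Paris.",
    "Paris is France's capital.",
)
```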