As the title suggests, I have a few LLM models and want to see how they perform on different hardware (CPU-only instances, and GPUs: T4, V100, A100). The goal is to get an idea of performance versus overall price (VM hourly rate vs. efficiency).
Currently I've written a script that measures ms per token, RAM usage (via memory profiler), and total time taken.
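For reference, it's roughly along these lines, a simplified sketch: `generate_fn` and `vm_hourly_usd` are placeholders for your own model call and instance rate, and it assumes the `memory_profiler` package is installed:

```python
import time
from memory_profiler import memory_usage

def benchmark(generate_fn, prompt, n_tokens=128, vm_hourly_usd=0.0):
    """Time one generation call; derive ms/token, peak RAM, and cost."""
    start = time.perf_counter()
    # Sample RSS (in MiB) while generate_fn(prompt, n_tokens) runs.
    mem_samples = memory_usage((generate_fn, (prompt, n_tokens)), interval=0.1)
    elapsed = time.perf_counter() - start

    ms_per_token = elapsed * 1000 / n_tokens
    tokens_per_sec = n_tokens / elapsed
    # Spread the VM's hourly rate over generated tokens -> $ per 1M tokens.
    usd_per_1m_tokens = (vm_hourly_usd / 3600) / tokens_per_sec * 1_000_000

    return {
        "total_s": elapsed,
        "ms_per_token": ms_per_token,
        "peak_mem_mib": max(mem_samples),
        "usd_per_1m_tokens": usd_per_1m_tokens,
    }
```

Running that once per instance type gives comparable rows; the $/hr figure is just whatever your cloud charges for the VM.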
Wanted to check if there are better methods or tools. Thanks!
Thanks. Does this conduct compute benchmarks too? It looks like it's more focused on model accuracy (if I'm not wrong).
Seems like it. Keep an eye out; when I run across one I'll post it, usually to the model's community.
Sure, thank you!