I am responsible for the Ayumi LLM Benchmark, which I develop and run for fun in my spare time. The goal was to have a benchmark that explores the role-play chat capabilities of large language models such as LLaMA. Unfortunately, the Ayumi LLM Benchmark has its limits in that regard and can still only partially do that - no false promises here. Maybe you will still find the data interesting.

I recently started renting GPU time to speed up the benchmark (also for the 34B to 70B models), and some people have asked me whether they could donate somehow. Just keep in mind that it can take weeks until I find time to work on the benchmark again, because I can only do so in my scarce spare time. If you decide to donate: do it as a gesture of appreciation for finding the existing benchmark data useful.