I think teams inherit their metrics rather than choose them.
Lighthouse scores get adopted because they're easy to automate, so teams end up using them by default. Nobody sat down and asked "what should we be tracking?"; the tool was already there. That's a design decision nobody actually made.
To me, the interesting question isn't "how do we improve our score?" It's "who decided this was the score?", because the score shapes the behaviour: whatever you measure is what people optimise for.
It also creates a finish line for you. With Lighthouse or anything similar, once you hit 90-something, the work feels done. Again, that's a number someone else decided on. But accessibility isn't a threshold where you can say "above this, we're good." I see it as ongoing work.
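To make that concrete, here's roughly what a finish line looks like once it's encoded in tooling. This is a minimal Lighthouse CI sketch, not a recommendation; the URL and the 0.9 threshold are placeholders I've picked for illustration.

```js
// lighthouserc.js — a hypothetical config that bakes a "done" threshold into CI.
module.exports = {
  ci: {
    collect: {
      url: ['http://localhost:3000/'], // placeholder URL
      numberOfRuns: 3, // run multiple times to reduce flakiness
    },
    assert: {
      assertions: {
        // Fail the build below an accessibility score of 0.9,
        // pass at or above it. Nothing about the product changes
        // at that boundary; only the colour of the build badge does.
        'categories:accessibility': ['error', { minScore: 0.9 }],
      },
    },
  },
};
```

Once that assertion exists, 0.9 quietly becomes the team's definition of done.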
Any metric with a clean pass/fail will eventually get treated like a compliance checkbox.
But what if you designed your metrics instead?
Would it be so bad if the metrics you designed didn't fit neatly into a dashboard? If you couldn't automate them? If there were no final score to display somewhere? Maybe that's the point. Good metrics should force a conversation instead of a glance at a number on some screen.
The metrics you choose are a statement about what you think accessibility is. If you're checking a score before every release, that tells me you think accessibility is a technical property of your product. But if you're tracking who you talked to, what you learned and what you changed because of it, that tells me you think accessibility is an actual human experience.
So look at what you're measuring. Not what you say you care about, but what you track and report on. That's the whole ballgame.