The solution is two opposite systems.
I’m a systems design engineer with over a decade of experience in software development alone (though I have since transitioned to neuroscience and AI research), and I wrote up a quick solution to your problems. I really wish you guys would read it.
The issue, I believe, is that you started from an ELO design perspective, which was just utterly wrong from any scientific perspective. However, the win/loss perspective of the ELO approach is the only one that captures the complexities of teamwork, so it must still be considered. Personal metrics are generally too ignorant of - and even counter to - good team-play. At the bottom ranks, however, team effects are almost non-existent and personal metrics are the truly meaningful effect to capture.
Your MMR design perspective must flip at low ranks vs high ranks.
Low long-term skill, low numbers of games played, new players, and smurfing at low ranks together create deviations too large to be usable. There’s too much chaos (unreliable effects) at low ranks, and that chaos can seep all the way up to Gold rank (and possibly have aftershocks in even higher ranks).
In the design of Microsoft’s “TrueSkill” system, they addressed issues of regression and long-term (processing-heavy) re-comparison, etc., that can suss out actual personal skill from small sets of data. Those effects are almost utterly washed out, and proven to be (usually) barely better than random, with anything less than enormous datasets per individual when attempting to use purely win/loss comparisons (as I know you are).
The science is in. Every scientific paper on it shows that ELO methods for ranking individuals in team games are almost purely non-functional.
So, the solution I’ve proposed is customized to work with your (less than optimal) design perspective, and its entire purpose is to reduce the low-rank chaos that constantly introduces widespread instability to your overarching system.
For low-rank players you must use more basic personal metrics and almost wholly ignore win/loss, then slowly switch to win/loss as they rise through the ranks (almost ignoring personal metrics at higher ranks, as you currently already do).
Almost entirely personal metrics at the bottom ranks and almost entirely win/loss at middle and higher ranks.
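To make the flip concrete, here is a minimal sketch of the kind of blend I mean. Everything in it is an assumption for illustration: the function names, the 0-to-1 rank scale, the linear ramp that finishes at the midpoint of the ladder, and the 5% floor that keeps either signal from being ignored completely. It is not how your current system works, and the real curve would need tuning against your data.

```python
def win_loss_weight(rank: float, ramp_start: float = 0.0, ramp_end: float = 0.5,
                    floor: float = 0.05) -> float:
    """Weight given to the win/loss signal at a normalized rank.

    rank: 0.0 = bottom of the ladder, 1.0 = top (hypothetical scale).
    ramp_start / ramp_end: where the linear switch begins and finishes
    (assumed values; 0.5 stands in for "middle ranks").
    floor: smallest weight either signal can have, so the result is
    "almost entirely" one signal rather than exclusively one.
    """
    if ramp_end <= ramp_start:
        raise ValueError("ramp_end must be greater than ramp_start")
    # Linear progress through the ramp, clamped to [0, 1].
    t = (rank - ramp_start) / (ramp_end - ramp_start)
    t = max(0.0, min(1.0, t))
    # Rescale so the weight stays within [floor, 1 - floor].
    return floor + (1.0 - 2.0 * floor) * t


def blended_mmr_delta(rank: float, personal_delta: float, win_loss_delta: float) -> float:
    """Blend a personal-metrics rating change with a win/loss rating change."""
    w = win_loss_weight(rank)
    return (1.0 - w) * personal_delta + w * win_loss_delta


if __name__ == "__main__":
    # A player who performed well personally (+10) but lost the match (-20).
    for rank in (0.0, 0.25, 1.0):
        print(rank, round(blended_mmr_delta(rank, 10.0, -20.0), 2))
```

With these assumed numbers, a bottom-rank player who played well but lost still gains rating, a player halfway up the ramp gets an even mix, and a top-rank player is judged almost purely on the loss.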
It will solve all these problems without having to completely design a whole new “TrueSkill” type of system. It will work with your ELO-like design perspective.
For those interested in the science, this is a good article: [add https://]tht[dot]fangraphs[dot]com/elo-vs-regression-to-the-mean-a-theoretical-comparison/