@Aieres No. I’m certainly not as much of an expert as a DeepMind employee, but I use, and have hand-coded, some ML algorithms in my research.
You can’t teach an AI to play from replays on a patch that hasn’t gone out yet; there won’t be any… Furthermore, replay-based learning is only the first step in training the current AlphaStar AI.
Secondly, you are far too optimistic about how the AI will react to patches. Your assumption that massive game-breaking changes are no different from minor ones is exactly the problem. The learning often gets stuck in local minima: if you’re optimizing a non-convex objective, you have no guarantee that the strategies you learn are optimal, and the agent can stagnate on non-optimal strategies. This is the point I think you’re missing. ML algorithms usually stop meaningfully learning after some amount of training, and in fact letting them run too long can actually reduce their effectiveness.
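To make the stagnation concrete, here’s a toy sketch of my own (nothing to do with AlphaStar’s actual training loop, and the function is one I made up): plain gradient descent on a non-convex function settles into whichever local minimum its starting point happens to sit above.

```python
# Toy non-convex objective with two minima; the deeper (global) one is near x = -1.04.
def f(x):
    return (x**2 - 1)**2 + 0.3 * x

def grad_f(x):
    return 4 * x * (x**2 - 1) + 0.3

def gradient_descent(x, lr=0.05, steps=500):
    """Plain gradient descent: converges to the minimum of whatever basin x starts in."""
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Start to the right of the barrier (~x = 0.08) and you settle into the shallow
# local minimum near x = 0.96, never finding the deeper one near x = -1.04.
x_local = gradient_descent(0.5)    # ends near 0.96
x_global = gradient_descent(-0.5)  # ends near -1.04, where f is lower
```

Same algorithm, same function; only the starting point differs, and the first run is permanently stuck at the worse solution.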
There are various tricks ML practitioners use to try to get around being stuck in local minima (adding second-derivative information, trying a variety of cost functions, random restarts, etc.), but none of these guarantee reaching a global minimum as opposed to a local one.
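As one example, random restarts (sketched here on a toy function of my own choosing, not anything AlphaStar-specific) improve your odds by sampling several starting points, but with finitely many starts there is still no guarantee of landing in the global minimum’s basin:

```python
import random

def f(x):
    return (x**2 - 1)**2 + 0.3 * x  # toy non-convex objective, global minimum near -1.04

def grad_f(x):
    return 4 * x * (x**2 - 1) + 0.3

def descend(x, lr=0.05, steps=500):
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

def random_restarts(n=10, seed=0):
    """Run gradient descent from n random starts and keep the best endpoint.

    Helps in practice, but it only probes finitely many basins, so there is
    still no guarantee any start lands in the global minimum's basin.
    """
    rng = random.Random(seed)
    return min((descend(rng.uniform(-2, 2)) for _ in range(n)), key=f)

best = random_restarts()  # with these starts, reaches the deeper minimum near -1.04
```

With ten starts this happens to find the deeper minimum; with one unlucky start it wouldn’t, which is exactly the lack of guarantee I mean.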