HotS at 4K resolution

It helps reduce the impact of memory access latency; it cannot mitigate it. If your data set is larger than the cache, the CPU still has to fetch data from main memory, during which time execution stalls. As such, some workloads that are inherently cache-miss heavy are hugely affected by memory access latency. It does not even take many misses for CPU execution to fall behind performance-wise, because a main memory access takes so long.
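
To make that concrete, here is a minimal C sketch (purely illustrative, not from any real benchmark) of the kind of workload being described: a pointer chase through a working set far larger than any L3 cache, where every load depends on the previous one, so each miss stalls the core for a full main-memory round trip.

```c
#include <stddef.h>
#include <stdint.h>

/* Working set: 64 Mi entries * 8 bytes = 512 MB, far beyond any L3 cache. */
#define SLOTS (64u * 1024u * 1024u)

/* Tiny xorshift64 RNG, only so the shuffle below is not limited by RAND_MAX. */
static uint64_t rng_state = 88172645463325252ULL;
static uint64_t xorshift64(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Sattolo's algorithm: builds a single random cycle over all slots, so the
 * chase visits the whole working set and the hardware prefetcher cannot help. */
static void build_chain(size_t *next) {
    for (size_t i = 0; i < SLOTS; ++i)
        next[i] = i;
    for (size_t i = SLOTS - 1; i > 0; --i) {
        size_t j = (size_t)(xorshift64() % i);
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }
}

/* Latency-bound loop: each iteration is one dependent load, so throughput is
 * roughly one main-memory latency per step, no matter how wide the core is. */
static size_t chase(const size_t *next, size_t start, size_t steps) {
    size_t i = start;
    while (steps--)
        i = next[i];
    return i;
}
```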

There are various reasons why Zen 3 is faster. The unified cache per CCD (since it does not actually have more cache in total…) is just one part that helps. Others include:

- Implementing some vector/bit-manipulation instructions efficiently in hardware: some now take around 1-3 cycles, whereas on Zen 2 they took over 100 cycles (see the sketch below).
- Improving the micro-op cache so that branch prediction and other front-end aspects scale and perform better.
- Adding more efficient memory access and controller mechanics, letting the cores generate more outstanding memory accesses, especially when running AVX code (often used for bulk memory operations and some loops).
- Boosting the core frequency by 200-300 MHz, to the point that people reliably see frequencies above the advertised spec in single-threaded workloads, as opposed to 100-200 MHz below it.
- Removing the CCX tier (there is now only the CCD), so that Infinity Fabric latency is less terrible; and so on.
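
On the bit-manipulation point, the instructions in question are most likely BMI2's PDEP/PEXT (an assumption on my part, but they are the well-known case that was microcoded and very slow on Zen 2 and became a fast hardware operation on Zen 3). A minimal example of what they do, using the standard intrinsics (needs a BMI2-capable CPU and e.g. -mbmi2 with GCC/Clang):

```c
#include <immintrin.h>
#include <stdint.h>

/* PDEP: scatter the low bits of `src` into the bit positions selected by
 * `mask`. Microcoded on Zen 2 (up to hundreds of cycles in the worst case),
 * a ~3-cycle hardware instruction on Zen 3. */
uint64_t scatter_bits(uint64_t src, uint64_t mask) {
    return _pdep_u64(src, mask);
}

/* PEXT is the inverse: gather the bits selected by `mask` into the low bits
 * of the result. Same performance story as PDEP. */
uint64_t gather_bits(uint64_t src, uint64_t mask) {
    return _pext_u64(src, mask);
}
```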

You cannot really compare the two, because the M1 has specific hardware-level changes that make it more x86-64 friendly. This is why Rosetta can emulate x86-64 with such a small performance impact (20% or so): it has hardware-level support for the x86 memory model and other aspects that vastly improve execution performance. This is also why Microsoft’s attempts to emulate x86 on ARM look so poor compared with Apple’s, since they were not using specially designed ARM processors with hardware-level features for x86 emulation. It is also why, in theory, Microsoft could make a Windows build for the M1 that would see similar x86-64 performance to Apple’s macOS.
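
For anyone wondering what "hardware-level support for the memory model" means in practice: x86-64 guarantees total store ordering (TSO), while ordinary ARM64 code does not, so a translator running on a stock ARM core has to sprinkle memory barriers around translated loads and stores to preserve x86 semantics. The M1 is widely reported to have a hardware TSO mode that Rosetta 2 enables, which makes those extra barriers unnecessary. A tiny C11 sketch of the kind of ordering guarantee involved (illustrative only, not Apple's actual mechanism):

```c
#include <stdatomic.h>

atomic_int data;
atomic_int ready;

void producer(void) {
    atomic_store_explicit(&data, 42, memory_order_relaxed);
    /* On x86-64, two ordinary stores are already seen by other cores in
     * program order (TSO). Naively translated to ARM64 without a TSO mode,
     * the emulator must emit a release barrier here to keep that promise. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

int consumer(void) {
    /* Likewise, x86 loads are not reordered with other loads, while plain
     * ARM64 needs an acquire barrier to guarantee the same thing. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return atomic_load_explicit(&data, memory_order_relaxed);
}
```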

There is no doubt that the M1 cores are fantastic. They are basically what Intel 7nm laptop processors will be like since they are made with a similar technology node. If Intel ever makes any…

How about the Ampere Altra, made from stock ARM cores?

What in the world is your definition of mitigate?

Because mine is “make an issue less severe”. Which would mean in this case to me: “making the impact of higher memory latency less impactful”.

We could look at a good source: Zen 3 At A Glance: Behind The +19% IPC Increase - AMD Zen 3 Ryzen Deep Dive Review: 5950X, 5900X, 5800X and 5600X Tested

If your effective memory latency is lower, would you say that the design choice to have the cache is mitigating the higher memory latency?

???

I play at an internal resolution of 4k that’s downscaled and displayed at 1080. (Because I don’t have a 4k monitor)

The game’s bottleneck is CPU performance because of deterministic lockstep, not rendering… If you’re stuttering it’s because you either have a really bad CPU or internet connection.

I run the game just fine at 4k on a 970 GTX. The people saying it won’t run at 4k are just talking nonsense.

It will likely run x86-64 applications very poorly, unlike the M1 from Apple, as I doubt it has specific hardware features to help with x86-64 emulation, or the high single-thread performance that the M1 has, especially since it is a server CPU.

Apple states that the A14 is up to 40% faster than the A12 in CPU performance. On top of that, the M1 is theoretically faster than the A14 due to being more performance orientated. As such I would say the M1 is considerably faster than the A12, possibly close to 100% faster. I cannot seem to find performance comparisons between the two chips, so much of this remains speculation.

Except it has no impact at all when cache misses are frequent. In such a situation it is possible that theoretically slower Intel CPUs will perform better, and even AMD’s APUs might perform better. Again, it all depends on the workload being run, since if the entire data set fits inside the cache, memory access latency is not a concern.

Zen 3 does not offer significantly more cache in total than Zen 2. The main difference is how the cache is structured, with the two CCXs on each die merged together: the L3 goes from 16 MB shared by 4 cores per CCX to 32 MB shared by 8 cores per CCD. The effective shared cache per core is therefore double the size, but the number of cores sharing it has also doubled. In highly parallel workloads this offers no performance improvement at all, since each core effectively has the same amount of cache as a Zen 2 core due to the sharing. Where this approach shows gains is in lightly threaded workloads, where each core can effectively use more of the cache, the optimum being a single running thread, in which case the cache available to that core is effectively doubled.

Most of the other latency improvements are for core-to-core latency, which has little effect on access latency to physical memory (as opposed to cache).

I just have a few questions. First one is: why do you want to play online games on a 4K monitor? Second one is: do you actually get more than 30 FPS, without being an anvil for the others on your team?
Now about the CPU: you can play HotS on anything released in the last 7 years or even older. HotS is actually more on the CPU side than the GPU side, but since it's a MOBA it's not that heavy.

Do you really think a build to play at 4k can’t run HotS at 60 fps? Are you serious?

Did you, by chance, actually follow the link?
:confused:

Yes, which is what enables games to avoid that latency, because 32 MB (well, somewhere between 16.1 and 32 MB) of cache is enough to fit all the data.

With the old design, going over 16 MB meant hitting the high latency. That is why some games saw massive FPS gains.

I’m honestly wanting to call a Dunning-Kruger effect right now, like how earlier you were talking about marketing choices in terms of how the processor performs, for example.

Yes, and as I said, these are all server CPUs, so they all have poor single-thread performance compared with consumer-orientated desktop CPUs of the same generation. Getting most of the way there is easy; it is always the last part that is hard.

Also I will again point out that it will emulate x86-64 much slower than the M1 does because the M1 does not use a stock ARM processor design. Apple has made special modifications to it so that it can run x86-64 emulation efficiently. The old A12 will suffer a similar issue as it does not have this feature. As such if you were to compare the two with x86-64 emulation then you will find the M1 cores significantly faster than the A12 cores. This is why in my opinion they cannot really be compared for performance.

As mentioned before, Zen 3’s performance improvements come from the sum of all the changes, not just from the unification of the CCXs on the CCD. To list some: the significantly higher core clock speed, the improved memory controller, the improved micro-op cache, the hardware-level implementation of some bit-manipulation vector instructions, the increase in instructions executed per clock for many operations, etc. The larger cache will have given some games some more performance, but “massive FPS gains” cannot be attributed entirely to it.

Even so, when a workload overflows that cache, performance falls back to the high memory latency. No amount of cache will ever solve that, and for now that is where Intel chips and AMD APUs have an advantage, since they do not have to deal with moving data around a chiplet design. Future Intel CPUs will likely suffer similar latency issues as they move to a chiplet design, and the industry-wide move to DDR5 might help compensate to some extent.

Did you really build a 4K build to play online games?

I built a 4K build to play any game. Please don’t be so simple-minded.


I get 60 FPS at 4K internal, scaled down to 1080. The image is displayed at 1080 but rendered at 2160. Why? Because it looks better than 1080 with FXAA.

Seriously, if the game is choppy it’s because either you have the game installed on a slow HDD and it’s having issues streaming assets, your CPU clock is too slow to handle the deterministic lockstep engine, or your internet connection sucks. I don’t have these issues with my almost 10-year-old i5 and 970 GTX.


Now, if you said that you built a PC to play offline games at 4K, I would say nothing at all, but for online games 4K is useless and will slow you down a lot. First, it’s only 60 FPS. Now get to the part where team fights happen, depending on which map, and let’s say lag sometimes also does its job, and you will experience drops of some 20-25 FPS, because of how HotS is made. I have a 5900X and a 3070, playing at 1440p at 165 Hz, and depending on the map the FPS drops are usually around that; even though I’m always over 120 FPS, it never stays at 165 for the whole game. So I doubt that at 4K you will have 60 FPS for the entire game, and this will make you an anvil, because others playing at a lower resolution will have far more FPS and will see every skill or movement more fluently. Sorry to say this, but making a 4K build for online games is pretty stupid.

I haven’t had less than 60 FPS in years. My 1060 3GB was flawless at 1440; now my 3070 gives me no issues at all at 4K. If you are having any issues, perhaps it’s an issue with your machine.


I have never said I have problems with the game, and I have already stated that HotS can be played on really old CPUs and GPUs at a lower resolution. But at 4K, limited to 60 FPS, add team fights, depending on which map you are playing, and let’s say the connection is sometimes a bit slower, and I doubt it will keep those 60 FPS. Now let’s also add to the table that HotS is pretty badly developed, because it causes pretty big FPS drops in some map areas, while in other online games that does not happen. Take LoL for example: I get almost 165 FPS (since I capped it) almost all the time, with a drop of 10 or 15 at most during team fights, but on HotS the FPS drop is far bigger. Now you can defend it as much as you want, but a 4K build to play online games is a stupid idea; it’s trying to run a race with a broken leg. It’s a good build for AAA games.

Why do you assume I have problems with my PC? I’m just stating the truth. It doesn’t matter if it’s a 3070 or a 3090, you will not be able to keep those 60 FPS for 25 minutes, since frame rates change a lot over a game. I doubt you will hold 60 FPS on HotS, but you can always post a video showing it and make a fool of me for what I’m saying. I’m actually not saying that a 4K build is trash or anything; I’m just saying it’s not the best for certain types of games. Don’t take it personally.

I find what you say about HotS incorrect, because those Bugisoft games give me a headache with their frame drops even at 1440p. I know what 51 FPS looks like and I can notice it, but I haven’t had any issues with HotS. The game runs smooth all the time.


All games are CPU intensive. All the things that fly on your screen? Well, the CPU needs to tell the GPU to render them, so every single frame, every single model, texture, detail, they were all touched by the CPU at some point.

The reason HotS doesn’t scale well with GPUs is its lock-update model, where it needs to wait on the slow internet connection in order to move on to the next frame.

The same engine would easily achieve 80-90% more frames if it were played offline.

The higher the resolution, the more expensive each shader becomes for the GPU. It is a common trick to raise the resolution in order to rebalance a CPU-GPU bottleneck when the CPU can’t keep the GPU fed.


Actually, the Microsoft SQ CPU does have hardware support for x86-64 emulation. The problem is that it’s a Qualcomm CPU, and Qualcomm cannot do the same performance optimization that Apple does, for economic reasons. Apple produces the M1 just for their own devices, while Qualcomm builds its chips for any device; the SQ is just for the Surface.

Except for text, since the text is then drawn for 4K and downsized, so it is often illegible. Technically, if games are DPI-aware they could compensate, but I have never really seen that happen; at best they may offer a UI scaling factor like Factorio does.

Which does not matter for HotS, as the game only updates 16 times a second. 60 FPS is already well past that, offering low latency.

The frame rate drops are usually due to a CPU bottleneck and not graphics. It is completely possible to have 120+ FPS at 4K and still be CPU bottlenecked in games like HotS, which are not that graphically demanding.

To put it in perspective, my GTX 760 can achieve 270+ FPS at 1920x1080 with AA off for most of the game. During team fights it might drop lower, but that is almost always CPU related and still above 100 FPS. The same GPU only managed 15 FPS in team fights and never passed 50 FPS when paired with a Core 2 Quad Q6600 (if I recall correctly), as opposed to my current Ryzen 9 3900X. For most modern gaming GPUs, HotS will run at the same frame rate at 1080p as it will at 4K, since it is not GPU bottlenecked. Even when it is, it will likely be well above 60 FPS.

Of course, if you enable anti-aliasing or other effects, the frame rate might be much lower. You usually want these off when playing at high resolutions like 4K anyway, as the increased resolution naturally acts as its own anti-aliasing.

The only issue might be longer response latency due to the higher GPU workload (more time spent per frame). Whether this makes a difference for a game running at 16 updates per second is another matter, but it would for something competitive like Counter-Strike running at 200+ updates per second.

This has little to do with frame rate. It locks internal updates to safe milestones to remain in sync. Frames are presented by interpolating between internal updates. There is even the silly situation where, if you are waiting for the server due to an unreliable connection, your frame rate will skyrocket (400+ FPS) because nothing is happening, and then fall very low while the game fast-forwards to catch up (dropping frames).
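
A rough C sketch of that split, purely illustrative (the 16 Hz tick rate comes from this thread; the types and function names are made up, not Blizzard's code): the deterministic simulation advances only when the lockstep inputs for the next tick are in, while the renderer draws as many frames as it likes by interpolating between the last two simulated states.

```c
#include <stdbool.h>

#define TICK_SECONDS (1.0 / 16.0)   /* HotS-style 16 updates per second */

typedef struct { double x, y; } State;

/* Blend two simulation states for smooth rendering between ticks. */
static State lerp(State a, State b, double t) {
    return (State){ a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t };
}

void game_loop(double (*now)(void),          /* wall-clock time in seconds   */
               bool (*tick_ready)(void),     /* all lockstep inputs arrived? */
               State (*simulate)(State),     /* one deterministic tick       */
               void (*render)(const State *)) {
    State prev = {0}, curr = {0};
    double last_tick = now();

    for (;;) {
        /* Advance the simulation only when it is time AND the inputs for the
         * next tick have been confirmed by everyone (lockstep). */
        while (now() - last_tick >= TICK_SECONDS && tick_ready()) {
            prev = curr;
            curr = simulate(curr);
            last_tick += TICK_SECONDS;
        }
        /* Render at whatever rate the GPU allows, interpolating between the
         * two most recent states; if the simulation is stalled on the network,
         * frames keep coming but nothing new animates. */
        double alpha = (now() - last_tick) / TICK_SECONDS;
        if (alpha > 1.0) alpha = 1.0;
        State frame = lerp(prev, curr, alpha);
        render(&frame);
    }
}
```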

As far as I am aware, only pixel and ray-tracing shaders. Geometry, tessellation, and compute shaders are resolution independent, as they do not work on pixels but rather on buffers of other data.

What is a Microsoft SQ CPU? It is the first time I have heard of it, and Google is not showing anything useful.

The x86-64 emulation support in Windows 10 for ARM is very new. The blog post is dated the 10th of December, so it is under a month old, likely in response to the M1-based Macs. I am not even sure if this has been released in general yet; it may still be limited to Insider builds for now.

  1. The rendering stays a solid 60 FPS regardless of connection / CPU clock. What happens is the game stops ANIMATING while it’s lockstepped and calculating where you should be / receiving a packet. You can tell this not only by using the game’s FPS counter or an external FPS counter, but also by cloth physics continuing to render. Play Leoric and you’ll see his cape come down, as the PhysX component is still running.

  2. The engine is NOT badly optimized; its high CPU usage is there for a reason. The game uses a design pattern known as deterministic lockstep (a rough sketch of the idea is below). The client only executes input commands; it does not parse full game states. This keeps the packet size very, very low, and the file sizes for replays low as well. This CANNOT be an asynchronous operation, as it must be deterministic, and therefore there cannot be multithreading in the game’s base engine. If an instruction / packet is missed, the game will lock the client’s animator until it is caught up. This is why you see the “choppiness” and why replays take so long to scrub through. The game has to REBUILD the game state for you.
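
For reference, a bare-bones sketch of the deterministic-lockstep idea described above; every name and the toy state layout are invented for illustration, not Blizzard's actual implementation. Clients exchange only the commands for each tick and apply them in the same order to their own copy of the state, which is also why a replay file is just the recorded command stream.

```c
#include <stddef.h>

typedef struct { int player; int order_type; int target_x, target_y; } Command;
typedef struct { int hero_x[10]; int hero_y[10]; unsigned tick; } GameState;

/* Must be fully deterministic: the same state plus the same commands in the
 * same order produces an identical next state on every client. No wall-clock
 * time, no unordered multithreading (bounds checks omitted for brevity). */
static void apply_tick(GameState *s, const Command *cmds, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        s->hero_x[cmds[i].player] = cmds[i].target_x;
        s->hero_y[cmds[i].player] = cmds[i].target_y;
    }
    s->tick++;
}

/* The simulation advances only once every player's commands for this tick
 * have arrived; otherwise it stalls while the renderer keeps drawing. */
int try_advance(GameState *s, const Command *cmds, size_t count,
                int all_inputs_received) {
    if (!all_inputs_received)
        return 0;   /* missed packet: the animator locks until we catch up */
    apply_tick(s, cmds, count);
    return 1;
}
```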

Exactly this. 100%.

I have this issue with D3, but I do not have this issue with HOTS.

Exactly. The game’s engine tick is going to only run 16 times a second regardless of how well it can render the game.