Random DCs Over the Past 30 Days or So

Hey All,

So I’ve been getting seemingly random DCs between my host and the Moon Guard server on a semi-regular basis as of late. It seems to start giving one every 5 minutes or so, and then it calms down and doesn’t happen again for a while.

Now, I read you guys’ support article over at https://us.battle.net/support/en/article/000027780

And you list the US West and US Central IP addresses as

137.221.105.2 (US West)
24.105.62.129 (US Central)

Which is… interesting.
So both

24.105.0.0/18 and
137.221.96.0/19

Show as belonging to you guys in ARIN, but those specific addresses don’t seem to be getting used by the game itself. Wireshark shows 137.221.96.152 as the primary address (for me at least) along with 137.221.97.120, neither of which is listed in the article. There’s also a host of addresses owned by Google that resolve to prod.depot.battle.net, so I assume it’s probably some kind of patching mechanism or similar (a quick way to check those lookups yourself is sketched after the list):

34.105.52.54
34.168.63.5
34.168.233.93
34.83.189.200
34.127.54.29
35.230.79.161
35.203.191.112
34.105.26.50

Anyway, looking at 137.221.96.152, I am showing packet loss, but it’s at the Level 3 hops in Atlanta and Chicago, and I’m not sure whether Level 3 is doing that on purpose or whether those route points are actually overwhelmed. I’m seeing nearly 99% loss to ae2.3608.ear7.Chicago2.Level3.net; given the staggering amount of packet loss, I assume that hop answers the initial ICMP burst and then drops the rest. There’s an additional 7% packet loss at Level 3’s Atlanta LAG.

Anyway, based on the forum topics it looks like I’m not the only one running into this. I was curious whether, outside of WinMTR, there’s a better way to isolate exactly when and why the connection fails, or whether there’s a client-side debug log that might get more specific about where a given packet failure is occurring.


Heya Alnarra,

We’ve been tracking what’s probably the same issue over at the Moon Guard disconnection issues compendium - could I ask you to run the test in post #24 and let us know your findings?

Also, to explain what you’re seeing in ARIN - 24.105.0.0/18 means “every IP address in the range 24.105.0.1 - 24.105.63.254”, and 137.221.96.0/19 is “every IP address in the range 137.221.96.1 - 137.221.127.254”. 137.221.105.2 is the address, probably a router, in their West Coast datacenter that’s allowing ICMP packets and is therefore usable for WinMTR tests. All the others (including 137.221.97.120 and 137.221.96.152) are denying ICMP. But since all of those addresses are located in the same datacenter, a routing issue occurring to 137.221.105.2 would also occur to 137.221.97.120 or 137.221.96.152.
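
(If you want to double-check those ranges yourself, here’s a quick sketch using nothing but Python’s standard-library ipaddress module; the prefixes and addresses are just the ones quoted above.)

```python
import ipaddress

# The two Blizzard prefixes registered in ARIN, as quoted above.
for prefix in ("24.105.0.0/18", "137.221.96.0/19"):
    net = ipaddress.ip_network(prefix)
    # The usable host range quoted above excludes the first (network)
    # and last (broadcast) addresses printed here.
    print(f"{prefix}: {net.network_address} - {net.broadcast_address}"
          f" ({net.num_addresses} addresses)")

# Confirm the addresses seen in Wireshark fall inside the West /19.
blizz_west = ipaddress.ip_network("137.221.96.0/19")
for ip in ("137.221.105.2", "137.221.96.152", "137.221.97.120"):
    print(ip, "is in", blizz_west, "->", ipaddress.ip_address(ip) in blizz_west)
```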

And, for the final part of the puzzle that threw me off originally - Moon Guard is a CST timezone server hosted in the US West datacenter.

Looking at the traceroute, after it hits Blizzard-En.Ear7.Chicago2.level3.net the connection goes from there to et-0-0-0-pe03-swlv10.as57976.net (which is Blizzard’s AS) and then to las-swlv10-ia-bons-03.as57976.net (based on the airport code, I assume it’s dropping out somewhere in Vegas). I assume the GSLB entry points the user to the Chicago data center and then it hops an MPLS circuit or similar to head to the Arizona DC.

Running netstat, I’m seeing

TCP 192.168.1.106:62291 137.221.96.152:3724 ESTABLISHED
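
(For anyone who wants to keep an eye on which of the two addresses the client is actually attached to without re-running netstat by hand, here’s a rough Python sketch using the third-party psutil package; the port 3724 filter is just based on the session above, and it may need an elevated prompt on some setups.)

```python
import psutil  # third-party: pip install psutil

# List established TCP sessions to remote port 3724 (the realm port seen in
# the netstat output above) and show which server address is in use.
for conn in psutil.net_connections(kind="tcp"):
    if (conn.status == psutil.CONN_ESTABLISHED
            and conn.raddr and conn.raddr.port == 3724):
        print(f"{conn.laddr.ip}:{conn.laddr.port} -> {conn.raddr.ip}:{conn.raddr.port}")
```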

Though it’s worth noting that it’s very difficult to actually forcibly replicate this issue; for me at least, several disconnects will occur and then it seems to settle. Given that these days I imagine all the servers are virtual hosts, if there were an issue with Moon Guard I would expect that issue to also be present on other servers living off the same NIC of whichever vCenter appliance it’s living in at that very moment.

I wonder if there’s a good way to get a list of other servers in whatever virtual cluster Moon Guard’s on at the moment. I used to assume it was the battlegroups, but these days I’m not nearly as sure.

For WinMTR tests, only the first and last hops really matter. When it shows info, it’s not following the packet; it’s pinging each hop individually. Most routers are set up to deny ICMP directed at them, but if they’re just routing packets on to other routers, they don’t care what protocol the message is. The hops in between only matter if a loss trend starting at a certain point continues all the way to the last hop.
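
(To make that rule of thumb concrete, here’s a tiny hypothetical Python helper that applies it to a list of per-hop loss percentages like the ones WinMTR reports; the numbers are made up for illustration.)

```python
def loss_matters(per_hop_loss):
    """Return True only if packet loss persists from some hop all the way
    through to the final hop (the rule of thumb described above)."""
    if not per_hop_loss or per_hop_loss[-1] == 0:
        # Clean last hop: intermediate loss is just routers deprioritizing ICMP.
        return False
    # Walk backwards to find where the loss that reaches the last hop begins.
    start = len(per_hop_loss) - 1
    while start > 0 and per_hop_loss[start - 1] > 0:
        start -= 1
    print(f"Loss persists from hop {start + 1} through the last hop")
    return True

# 99% at a middle hop (e.g. Chicago) but a clean endpoint: ignorable.
print(loss_matters([0, 0, 99, 7, 0, 0]))   # False
# Loss starting mid-path and continuing to the last hop: a real problem.
print(loss_matters([0, 0, 5, 6, 8]))       # True
```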

The trend that’s starting to appear is that 137.221.96.152 seems stable and 137.221.97.120 is experiencing disconnects - “several disconnects will occur and then it seems to settle” tells me that you’re disconnecting from 97.120 until eventually the game gives you 96.152, but I’d love to hear the results of some experimentation on your end to see if you can replicate this exact phenomenon too.


Hmm, interesting. Isolating it down, I suspect you may be correct. It looks like the initial connection strings, at least from the logging I have, do show 137.221.97.120 as the primary server WoW was attempting to connect to. By the time the connection had stabilized, it appears all traffic was headed for 96.152.

https://imgur.com/a/mqInCBZ - Initial Connection / Connection Issues
https://imgur.com/d4Rihjg - Stable Gameplay

I wonder, if I were to forcibly block 96.152, whether it would attempt to fall back to 97.120, or vice versa. Let’s find out…

Interesting, it appears that does work.
So, forcing a block on 96.152, the game once again reassociates only with 97.120:
https://imgur.com/tAtGBE5 - Local Firewall Rule
https://imgur.com/sgxHO8T - Near immediate DC (or within only a minute or so of logging in)
https://imgur.com/wjN8D8p - Log showing transition from 96.152 to 97.120
https://imgur.com/sxCOPpJ - Also showing connection log from netstat perspective.

Of course, now that I’ve got it under a microscope it wants to behave. Still, the answer to whether it will default to one or the other when the one it looks for first is blocked appears to be yes (flipping the blocked IP to 97.120, it DC’d and then immediately was able to associate with 96.152). So if the problem TRULY is with 97.120, then in THEORY blocking that address should solve the problem temporarily.
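
(For anyone who’d rather script the block/unblock than click through a firewall UI, here’s a rough Python wrapper around the built-in Windows netsh advfirewall command. It’s just what I’d try to reproduce the experiment above, not the GlassWire rule in the screenshots, and it needs an elevated prompt; the rule name is made up.)

```python
import subprocess

RULE = "WoW-Block-Test"          # arbitrary rule name
TARGET = "137.221.97.120"        # flip to 137.221.96.152 to test the other direction

def block():
    # Outbound block scoped to the realm traffic seen on TCP 3724.
    subprocess.run(
        ["netsh", "advfirewall", "firewall", "add", "rule",
         f"name={RULE}", "dir=out", "action=block",
         f"remoteip={TARGET}", "protocol=TCP", "remoteport=3724"],
        check=True)

def unblock():
    subprocess.run(
        ["netsh", "advfirewall", "firewall", "delete", "rule", f"name={RULE}"],
        check=True)

if __name__ == "__main__":
    block()
    input(f"Blocked {TARGET} - log in, note which IP you land on, then press Enter to remove the rule...")
    unblock()
```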


This is great info - I’ve gone ahead and added another report to the compendium. Thank you!

I’m not familiar with GlassWire or how to read it, but being able to create a deny rule and reproduce the issue consistently on 97.120 is a pretty solid tell to me.

Even though it’s diagnostically useful, it may not be a good idea to keep using a firewall deny rule to work around this issue. It’s possible that 97.120 passes other traffic too, and so far the known scope is only TCP sessions on remote port 3724. I’d suggest removing the rule and instead just reconnecting until you land on 96.152 if you want a stable connection, even though it’s a bit more time-consuming.

Great info - hoping we can get this passed along soon. Thank you again!


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.