Client/server clock sync issue - confirmation and solutions


So, I've been trying to track down a tiny desync issue between my client and server for a long time now, and I think I finally confirmed it. I just want to bounce my thoughts off the forum and see if it all makes sense.

The issue was manifesting as drift at a rate of ~50ms per 10min. It wasn't consistent, though, and seemed to change depending on what machines I used and what the network conditions were. The only times I saw zero drift were when testing on localhost and when testing against one particular friend's machine (same-generation CPU as mine).

I knew that I was well within the realm of CPU variance, but I had a blind spot in my thinking because “I'm dividing the high-resolution timer count by the reported frequency, surely it's fine”. Thus, I spent a long time doing all sorts of tweaks to my netcode and testing in various conditions.

Finally, I made two tiny test apps. One connects to the other, then sends a start byte and starts a timer. After X seconds, it sends an end byte. The other side runs a timer during this period, and both sides report the measured time.

Client: https://github.com/Mmpuskas/Amalgam/blob/7a8ab43e9ec889e4c0d0bf360540635e57283e77/Source/Tests/NetworkTests/Private/ClockTestClientMain.cpp#L1

Server: https://github.com/Mmpuskas/Amalgam/blob/7a8ab43e9ec889e4c0d0bf360540635e57283e77/Source/Tests/NetworkTests/Private/ClockTestServerMain.cpp#L1
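
For illustration, the timing part on each side boils down to something like the following. This is just a rough sketch using std::chrono::steady_clock, not the actual code in the linked files:

#include <chrono>
#include <cstdio>

int main()
{
    using Clock = std::chrono::steady_clock;

    // ... block until the "start" byte arrives on the socket ...
    const Clock::time_point start = Clock::now();

    // ... block until the "end" byte arrives ...
    const Clock::time_point end = Clock::now();

    // Both sides print the elapsed time they measured. Any difference between
    // the two reports comes from clock-rate disagreement plus jitter in how
    // the start/end bytes were delivered.
    const std::chrono::duration<double> elapsed = end - start;
    std::printf("Measured: %.5fs\n", elapsed.count());
    return 0;
}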

Clock test results

Local:
60s:
    Client: 60.00091s
    Server: 60.00090s
600s:
    Client: 600.00012s
    Server: 600.00011s
    
Local client, remote server:
10s:
    Client: 10.00171s
    Server: 10.00149s

60s:
    Client: 60.00016s
    Server: 60.00595s
   
600s:
    Client: 600.00153s
    Server: 600.05837s
    
1200s:
    Client: 1200.00054s
    Server: 1200.11834s

As you can see, the drift doesn't occur when run on the same machine, and it scales pretty linearly when run against the remote, so it seems fair to say that it isn't due to network effects.

Q1: Is it fair to say that my issues come from variance in reported time between machines?

If so, instead of letting my catch-up algorithm detect that the client is 2 ticks behind and jump it forward every 15 minutes, I'd like to have the client move into line with the server's clock rate.

Q2: How can I best do this? Should I just measure the time between received messages, compare it to the server's network tick rate, and adjust the client tick rate using a leaky integrator? Or should I do something more sophisticated?
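
For concreteness, the kind of leaky integrator I'm picturing is roughly this (just a sketch, untested; all names and constants are made up):

// Smooths the observed interval between server messages and uses it as the
// client's tick period, so the client gradually settles onto the server's
// effective clock rate instead of being snapped back into place.
class TickRateAdjuster
{
public:
    explicit TickRateAdjuster(double serverTickPeriodS)
    : smoothedIntervalS(serverTickPeriodS) {}

    // Call once per received server message with the measured time since the
    // previous message (in seconds).
    void onMessageReceived(double measuredIntervalS)
    {
        // Leaky integrator / exponential moving average: leak a small fraction
        // of each new observation into the running estimate.
        smoothedIntervalS += alpha * (measuredIntervalS - smoothedIntervalS);
    }

    // The tick period the client should currently be running at.
    double tickPeriodS() const { return smoothedIntervalS; }

private:
    static constexpr double alpha{0.01};  // integrator gain; tune to taste
    double smoothedIntervalS;
};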


Local clocks drifting 50 milliseconds over 10 minutes is not impossible. Over long time spans, this is why NTP keeps slightly re-adjusting the wall-time clock, but the high-resolution timer isn't adjusted.

Separately, there will be networking variance/jitter, too, and network paths may also shift and get slower (or faster!) over time.

Yes, you generally need to always keep syncing the clock. The easiest way to do this is for the server to tell the client how early/late each message was – “your message was received by me X milliseconds early/late compared to when it was needed.”

The client can then adjust how far ahead it sends packets, to aim for X milliseconds ahead of needed buffering (something that's a bit robust to jitter is good; you probably want a good 30-50 milliseconds of buffering for the most robust system.)

So that you don't adjust the clock too much at any one time, it's generally a good idea to keep a “clock adjustment generation” counter that you send to the server and the server echoes back, and that you bump each time you make a clock adjustment. There was a thread just a few months ago about this in this networking forum.
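
A rough sketch of what that feedback loop might look like on the client (the message layout and all names here are made up for illustration):

#include <cstdint>

// What the server sends back for each client message it times.
struct TimingFeedback
{
    int32_t offsetMs;              // > 0: the message arrived early, < 0: late
    uint8_t adjustmentGeneration;  // echo of the client's current generation
};

// --- client side ---
uint8_t currentGeneration = 0;
int clientAheadMs = 0;           // how far ahead of "needed" the client sends
const int targetBufferMs = 40;   // aim for ~30-50 ms of jitter buffer

void onTimingFeedback(const TimingFeedback& fb)
{
    // Ignore feedback that was measured before our latest adjustment took
    // effect; otherwise we would react to the same error twice and oscillate.
    if (fb.adjustmentGeneration != currentGeneration) {
        return;
    }

    int errorMs = fb.offsetMs - targetBufferMs;
    if (errorMs != 0) {
        clientAheadMs -= errorMs;  // late => send earlier, too early => send later
        ++currentGeneration;       // any feedback older than this is now stale
    }
}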

enum Bool { True, False, FileNotFound };

hplus0603 said:

Local clocks drifting 50 milliseconds over 10 minutes is not impossible. Over long time spans, this is why NTP keeps slightly re-adjusting the wall-time clock, but the high-resolution timer isn't adjusted.

Separately, there will be networking variance/jitter, too, and network paths may also shift and get slower (or faster!) over time.

Yes, you generally need to always keep syncing the clock. The easiest way to do this is for the server to tell the client how early/late each message was – “your message was received by me X milliseconds early/late compared to when it was needed.”

The client can then adjust how far ahead it sends packets, to aim for X milliseconds ahead of needed buffering (something that's a bit robust to jitter is good; you probably want a good 30-50 milliseconds of buffering for the most robust system.)

So that you don't adjust the clock too much at any one time, it's generally a good idea to keep a “clock adjustment generation” counter that you send to the server and the server echoes back, and that you bump each time you make a clock adjustment. There was a thread just a few months ago about this in this networking forum.

I do have the clock adjustment mechanism you mentioned implemented, and it works fine, but I was thinking it would be worth having the client also adjust its tick rate relative to the server's actual clock rate. That way, the client would settle into running at the same rate instead of having to be readjusted every X minutes. Once it's settled, there shouldn't need to be any readjustments unless the network conditions change.

Is this overkill? Should I just accept that readjustments will happen every once in a while due to clock drift?

50 milliseconds in 600 seconds (10 minutes) is 1:12000 of drift. That's better than 0.01% precision. You're just as likely to see the network characteristics shift (because someone else in the house starts/stops playing Netflix or whatever) as you are to get noticeable adjustments from the clock. Whether this is worth writing, debugging, and maintaining code for is entirely your own choice. I can't tell you what “good enough” means for you :-)

Personally, I would work on making adjustments less noticeable, rather than trying to avoid adjustments entirely, because you really can't avoid them entirely. But if you have better-than-frame timestamps for network packets, there's no reason you have to let the clock drift get as big as a single frame, if everything else is steady.
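
As a sketch of what “less noticeable” can look like: instead of applying a correction in one jump, slew it in over several ticks (names and constants are illustrative only):

#include <algorithm>

double pendingCorrectionS = 0.0;        // signed correction still to be applied
const double maxSlewPerTickS = 0.0005;  // apply at most 0.5 ms of it per tick

// Called once per tick; returns how long the next tick should actually be.
double nextTickPeriod(double nominalPeriodS)
{
    // Take only a small slice of the outstanding correction each tick so no
    // single frame visibly jumps.
    double slice = std::clamp(pendingCorrectionS, -maxSlewPerTickS, maxSlewPerTickS);
    pendingCorrectionS -= slice;
    return nominalPeriodS + slice;
}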

enum Bool { True, False, FileNotFound };

hplus0603 said:

50 milliseconds in 600 seconds (10 minutes) is 1:12000 of drift. That's better than 0.01% precision. You're just as likely to see the network characteristics shift (because someone else in the house starts/stops playing Netflix or whatever) as you are to get noticeable adjustments from the clock. Whether this is worth writing, debugging, and maintaining code for is entirely your own choice. I can't tell you what “good enough” means for you :-)

Personally, I would work on making adjustments less noticeable, rather than trying to avoid adjustments entirely, because you really can't avoid them entirely. But if you have better-than-frame timestamps for network packets, there's no reason you have to let the clock drift get as big as a single frame, if everything else is steady.

Good point, and I hadn't thought of sub-tick timestamps. Thanks!

