An offset in time saves nine ⏰🌪️
A look at the 1840s Railway Mania, clocks in the Linux kernel, Time Namespaces, Network Time Protocol (NTP)
All in due time.
1840s Railway Mania, GMT and NTP (Network Time Protocol)
Before UTC (Coordinated Universal Time), we had GMT (Greenwich Mean Time) as the global reference for coordinated timekeeping. Before GMT, local times varied significantly across regions in the United Kingdom. As the railways grew, it became really tough for train stations to coordinate schedules because every region kept its own time. This led to far wider use of a standard time across the railway network.
The rise in railway construction was due to the 1840s Railway Mania, a period in British history marked by rapid construction of railways and speculative investment in railway companies. The period led to both economic growth and significant financial losses, and it also gave rise to synchronized timekeeping.
Synchronized timekeeping is also seen in distributed systems, which use the Network Time Protocol to keep time. The protocol operates on a hierarchical system of time sources, allowing devices to synchronize their clocks with a high degree of accuracy.
Before we get deeper into NTP, let’s go back to clocks in the Linux kernel for a little bit.
Time Keeping in Linux
19th November 2024
While reading the man pages for unshare, I noticed a “--time” flag which I hadn’t known about before. When we think of containers we think of creating namespaces for mount, user, UTS etc., but we don’t usually think about time, because time namespaces are a much newer feature in the Linux kernel. The feature was proposed in 2018 and released in 2020.
A time namespace seeks to provide a virtualised view of the system time.
Clocks in the Linux Kernel
There are a few different clocks in the Linux Kernel which are used by timer objects to mark the progress of a timer.
Realtime Clock
It’s like your everyday wall clock, but for system-wide time. This clock is settable and is defined by CLOCK_REALTIME in the Linux kernel. It is also affected by NTP (Network Time Protocol), which can make the time jump forwards or backwards. Such jumps cause problems when we try to order events chronologically by their timestamps.
In that case it makes more sense to use a monotonic clock. There is also a CLOCK_REALTIME_ALARM, which can be used to wake the system up if it is suspended.
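To make the difference concrete, here is a minimal Python sketch that times a short sleep with both clocks. Python is just my pick for illustration, and the helper name measure is made up for this example. It assumes a Linux host, where time.clock_gettime is available.

```python
import time

def measure(clock_id, seconds=0.05):
    """Return how long a short sleep took, as seen by the given clock."""
    start = time.clock_gettime(clock_id)
    time.sleep(seconds)
    return time.clock_gettime(clock_id) - start

# CLOCK_REALTIME can be set or stepped while we are measuring, so this
# difference is not guaranteed to be meaningful (it could even be negative).
wall_elapsed = measure(time.CLOCK_REALTIME)

# CLOCK_MONOTONIC never goes backwards, which makes it the safer choice
# for measuring intervals and ordering events.
mono_elapsed = measure(time.CLOCK_MONOTONIC)

print(f"realtime : {wall_elapsed:.4f} s")
print(f"monotonic: {mono_elapsed:.4f} s")
```

On a quiet machine both numbers should come out nearly identical. They only diverge when something adjusts the wall clock in the middle of the measurement.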
Monotonic Clock
This clock counts forward from an arbitrary starting point (on Linux, roughly the time of boot) and cannot be set by the user, so it never jumps backwards. It is defined by CLOCK_MONOTONIC. NTP can still slew its rate gradually, and it does not advance while the system is suspended.
CLOCK_MONOTONIC_RAW, on the other hand, is not subject to NTP adjustments at all. It follows the raw hardware counter, so it may drift relative to true time.
CLOCK_BOOTTIME is also a monotonically increasing clock, but this one also counts the time for which the system was suspended. One can see this clock in use in the uptime command.
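Since the only difference between the two clocks is suspend time, subtracting one from the other gives a rough idea of how long the machine has been asleep since boot. The sketch below assumes Linux and Python 3.7+ (for time.CLOCK_BOOTTIME). The function names are mine, not a standard API.

```python
import time

def suspended_seconds():
    """Estimate time spent suspended since boot: CLOCK_BOOTTIME minus CLOCK_MONOTONIC."""
    monotonic = time.clock_gettime(time.CLOCK_MONOTONIC)
    boottime = time.clock_gettime(time.CLOCK_BOOTTIME)
    return boottime - monotonic

def read_proc_uptime(path="/proc/uptime"):
    """Return the first field of /proc/uptime: seconds since boot, suspend included."""
    with open(path) as f:
        return float(f.read().split()[0])

print(f"suspended for roughly {suspended_seconds():.1f} s since boot")
print(f"uptime according to /proc/uptime: {read_proc_uptime():.1f} s")
```

On a server or VM that never suspends, the first number should be close to zero. Inside a time namespace with different offsets for the two clocks, the difference reflects those offsets as well, so treat it as an estimate.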
Process and Thread CPUTime Clocks
These clocks, denoted by CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID respectively, measure the CPU time a process or a thread has consumed.
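CPU time and elapsed time are easy to mix up, so here is a small sketch of the difference, assuming Linux. The busy_loop helper and the numbers in it are arbitrary choices for this example.

```python
import time

def busy_loop(iterations=200_000):
    """Burn a little CPU so the CPU-time clock has something to count."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total

cpu_start = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)
wall_start = time.clock_gettime(time.CLOCK_MONOTONIC)

time.sleep(0.1)   # sleeping uses elapsed time but almost no CPU time
busy_loop()       # computing uses both

cpu_used = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID) - cpu_start
wall_used = time.clock_gettime(time.CLOCK_MONOTONIC) - wall_start

print(f"CPU time used: {cpu_used:.4f} s")
print(f"elapsed time : {wall_used:.4f} s")
```

The CPU figure should be noticeably smaller than the elapsed figure because the sleep barely registers on it. Swapping in time.CLOCK_THREAD_CPUTIME_ID gives the same measurement for the calling thread only.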
Time Namespaces and Offsets
Offsets for the monotonic and boottime clocks can be set per namespace, which creates virtualized time for that particular namespace.
unshare --time /bin/bash
If you look up the process ID of the above shell using ps, you can inspect its time offsets as follows.
cat /proc/3926/timens_offsets
monotonic 0 0
boottime 0 0
Let’s add two days (in seconds) to monotonic.
echo "monotonic 172800 0" > /proc/$$/timens_offsets
The above will offset the monotonic clock of the namespace by two days. Note that the kernel only accepts this write while the new namespace is still empty, that is, before the shell has started any child process in it. These offsets help create virtualized time for a containerized environment, which can be very useful when testing distributed systems that rely on a shared standard clock to stay synchronized.
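Each line of the offsets file has the shape "clock seconds nanoseconds". As a rough sketch, the helpers below parse that format and build a line for a given number of days. Both function names are made up for this post, and nothing here writes to /proc, so it runs without any privileges.

```python
def parse_timens_offsets(text):
    """Parse the contents of a timens_offsets file into {clock: (secs, nanosecs)}."""
    offsets = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            clock, secs, nanosecs = parts
            offsets[clock] = (int(secs), int(nanosecs))
    return offsets

def offset_line(clock, days=0, hours=0):
    """Build a 'clock seconds nanoseconds' line for the given offset."""
    seconds = days * 86_400 + hours * 3_600
    return f"{clock} {seconds} 0"

sample = "monotonic 0 0\nboottime 0 0\n"
print(parse_timens_offsets(sample))      # {'monotonic': (0, 0), 'boottime': (0, 0)}
print(offset_line("monotonic", days=2))  # monotonic 172800 0
```

The second print reproduces the line used in the echo command above: two days is 2 × 86,400 = 172,800 seconds.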
If we do the same for boottime, the change will be reflected in the output of the uptime command.
20th November 2024
Time namespaces were primarily created to virtualize the values of the monotonic and boottime clocks. A change in these values might affect the functioning of time-sensitive applications. Time namespaces provide an environment where the time values can be managed more closely so that the application runs as expected.
A Network Time Protocol (NTP) refresher
NTP ensures that multiple systems in a network have synchronized time through a hierarchy of time sources. Each level of the hierarchy is called a stratum. The stratum of an NTP server tells us how many hops it is from a reference clock, which gives a rough indication of its accuracy.
ntpstat
Let’s run ntpstat on our Ubuntu VM.
ntpstat
synchronised to NTP server (12.123.12.123) at stratum 2
time correct to within 32 ms
polling server every 128 s
The above output lets us know which server we have synced our time to, and at which stratum.
ntpq
Let’s use ntpq to see which NTP peers we can sync our time with, along with some other parameters which will help us understand NTP a little better.
ntpq -p
 remote                refid        st t when poll reach   delay  offset jitter
===============================================================================
 0.ubuntu.pool.ntp.org .POOL.       16 p    -   64     0   0.000  +0.000  0.000
 1.ubuntu.pool.ntp.org .POOL.       16 p    -   64     0   0.000  +0.000  0.000
...
-ntp1.fake.net         123.45.67.89  2 u    2   64     1 175.320 -12.695  0.987
-ntp7.fake.net         123.45.67.90  2 u    -   64     1  43.524  +6.680 12.414
+172.100.50.25         123.45.67.91  2 u    1   64     1  14.844  -6.596  0.856
-139.50.25.100         123.45.67.92  2 u    -   64     1  28.610  +1.980 11.222
*ntp-fake.time         123.45.67.93  2 u    -   64     1  42.856  +4.092 12.650
+cloud.fake.com        10.0.0.1      3 u    -   64     1  26.457  +9.343 11.432
-cloud.fake.net        10.0.0.2      3 u    -   64     1  26.443  +9.250 11.497
 ec2-3-0-0-1           10.0.0.3      4 u    -   64     1  48.324  +4.409 11.818
 123.45.67.94          .PPS.         1 u    1   64     1  44.844  -3.736  1.267
 ntp-pool.net          123.45.67.95  4 u    -   64     1  43.442  +6.946 12.873
 ec2-13-0-0-2          123.45.67.96  2 u    -   64     1  48.738  +3.726 11.613
 40.50.60.70           25.0.0.1      3 u    1   64     1  36.059 -16.506  0.671
...
I ran the above command in a Linux VM running on my Mac to see which peers my local NTP server is communicating with.
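The character in front of each remote is a tally code: * marks the peer the system is currently synced to, + marks a candidate and - marks an outlier. As a rough sketch, here is how a few of the rows above could be parsed to pick out the system peer. The parse_peer helper is my own naming, and it only handles the three tally codes that appear in this output.

```python
def parse_peer(line):
    """Split one peer row of `ntpq -p` output into named fields."""
    # Only the tally codes seen in this post are handled here.
    tally = line[0] if line[0] in "*+-" else ""
    fields = line[len(tally):].split()
    remote, refid, st, kind, when, poll, reach, delay, offset, jitter = fields
    return {
        "tally": tally,
        "remote": remote,
        "refid": refid,
        "stratum": int(st),
        "reach": int(reach, 8),      # reach is printed in octal
        "delay_ms": float(delay),
        "offset_ms": float(offset),
        "jitter_ms": float(jitter),
    }

rows = [
    "-ntp1.fake.net 123.45.67.89 2 u 2 64 1 175.320 -12.695 0.987",
    "+172.100.50.25 123.45.67.91 2 u 1 64 1 14.844 -6.596 0.856",
    "*ntp-fake.time 123.45.67.93 2 u - 64 1 42.856 +4.092 12.650",
]

peers = [parse_peer(row) for row in rows]
system_peer = next(p for p in peers if p["tally"] == "*")
print(system_peer["remote"], system_peer["offset_ms"])  # ntp-fake.time 4.092
```

In this output the system peer is ntp-fake.time, sitting 4.092 ms away from the local clock.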
Stratum and Type of NTP Peer
“st” here refers to stratum. The stratum of the first few peers is 16, the highest possible value, which NTP uses to mark a source as unsynchronized. These .POOL. rows are not individual servers but entries for the pool servers which the Ubuntu distribution uses to sync time by default. In the event of an NTP server failure, this pool is helpful because a healthy server is assigned directly from the pool based on the client’s location. Also, a high stratum doesn’t always mean that the time will be inaccurate. We will look at this argument later in this post.
Reach
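Reach is a bitmask, displayed in octal, which lets us know how many of the most recent polls of the NTP server got a reply. Each bit stands for one of the last eight polls: 1 is success, 0 is failure. A value of 377 means all eight succeeded, while the 1 in the output above means only the latest poll has been answered so far.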
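Decoding the value by hand gets tedious, so here is a minimal sketch that turns the octal reach value into the individual poll results. The function names are made up for this post.

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, newest first (True = reply received)."""
    value = int(str(reach_octal), 8)
    return [bool((value >> bit) & 1) for bit in range(8)]

def success_rate(reach_octal):
    """Fraction of the last eight polls that got a reply."""
    return sum(decode_reach(reach_octal)) / 8

print(decode_reach("1"))    # only the newest poll succeeded
print(success_rate("377"))  # 1.0, all eight polls succeeded
print(success_rate("1"))    # 0.125
```

A freshly started daemon, like the one in my VM, shows 1 and then fills up towards 377 as more polls are answered.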
Delay, Offset and Jitter
Delay is the round-trip delay to the NTP server/peer in question. Offset is the time offset between the NTP server and the client, and jitter is the variation of the offset over time. All three are in milliseconds. These parameters help determine the time accurately on the NTP client.
To learn more about these calculations, check out this article.
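For a quick feel of the arithmetic, here is a sketch of the standard NTP offset and delay formulas, which are computed from four timestamps taken during one request/response exchange. The function name and the example numbers are mine.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """
    t1: client sends the request    (client clock)
    t2: server receives the request (server clock)
    t3: server sends the reply      (server clock)
    t4: client receives the reply   (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# Made-up example in milliseconds: the client clock is 50 ms behind the
# server, each network leg takes 10 ms and the server needs 1 ms to reply.
offset, delay = ntp_offset_and_delay(t1=0, t2=60, t3=61, t4=21)
print(offset, delay)  # 50.0 20
```

The offset formula assumes the network delay is the same in both directions. When the path is asymmetric, the computed offset is off by half the difference between the two legs.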
NTP and Clock Skew
Consistent timekeeping is important in distributed systems, where time-sensitive processes such as operations, logging and node coordination can be affected by incorrect time synchronization.
Understanding clock skew
Clock skew is measured using the offset between the local clock and the reference clock. If the offset is less than 128 ms, slewing is used to adjust the clock gradually and avoid sudden jumps. If the time difference is more than a particular threshold (e.g. 600 s), stepping is used to set the time on the local system in one go.
These drifts in time can cause performance issues in certain applications. You can also check out this article, where running ntpd in slew mode was causing large inaccuracies in the local time.
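To see why slewing a large offset can leave the clock wrong for a long time, here is a back-of-the-envelope sketch. The 128 ms threshold comes from the text above. The 500 ppm maximum slew rate is an assumption on my part (it is the usual limit for ntpd), and the function names are made up.

```python
STEP_THRESHOLD_S = 0.128  # offsets below this are slewed (from the text above)
MAX_SLEW_PPM = 500        # assumed maximum slew rate, in parts per million

def choose_adjustment(offset_s, step_threshold_s=STEP_THRESHOLD_S):
    """Decide between slewing and stepping for a given offset in seconds."""
    return "slew" if abs(offset_s) < step_threshold_s else "step"

def slew_duration_s(offset_s, max_slew_ppm=MAX_SLEW_PPM):
    """Minimum time needed to slew away an offset at the given rate."""
    return abs(offset_s) / (max_slew_ppm * 1e-6)

print(choose_adjustment(0.050))                      # slew
print(choose_adjustment(5.0))                        # step
print(choose_adjustment(5.0, step_threshold_s=600))  # slew, with a raised threshold
print(f"{slew_duration_s(0.128):.0f} s to slew 128 ms")
print(f"{slew_duration_s(600) / 86_400:.1f} days to slew 600 s")
```

At 500 ppm the clock gains or loses at most half a millisecond per second, so even a 128 ms offset takes over four minutes to correct, and an offset near 600 s takes about two weeks. During that whole period the local time stays inaccurate.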
Time synchronization in Linux containers
Containers in Linux inherit the time of the underlying kernel, which can cause issues during migrations if the two hosts have different times. To avoid issues like this, time namespaces were introduced.
Chronos for a day ⏰🌪️
Because of all the issues discussed above, it is clear that we should test our applications against inconsistencies in network time to ensure application-level SLAs. In the Kubernetes world, we can use chaos tests to simulate chaotic changes in time for an application and test its behavior against them.
Time for some chaos.
(continued in the next batch of relevant entries …)
More References
https://www.baeldung.com/linux/timekeeping-clocks
https://unix.stackexchange.com/questions/646318/how-are-time-namespaces-supposed-to-be-used
https://www.reddit.com/r/Garmin/comments/dzcfra/garmin_manufacturers_gps_devices_yet_its_own_ntp/
https://stackoverflow.com/questions/61947696/date-and-time-synchronization-among-the-pods-and-host-in-kubernetes/61955680
https://medium.com/@yildirimabdrhm/kubernetes-timezone-management-8cc139b01f9d
https://forums.docker.com/t/docker-time-synchronization-issue/97436
https://stackoverflow.com/questions/57306639/correcting-clock-skew-in-a-gke-cluster
https://developer.harness.io/docs/chaos-engineering/use-harness-ce/chaos-faults/linux/linux-time-chaos/