What Is Packet Loss And How Does It Affect Your Network?
What is packet loss?
In this article, we'll look at what packet loss is, how to measure it, and how to make it go away. Simply put, packet loss is when “one or more packets fail to make their destination”, and in many cases it is a minor background issue. Many applications are designed to tolerate it, either by accepting a certain level of loss or by relying on TCP’s built-in retransmission.
Acceptable packet loss?
While it sounds like an oxymoron, since no packet loss should be acceptable, application and protocol designers know that packet loss happens and design their protocols to tolerate it. For example, you could probably lose 1% of all the packets involved in a SIP VoIP call and have no issues.
What causes packet loss?
One of the most common causes of packet loss is congestion: when a link runs close to its maximum throughput, devices often start dropping packets. Other causes include faulty hardware and radio-based (wireless) issues. In some cases, devices drop packets intentionally, for example to limit traffic throughput or for routing purposes.
What are the effects of packet loss?
Packet loss will generally reduce the speed or throughput of a given connection. For latency-sensitive protocols and applications such as streaming video or voice over IP, where timely delivery matters more than accuracy, this can result in a loss or reduction in quality. Even when an application tolerates it, packet loss still has some minor knock-on effects, since processing the additional network overhead may increase CPU load.
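To get a feel for how strongly loss can limit throughput, one widely used rule of thumb (not taken from this article) is the Mathis approximation for a single long-lived TCP connection: throughput is roughly (MSS / RTT) × (C / √p), where p is the loss rate and C is about 1.22. The sketch below is only an estimate under that model; the function name and the example MSS and RTT values are illustrative assumptions.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Rough upper bound on steady-state TCP throughput, in bits per second.

    Assumes the Mathis model: rate ~ (MSS / RTT) * (C / sqrt(p)), C = sqrt(3/2).
    This is an approximation, not a measurement of any real link.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in the range (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Assumed example values: 1460-byte MSS and a 50 ms round trip.
    for p in (0.0001, 0.001, 0.01, 0.05):
        mbps = mathis_throughput_bps(1460, 0.05, p) / 1e6
        print(f"loss {p:.2%}: about {mbps:.1f} Mbit/s")
```

Under this model, throughput falls with the square root of the loss rate, so quadrupling the loss roughly halves the achievable rate.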
How to check packet loss?
Packet loss is a problem that can affect any network, slowing transfers to a halt and making real-time streams such as VoIP or video unusable. It should be avoided wherever possible and is a symptom of network issues such as a lack of capacity or failing devices.
Our broken local network
In the diagram below, I have a small segment of a larger network where we suspect packet loss. The two computers are connected through a switch on the same subnet. No routers are involved.
How to do a packet capture?
Ask an engineer what the first step to solving any problem is, and the answer you’ll probably hear back is ‘replicate it’, because if you can’t measure a problem, you can’t tell whether you have fixed it.
The tool we are going to use for this is ping, a command-line tool available on most Windows, macOS, and Linux computers.
From 192.168.0.2, we will ping 192.168.0.3 ten times with the following command (on Windows, use -n 10 in place of -c 10):
Mac:~ $ ping 192.168.0.3 -c 10
PING 192.168.0.3 (192.168.0.3): 56 data bytes
64 bytes from 192.168.0.3: icmp_seq=0 ttl=255 time=4.841 ms
64 bytes from 192.168.0.3: icmp_seq=1 ttl=255 time=1.980 ms
64 bytes from 192.168.0.3: icmp_seq=2 ttl=255 time=10.877 ms
64 bytes from 192.168.0.3: icmp_seq=3 ttl=255 time=14.607 ms
Request timeout for icmp_seq 4
64 bytes from 192.168.0.3: icmp_seq=5 ttl=255 time=1.729 ms
64 bytes from 192.168.0.3: icmp_seq=6 ttl=255 time=1.766 ms
Request timeout for icmp_seq 7
64 bytes from 192.168.0.3: icmp_seq=8 ttl=255 time=31.202 ms
64 bytes from 192.168.0.3: icmp_seq=9 ttl=255 time=3.034 ms

--- 192.168.0.3 ping statistics ---
10 packets transmitted, 8 packets received, 20.0% packet loss
round-trip min/avg/max/stddev = 1.475/7.605/31.202/8.901 ms
Looks like we are having a bad time on our network with 20% packet loss spotted. Best send someone to look at it!
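If you want to track this figure over time rather than read it by eye, the summary line of the ping output can be parsed. The following is a minimal Python sketch, assuming the macOS/Linux style summary shown above (Windows prints a different summary, which this does not handle). The function name and the 1% threshold, borrowed from the VoIP example earlier, are illustrative assumptions.

```python
import re

# Matches "10 packets transmitted, 8 packets received" (macOS/BSD)
# and "10 packets transmitted, 8 received" (common Linux ping).
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")

# Hypothetical alert threshold, taken from the 1% VoIP figure in the text.
ACCEPTABLE_LOSS_PERCENT = 1.0


def packet_loss_percent(ping_output: str) -> float:
    """Compute packet loss from the sent/received counts in ping's summary."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100 * (sent - received) / sent


if __name__ == "__main__":
    sample = "10 packets transmitted, 8 packets received, 20.0% packet loss"
    loss = packet_loss_percent(sample)
    status = "investigate" if loss > ACCEPTABLE_LOSS_PERCENT else "ok"
    print(f"{loss:.1f}% packet loss ({status})")
```

Computing the percentage from the two counts, instead of trusting the printed figure, keeps the check independent of how a given ping implementation rounds its output.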
Our broken routed network
Now let’s look at this in a larger network. We are going to go from computer to computer via two routers.
How to identify and monitor packet loss
Let’s try the ping again from 10.0.0.2 to 10.2.0.2:
Mac:~ $ ping 10.2.0.2 -c 10
PING 10.2.0.2 (10.2.0.2): 56 data bytes
64 bytes from 10.2.0.2: icmp_seq=0 ttl=255 time=4.841 ms
64 bytes from 10.2.0.2: icmp_seq=1 ttl=255 time=1.980 ms
64 bytes from 10.2.0.2: icmp_seq=2 ttl=255 time=10.877 ms
64 bytes from 10.2.0.2: icmp_seq=3 ttl=255 time=14.607 ms
Request timeout for icmp_seq 4
64 bytes from 10.2.0.2: icmp_seq=5 ttl=255 time=1.729 ms
64 bytes from 10.2.0.2: icmp_seq=6 ttl=255 time=1.766 ms
Request timeout for icmp_seq 7
64 bytes from 10.2.0.2: icmp_seq=8 ttl=255 time=31.202 ms
64 bytes from 10.2.0.2: icmp_seq=9 ttl=255 time=3.034 ms

--- 10.2.0.2 ping statistics ---
10 packets transmitted, 8 packets received, 20.0% packet loss
round-trip min/avg/max/stddev = 1.475/7.605/31.202/8.901 ms
As you can see, there is 20% loss once again. This time, though, there are more places where things could go wrong, so it is worth using another tool to get more information. We can use a tool called mtr and run it against 10.2.0.2 to get these results:
$ mtr --report 10.2.0.2
HOST: example    Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 10.0.0.1     0.0%    10    2.8   2.1   1.9   2.8   0.3
  2. 10.1.0.2     0.0%    10    3.2   2.6   2.4   3.2   0.3
  3. 10.2.0.2    20.0%    10    9.8  12.2   8.7  18.2
This output gives us an important finding. The loss first appears after hop 2, so we need to send the engineer to look at the link between the second router and 10.2.0.2, as it's probably broken or congested. That link should be the first place you look.
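The same reading of the report can be automated. Below is a rough Python sketch that parses the hop rows of an mtr report and returns the first hop from which loss persists all the way to the destination. The function names are hypothetical, and the "loss must persist to the last hop" rule is an assumption added here: routers sometimes rate-limit their own ICMP replies, so loss at a single intermediate hop that clears up further along is usually not a real forwarding problem.

```python
import re

# Hop rows look like "  3. 10.2.0.2  20.0%  10 ..."; some mtr versions
# print "  3.|-- 10.2.0.2  20.0% ...", so the "|--" marker is optional.
HOP_RE = re.compile(r"^\s*(\d+)\.\s*(?:\|--\s*)?(\S+)\s+([\d.]+)%")


def parse_hops(report: str) -> list[tuple[int, str, float]]:
    """Return (hop number, host, loss percent) for each hop row."""
    hops = []
    for line in report.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hops


def first_persistent_loss(hops):
    """Return the earliest hop whose loss continues through to the final hop."""
    first = None
    for hop in reversed(hops):
        if hop[2] > 0:
            first = hop
        else:
            break
    return first


if __name__ == "__main__":
    report = """HOST: example Loss% Snt Last Avg Best Wrst StDev
  1. 10.0.0.1 0.0% 10 2.8 2.1 1.9 2.8 0.3
  2. 10.1.0.2 0.0% 10 3.2 2.6 2.4 3.2 0.3
  3. 10.2.0.2 20.0% 10 9.8 12.2 8.7 18.2"""
    print(first_persistent_loss(parse_hops(report)))
```

For the report in this article, the function points at hop 3, which matches the conclusion that the link between the second router and 10.2.0.2 deserves the first look.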
Understanding the issue to stop packet loss
Since there is no single cause of packet loss, you need to start monitoring for other symptoms. Is there congestion on the segment in question? Are you seeing any port errors on the segment? These will serve as valid clues for what might be going wrong and will influence your next actions.
How to test
There are a few common packet loss scenarios that you can fix by testing changes within your environment. If a link is sitting at over 90% utilization, you may want to provision extra capacity or investigate the source of the utilization spike. This is an appropriate time to use flow-based traffic monitoring, as the cause might be a backup job or malicious network usage, such as CryptoLocker encrypting a network drive. The router’s own resources (such as CPU) could also be causing the problem, so be sure to check those in your monitoring system.
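As a worked example of the 90% figure, link utilization can be estimated from two readings of an interface byte counter (for instance, SNMP octet counters) taken a known interval apart. This is a minimal Python sketch under that assumption; the function name, parameters, and sample numbers are illustrative and not tied to any particular monitoring product.

```python
def link_utilization_percent(
    octets_start: int,
    octets_end: int,
    interval_s: float,
    link_speed_bps: float,
    counter_bits: int = 64,
) -> float:
    """Estimate average link utilization between two counter samples.

    The modulo handles a single counter wrap (e.g. a 32-bit counter
    rolling over between samples); multiple wraps cannot be detected.
    """
    if interval_s <= 0 or link_speed_bps <= 0:
        raise ValueError("interval_s and link_speed_bps must be positive")
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 * 100 / (interval_s * link_speed_bps)


if __name__ == "__main__":
    # Assumed sample: 6.75 GB transferred in 60 s over a 1 Gbit/s link.
    util = link_utilization_percent(0, 6_750_000_000, 60, 1_000_000_000)
    flag = "congested" if util >= 90 else "ok"
    print(f"{util:.1f}% utilized ({flag})")
```

Note that this is an average over the sampling interval, so short bursts that fill the link and cause drops can hide behind a moderate-looking figure; shorter intervals give a truer picture.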
What to do if the issue is happening outside of your network
It would take a miracle for packet loss to only ever happen inside your own network. A packet loss issue may instead point to a fault with your ISP or another ISP on the network path between you and your destination. At this point, your best course of action is to raise a ticket with your ISP and attach the captured ping/MTR output to help demonstrate and replicate the issue. The more detail you include, the higher the chance of getting it successfully resolved.
How to reduce packet loss
Here comes the product pitch you have all been waiting for: the best way to reduce packet loss is by monitoring it. If packet loss is a symptom of a larger issue, such as a lack of capacity or hardware failure, monitoring deployed across your network can recognize these problems and alert you immediately. Ultimately, the best cure for packet loss is not letting your network fall behind in terms of maintenance or planning.
How to monitor for packet loss
With Opsview, you can be monitoring network packet loss very quickly. We have a range of fully supported monitoring solutions tailored exactly to the needs of your organization. Try Opsview for yourself with a free trial.