Analysis of long-running Dante process
Posted by Inferno Nettverk A/S, Norway on Thu Apr 23 14:42:08 MEST 2015
Making software run properly requires testing and we continuously run tests internally for software we develop in order to uncover as many bugs and problems as possible before the software is released. Most problems are usually detected by our internal testing, but testing rarely manages to produce all the diverse configurations and environments that the software might be used in when in production.
Fortunately, now and then there are Dante users that allow us to test and monitor our software on their machines when we are getting close to a new release, and most of our internal tests are passing. Apart from allowing us to see how Dante behaves in a specific production environment, this testing has sometimes allowed us to find and fix problems that were not found in our internal tests, and often provides an opportunity to improve our internal test systems based on the newly discovered problems.
In one case, during preparation for the release of Dante 1.4.0, the machine we were to deploy the new version on was running Dante 1.3.2 and had been running the same server instance uninterrupted for almost one year. When the main Dante server process was about to be terminated so that we could upgrade it to Dante 1.4.0, it had been running for 326 days, 18 hours and 34 minutes.
This was a good opportunity to observe the behavior of a long-running Dante session, so before terminating the 1.3.2 processes, we spent some days collecting data on the machine.
Data collection and analysis
We have a set of scripts that we use to collect various system data, based on tools such as ifconfig, netstat, ps, etc. This makes it possible to determine the load on the machine, and in some cases, additional information about the number of active Dante sessions.
The figure above shows the incoming and outgoing transfer rates on the machine, with almost all of the traffic passing through the Dante server (it was a server dedicated to running Dante). For the measured period, the incoming and outgoing rates overlap, indicating as expected that most of the traffic passes through the machine, rather than originating or terminating at the machine. The measured rates vary between 56 Mbps and 198 Mbps, with the average being 108 Mbps.
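As an illustration of how rates like these can be derived, the following is a minimal sketch, not taken from the collection scripts mentioned above. It assumes that two samples of an interface byte counter (such as the RX or TX byte totals reported by ifconfig or netstat) have been recorded a known number of seconds apart. The function name and parameters are hypothetical.

```python
def rate_mbps(bytes_before: int, bytes_after: int, interval_s: float,
              counter_bits: int = 64) -> float:
    """Average rate in megabits per second between two byte-counter samples.

    counter_bits is an assumption about the width of the counter; it is
    only used to correct for a single wrap-around between the samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = bytes_after - bytes_before
    if delta < 0:
        # The counter wrapped around once between the two samples.
        delta += 2 ** counter_bits
    # bytes -> bits, per second, expressed in units of 10^6 bits.
    return delta * 8 / interval_s / 1_000_000


# 135,000,000 bytes in 10 seconds is 13.5 MB/s, or 108 Mbps.
print(rate_mbps(0, 135_000_000, 10))
```

Sampling both the incoming and the outgoing counter in this way, at a fixed interval, gives the two curves that can then be compared for overlap.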
A comparison of the packet rates for the different IP protocols is shown in the above figure. Around 25,000 packets are received and sent each second, with a slightly higher number being received. The traffic is largely split between TCP and UDP, with the number of UDP packets being slightly larger. A relatively small amount of ICMP traffic can also be seen from the lines at the very bottom of the diagram.
An overview of the processes running on the machine can be seen in the figure above. Between around 300 and 500 processes run on the machine, with around half of them being Dante processes. Most of the variability in the number of processes comes from changes in the number of Dante processes, as Dante adapts to changes in the client load on the machine.
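A process count of this kind can be sampled from ps output. The sketch below is only an illustration under stated assumptions, not the script used for the figure: it assumes the Dante server processes appear under the command name sockd, and that `ps -e -o comm=` is available on the system. The helper names are hypothetical.

```python
import os
import subprocess
from collections import Counter


def count_by_command(ps_output: str) -> Counter:
    """Count processes per command name from `ps -e -o comm=` style output."""
    names = (line.strip() for line in ps_output.splitlines())
    # Some systems print the full path of the command; keep only the basename.
    return Counter(os.path.basename(name) for name in names if name)


def sample_process_counts(server_name: str = "sockd") -> tuple[int, int]:
    """Return (server processes, all processes) for one sample.

    server_name defaults to "sockd", which is an assumption about how the
    Dante server processes are named on the machine being sampled.
    """
    out = subprocess.run(["ps", "-e", "-o", "comm="],
                         capture_output=True, text=True, check=True).stdout
    counts = count_by_command(out)
    return counts[server_name], sum(counts.values())
```

Calling sample_process_counts() at a fixed interval and logging both numbers with a timestamp would give the data needed for a plot of Dante versus non-Dante processes over time.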
The CPU usage on the machine is shown in the above plot, and it is fairly low. Dante uses a total of between around 8% and 20% of the CPU for all its processes. The total CPU usage for the non-Dante processes is also fairly low and stable, being between 4% and 10%.
The above figure shows the memory usage on the machine. Most of the memory is in use, but there is very little swap usage.
From the data on this machine, one can observe that Dante, at least for the type of client load handled by this machine, is able to run stably for a long time while handling a fairly large amount of traffic. One can also see that Dante adjusts the number of processes dynamically as the need changes.
If the main Dante server process, or any of the other long-running Dante processes, had suffered from even modest resource leaks, the effects should have been easily visible after handling this amount of traffic on a daily basis for almost one year, assuming the processes were still running at all. One would also expect Dante or the machine to have crashed long ago if such problems were generally present in Dante.
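One simple way to look for such a leak in collected data is to fit a straight line to the resource usage of a process over time and check whether the slope is clearly positive. The following is a minimal sketch of that idea using an ordinary least-squares fit. It is not the analysis used for this article, and the function name and the sample format (time in days, usage in any fixed unit) are assumptions made for the example.

```python
def growth_per_day(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of usage versus time.

    samples holds (time_in_days, usage) pairs, for example the resident
    memory of one process as reported by ps at regular intervals.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples have the same timestamp")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    return cov / var_t


# Usage that grows by 2 units per day gives a slope of 2.0.
print(growth_per_day([(0, 100), (1, 102), (2, 104)]))
```

A slope close to zero over many days is consistent with no leak, while even a small positive slope adds up over a year: a process growing by 2 units per day would have grown by roughly 650 units after 326 days.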
Part of the reason for the long lifetime of the processes might be that Dante has been designed to, when possible, avoid operations such as dynamic memory allocation, as they can often lead to gradual resource leaks and performance degradation when not implemented fully correctly. This design choice in Dante can often make code less complicated, though the trade-off is that it may use somewhat more memory than strictly necessary. The benefits can however likely be seen in the long lifetime of this Dante server instance, which was able to handle a large number of clients and a large amount of traffic month after month.
In sum, there is nothing remarkable about the data from this machine. The machine is under load and it is able to handle the load with a good margin for spikes in client sessions and traffic. Both the machine and Dante appear to be running well, and there is no indication that it would not have continued to run for another year or two if we had not terminated it in order to test a new Dante version.
This type of testing and analysis is essential for the development of well working software.
If you are in a situation where you can help us with this type of testing, feel free to contact us at the (non-public) email address email@example.com. While being able to access and test software remotely is very helpful, having access to performance data from a production environment is also very useful. We have a set of shell scripts that we use for data collection (based on tools such as ps, ifconfig, etc.). If you think you might be able to collect this type of system information from one of your systems running Dante, please feel free to contact us on the address above, and we will send you a copy of the scripts we use for this.
In addition to helping us produce releases that have fewer bugs, the Dante users that help us with testing benefit in that they are less likely to experience any unexpected problems and performance issues when they upgrade to the latest version, as we have already analyzed the performance of Dante on their machine.
They might also inspire us to write blog entries such as this (so please also let us know if publishing analysis results based on your data is ok or not if you send us any data).
Subject: When will you update Debian repository...
Date: May 2016
Hello, I'm using Debian Jessie and I was wondering; do you abandon debian? https://packages.debian.org/search?keywords=dante-server I emailed them several times last year, and they still doesn't update it. I'm using 1.1 (installed when Debian Wheezy era), and I think it's time to upgrade Debian repository. (And 1.1 has horrible bug that stop connection if the user upload big file using POST method to webserver)
Name: Inferno Nettverk A/S
Subject: Debian repository
Date: July 2016
The Dante packages included with distributions such as Debian are not maintained by us. Unfortunately, this means that sometimes only old versions of Dante are available. Version 1.1 of Dante is very old and we would recommend that you manually compile a newer version, if possible.