Traffic monitoring

The Dante server forwards network traffic between different sets of machines. As a result, some types of network problems that affect the forwarded traffic will result in observable changes in network behavior, such as a large number of TCP connections being terminated because a server was rebooted, or traffic no longer being received or sent. The monitoring functionality in the Dante server allows different alarms to be set, which will result in warnings being logged if these types of situations are detected.

For reverse proxy topologies, or topologies where the SOCKS clients are used to access only a limited set of target servers that are always expected to be available, please also see the error logging functionality documentation for how to enable logging of routing and related errors returned by the kernel.

Traffic monitoring

The alarms are specified in so-called monitors. These objects have the same general format as the rules Dante uses for access control. However, monitors are completely independent of the access control rules, and only perform passive monitoring of network traffic, or the lack of network traffic.

The following example shows the general monitor syntax, using a monitor that does not contain any actual monitoring operations:

    monitor {
        from: 0.0.0.0/0 to: www.example.org port = 80
        protocol: tcp
    }

The example has a from address that matches all IPv4 addresses, a to address that matches only the host www.example.org, port 80, and a protocol keyword that limits monitoring to TCP traffic. This monitor can be used to monitor TCP traffic from all connecting clients (via Dante) to the host www.example.org on port 80. Any other traffic passing through Dante will be ignored by this monitor.

A monitor can include many of the same keywords that are available in the Dante ACL rules. The following subset is currently supported:
NOTE: It is currently recommended that the protocol keyword is always specified and set to tcp, because there is only limited support for monitoring of UDP traffic, and only limited testing of UDP traffic monitoring has been done.

A monitor can be mostly empty, as in the example above, in which case no actual monitoring will be performed. The main function of monitors is to provide a container for one or more alarms, which are specified using a new set of keywords not available for other objects. These keywords can be used to make Dante log warnings for different types of unexpected network behavior. Network behavior that can cause the currently supported alarms to trigger includes the following:
The keywords for the different alarms are described further down.

Active TCP sessions will at most match one monitor, but multiple alarms can be specified in a single monitor. This makes it possible to specify multiple sets of conditions for the same TCP sessions, depending on what network interface the traffic is transferred on and whether the traffic is being received or transmitted.

Data idleness detection

For machines or networks that are expected to continuously send or receive traffic, a period of no or little traffic being transmitted or received might be an indication of a network problem. To allow these types of situations to be detected, it is now possible to enable data alarms in the Dante monitors using the alarm.data keyword.

Adding an alarm.data keyword to a monitor will result in warnings being logged if there are periods with too little network traffic. Dante has four network paths and data alarms can be configured independently for each of them:
The alarm.data keyword takes two parameters: a byte count and a duration in seconds. The alarm will trigger if the specified number of seconds pass with only the specified number of bytes (or fewer) being transmitted. The syntax is as follows:

    internal.alarm.data.recv: DATALIMIT in INTERVAL
If only DATALIMIT bytes (or fewer) have been transferred during a period of INTERVAL seconds, an alarm will trigger in Dante.

The following is an example of a configuration where Dante is expected to always receive at least some traffic on the internal network interface. At most there should be a 10 second pause without any data being received:

    internal.alarm.data.recv: 0 in 10

If no data, not even one byte, is received by Dante on its internal interface during a period of 10 seconds, an alarm will trigger in Dante. In this example the alarm is for Dante's internal network interface, on which it would typically have connectivity to the SOCKS clients.

The following is an example with a data limit of 10240 bytes and a duration of 20 seconds:

    external.alarm.data.recv: 10240 in 20

On this network the operator expects that, during normal operation, Dante will always receive more than 10240 bytes on the external interface during any period of 20 seconds. Should there be a 20 second period where Dante has received only 10240 bytes or fewer, an alarm will trigger. In this example the alarm is for Dante's external network interface, on which it would typically have connectivity to the target servers.

Placed in a monitor, the full expression for this alarm can be expressed using this syntax:

    monitor {
        from: 0.0.0.0/0 to: www.example.org port = 80
        protocol: tcp

        # warn in case 20 seconds pass where only 10240 bytes have been
        # received from the target server www.example.org port 80.
        external.alarm.data.recv: 10240 in 20
    }

The above monitor will apply only to TCP traffic received from the server www.example.org on Dante's external network interface. It will not consider traffic sent to the server www.example.org, or the traffic received from the SOCKS clients.
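The trigger condition for a data alarm can be illustrated with a small Python model. This is only a sketch of the rule as described in the text, not Dante's implementation; the `DataAlarm` class and its method names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAlarm:
    """Hypothetical model of a 'DATALIMIT in INTERVAL' data alarm value."""

    datalimit: int  # byte count at or below which the alarm triggers
    interval: int   # length of the observation period, in seconds

    @classmethod
    def parse(cls, value: str) -> "DataAlarm":
        # Expect exactly three whitespace-separated tokens: "<bytes> in <seconds>".
        parts = value.split()
        if len(parts) != 3 or parts[1] != "in":
            raise ValueError(f"expected 'DATALIMIT in INTERVAL', got {value!r}")
        return cls(datalimit=int(parts[0]), interval=int(parts[2]))

    def triggers(self, bytes_in_interval: int) -> bool:
        # The alarm triggers if only DATALIMIT bytes or fewer were
        # transferred during the last INTERVAL seconds.
        return bytes_in_interval <= self.datalimit


alarm = DataAlarm.parse("10240 in 20")
print(alarm.triggers(10240))  # True: exactly the limit still counts as too little
print(alarm.triggers(10241))  # False: more than the limit was received
```

With the value `0 in 10`, the same model triggers only when no bytes at all were transferred during the 10 second period.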
Multiple alarms can be specified in more complicated monitors:

    monitor {
        from: 0.0.0.0/0 to: www.example.org port = 80
        protocol: tcp

        # warn if only 10240 bytes have been received from target server
        # www.example.org port 80 during a period of 20 seconds.
        external.alarm.data.recv: 10240 in 20

        # warn if only 1024 bytes have been sent to the clients during
        # a period of 20 seconds.
        internal.alarm.data.send: 1024 in 20
    }

Data alarms trigger when a period of data idleness has been detected. Once a data alarm has triggered, it will remain active until it is cleared. A warning will be logged when the alarm triggers and then again when the alarm condition is cleared. In between these two points no warnings related to this alarm will be logged. This avoids repeating the same alarm/warning multiple times during network problems that last for an extended amount of time. When the alarm is cleared, Dante will also include information about how long the alarm condition lasted.

A data alarm can be cleared in two ways:
Once an alarm has been cleared, it can trigger again if enough data is not being transferred. Using the previous example:

    external.alarm.data.recv: 10240 in 20

An alarm will trigger if only 10240 bytes have been received by Dante on the external network interface during the last 20 seconds. If, after the alarm has triggered, more than 10240 bytes of data are received on the external interface during a period of 20 seconds, Dante will clear the alarm and log that the alarm has been cleared, using the same log level at which it logged the alarm triggering.

Note that alarms will also trigger shortly after server startup if the Dante server does not receive sufficient traffic to prevent the alarms from triggering. Data alarms will trigger regardless of whether there are active sessions matching the monitor or not; if enough data is not being transmitted or received, a data alarm will trigger.

The following format is used for the data alarm warnings:

    warning: monitor(MONNUM): alarm/data STATE: MONSRC -> MONDST TYPE: NBYTES/DATALIMIT in INTERVALs. Session count: SESSIONS

The keywords have the following meaning:
The following is an example of a monitor and the corresponding warning that is produced when the second alarm triggers:

    monitor {
        from: 0.0.0.0/0 to: 0.0.0.0/0
        protocol: tcp
        internal.alarm.data.recv: 1 in 2
        external.alarm.data.recv: 1 in 2
    }

    warning: monitor(1): alarm/data [: 0.0.0.0/0 -> 0.0.0.0/0 external.recv: 0/1 in 2s. Session count: 0

From the warning fields it can be seen that the alarm triggered because no data was received during the last two seconds. No sessions were active when the alarm triggered.

When the alarm clears due to enough data having been transferred, the log message can look like this:

    warning: monitor(1): alarm/data ]: 0.0.0.0/0 -> 0.0.0.0/0 external.recv: 2/1 in 2s. Session count: 1. Alarm duration: 4s

As can be seen, '[' is used to specify that a data alarm triggers, while ']' is used to specify that it has cleared; the former indicates that an error condition has occurred, the latter that the error condition has ended. The message logged when the alarm is cleared also specifies how long the alarm lasted. In this case the alarm was active for four seconds, at which point two bytes had been received during the last two seconds and there was one active session matching the monitor.

Note that the message indicating that an alarm has cleared is not logged if the alarm was cleared due to a SIGHUP signal being received.

Abnormal rate of connection termination detection

If a large number of connections are terminated within a short period of time, this is also a possible indication of a connectivity or network problem, perhaps due to a remote network server/proxy crashing. By using the alarm.disconnect keyword, the Dante server can log a warning when this type of situation occurs.

There are two variants of the alarm keyword, one for the internal network interface, between the SOCKS clients and Dante, and one for the external interface, between the Dante server and the target servers:
Each alarm keyword takes three parameters: a minimum count, a ratio value, and a time interval. The following format is used:

    internal.alarm.disconnect: MINCOUNT/RATIO in INTERVAL
The following is an example of an alarm that will trigger if one third of all connections between Dante and the target server are disconnected within 15 seconds, but only if the number of disconnected connections amounts to at least 1000:

    external.alarm.disconnect: 1000/3000 in 15

If there are fewer than 1000 disconnects, or less than 33 percent of all connections that existed during this period are disconnected, no alarm will trigger. An alarm will also not trigger if the 1000 disconnects occur over a period of time that is longer than 15 seconds.

The following should be noted:
A complete monitor with two disconnect alarms can look like this:

    monitor {
        from: 0.0.0.0/0 to: www.example.org port = 80
        protocol: tcp

        # warn if 1/3 or more sessions disconnect during a period of five seconds,
        # but require a minimum of 1000 disconnects on either side.
        internal.alarm.disconnect: 1000/3000 in 5
        external.alarm.disconnect: 1000/3000 in 5
    }

The above monitor will apply to TCP connections to the server www.example.org. If at least 1000 sessions on either the internal or external network interface side disconnect during a period of five seconds, and these 1000 disconnects constitute at least 33% of the connections to www.example.org port 80 that existed during these five seconds, an alarm will trigger.

The alarm will trigger regardless of whether the disconnects occurred on the connections between the clients and Dante, or between Dante and the target servers, but it does require at least 1000 disconnects on at least one of the two sides; disconnects on the two sides are not added together. If there are 3000 sessions to www.example.org port 80, and 500 of these disconnect on the external interface (from www.example.org), while 500 disconnect on the internal interface, no alarm will trigger.

Alarms trigger each time a sufficient number of disconnects occur. Each sufficiently large burst of disconnects will result in an alarm, but normally at most one warning per alarm will be logged during each time interval, though this might change in a later version of Dante. Separate alarms are produced for each distinct alarm keyword.

The following format is used for the disconnect alarm warnings:

    warning: monitor(RULENUM): alarm/disconnect ]: MONSRC -> MONDST TYPE: DISCONNECTS/SESSIONS disconnects during last INTERVALs. Session count: SESSIONS

The keywords have the following meaning:
The following is an example of a monitor and the corresponding alarm that is produced when it triggers:

    monitor {
        from: 0.0.0.0/0 to: 0.0.0.0/0
        protocol: tcp
        external.alarm.disconnect: 1/2 in 5
    }

    warning: monitor(1): alarm/disconnect ]: 0.0.0.0/0 -> 0.0.0.0/0 external: 1/1 disconnects during last 5s. Session count: 0

From the warning fields it can be seen that the alarm triggered because one connection was disconnected by the target server. The ratio in the alarm is 1/2, meaning that at least one connection must be disconnected and at least 50 percent of the total number of connections must terminate for the alarm to trigger. With only a single connection present, one disconnected connection corresponds to 100 percent of all connections, and thus 1/1 is the ratio given in the warning, meaning all sessions disconnected. At the time the alarm triggered, there were zero active sessions matching the monitor.
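The two-part trigger condition for disconnect alarms can be checked with a short Python model. This is a sketch under stated assumptions rather than Dante's actual logic: it assumes the required fraction equals MINCOUNT divided by RATIO, which agrees with both examples in the text (1000/3000 is one third, 1/2 is 50 percent), and it evaluates a single interface over a single interval. The `DisconnectAlarm` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisconnectAlarm:
    """Hypothetical model of a 'MINCOUNT/RATIO in INTERVAL' alarm value."""

    mincount: int  # minimum number of disconnects required
    ratio: int     # assumed denominator: required fraction is mincount / ratio
    interval: int  # length of the observation period, in seconds

    @classmethod
    def parse(cls, value: str) -> "DisconnectAlarm":
        # Expect "<mincount>/<ratio> in <seconds>", e.g. "1000/3000 in 15".
        fraction, keyword, seconds = value.split()
        if keyword != "in":
            raise ValueError(f"expected 'MINCOUNT/RATIO in INTERVAL', got {value!r}")
        mincount, ratio = fraction.split("/")
        return cls(int(mincount), int(ratio), int(seconds))

    def triggers(self, disconnects: int, sessions: int) -> bool:
        # 'sessions' is the number of connections that existed during the
        # interval; 'disconnects' is how many of them were disconnected on
        # the interface this alarm applies to.
        if sessions <= 0 or disconnects < self.mincount:
            return False
        # Integer form of: disconnects / sessions >= mincount / ratio.
        return disconnects * self.ratio >= self.mincount * sessions


alarm = DisconnectAlarm.parse("1000/3000 in 15")
print(alarm.triggers(disconnects=1000, sessions=3000))  # True
print(alarm.triggers(disconnects=500, sessions=3000))   # False: below MINCOUNT
print(alarm.triggers(disconnects=1000, sessions=4000))  # False: only 25 percent
```

The comparison uses integer cross-multiplication instead of floating point division, so a case such as exactly 1000 of 3000 sessions is not affected by rounding.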