SPP based CDN usage scenarios

With p2pd it is possible to build content distribution networks (CDNs) for Video-on-Demand (VoD) streaming. It is possible for clients to use HTTP (see HTTP based CDN usage scenarios), but a greater number of configurations is possible if p2pd also runs on the client machines.

Overview

Problem description: A content provider wishes to build a CDN for hosting (large) high quality video files for VoD streaming, and is assumed to already have a generic HTTP server (such as Apache) for web pages, which will handle all non-video presentation. The p2pd based CDN is only to be used to distribute the video data, after video playback has been requested. A separate media player application will handle decoding of the video once it has been received at the client machine.

The p2pd based CDN can consist of three types of nodes: servers, caches (SCC nodes), and client machines (LHC nodes). The servers keep media files in the directory /mediafiles, which is used to publish files. The caches store downloaded files in the directory /cachefiles, and the clients use $HOME/.p2pd_cache. All p2pd processes are here assumed to run on port 1111 (the current default). A file movie.mpg distributed by the CDN would have the URL spp://cdn.example.com/movie.mpg.
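As an illustration of the naming scheme above, the following shell sketch maps an spp:// URL to the file a server would publish. It assumes that the path part of the URL corresponds directly to a file name below the published directory. The function name spp_url_to_path is hypothetical and is not part of p2pd.

```shell
#!/bin/sh
# Hypothetical helper (not part of p2pd): map an spp:// URL to a local file
# name, assuming the URL path is relative to the published directory.
spp_url_to_path() {
    url=$1
    publish_dir=$2
    rest=${url#spp://}    # drop the scheme, leaving "host/path"
    path=${rest#*/}       # drop the host name, leaving "path"
    printf '%s/%s\n' "$publish_dir" "$path"
}

# prints /mediafiles/movie.mpg
spp_url_to_path spp://cdn.example.com/movie.mpg /mediafiles
```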

The machines available for the part of the server infrastructure owned and maintained by the content provider are assumed to have IP addresses in the range 10.0.0.1 - 10.0.0.5, with the DNS name cdn.example.com pointing to the nodes that should be accessible by users.

Users are assumed to have p2pd installed on their machines, and for their browser and/or media player to request the video files from it (see the HTTP/SPP compatibility document for information on how to configure this).

Scenario 1: single server (no p2p operation)

This scenario has a single stand-alone SPP server which serves all files itself. Clients connect with SPP. In this configuration the primary benefit compared to using an HTTP server is that message digests are used to verify that the video data has been correctly transmitted.

The server is started with the following command:

p2pd -p /mediafiles

At the clients, p2pd can be started as follows:

p2pd -c $HOME/.p2pd_cache

Scenario 2: multiple independent servers (no p2p operation)

In this scenario there are multiple independent servers with identical content. Synchronization of the media files between the servers is not done with p2pd and must be handled by other means. The servers are assumed to be geographically distributed, with server selection done by clients. The servers have the addresses 10.0.0.1 - 10.0.0.5. All servers can be listed in DNS.

The benefits achieved by using SPP here primarily stem from the network awareness in the clients, which try to determine the closest/fastest server, and will automatically choose another server in case of server failures or significant negative changes in transfer rates. Clients might use multiple servers if estimated transfer rates are roughly similar.
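The selection behaviour described above can be sketched in a few lines of shell. This is only an illustration of the idea, not the algorithm used by p2pd: the function name select_servers, the "address=rate" argument format, the example rates, and the 80 percent threshold for "roughly similar" are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: choose the server(s) with the best estimated rate.
# $1 is the percentage of the best rate that still counts as "roughly
# similar" (an assumed threshold); the remaining arguments are
# "address=estimated_rate" pairs with integer rates.
select_servers() {
    percent=$1
    shift
    best=0
    # First pass: find the highest estimated rate.
    for entry in "$@"; do
        rate=${entry#*=}
        if [ "$rate" -gt "$best" ]; then
            best=$rate
        fi
    done
    # Second pass: keep every server within the threshold of the best rate.
    selected=""
    for entry in "$@"; do
        host=${entry%%=*}
        rate=${entry#*=}
        if [ $(( $rate * 100 )) -ge $(( $best * $percent )) ]; then
            selected="$selected $host"
        fi
    done
    printf '%s\n' "${selected# }"
}

# Example with made-up rates in kbit/s; prints "10.0.0.2 10.0.0.3"
select_servers 80 10.0.0.1=4000 10.0.0.2=9000 10.0.0.3=8500
```

A server that fails can be modelled here by giving it a rate of 0, which removes it from the selection as long as any other server is reachable.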

The servers are started as follows:

p2pd -p /mediafiles -l 10.0.0.2 -l 10.0.0.3 -l 10.0.0.4 -l 10.0.0.5
p2pd -p /mediafiles -l 10.0.0.1 -l 10.0.0.3 -l 10.0.0.4 -l 10.0.0.5
p2pd -p /mediafiles -l 10.0.0.1 -l 10.0.0.2 -l 10.0.0.4 -l 10.0.0.5
p2pd -p /mediafiles -l 10.0.0.1 -l 10.0.0.2 -l 10.0.0.3 -l 10.0.0.5
p2pd -p /mediafiles -l 10.0.0.1 -l 10.0.0.2 -l 10.0.0.3 -l 10.0.0.4

At the clients, p2pd is started in the same way as in scenario 1, above.

Because each server is independent of the others, it is necessary to specify the addresses of the other servers with the -l option. Each listed host is assumed to have full copies of all published files.
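Since every server lists all servers except itself, the command lines for this scenario can be generated rather than written by hand. The following is a minimal sketch that uses only the -p and -l options; the SERVERS variable and the server_command function are hypothetical names introduced for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the p2pd command line for one server in
# scenario 2. The address list is the one assumed in this document.
SERVERS="10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5"

server_command() {
    self=$1
    cmd="p2pd -p /mediafiles"
    for host in $SERVERS; do
        # List every server except the one the command is meant for.
        if [ "$host" != "$self" ]; then
            cmd="$cmd -l $host"
        fi
    done
    printf '%s\n' "$cmd"
}

# Print one line per server: the address, then the command to run there.
for s in $SERVERS; do
    printf '%s: ' "$s"
    server_command "$s"
done
```

Adding a sixth server then only requires extending the SERVERS list and restarting p2pd on each machine with the regenerated command.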

There is currently no functionality in p2pd for automatically synchronizing the content of the /mediafiles directory between the servers for this scenario, but this might be supported in future versions.

Scenario 3: coordinated configuration: single server, multiple caches

This configuration avoids the synchronization problem in scenario 2 by having only one of the servers (e.g., 10.0.0.1) running in server mode. The other nodes run in cache/SCC mode and retrieve requested parts of files from the server before they are served to clients. Files will automatically be retrieved from the master server as needed, and only the parts of files that are actually needed will be transmitted. New files that are added at the master server will automatically appear at the other nodes.

In this example, the server has the IP address 10.0.0.1. The caches have the addresses 10.0.0.2-10.0.0.5 and both the server and caches should be accessible by clients.

There are several possible ways by which traffic can reach the server infrastructure. All five addresses can be entered in DNS, some form of request redirection mechanism can be used at the edge of the CDN, or redirection can be done at the server. This configuration assumes that client resources should not be included, so it is necessary to explicitly list the caches with the -l option at the server.

The server is started with the following command:

p2pd -p /mediafiles -l 10.0.0.2 -l 10.0.0.3 -l 10.0.0.4 -l 10.0.0.5

The caches are started as follows:

p2pd -c /cachefiles -A cdn.example.com -n 0.0.0.0/0 -S 10.0.0.1

At the clients, p2pd is started in the same way as in scenario 1, above.

With this configuration, when a cache receives a request for a file (or part of a file), it will stream the data directly to the client if it is cached. If not, it will first retrieve it from either the server or one of the other caches. Availability and network conditions will be considered in an attempt to retrieve the data as fast as possible (i.e., from the machine that provides the highest estimated transfer rate). More than one machine might be used, and as long as the content is available from one of the machines, the system should be able to handle network and node failures internally.

The -n option used on the caches is needed to cause the caches to retrieve data that is not already cached when they receive requests, regardless of the client address. The -S option marks the server (at 10.0.0.1) as being the machine one step up in the hierarchically structured CDN, and is not needed if only the server is listed in DNS. The -A option is quite important in this configuration and limits requests to files located at the cdn.example.com address.
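For deployment, the two roles in this scenario can be combined in one start-up script that is installed on all five machines. The sketch below assumes that the address of the local node is passed as an argument; the function name node_command and the variables are hypothetical, while the p2pd options are exactly those used in this scenario.

```shell
#!/bin/sh
# Hypothetical helper: print the p2pd command line for a node in scenario 3,
# depending on whether the node is the server or one of the caches.
SERVER=10.0.0.1
CACHES="10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5"

node_command() {
    if [ "$1" = "$SERVER" ]; then
        # Server: publish /mediafiles and list every cache with -l.
        cmd="p2pd -p /mediafiles"
        for cache in $CACHES; do
            cmd="$cmd -l $cache"
        done
    else
        # Cache: fetch missing data for any client, with the server upstream.
        cmd="p2pd -c /cachefiles -A cdn.example.com -n 0.0.0.0/0 -S $SERVER"
    fi
    printf '%s\n' "$cmd"
}

node_command 10.0.0.1
node_command 10.0.0.4
```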

Scenario 4: resource contributions by clients (p2p-mode)

In all the scenarios above, all data is downloaded by users from the server infrastructure of the content provider. An obvious consequence of this is that hosting popular videos of high quality for streaming to many users will require significant amounts of bandwidth, which might quickly become very expensive.

One way of potentially reducing these costs is to extend the serving infrastructure to include the machines of clients that have already downloaded video files. The more popular a video file is, the higher the number of widely distributed locations from which it can be available. In addition to reducing the cost of hosting the videos, it increases the fault tolerance and scalability of the system.

To enable this mode, the server should be started with the -i option, to have it add clients that download parts of files to the list of possible download locations that are returned to subsequently arriving clients. This mode can be used with several of the scenarios above. In this example we will assume the same server configuration as in scenario 1, with a single server.

The server is started with the following command:

p2pd -i -p /mediafiles

No provider owned caches are part of the server infrastructure in this example.

At the clients, p2pd is started in the same way as in scenario 1, above.

Each time a client downloads part of a file, it will report this to the server, essentially making the client part of the CDN. Caches can be added to the server infrastructure by the content provider without having to change the configuration at the server; as long as they receive requests (either due to DNS configuration or request forwarding), they can be configured to automatically obtain files from the server, after which they will be added to the list of possible download locations by the server.

Note that there is no synchronization of information on clients between the server and caches. A client that primarily uses one of the machines will only be available for other clients using the same machine. If the server/caches are sufficiently distributed this will in practice be beneficial, because it results in nearby clients clustering around the same server/cache.

Scenario 5: resource contributions by clients and ISP caching (p2p mode)

An obvious extension to scenario 4 is to include caches maintained by entities other than the content providers that maintain the servers. Streamed video can consume a lot of bandwidth, and for large sites/ISPs with many users there might be wasted bandwidth due to duplicate downloads. In the same way as web caches such as squid have been used to reduce duplicate transfers of files from web servers, caches can be used to reduce SPP traffic. This type of caching also creates client clustering; rather than communicating directly with the content servers, users at a site with a cache only communicate with the cache or other users at the same site, resulting in increased traffic locality.

To add a cache, it is only necessary to redirect requests to it, either through configuration of the user p2pd process, or by bouncing requests to the cache (with software for this purpose). This example assumes that it is configured at the clients, and that the IP address of the cache is 192.168.0.1.

The server can use any of the configurations above, as long as it accepts SPP requests.

The cache can be started in this way:

p2pd -i -c /cachefiles -n 192.168.0.1/24

The clients at the site with the cache start p2pd with these options:

p2pd -c $HOME/.p2pd_cache -S 192.168.0.1

The cache can run both with and without p2p-mode enabled (even independently of what the server is using). In the example above, p2p-mode is enabled at the cache (with the -i option). Clients on the 192.168.0.1/24 network will either retrieve content from the cache, or from other clients on the same network. The -n option specifies the addresses of the machines that the cache considers to belong to its own internal network. The cache will retrieve non-cached content only for internal clients.
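To make the meaning of an address/prefix argument such as 192.168.0.1/24 concrete, the following shell sketch tests whether a client address falls inside a given network. It illustrates ordinary prefix matching only; whether p2pd evaluates the -n argument in exactly this way is an assumption, and the function names ip_to_int and in_network are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of IPv4 prefix matching (not taken from p2pd).

# Convert a dotted-quad address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_network ADDRESS NETWORK/BITS: succeeds if ADDRESS is inside the network.
in_network() {
    addr=$1
    net=${2%/*}
    bits=${2#*/}
    # Netmask with the top $bits bits set (4294967295 is 2^32 - 1).
    mask=$(( 4294967295 - ((1 << (32 - $bits)) - 1) ))
    [ $(( $(ip_to_int "$addr") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Classify two example clients relative to the cache's internal network.
for client in 192.168.0.57 10.0.0.1; do
    if in_network "$client" 192.168.0.1/24; then
        echo "$client: internal"
    else
        echo "$client: external"
    fi
done
```

With this kind of matching the host bits of the network argument are ignored, so 192.168.0.1/24 covers 192.168.0.0 through 192.168.0.255, and 0.0.0.0/0 (as used on the caches in scenario 3) matches every client address.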

The aim of the configurations in scenarios 4 and 5 is to offer more low-cost alternatives to expensive provider owned CDN infrastructures, in order to reduce the barriers to entry for setting up VoD streaming sites. It has yet to be proven that these scenarios are realistic in practice, but with the functionality available in p2pd it should now be possible to test them.