p2pd - VoD streaming system

HTTP based CDN usage scenarios

With p2pd it is possible to build content distribution networks (CDNs) for Video-on-Demand (VoD) streaming. If clients are to connect to the CDN using HTTP, SPP can only be used internally in the CDN. This limits the benefits that can be achieved with p2pd (see SPP based CDN usage scenarios) but still makes it possible to build easily deployed CDNs for some usage scenarios.

Overview

Problem description: A content provider wishes to build a CDN for hosting (large) high quality video files for VoD streaming, and is assumed to already have a generic HTTP server (such as Apache) for web pages, which will handle all non-video presentation. The p2pd based CDN is only to be used to distribute the video data, after video playback has been requested. A separate media player application will handle decoding of the video once it has been received at the client machine.

The p2pd based CDN can consist of two types of nodes: servers and caches (SCC nodes). The servers keep media files in the directory /mediafiles, which is used to publish files. The caches store downloaded files in the directory /cachefiles. All p2pd processes are here assumed to run on port 8080. A file movie.mpg distributed by the CDN would have the URL http://cdn.example.com:8080/movie.mpg.

The machines available for building the CDN are assumed to have IP addresses in the range 10.0.0.1 - 10.0.0.5, with the DNS name cdn.example.com pointing to the nodes that should be accessible by users.

Users should not be required to download any additional software, but be able to use any web browser that supports HTTP.

Scenario 1: single server

If the CDN should consist of only a single server, it is possible to use p2pd, but there will be little benefit compared to using a generic HTTP server such as Apache.

Scenario 2: multiple independent servers

In this scenario there are multiple independent servers with identical content, but the communication between the servers is not to be done with p2pd. As with the first scenario, it is possible to use p2pd but doing so would provide little benefit.

Scenario 3: coordinated configuration: single server, multiple caches

When the CDN consists of more than one machine it is possible for these machines to communicate with SPP, even if HTTP is used to receive requests from clients. In this configuration, the caches are accessed by users and the server is only used by the CDN owner to publish files (it is not accessed directly by clients).

The server has the IP address 10.0.0.1, and is not publicly accessible. The caches have the addresses 10.0.0.2-10.0.0.5. A DNS entry for cdn.example.com would point to 10.0.0.2, 10.0.0.3, 10.0.0.4, 10.0.0.5 (but not 10.0.0.1).
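
One way to sanity-check this DNS setup is to confirm that the public name resolves to the caches only. The sketch below is an illustration and not part of p2pd: the function name check_cdn_records is hypothetical, and it simply reads one IPv4 address per line on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of p2pd). Addresses are the ones assumed
# in this scenario.
SERVER_IP="10.0.0.1"
CACHE_IPS="10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5"

# Reads one IPv4 address per line on stdin. Returns non-zero if the
# server address is listed or if any cache address is missing.
check_cdn_records() {
    records=$(cat)
    rc=0
    # The server must never be handed out to clients.
    if printf '%s\n' "$records" | grep -qxF "$SERVER_IP"; then
        echo "error: server $SERVER_IP is publicly listed"
        rc=1
    fi
    # Every cache should be listed so that clients can reach it.
    for ip in $CACHE_IPS; do
        if ! printf '%s\n' "$records" | grep -qxF "$ip"; then
            echo "warning: cache $ip is not listed"
            rc=1
        fi
    done
    return $rc
}
```

Assuming the dig utility is installed, the check could be run as: dig +short cdn.example.com A | check_cdn_records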

The server is started with the following command:

p2pd -i -p /mediafiles -P 8080

The caches are started as follows:

p2pd -c /cachefiles -A cdn.example.com -n 0.0.0.0/0 -N 0.0.0.0/0 \
   -P 8080 -S 10.0.0.1:8080

No special setup is needed at clients.
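
To illustrate what a client request looks like, the following sketch fetches a file with curl, which stands in here for any HTTP client. The function names are hypothetical, and the partial download assumes that the caches honour standard HTTP Range headers, which this document does not state explicitly.

```shell
#!/bin/sh
# Host and port as assumed in the overview.
CDN_HOST="cdn.example.com"
CDN_PORT="8080"

# Build the public URL for a file published by the CDN.
cdn_url() {
    printf 'http://%s:%s/%s\n' "$CDN_HOST" "$CDN_PORT" "$1"
}

# Download a whole file: fetch_file NAME
fetch_file() {
    curl --fail --output "$1" "$(cdn_url "$1")"
}

# Download bytes FIRST..LAST only: fetch_range NAME FIRST LAST
# (assumes the cache answers HTTP Range requests)
fetch_range() {
    curl --fail --range "$2-$3" --output "$1.part" "$(cdn_url "$1")"
}
```

For example, fetch_file movie.mpg retrieves the complete file, while fetch_range movie.mpg 0 1048575 asks for the first mebibyte only.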

With this configuration, when a cache receives a request for a file (or part of a file), it will stream the data directly to the client if it is cached. If not, it will first retrieve it from either the server or one of the other caches. Availability and network conditions will be considered in an attempt to retrieve the data as fast as possible (i.e., from the machine that is estimated to provide the highest transfer rate). More than one machine might be used, and as long as the content is available from one of the machines, the system should be able to handle network and node failures internally.
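
The idea of retrieving data from the machine with the highest estimated transfer rate can be sketched as follows. This is only an illustration: how p2pd actually estimates rates and chooses sources is not described here, and the input format and the function name pick_source are assumptions.

```shell
#!/bin/sh
# Illustrative only: not p2pd's real selection logic.
# Input:  one candidate per line as "ADDRESS ESTIMATED_KBPS HAS_DATA",
#         where HAS_DATA is "yes" or "no".
# Output: the address with the highest estimated rate among the
#         candidates that have the data; non-zero exit if there is none.
pick_source() {
    awk 'BEGIN { best = -1 }
         $3 == "yes" && ($2 + 0) > best { best = $2 + 0; addr = $1 }
         END { if (addr != "") print addr; else exit 1 }'
}
```

A candidate that does not hold the data is skipped regardless of its rate, which mirrors the statement that availability is considered alongside network conditions.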

The distribution of data between the server and caches is handled automatically by p2pd. New caches can also be added and existing caches removed by the CDN owner without having to make any configuration changes at either the server or the other caches; integration and fail-over happens automatically. The data exchanged between the server and caches is verified with message digests.
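
An operator who wants an independent check that a cached copy matches the published file could compare digests by hand. The sketch below is not how p2pd verifies data internally: the digest algorithm used by p2pd is not stated here, so SHA-256 (via the sha256sum utility) is an assumption, as are the function names.

```shell
#!/bin/sh
# Print the SHA-256 digest of a file (digest only, no file name).
file_digest() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Succeed only if both files exist and have identical digests.
same_content() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    [ "$(file_digest "$1")" = "$(file_digest "$2")" ]
}
```

A possible use is same_content /mediafiles/movie.mpg /cachefiles/movie.mpg, run on copies gathered on one machine. The cache file name is hypothetical, since the layout of /cachefiles is not documented here.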

The -i option used on the server enables load sharing between the server and caches. The -n option used on the caches is needed to make the caches retrieve data that is not already cached when they receive requests. The -N option causes the caches to accept HTTP requests from any client. The -S option marks the server (at 10.0.0.1) as being the machine one step up in the hierarchically structured CDN. The -A option is important in this configuration: it limits requests to files located at the cdn.example.com address.

Scenario 4: resource contributions by clients

This scenario, which involves clients in the CDN, is not possible with HTTP.

Scenario 5: resource contributions by clients and ISP caching

This scenario, which includes clients and caches not maintained by the CDN owner, is not possible with HTTP.