Peer-to-peer is an almost magical term that's often used, rarely explained, and frequently misunderstood. In the popular media, peer-to-peer is often described as a copyright-violating technology that underlies song-swapping and file-sharing systems such as Napster and Gnutella. In the world of high-tech business, peer-to-peer networking is a revolution that promises to harness the combined computing power of ordinary personal computers and revolutionize the way we communicate. And to Internet pioneers, peer-to-peer is as much a philosophy as it is a model of development, one that contains the keys needed to defeat censorship and create global communities. All of these descriptions contain part of the answer, but none will help you build your own peer-to-peer systems, or explain why you should.
In this chapter, you'll learn what distinguishes peer-to-peer applications from traditional enterprise systems, how peer-to-peer technology evolved in the early Internet, and what advantages and disadvantages the peer-to-peer model offers. You'll also preview the .NET technologies you'll need to build peer-to-peer software, and the challenges you'll face along the way. By the end of the chapter, you'll be able to decide when you should (and shouldn't) use peer-to-peer designs in your own solutions.
The easiest way to understand peer-to-peer applications is to compare them to other models of programming architecture. Peer-to-peer programming is part revolution, part evolution. On the one hand, it's the latest in a long line of schisms that have shaken up the programming world, and like them, it promises to change the face of software development forever. On the other hand, it borrows heavily from the past, and peer-to-peer concepts are likely to end up enhancing existing systems rather than replacing them.
In a traditional business environment, software is centralized around a server. In the not-so-distant past, this role was played by a mainframe. The mainframe performed all the work, processing information, accessing data stores, and so on. The clients were marginalized and computationally unimportant: "dumb terminals." They were nothing more than an interface to the mainframe.
As Windows development gained in popularity, servers replaced the mainframe, and dumb terminals were upgraded to low-cost Windows workstations that assumed a more important role. This was the start of the era of client-server development. In client-server development, the server hosts shared resources such as program files and back-end databases, but the application actually executes on the client (see Figure 1-1).
This approach is far from ideal because the clients can't work together. They often need to compete for limited server resources (such as database connections), and that competition creates frequent bottlenecks. These limitations appear most often in large-scale environments and specialized systems in which client communication becomes important. In mid-scale systems, client-server development has proved enormously successful because it allows costly mainframes to be replaced by more affordable servers. In fact, though many programming books talk about the end of client-server development, this model represents the most successful programming paradigm ever applied to the business world, and it's still alive and well in countless corporations.
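To make the pattern concrete, here's a minimal C# sketch of a client-server application. The connection string, database, and table names are all hypothetical; the essential point is that this code runs on every client, and each client claims its own direct connection to the shared database:

```csharp
using System;
using System.Data.SqlClient;

// In a classic client-server design, this code runs on every client machine.
// Each client opens its own direct connection to the shared database and
// executes all of the business logic locally.
public class OrderBrowser
{
    // Hypothetical connection string pointing at the shared server.
    private const string ConnectionString =
        "Data Source=CentralServer;Initial Catalog=Sales;Integrated Security=SSPI";

    public static void Main()
    {
        using (SqlConnection connection = new SqlConnection(ConnectionString))
        {
            connection.Open();  // Claims one of the server's limited connections.
            SqlCommand command = new SqlCommand(
                "SELECT COUNT(*) FROM Orders", connection);
            int orderCount = (int)command.ExecuteScalar();
            Console.WriteLine("Orders on file: {0}", orderCount);
        }
    }
}
```

With a few hundred clients running code like this at once, the competition for database connections described above is easy to picture.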
The more popular the Windows PC became in the business world and the more it became involved in ambitious enterprise systems, the more the limitations of client-server programming began to show. A new model was required to deal with the massive transactional systems that were being created in the business world. This new model was distributed computing. Distributed computing tackles the core problem of client-server programming—its lack of scalability—with a component-based model that can spread the execution of an application over multiple machines.
In a distributed system, the client doesn't need to directly process the business and data-access logic or connect directly to the database. Instead, the client interacts with a set of components running on a server computer, which in turn communicates with a data store or another set of components (see Figure 1-2). Thus, unlike a client-server system, a significant part of the business code executes on the server computer.
By dividing an application into multiple layers, you make it possible for several computers to contribute to the processing of a single request. This distribution of logic typically slows down individual client requests (because of the additional overhead of network communication), but it improves the overall throughput of the entire system. Thus, distributed systems are much more scalable than client-server systems and can handle larger client loads.
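As a rough illustration, the following C# sketch uses .NET Remoting (one of several distributed technologies in the .NET Framework) to host a component on a server. The component name, port, and return value are invented for the example. The key point is that the client holds only a proxy; the method body executes on the server:

```csharp
using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Tcp;

// A hypothetical server-side component. Because it derives from
// MarshalByRefObject, clients invoke it through a proxy over the
// network -- the method body executes on the server.
public class OrderService : MarshalByRefObject
{
    public int GetOrderCount()
    {
        // Business and data-access logic lives here, near the database.
        return 42;  // Placeholder result.
    }
}

public class ServerHost
{
    public static void Main()
    {
        // Listen for remote calls on TCP port 8080 (an arbitrary choice).
        ChannelServices.RegisterChannel(new TcpChannel(8080));
        RemotingConfiguration.RegisterWellKnownServiceType(
            typeof(OrderService), "OrderService",
            WellKnownObjectMode.SingleCall);
        Console.WriteLine("OrderService is available. Press Enter to stop.");
        Console.ReadLine();
    }
}

// A client on another machine would retrieve a proxy like this:
//   OrderService service = (OrderService)Activator.GetObject(
//       typeof(OrderService), "tcp://appserver:8080/OrderService");
//   int count = service.GetOrderCount();  // Executes on the server.
```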
Here are some of the key innovations associated with distributed computing:
If more computing power is needed, you can move components to additional servers instead of paying for a costly server upgrade.
If good stateless programming practices are followed, you can replace individual servers with a clustered group of servers, thereby improving scalability.
Server-side components can use limited resources much more effectively by pooling database connections and multiplexing a large number of requests onto a finite number of objects (see the sketch after this list). This ensures that the system won't collapse under its own weight. Instead, it will simply refuse clients when it reaches its absolute processing limit.
Distributed computing is associated with a number of good architecture practices, which make it easier to debug, reuse, and extend pieces of an application.[1]
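The connection pooling mentioned in the third point is built into ADO.NET. Here's a minimal C# sketch (the connection string is hypothetical, but Max Pool Size is a genuine SqlClient setting) showing how a server-side component can funnel many requests through a small, fixed set of database connections:

```csharp
using System;
using System.Data.SqlClient;

// Server-side data access that relies on ADO.NET connection pooling.
// The Max Pool Size setting caps the database connections in use, so a
// flood of requests is multiplexed onto a finite set of connections.
public class PooledDataAccess
{
    // Hypothetical connection string; Max Pool Size is the key detail.
    private const string ConnectionString =
        "Data Source=CentralServer;Initial Catalog=Sales;" +
        "Integrated Security=SSPI;Max Pool Size=25";

    public int GetOrderCount()
    {
        using (SqlConnection connection = new SqlConnection(ConnectionString))
        {
            // Open() borrows a pooled connection. If all 25 are busy, this
            // call waits and eventually throws an InvalidOperationException,
            // refusing the request rather than overloading the database.
            connection.Open();
            SqlCommand command = new SqlCommand(
                "SELECT COUNT(*) FROM Orders", connection);
            return (int)command.ExecuteScalar();
        }   // Dispose() returns the connection to the pool immediately.
    }
}
```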
Distributed programming is the only practical way to approach a large-scale enterprise-programming project. However, the classic distributed design shown in Figure 1-2 isn't suited to all scenarios. It shares some of the same problems as the client-server model: namely, an overwhelming dependence on a central server or cluster of servers. These high-powered machines are the core of the application—the 1 percent of the system where 99 percent of the work is performed. The resources of the clients are mostly ignored.
The dependency on a central set of servers isn't necessarily a problem. In fact, in some environments it's unavoidable. The reliability, availability, and manageability of a distributed system such as the one shown in Figure 1-2 are hard to beat. In all honesty, you aren't likely to use peer-to-peer technology to build a transaction-processing backbone for an e-commerce website. However, there are other situations that a server-based system can't deal with nearly as well. You'll see some of these examples at the end of this section.
Peer-to-peer technology aims to free applications of their dependence on a central server or group of servers, and it gives them the ability to create global communities, harness wasted CPU cycles, share isolated resources, and operate independently of central authorities. In a peer-to-peer design, computers communicate directly with each other. Instead of a sharp distinction between servers that provide resources and clients that consume them, every computer becomes an equal peer that can exhibit client-like behavior (making a request) and server-like behavior (filling a request). This increases the value of each computer on the network. No longer is it restricted to being a passive client consumer—a peer-to-peer node can participate in shared work or provide resources to other peers.
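In code, this dual role simply means that a single process both listens and connects. The following C# sketch (the port number and one-line "PING" message are invented) shows one peer exhibiting server-like behavior on a background thread while remaining free to act as a client in the foreground:

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// A minimal peer. It listens for messages from other peers (server-like
// behavior) while remaining free to send messages of its own (client-like
// behavior). The port number and one-line message format are invented.
public class Peer
{
    private const int Port = 8000;

    public static void Main()
    {
        // Server-like behavior: accept incoming connections in the background.
        Thread listenerThread = new Thread(new ThreadStart(Listen));
        listenerThread.IsBackground = true;
        listenerThread.Start();

        // Client-like behavior: contact another peer directly, if one is known.
        Console.Write("Address of another peer (blank to just listen): ");
        string address = Console.ReadLine();
        if (address != null && address.Length > 0)
        {
            using (TcpClient client = new TcpClient(address, Port))
            using (StreamWriter writer = new StreamWriter(client.GetStream()))
            {
                writer.WriteLine("PING");
            }
        }
        Console.ReadLine();  // Keep the peer alive until Enter is pressed.
    }

    private static void Listen()
    {
        TcpListener listener = new TcpListener(IPAddress.Any, Port);
        listener.Start();
        while (true)
        {
            using (TcpClient incoming = listener.AcceptTcpClient())
            using (StreamReader reader = new StreamReader(incoming.GetStream()))
            {
                Console.WriteLine("Received: " + reader.ReadLine());
            }
        }
    }
}
```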
Peer-to-peer is most often defined as a technology that takes advantage of resources "at the edges of the network" because it bypasses the central server for direct interaction. As you can see in Figure 1-3, this approach actually complicates the overall system.
Peer-to-peer programming is regarded by some as a new generation of programming design, and by others as a subset of distributed computing. In a sense, distributed architecture overlaps with peer-to-peer architecture because many of the technologies used to create distributed enterprise applications can be used to create peer-to-peer systems as well. However, peer-to-peer applications represent a dramatic shift toward a decentralized design philosophy that is quite different from what most programmers expect in an enterprise application.
Here are some of the hallmarks that distinguish a peer-to-peer application:
The processing is performed on the peers, not farmed out to another computer (such as a high-powered server).
The peers interact by establishing direct connections, rather than passing messages through a central authority.
The system can deal with inconsistent connectivity (for example, peers who disappear and reappear on the network).
The system uses a proprietary peer naming and discovery system that operates outside the Internet's Domain Name System (DNS) registry (a toy example appears after this list).
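The last point deserves a quick illustration. The following C# sketch is a toy naming scheme, not a real protocol: each peer broadcasts a self-chosen name over UDP, and any listening peer maps that name to the sender's current IP address, with no DNS registry involved:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

// A toy discovery scheme that works outside DNS. Each peer broadcasts a
// self-chosen name over UDP; listeners record the name along with the
// sender's current IP address. The port and message format are invented.
public class PeerDiscovery
{
    private const int DiscoveryPort = 8001;

    public static void Announce(string peerName)
    {
        UdpClient udp = new UdpClient();
        udp.EnableBroadcast = true;
        byte[] message = Encoding.ASCII.GetBytes("HELLO " + peerName);
        udp.Send(message, message.Length,
            new IPEndPoint(IPAddress.Broadcast, DiscoveryPort));
        udp.Close();
    }

    public static void ListenOnce()
    {
        UdpClient udp = new UdpClient(DiscoveryPort);
        IPEndPoint sender = new IPEndPoint(IPAddress.Any, 0);
        byte[] data = udp.Receive(ref sender);  // Blocks until a peer announces.
        string message = Encoding.ASCII.GetString(data);
        // The name maps to a live IP address, with no DNS registry involved.
        Console.WriteLine("{0} is at {1}", message.Substring(6), sender.Address);
        udp.Close();
    }
}
```

A real discovery system would also have to cope with the inconsistent connectivity mentioned in the third point, typically by discarding entries for peers that haven't announced themselves recently.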
[1]Distributed computing is sometimes described as multitier or n-tier programming, but this is not strictly correct. Distributed computing is a physical model that splits execution over multiple computers. Multitier programming is a logical model that divides an application into distinct layers. Think of it this way: A program with a multitier design has the option of graduating into a distributed application. However, multitier design and component-based programming can still be used in a traditional client-server application.