Distributed computing is easy with .NET Remoting, but it's difficult to do well. In order to manage the system, you need a dedicated work manager that must be coded very carefully, or it could weaken the entire system.
Many potential enhancements to the work-manager system deal with improving the work manager itself. The next few sections describe several possibilities.
Currently, tasks are allocated as soon as they are received, much as messages were immediately sent with the first version of the Talk .NET coordination server. With a queued work manager, new task requests would be stored in memory. A dedicated work manager thread would periodically scan the queued tasks and allocate them to workers. One advantage of this approach is that it would allow the work manager to hold on to submitted tasks that can't be fulfilled right away (because there are no available workers). This approach would also allow you to deal with worker cancellations.
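A minimal sketch of this queued design might look like the following. (The class and member names here are hypothetical, not part of the chapter's code.)

```vbnet
' A hypothetical queued work manager. Task requests are enqueued
' immediately; a dedicated thread assigns them when workers are free.
Public Class QueuedWorkManager

    Private TaskQueue As New Queue()
    Private QueueLock As New Object()

    ' Called when a task request is received; returns immediately.
    Public Sub SubmitTask(ByVal Task As TaskRequest)
        SyncLock QueueLock
            TaskQueue.Enqueue(Task)
        End SyncLock
    End Sub

    ' Runs on a dedicated thread, periodically scanning the queue.
    Private Sub AllocateTasks()
        Do
            Dim Task As TaskRequest = Nothing
            SyncLock QueueLock
                If TaskQueue.Count > 0 Then
                    Task = CType(TaskQueue.Dequeue(), TaskRequest)
                End If
            End SyncLock
            If Not Task Is Nothing Then
                ' (Find an available worker and assign the task.
                ' If no worker is free, re-enqueue the task.)
            End If
            Threading.Thread.Sleep(500)
        Loop
    End Sub

End Class
```

Because the queue is touched by both the receiving thread and the allocation thread, every access is wrapped in a SyncLock block.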
Fortunately, this design is easy to implement. In fact, you've already seen it in Chapter 5, with asynchronous message delivery! That design can be adapted almost exactly to the work manager.
You might also want to use the work manager's queuing thread to monitor currently assigned tasks. In this case, you could reclaim an assigned task if a peer doesn't respond after a long period of time, and assign it to a different worker. If the task is extremely important and the system is working over a fast network, you might even want to add a GetProgress() method to the worker, which the server could call periodically to verify that a task is properly underway.
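For example, the worker's remotable interface could be extended with a progress method along these lines. (The interface and method names shown here are hypothetical.)

```vbnet
' A hypothetical extension to the worker interface. The work manager's
' monitoring thread could call GetProgress() periodically; if the call
' fails or the progress value never advances, the task segment can be
' reclaimed and assigned to a different worker.
Public Interface ITaskWorker
    Sub ReceiveTask(ByVal Request As TaskRequest)

    ' Returns the percentage of the identified task segment
    ' that has been completed so far (0 to 100).
    Function GetProgress(ByVal TaskID As Guid) As Integer
End Interface
```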
Queuing could also be applied in the worker itself. In this scenario, work segments would be added to a collection in memory. A separate thread would perform the actual work and would retrieve a new task segment as soon as the current one is finished.
Note |
For a demonstration of queuing in action, refer to the revamped coordination server in Chapter 5, or the file transfer application that we'll develop in Chapter 9. |
Currently, the work manager assigns work to the first available workers. This means that in a system with lots of extra capacity, the workers registered near the top of the collection will serve the most requests.
If you have peers of widely different abilities or connection speeds, you might want to assign work more intelligently. In this case, the server needs to track information about each worker. This information would probably be stored in the WorkerRecord object, although you could create another class and store it in a separate hashtable (indexed by worker ID) to reduce thread contention.
There are several questions you need to answer with performance scoring:
What statistics will you measure?
How often will you retrieve the statistics?
How will you combine the statistics to arrive at a single performance metric?
How will the performance metric influence the work assignment or worker choice?
For example, you might decide to track the peer's uptime, the number of task segments the peer has processed, the average response time for completing a task segment, and so on. Then, you need to provide a property that combines these details to arrive at a single number. There's no magic formula here—you may need to tweak this calculation based on experience. Here's an example that combines this information with different weighting:
Performance Score = Total Uptime In Minutes - (Average Task Time In Minutes) * 50
In this case, the higher the performance score, the better. The average task time is weighted by a factor of 50 representing its importance relative to the total uptime.
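Expressed as a read-only property, this calculation might look like the following sketch. (It assumes the WorkerRecord class tracks two hypothetical TimeSpan fields, TotalUptime and AverageTaskTime.)

```vbnet
' A sketch of a performance-score property for the WorkerRecord class.
' TotalUptime and AverageTaskTime are assumed TimeSpan fields that the
' work manager updates as the worker checks in and completes segments.
' The weighting factor of 50 is arbitrary and should be tuned.
Public ReadOnly Property PerformanceScore() As Double
    Get
        Return TotalUptime.TotalMinutes - _
            (AverageTaskTime.TotalMinutes * 50)
    End Get
End Property
```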
Finally, now that you have this information, you need to optimize your work-assignment algorithm. There are two basic choices here:
Sort the collection of available workers by performance score. Then, take the workers with the best performance score, and use only them.
Use the workers as normal, but adjust the amount of work given to them so that the best performing workers receive the greatest share of the work. For example, in the prime number example, a better performing worker would receive a larger range of numbers.
The first approach is best suited to the prime number example. The second approach works well when you have a problem with a high degree of parallelism (for example, a task that's being divided into dozens of task segments).
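The second approach could be sketched as follows for a numeric range. (The Workers array and PerformanceScore property are assumptions carried over from the scoring discussion, not part of the chapter's code.)

```vbnet
' Divide a number range among workers in proportion to their scores.
' Assumes Workers is an array of WorkerRecord objects exposing a
' hypothetical PerformanceScore property with positive values.
Dim TotalScore As Double = 0
Dim Worker As WorkerRecord
For Each Worker In Workers
    TotalScore += Worker.PerformanceScore
Next

Dim RangeStart As Integer = FromValue
For Each Worker In Workers
    ' The share of the range this worker receives.
    Dim Share As Integer = CInt((ToValue - FromValue) * _
        (Worker.PerformanceScore / TotalScore))

    ' (Send the segment from RangeStart to RangeStart + Share
    ' to this worker.)
    RangeStart += Share
Next
```

Rounding means the shares may not cover the range exactly; in practice, the last worker would simply receive whatever remains up to ToValue.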
In the distributed prime number example, all communication flows through the central work manager. In some cases, you may be able to reduce the amount of communication by using peers that store their results directly. This technique is primarily useful when you're using a distributed-computing framework to remove a processing bottleneck but aren't dividing individual tasks into multiple segments.
For example, consider a web service that allows clients to upload graphic projects that will be rendered on the server and stored on a hard drive. This task is extremely CPU-intensive, so you're unlikely to perform it inside the web-service method itself. Instead, the web service might forward the request to a back-end work manager. The client would check for the completed file at a later date.
In this scenario, the work manager doesn't necessarily need to receive the results from the peers because it doesn't need to contact the client directly or reassemble multiple task segments. The workers will still send back a task-complete acknowledgement to the work manager in order to confirm that the work was completed and that it doesn't need to be resubmitted. However, the workers can store the results directly in a database, file, or some other sort of permanent storage.
To support this design, the task request message would need to contain information about how the task results should be stored. To ensure maximum flexibility, you could define an abstract base class like this:
<Serializable()> _
Public MustInherit Class ResultStore
End Class
Following is an example result store that contains the information needed to store results in a database:
<Serializable()> _
Public Class DatabaseStore
    Inherits ResultStore

    Public DatabaseConnection As String
    Public Table As String
    Public TaskIDFieldName As String
    Public ClientIDFieldName As String
    Public ResultFieldName As String
End Class
Now the work manager would create a DatabaseStore object and send it to the appropriate worker with the task request. The worker would complete the task and then store the results directly in the specified location.
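For example, the work manager might configure the store like this. (The connection string and field names shown are placeholders.)

```vbnet
' Create and configure a DatabaseStore that tells the worker where
' to record its results. All values shown here are placeholders.
Dim Store As New DatabaseStore()
Store.DatabaseConnection = "Data Source=localhost;" & _
    "Initial Catalog=TaskResults;Integrated Security=SSPI"
Store.Table = "CompletedTasks"
Store.TaskIDFieldName = "TaskID"
Store.ClientIDFieldName = "ClientID"
Store.ResultFieldName = "Result"

' (Attach the store to the task request and send it to the worker.)
```

Because DatabaseStore is serializable, the object travels with the task request, and the worker never needs any hard-coded knowledge of the storage location.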
In the prime number work manager, the work manager system is tightly bound to the type of problem (in this case, calculating prime numbers). The message formats are hard-coded to use certain properties that only make sense in this context. Then the task-submission logic implements the task-specific code needed to divide the range of prime numbers into shorter lists, and so on. This limits the flexibility of the system.
You might be able to create a more flexible system by creating a work manager that supports multiple types of tasks, defining a generic interface for all task objects, and moving some of the code into the task object itself. However, the task server and workers would still need to reference the assemblies for all the types of tasks.
What if there were a way for a requester to define a new type of task with a request? This would allow you to create a distributed computer that could tackle any client-defined problem, without needing to modify and redeploy the software. In fact, this is possible with .NET, but it isn't suitable in all situations.
The basic concept is for the task requester to submit a .NET assembly (as an array of bytes) with the task. The worker would then save this file to disk, and load the task processor using reflection. The worker only needs to know the name of the class, which it uses to instantiate the task-specific object. It could call methods in a generic task interface (for example, IGenericTask.DoTask()) to perform its work. The data would be returned as a variable-sized byte array, which only the client would be able to interpret.
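The generic task interface mentioned here could be as simple as the following sketch. (The exact interface definition isn't shown in the chapter; this is one reasonable form.)

```vbnet
' A minimal generic task interface. The worker knows nothing about
' the task except that it accepts an opaque byte array of input data
' and returns an opaque byte array of results.
Public Interface IGenericTask
    Function DoTask(ByVal InputData() As Byte) As Byte()
End Interface
```

Both the worker and the dynamically loaded assembly must reference the assembly that defines this interface, but nothing else is shared between them.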
Here's a snippet of code that creates an object in an assembly, knowing only its name and an interface that it supports:
' Load an assembly from a file.
Dim TaskAssembly As System.Reflection.Assembly
TaskAssembly = System.Reflection.Assembly.LoadFrom("PrimeNumberTest.dll")

' Instantiate a class from the assembly.
Dim TaskProcess As IGenericTask
TaskProcess = CType(TaskAssembly.CreateInstance("TaskProcessor"), IGenericTask)

' (You can now call TaskProcess.DoTask() to perform the task.)
Tip |
The Assembly.LoadFrom() method provides several useful overloaded versions. One version takes a URI that points to a remote assembly, which can include a Universal Naming Convention (UNC) path to an assembly on another computer, or a URL to an assembly on a web server. This version is particularly useful because the assembly is transparently copied to a local cache. If you use LoadFrom() in this way to instantiate an assembly that already exists in the GAC, the local copy is used, thereby saving time. |
To make this example even more generic, the DoTask() method uses a byte array for all input parameters and the return value, which allows you to store any type and length of data.
<Serializable()> _
Public Class TaskRequest
    Public Client As ITaskRequester
    Public InputData() As Byte
    Public OutputData() As Byte
End Class
The easiest way to convert the real input and output values into a byte array is to use a memory stream and a BinaryWriter. Here's the code you would use to call the prime number test component generically. It's included with the online examples for this chapter in the DynamicAssemblyLoad project.
Dim ms As New MemoryStream()
Dim w As New BinaryWriter(ms)

' Write the parameters to the memory stream.
w.Write(FromValue)
w.Write(ToValue)

' Convert the memory stream to a byte array.
Dim InputData() As Byte = ms.ToArray()

' Call the task, generically.
Dim OutputData() As Byte = TaskProcess.DoTask(InputData)

' Convert the returned values (the list of primes) using a BinaryReader,
' and display them.
ms = New MemoryStream(OutputData)
Dim r As New BinaryReader(ms)
Do Until ms.Position = ms.Length
    Console.WriteLine(r.ReadInt32())
Loop
Of course, this approach sacrifices some error-checking ability for the sake of being generic. If the caller doesn't encode parameters in the same way that the task processor decodes them, an error will occur.
As it stands, the generic task client is a perfect tool for distributing a malicious virus on a broad scale. Once an assembly is saved to a user's local hard drive, it has full privileges and can take any action from calculating prime numbers to deleting operating system files. In other words, an attacker could define a malicious task, and your system would set to work executing it automatically!
Fortunately, there's a solution. You need to build your own code sandbox and carefully restrict what the assembly can do. This is the approach taken by the peer-to-peer .NET Terrarium learning game. It allows you to restrict a dynamically loaded assembly so that it can't perform any actions other than the ones you allow. The code for this task is somewhat lengthy, but it works remarkably well. We'll examine the code you need piece by piece.
All the changes are implemented in the worker application. The goal is to create a way for the worker to identify user-supplied assemblies and grant them fewer permissions before executing them. To create this design, you'll need a custom evidence class, a membership condition, and a policy level.
First of all, you need to create a serializable Evidence class that will be used to identify assemblies that should be granted lesser permission. This class doesn't require any functionality because it acts as a simple marker.
<Serializable()> _
Public NotInheritable Class SandboxEvidence
End Class
Next, you need to create a MembershipCondition class that implements IMembershipCondition. This class is responsible for implementing a Check() method that scans a collection of evidence and returns True if it finds an instance of SandboxEvidence. (In other words, the SandboxMembershipCondition class checks whether an assembly should be sandboxed.)
The abbreviated code is shown here. It leaves out some of the methods you must include for XML serialization. However, because you don't need to store this membership condition (it is implemented programmatically), these methods simply throw a NotImplementedException.
<Serializable()> _
Public NotInheritable Class SandboxMembershipCondition
    Implements IMembershipCondition

    Public Function Check(ByVal ev As Evidence) As Boolean _
      Implements IMembershipCondition.Check

        Dim Evidence As Object
        For Each Evidence In ev
            If TypeOf Evidence Is SandboxEvidence Then
                Return True
            End If
        Next
        Return False
    End Function

    ' (Other methods omitted.)
End Class
Now you have the required ingredients to create a safe sandbox. The first step is to determine what permissions sandboxed code should be granted. In this case, we'll only allow it the Execute permission. This allows it to perform calculations, allocate memory, and so on, but doesn't allow it to access the file system, a database, or any other system resource.
' Create a permission set with the permissions the dynamically loaded assembly
' should have.
Dim SandBoxPerms As New NamedPermissionSet("Sandbox", PermissionState.None)
SandBoxPerms.AddPermission( _
    New SecurityPermission(SecurityPermissionFlag.Execution))
Now that you've defined the permissions, you need to create a policy that will apply them. A policy level is essentially a tree of code groups. At runtime, the .NET security infrastructure examines each code group. When it finds a code group with a membership condition that matches the evidence provided with the assembly, it takes the permission set from that code group and applies it to the assembly's code.
In this case, you need a policy tree with two groups:
A group that matches the SandboxEvidence and grants the limited SandBoxPerms permission set.
A group that matches all other code and grants full privileges.
In addition, you need a root group that contains both these groups and defines a "first match" rule. This organization is shown in Figure 6-7 and Figure 6-8.
The code you need is shown here:
Dim Policy As PolicyLevel = PolicyLevel.CreateAppDomainLevel()
Policy.AddNamedPermissionSet(SandBoxPerms)

' The policy collection automatically includes an "everything" and
' a "nothing" permission set. We need to use these.
Dim None As NamedPermissionSet = Policy.GetNamedPermissionSet("Nothing")
Dim All As NamedPermissionSet = Policy.GetNamedPermissionSet("Everything")

Dim SandboxCondition As New SandboxMembershipCondition()
Dim AllCondition As New AllMembershipCondition()

' The default group grants nothing.
Dim RootCodeGroup As New FirstMatchCodeGroup(AllCondition, _
    New PolicyStatement(None))

' Code with the SandboxEvidence is given execute permission only.
Dim SandboxCodeGroup As New UnionCodeGroup(SandboxCondition, _
    New PolicyStatement(SandBoxPerms))

' All other code will be given full permission.
Dim AllCodeGroup As New UnionCodeGroup(AllCondition, New PolicyStatement(All))

' Add these membership conditions.
RootCodeGroup.AddChild(SandboxCodeGroup)
RootCodeGroup.AddChild(AllCodeGroup)
Policy.RootCodeGroup = RootCodeGroup
Finally, you apply the policy to the current application domain using the SetAppDomainPolicy() method. This method can only be called once per application domain.
' Set this policy into action for the current application. AppDomain.CurrentDomain.SetAppDomainPolicy(Policy)
You can then load the task assembly—but with a twist. When you load it, you'll specify a SandboxEvidence object that will identify the assembly as one that needs to run with reduced permissions.
' Create the evidence that identifies assemblies that should be sandboxed.
Dim Evidence As New Evidence()
Evidence.AddHost(New SandboxEvidence())

' Load an assembly from a file.
' We specify the evidence to use as an extra parameter.
Dim TaskAssembly As System.Reflection.Assembly
TaskAssembly = System.Reflection.Assembly.LoadFrom("PrimeNumberTest.dll", _
    Evidence)

' (Instantiate the class as before.)
Note |
You can test this code using the DynamicAssemblyLoad project included with the samples for this chapter. If you add any restricted code to the task processor (for example, an attempt to access the file system), a security exception will be thrown when you execute it. |
The distributed computing example in this chapter relied on a central component to coordinate work. However, this isn't incompatible with the peer-to-peer programming philosophy, because a peer in the prime number system can act both as a worker and as a task requester. The next step is to allow a peer to play all three roles for its own tasks: worker, requester, and coordinator.
One way to implement this is to reduce the role of the central component, as you'll see in the third part of this book. For example, you could replace the work manager with a basic discovery server. A peer that wants to request a task would then query the server, which could return a list containing a subset of available workers. The peer would then contact these workers to begin a new task. In this scenario, you would need to use a two-stage commit protocol. First, the requester would contact workers and ask if they were available. An available worker would respond "yes" and make itself unavailable to other requesters for a brief period (possibly five minutes) while it waits for an assignment. Next, the requester would deliver task segments to all the workers it had reserved. (See Figure 6-9.)
Of course, decentralization has its sacrifices, and a fully decentralized task processor might not be what you want at all. Without a central authority, it's easy for a malicious (or just plain greedy) peer to monopolize network resources. Also, it's difficult to modify the rules for prioritizing tasks and for subdividing them into task segments, because every peer would need to be updated. For those reasons, a hybrid design such as the one developed in this chapter may be the most effective and practical.