In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Will defend his thesis
To date, idle desktop computers, volunteered by the general public to create volunteer computing environments, have been limited to sequential scientific computing. The objective of this research is to convert idle desktop computers into virtual cluster nodes for executing parallel scientific applications. The dissertation introduces VolpexMPI that is designed to enable seamless forward application progress in the presence of frequent node failures as well as dynamically changing networks speeds and node execution speeds. Process replication is employed to provide robustness in such volatile environments. The central challenge in VolpexMPI design is to efficiently and automatically manage dynamically varying number of process replicas in different states of execution progress. The key fault tolerance technique employed is fully distributed sender based logging. The dissertation presents the design and performance of two architectures of VolpexMPI. One architecture implements asynchronous message passing with non-blocking sockets in C, with an emphasis on performance. The other architecture utilizes Python and threaded TCP services with the goal of portability between heterogeneous desktop computers. These implementations are tested by executing parallel benchmarks on dedicated clusters as well as virtualized clusters and pools of clusters managed by Condor. The C architecture results validate that the overhead of providing process replication is modest for parallel applications, having a favorable ratio of communication to computation and a low degree of communication. However, the Python architecture results show significant performance degradation when executing parallel scientific applications, even though the development process was remarkably easier.