Calendar - University of Houston
[Defense] Parallel I/O on Compressed Data Files

Tuesday, December 8, 2020

1:00 pm - 2:30 pm

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Siddhesh Singh
will defend his proposal
Parallel I/O on Compressed Data Files


Advancements in computer hardware continue to provide improved performance. However, improvements in I/O devices have been more limited than those in other components, which makes it difficult to achieve peak performance on modern computer architectures. Because parallel distributed computing applications may operate on large files hosted on a remote file system, the future of High Performance Computing remains bottlenecked by I/O. Instead of relying exclusively on hardware advancements, it is also possible to improve I/O performance through software. One way to increase I/O throughput is to decrease the amount of data read or written, which can be done through data compression. This thesis investigates the application and benefits of data compression in parallel file I/O.

Parallel I/O on compressed data files introduces many unique challenges not encountered with uncompressed parallel I/O or compressed serial I/O. Many parallel programming frameworks use compression with I/O, but they often support only a limited set of operations on compressed files. As a result, there is no generalized framework for parallel I/O on compressed data files, which prevents logical abstractions of parallel I/O, such as collective I/O and file views, from being used with compression.

This dissertation explores the challenges posed by using compressed data files in parallel I/O. It first defines the semantics of parallel compressed I/O, then explores the properties that a compression format must have to be compatible with parallel I/O and the semantics of I/O operations on such formats. To demonstrate the applicability of a file format that meets these requirements, Google’s Snappy compression format is integrated into Open MPI’s parallel I/O mechanism.

1:00PM - 2:30PM CT
Online via MS Teams

Dr. Edgar Gabriel, dissertation advisor

Faculty, students and the general public are invited.
