[Defense] Extensible Graph Analytics for Large-scale Data Science
Tuesday, April 5, 2022
11:45 am - 12:45 pm
will defend her proposal
Extensible Graph Analytics for Large-scale Data Science
Graph analytics requires specialized storage and algorithms, setting it apart from machine learning. Currently, the best approaches to analyze “big graphs” either work completely in main memory or require building so-called graph engines. SQL-based solutions are somewhere in between: they are not as comprehensive as memory-based solutions (in terms of breadth of graph problems), but they are competitive with graph engines (only slightly slower on some problems). We first propose optimized SQL algorithms to compute complex graph metrics such as triangle counts, betweenness centrality, and diameter on distributed DBMSs. Then, we develop a general function based on a semiring algorithm. The function can help solve many graph problems, and it also works for graphs that do not fit in main memory. It is written in C++, but it can be easily called from Python. Finally, we explore a fourth, but natural, alternative: programming graph algorithms within the Python ecosystem while following database system principles. We present a solution inspired by previous research on analyzing graphs with SQL queries. Our solution is based on a general semiring operator, which allows easy programming of several graph algorithms by swapping functions. We study how to optimize our operator as a primitive query, treating Python functions as basic database operators. Even though our solution cannot compete with graph engines when analyzing massive graphs, it can be an acceptable way to analyze graphs on an average computer today, without main memory limitations. Moreover, we expect our solution to become more viable as hardware gets faster and cheaper and as Python becomes more popular.
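To illustrate what "swapping functions" in a semiring operator can mean, here is a minimal Python sketch. It is not the operator or the C++ function described in the abstract: the names `semiring_step` and `semiring_closure`, the edge-list representation, and the in-memory dictionaries are all assumptions made for illustration. The same loop computes shortest paths with (min, +) and reachability with (or, and).

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

# Hypothetical edge representation: (source, destination, weight).
Edge = Tuple[Hashable, Hashable, float]


def semiring_step(edges: List[Edge], vec: Dict, add: Callable, mul: Callable) -> Dict:
    """One vector-matrix product over a semiring.

    For every edge (i, j, w), combine vec[i] with w using `mul`,
    then fold the candidates arriving at j using `add`.
    """
    out = dict(vec)  # keep current values, like an implicit self-loop
    for src, dst, w in edges:
        if src in vec:
            cand = mul(vec[src], w)
            out[dst] = add(out[dst], cand) if dst in out else cand
    return out


def semiring_closure(edges: Iterable[Edge], start: Dict, add: Callable,
                     mul: Callable, max_iters: Optional[int] = None) -> Dict:
    """Repeat semiring_step until the vector stops changing."""
    edges = list(edges)
    vertices = {v for s, d, _ in edges for v in (s, d)} | set(start)
    limit = max_iters if max_iters is not None else len(vertices)
    vec = dict(start)
    for _ in range(limit):
        nxt = semiring_step(edges, vec, add, mul)
        if nxt == vec:  # fixed point reached
            break
        vec = nxt
    return vec


edges = [("a", "b", 4.0), ("a", "c", 1.0), ("c", "b", 2.0), ("b", "d", 5.0)]

# Shortest paths from "a": the (min, +) semiring.
dist = semiring_closure(edges, {"a": 0.0}, add=min, mul=lambda x, w: x + w)

# Reachability from "a": the (or, and) semiring; the edge weight is ignored.
reach = semiring_closure(edges, {"a": True},
                         add=lambda x, y: x or y, mul=lambda x, w: x)

print(dist)
print(reach)
```

Only the `add` and `mul` arguments change between the two calls. A version for graphs larger than main memory would presumably stream or partition the edge list instead of holding it in a Python list, which this sketch does not attempt.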
11:45AM - 1:45PM CT
Dr. Carlos Ordonez, dissertation advisor
Faculty, students and the general public are invited.