In Partial Fulfillment of the Requirements for the Degree of
Master of Science
Will defend his thesis
Data mining refers to discovering knowledge from large amounts of data, and it has wide applications in the business and scientific worlds. Many data mining algorithms require input data in a tabular format called a data set. In a relational database environment, this data is extracted from existing OLTP or OLAP databases and transformed into the format required by the data set. This process is costly because a gap exists between the source databases and the data sets. The gap arises from, among other factors, the normalized structure of the input databases, the unconventional data requirements of data sets, and the lack of reuse of existing components. Moreover, the semantic meaning of the temporary tables created during data transformation is unclear in a data modeling context. In this work, we propose a solution to this problem: a data model that captures the transformations of data. First, we elaborate on the issues involved; then we define the data set as an entity with special properties in the ER-Relational modeling paradigm. We note that relational operations and derivations of attributes can be captured in the data modeling paradigm. To represent the transformations, we present extensions to the ER-Relational model and propose a method for designing the data model.