EECS500 Spring 2013 Department Seminar

Kemafor Anyanwu
Scalable Querying of Semantic Web Data Models: Challenges and Opportunities
North Carolina State University
Wolstein Research Bldg, Room 6136
11:30am - 12:30pm
March 12, 2013

Recent advancements in Semantic Web publishing technologies are ushering in the era of "big Semantic Web data". We now have a rapidly growing number of publicly available Semantic Web datasets spanning a variety of domains. Attempting to harness the collective knowledge represented in combinations of these datasets quickly gives rise to a "big and heterogeneous data" problem.

While there is now a flurry of research activity on "big data", much of it focuses on data that is either unstructured or structured. Unfortunately, the differences between Semantic Web and relational data models lead to limitations when relational query optimization techniques are naively adopted for Semantic Web data processing. This is because key underlying assumptions made by relational optimization techniques, e.g., the containment of value sets assumption used in cost-based join optimization models, do not carry over to Semantic Web data models. The impact of this misalignment becomes much more noticeable when processing at big data scales. Semantic Web data is semi-structured (graph-structured), whereas relational data is highly structured. Semantic Web data often contains data expressed using multiple vocabularies or schemas, as well as information beyond that which is explicitly stated. The implication is that inferencing is needed as part of query processing, both to reconcile terminological differences across schemas and to account for implicitly represented information. One major consequence of these issues is that Semantic Web workloads are typically much more join-intensive than current big data techniques can adequately support. Further, emerging computational platforms for leveraging cloud resources to achieve scalability on big data, e.g., Google's MapReduce, pose additional challenges when mapping such Semantic Web data processing workloads to cloud computing programming models.

Finally, many of the emerging Semantic Web consumer applications in domains like scientific research, business, and government have data interrogation and analysis needs beyond what is supported by the traditional pattern matching query model, e.g., the need for more graph-oriented query primitives, queries about Pareto-optimal subgraphs given a set of constraints, etc. All these issues suggest that additional thinking about newer classes of query models and appropriate optimization techniques for scalable Semantic Web data processing is necessary.

In this talk, I will present an overview of the efforts being undertaken by the Semantic Computing Research Lab at North Carolina State University to address some of these challenges and highlight some open research opportunities.


Kemafor Anyanwu is an Assistant Professor of Computer Science and director of the Semantic Computing Research Lab at North Carolina State University. She received a Ph.D. in Computer Science from the University of Georgia in 2007. Her research interests include Semantic Web data management, data analytics and mining, and their applications. Her research activities revolve around two themes: developing optimization techniques for large-scale Semantic Web data processing, and developing query primitives and languages for supporting more sophisticated querying on the Semantic Web. She has served on program committees of different tracks of conferences such as ISWC, ICDE, and ICSC, and was on the organizing committee for WWW2010, held in Raleigh. She reviews for journals such as TKDE and IJSWIS and has been a guest editor for IJSWIS. Her work with her student received the best paper award at JIST 2012. Her work is funded by grants from the NSF and industry awards such as the IBM Faculty Award.