Data Intensive Systems

Data, like any other valuable object, needs to be curated before it is actually used. That means putting its pieces together, cleaning it, and studying it.

Data has been called “the oil of the 21st century”. Businesses and organizations have realized that, in order to thrive in the data-driven economy, they have to adopt modern data management solutions that will allow them to innovate and generate high-quality, value-added services.

Before any data can be leveraged by data analytics to generate insights, it first has to be prepared, understood, and curated to maintain its value. The New York Times has reported that such tasks may take up to 80% of a data scientist's time.

The Data Intensive Systems Group aims at supporting the users of tomorrow, and particularly data scientists, to: (i) integrate a multitude of highly heterogeneous and independently developed data sources, (ii) analyze and understand data, even with complex, unknown, or non-traditional structures, (iii) eliminate data quality issues, and (iv) extract and manage knowledge in a systematic way. The goal is for all of these tasks to be performed in ways that are less laborious, less time-consuming, and less error-prone.

The group studies new paradigms of user interaction with Big Data and develops algorithms and systems that exploit state-of-the-art technologies to cope with the large-scale and intensive processing that modern massive datasets require. The group's expertise and research revolve around (but are not limited to) the following areas:

  • Data Preparation & Curation (Data Discovery, Heterogeneous Information Integration, Entity Linkage, Data Cleaning, Data Quality, Data Preservation)
  • Data Understanding (Data Exploration, Metadata Management, Big Data Profiling)
  • Information Extraction from Heterogeneous Big Data Repositories (Keyword Search, Search through Examples)
  • Graph Management (Labeled Graphs, Ontologies) 
  • Knowledge Management (Knowledge Extraction, Reasoning, Knowledge Graph Management, Semantic Web Data)
  • Evolving Data (Streams, Time Series, Anomaly Detection, Evolving Graphs, Temporal Knowledge Graphs)

The resulting solutions find applications in many different domains, ranging from Science, Engineering, Retail, and Finance to Healthcare, Telecommunications, Education, and Transportation.

The Data Intensive Systems Group is one of the research groups of the Algorithms division within the Department of Information and Computing Sciences.

Head of the Group