Management of Sensor Data: Focusing Reproducibility of Results
Description of research
Sensors have become common in day-to-day life and are used in many applications. Sensor data includes both streaming and sampling data. Streaming data is append-only, whereas sampling data may be updated over time.
Sensors are one of the major sources of data in most e-science applications. Sensor data are acquired and processed into higher-level events that applications use for decision making and process control. To understand the semantics of an event and to justify its correctness, e-science applications need an explanation of the origin of the processed data. Most significantly, researchers often need to reproduce previous results. Reproducibility of results is therefore an important requirement for e-science applications.
Our research addresses the management of streaming and sampling sensor data, with a particular focus on the reproducibility of results. Data provenance describes the origin and movement of data within databases. Fine-grained, or tuple-based, provenance data refers back to the original tuples, which allows us to retrieve a past database state. However, maintaining fine-grained provenance data requires enormous disk space, which makes it expensive. Furthermore, the delayed and out-of-order arrival of data tuples from different streams may make it difficult to execute a query such as a join or an aggregate, because tuples are correlated based on the time at which they originate.
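As a rough illustration of tuple-based provenance and of correlating tuples by origin time, the following minimal sketch computes a windowed average over readings that arrive out of order and records, for each result, the identifiers of the tuples that produced it. All names (`Reading`, `windowed_average`, `reproduce`), the tumbling-window choice, and the sample values are assumptions made for this example; they are not taken from the project's framework.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Reading:
    tuple_id: int     # hypothetical unique identifier of the tuple
    origin_time: int  # time at which the sensor produced the reading
    value: float


def windowed_average(
    readings: List[Reading], window_size: int
) -> List[Tuple[int, float, Tuple[int, ...]]]:
    """Average per tumbling window of origin time.

    Each result carries its fine-grained provenance: the ids of the
    input tuples that contributed to it.
    """
    # Order by origin time, not arrival order, so late tuples
    # still land in the window they belong to.
    ordered = sorted(readings, key=lambda r: (r.origin_time, r.tuple_id))
    windows: Dict[int, List[Reading]] = {}
    for r in ordered:
        start = (r.origin_time // window_size) * window_size
        windows.setdefault(start, []).append(r)

    results = []
    for start in sorted(windows):
        members = windows[start]
        average = sum(m.value for m in members) / len(members)
        provenance = tuple(m.tuple_id for m in members)
        results.append((start, average, provenance))
    return results


def reproduce(store: Dict[int, Reading], provenance: Tuple[int, ...]) -> float:
    """Recompute a result from the stored tuples named in its provenance."""
    values = [store[i].value for i in provenance]
    return sum(values) / len(values)


# Tuple 2 originates before tuple 3 but arrives after it.
arrivals = [
    Reading(1, 0, 20.0),
    Reading(3, 12, 23.0),
    Reading(2, 5, 22.0),
    Reading(4, 17, 25.0),
]
results = windowed_average(arrivals, window_size=10)
store = {r.tuple_id: r for r in arrivals}
```

Each result stores one identifier per contributing tuple, which hints at why fine-grained provenance becomes expensive as windows and streams grow, and reproducing a result only works as long as the referenced tuples are never discarded.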
Another problem may arise from the massive amount of streaming data. Since streaming data is append-only and reproducibility is our prime concern, we cannot discard any tuples. Therefore, the size of the database keeps increasing over time. Lastly, achieving reproducibility in a decentralized scenario may be challenging because of data replication and the execution of other identical operations between databases. Based on these problems, we formulate the following research questions:
How can the storage requirement of fine-grained provenance data be minimized while keeping results reproducible?
How can different data streams be coordinated in order to optimize the processing of results?
How can the storage space required for data tuples within a workflow be optimized without sacrificing reproducibility?
How can the global database state be kept consistent in a distributed scenario so that results remain reproducible?
Our research goal is to build a framework that can handle both streaming and sampling sensor data to obtain reproducible results. Our research will support researchers in e-science applications by providing a means to reproduce results.
Advisor(s)
Duration
June 2009 to May 2013
Project
Management of Sensor Data: Focusing Reproducibility of Results
Funding institution
CTIT (Centre for Telematics and Information Technology)
Strategic Research Orientation
ASSIST - Applied Science of Services for Information Society Technologies
Links to relevant web pages:
http://www.sensordatalab.org/wiki/index.php5/Main_Page
http://wwwhome.cs.utwente.nl/~wombachera/
http://wwwhome.cs.utwente.nl/~huq/
