Miao, dr. S. (Shengfa)


Presence: Monday, Tuesday, Wednesday, Thursday
Room: Cubicus B112a
Phone number: +31 53 489 6611; +31 53 489 4470 (secr.)


Shengfa Miao obtained his bachelor's and master's degrees in Computer Software and Theory from Lanzhou University, China, and received his PhD in Data Mining from Leiden University in December 2014. His PhD thesis is titled "Structural Health Monitoring meets Data Mining". Before joining the Story Lab at the University of Twente, he spent two years as a postdoctoral researcher at Leiden University, working on text mining and insurance data modeling. His research interests include Data Mining, Machine Learning, Natural Language Processing, and Structural Health Monitoring.


2010.11 - 2014.12 Ph.D. candidate, Data mining, Leiden University
2009.09 - 2010.10 Ph.D. candidate, Signal processing, Lanzhou University
2007.09 - 2009.07 MSc, Computer Software and Theory, Lanzhou University
2001.09 - 2005.06 Bachelor, Computer Science & Technology, Lanzhou University

Research projects

COSPREMO: Advanced consumer scoring by nonlinear predictive modeling
Duration: 2015.09 – present
Purpose: The project aims at developing improved scoring models by using state-of-the-art data mining and optimization algorithms and a considerably extended set of consumer features, in order to increase the competitive advantage of financial services companies.

Contributions and skills:

  • Consumer scoring system:
    • Function: The system is composed of the following steps: data exploration and pre-processing; feature selection; model evaluation and comparison; result visualization.
    • Skills: Python; IPython; R; Spark; Generalized linear models; Random forest; Support vector machines; Multivariable regression; Principal component analysis; Linear discriminant analysis.

Digging into Data: Automating Data Extraction from Chinese text
Duration: 2014.09 – present
Purpose: We aim to develop means of transforming texts written in classical Chinese into highly structured data, and to provide researchers with an intelligent entity extraction solution.

Contributions and skills:

  • Developed an OpenID login system:
    • Function: Users can register with accounts from external providers such as Facebook, LinkedIn, and Twitter, or create a local account; users can upload files to the server and delete files from it.
    • Skills: Node.js; JavaScript; MongoDB; Passport; HTML
  • Text mining module:
    • Function: HTML file parser; entity extraction; word segmentation; pattern selection; regular expression transformation and matching; Bayesian classifier.
    • Skills: Python; JavaScript; Regular expression; BeautifulSoup; Bayesian network; Hadoop; Conditional random field; Hidden Markov model.

InfraWatch: Data Management for Monitoring Infrastructural Performance
Duration: 2010.11 - 2014.08
Purpose: We aim to monitor and evaluate the health of a Dutch highway bridge by extracting damage-sensitive features from terabyte-scale datasets collected with a sensor network.

Contributions and skills:

  • Dependency analysis:
    • Function: Detecting dependencies among time series of multiple scales; finding interesting subgroups.
    • Skills: Matlab; Signal processing; Subgroup discovery
  • Pattern and trend detection:
    • Function: Detecting patterns of multiple scales from time series
    • Skills: Matlab; Hadoop; Big data processing; Time series representation
  • Modal analysis:
    • Function: Extracting modal parameters (features) from time series; modeling the relationship between environmental factors and modal parameters.
    • Skills: Stochastic subspace identification; Autoregressive moving average (ARMA) model.

Alumni Resource management platform
Duration: 2007.09 – 2009.06
Purpose: The project aims to develop a decision support system, by making good use of historical data and dynamic alumni information on the Internet.

Contributions and skills:

  • ETL steps:
    • Function: Extracting, transforming and loading historical data, which come from various sources and vary in quality.
    • Skills: Missing value checking; Data cleaning; Data merging.
  • Name information acquisition system:
    • Function: The system is used to collect the latest alumni information from the Internet.
    • Skills: Data warehouse; Search engine; Webpage cleaning; Information retrieval; word segmentation.
  • Data analysis:
    • Function: Employment trend and distribution analysis; Student origin analysis.
    • Skills: Association analysis; Linear regression; OLAP

The Study about Meteorological Spatial Data Mining and Forecast under High Performance Computing Environment
Duration: 2007.01 - 2008.05
Purpose: In cooperation with the meteorological bureau of Gansu province, we aim to detect rules in large meteorological datasets and to use those rules to predict the weather.

Contributions and skills:

  • Meteorological data processing platform:
    • Function: Meteorological data quality control; High performance computing environment establishment; OLAP analysis.
    • Skills: MySQL; JavaScript; ETL operation
  • Short-term meteorological forecast:
    • Function: Building models to predict weather within 24 hours.
    • Skills: Statistical models; Parallel computing

2015.09 – Researcher, LIACS, Leiden University

2014.09 – 2015.08 Research associate, LIAS, Leiden University
2008.01 - 2009.08 Software engineer, Graduate School, Lanzhou University
2005.09 – 2008.12 Manager, co-founder, Eagle software studio
2005.07 - 2009.08 Teaching assistant, Computer Science and Technology, Lanzhou University


  • Data scientist at SuperGraph
  • Visiting researcher at Leiden University


  1. Miao S., Vespier U., Meeng M., Cachucho R., and Knobbe A. Predefined pattern detection in large time series. Information Sciences, 329, pages 950-964, 2016.
  2. Miao S., Koenders E., and Knobbe A. Automatic baseline correction of strain gauge signals. Journal of Structural Control and Health Monitoring, 22(1), pages 36-49, 2015.
  3. Vanschoren J., Vespier U., Miao S., Meeng M., Cachucho R., and Knobbe A. Large-scale sensor network analysis: applications in structural health monitoring. In Big data management, technologies, and applications, pages 314-348, 2013.
  4. Miao S. Structural health monitoring meets data mining. PhD thesis, Leiden University, 2014.
  5. Miao S., Vespier U., Vanschoren J., Knobbe A., and Cachucho R. Modeling sensor dependencies between multiple sensor types. In Proceedings of BeneLearn, 2013.
  6. Miao S. The Application and Research of Data Warehouse and Search Engine in the Management of Alumni Resources. Master thesis, Lanzhou University, 2009.
  7. Chen X., Miao S., and Wang B. Design and implement of Chinese name search engine based on multidimensional data model. In Sciencepaper Online, Aug 2008.
  8. Chen X., He Y., Chen P., Miao S., Song W., and Yue M. HPFP-miner: A novel parallel frequent itemset mining algorithm. In Natural Computation (ICNC ’09), volume 3, pages 139–143, 2009.

  9. Miao S., Knobbe A., Koenders E., and Bosma C. Analysis of traffic effects on a Dutch highway bridge. In Proceedings of IABSE, 2013.
  10. Miao S., Veerman R., Koenders E., and Knobbe A. Modal analysis of a concrete highway bridge-structure calculations and vibration-based results. In Proceedings of SHMII, 2013.
  11. Veerman R., Miao S., Koenders E., and Knobbe A. Data intensive structural health monitoring in the InfraWatch project. In Proceedings of SHMII, 2013.
  12. Vespier U., Knobbe A., Vanschoren J., Miao S., Koopman A., Obladen B., and Bosma C. Traffic events modeling for structural health monitoring. In Proceedings of IDA, pages 376-387, 2011.
  13. Chen X., Yao Y., Liu G., Su Y., Chen Y., and Miao S. IBTA: An IBT-Tree based algorithm for RFID anti-collision. In Information Technology and Applications (IFITA), volume 2, pages 407–410, 2010. 

  14. Chen X., Liu G., Yao Y., Chen Y., Miao S., and Su Y. IRBST: an improved RFID anti-collision algorithm based on regressive-style binary search tree. In Information Technology and Applications (IFITA), volume 2, pages 403–406, 2010.