It is not hard for humans to recognize an indoor environment, but teaching an artificial intelligence (AI) system to distinguish an office from a library is. AI systems are usually trained on images only, and recognizing a space just by looking at objects can easily go wrong. That is why computer scientist Estefanía Talavera Martínez added a new data modality, audio, to the training material that the AI system learns from. This resulted in a high success rate in recognizing indoor spaces, and in a new dataset of real-world videos for use in research. Her work was published in the journal Neural Computing and Applications on 22 January.
Estefanía Talavera Martínez is interested in developing algorithms for the automatic analysis of human behaviour. In previous work, she relied on photo streams gathered by wearable cameras to gain an understanding of people’s daily behaviour. These images were first analysed using AI systems. Doing the same with video is the next step, and one with more applications. ‘This could also be used to help robots find where they are, or to monitor the elderly, for example,’ explains Talavera Martínez. However, this requires an automated system that can identify indoor spaces.