This blog post provides the occupancy estimation dataset that we collected as part of our IEEE Globecom paper, "Machine Learning-Based Occupancy Estimation Using Multivariate Sensor Nodes".
Abstract
In buildings, a large share of energy is spent on heating, ventilation and air conditioning systems. One way to optimize their usage is to make them demand-driven depending on human occupancy. This paper focuses on accurately estimating the number of occupants in a room by leveraging multiple heterogeneous sensor nodes and machine learning models. For this purpose, low-cost and non-intrusive sensors measuring CO2, temperature, illumination, sound and motion were used. The sensor nodes were deployed in a room in a star configuration and measurements were recorded for a period of four days. A regression-based method is proposed for calculating the slope of CO2, a new feature derived from real-time CO2 values. Supervised learning algorithms such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), support vector machine (SVM) and random forest (RF) were used on several different combinations of feature sets. Moreover, multiple performance metrics such as accuracy, F1 score and confusion matrix were used to evaluate the performance of our models. Experimental results demonstrate a maximum accuracy of 98.4% and a high F1 score of 0.953 for estimating the number of occupants in the room. Principal component analysis (PCA) was also applied to evaluate the performance of a dataset with reduced dimensionality.
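To give a feel for the CO2 slope feature mentioned in the abstract, here is a minimal sketch of one way such a feature could be computed: an ordinary least-squares line is fitted to a trailing window of CO2 readings, and its slope is used as the feature value. The function names (`window_slope`, `rolling_co2_slope`), the window length of 5 samples, and the choice to report the slope per sample index are assumptions made for illustration. The abstract does not specify these details, so this should not be read as the exact method used in the paper.

```python
from typing import List, Sequence


def window_slope(values: Sequence[float]) -> float:
    """Least-squares slope of `values` against their sample index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a slope")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


def rolling_co2_slope(co2: Sequence[float], window: int = 5) -> List[float]:
    """Slope of a trailing window ending at each sample.

    Assumption: early samples use whatever shorter history is available,
    and the very first sample gets a slope of 0.0.
    """
    slopes: List[float] = []
    for end in range(len(co2)):
        start = max(0, end - window + 1)
        segment = co2[start:end + 1]
        slopes.append(window_slope(segment) if len(segment) >= 2 else 0.0)
    return slopes


if __name__ == "__main__":
    # Hypothetical readings in ppm: flat, then rising as people enter.
    readings = [400, 400, 400, 410, 420, 430, 440]
    print(rolling_co2_slope(readings, window=3))
```

A positive slope suggests CO2 is accumulating (occupants are likely present or arriving), while a negative slope suggests the room is emptying. The resulting series can be appended as an extra column alongside the raw sensor readings before training a classifier.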