Cost-Effective Development of Custom Wake Word Detection Models for Low-Resource Languages in Embedded Devices
arXiv, 2024
Building a reliable detection system for a custom wake word is challenging, particularly in low-resource languages, where little suitable speech data is available. Moreover, collecting a sufficiently large dataset of both positive and negative samples is costly and time-consuming. To address this problem, we propose a cost-efficient approach for enriching a small set of collected custom samples. We apply a range of preprocessing, data augmentation, and noise synthesis techniques to expand the positive samples. In addition, we automatically extract specifically chosen negative samples from an existing speech dataset. The augmented data is used to train a neural network-based detector with Mycroft Precise. The results demonstrate improved, production-grade performance, making the approach suitable for wide use in embedded devices and custom virtual assistants.
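
As an illustration of the kind of noise-based augmentation described above, the following minimal sketch mixes a background noise recording into a positive sample at several signal-to-noise ratios (SNRs). It is not the pipeline used in this work: the function names (`mean_power`, `mix_at_snr`, `augment`), the chosen SNR levels, and the synthetic tone and white noise standing in for real recordings are all assumptions made for the example.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mix has the requested SNR in dB."""
    # Repeat or trim the noise so it covers the whole utterance.
    reps = math.ceil(len(speech) / len(noise))
    noise = (list(noise) * reps)[: len(speech)]
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        return list(speech)  # nothing to scale against
    # SNR_dB = 10 * log10(p_speech / (scale^2 * p_noise)), solved for scale.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def augment(speech, noise, snrs_db=(20, 10, 5)):
    """Produce one noisy copy of the utterance per SNR level (hypothetical levels)."""
    return [mix_at_snr(speech, noise, snr) for snr in snrs_db]


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for real recordings: a 440 Hz tone and white noise at 16 kHz.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    variants = augment(speech, noise)
    print(len(variants), "augmented copies")
```

Each positive recording yields one additional training sample per SNR level, so a small collected set can be multiplied several times over without further recording effort. In practice the noise would presumably be drawn from recordings of the target deployment environment rather than synthetic white noise.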