Optimizing AI Performance on Edge Devices: A Comprehensive Approach Using Model Compression, Federated Learning, and Distributed Inference
Abstract
Running AI models on edge devices is challenging because such devices are constrained in processing power, battery capacity, and latency budget. This article explores methods for improving the performance of AI models in such settings so that they run efficiently while still delivering fast and accurate results. These methods include model compression techniques such as pruning, quantization, and knowledge distillation, which shrink models so that the required computations can be performed with low energy consumption. Particular attention is given to federated learning as a way of training AI models across a distributed network of devices while preserving user privacy and avoiding the need to transfer raw data to a central server. Distributed inference, in which computations are partitioned appropriately across multiple devices, is also investigated as a means of improving system performance and reducing latency. The application of these techniques is described with respect to the limited capabilities inherent to devices such as smartphones, IoT sensors, and autonomous systems. This work seeks to improve inference and model deployment in edge AI systems, enhancing the end-user experience and energy efficiency and bringing sophisticated scale-out edge-computing solutions closer to reality through application-optimized edge AI models and frameworks.
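To make the compression step concrete, the following is a minimal sketch, not the paper's actual pipeline, of two of the techniques named above, pruning and post-training quantization, using standard PyTorch utilities. The toy network, the layer sizes, and the 30% pruning ratio are illustrative assumptions, not configurations taken from this work.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for an edge-deployed model (sizes are illustrative).
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Pruning: zero out the 30% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the sparsity into the weights

# Post-training dynamic quantization: Linear weights are stored as 8-bit
# integers, reducing model size and the energy cost of each multiply-accumulate.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Inference proceeds exactly as with the original float model.
example = torch.randn(1, 128)
print(quantized(example).shape)  # torch.Size([1, 10])
```

Knowledge distillation, federated training, and distributed inference would layer on top of a compressed model along the lines summarized in the abstract.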
Copyright © 2024 Venkata Mohit Tamanampudi; licensee Research Lake International Inc., Canada.
This is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (http://creativecommons.org/licenses/by-nc/4.0/).