University of Texas at Arlington, Texas, USA.
International Journal of Science and Research Archive, 2025, 16(01), 2266-2275
Article DOI: 10.30574/ijsra.2025.16.1.2066
Received on 31 May 2025; revised on 18 July 2025; accepted on 27 July 2025
Deep neural networks and Artificial Intelligence (AI) models have achieved great success in areas including computer vision, natural language processing, and autonomous systems. Yet their real-world application is often limited by inference performance bottlenecks, particularly when tasks must run in real time on resource-constrained or specialized edge devices. Building scalable, efficient, and responsive AI systems therefore depends on careful optimisation of inference performance. Dynamic profiling, the process of analysing AI model and system performance in real time during execution, has become a critical technique both for locating performance bottlenecks and for informing runtime optimisation. In contrast to static profiling, which analyses a program before execution using pre-execution information such as prepared traces, dynamic profiling enables finer-grained inspection of issues such as memory-access inefficiencies, imbalanced compute utilisation, layer-level latency, and power consumption. Dynamic tracing and profiling tools and frameworks such as TensorRT, Intel VTune, NVIDIA Nsight, and the PyTorch Profiler are widely supported across diverse hardware platforms, including CPUs, GPUs, and edge accelerators. These tools offer useful information to guide fine-grained optimisations such as operator fusion, quantisation, memory and computation scheduling, and replication strategies. Notably, by coupling dynamic profiling with automated deployment pipelines, AI systems can optimise themselves at runtime and respond to variations in workloads and system constraints.
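To illustrate the core idea of dynamic profiling described above, the following is a minimal pure-Python sketch: a runtime wrapper that records per-stage wall-clock latency while a (hypothetical, toy) inference pipeline executes, then ranks stages by average latency to surface hotspots. Production tools such as the PyTorch Profiler or NVIDIA Nsight collect far richer data (operator traces, memory, GPU kernels); the `RuntimeProfiler` class and the stage names here are illustrative assumptions, not any specific tool's API.

```python
import time
from collections import defaultdict


class RuntimeProfiler:
    """Collects per-stage wall-clock latency while the pipeline runs."""

    def __init__(self):
        self.timings = defaultdict(list)

    def profile(self, name, fn, *args, **kwargs):
        # Time one invocation of a pipeline stage and record it under `name`.
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        self.timings[name].append(time.perf_counter() - start)
        return result

    def hotspots(self):
        # Average latency per stage, slowest first.
        return sorted(
            ((name, sum(ts) / len(ts)) for name, ts in self.timings.items()),
            key=lambda item: item[1],
            reverse=True,
        )


# Hypothetical two-stage "model": normalisation followed by a reduction.
profiler = RuntimeProfiler()
data = list(range(10_000))
normed = profiler.profile("preprocess", lambda xs: [x / 10_000 for x in xs], data)
total = profiler.profile("dense", sum, normed)

for name, avg in profiler.hotspots():
    print(f"{name}: {avg * 1000:.3f} ms")
```

Because the measurements are gathered during execution, the same mechanism can feed a runtime decision, e.g. routing work away from a stage whose observed latency exceeds a budget, which is the adaptive-deployment loop the abstract describes.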
This enables intelligent, self-optimising AI applications that maintain production-level performance. Integrating dynamic profiling into the AI model lifecycle allows continuous performance tracking and an iterative tuning cycle, facilitating the delivery of scalable, energy-efficient, high-throughput AI at scale in the cloud as well as at the edge. This paper demonstrates that dynamic profiling is an essential technique for overcoming performance bottlenecks and shaping the future of AI deployment.
Dynamic Profiling; Inference Optimisation; Real-Time Performance; Edge AI; Profiling Frameworks; Adaptive Deployment
Ankush Jitendrakumar Tyagi. Optimizing AI Model Inference Performance with Dynamic Profiling. International Journal of Science and Research Archive, 2025, 16(01), 2266-2275. Article DOI: https://doi.org/10.30574/ijsra.2025.16.1.2066.
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.