Optimizing AI Model Inference Performance with Dynamic Profiling

Ankush Jitendrakumar Tyagi

doi:10.30574/ijsra.2025.16.1.2066

University of Texas at Arlington, Texas, USA.

Review Article

International Journal of Science and Research Archive, 2025, 16(01), 2266-2275

DOI URL: https://doi.org/10.30574/ijsra.2025.16.1.2066

Publication history

Received on 31 May 2025; revised on 18 July 2025; accepted on 27 July 2025

Abstract

Deep neural networks and other Artificial Intelligence (AI) models have shown great success in areas including computer vision, natural language processing, and autonomous systems. Yet their use in real-world tasks is often limited by inference performance, particularly when tasks must complete in real time or run on specialised, resource-constrained edge devices. Optimising inference performance is therefore central to building scalable, efficient, and responsive AI systems. Dynamic profiling, the analysis of AI model and system performance in real time during execution, has become a critical technique both for detecting where performance is impeded and for informing optimisation at runtime. In contrast to static profiling, which analyses a program using pre-execution information or prepared traces, dynamic profiling allows a finer-grained, runtime inspection of problems such as inefficient memory access, imbalanced compute utilisation, layer-level latency, and power consumption. Profiling and tracing tools and frameworks such as TensorRT, Intel VTune, NVIDIA Nsight, and PyTorch Profiler are supported across diverse hardware platforms, including CPUs, GPUs, and edge accelerators. These tools offer information that guides fine-grained optimisations such as operator fusion, quantisation, memory and computation scheduling, and replication strategies. Notably, when dynamic profiling is coupled to automated deployment pipelines, AI systems can optimise themselves at runtime and respond to variations in workload and system constraints. This supports intelligent, self-optimising AI applications that sustain production-level performance. Integrated into the AI model lifecycle, dynamic profiling enables continuous performance tracking and a design-and-iterate cycle, facilitating the delivery of scalable, energy-efficient, and high-throughput AI at scale, in the cloud as well as at the edge. This paper demonstrates that dynamic profiling is an important technique for overcoming inference performance issues in current and future AI deployment.
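
As an illustration of the layer-level latency measurement described in the abstract, the following is a minimal sketch in standard-library Python, not the method of the article or of any tool it names. The class LayerProfiler, the helper run_pipeline, and the toy layers scale and heavy are all hypothetical names introduced here; a real deployment would more likely rely on a dedicated profiler such as those listed above.

```python
import time
from collections import defaultdict


class LayerProfiler:
    """Hypothetical helper: records wall-clock latency per named layer at runtime."""

    def __init__(self):
        # layer name -> list of observed latencies in seconds
        self.samples = defaultdict(list)

    def wrap(self, name, fn):
        """Return a callable that behaves like fn but also records its latency."""
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                self.samples[name].append(time.perf_counter() - start)
        return timed

    def slowest(self):
        """Return (name, mean latency) for the layer with the highest mean latency."""
        means = {n: sum(v) / len(v) for n, v in self.samples.items()}
        if not means:
            return None
        name = max(means, key=means.get)
        return name, means[name]


def run_pipeline(layers, x):
    """Apply each layer in order, as a stand-in for a model's forward pass."""
    for layer in layers:
        x = layer(x)
    return x


# Toy layers (assumptions for illustration only).
def scale(x):
    return [v * 2.0 for v in x]


def heavy(x):
    time.sleep(0.01)  # simulates an expensive operator
    return [v + 1.0 for v in x]


profiler = LayerProfiler()
layers = [profiler.wrap("scale", scale), profiler.wrap("heavy", heavy)]

# Profile while the pipeline is actually executing, over several inferences.
for _ in range(5):
    out = run_pipeline(layers, [1.0, 2.0])

name, mean = profiler.slowest()
print(f"bottleneck layer: {name}, mean latency {mean * 1000:.2f} ms")
```

Because the measurements are collected during execution rather than estimated beforehand, the reported bottleneck reflects the actual workload; a deployment pipeline could, under the same assumptions, use such a report to decide which layer to target for fusion or quantisation.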

Keywords

Dynamic Profiling; Inference Optimisation; Real-Time Performance; Edge AI; Profiling Frameworks; Adaptive Deployment

Download Article PDF

https://journalijsra.com/sites/default/files/fulltext_pdf/IJSRA-2025-2066.pdf

How to cite this article

Ankush Jitendrakumar Tyagi. Optimizing AI Model Inference Performance with Dynamic Profiling. International Journal of Science and Research Archive, 2025, 16(01), 2266-2275. Article DOI: https://doi.org/10.30574/ijsra.2025.16.1.2066.
