Optimizing LLM Inference: Metrics that Matter for Real-Time Applications
DOI: https://doi.org/10.47363/JAICC/2025(4)446

Keywords: Large Language Models (LLMs), Time to First Token (TTFT), End-to-End Latency, Inter Token Latency (ITL), Tokens Per Second (TPS), Requests Per Second (RPS), Inference Benchmarking, Latency Optimization

Abstract
Large Language Models (LLMs) such as GPT, LLaMA, Claude, and Gemini are increasingly deployed in real-time applications that demand both low-latency and high-throughput inference. As these models move from research into production systems, rigorous evaluation of their inference performance becomes essential. This paper provides a detailed analysis of six metrics commonly used to evaluate LLM inference: Time to First Token (TTFT), Generation Time, End-to-End Latency (e2e_latency), Inter Token Latency (ITL), Tokens Per Second (TPS), and Requests Per Second (RPS). Through descriptive analysis and empirical observation, it examines the definitions, practical implications, and interrelationships of these metrics. Our findings show how responsiveness and throughput trade off against each other, and that different applications must optimize for different metrics. The paper offers practical guidance to researchers, engineers, and system architects who evaluate or optimize LLM systems in latency-critical and shared-infrastructure environments.
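As a minimal sketch of how these quantities relate, the Python snippet below derives TTFT, Generation Time, End-to-End Latency, ITL, and per-request TPS from the per-token arrival timestamps that a streaming inference API typically exposes; RPS is computed separately at the system level over a measurement window. The names (InferenceTiming, rps) are illustrative, not taken from the paper.

from dataclasses import dataclass

@dataclass
class InferenceTiming:
    request_start: float       # wall-clock time the request was sent
    token_times: list[float]   # wall-clock arrival time of each output token

    @property
    def ttft(self) -> float:
        """Time to First Token: delay between request and first token."""
        return self.token_times[0] - self.request_start

    @property
    def e2e_latency(self) -> float:
        """End-to-End Latency: request start to last token."""
        return self.token_times[-1] - self.request_start

    @property
    def generation_time(self) -> float:
        """Generation Time: first token to last token (decode phase)."""
        return self.e2e_latency - self.ttft

    @property
    def itl(self) -> float:
        """Inter Token Latency: mean gap between consecutive tokens."""
        gaps = [b - a for a, b in zip(self.token_times, self.token_times[1:])]
        return sum(gaps) / len(gaps) if gaps else 0.0

    @property
    def tps(self) -> float:
        """Tokens Per Second for this single request's decode phase."""
        n = len(self.token_times)
        if n < 2 or self.generation_time <= 0:
            return 0.0
        return (n - 1) / self.generation_time

def rps(completed_requests: int, window_seconds: float) -> float:
    """Requests Per Second: system-level throughput over a window."""
    return completed_requests / window_seconds

# Example: three tokens streamed back after a request sent at t=0.00
t = InferenceTiming(request_start=0.00, token_times=[0.35, 0.40, 0.46])
print(f"TTFT={t.ttft:.2f}s  e2e={t.e2e_latency:.2f}s  ITL={t.itl*1000:.0f}ms  TPS={t.tps:.1f}")

Note the trade-off the abstract describes: batching requests together typically raises system-level RPS while lengthening each request's TTFT and ITL, which is why latency-critical applications and shared-infrastructure deployments optimize different metrics.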