Optimizing Vector Embedding Storage and Indexing for AI at Scale
DOI: https://doi.org/10.47363/JAICC/ICMLAIDS2026/2026(5)1

Keywords: AI, Embedding Storage

Abstract
As large-scale AI systems continue to evolve, the demand for efficient storage and retrieval of dense vector embeddings has become a critical challenge for both training and inference. Vector databases such as FAISS
and Milvus enable high-performance similarity search, but the underlying infrastructure costs—spanning GPU utilization, memory bandwidth, and SSD/NVMe storage—are escalating rapidly. This talk explores emerging
strategies for optimizing embedding storage and indexing to balance cost efficiency with low-latency retrieval, a key requirement for production-scale AI applications.
We begin by highlighting advances in scalar quantization, product quantization (PQ), and hybrid compression techniques, which significantly reduce embedding footprint without degrading model accuracy. We also discuss
adaptive precision storage, where embeddings dynamically shift between low- and high-precision formats depending on workload criticality. Beyond compression, indexing innovations such as Hierarchical
Navigable Small World (HNSW) graphs, disk-aware indexing, and tiered memory hierarchies are pushing the boundaries of scalability by leveraging GPU-accelerated search and SSD-based caching.
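To make the compression argument concrete, the following is a minimal sketch of product quantization using only NumPy: each vector is split into M sub-vectors, a small codebook is learned per subspace with a naive k-means loop, and every sub-vector is replaced by a one-byte centroid index. All parameters here (dimension 64, M=8 subspaces, K=256 centroids) are illustrative assumptions, not values from the talk; production systems would typically use a library such as FAISS rather than this hand-rolled version.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_pq(X, M, K, iters=5):
    """Learn one K-centroid codebook per subspace via naive k-means."""
    n, d = X.shape
    ds = d // M                                  # sub-vector dimensionality
    codebooks = np.empty((M, K, ds), dtype=np.float32)
    for m in range(M):
        sub = X[:, m * ds:(m + 1) * ds]
        cent = sub[rng.choice(n, K, replace=False)].copy()
        for _ in range(iters):
            # assign each sub-vector to its nearest centroid
            d2 = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
            assign = d2.argmin(1)
            # recompute centroids (empty clusters keep their old centroid)
            for k in range(K):
                mask = assign == k
                if mask.any():
                    cent[k] = sub[mask].mean(0)
        codebooks[m] = cent
    return codebooks

def encode_pq(X, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    M, K, ds = codebooks.shape
    codes = np.empty((X.shape[0], M), dtype=np.uint8)
    for m in range(M):
        sub = X[:, m * ds:(m + 1) * ds]
        d2 = ((sub[:, None, :] - codebooks[m][None]) ** 2).sum(-1)
        codes[:, m] = d2.argmin(1)
    return codes

def decode_pq(codes, codebooks):
    """Reconstruct approximate vectors from codes (to check error)."""
    M, K, ds = codebooks.shape
    return np.concatenate(
        [codebooks[m][codes[:, m]] for m in range(M)], axis=1)

# synthetic embeddings standing in for real model outputs
X = rng.standard_normal((2000, 64)).astype(np.float32)
books = train_pq(X, M=8, K=256)
codes = encode_pq(X, books)
Xhat = decode_pq(codes, books)

# 64 float32 values (256 bytes) shrink to 8 uint8 codes (8 bytes),
# a 32x reduction per vector (codebooks add a small fixed overhead).
print("compression ratio:", X.nbytes / codes.nbytes)
```

The same structure underlies FAISS's `IndexPQ`: distances at query time are computed directly against the codebooks via lookup tables, so the full-precision vectors never need to be materialized.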
License
Copyright (c) 2026 Journal of Artificial Intelligence & Cloud Computing

This work is licensed under a Creative Commons Attribution 4.0 International License.