Optimize Price-performance Of LLM Inference On NVIDIA GPUs Using The Amazon SageMaker Integration With NVIDIA NIM Microservices

Published by Plato

NVIDIA NIM microservices now integrate with Amazon SageMaker, allowing you to deploy industry-leading large language models (LLMs) and optimize model performance and cost. You can deploy state-of-the-art LLMs in minutes instead of days using technologies such as NVIDIA TensorRT, NVIDIA TensorRT-LLM, and NVIDIA Triton Inference Server on NVIDIA accelerated instances hosted by SageMaker.

NIM, part of the NVIDIA AI Enterprise software platform listed on AWS Marketplace, is a set of inference microservices that bring the power of state-of-the-art LLMs to your applications. It provides natural language processing (NLP) and understanding capabilities, whether you're developing chatbots, summarizing documents, or implementing other NLP-powered applications. For quick deployment, you can use pre-built NVIDIA containers to host popular LLMs that are optimized for specific NVIDIA GPUs, or you can use NIM tools to create your own containers.

In this post, we provide a high-level introduction to NIM and show how you can use it with SageMaker.

An introduction to NVIDIA NIM

NIM provides optimized and pre-generated engines for a variety of popular models for inference. These microservices support a variety of LLMs, such as Llama 2 (7B, 13B, and 70B), Mistral-7B-Instruct, Mixtral-8x7B, NVIDIA Nemotron-3 22B Persona, and Code Llama 70B, out of the box using pre-built NVIDIA TensorRT engines tailored for specific NVIDIA GPUs for maximum performance and utilization. These models are curated with the optimal hyperparameters for model-hosting performance for deploying applications with ease.

If your model is not in NVIDIA’s set of curated models, NIM offers essential utilities such as the Model Repo Generator, which facilitates the creation of a TensorRT-LLM-accelerated engine and a NIM-format model directory through a straightforward YAML file. Furthermore, an integrated community backend of vLLM provides support for cutting-edge models and emerging features that may not have been seamlessly integrated into the TensorRT-LLM-optimized stack.

In addition to creating optimized LLMs for inference, NIM provides advanced hosting technologies, including optimized scheduling techniques such as in-flight batching, which breaks the overall text generation process for an LLM into multiple iterations on the model. With in-flight batching, rather than waiting for the whole batch to finish before moving on to the next set of requests, the NIM runtime immediately evicts finished sequences from the batch. The runtime then begins running new requests while other requests are still in flight, making the best use of your compute instances and GPUs.
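To make the scheduling idea concrete, the following is a toy simulation, not NIM's actual scheduler. It assumes each request needs a known number of decode iterations (at least one) and that every iteration produces one token per active sequence. The function names and the example workload are hypothetical and chosen only for illustration.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode iterations when each batch must fully finish before the next starts."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch is held until its longest sequence completes.
        steps += max(lengths[i:i + batch_size])
    return steps


def in_flight_batching_steps(lengths, batch_size):
    """Decode iterations when finished sequences are evicted and slots refilled at once."""
    waiting = deque(lengths)  # tokens still to generate for each queued request
    active = []               # tokens still to generate for each running request
    steps = 0
    while waiting or active:
        # Admit queued requests into any free batch slots.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # One model iteration generates one token for every active sequence.
        active = [remaining - 1 for remaining in active]
        steps += 1
        # Evict sequences that have finished so their slots free up immediately.
        active = [remaining for remaining in active if remaining > 0]
    return steps


if __name__ == "__main__":
    workload = [2, 8, 3, 8, 2, 2]  # hypothetical output lengths, in tokens
    print("static:", static_batching_steps(workload, batch_size=2))
    print("in-flight:", in_flight_batching_steps(workload, batch_size=2))
```

For this hypothetical workload with two batch slots, the static schedule needs 18 iterations and the in-flight schedule needs 13, because short requests no longer wait for the longest sequence in their batch to finish.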

Deploying NIM on SageMaker

NIM integrates with SageMaker, allowing you to host your LLMs with performance and cost optimization while benefiting from the capabilities of SageMaker. When you use NIM on SageMaker, you can use capabilities such as scaling out the number of instances to host your model, performing blue/green deployments, and evaluating workloads using shadow testing, all with best-in-class observability and monitoring with Amazon CloudWatch.
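As a rough sketch of how instance count and shadow testing appear in an endpoint configuration, the helper below builds the request dictionary that could be passed to the boto3 SageMaker client's `create_endpoint_config` call. It makes no AWS calls itself. The helper name, model names, variant names, instance type, and instance count are all assumptions for illustration. It also assumes the models (for example, ones backed by a NIM container) have already been registered in SageMaker.

```python
def build_endpoint_config(config_name, prod_model, shadow_model=None,
                          instance_type="ml.g5.12xlarge", instance_count=2):
    """Build keyword arguments for SageMaker's create_endpoint_config (no AWS calls)."""

    def variant(name, model_name, weight):
        # One variant = one model hosted on a fleet of identical instances.
        return {
            "VariantName": name,
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,  # scale out by raising this
            "InitialVariantWeight": weight,
        }

    request = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [variant("production", prod_model, 1.0)],
    }
    if shadow_model is not None:
        # A shadow variant receives a copy of production traffic; its responses
        # are not returned to callers, so a candidate can be evaluated safely.
        request["ShadowProductionVariants"] = [variant("shadow", shadow_model, 1.0)]
    return request


if __name__ == "__main__":
    # Hypothetical names; with credentials configured this could be sent as:
    #   boto3.client("sagemaker").create_endpoint_config(**request)
    request = build_endpoint_config("nim-llm-config", "nim-llm-v1", "nim-llm-v2")
    print(request)
```

Keeping request construction separate from the API call makes the configuration easy to inspect or unit test before any billable resources are created.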

Conclusion

Using NIM to deploy optimized LLMs can be a great option for both performance and cost. It also helps simplify LLM deployment. In the future, NIM will also allow for Parameter-Efficient Fine-Tuning (PEFT) customization methods like LoRA and P-tuning. NIM also plans to broaden its LLM support through the Triton Inference Server, TensorRT-LLM, and vLLM backends.

We encourage you to learn more about NVIDIA microservices and how to deploy your LLMs using SageMaker and try out the benefits available to you. NIM is available as a paid offering as part of the NVIDIA AI Enterprise software subscription available on AWS Marketplace.

In the near future, we will post an in-depth guide for NIM on SageMaker.

About the authors

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time, he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends. You can find him on LinkedIn.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Qing Lan is a Software Development Engineer at AWS. He has been working on several challenging products at Amazon, including high-performance ML inference solutions and a high-performance logging system. Qing's team successfully launched the first billion-parameter model in Amazon Advertising under very low latency requirements. Qing has in-depth knowledge of infrastructure optimization and deep learning acceleration.

Nikhil Kulkarni is a software developer with AWS Machine Learning, focusing on making machine learning workloads more performant in the cloud, and is a co-creator of AWS Deep Learning Containers for training and inference. He is passionate about distributed deep learning systems. Outside of work, he enjoys reading books, fiddling with the guitar, and making pizza.

Harish Tummalacherla is a Software Engineer with the Deep Learning Performance team at SageMaker. He works on performance engineering for serving large language models efficiently on SageMaker. In his spare time, he enjoys running, cycling, and ski mountaineering.

Eliuth Triana Isaza is a Developer Relations Manager at NVIDIA. He empowers Amazon's AI MLOps, DevOps, scientists, and AWS technical experts to master the NVIDIA computing stack for accelerating and optimizing generative AI foundation models, spanning data curation, GPU training, model inference, and production deployment on GPU instances. In addition, Eliuth is a passionate mountain biker, skier, and tennis and poker player.

Jiahong Liu is a Solutions Architect on the Cloud Service Provider team at NVIDIA. He assists clients in adopting machine learning and AI solutions that use NVIDIA accelerated computing to address their training and inference challenges. In his leisure time, he enjoys origami, DIY projects, and playing basketball.

Kshitiz Gupta is a Solutions Architect at NVIDIA. He enjoys educating cloud customers about the GPU AI technologies NVIDIA has to offer and helping them accelerate their machine learning and deep learning applications. Outside of work, he enjoys running, hiking, and wildlife watching.

Source: https://aws.amazon.com/blogs/machine-learning/optimize-price-performance-of-llm-inference-on-nvidia-gpus-using-the-amazon-sagemaker-integration-with-nvidia-nim-microservices/

Timestamp: March 18, 2024
