Deploy Amazon SageMaker Autopilot models to serverless inference endpoints

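Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning (ML) models based on your data, while allowing you to maintain full control and visibility. Autopilot can also deploy trained models to real-time inference endpoints automatically.

If your workloads have spiky or unpredictable traffic patterns and can tolerate cold starts, deploying the model to a serverless inference endpoint is more cost-efficient.

Amazon SageMaker Serverless Inference is a purpose-built inference option for workloads that have unpredictable traffic patterns and can tolerate cold starts. Unlike a real-time inference endpoint, which is backed by a long-running compute instance, a serverless endpoint provisions resources on demand with built-in auto scaling. Serverless endpoints scale automatically based on the number of incoming requests and scale resources down to zero when there are no incoming requests, which helps you minimize costs.

In this post, we show how to deploy Autopilot trained models to serverless inference endpoints using the Boto3 libraries for Amazon SageMaker.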

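Autopilot training modes

Before creating an Autopilot experiment, you can either let Autopilot select the training mode automatically or select the training mode manually.

Autopilot currently supports three training modes:

Auto – Based on the dataset size, Autopilot automatically chooses either ensembling or HPO mode. For datasets larger than 100 MB, Autopilot chooses HPO; otherwise, it chooses ensembling.
Ensembling – Autopilot uses the AutoGluon ensembling technique, which relies on model stacking, and produces an optimal predictive model.
Hyperparameter optimization (HPO) – Autopilot finds the best version of a model by tuning hyperparameters with Bayesian optimization or multi-fidelity optimization while running training jobs on your dataset. HPO mode selects the algorithms most relevant to your dataset and the best range of hyperparameters to tune your models.

To learn more about Autopilot training modes, refer to Training modes.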
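The Auto rule described above can be sketched in a few lines. This is only an illustration of the documented threshold, not Autopilot's internal logic: the helper name `resolve_training_mode` is hypothetical, and treating 1 MB as 2**20 bytes is an assumption.

```python
# Illustrative sketch of the Auto-mode rule: datasets larger than 100 MB
# use HPO, everything else uses ensembling.
MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes
AUTO_MODE_THRESHOLD_BYTES = 100 * MB


def resolve_training_mode(dataset_size_bytes: int) -> str:
    """Return the training mode that Auto mode would pick for a dataset size."""
    if dataset_size_bytes < 0:
        raise ValueError("dataset_size_bytes must be non-negative")
    if dataset_size_bytes > AUTO_MODE_THRESHOLD_BYTES:
        return "HYPERPARAMETER_TUNING"
    return "ENSEMBLING"


print(resolve_training_mode(5 * MB))
print(resolve_training_mode(250 * MB))
```

The returned strings match the `Mode` values used in the job configurations later in this post.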

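Solution overview

In this post, we use the UCI Bank Marketing dataset to predict whether a client will subscribe to a term deposit offered by the bank. This is a binary classification problem type.

We launch two Autopilot jobs using the Boto3 libraries for SageMaker. The first job uses ensembling as the chosen training mode. We then deploy the single ensemble model it generates to a serverless endpoint and send inference requests to this hosted endpoint.

The second job uses the HPO training mode. For classification problem types, Autopilot generates three inference containers. We extract these three inference containers and deploy them to separate serverless endpoints. Then we send inference requests to these hosted endpoints.

For more information about regression and classification problem types, refer to Inference container definitions for regression and classification problem types.

We can also launch Autopilot jobs from the Amazon SageMaker Studio UI. If you launch jobs from the UI, make sure to turn off the Auto deploy option in the Deployment and Advanced settings section. Otherwise, Autopilot deploys the best candidate to a real-time endpoint.

Prerequisites

Make sure you have the latest versions of the Boto3 and SageMaker Python packages installed: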

pip install -U boto3 sagemaker

We need SageMaker package version >= 2.110.0 and Boto3 version >= 1.24.84.
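One way to confirm these minimums before running the rest of the code is a small check built on the standard library. This is a minimal sketch: the helper names are hypothetical, and the comparison only looks at the first three numeric components of each version string.

```python
import itertools
from importlib import metadata

# Minimum versions stated in this post.
MINIMUM_VERSIONS = {"sagemaker": "2.110.0", "boto3": "1.24.84"}


def version_tuple(version: str) -> tuple:
    """Convert '1.24.84' to (1, 24, 84), padding missing components with 0."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits so suffixes such as 'rc1' are ignored.
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_installed(minimums=MINIMUM_VERSIONS) -> dict:
    """Map each package name to True/False; a missing package counts as False."""
    results = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            results[package] = False
            continue
        results[package] = meets_minimum(installed, minimum)
    return results


print(check_installed())
```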

Launch an Autopilot job using ensembling mode

To launch an Autopilot job using the SageMaker Boto3 libraries, we use the create_auto_ml_job API. We pass in AutoMLJobConfig, InputDataConfig, and AutoMLJobObjective as inputs to create_auto_ml_job. See the following code:

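import boto3
import sagemaker
from time import gmtime, strftime

# Session, region, bucket, and role used throughout this post
session = sagemaker.Session()
region = session.boto_region_name
bucket = session.default_bucket()
role = sagemaker.get_execution_role()
prefix = "autopilot/bankadditional"
sm_client = boto3.Session().client(service_name='sagemaker', region_name=region)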

timestamp_suffix = strftime('%d%b%Y-%H%M%S', gmtime())
automl_job_name = f"uci-bank-marketing-{timestamp_suffix}"
max_job_runtime_seconds = 3600
max_runtime_per_job_seconds = 1200
target_column = "y"
problem_type="BinaryClassification"
objective_metric = "F1"
training_mode = "ENSEMBLING"

automl_job_config = {
    'CompletionCriteria': {
      'MaxRuntimePerTrainingJobInSeconds': max_runtime_per_job_seconds,
      'MaxAutoMLJobRuntimeInSeconds': max_job_runtime_seconds
    },    
    "Mode" : training_mode
}

automl_job_objective= { "MetricName": objective_metric }

input_data_config = [
    {
      'DataSource': {
        'S3DataSource': {
          'S3DataType': 'S3Prefix',
          'S3Uri': f's3://{bucket}/{prefix}/raw/bank-additional-full.csv'
        }
      },
      'TargetAttributeName': target_column
    }
  ]

output_data_config = {
    'S3OutputPath': f's3://{bucket}/{prefix}/output'
}

# Launch the Autopilot job with the configuration defined above
sm_client.create_auto_ml_job(
    AutoMLJobName=automl_job_name,
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    AutoMLJobConfig=automl_job_config,
    ProblemType=problem_type,
    AutoMLJobObjective=automl_job_objective,
    RoleArn=role)

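Autopilot returns the BestCandidate model object, which has the InferenceContainers required to deploy the models to inference endpoints. To get the BestCandidate for the preceding job, we use the describe_auto_ml_job function: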

job_response = sm_client.describe_auto_ml_job(AutoMLJobName=automl_job_name)
best_candidate = job_response['BestCandidate']
inference_container = job_response['BestCandidate']['InferenceContainers'][0]
print(inference_container)
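BestCandidate is only meaningful once the job has finished, so it can help to poll the job status before reading it. The following is a minimal sketch: the helper name `wait_for_automl_job` and its polling defaults are assumptions, while `describe_auto_ml_job` and the `AutoMLJobStatus` field come from the Boto3 SageMaker client.

```python
import time

# Statuses after which the job no longer changes state.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_automl_job(sm_client, job_name, poll_seconds=60, max_polls=240):
    """Poll describe_auto_ml_job until the job reaches a terminal status.

    Returns the last describe response, so the caller can read
    response["BestCandidate"] when the status is "Completed".
    """
    for _ in range(max_polls):
        response = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
        status = response["AutoMLJobStatus"]
        if status in TERMINAL_STATUSES:
            return response
        time.sleep(poll_seconds)
    raise TimeoutError(f"AutoML job {job_name} did not finish after {max_polls} polls")


# Usage (requires AWS credentials and a running job):
# job_response = wait_for_automl_job(sm_client, automl_job_name)
# if job_response["AutoMLJobStatus"] == "Completed":
#     best_candidate = job_response["BestCandidate"]
```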

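Deploy the trained model

We now deploy the preceding inference container to a serverless endpoint. The first step is to create a model from the inference container. Next, we create an endpoint configuration in which we specify the MemorySizeInMB and MaxConcurrency values for the serverless endpoint, along with the model name. Finally, we create an endpoint with the endpoint configuration created above.

We recommend choosing your endpoint's memory size according to your model size. The memory size should be at least as large as your model size. Your serverless endpoint has a minimum RAM size of 1024 MB (1 GB), and the maximum RAM size you can choose is 6144 MB (6 GB).

The memory sizes you can choose are 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.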
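The sizing rule above can be expressed as a small helper that picks the smallest allowed memory size that is at least as large as the model. This is a sketch under a stated assumption: the helper name is hypothetical, and the model size is treated as a lower bound only, because a model may need more memory at run time than its artifact size suggests.

```python
# Memory sizes accepted for a serverless endpoint, as listed above.
ALLOWED_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def choose_memory_size_mb(model_size_mb: float) -> int:
    """Return the smallest allowed memory size that is >= the model size."""
    if model_size_mb <= 0:
        raise ValueError("model_size_mb must be positive")
    for size in ALLOWED_MEMORY_SIZES_MB:
        if size >= model_size_mb:
            return size
    raise ValueError(
        f"Model size {model_size_mb} MB exceeds the 6144 MB serverless maximum"
    )


print(choose_memory_size_mb(1500))  # prints 2048
```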

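To help determine whether a serverless endpoint is the right deployment option from a cost and performance perspective, we encourage you to refer to the SageMaker Serverless Inference Benchmarking Toolkit, which tests different endpoint configurations and compares the most optimal one against a comparable real-time hosting instance.

Note that serverless endpoints only accept SingleModel for inference containers. Autopilot in ensembling mode generates a single model, so we can deploy this model container as is to the endpoint. See the following code: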

# Example names and sizes; these values are placeholders, so adjust them for your model
model_name = f"{automl_job_name}-model"
endpoint_config_name = f"epc-{model_name}"
endpoint_name = f"ep-{model_name}"
memory = 2048
max_concurrency = 10

# Create Model
model_response = sm_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    Containers=[inference_container]
)

# Create Endpoint Config
epc_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "ModelName": model_name,
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                "MemorySizeInMB": memory,
                "MaxConcurrency": max_concurrency
            }
        }
    ]
)

# Create Endpoint
ep_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)
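Endpoint creation is asynchronous, so the endpoint is not ready as soon as create_endpoint returns. A minimal sketch of waiting for it with the Boto3 `endpoint_in_service` waiter follows; the wrapper name `wait_until_in_service` and its delay and attempt defaults are assumptions.

```python
def wait_until_in_service(sm_client, endpoint_name, delay_seconds=30, max_attempts=60):
    """Block until the endpoint is InService, then return its status.

    The Boto3 waiter raises an exception if the endpoint fails to create
    or the maximum number of attempts is exceeded.
    """
    waiter = sm_client.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={"Delay": delay_seconds, "MaxAttempts": max_attempts},
    )
    response = sm_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


# Usage (requires AWS credentials and an endpoint that is being created):
# status = wait_until_in_service(sm_client, endpoint_name)
# print(status)
```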

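When the serverless inference endpoint is InService, we can test the endpoint by sending an inference request and observing the predictions. The following diagram illustrates the architecture of this setup.

Note that we can send raw data as a payload to the endpoint. The ensemble model generated by Autopilot automatically incorporates all required feature-transform and inverse-label-transform steps, along with the algorithm model and packages, into a single model.

Send an inference request to the trained model

Use the following code to send an inference request to the model trained using ensembling mode: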

from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer


payload = "34,blue-collar,married,basic.4y,no,no,no,telephone,may,tue,800,4,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=session,
    serializer=CSVSerializer(),
)

# predict() returns bytes by default, so decode the response before printing
prediction = predictor.predict(payload).decode('utf-8')
print(prediction)

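Launch an Autopilot job using HPO mode

In HPO mode, for CompletionCriteria, besides MaxRuntimePerTrainingJobInSeconds and MaxAutoMLJobRuntimeInSeconds, we can also specify MaxCandidates to limit the number of candidates an Autopilot job generates. Note that these are optional parameters and are set only to limit the job runtime for this demonstration. See the following code: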

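training_mode = "HYPERPARAMETER_TUNING"

automl_job_config["Mode"] = training_mode
automl_job_config["CompletionCriteria"]["MaxCandidates"] = 15

# model_prefix and tags_config are not defined earlier in this post. The values
# below are placeholders (assumptions), so replace them with your own. The prefix
# is kept short because AutoML job names are length-limited.
model_prefix = "bank-mktg"
tags_config = [{"Key": "Project", "Value": "autopilot-serverless"}]
hpo_automl_job_name = f"{model_prefix}-HPO-{timestamp_suffix}"

response = sm_client.create_auto_ml_job(
    AutoMLJobName=hpo_automl_job_name,
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    AutoMLJobConfig=automl_job_config,
    ProblemType=problem_type,
    AutoMLJobObjective=automl_job_objective,
    RoleArn=role,
    Tags=tags_config
)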

To get the BestCandidate for the preceding job, we can again use the describe_auto_ml_job function:

job_response = sm_client.describe_auto_ml_job(AutoMLJobName=hpo_automl_job_name)
best_candidate = job_response['BestCandidate']
inference_containers = job_response['BestCandidate']['InferenceContainers']
print(inference_containers)

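Deploy the trained model

Autopilot in HPO mode generates three inference containers for the classification problem type.

The first container handles the feature-transform steps. Next, the algorithm container generates the predicted_label with the highest probability. Finally, the postprocessing inference container performs an inverse transform on the predicted label and maps it to the original label. For more information, refer to Inference container definitions for regression and classification problem types.

We extract these three inference containers and deploy them to separate serverless endpoints. For inference, we invoke the endpoints in sequence: we first send the payload to the feature-transform container, then pass the output of that container to the algorithm container, and finally pass the output of the algorithm container to the postprocessing container, which outputs the predicted label.

The following diagram illustrates the architecture of this setup.

We extract the three inference containers from the BestCandidate with the following code: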

job_response = sm_client.describe_auto_ml_job(AutoMLJobName=hpo_automl_job_name)
inference_containers = job_response['BestCandidate']['InferenceContainers']

models = list()
endpoint_configs = list()
endpoints = list()

# For brevity, we've encapsulated create_model, create_endpoint_config, and create_endpoint as helper functions
for idx, container in enumerate(inference_containers):
    (status, model_arn) = create_autopilot_model(
        sm_client,
        hpo_automl_job_name,
        role,
        container,
        idx)
    # The model name is the part of the ARN after the "/"
    model_name = model_arn.split('/')[1]
    models.append(model_name)

    endpoint_config_name = f"epc-{model_name}"
    endpoint_name = f"ep-{model_name}"
    (status, epc_arn) = create_serverless_endpoint_config(
        sm_client,
        endpoint_config_name,
        model_name,
        memory=2048,
        max_concurrency=10)
    endpoint_configs.append(endpoint_config_name)

    response = create_serverless_endpoint(
        sm_client,
        endpoint_name,
        endpoint_config_name)
    endpoints.append(endpoint_name)
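The helper functions called in the loop are not shown in this post. The following is one possible sketch of them, written to match the call signatures and return shapes used above. The bodies are assumptions and may differ from the helpers in the notebook; in particular, the model naming scheme and the use of the HTTP status code as `status` are illustrative choices.

```python
def _http_status(response):
    """Extract the HTTP status code that Boto3 attaches to every response."""
    return response["ResponseMetadata"]["HTTPStatusCode"]


def create_autopilot_model(sm_client, automl_job_name, role, container, idx):
    """Create a SageMaker model from one inference container.

    Returns (status, model_arn). The name "<job name>-<idx>" is an assumption.
    """
    model_name = f"{automl_job_name}-{idx}"
    response = sm_client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role,
        Containers=[container],
    )
    return _http_status(response), response["ModelArn"]


def create_serverless_endpoint_config(
    sm_client, endpoint_config_name, model_name, memory=2048, max_concurrency=10
):
    """Create a serverless endpoint configuration for a single model.

    Returns (status, endpoint_config_arn).
    """
    response = sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[
            {
                "ModelName": model_name,
                "VariantName": "AllTraffic",
                "ServerlessConfig": {
                    "MemorySizeInMB": memory,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    )
    return _http_status(response), response["EndpointConfigArn"]


def create_serverless_endpoint(sm_client, endpoint_name, endpoint_config_name):
    """Create the endpoint and return the raw create_endpoint response."""
    return sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )
```

These definitions need to be in scope before the loop above runs.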

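Send an inference request to the trained model

For inference, we send the payload in sequence: first to the feature-transform container, then to the model container, and finally to the inverse-label-transform container.

Diagram of the inference request flow across the three inference containers in HPO mode

See the following code: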

from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer

payload = "51,technician,married,professional.course,no,yes,no,cellular,apr,thu,687,1,0,1,success,-1.8,93.075,-47.1,1.365,5099.1"


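# Invoke the endpoints in order, feeding each response into the next endpoint
for endpoint in endpoints:
    try:
        print(f"payload: {payload}")
        predictor = Predictor(
            endpoint_name=endpoint,
            sagemaker_session=session,
            serializer=CSVSerializer(),
        )
        # predict() returns bytes by default; decode so the next endpoint receives CSV text
        prediction = predictor.predict(payload).decode('utf-8')
        payload = prediction
    except Exception as e:
        print(f"Error invoking endpoint {endpoint}:\n{e}")
        break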

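The full implementation of this example is available in the following Jupyter notebook.

Clean up

To clean up resources, you can delete the serverless endpoints, endpoint configurations, and models you created: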

sm_client = boto3.Session().client(service_name='sagemaker', region_name=region)

# Delete endpoints first, then their configurations, then the models
for endpoint in endpoints:
    try:
        sm_client.delete_endpoint(EndpointName=endpoint)
    except Exception as e:
        print(f"Exception:\n{e}")
        continue

for endpoint_config in endpoint_configs:
    try:
        sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config)
    except Exception as e:
        print(f"Exception:\n{e}")
        continue

for autopilot_model in models:
    try:
        sm_client.delete_model(ModelName=autopilot_model)
    except Exception as e:
        print(f"Exception:\n{e}")
        continue

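Conclusion

In this post, we showed how to deploy Autopilot generated models, in both ensembling and HPO modes, to serverless inference endpoints. This solution lets you use cost-efficient, fully managed ML services like Autopilot to generate models quickly from raw data, and then deploy them to fully managed serverless inference endpoints with built-in auto scaling to reduce costs.

We encourage you to try this solution with a dataset relevant to your business KPIs. You can refer to the solution implemented in a Jupyter notebook in the GitHub repo.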