Deploy Amazon SageMaker Autopilot models to serverless inference endpoints

Republished by Plato

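Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning (ML) models based on your data, while allowing you to maintain full control and visibility. Autopilot can also deploy trained models to real-time inference endpoints automatically.

If you have workloads with spiky or unpredictable traffic patterns that can tolerate cold starts, deploying the model to a serverless inference endpoint is more cost-efficient.

Amazon SageMaker Serverless Inference is a purpose-built inference option for workloads that have unpredictable traffic patterns and can tolerate cold starts. Unlike a real-time inference endpoint, which is backed by a long-running compute instance, a serverless endpoint provisions resources on demand with built-in auto scaling. Serverless endpoints scale automatically based on the number of incoming requests and scale resources down to zero when there are no incoming requests, which helps you minimize costs.

In this post, we show how to deploy Autopilot trained models to serverless inference endpoints using the Boto3 libraries for Amazon SageMaker.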

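Autopilot training modes

Before creating an Autopilot experiment, you can either let Autopilot select the training mode automatically or select the training mode manually.

Autopilot currently supports three training modes:

Auto – Based on the dataset size, Autopilot automatically chooses either ensembling or HPO mode. For datasets larger than 100 MB, Autopilot chooses HPO; otherwise, it chooses ensembling.
Ensembling – Autopilot uses the AutoGluon ensembling technique with model stacking to produce an optimal predictive model.
Hyperparameter optimization (HPO) – Autopilot finds the best version of a model by tuning hyperparameters with Bayesian optimization or multi-fidelity optimization while running training jobs on your dataset. HPO mode selects the algorithms most relevant to your dataset and the best range of hyperparameters for tuning your models.

To learn more about Autopilot training modes, refer to Training modes.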
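The Auto rule described in this section can be written as a small function. This is only a sketch of the rule as stated in this post, not Autopilot's actual implementation; the function name is hypothetical, and treating "100 MB" as binary megabytes is an assumption.

```python
# Sketch of the Auto rule described in this post.
# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
AUTO_MODE_THRESHOLD_BYTES = 100 * 1024 * 1024


def resolve_auto_mode(dataset_size_bytes: int) -> str:
    """Return the training mode that the Auto rule would select (hypothetical helper)."""
    if dataset_size_bytes < 0:
        raise ValueError("dataset_size_bytes must be non-negative")
    # Larger than the threshold: HPO. Otherwise: ensembling.
    if dataset_size_bytes > AUTO_MODE_THRESHOLD_BYTES:
        return "HYPERPARAMETER_TUNING"
    return "ENSEMBLING"


print(resolve_auto_mode(5 * 1024 * 1024))    # a small dataset
print(resolve_auto_mode(500 * 1024 * 1024))  # a large dataset
```

The returned strings match the Mode values that this post passes to AutoMLJobConfig.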

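Solution overview

In this post, we use the UCI Bank Marketing dataset to predict whether a client will subscribe to a term deposit offered by the bank. This is a binary classification problem type.

We launch two Autopilot jobs using the Boto3 libraries for SageMaker. The first job uses ensembling as the chosen training mode. We then deploy the single ensemble model it generates to a serverless endpoint and send inference requests to this hosted endpoint.

The second job uses the HPO training mode. For classification problem types, Autopilot generates three inference containers. We extract these three inference containers and deploy them to separate serverless endpoints. Then we send inference requests to these hosted endpoints.

For more information about regression and classification problem types, refer to Inference container definitions for regression and classification problem types.

You can also launch Autopilot jobs from the Amazon SageMaker Studio UI. If you launch jobs from the UI, make sure to turn off the Auto deploy option in the Deployment and Advanced settings section. Otherwise, Autopilot deploys the best candidate to a real-time endpoint.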

Prerequisites

Make sure you have the latest versions of the Boto3 and SageMaker Python packages installed:

pip install -U boto3 sagemaker

This post requires SageMaker package version >= 2.110.0 and Boto3 version >= 1.24.84.
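One way to confirm these minimums from Python is sketched below. The helper names are hypothetical, and the version parsing is deliberately simple (it compares only the first three numeric components), so treat it as a quick sanity check rather than a full version parser.

```python
from importlib import metadata
from itertools import takewhile

# Minimum versions stated in this post.
MINIMUMS = {"sagemaker": "2.110.0", "boto3": "1.24.84"}


def version_tuple(version: str) -> tuple:
    """Turn '1.24.84' into (1, 24, 84), padding missing parts with 0."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so that 2.110.0 counts as newer than 2.99.0."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_installed(minimums=MINIMUMS) -> dict:
    """Return {package: bool}; packages that are not installed map to False."""
    results = {}
    for package, minimum in minimums.items():
        try:
            results[package] = meets_minimum(metadata.version(package), minimum)
        except metadata.PackageNotFoundError:
            results[package] = False
    return results


print(check_installed())
```

Comparing numeric tuples instead of raw strings matters here, because a plain string comparison would rank "2.99.0" above "2.110.0".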

Launch an Autopilot job with ensembling mode

To launch an Autopilot job using the SageMaker Boto3 libraries, we use the create_auto_ml_job API. We pass in AutoMLJobConfig, InputDataConfig, and AutoMLJobObjective as inputs to create_auto_ml_job. See the following code:

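import boto3
import sagemaker
from time import gmtime, strftime

# SageMaker session and Region used throughout this post
session = sagemaker.Session()
region = session.boto_region_name
bucket = session.default_bucket()
role = sagemaker.get_execution_role()
prefix = "autopilot/bankadditional"
sm_client = boto3.Session().client(service_name='sagemaker', region_name=region)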

timestamp_suffix = strftime('%d%b%Y-%H%M%S', gmtime())
automl_job_name = f"uci-bank-marketing-{timestamp_suffix}"
max_job_runtime_seconds = 3600
max_runtime_per_job_seconds = 1200
target_column = "y"
problem_type="BinaryClassification"
objective_metric = "F1"
training_mode = "ENSEMBLING"

automl_job_config = {
    'CompletionCriteria': {
      'MaxRuntimePerTrainingJobInSeconds': max_runtime_per_job_seconds,
      'MaxAutoMLJobRuntimeInSeconds': max_job_runtime_seconds
    },    
    "Mode" : training_mode
}

automl_job_objective= { "MetricName": objective_metric }

input_data_config = [
    {
      'DataSource': {
        'S3DataSource': {
          'S3DataType': 'S3Prefix',
          'S3Uri': f's3://{bucket}/{prefix}/raw/bank-additional-full.csv'
        }
      },
      'TargetAttributeName': target_column
    }
  ]

output_data_config = {
    'S3OutputPath': f's3://{bucket}/{prefix}/output'
}


sm_client.create_auto_ml_job(
    AutoMLJobName=automl_job_name,
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    AutoMLJobConfig=automl_job_config,
    ProblemType=problem_type,
    AutoMLJobObjective=automl_job_objective,
    RoleArn=role)
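create_auto_ml_job starts the job asynchronously, so the job normally has to finish before its best candidate is read. The following is a minimal polling sketch, assuming the AutoMLJobStatus field returned by describe_auto_ml_job; the function name, the polling interval, and the injectable sleep parameter are illustrative choices and not part of the original post.

```python
import time

# Statuses after which the job no longer changes.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_automl_job(sm_client, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll describe_auto_ml_job until the job reaches a terminal status.

    Hypothetical helper: sm_client is a boto3 SageMaker client, and sleep
    can be swapped out (for example, in tests) to avoid real waiting.
    """
    while True:
        response = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
        status = response["AutoMLJobStatus"]
        secondary = response.get("AutoMLJobSecondaryStatus")
        print(f"{job_name}: {status} ({secondary})")
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
```

With the variables defined in this post, a call could look like wait_for_automl_job(sm_client, automl_job_name), continuing only if it returns "Completed".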

Autopilot returns the BestCandidate model object, which contains the InferenceContainers required to deploy the models to inference endpoints. To get the BestCandidate for the preceding job, we use the describe_auto_ml_job function:

job_response = sm_client.describe_auto_ml_job(AutoMLJobName=automl_job_name)
best_candidate = job_response['BestCandidate']
inference_container = job_response['BestCandidate']['InferenceContainers'][0]
print(inference_container)
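Printing the raw container dictionary can be hard to read. The sketch below summarizes each entry of InferenceContainers, assuming the Image, ModelDataUrl, and Environment keys of the container definition; the helper name is hypothetical, and the sample values are placeholders rather than real job output.

```python
def summarize_containers(best_candidate: dict) -> list:
    """Return one short summary dict per inference container (hypothetical helper)."""
    summaries = []
    for index, container in enumerate(best_candidate.get("InferenceContainers", [])):
        summaries.append({
            "index": index,
            "image": container.get("Image"),
            "model_data_url": container.get("ModelDataUrl"),
            "environment_keys": sorted(container.get("Environment", {})),
        })
    return summaries


# Placeholder data shaped like a BestCandidate; not output from a real job.
sample_best_candidate = {
    "InferenceContainers": [
        {
            "Image": "<image-uri>",
            "ModelDataUrl": "s3://<bucket>/<path>/model.tar.gz",
            "Environment": {"EXAMPLE_KEY": "example-value"},
        }
    ]
}

for summary in summarize_containers(sample_best_candidate):
    print(summary)
```

The number of summaries also shows at a glance whether a candidate has a single container or several.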

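Deploy the trained model

We now deploy the preceding inference container to a serverless endpoint. The first step is to create a model from the inference container, and then create an endpoint configuration that specifies the MemorySizeInMB and MaxConcurrency values for the serverless endpoint along with the model name. Finally, we create an endpoint with the endpoint configuration created above.

We recommend choosing your endpoint's memory size according to your model size. The memory size should be at least as large as your model size. A serverless endpoint has a minimum RAM size of 1024 MB (1 GB), and the maximum RAM size you can choose is 6144 MB (6 GB).

The memory sizes you can choose are 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.

To help determine whether a serverless endpoint is the right deployment option from a cost and performance perspective, we encourage you to refer to the SageMaker Serverless Inference Benchmarking Toolkit, which tests different endpoint configurations and compares the most optimal one against a comparable real-time hosting instance.

Note that serverless endpoints only accept SingleModel for inference containers. Autopilot in ensembling mode generates a single model, so we can deploy this model container as is to the endpoint. See the following code: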
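The sizing rule above (at least as large as the model, chosen from the allowed values) can be sketched as a lookup. The function name is hypothetical, and the model size is only a lower bound: the memory actually needed at inference time may be higher, so consider this a starting point.

```python
# Memory sizes listed in this post for serverless endpoints, in MB.
ALLOWED_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def pick_memory_size_mb(model_size_mb: float) -> int:
    """Return the smallest allowed memory size that is >= the model size."""
    if model_size_mb <= 0:
        raise ValueError("model_size_mb must be positive")
    for size in ALLOWED_MEMORY_SIZES_MB:
        if size >= model_size_mb:
            return size
    raise ValueError("model is larger than the 6144 MB serverless maximum")


print(pick_memory_size_mb(300))
print(pick_memory_size_mb(2500))
```

The result can be passed as the MemorySizeInMB value of the ServerlessConfig.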

# Illustrative names and sizes (assumed here; adjust to your environment)
model_name = f"{automl_job_name}-model"
endpoint_config_name = f"epc-{model_name}"
endpoint_name = f"ep-{model_name}"
memory = 2048          # MemorySizeInMB
max_concurrency = 10   # MaxConcurrency

# Create Model
model_response = sm_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    Containers=[inference_container]
)

# Create Endpoint Config
epc_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "ModelName": model_name,
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                "MemorySizeInMB": memory,
                "MaxConcurrency": max_concurrency
            }
        }
    ]
)

# Create Endpoint
ep_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)
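create_endpoint returns before the endpoint is ready, so a short wait is usually needed before sending requests. The following sketch polls describe_endpoint for the EndpointStatus field; the function name, interval, and retry limit are assumptions chosen for illustration.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=15, max_polls=120,
                      sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService (hypothetical helper).

    Raises RuntimeError if the endpoint reports Failed, and TimeoutError if
    it is still not InService after max_polls checks.
    """
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = response.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_polls} polls")
```

Boto3 also provides a built-in waiter, sm_client.get_waiter("endpoint_in_service"), which may be preferable when custom logging or timeouts are not needed.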

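When the serverless inference endpoint is InService, you can test the endpoint by sending an inference request and observing the predictions. The following diagram illustrates the architecture of this setup.

Note that we can send raw data as a payload to the endpoint. The ensemble model generated by Autopilot automatically incorporates all required feature-transform and inverse-label transform steps, along with the algorithm model and packages, into a single model.

Send an inference request to the trained model

To send an inference request to your model trained using ensembling mode, use the following code: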

from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer


payload = "34,blue-collar,married,basic.4y,no,no,no,telephone,may,tue,800,4,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=session,
    serializer=CSVSerializer(),
)

# predict() returns raw bytes; decode them to get the predicted label as text
prediction = predictor.predict(payload).decode('utf-8')
print(prediction)
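The same request can also be sent without the SageMaker Python SDK, using the Boto3 sagemaker-runtime client and its invoke_endpoint call. This is a sketch under that assumption; the helper name is hypothetical, and the runtime client is passed in so that the function stays easy to test.

```python
def invoke_csv(runtime_client, endpoint_name: str, csv_row: str) -> str:
    """Send one CSV row to an endpoint and return the decoded response body.

    Hypothetical helper: runtime_client is expected to be a Boto3
    'sagemaker-runtime' client.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_row,
    )
    # The response body is a stream; read it fully and decode to text.
    return response["Body"].read().decode("utf-8")
```

A call could look like invoke_csv(boto3.client("sagemaker-runtime", region_name=region), endpoint_name, payload), using the variables defined earlier in this post.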

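Launch the Autopilot job with HPO mode

In HPO mode, for CompletionCriteria, besides MaxRuntimePerTrainingJobInSeconds and MaxAutoMLJobRuntimeInSeconds, we can also specify MaxCandidates to limit the number of candidates an Autopilot job generates. Note that these are optional parameters and are set here only to limit the job runtime for demonstration. See the following code: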

training_mode = "HYPERPARAMETER_TUNING"

automl_job_config["Mode"] = training_mode
automl_job_config["CompletionCriteria"]["MaxCandidates"] = 15
# model_prefix and tags_config are not defined in this post; they are assumed
# to be set earlier (for example, in the accompanying notebook)
hpo_automl_job_name = f"{model_prefix}-HPO-{timestamp_suffix}"

response = sm_client.create_auto_ml_job(
    AutoMLJobName=hpo_automl_job_name,
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    AutoMLJobConfig=automl_job_config,
    ProblemType=problem_type,
    AutoMLJobObjective=automl_job_objective,
    RoleArn=role,
    Tags=tags_config
)
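The code above switches modes by modifying the existing automl_job_config dictionary in place, which also changes the configuration used for the ensembling job. If both configurations should remain available, one option is to derive the HPO configuration from a copy. The helper below is a hypothetical sketch of that idea.

```python
import copy


def with_hpo_settings(base_config: dict, max_candidates: int) -> dict:
    """Return a copy of an AutoMLJobConfig switched to HPO mode.

    Hypothetical helper: base_config is left untouched.
    """
    if max_candidates < 1:
        raise ValueError("max_candidates must be at least 1")
    config = copy.deepcopy(base_config)
    config["Mode"] = "HYPERPARAMETER_TUNING"
    config.setdefault("CompletionCriteria", {})["MaxCandidates"] = max_candidates
    return config


# Example with the values used in this post.
ensembling_config = {
    "CompletionCriteria": {
        "MaxRuntimePerTrainingJobInSeconds": 1200,
        "MaxAutoMLJobRuntimeInSeconds": 3600,
    },
    "Mode": "ENSEMBLING",
}
hpo_config = with_hpo_settings(ensembling_config, max_candidates=15)
print(hpo_config["Mode"], hpo_config["CompletionCriteria"]["MaxCandidates"])
print(ensembling_config["Mode"])
```

A deep copy is used because CompletionCriteria is a nested dictionary; a shallow copy would still share it between the two configurations.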

To get the BestCandidate for the preceding job, we can again use the describe_auto_ml_job function:

# Describe the HPO job launched above (not the earlier ensembling job)
job_response = sm_client.describe_auto_ml_job(AutoMLJobName=hpo_automl_job_name)
best_candidate = job_response['BestCandidate']
inference_containers = job_response['BestCandidate']['InferenceContainers']
print(inference_containers)

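Deploy the trained model

Autopilot in HPO mode for the classification problem type generates three inference containers.

The first container handles the feature-transform steps. Next, the algorithm container generates the predicted_label with the highest probability. Finally, the post-processing inference container performs an inverse transform on the predicted label and maps it to the original label. For more information, refer to Inference container definitions for regression and classification problem types.

We extract these three inference containers and deploy them to separate serverless endpoints. For inference, we invoke the endpoints in sequence: we send the payload first to the feature-transform container, then pass the output of that container to the algorithm container, and finally pass the output of the algorithm container to the post-processing container, which outputs the predicted label.

The following diagram illustrates the architecture of this setup.

We extract the three inference containers from the BestCandidate with the following code: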

job_response = sm_client.describe_auto_ml_job(AutoMLJobName=hpo_automl_job_name)
inference_containers = job_response['BestCandidate']['InferenceContainers']

models = list()
endpoint_configs = list()
endpoints = list()

# For brevity, we've encapsulated create_model, create_endpoint_config and create_endpoint as helper functions
for idx, container in enumerate(inference_containers):
    (status, model_arn) = create_autopilot_model(
        sm_client,
        hpo_automl_job_name,
        role,
        container,
        idx)
    # The model name is the part of the ARN after the '/'
    model_name = model_arn.split('/')[1]
    models.append(model_name)

    endpoint_config_name = f"epc-{model_name}"
    endpoint_name = f"ep-{model_name}"
    (status, epc_arn) = create_serverless_endpoint_config(
        sm_client,
        endpoint_config_name,
        model_name,
        memory=2048,
        max_concurrency=10)
    endpoint_configs.append(endpoint_config_name)

    response = create_serverless_endpoint(
        sm_client,
        endpoint_name,
        endpoint_config_name)
    endpoints.append(endpoint_name)
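The three helper functions called in the loop are not shown in this post. The following is one possible sketch of them, inferred only from how they are called above: the model naming scheme and the use of the HTTP status code as the returned status are assumptions, and the actual implementations in the accompanying notebook may differ.

```python
def _http_status(response: dict):
    """Extract the HTTP status code from a Boto3 response, if present."""
    return response.get("ResponseMetadata", {}).get("HTTPStatusCode")


def create_autopilot_model(sm_client, automl_job_name, role, container, idx):
    """Create a SageMaker model from one inference container; return (status, model ARN)."""
    model_name = f"{automl_job_name}-model-{idx}"  # assumed naming scheme
    response = sm_client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role,
        Containers=[container],
    )
    return _http_status(response), response["ModelArn"]


def create_serverless_endpoint_config(sm_client, endpoint_config_name, model_name,
                                      memory=2048, max_concurrency=10):
    """Create a serverless endpoint config; return (status, endpoint config ARN)."""
    response = sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[
            {
                "ModelName": model_name,
                "VariantName": "AllTraffic",
                "ServerlessConfig": {
                    "MemorySizeInMB": memory,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    )
    return _http_status(response), response["EndpointConfigArn"]


def create_serverless_endpoint(sm_client, endpoint_name, endpoint_config_name):
    """Create the endpoint from an existing endpoint config; return the raw response."""
    return sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )
```

Each helper wraps the same create_model, create_endpoint_config, and create_endpoint calls used for the ensembling model, so the only new element is that they run once per container.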

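Send an inference request to the trained model

For inference, we send the payload in sequence: first to the feature-transform container, then to the model container, and finally to the inverse-label transform container.

Figure: Inference request flow across the three inference containers in HPO mode

See the following code: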

from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer

payload = "51,technician,married,professional.course,no,yes,no,cellular,apr,thu,687,1,0,1,success,-1.8,93.075,-47.1,1.365,5099.1"


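for endpoint in endpoints:
    try:
        print(f"payload: {payload}")
        predictor = Predictor(
            endpoint_name=endpoint,
            sagemaker_session=session,
            serializer=CSVSerializer(),
        )
        prediction = predictor.predict(payload)
        # predict() returns raw bytes; decode so the next endpoint receives CSV text
        payload = prediction.decode('utf-8')
    except Exception as e:
        print(f"Error invoking Endpoint; {endpoint} \n {e}")
        break
else:
    # Runs only if every endpoint was invoked without error
    print(f"prediction: {payload}")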
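The chaining pattern itself does not depend on SageMaker: each stage's output becomes the next stage's input. The sketch below isolates that pattern and keeps every intermediate output, which can help when debugging a single stage. The function name, the endpoint names, and the stand-in stage outputs are all invented for illustration.

```python
def invoke_in_sequence(invoke, endpoint_names, payload):
    """Feed payload through each endpoint in order; return every stage's output.

    invoke is any callable taking (endpoint_name, payload) and returning
    str or bytes, for example a thin wrapper around Predictor.predict.
    """
    outputs = []
    current = payload
    for name in endpoint_names:
        result = invoke(name, current)
        if isinstance(result, bytes):
            result = result.decode("utf-8")
        outputs.append((name, result))
        current = result  # the output of this stage feeds the next stage
    return outputs


# Stand-in for real endpoints: made-up names and outputs, no network calls.
def fake_invoke(name, data):
    stages = {
        "ep-feature-transform": lambda d: "0.1,0.7,1.0",
        "ep-algorithm": lambda d: "1",
        "ep-label-inverse": lambda d: "yes",
    }
    return stages[name](data).encode("utf-8")


results = invoke_in_sequence(
    fake_invoke,
    ["ep-feature-transform", "ep-algorithm", "ep-label-inverse"],
    "51,technician,married",
)
for name, output in results:
    print(f"{name}: {output}")
```

To use real endpoints, invoke could be a small function that builds a Predictor with CSVSerializer for the given endpoint name and returns predictor.predict(payload).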

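The full implementation of this example is available in the following Jupyter notebook.

Clean up

To clean up resources, you can delete the created serverless endpoints, endpoint configurations, and models: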

sm_client = boto3.Session().client(service_name='sagemaker',region_name=region)

for _, endpoint in enumerate(endpoints):
    try:
        sm_client.delete_endpoint(EndpointName=endpoint)
    except Exception as e:
        print(f"Exception:\n{e}")
        continue

for _, endpoint_config in enumerate(endpoint_configs):
    try:
        sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config)
    except Exception as e:
        print(f"Exception:\n{e}")
        continue

for _, autopilot_model in enumerate(models):
    try:
        sm_client.delete_model(ModelName=autopilot_model)
    except Exception as e:
        print(f"Exception:\n{e}")
        continue
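The three loops above share one shape, so they can be folded into a single helper that also reports which deletions failed. This is a hypothetical refactoring, not code from the original post.

```python
def delete_all(delete_fn, key, names):
    """Call delete_fn(**{key: name}) for each name; return the names that failed.

    Hypothetical helper: delete_fn is a client method such as
    sm_client.delete_endpoint, and key is its name parameter.
    """
    failed = []
    for name in names:
        try:
            delete_fn(**{key: name})
        except Exception as error:
            print(f"Could not delete {name}: {error}")
            failed.append(name)
    return failed
```

Usage could look like delete_all(sm_client.delete_endpoint, "EndpointName", endpoints), followed by the same call with delete_endpoint_config and "EndpointConfigName", then delete_model and "ModelName". Note that the endpoints, endpoint_configs, and models lists in this post are filled only in the HPO section, so the endpoint, endpoint configuration, and model created for the ensembling job would need to be deleted separately.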

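Conclusion

In this post, we showed how to deploy Autopilot generated models, in both ensembling and HPO modes, to serverless inference endpoints. With this solution, you can use cost-efficient, fully managed ML services like Autopilot to generate models quickly from raw data, and then deploy them to fully managed serverless inference endpoints with built-in auto scaling to reduce costs.

We encourage you to try this solution with a dataset relevant to your business KPIs. You can refer to the solution implemented in a Jupyter notebook in the GitHub repo.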

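About the author

Praveen Chamarthi is a Senior AI/ML Specialist with Amazon Web Services. He is passionate about AI/ML and all things AWS. He helps customers across the United States scale, innovate, and operate ML workloads efficiently on AWS. In his spare time, Praveen enjoys reading and science fiction movies.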

Timestamp: December 8, 2022; December 11, 2022

Timestamp: December 22, 2023
