Instruction fine-tuning for FLAN T5 XL with Amazon SageMaker Jumpstart

Republished by Plato


Generative AI is in the midst of a period of stunning growth. Increasingly capable foundation models are being released continuously, with large language models (LLMs) being one of the most visible model classes. LLMs are models composed of billions of parameters trained on extensive corpora of text, up to hundreds of billions or even a trillion tokens. These models have proven extremely effective for a wide range of text-based tasks, from question answering to sentiment analysis.

The power of LLMs comes from their capacity to learn and generalize from extensive and diverse training data. The initial training of these models is performed with a variety of objectives: supervised, unsupervised, or hybrid. Text completion or imputation is one of the most common unsupervised objectives: given a chunk of text, the model learns to accurately predict what comes next (for example, the next sentence). Models can also be trained in a supervised fashion using labeled data to accomplish a set of tasks (for example, deciding whether a movie review is positive, negative, or neutral). Whether the model is trained for text completion or some other task, that task is frequently not the one customers want to use the model for.

To improve the performance of a pre-trained LLM on a specific task, we can tune the model using examples of the target task in a process known as instruction fine-tuning. Instruction fine-tuning uses a set of labeled examples in the form of {prompt, response} pairs to further train the pre-trained model to adequately predict the response given the prompt. This process modifies the weights of the model.
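To make the {prompt, response} format concrete, the following is a minimal sketch of how one record could be expanded into a training pair. The template wording is the one used in the fine-tuning section of this post; the sample record and the `build_pair` helper are invented for illustration, and the way the Jumpstart training script actually applies the template is not shown in the post.

```python
# Template wording copied from the fine-tuning section of this post.
TEMPLATE = {
    "prompt": (
        "Ask a question which is related to the following text, "
        "but cannot be answered based on the text. Text: {context}"
    ),
    "completion": "{question}",
}


def build_pair(record, template=TEMPLATE):
    """Expand one {context, question} record into a {prompt, response} pair."""
    prompt = template["prompt"].replace("{context}", record["context"])
    response = template["completion"].replace("{question}", record["question"])
    return {"prompt": prompt, "response": response}


# Invented record, for illustration only.
sample = {
    "context": "Adelaide is the capital city of South Australia.",
    "question": "What is the population of Adelaide?",
}
pair = build_pair(sample)
print(pair["prompt"])
print(pair["response"])
```

Plain string replacement is used instead of `str.format` so that braces occurring inside the context text cannot be misread as placeholders.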

This post describes how to perform instruction fine-tuning of an LLM, namely FLAN T5 XL, using Amazon SageMaker Jumpstart. We demonstrate how to accomplish this using both the Jumpstart UI and a notebook in Amazon SageMaker Studio. You can find the accompanying notebook in the amazon-sagemaker-examples GitHub repository.

Solution overview

The target task in this post is, given a chunk of text in the prompt, to return questions that are related to the text but can't be answered based on the information it contains. This task is useful for identifying missing information in a description or for identifying whether a query needs more information to be answered.

FLAN T5 models are instruction fine-tuned on a wide range of tasks to increase their zero-shot performance on many common tasks [1]. Additional instruction fine-tuning for a particular customer task can further increase the accuracy of these models, especially if the target task wasn't previously used to train a FLAN T5 model, as is the case for our task.

In our example task, we're interested in generating relevant but unanswered questions. To this end, we use a subset of version 2 of the Stanford Question Answering Dataset (SQuAD2.0) [2] to fine-tune the model. This dataset contains questions posed by human annotators on a set of Wikipedia articles. In addition to questions with answers, SQuAD2.0 contains about 50,000 unanswerable questions. Such questions are plausible but can't be directly answered from the articles' content. We use only the unanswerable questions. Our data is structured as a JSON Lines file, with each line containing a context and a question.
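As an illustration of the JSON Lines layout described above, the following sketch parses such a file and checks that every line carries both keys. The `read_jsonl` helper and the two sample records are assumptions made for this example; they are not part of the accompanying notebook.

```python
import io
import json

REQUIRED_KEYS = {"context", "question"}


def read_jsonl(lines):
    """Parse an iterable of JSON Lines strings, checking the required keys."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
        records.append(record)
    return records


# Two invented records standing in for the real file.
sample_file = io.StringIO(
    '{"context": "EBS volumes behave like raw block devices.", '
    '"question": "Who invented block devices?"}\n'
    '{"context": "Adelaide is the capital of South Australia.", '
    '"question": "What is the population of Adelaide?"}\n'
)
records = read_jsonl(sample_file)
print(len(records), "records")
```

Running a check like this before uploading the data can surface malformed lines before a training job is started.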

Screenshot of a few entries of the SQuADv2 dataset.

Prerequisites

To get started, all you need is an AWS account in which you can use Studio. You will need to create a user profile for Studio if you don't already have one.

Fine-tune FLAN-T5 with the Jumpstart UI

To fine-tune the model with the Jumpstart UI, complete the following steps:

On the SageMaker console, open Studio.
Under SageMaker Jumpstart in the navigation pane, choose Models, notebooks, solutions.

You will see a list of foundation models, including FLAN T5 XL, which is marked as fine-tunable.

Choose View model.

JumpStart UI with FLAN-T5 XL.

Under Data source, you can provide the path to your training data. The source for the data used in this post is provided by default.
You can keep the default values for the deployment configuration (including instance type), security, and hyperparameters, but you should increase the number of epochs to at least three to get good results.
Choose Train to train the model.

JumpStart train UI for the FLAN-T5 XL model.

You can track the status of the training job in the UI.

JumpStart UI for training in progress.

When training is complete (after about 53 minutes in our case), choose Deploy to deploy the fine-tuned model.

JumpStart UI training complete.

After the endpoint is created (a few minutes), you can open a notebook and start using your fine-tuned model.

Fine-tune FLAN-T5 using a Python notebook

Our example notebook shows how to use Jumpstart and SageMaker to programmatically fine-tune and deploy a FLAN T5 XL model. It can be run in Studio or locally.

In this section, we first walk through some general setup. Then you fine-tune the model using the SQuADv2 dataset. Next, you deploy the pre-trained version of the model behind a SageMaker endpoint, and do the same with the fine-tuned model. Finally, you query the endpoints and compare the quality of the output of the pre-trained and fine-tuned models. You will find that the output of the fine-tuned model is of much higher quality.

Set up prerequisites

Begin by installing and upgrading the necessary packages. Restart the kernel after running the following code:

!pip install nest-asyncio==1.5.5 --quiet
!pip install ipywidgets==8.0.4 --quiet
!pip install --upgrade sagemaker --quiet

Next, obtain the execution role associated with the current notebook instance:

import boto3
import sagemaker

# Get current region, role, and default bucket
aws_region = boto3.Session().region_name
aws_role = sagemaker.session.Session().get_caller_identity_arn()
output_bucket = sagemaker.Session().default_bucket()

# This will be useful for printing (ANSI escape codes for bold text)
newline, bold, unbold = "\n", "\033[1m", "\033[0m"
print(f"{bold}aws_region:{unbold} {aws_region}")
print(f"{bold}aws_role:{unbold} {aws_role}")
print(f"{bold}output_bucket:{unbold} {output_bucket}")

You can define a convenient drop-down menu that lists the model sizes available for fine-tuning:

import IPython
from ipywidgets import Dropdown
from sagemaker.jumpstart.filters import And
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Default model choice
model_id = "huggingface-text2text-flan-t5-xl"

# Identify FLAN T5 models that support fine-tuning
filter_value = And(
    "task == text2text", "framework == huggingface", "training_supported == true"
)
model_list = [m for m in list_jumpstart_models(filter=filter_value) if "flan-t5" in m]

# Display the model IDs in a dropdown, for user to select
dropdown = Dropdown(
    value=model_id,
    options=model_list,
    description="FLAN T5 models available for fine-tuning:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(IPython.display.Markdown("### Select a pre-trained model from the dropdown below"))
display(dropdown)

Jumpstart automatically retrieves appropriate training and inference instance types for the model that you chose:

from sagemaker.instance_types import retrieve_default

model_id, model_version = dropdown.value, "*"

# Instance types for training and inference
training_instance_type = retrieve_default(
    model_id=model_id, model_version=model_version, scope="training"
)
inference_instance_type = retrieve_default(
    model_id=model_id, model_version=model_version, scope="inference"
)
print(f"{bold}model_id:{unbold} {model_id}")
print(f"{bold}training_instance_type:{unbold} {training_instance_type}")
print(f"{bold}inference_instance_type:{unbold} {inference_instance_type}")

If you have chosen the FLAN T5 XL, you will see the following output:

model_id: huggingface-text2text-flan-t5-xl
training_instance_type: ml.p3.16xlarge
inference_instance_type: ml.g5.2xlarge

You're now ready to start fine-tuning.

Retrain the model on the fine-tuning dataset

After your setup is complete, complete the following steps:

Use the following code to retrieve the URIs for the required artifacts:

from sagemaker import image_uris, model_uris, script_uris

# Training instance will use this image
train_image_uri = image_uris.retrieve(
    region=aws_region,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)

# Pre-trained model
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)

# Script to execute on the training instance
train_script_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)
print(f"{bold}image uri:{unbold} {train_image_uri}")
print(f"{bold}model uri:{unbold} {train_model_uri}")
print(f"{bold}script uri:{unbold} {train_script_uri}")

The training data is located in a public Amazon Simple Storage Service (Amazon S3) bucket.

Use the following code to point to the location of the data and set up the output location in a bucket in your account:

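from sagemaker.s3 import S3Downloader

# We will use the train split of SQuAD2.0
original_data_file = "train-v2.0.json"

# The data was mirrored in the following bucket
original_data_location = f"s3://sagemaker-sample-files/datasets/text/squad2.0/{original_data_file}"
S3Downloader.download(original_data_location, ".")

# Output location in your own bucket, used later by the training job.
# The prefix name is an assumption (it is not shown in this text); any prefix
# works as long as it ends with "/", because the model URI is built from it later.
output_location = f"s3://{output_bucket}/demo-fine-tune-flan-t5/"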

The original data is not in a format that corresponds to the task for which you are fine-tuning the model, so you reformat it:

import json

local_data_file = "task-data.jsonl"  # any name with .jsonl extension

with open(original_data_file) as f:
    data = json.load(f)

with open(local_data_file, "w") as f:
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            # iterate over questions for a given paragraph
            for qas in paragraph["qas"]:
                if qas["is_impossible"]:
                    # the question is relevant, but cannot be answered
                    example = {"context": paragraph["context"], "question": qas["question"]}
                    json.dump(example, f)
                    f.write("\n")

template = {
    "prompt": "Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}",
    "completion": "{question}",
}
with open("template.json", "w") as f:
    json.dump(template, f)

from sagemaker.s3 import S3Uploader

train_data_location = f"s3://{output_bucket}/train_data"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"{bold}training data:{unbold} {train_data_location}")

Now you can define some hyperparameters for the training:

from sagemaker import hyperparameters

# Retrieve the default hyper-parameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# We will override some default hyperparameters with custom values
hyperparameters["epochs"] = "3"
# TODO
# hyperparameters["max_input_length"] = "300"  # data inputs will be truncated at this length
# hyperparameters["max_output_length"] = "40"  # data outputs will be truncated at this length
# hyperparameters["generation_max_length"] = "40"  # max length of generated output
print(hyperparameters)
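The override step can also be wrapped in a small helper that keeps the defaults untouched and stores every value as a string, as the code above does for `epochs`. This is a sketch only: `override_hyperparameters` is a hypothetical helper, and the default values below are invented stand-ins for whatever `retrieve_default` returns.

```python
def override_hyperparameters(defaults, **overrides):
    """Return a copy of `defaults` with overrides applied as strings."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Catch typos early instead of silently passing an unused key
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    merged = dict(defaults)
    merged.update({name: str(value) for name, value in overrides.items()})
    return merged


# Hypothetical defaults; the real values come from retrieve_default().
defaults = {"epochs": "1", "max_input_length": "-1", "max_output_length": "128"}
tuned = override_hyperparameters(defaults, epochs=3, max_input_length=300)
print(tuned)
```

Rejecting unknown names is a design choice of this sketch; it trades flexibility for early detection of misspelled keys.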

You are now ready to launch the training job:

from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

model_name = "-".join(model_id.split("-")[2:])  # get the most informative part of ID
training_job_name = name_from_base(f"js-demo-{model_name}-{hyperparameters['epochs']}")
print(f"{bold}job name:{unbold} {training_job_name}")

training_metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9.]+)"},
]

# Create SageMaker Estimator instance
sm_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    model_uri=train_model_uri,
    source_dir=train_script_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    volume_size=300,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=output_location,
    metric_definitions=training_metric_definitions,
)

# Launch a SageMaker training job over data located in the given S3 path
# Training jobs can take hours, it is recommended to set wait=False,
# and monitor job status through SageMaker console
sm_estimator.fit({"training": train_data_location}, job_name=training_job_name, wait=False)
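The metric definitions above are regular expressions applied to the training log. The following sketch shows how those three patterns pick values out of a log line; the two log lines are invented examples in the dict-like style the patterns expect, and `extract_metrics` is a hypothetical helper, not part of the SageMaker SDK.

```python
import re

# Same patterns as in training_metric_definitions above.
metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9.]+)"},
]


def extract_metrics(log_line):
    """Return {metric name: float} for every pattern found in one log line."""
    found = {}
    for definition in metric_definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# Invented log lines, for illustration only.
train_line = "{'loss': 1.25, 'learning_rate': 5e-05, 'epoch': 1.0}"
eval_line = "{'eval_loss': 0.75, 'eval_runtime': 12.5, 'epoch': 1.0}"
print(extract_metrics(train_line))
print(extract_metrics(eval_line))
```

Note that the `train_loss` pattern begins with a quote character, so it does not fire on `'eval_loss'`; this keeps the training and validation losses separate.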

Depending on the size of the fine-tuning data and the model chosen, fine-tuning can take up to a couple of hours.

You can monitor performance metrics such as training and validation loss using Amazon CloudWatch during training. Conveniently, you can also fetch the most recent snapshot of metrics by running the following code:

from sagemaker import TrainingJobAnalytics

# This can be called while the job is still running
df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()
df.head(10)

model uri: s3://sagemaker-us-west-2-802376408542/avkan/training-huggingface-text2text-huggingface-text2text-flan-t5-xl-repack.tar.gz
job name: jumpstart-demo-xl-3-2023-04-06-08-16-42-738
INFO:sagemaker:Creating training-job with name: jumpstart-demo-xl-3-2023-04-06-08-16-42-738

When the training is complete, you have a fine-tuned model at model_uri. Let's use it!

You can create two inference endpoints: one for the original pre-trained model, and one for the fine-tuned model. This allows you to compare the output of both versions of the model. In the next step, you deploy an inference endpoint for the pre-trained model. Then you deploy an endpoint for your fine-tuned model.

Deploy the pre-trained model

Let's start deploying the pre-trained model by retrieving the inference Docker image URI. This is the base Hugging Face container image. Use the following code:

from sagemaker import image_uris

# Retrieve the inference docker image URI. This is the base HuggingFace container image
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="inference",
    instance_type=inference_instance_type,
)

You can now create the endpoint and deploy the pre-trained model. Note that you need to pass the Predictor class when deploying the model through the Model class to be able to run inference through the SageMaker API. See the following code:

from sagemaker import model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# Retrieve the URI of the pre-trained model
pre_trained_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

pre_trained_name = name_from_base(f"jumpstart-demo-pre-trained-{model_id}")

# Create the SageMaker model instance of the pre-trained model
if ("small" in model_id) or ("base" in model_id):
    deploy_source_uri = script_uris.retrieve(
        model_id=model_id, model_version=model_version, script_scope="inference"
    )
    pre_trained_model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        entry_point="inference.py",
        model_data=pre_trained_model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=pre_trained_name,
    )
else:
    # For those large models, we already repack the inference script and model
    # artifacts for you, so the `source_dir` argument to Model is not required.
    pre_trained_model = Model(
        image_uri=deploy_image_uri,
        model_data=pre_trained_model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=pre_trained_name,
    )

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {pre_trained_model_uri}")
print("Deploying an endpoint ...")

# Deploy the pre-trained model. Note that we need to pass Predictor class when we deploy model
# through Model class, for being able to run inference through the SageMaker API
pre_trained_predictor = pre_trained_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=pre_trained_name,
)
print(f"{newline}Deployed an endpoint {pre_trained_name}")

Endpoint creation and model deployment can take a few minutes, after which your endpoint is ready to receive inference calls.

Deploy the fine-tuned model

Let's deploy the fine-tuned model to its own endpoint. The process is almost identical to the one we used earlier for the pre-trained model. The only difference is that we use the fine-tuned model name and URI:

from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

fine_tuned_name = name_from_base(f"jumpstart-demo-fine-tuned-{model_id}")
fine_tuned_model_uri = f"{output_location}{training_job_name}/output/model.tar.gz"

# Create the SageMaker model instance of the fine-tuned model
fine_tuned_model = Model(
    image_uri=deploy_image_uri,
    model_data=fine_tuned_model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=fine_tuned_name,
)

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {fine_tuned_model_uri}")
print("Deploying an endpoint ...")

# Deploy the fine-tuned model.
fine_tuned_predictor = fine_tuned_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=fine_tuned_name,
)
print(f"{newline}Deployed an endpoint {fine_tuned_name}")
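The fine-tuned model URI above is built by plain string concatenation, which only works when output_location ends with a slash. A slightly more defensive variant could look like the sketch below; `model_artifact_uri` is a hypothetical helper and the bucket name in the example is invented.

```python
def model_artifact_uri(output_location, training_job_name):
    """Join the output prefix and job name into the model.tar.gz location."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    # Strip any trailing slash so that exactly one separator is inserted
    return f"{output_location.rstrip('/')}/{training_job_name}/output/model.tar.gz"


# Invented bucket and job name, for illustration only.
uri_with_slash = model_artifact_uri("s3://example-bucket/demo/", "js-demo-job")
uri_without_slash = model_artifact_uri("s3://example-bucket/demo", "js-demo-job")
print(uri_with_slash)
```

Both calls produce the same URI, so the helper is indifferent to how the output prefix was written.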

When this process is complete, both the pre-trained and fine-tuned models are deployed behind their own endpoints. Let's compare their outputs.

Generate output and compare the results

Define some utility functions to query the endpoint and parse the response:

import boto3
import json

# Parameters of (output) text generation. A great introduction to generation
# parameters can be found at https://huggingface.co/blog/how-to-generate
parameters = {
    "max_length": 40,  # restrict the length of the generated text
    "num_return_sequences": 5,  # we will inspect several model outputs
    "num_beams": 10,  # use beam search
}


# Helper functions for running inference queries
def query_endpoint_with_json_payload(payload, endpoint_name):
    encoded_json = json.dumps(payload).encode("utf-8")
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text


def generate_questions(endpoint_name, text):
    expanded_prompt = prompt.replace("{context}", text)
    payload = {"text_inputs": expanded_prompt, **parameters}
    query_response = query_endpoint_with_json_payload(payload, endpoint_name=endpoint_name)
    generated_texts = parse_response_multiple_texts(query_response)
    for i, generated_text in enumerate(generated_texts):
        print(f"Response {i}: {generated_text}{newline}")
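The payload construction and response parsing can be exercised without a live endpoint. The sketch below mirrors the helpers above but reads from an in-memory body; the shortened prompt, the parameter values, and the fake response are invented for illustration. The beam check reflects the Hugging Face beam-search rule that the number of returned sequences cannot exceed the number of beams.

```python
import io
import json


def build_payload(prompt_template, text, generation_parameters):
    """Fill the prompt template and merge in the generation parameters."""
    beams = generation_parameters.get("num_beams")
    if beams is not None and generation_parameters.get("num_return_sequences", 1) > beams:
        raise ValueError("num_return_sequences must not exceed num_beams")
    return {"text_inputs": prompt_template.replace("{context}", text), **generation_parameters}


def parse_generated_texts(body):
    """Read a file-like response body and return the list of generated texts."""
    return json.loads(body.read())["generated_texts"]


demo_prompt = "Ask a question about the text. Text: {context}"  # shortened, illustrative
demo_parameters = {"max_length": 40, "num_return_sequences": 2, "num_beams": 4}
payload = build_payload(demo_prompt, "EBS volumes persist independently.", demo_parameters)

# Invented body standing in for response["Body"] returned by invoke_endpoint.
fake_body = io.BytesIO(json.dumps({"generated_texts": ["Q1?", "Q2?"]}).encode("utf-8"))
texts = parse_generated_texts(fake_body)
print(payload["text_inputs"])
print(texts)
```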

In the next code snippet, we define the prompt and the test data. The prompt describes our target task, which is to generate questions that are related to the provided text but can't be answered based on it.

The test data consists of three different paragraphs: one on the Australian city of Adelaide, taken from the first two paragraphs of its Wikipedia page; one on Amazon Elastic Block Store (Amazon EBS), taken from the Amazon EBS documentation; and one on Amazon Comprehend, taken from the Amazon Comprehend documentation. We expect the model to identify questions that are related to these paragraphs but can't be answered with the information provided in them.

prompt = "Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}"

test_paragraphs = [
    """
Adelaide is the capital city of South Australia, the state's largest city and the fifth-most populous city in Australia. "Adelaide" may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre.
The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide
region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language. Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and
the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of
the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the north to Sellicks Beach in the south. """, """
Amazon Elastic Block Store (Amazon EBS) provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices. You can mount these volumes as devices on your instances. EBS volumes that are attached to an instance are exposed as storage volumes that persist independently from the life of the instance. You can create a file system on top of these volumes, or use them in any way you would use a block device (such as a hard drive). You can dynamically change the configuration of a volume attached to an instance. We recommend Amazon EBS for data that must be quickly accessible and requires long-term persistence. EBS volumes are particularly well-suited for use as the primary storage for file systems, databases, or for any applications that require fine granular updates and access to raw, unformatted, block-level storage. Amazon EBS is well suited to both database-style applications that rely on random reads and writes, and to throughput-intensive applications that perform long, continuous reads and writes. """, """
Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages. """
]

You can now test the endpoints using the example articles:

print(f"{bold}Prompt:{unbold} {repr(prompt)}")
for paragraph in test_paragraphs:
    print("-" * 80)
    print(paragraph)
    print("-" * 80)
    print(f"{bold}pre-trained{unbold}")
    generate_questions(pre_trained_name, paragraph)
    print(f"{bold}fine-tuned{unbold}")
    generate_questions(fine_tuned_name, paragraph)

Test data: Adelaide

We use the following context:

Adelaide is the capital city of South Australia, the state's largest city and the fifth-most populous city in Australia. "Adelaide" may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre.
The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide
region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language. Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and
the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of
the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the north to Sellicks Beach in the south.

The pre-trained model response is as follows:

Response 0: What is the area of the city centre and surrounding parklands called in the Kaurna language?
Response 1: What is the area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language?
Response 2: What is the area of the city centre and surrounding parklands called in Kaurna?
Response 3: What is the capital city of South Australia?
Response 4: What is the area of the city centre and surrounding parklands known as in the Kaurna language?

The fine-tuned model responses are as follows:

Response 0: What is the second most populous city in Australia?
Response 1: What is the fourth most populous city in Australia?
Response 2: What is the population of Gawler?
Response 3: What is the largest city in Australia?
Response 4: What is the fifth most populous city in the world?

Test data: Amazon EBS

We use the following context:

Amazon Elastic Block Store (Amazon EBS) provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices. You can mount these volumes as devices on your instances. EBS volumes that are attached to an instance are exposed as storage volumes that persist independently from the life of the instance. You can create a file system on top of these volumes, or use them in any way you would use a block device (such as a hard drive). You can dynamically change the configuration of a volume attached to an instance. We recommend Amazon EBS for data that must be quickly accessible and requires long-term persistence. EBS volumes are particularly well-suited for use as the primary storage for file systems, databases, or for any applications that require fine granular updates and access to raw, unformatted, block-level storage. Amazon EBS is well suited to both database-style applications that rely on random reads and writes, and to throughput-intensive applications that perform long, continuous reads and writes.

The pre-trained model responses are as follows:

Response 0: What is the difference between Amazon EBS and Amazon Elastic Block Store (Amazon EBS)?
Response 1: What is the difference between Amazon EBS and Amazon Elastic Block Store?
Response 2: What is the difference between Amazon EBS and Amazon Simple Storage Service (Amazon S3)?
Response 3: What is Amazon Elastic Block Store (Amazon EBS)?
Response 4: What is the difference between Amazon EBS and a hard drive?

The fine-tuned model responses are as follows:

Response 0: What type of applications are not well suited to Amazon EBS?
Response 1: What behaves like formatted block devices?
Response 2: What type of applications are not suited to Amazon EBS?
Response 3: What type of applications are not well suited for Amazon EBS?
Response 4: What type of applications are not suited for Amazon EBS?

Test data: Amazon Comprehend

We use the following context:

Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.

The pre-trained model responses are as follows:

Response 0: What does Amazon Comprehend use to extract insights about the content of documents?
Response 1: How does Amazon Comprehend extract insights about the content of documents?
Response 2: What does Amazon Comprehend use to develop insights about the content of documents?
Response 3: How does Amazon Comprehend develop insights about the content of documents?
Response 4: What does Amazon Comprehend use to extract insights about the content of a document?

The fine-tuned model responses are as follows:

Response 0: What does Amazon Comprehend use to extract insights about the structure of documents?
Response 1: How does Amazon Comprehend recognize sentiments in a document?
Response 2: What does Amazon Comprehend use to extract insights about the content of social networking feeds?
Response 3: What does Amazon Comprehend use to extract insights about the content of documents?
Response 4: What type of files does Amazon Comprehend reject as input?

The difference in output quality between the pre-trained model and the fine-tuned model is stark. The questions provided by the fine-tuned model touch on a wider range of topics. They are systematically meaningful questions, which isn't always the case for the pre-trained model, as illustrated by the Amazon EBS example.

Although this doesn't constitute a formal and systematic evaluation, it's clear that the fine-tuning process has improved the quality of the model's responses on this task.
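As a rough, informal way to put a number on "a wider range of topics", one could compare how varied the vocabulary of each set of responses is. The sketch below computes the share of distinct word tokens across the five Adelaide responses listed above. This metric is an assumption of this example, not something the post's authors used, and it is sensitive to response length, so treat it as a sanity check rather than an evaluation.

```python
import re


def distinct_token_ratio(responses):
    """Unique word tokens divided by total word tokens across all responses."""
    tokens = [t for r in responses for t in re.findall(r"[a-z0-9]+", r.lower())]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Responses copied from the Adelaide test case in this post.
pre_trained = [
    "What is the area of the city centre and surrounding parklands called in the Kaurna language?",
    "What is the area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language?",
    "What is the area of the city centre and surrounding parklands called in Kaurna?",
    "What is the capital city of South Australia?",
    "What is the area of the city centre and surrounding parklands known as in the Kaurna language?",
]
fine_tuned = [
    "What is the second most populous city in Australia?",
    "What is the fourth most populous city in Australia?",
    "What is the population of Gawler?",
    "What is the largest city in Australia?",
    "What is the fifth most populous city in the world?",
]
print(f"pre-trained: {distinct_token_ratio(pre_trained):.2f}")
print(f"fine-tuned:  {distinct_token_ratio(fine_tuned):.2f}")
```

A higher ratio means less repetition across responses; it says nothing about whether a question is actually unanswerable from the text, which still requires reading the outputs.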

Clean up

Lastly, remember to clean up and delete the endpoints:

# Delete resources
pre_trained_predictor.delete_model()
pre_trained_predictor.delete_endpoint()
fine_tuned_predictor.delete_model()
fine_tuned_predictor.delete_endpoint()
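If one of the delete calls above raises (for example because a resource was already removed), the calls after it are skipped and an endpoint may keep running. The following sketch keeps going after a failure and reports what went wrong. `delete_predictors` is a hypothetical helper, and `_StubPredictor` is an invented offline stand-in for the SageMaker Predictor objects.

```python
def delete_predictors(predictors):
    """Delete the model and endpoint behind each predictor; collect failures."""
    failures = []
    for predictor in predictors:
        for action in (predictor.delete_model, predictor.delete_endpoint):
            try:
                action()
            except Exception as error:  # keep going so other endpoints still get deleted
                failures.append((action.__name__, error))
    return failures


class _StubPredictor:
    """Offline stand-in for a SageMaker Predictor, for illustration only."""

    def __init__(self, fail_model_delete=False):
        self.fail_model_delete = fail_model_delete
        self.endpoint_deleted = False

    def delete_model(self):
        if self.fail_model_delete:
            raise RuntimeError("model already deleted")

    def delete_endpoint(self):
        self.endpoint_deleted = True


stubs = [_StubPredictor(fail_model_delete=True), _StubPredictor()]
failures = delete_predictors(stubs)
print(failures)
```

In the notebook, the call would be `delete_predictors([pre_trained_predictor, fine_tuned_predictor])`, followed by a check that the returned list is empty.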

Conclusion

In this post, we showed how to use instruction fine-tuning with FLAN T5 models using the Jumpstart UI or a Jupyter notebook running in Studio. We provided code explaining how to retrain the model using data for the target task and deploy the fine-tuned model behind an endpoint. The target task in this post was to identify questions that relate to a chunk of text provided in the input but can't be answered based on the information provided in that text. We demonstrated that a model fine-tuned for this specific task returns better results than a pre-trained model.

Now that you know how to instruction fine-tune a model with Jumpstart, you can create powerful models customized for your application. Gather some data for your use case, upload it to Amazon S3, and use either the Studio UI or the notebook to tune a FLAN T5 model!

References

[1] Chung, Hyung Won, et al. "Scaling instruction-finetuned language models." arXiv preprint arXiv:2210.11416 (2022).

[2] Rajpurkar, Pranav, Robin Jia, and Percy Liang. "Know What You Don't Know: Unanswerable Questions for SQuAD." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.

About the authors

Laurent Callot is a Principal Applied Scientist and manager at AWS AI Labs who has worked on a variety of machine learning problems, from foundation models and generative AI to forecasting, anomaly detection, causality, and AI Ops.

Andrei Kan is a Senior Applied Scientist at AWS AI Labs with interests and experience in different fields of machine learning. These include research on foundation models, as well as ML applications for graphs and time series.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana Champaign. He is an active researcher in machine learning and statistical inference and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Baris Kurt is an Applied Scientist at AWS AI Labs. His interests are in time series anomaly detection and foundation models. He loves developing user-friendly ML systems.

Jonas Kubler is an Applied Scientist at AWS AI Labs. He is working on foundation models with the goal of facilitating use-case-specific applications.

Source: https://aws.amazon.com/blogs/machine-learning/instruction-fine-tuning-for-flan-t5-xl-with-amazon-sagemaker-jumpstart/

Timestamp: May 22, 2023

Timestamp: June 20, 2023
