Amazon Lex is excited to announce Test Workbench, a new bot testing solution that provides tools to simplify and automate the bot testing process. During bot development, testing is the phase where developers check whether a bot meets the specific requirements, needs and expectations by identifying errors, defects, or bugs in the system before scaling. Testing helps validate bot performance on several fronts such as conversational flow (understanding user queries and responding accurately), intent overlap handling, and consistency across modalities. However, testing is often manual, error-prone, and non-standardized. Test Workbench standardizes automated test management by allowing chatbot development teams to generate, maintain, and execute test sets with a consistent methodology and avoid custom scripting and ad-hoc integrations. In this post, you will learn how Test Workbench streamlines automated testing of a bot’s voice and text modalities and provides accuracy and performance measures for parameters such as audio transcription, intent recognition, and slot resolution for both single utterance inputs and multi-turn conversations. This allows you to quickly identify bot improvement areas and maintain a consistent baseline to measure accuracy over time and observe any accuracy regression due to bot updates.
Amazon Lex is a fully managed service for building conversational voice and text interfaces. Amazon Lex helps you build and deploy chatbots and virtual assistants on websites, contact center services, and messaging channels. Amazon Lex bots help increase interactive voice response (IVR) productivity, automate simple tasks, and drive operational efficiencies across the organization. Test Workbench for Amazon Lex standardizes and simplifies the bot testing lifecycle, which is critical to improving bot design.
Features of Test Workbench
Test Workbench for Amazon Lex includes the following features:
- Generate test datasets automatically from a bot’s conversation logs
- Upload manually built test set baselines
- Perform end-to-end testing of single input or multi-turn conversations
- Test both audio and text modalities of a bot
- Review aggregated and drill-down metrics for bot dimensions:
- Speech transcription
- Intent recognition
- Slot resolution (including multi-valued slots or composite slots)
- Context tags
- Session attributes
- Request attributes
- Runtime hints
- Time delay in seconds
Prerequisites
To test this feature, you should have the following:
In addition, you should have knowledge and understanding of the following services and features:
Create a test set
To create your test set, complete the following steps:
- On the Amazon Lex console, under Test workbench in the navigation pane, choose Test sets.
You can review a list of existing test sets, including basic information such as name, description, number of test inputs, modality, and status. In the following steps, you can choose between generating a test set from the conversation logs associated with the bot or uploading an existing manually built test set in a CSV file format.
- Choose Create test set.
- Generating test sets from conversation logs allows you to do the following:
- Include real multi-turn conversations from the bot’s logs in CloudWatch
- Include audio logs and conduct tests that account for real speech nuances, background noises, and accents
- Speed up the creation of test sets
- Uploading a manually built test set allows you to do the following:
- Test new bots for which there is no production data
- Perform regression tests on existing bots for any new or modified intents, slots, and conversation flows
- Test carefully crafted and detailed scenarios that specify session attributes and request attributes
To generate a test set, complete the following steps. To upload a manually built test set, skip to step 7.
- Select Generate a baseline test set.
- Choose your options for Bot name, Bot alias, and Language.
- For Time range, set a time range for the logs.
- For Existing IAM role, choose a role.
Ensure that the IAM role is able to grant you access to retrieve information from the conversation logs. Refer to Creating IAM roles to create an IAM role with the appropriate policy.
- If you prefer to use a manually created test set, select Upload a file to this test set.
- For Upload a file to this test set, choose from the following options:
- Select Upload from S3 bucket to upload a CSV file from an Amazon Simple Storage Service (Amazon S3) bucket.
- Select Upload a file to this test set to upload a CSV file from your computer.
You can use the sample test set provided in this post. For more information about templates, choose the CSV Template link on the page.
- For Modality, select the modality of your test set, either Text or Audio.
Test Workbench provides testing support for audio and text input formats.
- For S3 location, enter the S3 bucket location where the results will be stored.
- Optionally, choose an AWS Key Management Service (AWS KMS) key to encrypt output transcripts.
- Choose Create.
Your newly created test set will be listed on the Test sets page with one of the following statuses:
- Ready for annotation – For test sets generated from Amazon Lex bot conversation logs, the annotation step serves as a manual gating mechanism to ensure quality test inputs. By annotating values for expected intents and expected slots for each test line item, you indicate the “ground truth” for that line. The test results from the bot run are collected and compared against the ground truth to mark test results as pass or fail. This line level comparison then allows for creating aggregated measures.
- Ready for testing – This indicates that the test set is ready to be executed against an Amazon Lex bot.
- Validation error – Uploaded test files are checked for errors such as exceeding maximum supported length, invalid characters in intent names, or invalid Amazon S3 links containing audio files. If the test set is in the Validation error state, download the file showing the validation details to see test input issues or errors on a line-by-line basis. Once they are addressed, you can manually upload the corrected test set CSV into the test set.
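The annotation step described above compares each line's bot output against its ground truth and then rolls the line-level results up into aggregated measures. The following is a minimal sketch of that idea, not the Test Workbench implementation: the `AnnotatedLine` class, its field names, and the rule that a line passes only when the intent and every expected slot value match are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedLine:
    """Hypothetical record: one test input with its ground truth and bot output."""
    expected_intent: str
    actual_intent: str
    expected_slots: dict = field(default_factory=dict)
    actual_slots: dict = field(default_factory=dict)


def line_passes(line: AnnotatedLine) -> bool:
    """A line passes only if the intent and every expected slot value match."""
    if line.expected_intent != line.actual_intent:
        return False
    return all(
        line.actual_slots.get(name) == value
        for name, value in line.expected_slots.items()
    )


def pass_rate(lines: list) -> float:
    """Aggregate line-level pass/fail results into a pass rate (0.0 to 1.0)."""
    if not lines:
        return 0.0
    return sum(line_passes(line) for line in lines) / len(lines)


# Example data (intent and slot names are made up for illustration).
example_lines = [
    AnnotatedLine("BookFlight", "BookFlight", {"City": "Paris"}, {"City": "Paris"}),
    AnnotatedLine("BookFlight", "CancelFlight"),
    AnnotatedLine("BookFlight", "BookFlight", {"City": "Paris"}, {"City": "Rome"}),
    AnnotatedLine("Greeting", "Greeting"),
]
print(pass_rate(example_lines))
```

In this sketch the second line fails on intent and the third fails on a slot value, so two of the four example lines pass.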
Executing a test set
A test set is de-coupled from a bot. The same test set can be executed against a different bot or bot alias in the future as your business use case evolves. To report performance metrics of a bot against the baseline test data, complete the following steps:
- Import the sample bot definition and build the bot (refer to Importing a bot for guidance).
- On the Amazon Lex console, choose Test sets in the navigation pane.
- Choose your validated test set.
Here you can review basic information about the test set and the imported test data.
- Choose Execute test.
- Choose the appropriate options for Bot name, Bot alias, and Language.
- For Test type, select Audio or Text.
- For Endpoint selection, select either Streaming or Non-streaming.
- Choose Validate discrepancy to validate your test dataset.
Before executing a test set, you can validate test coverage, including identifying intents and slots present in the test set but not in the bot. This early warning sets the tester's expectations for otherwise unexpected test failures. If discrepancies between your test dataset and your bot are detected, the Execute test page will update with the View details button.
Intents and slots found in the test data set but not in the bot alias are listed as shown in the following screenshots.
- After you validate the discrepancies, choose Execute to run the test.
Review results
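The discrepancy check in the steps above amounts to a set difference: which intents and slots does the test set reference that the bot alias does not define? Here is a small sketch of that comparison under stated assumptions. The function name `find_discrepancies` and the input shape (a mapping from intent name to a set of slot names) are hypothetical and are not taken from the Amazon Lex API.

```python
def find_discrepancies(test_set: dict, bot: dict) -> dict:
    """Return intents and slots present in the test set but missing from the bot.

    Both arguments map an intent name to the set of slot names it uses.
    An intent missing from the bot is reported with all of its slots; an
    intent the bot does define is reported only with the slots the bot lacks.
    """
    missing = {}
    for intent, slots in test_set.items():
        if intent not in bot:
            missing[intent] = set(slots)
        else:
            absent = set(slots) - set(bot[intent])
            if absent:
                missing[intent] = absent
    return missing


# Example data (intent and slot names are made up for illustration).
test_set_intents = {"BookFlight": {"City", "Date"}, "RentCar": {"CarType"}}
bot_intents = {"BookFlight": {"City"}}
print(find_discrepancies(test_set_intents, bot_intents))
```

Here `RentCar` is reported because the bot has no such intent, and `BookFlight` is reported only for its `Date` slot.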
The performance measures generated after executing a test set help you identify areas of bot design that need improvement and are useful for expediting bot development and delivery to support your customers. Test Workbench provides insights on intent classification and slot resolution at both the end-to-end conversation level and the single-line input level. The completed test runs are stored with timestamps in your S3 bucket, and can be used for future comparative reviews.
- On the Amazon Lex console, choose Test results in the navigation pane.
- Choose the test result ID for the results you want to review.
On the next page, the test results will include a breakdown of results organized in four main tabs: Overall results, Conversation results, Intent and slot results, and Detailed results.
Overall results
The Overall results tab contains three main sections:
- Test set input breakdown — A chart showing the total number of end-to-end conversations and single input utterances in the test set.
- Single input breakdown — A chart showing the number of passed or failed single inputs.
- Conversation breakdown — A chart showing the number of passed or failed multi-turn inputs.
For test sets run in audio modality, speech transcription charts are provided to show the number of passed or failed speech transcriptions on both single input and conversation types. In audio modality, a single input or multi-turn conversation could pass the speech transcription test, yet fail the overall end-to-end test. This can be caused, for instance, by a slot resolution or an intent recognition issue.
Conversation results
Test Workbench helps you drill down into conversation failures that can be attributed to specific intents or slots. The Conversation results tab is organized into three main areas, covering all intents and slots used in the test set:
- Conversation pass rates — A table used to visualize which intents and slots are responsible for possible conversation failures.
- Conversation intent failure metrics — A bar graph showing the top five worst performing intents in the test set, if any.
- Conversation slot failure metrics — A bar graph showing the top five worst performing slots in the test set, if any.
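To illustrate the kind of ranking behind a "top five worst performing intents" chart, the sketch below orders intents by failure rate. The post does not say how Test Workbench defines "worst performing", so ranking by failure rate is an assumption, as are the `worst_intents` function name and the `(intent_name, passed)` input shape.

```python
from collections import Counter


def worst_intents(results, top_n=5):
    """Rank intents by failure rate, highest first.

    `results` is an iterable of (intent_name, passed) pairs. Returns up to
    `top_n` (intent_name, failure_rate) tuples; intents that never failed
    are omitted.
    """
    totals, failures = Counter(), Counter()
    for intent, passed in results:
        totals[intent] += 1
        if not passed:
            failures[intent] += 1
    rates = [(intent, failures[intent] / totals[intent]) for intent in failures]
    # Sort by failure rate descending, then by name for a deterministic order.
    rates.sort(key=lambda item: (-item[1], item[0]))
    return rates[:top_n]


# Example data (intent names are made up for illustration).
example_results = [
    ("BookFlight", True),
    ("BookFlight", False),
    ("RentCar", False),
    ("Greeting", True),
]
print(worst_intents(example_results))
```

The same function could be applied to slot results by passing slot names instead of intent names.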
Intent and slot results
The Intent and slot results tab provides drill-down metrics for bot dimensions such as intent recognition and slot resolution.
- Intent recognition metrics — A table showing the intent recognition success rate.
- Slot resolution metrics — A table showing the slot resolution success rate, by
Detailed results
You can access a detailed report of the executed test run on the Detailed results tab. A table is displayed to show the actual transcription, output intent, and slot values in a test set. The report can be downloaded as a CSV for further analysis.
The line-level output provides insights to help improve the bot design and boost accuracy. For instance, misrecognized or missed speech inputs such as branded words can be added to custom vocabulary of an intent or as utterances under an intent.
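As one way to analyze the downloaded report further, the sketch below filters a CSV down to the rows where an expected value and the actual value differ. The column headers of the real report are not given in this post, so the function takes them as parameters; the sample data and the header names `utterance`, `expected_intent`, and `actual_intent` are placeholders, not the actual report schema.

```python
import csv
import io


def mismatched_rows(csv_text, expected_col, actual_col):
    """Return the rows where the expected and actual values differ.

    Column names are passed in because the real report's headers are not
    known here; check the downloaded file for the actual names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row[expected_col].strip() != row[actual_col].strip()
    ]


# Placeholder data standing in for a downloaded report.
sample_report = (
    "utterance,expected_intent,actual_intent\n"
    "book a flight,BookFlight,BookFlight\n"
    "rent a kar,RentCar,FallbackIntent\n"
)
for row in mismatched_rows(sample_report, "expected_intent", "actual_intent"):
    print(row["utterance"], "->", row["actual_intent"])
```

Reviewing the mismatched rows this way can surface inputs, such as misrecognized words, that are candidates for the improvements described above.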
To further improve conversation design, you can refer to this post, which outlines best practices on using ML to create a bot that will delight your customers by accurately understanding them.
Conclusion
In this post, we presented Test Workbench for Amazon Lex, a native capability that standardizes the automated chatbot testing process and allows developers and conversation designers to streamline and iterate quickly through bot design and development.
We look forward to hearing how you use this new functionality of Amazon Lex and welcome feedback! For any questions, bugs, or feature requests, please reach us through AWS re:Post for Amazon Lex or your AWS Support contacts.
To learn more, see Amazon Lex FAQs and the Amazon Lex V2 Developer Guide.
About the authors
Sandeep Srinivasan is a Product Manager on the Amazon Lex team. As a keen observer of human behavior, he is passionate about customer experience. He spends his waking hours at the intersection of people, technology, and the future.
Grazia Russo Lassner is a Senior Consultant with the AWS Professional Services Natural Language AI team. She specializes in designing and developing conversational AI solutions using AWS technologies for customers in various industries. Outside of work, she enjoys beach weekends, reading the latest fiction books, and family.
Source: https://aws.amazon.com/blogs/machine-learning/expedite-the-amazon-lex-chatbot-development-lifecycle-with-test-workbench/