LangChain: A Complete Guide and Tutorial

Republished by Plato

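At its core, LangChain is a framework built for developing applications that harness the capabilities of language models. It is a toolkit for developers who want to create applications that are context-aware and capable of sophisticated reasoning.

This means LangChain applications can understand context, such as prompt instructions or content that grounds a response, and can use language models for complex reasoning tasks, such as deciding how to respond or which actions to take. LangChain offers a unified approach to developing intelligent applications, and its components simplify the path from concept to execution.

Understanding LangChain

LangChain is more than a framework; it is an ecosystem made up of several parts.

First, there are the LangChain libraries, available in Python and JavaScript. These libraries are the backbone of LangChain, offering interfaces and integrations for various components. They provide a basic runtime for combining these components into cohesive chains and agents, along with ready-made implementations for immediate use.
Next, there are LangChain Templates. These are a collection of deployable reference architectures tailored to a wide range of tasks. Whether you are building a chatbot or a complex analytical tool, the templates offer a solid starting point.
LangServe is a library for deploying LangChain chains as REST APIs. It turns LangChain projects into accessible, scalable web services.
Lastly, LangSmith serves as a developer platform. It is designed to debug, test, evaluate, and monitor chains built on any LLM framework. It integrates closely with LangChain, which makes it a practical tool for refining applications.

Together, these components let you develop, productionize, and deploy applications. With LangChain, you start by writing your application with the libraries, referencing the templates for guidance. LangSmith then helps you inspect, test, and monitor your chains so that your application keeps improving and is ready for deployment. Finally, LangServe turns any chain into an API, which keeps deployment simple.

In the next sections, we look more closely at how to set up LangChain and start building intelligent applications powered by language models.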

Automate manual tasks and workflows with the AI-driven workflow builder designed by Nanonets for you and your team.

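Installation and Setup

Ready to get started with LangChain? Setup is straightforward, and this guide walks you through it step by step.

The first step is installing LangChain. You can do this with pip or conda. Run the following command in your terminal:

pip install langchain

If you prefer the latest features and are comfortable with a little more adventure, you can install LangChain directly from source. Clone the repository and navigate to the langchain/libs/langchain directory. Then run:

pip install -e .

For experimental features, consider installing langchain-experimental. This package contains cutting-edge code intended for research and experimentation. Install it with:

pip install langchain-experimental

The LangChain CLI is a handy tool for working with LangChain templates and LangServe projects. To install it, use:

pip install langchain-cli

LangServe is essential for deploying your LangChain chains as a REST API. It is installed alongside the LangChain CLI.

LangChain often requires integrations with model providers, data stores, APIs, and so on. For this example, we use OpenAI's model APIs. Install the OpenAI Python package with:

pip install openai

To access the API, set your OpenAI API key as an environment variable:

export OPENAI_API_KEY="your_api_key"

Alternatively, set the key directly in your Python environment:

import os
os.environ['OPENAI_API_KEY'] = 'your_api_key'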
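LangChain lets you build language model applications out of modules. These modules can be used on their own or composed for complex use cases. The modules are:

Model I/O: Handles interaction with various language models, managing their inputs and outputs.
Retrieval: Gives access to application-specific data and lets you interact with it, which is crucial for dynamic data use.
Agents: Let applications choose appropriate tools based on high-level directives, improving their decision-making.
Chains: Provide predefined, reusable compositions that serve as building blocks for application development.
Memory: Maintains application state across multiple chain executions, which is essential for context-aware interactions.

Each module targets a specific development need, making LangChain a comprehensive toolkit for building advanced language model applications.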

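Along with the components above, there is the LangChain Expression Language (LCEL), a declarative way to compose modules, in which components are chained through a common Runnable interface.

LCEL looks something like this:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import BaseOutputParser

# Example chain; CustomOutputParser stands for a BaseOutputParser subclass you define yourself
chain = ChatPromptTemplate() | ChatOpenAI() | CustomOutputParser()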
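
To make the pipe syntax less mysterious, here is a minimal sketch of how `|` composition can be built in plain Python by overloading `__or__`. The `Step` class and the three toy stages are hypothetical stand-ins for illustration only; this is not LangChain's Runnable implementation.

```python
class Step:
    """A tiny stand-in for a composable component (hypothetical, not a LangChain class)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Three toy stages standing in for a prompt, a model, and an output parser.
prompt = Step(lambda topic: f"Tell me a joke about {topic}.")
fake_model = Step(lambda text: text.upper())  # placeholder for a real model call
parser = Step(lambda text: text.rstrip(".").split())

chain = prompt | fake_model | parser
result = chain.invoke("robots")
print(result)
```

Each stage only needs an `invoke` method, which is the sense in which a shared interface makes components interchangeable.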

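Now that we have covered the basics, we will go on to:

Dig into each LangChain module in detail.
Learn how to use the LangChain Expression Language.
Explore common use cases and implement them.
Deploy an end-to-end application with LangServe.
Look at LangSmith for debugging, testing, and monitoring.

Let's get started!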

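Module I: Model I/O

In LangChain, the core element of any application is the language model. This module provides the basic building blocks for interacting with any language model.

Key Components of Model I/O

LLMs and Chat Models (used interchangeably):
- LLMs:
  - Definition: Pure text completion models.
  - Input/Output: Take a text string as input and return a text string as output.
- Chat Models:
  - Definition: Models that use a language model as a base but differ in input and output formats.
  - Input/Output: Accept a list of chat messages as input and return a chat message.

Prompts: Templatize, dynamically select, and manage model inputs. They allow flexible, context-specific prompts that guide the language model's responses.
Output Parsers: Extract and format information from model outputs. They are useful for converting the raw output of a language model into structured data or the specific format an application needs.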

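LLMs

LangChain's integration with Large Language Models (LLMs) such as OpenAI, Cohere, and Hugging Face is a fundamental part of its functionality. LangChain does not host LLMs itself; it offers a uniform interface for interacting with various LLMs.

This section gives an overview of the OpenAI LLM wrapper in LangChain; the same approach applies to other LLM types. We already installed the package in the installation section above. Let's initialize the LLM.

from langchain.llms import OpenAI
llm = OpenAI()

LLMs implement the Runnable interface, the basic building block of the LangChain Expression Language (LCEL). This means they support invoke, ainvoke, stream, astream, batch, abatch, and astream_log calls.
LLMs accept strings as input, or objects that can be coerced to string prompts, including List[BaseMessage] and PromptValue (more on these later).

Let's look at some examples.

response = llm.invoke("List the seven wonders of the world.")
print(response)

You can also call the stream method to stream the text response.

for chunk in llm.stream("Where were the 2012 Olympics held?"):
    print(chunk, end="", flush=True)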

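Chat Models

LangChain's integration with chat models, a specialized variation of language models, is essential for building interactive chat applications. Chat models use language models internally, but they expose a distinct interface centered on chat messages as inputs and outputs. This section covers using OpenAI's chat model in LangChain.

from langchain.chat_models import ChatOpenAI
chat = ChatOpenAI()

Chat models in LangChain work with different message types such as AIMessage, HumanMessage, SystemMessage, FunctionMessage, and ChatMessage (with an arbitrary role parameter). In general, HumanMessage, AIMessage, and SystemMessage are the most frequently used.

Chat models primarily accept List[BaseMessage] as input. Strings can be converted to HumanMessage, and PromptValue is also supported.

from langchain.schema.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are Michael Jordan."),
    HumanMessage(content="Which shoe manufacturer are you associated with?"),
]
response = chat.invoke(messages)
print(response.content)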

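Prompts

Prompts are essential for guiding language models toward relevant and coherent outputs. They range from simple instructions to complex few-shot examples. In LangChain, several dedicated classes and functions make prompt handling a streamlined process.

LangChain's PromptTemplate class is a versatile tool for creating string prompts. It uses Python's str.format syntax, which allows dynamic prompt generation. You define a template with placeholders and fill them with specific values as needed.

from langchain.prompts import PromptTemplate

# Simple prompt with placeholders
prompt_template = PromptTemplate.from_template(
    "Tell me a {adjective} joke about {content}."
)

# Filling placeholders to create a prompt
filled_prompt = prompt_template.format(adjective="funny", content="robots")
print(filled_prompt)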
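
Because the template syntax is plain `str.format`, the placeholder names can be discovered with the standard library alone. The following is a small sketch of that idea using `string.Formatter`; the helper name `input_variables` is chosen here for illustration and is not a claim about how LangChain detects variables internally.

```python
from string import Formatter


def input_variables(template: str) -> list:
    """Return the placeholder names found in a str.format-style template."""
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        # field_name is None for trailing literal text with no placeholder.
        if field_name and field_name not in names:
            names.append(field_name)
    return names


template = "Tell me a {adjective} joke about {content}."
print(input_variables(template))
print(template.format(adjective="funny", content="robots"))
```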

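For chat models, prompts are more structured and involve messages with specific roles. LangChain offers ChatPromptTemplate for this purpose.

from langchain.prompts import ChatPromptTemplate

# Defining a chat prompt with various roles
chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful AI bot. Your name is {name}."),
        ("human", "Hello, how are you doing?"),
        ("ai", "I'm doing well, thanks!"),
        ("human", "{user_input}"),
    ]
)

# Formatting the chat prompt
formatted_messages = chat_template.format_messages(name="Bob", user_input="What is your name?")
for message in formatted_messages:
    print(message)

This approach makes it possible to build interactive chatbots with dynamic responses.

Both PromptTemplate and ChatPromptTemplate integrate with the LangChain Expression Language (LCEL), so they can be part of larger, more complex workflows. We discuss this further later on.

Custom prompt templates are sometimes necessary for tasks that need unique formatting or specific instructions. Creating one involves defining input variables and a custom formatting method. This flexibility lets LangChain serve a wide range of application-specific requirements. The LangChain documentation covers this in more detail.

LangChain also supports few-shot prompting, which lets the model learn from examples. This is valuable for tasks that need contextual understanding or specific patterns. Few-shot prompt templates can be built from a set of examples or with an example selector object. The LangChain documentation covers this in more detail.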

Output Parsers

Output parsers play a crucial role in LangChain: they let you structure the responses generated by language models. In this section, we explore the concept and provide code examples using LangChain's PydanticOutputParser, SimpleJsonOutputParser, CommaSeparatedListOutputParser, DatetimeOutputParser, and XMLOutputParser.

PydanticOutputParser

LangChain provides the PydanticOutputParser for parsing responses into Pydantic data structures. Below is a step-by-step example:

from typing import List
from langchain.llms import OpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.pydantic_v1 import BaseModel, Field, validator

# Initialize the language model
model = OpenAI(model_name="text-davinci-003", temperature=0.0)

# Define your desired data structure using Pydantic
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field

# Set up a PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=Joke)

# Create a prompt with format instructions
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Define a query to prompt the language model
query = "Tell me a joke."

# Combine prompt and model to get raw output
prompt_and_model = prompt | model
output = prompt_and_model.invoke({"query": query})

# Parse the output using the parser
parsed_result = parser.invoke(output)

# The result is a structured object
print(parsed_result)

The output is a Joke object with setup and punchline fields.

SimpleJsonOutputParser

LangChain's SimpleJsonOutputParser is used when you want to parse JSON-like outputs. Here is an example:

from langchain.output_parsers.json import SimpleJsonOutputParser

# Create a JSON prompt (reuses PromptTemplate and model from the previous example)
json_prompt = PromptTemplate.from_template(
    "Return a JSON object with `birthdate` and `birthplace` key that answers the following question: {question}"
)

# Initialize the JSON parser
json_parser = SimpleJsonOutputParser()

# Create a chain with the prompt, model, and parser
json_chain = json_prompt | model | json_parser

# Stream through the results
result_list = list(json_chain.stream({"question": "When and where was Elon Musk born?"}))

# The result is a list of JSON-like dictionaries
print(result_list)

CommaSeparatedListOutputParser

The CommaSeparatedListOutputParser is handy when you want to extract a comma-separated list from a model response. Here is an example:

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# Initialize the parser
output_parser = CommaSeparatedListOutputParser()

# Create format instructions
format_instructions = output_parser.get_format_instructions()

# Create a prompt to request a list
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

# Define a query to prompt the model
query = "English Premier League Teams"

# Generate the output (reuses model from the earlier example)
output = model(prompt.format(subject=query))

# Parse the output using the parser
parsed_result = output_parser.parse(output)

# The result is a list of items
print(parsed_result)
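
The parsing step itself is simple string handling. As a rough sketch of what turning a comma-separated reply into a list involves (the function name and the sample reply are made up for illustration, and this is not the library's code):

```python
def parse_comma_separated(text: str) -> list:
    """Split a reply such as 'a, b, c' into a list of trimmed, non-empty items."""
    return [item.strip() for item in text.strip().split(",") if item.strip()]


# A hard-coded sample reply standing in for real model output.
reply = "Arsenal, Chelsea, Liverpool, Everton, Fulham\n"
print(parse_comma_separated(reply))
```

The format instructions in the prompt matter because a parser this simple only works if the model actually answers in the expected shape.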

DatetimeOutputParser

LangChain's DatetimeOutputParser is designed to parse datetime information. Here is how to use it:

from langchain.prompts import PromptTemplate
from langchain.output_parsers import DatetimeOutputParser
from langchain.chains import LLMChain
from langchain.llms import OpenAI

# Initialize the DatetimeOutputParser
output_parser = DatetimeOutputParser()

# Create a prompt with format instructions
template = """
Answer the user's question:
{question}
{format_instructions}
"""
prompt = PromptTemplate.from_template(
    template,
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

# Create a chain with the prompt and language model
chain = LLMChain(prompt=prompt, llm=OpenAI())

# Define a query to prompt the model
query = "when did Neil Armstrong land on the moon in terms of GMT?"

# Run the chain
output = chain.run(query)

# Parse the output using the datetime parser
parsed_result = output_parser.parse(output)

# The result is a datetime object
print(parsed_result)
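
Parsing a datetime reply comes down to agreeing on a format string and failing loudly when the model deviates from it. Here is a minimal sketch with the standard library; the format string is an assumption made for this example, and the sample reply is hard-coded rather than real model output.

```python
from datetime import datetime

# Assumed format for this sketch; a real parser may be configured differently.
ASSUMED_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str, fmt: str = ASSUMED_FORMAT) -> datetime:
    """Parse a model reply into a datetime, raising ValueError on a mismatch."""
    try:
        return datetime.strptime(text.strip(), fmt)
    except ValueError as exc:
        raise ValueError(f"Could not parse datetime string: {text!r}") from exc


parsed = parse_datetime("1969-07-20T20:17:40.000000Z\n")
print(parsed)
```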

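These examples show how LangChain's output parsers can structure various types of model responses for different applications and formats. Output parsers make language model outputs easier to use and interpret.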

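Module II: Retrieval

Retrieval in LangChain plays a crucial role in applications that need user-specific data not included in the model's training set. This process, known as Retrieval Augmented Generation (RAG), involves fetching external data and integrating it into the language model's generation process. LangChain provides a comprehensive set of tools for this, covering both simple and complex applications.

LangChain implements retrieval through a series of components, which we discuss one by one.

Document Loaders

Document loaders in LangChain extract data from various sources. With over 100 loaders available, they support a range of document types, apps, and sources (private S3 buckets, public websites, databases).

You can choose a document loader based on your requirements.

All of these loaders ingest data into Document classes. We will learn how to use data ingested into Document classes later.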

Text File Loader: Loads a simple .txt file into a document.

from langchain.document_loaders import TextLoader

loader = TextLoader("./sample.txt")
document = loader.load()

CSV Loader: Loads a CSV file into documents.

from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='./example_data/sample.csv')
documents = loader.load()

We can customize the parsing by specifying field names:

loader = CSVLoader(
    file_path='./example_data/mlb_teams_2012.csv',
    csv_args={
        'delimiter': ',',
        'quotechar': '"',
        'fieldnames': ['MLB Team', 'Payroll in millions', 'Wins'],
    },
)
documents = loader.load()

PDF Loaders: PDF loaders in LangChain offer several ways to parse and extract content from PDF files. Each loader serves different needs and uses a different underlying library. Below is an example for each loader.

PyPDFLoader is used for basic PDF parsing.

from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load_and_split()

MathpixPDFLoader is well suited to extracting mathematical content and diagrams.

from langchain.document_loaders import MathpixPDFLoader

loader = MathpixPDFLoader("example_data/math-content.pdf")
data = loader.load()

PyMuPDFLoader is fast and includes detailed metadata extraction.

from langchain.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

# Optionally pass additional arguments for PyMuPDF's get_text() call
data = loader.load(option="text")

PDFMinerLoader gives finer control over text extraction.

from langchain.document_loaders import PDFMinerLoader

loader = PDFMinerLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

AmazonTextractPDFLoader uses AWS Textract for OCR and other advanced PDF parsing features.

from langchain.document_loaders import AmazonTextractPDFLoader

# Requires AWS account and configuration
loader = AmazonTextractPDFLoader("example_data/complex-layout.pdf")
documents = loader.load()

PDFMinerPDFasHTMLLoader generates HTML from a PDF for semantic parsing.

from langchain.document_loaders import PDFMinerPDFasHTMLLoader

loader = PDFMinerPDFasHTMLLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

PDFPlumberLoader provides detailed metadata and supports one document per page.

from langchain.document_loaders import PDFPlumberLoader

loader = PDFPlumberLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

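Integrated Loaders: LangChain offers a wide variety of custom loaders that load data directly from your apps (such as Slack, Sigma, Notion, Confluence, Google Drive, and more) and databases, so you can use that data in LLM applications.

The complete list is available in the LangChain documentation.

Below are a couple of examples to illustrate this.

Example I: Slack

Slack, a widely used instant messaging platform, can be integrated into LLM workflows and applications.

Go to your Slack workspace management page.
Navigate to {your_slack_domain}.slack.com/services/export.
Select the desired date range and start the export.
Slack notifies you by email and DM once the export is ready.
The export produces a .zip file in your Downloads folder or your designated download path.
Assign the path of the downloaded .zip file to LOCAL_ZIPFILE.
Use the SlackDirectoryLoader from the langchain.document_loaders package.

from langchain.document_loaders import SlackDirectoryLoader

SLACK_WORKSPACE_URL = "https://xxx.slack.com"  # Replace with your Slack URL
LOCAL_ZIPFILE = ""  # Path to the Slack zip file

loader = SlackDirectoryLoader(LOCAL_ZIPFILE, SLACK_WORKSPACE_URL)
docs = loader.load()
print(docs)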

Example II: Figma

Figma, a popular interface design tool, offers a REST API for data integration.

Obtain the Figma file key from the URL format: https://www.figma.com/file/{filekey}/sampleFilename.
Node IDs are found in the URL parameter ?node-id={node_id}.
Generate an access token by following the instructions in the Figma Help Center.
The FigmaFileLoader class from langchain.document_loaders.figma is used to load Figma data.
Various LangChain modules such as CharacterTextSplitter and ChatOpenAI are used for processing.

import os
from langchain.document_loaders.figma import FigmaFileLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.chat_models import ChatOpenAI
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains import ConversationChain, LLMChain
from langchain.memory import ConversationBufferWindowMemory
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

figma_loader = FigmaFileLoader(
    os.environ.get("ACCESS_TOKEN"),
    os.environ.get("NODE_IDS"),
    os.environ.get("FILE_KEY"),
)

index = VectorstoreIndexCreator().from_loaders([figma_loader])
figma_doc_retriever = index.vectorstore.as_retriever()

The generate_code function uses the Figma data to create HTML/CSS code.
It employs a templated conversation with a GPT-based model.

def generate_code(human_input):
    # Template for system and human prompts
    system_prompt_template = "Your coding instructions..."
    human_prompt_template = "Code the {text}. Ensure it's mobile responsive"

    # Creating prompt templates
    system_message_prompt = SystemMessagePromptTemplate.from_template(system_prompt_template)
    human_message_prompt = HumanMessagePromptTemplate.from_template(human_prompt_template)

    # Setting up the AI model
    gpt_4 = ChatOpenAI(temperature=0.02, model_name="gpt-4")

    # Retrieving relevant documents
    relevant_nodes = figma_doc_retriever.get_relevant_documents(human_input)

    # Generating and formatting the prompt
    conversation = [system_message_prompt, human_message_prompt]
    chat_prompt = ChatPromptTemplate.from_messages(conversation)
    response = gpt_4(
        chat_prompt.format_prompt(context=relevant_nodes, text=human_input).to_messages()
    )
    return response

# Example usage
response = generate_code("page top header")
print(response.content)

When executed, the generate_code function returns HTML/CSS code based on the Figma design input.

Let us now use what we have learned to create a few document sets.

We first load a PDF, the BCG annual sustainability report.

We use the PyPDFLoader for this.

from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("bcg-2022-annual-sustainability-report-apr-2023.pdf")
pdfpages = loader.load_and_split()

Next we ingest data from Airtable. We have an Airtable containing information about various OCR and data extraction models.

Let us use the AirtableLoader, found in the list of integrated loaders, for this.

from langchain.document_loaders import AirtableLoader

api_key = "XXXXX"
base_id = "XXXXX"
table_id = "XXXXX"

loader = AirtableLoader(api_key, table_id, base_id)
airtabledocs = loader.load()

Let us now proceed and learn how to use these document classes.

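Document Transformers

Document transformers in LangChain are essential tools for manipulating documents such as the ones we created in the previous subsection.

They are used for tasks such as splitting long documents into smaller chunks, combining, and filtering, which are crucial for fitting documents into a model's context window or meeting specific application needs.

One such tool is the RecursiveCharacterTextSplitter, a versatile text splitter that splits on a list of characters. It accepts parameters such as chunk size, overlap, and start index. Here is an example of how it is used in Python:

from langchain.text_splitter import RecursiveCharacterTextSplitter

state_of_the_union = "Your long text here..."

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
    add_start_index=True,
)

texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
print(texts[1])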
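
To see what chunk size, overlap, and start index mean concretely, here is a deliberately simplified sketch: a fixed-size sliding window over characters. It is not the recursive, separator-aware algorithm the library class uses; the function name and the dictionary layout are assumptions made for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into fixed-size windows; each window repeats the tail of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With a chunk size of 4 and an overlap of 1, consecutive chunks share one character, which is how overlap preserves context across chunk boundaries.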

Another tool is the CharacterTextSplitter, which splits text on a specified character and includes controls for chunk size and overlap:

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)

texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])

The HTMLHeaderTextSplitter splits HTML content on header tags, retaining the semantic structure:

from langchain.text_splitter import HTMLHeaderTextSplitter

html_string = "Your HTML content here..."
headers_to_split_on = [("h1", "Header 1"), ("h2", "Header 2")]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
html_header_splits = html_splitter.split_text(html_string)
print(html_header_splits[0])

More complex manipulation is possible by pipelining HTMLHeaderTextSplitter with another splitter:

from langchain.text_splitter import HTMLHeaderTextSplitter, RecursiveCharacterTextSplitter

url = "https://example.com"
headers_to_split_on = [("h1", "Header 1"), ("h2", "Header 2")]
html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
html_header_splits = html_splitter.split_text_from_url(url)

chunk_size = 500
text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size)
splits = text_splitter.split_documents(html_header_splits)
print(splits[0])

LangChain also offers splitters for specific programming languages, such as Python and JavaScript code splitters:

from langchain.text_splitter import RecursiveCharacterTextSplitter, Language

python_code = """
def hello_world():
    print("Hello, World!")
hello_world()
"""

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=50
)
python_docs = python_splitter.create_documents([python_code])
print(python_docs[0])

js_code = """
function helloWorld() {
  console.log("Hello, World!");
}
helloWorld();
"""

js_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.JS, chunk_size=60
)
js_docs = js_splitter.create_documents([js_code])
print(js_docs[0])

To split text by token count, which is useful for language models with token limits, use the TokenTextSplitter:

from langchain.text_splitter import TokenTextSplitter

text_splitter = TokenTextSplitter(chunk_size=10)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])

Finally, LongContextReorder reorders documents to prevent the performance degradation that models show with long contexts:

from langchain.document_transformers import LongContextReorder

reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)
print(reordered_docs[0])

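These tools show the range of ways to transform documents in LangChain, from simple text splitting to reordering and language-specific splitting. For more in-depth and specific use cases, consult the documentation and integrations section of LangChain.

In our examples, the loaders have already created chunked documents for us, so this part is already taken care of.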

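Text Embedding Models

Text embedding models in LangChain provide a standardized interface to embedding model providers such as OpenAI, Cohere, and Hugging Face. These models turn text into vector representations, which enables operations such as semantic search through text similarity in vector space.

To get started with text embedding models, you typically need to install specific packages and set up API keys. We have already done this for OpenAI.

In LangChain, the embed_documents method embeds multiple texts and returns a list of vector representations. For instance:

from langchain.embeddings import OpenAIEmbeddings

# Initialize the model
embeddings_model = OpenAIEmbeddings()

# Embed a list of texts
embeddings = embeddings_model.embed_documents(
    ["Hi there!", "Oh, hello!", "What's your name?", "My friends call me World", "Hello World!"]
)
print("Number of documents embedded:", len(embeddings))
print("Dimension of each embedding:", len(embeddings[0]))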

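To embed a single text, such as a search query, use the embed_query method. This is useful for comparing a query against a set of document embeddings. For example:

from langchain.embeddings import OpenAIEmbeddings

# Initialize the model
embeddings_model = OpenAIEmbeddings()

# Embed a single query
embedded_query = embeddings_model.embed_query("What was the name mentioned in the conversation?")
print("First five dimensions of the embedded query:", embedded_query[:5])

Understanding these embeddings is crucial. Each piece of text is converted into a vector whose dimension depends on the model used. For instance, OpenAI models typically produce 1536-dimensional vectors. These embeddings are then used to retrieve relevant information.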
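
Retrieval over embeddings usually means comparing the query vector with each document vector and keeping the closest ones. As a small self-contained sketch, the following uses cosine similarity on tiny hand-written 3-dimensional vectors; the vectors are toy values, not real embeddings, and the helper names are chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec, doc_vecs):
    """Return the index of the document vector closest to the query."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vectors standing in for real embeddings.
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query = [0.9, 0.1, 0.0]
best = most_similar(toy_query, toy_docs)
print("Closest document index:", best)
```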

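LangChain's embedding functionality is not limited to OpenAI; it is designed to work with many providers. Setup and usage may differ slightly from one provider to another, but the core concept of embedding text into vector space stays the same. For detailed usage, including advanced configuration and integrations with other embedding providers, see the integrations section of the LangChain documentation.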

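Vector Stores

Vector stores in LangChain support efficient storage and search of text embeddings. LangChain integrates with over 50 vector stores and provides a standardized interface for ease of use.

Example: Storing and Searching Embeddings

To store texts in a vector store such as Chroma and run similarity searches, pass the texts together with an embedding model; the store computes and indexes the embeddings:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Chroma.from_texts takes the raw texts plus an embedding model, not precomputed vectors
texts = ["Hi there!", "Oh, hello!", "What's your name?"]
db = Chroma.from_texts(texts, OpenAIEmbeddings())
similar_texts = db.similarity_search("search query")

Alternatively, let us use the FAISS vector store to create indexes for our documents.

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

pdfstore = FAISS.from_documents(pdfpages, embedding=OpenAIEmbeddings())

airtablestore = FAISS.from_documents(airtabledocs, embedding=OpenAIEmbeddings())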

Retrievers

Retrievers in LangChain are interfaces that return documents in response to an unstructured query. They are more general than vector stores, focusing on retrieval rather than storage. Vector stores can serve as the backbone of a retriever, but other kinds of retrievers exist as well.

To set up a Chroma retriever, first install it with pip install chromadb. You can then load, split, embed, and retrieve documents with a short series of Python commands. Here is a code example for setting up a Chroma retriever:

from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

with open("state_of_the_union.txt", "r") as f:
    full_text = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_text(full_text)

embeddings = OpenAIEmbeddings()
db = Chroma.from_texts(texts, embeddings)
retriever = db.as_retriever()

retrieved_docs = retriever.invoke("What did the president say about Ketanji Brown Jackson?")
print(retrieved_docs[0].page_content)

The MultiQueryRetriever automates prompt tuning by generating multiple queries for a user input query and combining the results. Here is an example of its simple usage:

from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

question = "What are the approaches to Task Decomposition?"
llm = ChatOpenAI(temperature=0)
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=db.as_retriever(), llm=llm
)

unique_docs = retriever_from_llm.get_relevant_documents(query=question)
print("Number of unique documents:", len(unique_docs))

Contextual compression in LangChain compresses retrieved documents using the context of the query, so that only relevant information is returned. This involves reducing content and filtering out less relevant documents. The following code example shows how to use the contextual compression retriever:

from langchain.llms import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever)

compressed_docs = compression_retriever.get_relevant_documents("What did the president say about Ketanji Brown Jackson")
print(compressed_docs[0].page_content)

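The EnsembleRetriever combines different retrieval algorithms to achieve better performance. The code below combines BM25 and FAISS retrievers:

from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.vectorstores import FAISS

# doc_list is the list of strings to index; these three are placeholder examples
doc_list = ["I like apples", "I like oranges", "Apples and oranges are fruits"]

bm25_retriever = BM25Retriever.from_texts(doc_list)
bm25_retriever.k = 2  # number of results to return, set as an attribute
faiss_vectorstore = FAISS.from_texts(doc_list, OpenAIEmbeddings())
faiss_retriever = faiss_vectorstore.as_retriever(search_kwargs={"k": 2})

ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, faiss_retriever], weights=[0.5, 0.5]
)

docs = ensemble_retriever.get_relevant_documents("apples")
print(docs[0].page_content)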
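
One common way to merge ranked lists from several retrievers is weighted reciprocal rank fusion: each document earns `weight / (rank + c)` from every list it appears in, and the totals decide the final order. The sketch below assumes that scheme with the conventional constant `c = 60`; whether a given library uses exactly this formula is an assumption, and the hit lists are hard-coded examples.

```python
def weighted_rank_fusion(rankings, weights, c=60):
    """Fuse several ranked lists into one using weighted reciprocal rank fusion."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            # Documents ranked higher (smaller rank) contribute a larger score.
            scores[doc] = scores.get(doc, 0.0) + weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Hard-coded example results from a keyword retriever and a vector retriever.
keyword_hits = ["I like apples", "Apples and oranges are fruits"]
vector_hits = ["Apples and oranges are fruits", "I like oranges"]
fused = weighted_rank_fusion([keyword_hits, vector_hits], weights=[0.5, 0.5])
print(fused)
```

The document found by both retrievers rises to the top even though only one of them ranked it first, which is the benefit of combining complementary retrieval methods.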

The MultiVector Retriever in LangChain lets you query documents that have multiple vectors per document, which is useful for capturing different semantic aspects within a document. Methods for creating multiple vectors include splitting into smaller chunks, summarizing, or generating hypothetical questions. To split documents into smaller chunks, the following Python code can be used:

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.storage import InMemoryStore
from langchain.document_loaders import TextLoader
import uuid

loaders = [TextLoader("file1.txt"), TextLoader("file2.txt")]
docs = [doc for loader in loaders for doc in loader.load()]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000)
docs = text_splitter.split_documents(docs)

vectorstore = Chroma(collection_name="full_documents", embedding_function=OpenAIEmbeddings())
store = InMemoryStore()
id_key = "doc_id"
retriever = MultiVectorRetriever(vectorstore=vectorstore, docstore=store, id_key=id_key)

doc_ids = [str(uuid.uuid4()) for _ in docs]
child_text_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

# Tag every child chunk with the id of the parent document it came from
sub_docs = []
for doc_id, doc in zip(doc_ids, docs):
    for sub_doc in child_text_splitter.split_documents([doc]):
        sub_doc.metadata[id_key] = doc_id
        sub_docs.append(sub_doc)

retriever.vectorstore.add_documents(sub_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))

Another method is to generate summaries, whose more focused representation of the content can improve retrieval. Here is an example of generating summaries:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.document import Document

# Map each Document to the {doc} prompt variable before calling the model
chain = (
    {"doc": lambda x: x.page_content}
    | ChatPromptTemplate.from_template("Summarize the following document:\n\n{doc}")
    | ChatOpenAI(max_retries=0)
    | StrOutputParser()
)
summaries = chain.batch(docs, {"max_concurrency": 5})

summary_docs = [Document(page_content=s, metadata={id_key: doc_ids[i]}) for i, s in enumerate(summaries)]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))

Using an LLM to generate hypothetical questions relevant to each document is another approach. This can be done with the following code:

from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser

# Function schema asking the model for an array of question strings
functions = [
    {
        "name": "hypothetical_questions",
        "parameters": {
            "type": "object",
            "properties": {"questions": {"type": "array", "items": {"type": "string"}}},
            "required": ["questions"],
        },
    }
]

chain = (
    {"doc": lambda x: x.page_content}
    | ChatPromptTemplate.from_template("Generate 3 hypothetical questions:\n\n{doc}")
    | ChatOpenAI(max_retries=0).bind(functions=functions, function_call={"name": "hypothetical_questions"})
    | JsonKeyOutputFunctionsParser(key_name="questions")
)
hypothetical_questions = chain.batch(docs, {"max_concurrency": 5})

question_docs = [
    Document(page_content=q, metadata={id_key: doc_ids[i]})
    for i, questions in enumerate(hypothetical_questions)
    for q in questions
]
retriever.vectorstore.add_documents(question_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))

The Parent Document Retriever is another retriever that balances embedding accuracy against context retention by storing small chunks and retrieving their larger parent documents. Its implementation is as follows:

from langchain.retrievers import ParentDocumentRetriever

loaders = [TextLoader("file1.txt"), TextLoader("file2.txt")]
docs = [doc for loader in loaders for doc in loader.load()]

child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
vectorstore = Chroma(collection_name="full_documents", embedding_function=OpenAIEmbeddings())
store = InMemoryStore()
retriever = ParentDocumentRetriever(vectorstore=vectorstore, docstore=store, child_splitter=child_splitter)

retriever.add_documents(docs, ids=None)

retrieved_docs = retriever.get_relevant_documents("query")
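
The small-chunk, large-parent idea can be sketched without any vector store. In the toy version below, plain substring matching stands in for embedding similarity, and both function names and the sample texts are invented for illustration. The point is only the bookkeeping: search over children, then return the parents they point to.

```python
def build_child_index(parents: dict, child_size: int) -> list:
    """Split each parent text into small children that remember their parent's id."""
    children = []
    for parent_id, text in parents.items():
        for start in range(0, len(text), child_size):
            children.append({"parent_id": parent_id, "text": text[start:start + child_size]})
    return children


def retrieve_parents(query: str, children: list, parents: dict) -> list:
    """Match the query against the small children, then return the full parent texts."""
    matched_ids = []
    for child in children:
        # Substring matching is a stand-in for vector similarity in this sketch.
        if query.lower() in child["text"].lower() and child["parent_id"] not in matched_ids:
            matched_ids.append(child["parent_id"])
    return [parents[parent_id] for parent_id in matched_ids]


parents = {"a": "Cats sleep most of the day.", "b": "Dogs enjoy long walks outside."}
children = build_child_index(parents, child_size=10)
print(retrieve_parents("dogs", children, parents))
```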

A self-querying retriever constructs structured queries from natural language input and applies them to its underlying VectorStore. Its implementation is shown in the following code:

from langchain.chat_models import ChatOpenAI
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever

# The "..." entries are elided in the source and must be filled in before running
metadata_field_info = [AttributeInfo(name="genre", description="...", type="string"), ...]
document_content_description = "Brief summary of a movie"
llm = ChatOpenAI(temperature=0)

retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info)

retrieved_docs = retriever.invoke("query")

The WebResearchRetriever performs web research based on a given query:

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.vectorstores import Chroma

# Initialize components
llm = ChatOpenAI(temperature=0)
search = GoogleSearchAPIWrapper()
vectorstore = Chroma(embedding_function=OpenAIEmbeddings())

# Instantiate WebResearchRetriever
web_research_retriever = WebResearchRetriever.from_llm(vectorstore=vectorstore, llm=llm, search=search)

# Retrieve documents
docs = web_research_retriever.get_relevant_documents("query")

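For our examples, we can also use the standard retriever already implemented as part of a vector store object, by calling as_retriever() on the pdfstore and airtablestore objects created earlier.

We can now query the retrievers. The output of a query is a list of document objects relevant to that query. These will ultimately be used to create relevant responses in later sections.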

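Module III: Agents

LangChain introduces a powerful concept called "agents" that takes the idea of chains to a new level. Agents use a language model to dynamically determine the sequence of actions to perform, which makes them versatile and adaptive. In a traditional chain the actions are hard-coded; an agent instead uses a language model as a reasoning engine to decide which actions to take and in what order.

The Agent is the core component responsible for decision-making. It uses a language model and a prompt to determine the next steps toward a specific goal. The inputs to an agent typically include:

Tools: Descriptions of the available tools (more on this later).
User Input: The high-level objective or query from the user.
Intermediate Steps: A history of (action, tool output) pairs executed so far for the current user input.

The output of an agent is either the next action to take (AgentAction) or the final response to send to the user (AgentFinish). An action specifies a tool and the input for that tool.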

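Tools

Tools are interfaces that an agent can use to interact with the world. They let agents perform tasks such as searching the web, running shell commands, or accessing external APIs. In LangChain, tools are essential for extending what agents can do.

To use tools in LangChain, load them with the following snippet:

from langchain.agents import load_tools

tool_names = [...]
tools = load_tools(tool_names)

Some tools may require a base Language Model (LLM) to initialize. In that case, you can pass in an LLM as well:

from langchain.agents import load_tools

tool_names = [...]
llm = ...
tools = load_tools(tool_names, llm=llm)

This setup gives you access to a variety of tools that you can integrate into your agent's workflows. The LangChain documentation provides the complete list of tools with usage instructions.

Let us look at some examples of tools.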

DuckDuckGo

The DuckDuckGo tool lets you perform web searches using its search engine. Here is how to use it:

from langchain.tools import DuckDuckGoSearchRun
search = DuckDuckGoSearchRun()
search.run("manchester united vs luton town match summary")

DataForSeo

The DataForSeo toolkit lets you obtain search engine results through the DataForSeo API. To use it, you need to set up your API credentials. Here is how to configure them:

import os

os.environ["DATAFORSEO_LOGIN"] = "<your_api_access_username>"
os.environ["DATAFORSEO_PASSWORD"] = "<your_api_access_password>"

Once your credentials are set, you can create a DataForSeoAPIWrapper tool to access the API:

from langchain.utilities.dataforseo_api_search import DataForSeoAPIWrapper

wrapper = DataForSeoAPIWrapper()

result = wrapper.run("Weather in Los Angeles")

The DataForSeoAPIWrapper tool retrieves search engine results from various sources.

You can customize the types of results and the fields returned in the JSON response. For example, you can specify result types and fields, and cap the number of top results returned:

json_wrapper = DataForSeoAPIWrapper(
    json_result_types=["organic", "knowledge_graph", "answer_box"],
    json_result_fields=["type", "title", "description", "text"],
    top_count=3,
)

json_result = json_wrapper.results("Bill Gates")

This example customizes the JSON response by specifying result types and fields and limiting the number of results.

You can also specify the location and language for your search results by passing additional parameters to the API wrapper:

customized_wrapper = DataForSeoAPIWrapper(
    top_count=10,
    json_result_types=["organic", "local_pack"],
    json_result_fields=["title", "description", "type"],
    params={"location_name": "Germany", "language_code": "en"},
)

customized_result = customized_wrapper.results("coffee near me")

By providing location and language parameters, you can tailor search results to specific regions and languages.

You can also choose which search engine to use. Simply specify the desired search engine:

customized_wrapper = DataForSeoAPIWrapper(
    top_count=10,
    json_result_types=["organic", "local_pack"],
    json_result_fields=["title", "description", "type"],
    params={"location_name": "Germany", "language_code": "en", "se_name": "bing"},
)

customized_result = customized_wrapper.results("coffee near me")

In this example, the search is customized to use Bing as the search engine.

The API wrapper also lets you specify the type of search to perform. For instance, you can run a maps search:

maps_search = DataForSeoAPIWrapper(
    top_count=10,
    json_result_fields=["title", "value", "address", "rating", "type"],
    params={
        "location_coordinate": "52.512,13.36,12z",
        "language_code": "en",
        "se_type": "maps",
    },
)

maps_search_result = maps_search.results("coffee near me")

This customizes the search to retrieve maps-related information.

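Shell (bash)

The Shell toolkit gives agents access to the shell environment, allowing them to execute shell commands. This is powerful but should be used with caution, ideally only inside a sandboxed environment. Here is how to use the Shell tool:

from langchain.tools import ShellTool

shell_tool = ShellTool()

result = shell_tool.run({"commands": ["echo 'Hello World!'", "time"]})

In this example, the Shell tool runs two shell commands: it echoes "Hello World!" and then runs the shell's time command, which reports timing information rather than the wall-clock time.

You can hand the Shell tool to an agent to perform more complex tasks. Here is an example of an agent fetching links from a web page with the Shell tool:

from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0.1)

# Append the tool's argument schema to its description, escaping braces so the
# text survives prompt formatting
shell_tool.description = shell_tool.description + f"args {shell_tool.args}".replace(
    "{", "{{"
).replace("}", "}}")
self_ask_with_search = initialize_agent(
    [shell_tool], llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
self_ask_with_search.run(
    "Download the langchain.com webpage and grep for all urls. Return only a sorted list of them. Be sure to use double quotes."
)

In this scenario, the agent uses the Shell tool to run a sequence of commands that fetch, filter, and sort URLs from a web page.

The examples above demonstrate some of the tools available in LangChain. These tools extend what agents can do (agents are explored in the next subsection). Depending on your requirements, you can choose the tools and toolkits that best suit your project and integrate them into your agent's workflows.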

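Back to Agents

Let us now return to agents.

The AgentExecutor is the runtime environment for an agent. It is responsible for calling the agent, executing the actions it selects, passing the action outputs back to the agent, and repeating the process until the agent finishes. In pseudocode, the AgentExecutor might look something like this:

next_action = agent.get_action(...)
while next_action != AgentFinish:
    observation = run(next_action)
    next_action = agent.get_action(..., next_action, observation)
return next_action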
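
The pseudocode above can be turned into a small runnable toy. In the sketch below, `Action`, `Finish`, `toy_agent`, and `run_agent` are all hypothetical names invented for illustration, and the "agent" is a scripted function rather than a language model. It shows only the control flow of such a loop, with a step limit added so a misbehaving agent cannot loop forever.

```python
from dataclasses import dataclass


@dataclass
class Action:
    tool: str
    tool_input: str


@dataclass
class Finish:
    output: str


def toy_agent(user_input, intermediate_steps):
    """A scripted stand-in for an LLM-driven agent (hypothetical)."""
    if not intermediate_steps:
        # No observations yet: request the word-length tool.
        return Action(tool="get_word_length", tool_input=user_input)
    _last_action, observation = intermediate_steps[-1]
    return Finish(output=f"The word has {observation} letters.")


def run_agent(agent, tools, user_input, max_steps=5):
    """Call the agent, run the tool it picks, feed the result back, repeat until done."""
    intermediate_steps = []
    for _ in range(max_steps):
        decision = agent(user_input, intermediate_steps)
        if isinstance(decision, Finish):
            return decision.output
        observation = tools[decision.tool](decision.tool_input)
        intermediate_steps.append((decision, observation))
    raise RuntimeError("Agent did not finish within max_steps")


answer = run_agent(toy_agent, {"get_word_length": len}, "educa")
print(answer)
```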

AgentExecutor 处理各种复杂性，例如处理代理选择不存在的工具的情况、处理工具错误、管理代理生成的输出以及提供各个级别的日志记录和可观察性。

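While the AgentExecutor class is the main agent runtime in LangChain, other, more experimental runtimes are also supported, including:

Plan-and-execute Agent
Baby AGI
Auto GPT

To understand the agent framework better, let's build a basic agent from scratch and then explore the prebuilt agents.

Before we start building the agent, it is worth revisiting some key terms and schemas:

AgentAction: A data class representing the action the agent should take. It has a tool attribute (the name of the tool to invoke) and a tool_input attribute (the input to that tool).
AgentFinish: A data class indicating that the agent has finished its task and should return a response to the user. It typically carries a dictionary of return values, usually with an "output" key containing the response text.
Intermediate Steps: The record of previous agent actions and their corresponding outputs. They are essential for passing context to future iterations of the agent.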
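The role of intermediate steps is easier to see with a small example. The sketch below is a simplified illustration, not LangChain's own formatting: ToolCall and render_scratchpad are hypothetical names, and the text layout is an assumption. It shows how a list of (action, observation) pairs can be rendered into context for the agent's next iteration.

```python
from typing import List, NamedTuple, Tuple


class ToolCall(NamedTuple):
    """Simplified stand-in for an agent action: which tool, with what input."""

    tool: str
    tool_input: str


def render_scratchpad(steps: List[Tuple[ToolCall, str]]) -> str:
    """Turn (action, observation) pairs into text the model can read back."""
    lines = []
    for action, observation in steps:
        lines.append(f"Action: {action.tool}({action.tool_input!r})")
        lines.append(f"Observation: {observation}")
    return "\n".join(lines)


steps = [(ToolCall("get_word_length", "educa"), "5")]
scratchpad = render_scratchpad(steps)
print(scratchpad)
```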

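In our example, we will use OpenAI Function Calling to create the agent, which is a reliable approach to agent creation. We will start with a simple tool that computes the length of a word. Such a tool is useful because language models sometimes miscount letters as a result of tokenization.

First, let's load the language model that will control the agent:

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

Let's test the model with a word-length question:

llm.invoke("how many letters in the word educa?")

The response should state the number of letters in the word "educa".

Next, we define a simple Python function that computes the length of a word:

from langchain.agents import tool

@tool
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)

# Collect the tools the agent is allowed to use
tools = [get_word_length]

We have created a tool named get_word_length that takes a word as input and returns its length.

Now let's create the prompt for the agent. The prompt tells the agent how to reason and how to format its output. Because we are using OpenAI Function Calling, very little instruction is needed. We define the prompt with placeholders for the user input and the agent scratchpad:

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a very powerful assistant but not great at calculating word lengths.",
        ),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

How does the agent know which tools it can use? We are relying on OpenAI function-calling language models, which require the functions to be passed in separately. To give the agent our tools, we format them as OpenAI function definitions and bind them to the model:

from langchain.tools.render import format_tool_to_openai_function

llm_with_tools = llm.bind(functions=[format_tool_to_openai_function(t) for t in tools])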
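For intuition about what "formatting a tool as a function definition" involves, the sketch below builds a function-calling style description (name, description, JSON Schema parameters) from a plain Python function using only the standard library. It is a simplified stand-in and not the output of format_tool_to_openai_function; describe_function and the JSON_TYPES mapping are hypothetical helpers.

```python
import inspect

# Mapping from Python annotations to JSON Schema type names (partial).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_function(func):
    """Build a function-calling style description from a plain function."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)


schema = describe_function(get_word_length)
print(schema)
```

The model never sees the Python function itself, only a description like this one, which is why clear names and docstrings matter for tool selection.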

Now we can create the agent by defining the input mappings and connecting the components:

This uses LCEL, which we discuss in detail later.

from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

We have created an agent that understands user input, uses the available tools, and formats its output. Now let's interact with it:

agent.invoke({"input": "how many letters in the word educa?", "intermediate_steps": []})

The agent should respond with an AgentAction indicating the next action to take.

We have the agent, but we still need to write a runtime for it. The simplest runtime is one that keeps calling the agent, executes the action it returns, and repeats until the agent finishes. Here is an example:

from langchain.schema.agent import AgentFinish

user_input = "how many letters in the word educa?"
intermediate_steps = []
while True:
    output = agent.invoke(
        {
            "input": user_input,
            "intermediate_steps": intermediate_steps,
        }
    )
    if isinstance(output, AgentFinish):
        final_result = output.return_values["output"]
        break
    else:
        print(f"TOOL NAME: {output.tool}")
        print(f"TOOL INPUT: {output.tool_input}")
        # Look up the tool by name and run it with the agent's input
        tool = {"get_word_length": get_word_length}[output.tool]
        observation = tool.run(output.tool_input)
        intermediate_steps.append((output, observation))
print(final_result)

In this loop, we repeatedly call the agent, execute its action, and update the intermediate steps until the agent finishes. The tool invocation is also handled inside the loop.

To simplify this process, LangChain provides the AgentExecutor class, which wraps agent execution and adds improvements such as error handling, early stopping, and tracing. Let's use AgentExecutor to interact with the agent:

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "how many letters in the word educa?"})

AgentExecutor simplifies execution and provides a convenient way to interact with the agent.

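Memory is also discussed in more detail later.

The agent we have built so far is stateless: it does not remember previous interactions. To support follow-up questions and conversation, we need to add memory to the agent. This takes two steps:

Add a memory variable to the prompt to hold the chat history.
Keep track of the chat history during the interaction.

Let's start by adding a memory placeholder to the prompt:

from langchain.prompts import MessagesPlaceholder

MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a very powerful assistant but not great at calculating word lengths.",
        ),
        MessagesPlaceholder(variable_name=MEMORY_KEY),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

Now create a list to track the chat history:

from langchain.schema.messages import HumanMessage, AIMessage

chat_history = []

When creating the agent, we include the memory as well:

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
        "chat_history": lambda x: x["chat_history"],
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

Now, when running the agent, make sure to update the chat history after each turn:

# Rebuild the executor so it wraps the new, memory-aware agent
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

input1 = "how many letters in the word educa?"
result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.extend(
    [
        HumanMessage(content=input1),
        AIMessage(content=result["output"]),
    ]
)
agent_executor.invoke({"input": "is that a real word?", "chat_history": chat_history})

This lets the agent maintain the conversation history and answer follow-up questions based on earlier interactions.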

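Congratulations! You have created and run your first end-to-end agent in LangChain. To dig deeper into LangChain's capabilities, you can explore:

The different agent types that are supported.
Prebuilt agents.
How to work with tools and tool integrations.

Agent Types

LangChain offers several agent types, each suited to particular use cases. Here are some of the available agents:

Zero-shot ReAct: This agent uses the ReAct framework to choose a tool based solely on the tool's description. It requires a description for every tool and is highly general-purpose.
Structured input ReAct: This agent can handle multi-input tools and suits complex tasks such as navigating a web browser. It uses the tool's argument schema for structured input.
OpenAI Functions: This agent is designed for models fine-tuned for function calling and is compatible with models such as gpt-3.5-turbo-0613 and gpt-4-0613. We used it to create our first agent above.
Conversational: This agent is designed for conversational settings. It uses ReAct for tool selection and uses memory to remember previous interactions.
Self-ask with search: This agent relies on a single tool, "Intermediate Answer", that looks up factual answers to questions. It is equivalent to the original self-ask with search paper.
ReAct document store: This agent uses the ReAct framework to interact with a document store. It requires "Search" and "Lookup" tools, similar to the Wikipedia example in the original ReAct paper.

Explore these agent types to find the one that best fits your needs. Each of them lets you bind a set of tools that the agent uses to take actions and generate responses. The LangChain documentation covers building your own agent with tools in more detail.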

Prebuilt Agents

Let's continue our exploration of agents, focusing on the prebuilt agents available in LangChain.

Gmail

LangChain provides a Gmail toolkit that lets you connect a LangChain application to the Gmail API. To get started, you need to set up your credentials, as explained in the Gmail API documentation. Once you have downloaded the credentials.json file, you can start using the Gmail API. You also need to install a few required libraries with the following commands:

pip install --upgrade google-api-python-client > /dev/null
pip install --upgrade google-auth-oauthlib > /dev/null
pip install --upgrade google-auth-httplib2 > /dev/null
pip install beautifulsoup4 > /dev/null # Optional for parsing HTML messages

You can create the Gmail toolkit as follows:

from langchain.agents.agent_toolkits import GmailToolkit

toolkit = GmailToolkit()

You can also customize authentication as needed. Behind the scenes, a googleapi resource is created with the following methods:

from langchain.tools.gmail.utils import build_resource_service, get_gmail_credentials

credentials = get_gmail_credentials(
    token_file="token.json",
    scopes=["https://mail.google.com/"],
    client_secrets_file="credentials.json",
)
api_resource = build_resource_service(credentials=credentials)
toolkit = GmailToolkit(api_resource=api_resource)

The toolkit provides several tools that can be used in an agent, including:

GmailCreateDraft: Creates a draft email with the specified message fields.
GmailSendMessage: Sends an email message.
GmailSearch: Searches for email messages or threads.
GmailGetMessage: Fetches an email by its message ID.
GmailGetThread: Fetches an email thread by its thread ID.

To use these tools in an agent, initialize the agent as follows:

from langchain.llms import OpenAI
from langchain.agents import initialize_agent, AgentType

llm = OpenAI(temperature=0)
agent = initialize_agent(
    tools=toolkit.get_tools(),
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
)

Here are a couple of examples of how the tools can be used:

Create a Gmail draft for editing:

agent.run(
    "Create a gmail draft for me to edit of a letter from the perspective of a sentient parrot "
    "who is looking to collaborate on some research with her estranged friend, a cat. "
    "Under no circumstances may you send the message, however."
)

Search your drafts for the latest email:

agent.run("Could you search in my drafts for the latest email?")

These examples show what the LangChain Gmail toolkit can do inside an agent, letting you interact with Gmail programmatically.

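SQL Database Agent

This section describes an agent designed to interact with SQL databases, using the Chinook database as the example. The agent can answer general questions about a database and recover from errors. Note that it is still under active development and not every answer may be correct. Be careful when running it on sensitive data, because it may execute DML statements against your database.

To use this agent, initialize it as follows:

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI
from langchain.agents import AgentExecutor
from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:///../../../../../notebooks/Chinook.db")
toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0))

agent_executor = create_sql_agent(
    llm=OpenAI(temperature=0),
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

The agent above is initialized with the ZERO_SHOT_REACT_DESCRIPTION agent type, which answers questions by choosing tools from their descriptions. Alternatively, you can initialize the agent with the OPENAI_FUNCTIONS agent type together with OpenAI's GPT-3.5-turbo model, which we used for our earlier agent.

Disclaimer

The query chain may generate insert/update/delete queries. Be careful, and where needed use a custom prompt or create a SQL user without write permissions.
Be aware that running certain queries, such as "run the biggest query possible", can overload your SQL database, especially if it contains millions of rows.
Data-warehouse-oriented databases often support user-level quotas to limit resource usage.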

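You can ask the agent to describe a table, such as the "playlisttrack" table. Here is how:

agent_executor.run("Describe the playlisttrack table")

The agent returns information about the table's schema along with sample rows.

If you mistakenly ask about a table that does not exist, the agent can recover and return information about the closest matching table. For example:

agent_executor.run("Describe the playlistsong table")

The agent finds the nearest matching table and returns information about it.

You can also ask the agent to run queries against the database. For example:

agent_executor.run("List the total sales per country. Which country's customers spent the most?")

The agent executes the query and returns the result, such as the country with the highest total sales.

To get the total number of tracks in each playlist, you can ask:

agent_executor.run("Show the total number of tracks in each playlist. The Playlist name should be included in the result.")

The agent returns the playlist names along with the corresponding track counts.

If the agent hits an error, it can recover and still produce an accurate response. For example:

agent_executor.run("Who are the top 3 best selling artists?")

Even after an initial error, the agent adjusts and returns the correct answer, in this case the top 3 best-selling artists.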

Pandas DataFrame Agent

This section introduces an agent designed to interact with Pandas DataFrames in order to answer questions. Note that under the hood this agent uses the Python agent to execute Python code generated by the language model (LLM). Use it with care, since LLM-generated Python code can be harmful.

You can initialize the Pandas DataFrame agent as follows:

from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
from langchain.llms import OpenAI
import pandas as pd

df = pd.read_csv("titanic.csv")

# Using ZERO_SHOT_REACT_DESCRIPTION agent type
agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)

# Alternatively, using OPENAI_FUNCTIONS agent type
# agent = create_pandas_dataframe_agent(
#     ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
#     df,
#     verbose=True,
#     agent_type=AgentType.OPENAI_FUNCTIONS,
# )

You can ask the agent to count the rows in the DataFrame:

agent.run("how many rows are there?")

The agent executes the code df.shape[0] and returns an answer such as "There are 891 rows in the dataframe."

You can also ask the agent to filter rows on a condition, for example to find the number of people with more than 3 siblings:

agent.run("how many people have more than 3 siblings")

The agent executes the code df[df['SibSp'] > 3].shape[0] and returns an answer such as "30 people have more than 3 siblings."

If you want the square root of the average age, you can ask the agent:

agent.run("whats the square root of the average age?")

The agent computes the average age with df['Age'].mean() and then takes the square root with math.sqrt(). It returns an answer such as "The square root of the average age is 5.449689683556195."

Let's create a copy of the DataFrame in which missing age values are filled with the mean age:

df1 = df.copy()
df1["Age"] = df1["Age"].fillna(df1["Age"].mean())

You can then initialize the agent with both DataFrames and ask it a question:

agent = create_pandas_dataframe_agent(OpenAI(temperature=0), [df, df1], verbose=True)
agent.run("how many rows in the age column are different?")

The agent compares the age columns of the two DataFrames and returns an answer such as "177 rows in the age column are different."

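Jira Toolkit

This section explains how to use the Jira toolkit, which lets an agent interact with a Jira instance. With this toolkit you can perform actions such as searching for issues and creating issues. It relies on the atlassian-python-api library. To use it, you need to set environment variables for your Jira instance, including JIRA_API_TOKEN, JIRA_USERNAME, and JIRA_INSTANCE_URL. You may also need to set your OpenAI API key as an environment variable.

To get started, install the atlassian-python-api library and set the required environment variables:

pip install atlassian-python-api

import os
from langchain.agents import AgentType
from langchain.agents import initialize_agent
from langchain.agents.agent_toolkits.jira.toolkit import JiraToolkit
from langchain.llms import OpenAI
from langchain.utilities.jira import JiraAPIWrapper

os.environ["JIRA_API_TOKEN"] = "abc"
os.environ["JIRA_USERNAME"] = "123"
os.environ["JIRA_INSTANCE_URL"] = "https://jira.atlassian.com"
os.environ["OPENAI_API_KEY"] = "xyz"

llm = OpenAI(temperature=0)
jira = JiraAPIWrapper()
toolkit = JiraToolkit.from_jira_api_wrapper(jira)
agent = initialize_agent(
    toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

You can instruct the agent to create a new issue in a specific project, with a summary and description:

agent.run("make a new issue in project PW to remind me to make more fried rice")

The agent carries out the necessary actions to create the issue and returns a response such as "A new issue has been created in project PW with the summary 'Make more fried rice' and description 'Reminder to make more fried rice'."

This lets you interact with your Jira instance through natural-language instructions and the Jira toolkit.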


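Module IV: Chains

LangChain is a tool designed for using large language models (LLMs) in complex applications. It provides frameworks for creating chains of components, including LLMs and other types of components. The two main frameworks are:

The LangChain Expression Language (LCEL)
The legacy Chain interface

The LangChain Expression Language (LCEL) is a syntax for composing chains intuitively. It supports advanced features such as streaming, asynchronous calls, batching, parallelization, retries, fallbacks, and tracing. For example, you can compose a prompt, a model, and an output parser in LCEL as shown in the following code:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser

model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You're a very knowledgeable historian who provides accurate and eloquent answers to historical questions."),
    ("human", "{question}")
])
runnable = prompt | model | StrOutputParser()

# Stream the answer chunk by chunk as it is generated
for chunk in runnable.stream({"question": "What are the seven wonders of the world"}):
    print(chunk, end="", flush=True)

Alternatively, LLMChain is a legacy option, similar to LCEL, for composing components. An LLMChain example looks like this:

from langchain.chains import LLMChain

chain = LLMChain(llm=model, prompt=prompt, output_parser=StrOutputParser())
chain.run(question="What are the seven wonders of the world")

Chains in LangChain can also be made stateful by incorporating a Memory object. This allows data to persist across calls, as this example shows:

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

conversation = ConversationChain(llm=model, memory=ConversationBufferMemory())
conversation.run("Answer briefly. What are the first 3 colors of a rainbow?")
conversation.run("And the next 4?")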

LangChain also integrates with OpenAI's function-calling APIs, which is useful for obtaining structured output and for executing functions within a chain. To get structured output, you can specify it with a Pydantic class or a JsonSchema, as shown below:

from typing import Optional

from langchain.pydantic_v1 import BaseModel, Field
from langchain.chains.openai_functions import create_structured_output_runnable
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

class Person(BaseModel):
    name: str = Field(..., description="The person's name")
    age: int = Field(..., description="The person's age")
    fav_food: Optional[str] = Field(None, description="The person's favorite food")

llm = ChatOpenAI(model="gpt-4", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    # Prompt messages here
])

runnable = create_structured_output_runnable(Person, llm, prompt)
runnable.invoke({"input": "Sally is 13"})

For structured output, a legacy approach based on LLMChain is also available:

from langchain.chains.openai_functions import create_structured_output_chain

class Person(BaseModel):
    name: str = Field(..., description="The person's name")
    age: int = Field(..., description="The person's age")

chain = create_structured_output_chain(Person, llm, prompt, verbose=True)
chain.run("Sally is 13")

LangChain uses OpenAI functions to build several special-purpose chains. These include chains for extraction, tagging, OpenAPI, and QA with citations.

Extraction works much like the structured output chain but focuses on extracting information or entities. For tagging, the idea is to label a document with classes such as sentiment, language, style, covered topics, or political tendency.

A short Python example shows how tagging works in LangChain. The process starts by installing the necessary packages and setting up the environment:

pip install langchain openai

# Set env var OPENAI_API_KEY or load from a .env file:
# import dotenv
# dotenv.load_dotenv()

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic

Next, define the tagging schema, specifying the properties and their expected types:

schema = {
    "properties": {
        "sentiment": {"type": "string"},
        "aggressiveness": {"type": "integer"},
        "language": {"type": "string"},
    }
}

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_tagging_chain(schema, llm)

Running the tagging chain on different inputs shows how the model interprets sentiment, language, and aggressiveness. Note that with this loose schema the output values are not consistent from one run to the next:

inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
chain.run(inp)
# {'sentiment': 'positive', 'language': 'Spanish'}

inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
chain.run(inp)
# {'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'es'}

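For finer control, the schema can be defined more specifically, including the allowed values, descriptions, and required properties. Such a schema has the following shape:

schema = {
    "properties": {
        # Schema definitions here
    },
    "required": ["language", "sentiment", "aggressiveness"],
}

chain = create_tagging_chain(schema, llm)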
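To illustrate what a stricter schema buys you, the sketch below pairs a hypothetical schema (the enum values and description are invented for illustration and are not taken from the original article) with a small standard-library checker. It then runs the checker on the inconsistent result shown earlier. check_tags is not a LangChain function.

```python
# Hypothetical stricter schema: allowed values, descriptions, required keys.
schema = {
    "properties": {
        "sentiment": {"type": "string", "enum": ["happy", "neutral", "sad"]},
        "aggressiveness": {
            "type": "integer",
            "enum": [1, 2, 3, 4, 5],
            "description": "Higher numbers mean more aggressive",
        },
        "language": {"type": "string", "enum": ["spanish", "english"]},
    },
    "required": ["language", "sentiment", "aggressiveness"],
}


def check_tags(tags, schema):
    """Return a list of problems found in a tagging result."""
    problems = []
    for key in schema.get("required", []):
        if key not in tags:
            problems.append(f"missing required key: {key}")
    for key, value in tags.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected key: {key}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}={value!r} not in {spec['enum']}")
    return problems


loose_result = {"sentiment": "enojado", "aggressiveness": 1, "language": "es"}
problems = check_tags(loose_result, schema)
for problem in problems:
    print(problem)
```

Constraining values with enum means downstream code can rely on a fixed vocabulary instead of handling 'Spanish', 'es', and 'spanish' as separate cases.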

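Pydantic schemas can also be used to define the tagging criteria, giving you a Pythonic way to specify the required properties and types:

from enum import Enum
from pydantic import BaseModel, Field

class Tags(BaseModel):
    # Class fields here

chain = create_tagging_chain_pydantic(Tags, llm)

In addition, LangChain's metadata tagger document transformer can extract metadata from LangChain Documents. It offers functionality similar to the tagging chain but is applied to LangChain Documents.

Citing retrieval sources is another LangChain feature, which uses OpenAI functions to extract citations from text. The following code demonstrates it:

from langchain.chains import create_citation_fuzzy_match_chain
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_citation_fuzzy_match_chain(llm)
# Further code for running the chain and displaying results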
In LangChain, chaining in large language model (LLM) applications typically means combining a prompt template with an LLM and, optionally, an output parser. The recommended way to do this is the LangChain Expression Language (LCEL), although the legacy LLMChain approach is still supported.

With LCEL, BasePromptTemplate, BaseLanguageModel, and BaseOutputParser all implement the Runnable interface and can easily be piped into one another. Here is an example:

from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema import StrOutputParser

prompt = PromptTemplate.from_template(
    "What is a good name for a company that makes {product}?"
)
runnable = prompt | ChatOpenAI() | StrOutputParser()
runnable.invoke({"product": "colorful socks"})
# Output: 'VibrantSocks'

Routing in LangChain lets you create non-deterministic chains in which the output of a previous step determines the next step. This helps give structure and consistency to interactions with LLMs. For example, if you have two templates optimized for different types of questions, you can choose the template based on the user input.

Here is how to do this in LCEL with a RunnableBranch, which is initialized with a list of (condition, runnable) pairs and a default runnable:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableBranch

# Code for defining physics_prompt and math_prompt

general_prompt = PromptTemplate.from_template(
    "You are a helpful assistant. Answer the question as accurately as you can.\n\n{input}"
)
prompt_branch = RunnableBranch(
    (lambda x: x["topic"] == "math", math_prompt),
    (lambda x: x["topic"] == "physics", physics_prompt),
    general_prompt,
)

# More code for setting up the classifier and final chain

The final chain is then built from several components, such as a topic classifier, the prompt branch, and an output parser, so that the flow is determined by the topic of the input:

from operator import itemgetter
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

final_chain = (
    RunnablePassthrough.assign(topic=itemgetter("input") | classifier_chain)
    | prompt_branch
    | ChatOpenAI()
    | StrOutputParser()
)

final_chain.invoke(
    {
        "input": "What is the first prime number greater than 40 such that one plus the prime number is divisible by 3?"
    }
)
# Output: Detailed answer to the math question

This approach shows how LangChain can handle complex queries and route them appropriately based on the input.
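The routing idea itself is independent of LangChain. The following plain-Python sketch mimics the (condition, handler) pairs plus default pattern described above; make_branch and the three prompt-building functions are hypothetical, and no model is called.

```python
def make_branch(*pairs_and_default):
    """Build a router from (condition, handler) pairs plus a default handler."""
    *pairs, default = pairs_and_default

    def route(x):
        # The first matching condition wins; otherwise fall back to the default.
        for condition, handler in pairs:
            if condition(x):
                return handler(x)
        return default(x)

    return route


# Hypothetical prompt builders; real prompts would be templates.
def math_prompt(x):
    return f"You are a mathematician. {x['input']}"


def physics_prompt(x):
    return f"You are a physicist. {x['input']}"


def general_prompt(x):
    return f"You are a helpful assistant. {x['input']}"


branch = make_branch(
    (lambda x: x["topic"] == "math", math_prompt),
    (lambda x: x["topic"] == "physics", physics_prompt),
    general_prompt,
)

routed = branch({"topic": "math", "input": "What is 2 + 2?"})
fallback = branch({"topic": "history", "input": "Who built Rome?"})
print(routed)
print(fallback)
```

Because conditions are checked in order, more specific conditions should be listed before more general ones.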

With language models, it is common to follow an initial call with a series of further calls, using the output of one call as the input to the next. This sequential approach is especially useful when you want to build on information generated in earlier steps. The LangChain Expression Language (LCEL) is the recommended way to create these sequences, but the SequentialChain approach is still documented for backward compatibility.

To illustrate, consider a scenario in which we first generate a synopsis of a play and then a review based on that synopsis. Using Python's langchain.prompts, we create two PromptTemplate instances: one for the synopsis and one for the review. Here is the code that sets up these templates:

from langchain.prompts import PromptTemplate

synopsis_prompt = PromptTemplate.from_template(
    "You are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n\nTitle: {title}\nPlaywright: This is a synopsis for the above play:"
)

review_prompt = PromptTemplate.from_template(
    "You are a play critic from the New York Times. Given the synopsis of play, it is your job to write a review for that play.\n\nPlay Synopsis:\n{synopsis}\nReview from a New York Times play critic of the above play:"
)

In the LCEL approach, we chain these prompts with ChatOpenAI and StrOutputParser to create a sequence that first generates the synopsis and then the review. The code looks like this:

from langchain.chat_models import ChatOpenAI
from langchain.schema import StrOutputParser

llm = ChatOpenAI()
chain = (
    {"synopsis": synopsis_prompt | llm | StrOutputParser()}
    | review_prompt
    | llm
    | StrOutputParser()
)
chain.invoke({"title": "Tragedy at sunset on the beach"})

If we need both the synopsis and the review, we can create a separate chain for each and combine them with RunnablePassthrough:

from langchain.schema.runnable import RunnablePassthrough

synopsis_chain = synopsis_prompt | llm | StrOutputParser()
review_chain = review_prompt | llm | StrOutputParser()
chain = {"synopsis": synopsis_chain} | RunnablePassthrough.assign(review=review_chain)
chain.invoke({"title": "Tragedy at sunset on the beach"})
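The difference between the two LCEL chains above, returning only the final output versus keeping the intermediate value as well, can be shown with ordinary functions. This is a conceptual sketch only: fake_llm is a hypothetical stand-in that echoes its prompt, so no API key is needed.

```python
def fake_llm(prompt):
    """Hypothetical stand-in for a model call: echoes a tagged string."""
    return f"<{prompt}>"


def synopsis_step(inputs):
    return fake_llm(f"synopsis of {inputs['title']}")


def review_step(inputs):
    return fake_llm(f"review of {inputs['synopsis']}")


def review_only(inputs):
    """Sequence: the synopsis feeds the review; only the review is returned."""
    return review_step({"synopsis": synopsis_step(inputs)})


def synopsis_and_review(inputs):
    """Keep the intermediate value and add the new key alongside it."""
    state = {"synopsis": synopsis_step(inputs)}
    return {**state, "review": review_step(state)}


title = {"title": "Tragedy at sunset on the beach"}
result = synopsis_and_review(title)
print(result["synopsis"])
print(result["review"])
```

The second function plays the role that the assign step plays in the LCEL version: it extends the running dictionary rather than replacing it.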

For more complex sequences, the SequentialChain approach comes into play. It allows multiple inputs and outputs. Consider a case where we need a synopsis based on both the title of a play and the era it is set in. We can set it up like this:

from langchain.llms import OpenAI
from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0.7)

synopsis_template = "You are a playwright. Given the title of play and the era it is set in, it is your job to write a synopsis for that title.\n\nTitle: {title}\nEra: {era}\nPlaywright: This is a synopsis for the above play:"
synopsis_prompt_template = PromptTemplate(input_variables=["title", "era"], template=synopsis_template)
synopsis_chain = LLMChain(llm=llm, prompt=synopsis_prompt_template, output_key="synopsis")

review_template = "You are a play critic from the New York Times. Given the synopsis of play, it is your job to write a review for that play.\n\nPlay Synopsis:\n{synopsis}\nReview from a New York Times play critic of the above play:"
prompt_template = PromptTemplate(input_variables=["synopsis"], template=review_template)
review_chain = LLMChain(llm=llm, prompt=prompt_template, output_key="review")

overall_chain = SequentialChain(
    chains=[synopsis_chain, review_chain],
    input_variables=["era", "title"],
    output_variables=["synopsis", "review"],
    verbose=True,
)

overall_chain({"title": "Tragedy at sunset on the beach", "era": "Victorian England"})

When you want to keep context available throughout the chain, or for a later part of the chain, you can use SimpleMemory. This is especially useful for managing complex input/output relationships. For example, if we want to generate a social media post from the play's title, era, synopsis, and review, SimpleMemory can supply the additional variables:

from langchain.memory import SimpleMemory
from langchain.chains import SequentialChain

template = "You are a social media manager for a theater company. Given the title of play, the era it is set in, the date, time and location, the synopsis of the play, and the review of the play, it is your job to write a social media post for that play.\n\nHere is some context about the time and location of the play:\nDate and Time: {time}\nLocation: {location}\n\nPlay Synopsis:\n{synopsis}\nReview from a New York Times play critic of the above play:\n{review}\n\nSocial Media Post:"
prompt_template = PromptTemplate(input_variables=["synopsis", "review", "time", "location"], template=template)
social_chain = LLMChain(llm=llm, prompt=prompt_template, output_key="social_post_text")

overall_chain = SequentialChain(
    memory=SimpleMemory(memories={"time": "December 25th, 8pm PST", "location": "Theater in the Park"}),
    chains=[synopsis_chain, review_chain, social_chain],
    input_variables=["era", "title"],
    output_variables=["social_post_text"],
    verbose=True,
)

overall_chain({"title": "Tragedy at sunset on the beach", "era": "Victorian England"})

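Besides sequential chains, there are specialized chains for working with documents. Each serves a different purpose: combining documents into one prompt, refining an answer by iterating over documents, or mapping and reducing document content for summarization or for re-ranking based on scored responses. These chains can be recreated with LCEL for extra flexibility and customization.

StuffDocumentsChain combines a list of documents into a single prompt that is passed to the LLM.
RefineDocumentsChain updates its answer iteratively for each document, which suits tasks where the documents exceed the model's context capacity.
MapReduceDocumentsChain applies a chain to each document individually and then combines the results.
MapRerankDocumentsChain scores the response produced for each document and selects the highest-scoring one.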
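The four strategies differ only in how documents are fed to the model, which the following plain-Python sketch makes visible. It is not the LangChain implementation: fake_llm is a hypothetical stand-in that reports how many characters it received, and the scoring function passed to map_rerank is an assumption.

```python
def fake_llm(prompt):
    """Hypothetical model call: reports how many characters it was given."""
    return f"summary({len(prompt)} chars)"


def stuff(docs):
    """Stuff: put every document into a single prompt."""
    return fake_llm("\n\n".join(docs))


def map_reduce(docs):
    """Map-reduce: summarize each document, then combine the summaries."""
    partials = [fake_llm(doc) for doc in docs]
    return fake_llm("\n\n".join(partials))


def refine(docs):
    """Refine: carry a running answer forward, updating it per document."""
    answer = fake_llm(docs[0])
    for doc in docs[1:]:
        answer = fake_llm(f"{answer}\n\n{doc}")
    return answer


def map_rerank(docs, score):
    """Map-rerank: answer per document and keep the best-scoring answer."""
    answers = [(score(doc), fake_llm(doc)) for doc in docs]
    return max(answers)[1]


docs = ["aaaa", "bb", "cccccc"]
print(stuff(docs))
print(map_reduce(docs))
print(refine(docs))
print(map_rerank(docs, len))
```

Stuff makes one model call, map-reduce makes one call per document plus one to combine, and refine makes one call per document in sequence, which is the main cost trade-off between them.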

Here is an example of how a MapReduceDocumentsChain can be set up with LCEL:

from functools import partial
from langchain.chains.combine_documents import collapse_docs, split_list_of_docs
from langchain.chat_models import ChatAnthropic
from langchain.prompts import PromptTemplate
from langchain.schema import Document, StrOutputParser
from langchain.schema.prompt_template import format_document
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough

llm = ChatAnthropic()
document_prompt = PromptTemplate.from_template("{page_content}")
partial_format_document = partial(format_document, prompt=document_prompt)

# Map step: summarize a single document
map_chain = (
    {"context": partial_format_document}
    | PromptTemplate.from_template("Summarize this content:\n\n{context}")
    | llm
    | StrOutputParser()
)

# Wrap the summary back into a Document, keeping the original metadata
map_as_doc_chain = (
    RunnableParallel({"doc": RunnablePassthrough(), "content": map_chain})
    | (lambda x: Document(page_content=x["content"], metadata=x["doc"].metadata))
).with_config(run_name="Summarize (return doc)")

def format_docs(docs):
    return "\n\n".join(partial_format_document(doc) for doc in docs)

collapse_chain = (
    {"context": format_docs}
    | PromptTemplate.from_template("Collapse this content:\n\n{context}")
    | llm
    | StrOutputParser()
)

# Reduce step: combine all summaries into one
reduce_chain = (
    {"context": format_docs}
    | PromptTemplate.from_template("Combine these summaries:\n\n{context}")
    | llm
    | StrOutputParser()
).with_config(run_name="Reduce")

# NOTE: collapse is not defined in this listing. It is expected to merge the
# mapped documents with collapse_chain until they fit the model's context window.
map_reduce = (map_as_doc_chain.map() | collapse | reduce_chain).with_config(run_name="Map reduce")

This configuration produces a detailed and comprehensive analysis of document content by combining LCEL with the underlying language model.


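Module V: Memory

In LangChain, memory is a fundamental part of conversational interfaces: it lets the system refer to past interactions. This is done by storing and querying information through two primary operations, reading and writing. The memory system interacts with a chain twice during a run: once to augment the user input, and once to store the inputs and outputs for future reference.

Building Memory into a System

Storing chat messages: The LangChain memory module integrates a range of ways to store chat messages, from in-memory lists to databases. This ensures that all chat interactions are recorded for future reference.
Querying chat messages: Beyond storing chat messages, LangChain uses data structures and algorithms to create useful views of them. A simple memory system might return the most recent messages, while a more advanced one can summarize past interactions or focus on entities mentioned in the current interaction.

To demonstrate memory in LangChain, consider the ConversationBufferMemory class, a simple form of memory that stores chat messages in a buffer. Here is an example:

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("Hello!")
memory.chat_memory.add_ai_message("How can I assist you?")

When integrating memory into a chain, it is essential to understand which variables the memory returns and how they are used in the chain. For example, the load_memory_variables method helps align the variables read from memory with what the chain expects.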
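The read-then-write pattern described above can be sketched in a few lines of plain Python. This is a toy illustration rather than LangChain's memory classes: BufferMemory, run_chain, and counting_model are hypothetical names, and the "history" key is an assumption chosen to mirror the variable a prompt would expect.

```python
class BufferMemory:
    """Toy memory: keeps every exchange and returns it as one string."""

    def __init__(self):
        self.turns = []

    def load(self):
        # READ: called before the model runs, to augment the user input.
        return {"history": "\n".join(self.turns)}

    def save(self, user_input, output):
        # WRITE: called after the model runs, to record the exchange.
        self.turns.append(f"Human: {user_input}")
        self.turns.append(f"AI: {output}")


def run_chain(memory, user_input, model):
    """One chain run: read memory, call the model, write memory."""
    variables = {"input": user_input, **memory.load()}
    output = model(variables)
    memory.save(user_input, output)
    return output


def counting_model(variables):
    """Hypothetical model: reports how many history lines it received."""
    history = variables["history"]
    seen = len(history.splitlines()) if history else 0
    return f"I have seen {seen} earlier lines"


memory = BufferMemory()
first = run_chain(memory, "hi", counting_model)
second = run_chain(memory, "still there?", counting_model)
print(first)
print(second)
```

The key the memory returns ("history" here) has to match a variable in the prompt, which is exactly the alignment issue the paragraph above refers to.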

An End-to-End Example in LangChain

Consider an LLMChain that uses ConversationBufferMemory. Combined with a suitable prompt template and the memory, the chain provides a seamless conversational experience. Here is a simplified example:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)
template = "Your conversation template here..."
prompt = PromptTemplate.from_template(template)
memory = ConversationBufferMemory(memory_key="chat_history")
conversation = LLMChain(llm=llm, prompt=prompt, memory=memory)

response = conversation({"question": "What's the weather like?"})

This example shows how LangChain's memory system integrates with its chains to provide a coherent, context-aware conversational experience.

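Memory Types in LangChain

LangChain offers several memory types that can be used to enhance interactions with AI models. Each memory type has its own parameters and return types, which makes it suitable for different scenarios. Let's explore some of the available memory types with code examples.

1. Conversation Buffer Memory

This memory type lets you store and extract the messages of a conversation. You can extract the history as a string or as a list of messages.

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.load_memory_variables({})

# Extract history as a string
{'history': 'Human: hi\nAI: whats up'}

# Extract history as a list of messages (when created with return_messages=True)
{'history': [HumanMessage(content='hi', additional_kwargs={}), AIMessage(content='whats up', additional_kwargs={})]}

You can also use Conversation Buffer Memory in a chain for chat-like interactions.

2. Conversation Buffer Window Memory

This memory type keeps a list of recent interactions and uses only the last K of them, which prevents the buffer from growing too large.

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})

{'history': 'Human: not much you\nAI: not much'}

As with Conversation Buffer Memory, you can use this memory type in a chain for chat-like interactions.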
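The sliding-window behavior is straightforward to reproduce with a bounded deque. The sketch below is a conceptual illustration of keeping the last k exchanges, not the ConversationBufferWindowMemory source; the WindowMemory class and its load method are hypothetical.

```python
from collections import deque


class WindowMemory:
    """Toy sliding-window memory that keeps only the last k exchanges."""

    def __init__(self, k):
        # A deque with maxlen discards the oldest item automatically.
        self.exchanges = deque(maxlen=k)

    def save_context(self, user_input, output):
        self.exchanges.append((user_input, output))

    def load(self):
        lines = []
        for user_input, output in self.exchanges:
            lines.append(f"Human: {user_input}")
            lines.append(f"AI: {output}")
        return {"history": "\n".join(lines)}


window = WindowMemory(k=1)
window.save_context("hi", "whats up")
window.save_context("not much you", "not much")
history = window.load()["history"]
print(history)
```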

3. Conversation Entity Memory

This memory type remembers facts about specific entities in a conversation and uses an LLM to extract the information.

from langchain.memory import ConversationEntityMemory
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
memory = ConversationEntityMemory(llm=llm)
_input = {"input": "Deven & Sam are working on a hackathon project"}
memory.load_memory_variables(_input)
memory.save_context(
    _input,
    {"output": " That sounds like a great project! What kind of project are they working on?"}
)
memory.load_memory_variables({"input": 'who is Sam'})

{'history': 'Human: Deven & Sam are working on a hackathon project\nAI: That sounds like a great project! What kind of project are they working on?', 'entities': {'Sam': 'Sam is working on a hackathon project with Deven.'}}

4. Conversation Knowledge Graph Memory

This memory type uses a knowledge graph to recreate memory. You can extract the current entities and knowledge triplets from messages.

from langchain.memory import ConversationKGMemory
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
memory = ConversationKGMemory(llm=llm)
memory.save_context({"input": "say hi to sam"}, {"output": "who is sam"})
memory.save_context({"input": "sam is a friend"}, {"output": "okay"})
memory.load_memory_variables({"input": "who is sam"})

{'history': 'On Sam: Sam is friend.'}

You can also use this memory type in a chain for conversation-based knowledge retrieval.

5. Conversation Summary Memory

This memory type builds a summary of the conversation over time, which is useful for condensing information from longer conversations.

from langchain.memory import ConversationSummaryMemory
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
memory = ConversationSummaryMemory(llm=llm)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.load_memory_variables({})

{'history': '\nThe human greets the AI, to which the AI responds.'}

6. Conversation Summary Buffer Memory

This memory type combines a conversation summary with a buffer, striking a balance between recent interactions and a summary of older ones. It uses token length to decide when to flush interactions into the summary.

from langchain.memory import ConversationSummaryBufferMemory
from langchain.llms import OpenAI

llm = OpenAI()
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})

{'history': 'System: \nThe human says "hi", and the AI responds with "whats up".\nHuman: not much you\nAI: not much'}

You can use these memory types to enhance your interactions with AI models in LangChain. Each memory type serves a specific purpose and can be chosen according to your requirements.

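7. Conversation Token Buffer Memory

ConversationTokenBufferMemory is another memory type that keeps a buffer of recent interactions in memory. Unlike the window memory above, which counts interactions, this memory type uses token length to decide when to flush interactions.

Using memory with an LLM:

from langchain.memory import ConversationTokenBufferMemory
from langchain.llms import OpenAI

llm = OpenAI()

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

memory.load_memory_variables({})

{'history': 'Human: not much you\nAI: not much'}

In this example, the memory is configured to limit interactions by token length rather than by the number of interactions.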
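The pruning rule can be illustrated without a tokenizer. The sketch below assumes a crude whitespace word count as the token estimate (a real implementation would use the model's tokenizer) and drops the oldest messages until the buffer fits the budget. TokenBufferMemory and count_tokens are hypothetical names, and the limit of 8 is chosen to suit the word-count estimate.

```python
def count_tokens(text):
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


class TokenBufferMemory:
    """Toy buffer that drops the oldest messages once a token budget is hit."""

    def __init__(self, max_token_limit):
        self.max_token_limit = max_token_limit
        self.messages = []

    def save_context(self, user_input, output):
        self.messages.append(f"Human: {user_input}")
        self.messages.append(f"AI: {output}")
        # Prune from the front until the buffer fits the budget again.
        while self.messages and self._total() > self.max_token_limit:
            self.messages.pop(0)

    def _total(self):
        return sum(count_tokens(m) for m in self.messages)

    def load(self):
        return {"history": "\n".join(self.messages)}


buffer = TokenBufferMemory(max_token_limit=8)
buffer.save_context("hi", "whats up")
buffer.save_context("not much you", "not much")
history = buffer.load()["history"]
print(history)
```

Unlike a fixed window of k exchanges, a token budget adapts to message length: many short turns are kept, while a few long turns push older ones out sooner.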

With this memory type, you can also get the history as a list of messages:

memory = ConversationTokenBufferMemory(
    llm=llm, max_token_limit=10, return_messages=True
)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

Using it in a chain:

You can use ConversationTokenBufferMemory in a chain to enhance interactions with the AI model.

from langchain.chains import ConversationChain

conversation_with_summary = ConversationChain(
    llm=llm,
    # We set a very low max_token_limit for the purposes of testing.
    memory=ConversationTokenBufferMemory(llm=OpenAI(), max_token_limit=60),
    verbose=True,
)
conversation_with_summary.predict(input="Hi, what's up?")

In this example, ConversationTokenBufferMemory is used inside a ConversationChain to manage the conversation and limit interactions by token length.

8. VectorStoreRetrieverMemory

VectorStoreRetrieverMemory stores memories in a vector store and queries the top-K most "salient" documents each time it is called. This memory type does not explicitly track the order of interactions; instead, it uses vector retrieval to fetch relevant memories.

from datetime import datetime
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

# Initialize your vector store (specifics depend on the chosen vector store)
import faiss
from langchain.docstore import InMemoryDocstore
from langchain.vectorstores import FAISS

embedding_size = 1536  # Dimensions of the OpenAIEmbeddings
index = faiss.IndexFlatL2(embedding_size)
embedding_fn = OpenAIEmbeddings().embed_query
vectorstore = FAISS(embedding_fn, index, InMemoryDocstore({}), {})

# Create your VectorStoreRetrieverMemory
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)

# Save context and relevant information to the memory
memory.save_context({"input": "My favorite food is pizza"}, {"output": "that's good to know"})
memory.save_context({"input": "My favorite sport is soccer"}, {"output": "..."})
memory.save_context({"input": "I don't like the Celtics"}, {"output": "ok"})

# Retrieve relevant information from memory based on a query
print(memory.load_memory_variables({"prompt": "what sport should i watch?"})["history"])

In this example, VectorStoreRetrieverMemory stores information from a conversation and retrieves the relevant parts through vector retrieval.
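To show the retrieval idea without an embedding model or FAISS, the sketch below swaps in a toy bag-of-words vector and cosine similarity. This is a deliberate simplification: real embeddings capture meaning rather than exact word overlap, and RetrieverMemory, embed, and cosine are hypothetical helpers.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class RetrieverMemory:
    """Toy memory that returns the k stored snippets most similar to a query."""

    def __init__(self, k=1):
        self.k = k
        self.snippets = []

    def save_context(self, user_input, output):
        text = f"input: {user_input}\noutput: {output}"
        self.snippets.append((embed(text), text))

    def load(self, query):
        query_vec = embed(query)
        ranked = sorted(
            self.snippets, key=lambda s: cosine(query_vec, s[0]), reverse=True
        )
        return [text for _, text in ranked[: self.k]]


memory = RetrieverMemory(k=1)
memory.save_context("My favorite food is pizza", "that's good to know")
memory.save_context("My favorite sport is soccer", "...")
memory.save_context("I don't like the Celtics", "ok")
top = memory.load("which sport should we watch")
print(top[0])
```

Note that the result depends on similarity to the query, not on recency: the soccer snippet is returned even though a later exchange was saved after it.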

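You can also use VectorStoreRetrieverMemory in a chain for conversation-based knowledge retrieval, in the same way as the earlier memory types.

These memory types give LangChain several ways to manage and retrieve information from a conversation, improving the AI model's ability to understand and respond to user queries in context. Choose the memory type that matches the specific requirements of your application.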

Now let's see how to use memory with an LLMChain. Memory in an LLMChain lets the model remember previous interactions and context, so it can give more coherent, context-aware responses.

To set up memory in an LLMChain, you create a memory class such as ConversationBufferMemory. Here is how:

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

template = """You are a chatbot having a conversation with a human.

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

llm = OpenAI()
llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    memory=memory,
)

llm_chain.predict(human_input="Hi there my friend")

In this example, ConversationBufferMemory stores the conversation history. The memory_key parameter specifies the key under which the history is stored, and it must match the {chat_history} variable in the prompt template.

If you are using a chat model rather than a completion-style model, you can structure the prompt differently to make better use of memory. Here is an example of a chat-model-based LLMChain with memory:

from langchain.chat_models import ChatOpenAI
from langchain.schema import SystemMessage
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)

# Create a ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content="You are a chatbot having a conversation with a human."
        ),  # The persistent system prompt
        MessagesPlaceholder(
            variable_name="chat_history"
        ),  # Where the memory will be stored.
        HumanMessagePromptTemplate.from_template(
            "{human_input}"
        ),  # Where the human input will be injected
    ]
)

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

llm = ChatOpenAI()

chat_llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    memory=memory,
)

chat_llm_chain.predict(human_input="Hi there my friend")

In this example, ChatPromptTemplate structures the prompt and ConversationBufferMemory stores and retrieves the conversation history. This approach is especially useful for chat-style conversations, where context and history play a crucial role.

内存还可以添加到具有多个输入的链中，例如问答链。以下是如何在问答链中设置内存的示例：

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory

# Split a long document into smaller chunks
with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

# Create a Chroma vector store to index and search the document chunks
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_texts(
    texts, embeddings, metadatas=[{"source": i} for i in range(len(texts))]
)

# Retrieve the chunks most similar to the question
query = "What did the president say about Justice Breyer"
docs = docsearch.similarity_search(query)

# Set up a prompt for the question-answering chain with memory
template = """You are a chatbot having a conversation with a human.

Given the following extracted parts of a long document and a question, create a final answer.

{context}

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input", "context"], template=template
)
memory = ConversationBufferMemory(memory_key="chat_history", input_key="human_input")
chain = load_qa_chain(
    OpenAI(temperature=0), chain_type="stuff", memory=memory, prompt=prompt
)

# Ask the question and retrieve the answer
result = chain({"input_documents": docs, "human_input": query}, return_only_outputs=True)
print(result)
print(chain.memory.buffer)

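In this example, a long document is split into smaller chunks, and the chunks most similar to the question are passed to the question-answering chain. ConversationBufferMemory stores and retrieves the conversation history, which lets the model give context-aware answers.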
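The effect of chunk_size and chunk_overlap is easier to see on a tiny input. The sketch below is a naive fixed-width splitter written for illustration; the function split_fixed is hypothetical and is not how CharacterTextSplitter works internally (the real splitter also splits on a separator).

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Naive fixed-width splitter; real splitters also respect separators."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # Each chunk starts `step` characters after the previous one.
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


chunks = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

With an overlap of 1, each chunk repeats the last character of the previous chunk, which is the property that helps keep context intact across chunk boundaries.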

Adding memory to an agent lets it remember previous interactions and use them to answer questions with context-aware responses. Here is how to set up memory in an agent:

from langchain.agents import ZeroShotAgent, Tool, AgentExecutor
from langchain.memory import ConversationBufferMemory
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.utilities import GoogleSearchAPIWrapper

# Create a tool for searching
search = GoogleSearchAPIWrapper()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events",
    )
]

# Create a prompt with memory
prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"

{chat_history}
Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix=prefix,
    suffix=suffix,
    input_variables=["input", "chat_history", "agent_scratchpad"],
)
memory = ConversationBufferMemory(memory_key="chat_history")

# Create an LLMChain and an agent executor that carries the memory
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_chain = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=memory
)

# Ask a question and retrieve the answer
response = agent_chain.run(input="How many people live in Canada?")
print(response)

# Ask a follow-up question
response = agent_chain.run(input="What is their national anthem called?")
print(response)

In this example, memory is added to the agent so that it remembers the earlier conversation. This is what lets it resolve "their" in the follow-up question to Canada and answer accurately from the stored history.

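LangChain Expression Language

In natural language processing and machine learning, building complex chains of operations can be a daunting task. LangChain Expression Language (LCEL) addresses this by providing a declarative and efficient way to build and deploy complex language processing pipelines. LCEL is designed to simplify composing chains, making it easy to go from prototype to production. In this section, we explore what LCEL is and why you might want to use it, with practical code examples that illustrate its capabilities.

LCEL, or LangChain Expression Language, is a powerful tool for composing language processing chains. It was designed to support the transition from prototype to production without extensive code changes. Whether you are building a simple "prompt + LLM" chain or a complex pipeline with hundreds of steps, LCEL can handle it.

Here are some reasons to use LCEL in your language processing projects:

Fast token streaming: LCEL streams tokens from the language model to the output parser in real time, improving responsiveness.
Versatile APIs: LCEL supports both synchronous and asynchronous APIs for prototyping and production use, so it can handle many requests efficiently.
Automatic parallelization: LCEL runs steps in parallel where possible, reducing latency in both the sync and async interfaces.
Reliability configuration: Configure retries and fallbacks to make chains more reliable at scale; streaming support for these is in development.
Streaming intermediate results: Access intermediate results during processing, for user updates or debugging.
Schema generation: LCEL generates Pydantic and JSONSchema schemas for input and output validation.
Comprehensive tracing: LangSmith automatically traces every step of a complex chain for observability and debugging.
Easy deployment: Chains created with LCEL can be deployed with LangServe.

Now, let's dive into practical code examples that show what LCEL can do. We will look at common tasks and scenarios where LCEL is useful.

Prompt + LLM

The most basic composition combines a prompt and a language model into a chain that takes user input, adds it to the prompt, passes it to the model, and returns the raw model output. Here is an example: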

from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI

prompt = ChatPromptTemplate.from_template("tell me a joke about {foo}")
model = ChatOpenAI()
chain = prompt | model

result = chain.invoke({"foo": "bears"})
print(result)

In this example, the chain generates a joke about bears.
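The | operator in the example above is ordinary Python operator overloading. The following sketch shows one way pipe-style composition can be built with __or__; the Step class and the fake model are hypothetical and far simpler than LangChain's Runnable interface, and no API call is made.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b yields a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for a prompt template and a model.
prompt = Step(lambda inputs: f"tell me a joke about {inputs['foo']}")
fake_model = Step(lambda text: f"[model output for: {text}]")

chain = prompt | fake_model
result = chain.invoke({"foo": "bears"})
print(result)
```

Because every composed object is itself a Step, chains of any length can be built the same way, which is the property that lets LCEL expose one interface (invoke, batch, stream) on every chain.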

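You can attach stop sequences to the model in a chain to control where generation ends. For example:

chain = prompt | model.bind(stop=["\n"])
result = chain.invoke({"foo": "bears"})
print(result)

With this configuration, text generation stops at the first newline character.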
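The visible effect of a stop sequence can be emulated on a finished string. This is only a sketch: model providers apply stop sequences during generation, and the helper apply_stop below is a hypothetical illustration, not a LangChain function.

```python
from typing import List


def apply_stop(text: str, stop: List[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop:
        index = text.find(seq)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


raw = "Why did the bear cross the road?\nTo get to the other side."
first_line = apply_stop(raw, ["\n"])
print(first_line)
```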

LCEL supports attaching function-calling information to the model in a chain. Here is an example:

functions = [
    {
        "name": "joke",
        "description": "A joke",
        "parameters": {
            "type": "object",
            "properties": {
                "setup": {"type": "string", "description": "The setup for the joke"},
                "punchline": {
                    "type": "string",
                    "description": "The punchline for the joke",
                },
            },
            "required": ["setup", "punchline"],
        },
    }
]
chain = prompt | model.bind(function_call={"name": "joke"}, functions=functions)
result = chain.invoke({"foo": "bears"}, config={})
print(result)

This example attaches a function definition so that the model returns the joke as a structured function call with a setup and a punchline.

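Prompt + LLM + Output Parser

You can add an output parser to turn the raw model output into a format that is easier to work with. Here is how:

from langchain.schema.output_parser import StrOutputParser

chain = prompt | model | StrOutputParser()
result = chain.invoke({"foo": "bears"})
print(result)

The output is now a plain string, which is more convenient for downstream tasks.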

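When you specify a function for the model to return, you can parse the result directly in the chain. For example:

from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser

chain = (
    prompt
    | model.bind(function_call={"name": "joke"}, functions=functions)
    | JsonOutputFunctionsParser()
)
result = chain.invoke({"foo": "bears"})
print(result)

This example parses the output of the "joke" function call directly into a Python dictionary.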
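Conceptually, the parsing step comes down to decoding the JSON string that the model places in the function call's arguments. The sketch below shows that idea with the standard library only; the sample message is made-up data, and this is not the parser's actual implementation.

```python
import json

# Shape of an OpenAI-style function call as it appears in a chat message;
# the values are made-up sample data.
function_call = {
    "name": "joke",
    "arguments": '{"setup": "Why don\'t bears wear shoes?", "punchline": "They have bear feet."}',
}

# The arguments arrive as a JSON string and must be decoded before use.
parsed = json.loads(function_call["arguments"])
print(parsed["setup"])
print(parsed["punchline"])
```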

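These are just a few examples of how LCEL simplifies complex language processing tasks. Whether you are building a chatbot, generating content, or performing complex text transformations, LCEL can streamline your workflow and make your code easier to maintain.

RAG (Retrieval-Augmented Generation)

LCEL can be used to create retrieval-augmented generation chains, which combine a retrieval step with a language generation step. Here is an example: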

from operator import itemgetter

from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda
from langchain.vectorstores import FAISS

# Create a vector store and retriever
vectorstore = FAISS.from_texts(
    ["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

# Define templates for prompts
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()

# Create a retrieval-augmented generation chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

result = chain.invoke("where did harrison work?")
print(result)

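In this example, the chain retrieves the relevant text from the vector store, inserts it into the prompt as context, and generates an answer to the question.

Conversational Retrieval Chain

You can easily add conversation history to your chains. Here is an example of a conversational retrieval chain: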

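from langchain.schema.runnable import RunnableMap
from langchain.schema import format_document

from langchain.prompts.prompt import PromptTemplate

# This block reuses itemgetter, ChatPromptTemplate, ChatOpenAI, StrOutputParser,
# RunnablePassthrough and retriever from the RAG example above.

# Define templates for prompts
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)

# Helper functions. The chain below calls them but the original snippet does not
# define them; these are minimal versions.
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template("{page_content}")


def _combine_documents(docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"):
    # Render each retrieved document as text and join the pieces.
    return document_separator.join(format_document(doc, document_prompt) for doc in docs)


def _format_chat_history(chat_history):
    # chat_history is a list of (human, ai) string pairs.
    lines = []
    for human, ai in chat_history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return "\n".join(lines)


# Define input map and context
_inputs = RunnableMap(
    standalone_question=RunnablePassthrough.assign(
        chat_history=lambda x: _format_chat_history(x["chat_history"])
    )
    | CONDENSE_QUESTION_PROMPT
    | ChatOpenAI(temperature=0)
    | StrOutputParser(),
)
_context = {
    "context": itemgetter("standalone_question") | retriever | _combine_documents,
    "question": lambda x: x["standalone_question"],
}
conversational_qa_chain = _inputs | _context | ANSWER_PROMPT | ChatOpenAI()

result = conversational_qa_chain.invoke(
    {
        "question": "where did harrison work?",
        "chat_history": [],
    }
)
print(result)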

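In this example, the chain first rewrites a follow-up question as a standalone question using the conversation history, then retrieves context for it and answers.

Memory and Returning Source Documents

LCEL also supports memory and returning source documents. Here is how to use memory in a chain: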

from operator import itemgetter
from langchain.memory import ConversationBufferMemory

# Create a memory instance
memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)

# Define steps for the chain
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)

standalone_question = {
    "standalone_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: _format_chat_history(x["chat_history"]),
    }
    | CONDENSE_QUESTION_PROMPT
    | ChatOpenAI(temperature=0)
    | StrOutputParser(),
}

retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,
    "question": lambda x: x["standalone_question"],
}

final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}

answer = {
    "answer": final_inputs | ANSWER_PROMPT | ChatOpenAI(),
    "docs": itemgetter("docs"),
}

# Create the final chain by combining the steps
final_chain = loaded_memory | standalone_question | retrieved_documents | answer

inputs = {"question": "where did harrison work?"}
result = final_chain.invoke(inputs)
print(result)

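In this example, memory supplies the conversation history, and the chain returns the retrieved source documents alongside the answer.

Multiple Chains

You can string several chains together using Runnables. Here is an example: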

from operator import itemgetter

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser

prompt1 = ChatPromptTemplate.from_template("what is the city {person} is from?")
prompt2 = ChatPromptTemplate.from_template(
    "what country is the city {city} in? respond in {language}"
)

model = ChatOpenAI()

chain1 = prompt1 | model | StrOutputParser()

chain2 = (
    {"city": chain1, "language": itemgetter("language")}
    | prompt2
    | model
    | StrOutputParser()
)

result = chain2.invoke({"person": "obama", "language": "spanish"})
print(result)

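In this example, two chains are combined: the first finds the city a person is from, and the second reports which country that city is in, in the requested language.

Branching and Merging

LCEL lets you split and merge chains using RunnableMaps. Here is an example of branching and merging: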

from operator import itemgetter

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

planner = (
    ChatPromptTemplate.from_template("Generate an argument about: {input}")
    | ChatOpenAI()
    | StrOutputParser()
    | {"base_response": RunnablePassthrough()}
)

arguments_for = (
    ChatPromptTemplate.from_template(
        "List the pros or positive aspects of {base_response}"
    )
    | ChatOpenAI()
    | StrOutputParser()
)
arguments_against = (
    ChatPromptTemplate.from_template(
        "List the cons or negative aspects of {base_response}"
    )
    | ChatOpenAI()
    | StrOutputParser()
)

final_responder = (
    ChatPromptTemplate.from_messages(
        [
            ("ai", "{original_response}"),
            ("human", "Pros:\n{results_1}\n\nCons:\n{results_2}"),
            ("system", "Generate a final response given the critique"),
        ]
    )
    | ChatOpenAI()
    | StrOutputParser()
)

chain = (
    planner
    | {
        "results_1": arguments_for,
        "results_2": arguments_against,
        "original_response": itemgetter("base_response"),
    }
    | final_responder
)

result = chain.invoke({"input": "scrum"})
print(result)

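In this example, a branching and merging chain generates an argument, lists its pros and cons in two separate branches, and then merges both into a final response.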
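The fan-out and fan-in pattern can be sketched without any model calls. In the sketch below, plain functions stand in for the prompt | model | parser sub-chains, and a thread pool runs the branches concurrently; run_branches is a hypothetical helper, not part of LangChain, and only approximates how a map of runnables behaves.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_branches(branches: Dict[str, Callable[[str], str]], value: str) -> Dict[str, str]:
    """Run every branch on the same input concurrently and collect results by key."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(func, value) for name, func in branches.items()}
        return {name: future.result() for name, future in futures.items()}


# Plain functions stand in for the real sub-chains.
branches = {
    "results_1": lambda topic: f"pros of {topic}",
    "results_2": lambda topic: f"cons of {topic}",
    "original_response": lambda topic: topic,
}

merged = run_branches(branches, "scrum")
final = "Pros: {results_1} / Cons: {results_2}".format(**merged)
print(final)
```

The merged dictionary plays the role of the input to final_responder: each key becomes a template variable in the last prompt.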

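Writing Python Code with LCEL

One powerful application of LangChain Expression Language (LCEL) is writing Python code to solve a user's problem. Here is an example of how to write Python code with LCEL: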

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain_experimental.utilities import PythonREPL

template = """Write some python code to solve the user's problem.

Return only python code in Markdown format, e.g.:

```python
....
```"""
prompt = ChatPromptTemplate.from_messages([("system", template), ("human", "{input}")])

model = ChatOpenAI()


def _sanitize_output(text: str):
    _, after = text.split("```python")
    return after.split("```")[0]


chain = prompt | model | StrOutputParser() | _sanitize_output | PythonREPL().run

result = chain.invoke({"input": "what's 2 plus 2"})
print(result)

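In this example, the user provides an input and the model generates Python code, wrapped in a Markdown code block, to solve the problem. The _sanitize_output step extracts the code from that block, and the Python REPL then executes it and returns the output.

Note that the Python REPL can execute arbitrary code, so use it with caution.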
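The _sanitize_output helper raises an error when the reply contains no python code block. Below is a slightly more defensive variant as a sketch; extract_python is a hypothetical name, and the fence marker is built indirectly only to keep the listing readable. It extracts code and does not make running that code any safer.

```python
import re

FENCE = "`" * 3  # three backticks


def extract_python(text: str) -> str:
    """Return the body of the first fenced python block, or the text itself."""
    pattern = re.escape(FENCE) + r"python\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else text.strip()


reply = f"Here you go:\n{FENCE}python\nprint(2 + 2)\n{FENCE}\nHope that helps."
code = extract_python(reply)
print(code)
```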

Adding Memory to a Chain

Memory is essential in many conversational AI applications. Here is how to add memory to an arbitrary chain:

from operator import itemgetter
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

model = ChatOpenAI()
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful chatbot"),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}"),
    ]
)

memory = ConversationBufferMemory(return_messages=True)

# Initialize memory
memory.load_memory_variables({})

chain = (
    RunnablePassthrough.assign(
        history=RunnableLambda(memory.load_memory_variables) | itemgetter("history")
    )
    | prompt
    | model
)

inputs = {"input": "hi, I'm Bob"}
response = chain.invoke(inputs)
response

# Save the conversation in memory
memory.save_context(inputs, {"output": response.content})

# Load memory to see the conversation history
memory.load_memory_variables({})

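In this example, memory stores and retrieves the conversation history, so the chatbot can maintain context and respond appropriately. Note that the chain only reads from memory; each exchange is saved explicitly with save_context.

Using External Tools with Runnables

LCEL lets you integrate external tools with Runnables. Here is an example that uses the DuckDuckGo search tool: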

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.tools import DuckDuckGoSearchRun

search = DuckDuckGoSearchRun()

template = """Turn the following user input into a search query for a search engine:

{input}"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()

chain = prompt | model | StrOutputParser() | search

search_result = chain.invoke({"input": "I'd like to figure out what games are tonight"})
print(search_result)

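In this example, LCEL integrates the DuckDuckGo search tool into the chain: the model turns the user input into a search query, and the tool retrieves the search results.

This flexibility makes it easy to bring external tools and services into your language processing pipelines and extend what they can do.

Adding Moderation to an LLM Application

To make sure your LLM application follows content policies and includes moderation safeguards, you can integrate a moderation check into your chain. Here is how to add moderation with LangChain: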

from langchain.chains import OpenAIModerationChain
from langchain.llms import OpenAI
from langchain.prompts import ChatPromptTemplate

moderate = OpenAIModerationChain()

model = OpenAI()
prompt = ChatPromptTemplate.from_messages([("system", "repeat after me: {input}")])

chain = prompt | model

# Original response without moderation
response_without_moderation = chain.invoke({"input": "you are stupid"})
print(response_without_moderation)

moderated_chain = chain | moderate

# Response after moderation
response_after_moderation = moderated_chain.invoke({"input": "you are stupid"})
print(response_after_moderation)

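In this example, OpenAIModerationChain moderates the response generated by the LLM. The moderation chain checks the response for content that violates OpenAI's content policy and, if it finds a violation, flags the response accordingly.

Routing by Semantic Similarity

LCEL lets you implement custom routing logic based on the semantic similarity of the user input. Here is an example of choosing the chain logic dynamically from the user input: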

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from langchain.utils.math import cosine_similarity

physics_template = """You are a very smart physics professor. You are great at answering questions about physics in a concise and easy to understand manner. When you don't know the answer to a question you admit that you don't know.

Here is a question:
{query}"""

math_template = """You are a very good mathematician. You are great at answering math questions. You are so good because you are able to break down hard problems into their component parts, answer the component parts, and then put them together to answer the broader question.

Here is a question:
{query}"""

embeddings = OpenAIEmbeddings()
prompt_templates = [physics_template, math_template]
prompt_embeddings = embeddings.embed_documents(prompt_templates)


def prompt_router(input):
    # Embed the question and pick the template whose embedding is closest.
    query_embedding = embeddings.embed_query(input["query"])
    similarity = cosine_similarity([query_embedding], prompt_embeddings)[0]
    most_similar = prompt_templates[similarity.argmax()]
    print("Using MATH" if most_similar == math_template else "Using PHYSICS")
    return PromptTemplate.from_template(most_similar)


chain = (
    {"query": RunnablePassthrough()}
    | RunnableLambda(prompt_router)
    | ChatOpenAI()
    | StrOutputParser()
)

# The first step wraps the raw input under "query", so pass the question as a plain string.
print(chain.invoke("What's a black hole"))
print(chain.invoke("What's a path integral"))

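In this example, the prompt_router function computes the cosine similarity between the user input and the predefined prompt templates for physics and math questions. Based on the similarity scores, the chain picks the most relevant prompt template, so the chatbot answers the question in the appropriate role.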
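The routing decision itself is just an argmax over cosine similarities. The sketch below reproduces it with the standard library and toy three-dimensional vectors; the vectors and the pick_route helper are made up for illustration, whereas real embeddings come from an embedding model and have many more dimensions.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy "embeddings" of the two prompt templates.
route_vectors: Dict[str, List[float]] = {
    "physics": [0.9, 0.1, 0.0],
    "math": [0.1, 0.9, 0.0],
}


def pick_route(query_vector: List[float]) -> str:
    # Choose the route whose vector is most similar to the query vector.
    return max(route_vectors, key=lambda name: cosine(query_vector, route_vectors[name]))


print(pick_route([0.8, 0.2, 0.1]))
print(pick_route([0.2, 0.7, 0.1]))
```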

Using Agents with Runnables

LangChain lets you create agents by combining Runnables, prompts, models, and tools. Here is an example of building an agent and using it:

from langchain.agents import XMLAgent, tool, AgentExecutor
from langchain.chat_models import ChatAnthropic

model = ChatAnthropic(model="claude-2")


@tool
def search(query: str) -> str:
    """Search things about current events."""
    return "32 degrees"


tool_list = [search]

# Get prompt to use
prompt = XMLAgent.get_default_prompt()


# Logic for going from intermediate steps to a string to pass into the model
def convert_intermediate_steps(intermediate_steps):
    log = ""
    for action, observation in intermediate_steps:
        log += (
            f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
            f"</tool_input><observation>{observation}</observation>"
        )
    return log


# Logic for converting tools to a string to go in the prompt
def convert_tools(tools):
    return "\n".join([f"{tool.name}: {tool.description}" for tool in tools])


agent = (
    {
        "question": lambda x: x["question"],
        "intermediate_steps": lambda x: convert_intermediate_steps(
            x["intermediate_steps"]
        ),
    }
    | prompt.partial(tools=convert_tools(tool_list))
    | model.bind(stop=["</tool_input>", "</final_answer>"])
    | XMLAgent.get_default_output_parser()
)

agent_executor = AgentExecutor(agent=agent, tools=tool_list, verbose=True)

result = agent_executor.invoke({"question": "What's the weather in New York?"})
print(result)

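In this example, an agent is created by combining a model, tools, a prompt, and custom logic for formatting the intermediate steps and the tool descriptions. The agent is then executed and returns a response to the user's query.

Querying a SQL Database

You can use LangChain to query a SQL database and generate SQL queries from user questions. Here is an example: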

from langchain.prompts import ChatPromptTemplate

template = """Based on the table schema below, write a SQL query that would answer the user's question:
{schema}

Question: {question}
SQL Query:"""
prompt = ChatPromptTemplate.from_template(template)

from langchain.utilities import SQLDatabase

# Initialize the database (you'll need the Chinook sample DB for this example)
db = SQLDatabase.from_uri("sqlite:///./Chinook.db")


def get_schema(_):
    return db.get_table_info()


def run_query(query):
    return db.run(query)


from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

model = ChatOpenAI()

# First chain: question -> SQL query
sql_response = (
    RunnablePassthrough.assign(schema=get_schema)
    | prompt
    | model.bind(stop=["\nSQLResult:"])
    | StrOutputParser()
)

result = sql_response.invoke({"question": "How many employees are there?"})
print(result)

template = """Based on the table schema below, question, SQL query, and SQL response, write a natural language response:
{schema}

Question: {question}
SQL Query: {query}
SQL Response: {response}"""
prompt_response = ChatPromptTemplate.from_template(template)

# Second chain: generate the query, run it, and phrase the result in natural language
full_chain = (
    RunnablePassthrough.assign(query=sql_response)
    | RunnablePassthrough.assign(
        schema=get_schema,
        response=lambda x: db.run(x["query"]),
    )
    | prompt_response
    | model
)

response = full_chain.invoke({"question": "How many employees are there?"})
print(response)

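In this example, LangChain generates a SQL query from the user's question, runs it against the SQL database, and retrieves the result. The prompts and responses are formatted to provide a natural-language interface to the database.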
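The "run the generated query" step can be tried without a model or the Chinook file. The sketch below uses an in-memory SQLite database with a made-up Employee table and a hard-coded string standing in for the model's output. The run_read_only helper and its guard are hypothetical and deliberately simplistic; they illustrate the idea of restricting generated SQL and are not a security boundary.

```python
import sqlite3

# In-memory database with a made-up table, standing in for Chinook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany(
    "INSERT INTO Employee (LastName) VALUES (?)",
    [("Adams",), ("Edwards",), ("Peacock",)],
)

# Pretend this string came back from the model in the first chain.
generated_sql = "SELECT COUNT(*) FROM Employee"


def run_read_only(query: str):
    # Guard: only allow a single SELECT statement in this sketch.
    if not query.lstrip().lower().startswith("select") or ";" in query.rstrip(";"):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(query).fetchall()


rows = run_read_only(generated_sql)
print(rows)
```

In a real deployment, a database user with read-only permissions is a more reliable safeguard than string checks on the generated query.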


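LangServe and LangSmith

LangServe helps developers deploy LangChain runnables and chains as a REST API. The library is integrated with FastAPI and uses pydantic for data validation. It also provides a client that can be used to call runnables deployed on a server, and a JavaScript client is available in LangChainJS.

Features

Input and output schemas are automatically inferred from your LangChain object and enforced on every API call, with rich error messages.
An API docs page with JSONSchema and Swagger is available.
Efficient /invoke, /batch and /stream endpoints with support for many concurrent requests on a single server.
A /stream_log endpoint for streaming all (or some) intermediate steps from your chain or agent.
A playground page at /playground with streaming output and intermediate steps.
Built-in (optional) tracing to LangSmith; just add your API key (see the instructions).
All built with battle-tested open-source Python libraries such as FastAPI, Pydantic, uvloop and asyncio.

Limitations

Client callbacks are not yet supported for events that originate on the server.
OpenAPI docs are not generated when using Pydantic V2, because FastAPI does not support mixing pydantic v1 and v2 namespaces. See the Pydantic section below for more details.

Use the LangChain CLI to bootstrap a LangServe project quickly. To use the langchain CLI, make sure you have a recent version of langchain-cli installed. You can install it with pip install -U langchain-cli.

langchain app new ../path/to/directory

Get your LangServe instance started quickly with LangChain Templates. For more examples, see the templates index or the examples directory.

Here is a server that deploys an OpenAI chat model, an Anthropic chat model, and a chain that uses the Anthropic model to tell a joke about a topic.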

#!/usr/bin/env python
from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatAnthropic, ChatOpenAI
from langserve import add_routes

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple api server using Langchain's Runnable interfaces",
)

add_routes(
    app,
    ChatOpenAI(),
    path="/openai",
)

add_routes(
    app,
    ChatAnthropic(),
    path="/anthropic",
)

model = ChatAnthropic()
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
add_routes(
    app,
    prompt | model,
    path="/chain",
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)

Once you have deployed the server above, you can view the generated OpenAPI docs with:

curl localhost:8000/docs

Make sure to add the /docs suffix.

# Python client: call the deployed runnables through RemoteRunnable.
# The await / async for lines need an async context, such as a notebook.
from langchain.schema import SystemMessage, HumanMessage
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableMap
from langserve import RemoteRunnable

openai = RemoteRunnable("http://localhost:8000/openai/")
anthropic = RemoteRunnable("http://localhost:8000/anthropic/")
joke_chain = RemoteRunnable("http://localhost:8000/chain/")

joke_chain.invoke({"topic": "parrots"})

# or async
await joke_chain.ainvoke({"topic": "parrots"})

prompt = [
    SystemMessage(content='Act like either a cat or a parrot.'),
    HumanMessage(content='Hello!')
]

# Supports astream
async for msg in anthropic.astream(prompt):
    print(msg, end="", flush=True)

prompt = ChatPromptTemplate.from_messages(
    [("system", "Tell me a long story about {topic}")]
)

# Can define custom chains
chain = prompt | RunnableMap({
    "openai": openai,
    "anthropic": anthropic,
})

chain.batch([{ "topic": "parrots" }, { "topic": "cats" }])

In TypeScript (requires LangChain.js version 0.0.166 or later):

import { RemoteRunnable } from "langchain/runnables/remote";

const chain = new RemoteRunnable({
  url: `http://localhost:8000/chain/invoke/`,
});
const result = await chain.invoke({
  topic: "cats",
});

In Python, using requests:

import requests

response = requests.post(
    "http://localhost:8000/chain/invoke/",
    json={'input': {'topic': 'cats'}}
)
response.json()

You can also use curl:

curl --location --request POST 'http://localhost:8000/chain/invoke/' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "input": {
            "topic": "cats"
        }
    }'
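The same invoke call can be assembled with the Python standard library alone, which makes the request shape explicit: a POST to the runnable's /invoke path with the chain input wrapped under an "input" key. The helper build_invoke_request is a hypothetical name used for this sketch, and the request is built but not sent, so no server is needed to try it.

```python
import json
import urllib.request


def build_invoke_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a runnable's /invoke endpoint."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invoke_request("http://localhost:8000/chain/", {"topic": "cats"})
print(request.get_method(), request.full_url)
# To call a running server: urllib.request.urlopen(request)
```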

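The following code:

...
add_routes(
    app,
    runnable,
    path="/my_runnable",
)

adds these endpoints to the server:

POST /my_runnable/invoke - invoke the runnable on a single input
POST /my_runnable/batch - invoke the runnable on a batch of inputs
POST /my_runnable/stream - invoke on a single input and stream the output
POST /my_runnable/stream_log - invoke on a single input and stream the output, including the output of intermediate steps as they are generated
GET /my_runnable/input_schema - JSON schema for the input to the runnable
GET /my_runnable/output_schema - JSON schema for the output of the runnable
GET /my_runnable/config_schema - JSON schema for the config of the runnable

You can find a playground page for your runnable at /my_runnable/playground. It exposes a simple UI for configuring and invoking your runnable, with streaming output and intermediate steps.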

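To install LangServe for both client and server:

pip install "langserve[all]"

or use pip install "langserve[client]" for client code and pip install "langserve[server]" for server code.

If you need to add authentication to your server, refer to FastAPI's security documentation and middleware documentation.

You can deploy to GCP Cloud Run with the following command:

gcloud run deploy [your-service-name] --source . --port 8001 --allow-unauthenticated --region us-central1 --set-env-vars=OPENAI_API_KEY=your_key

LangServe supports Pydantic 2 with some limitations. When using Pydantic V2, OpenAPI docs are not generated for invoke/batch/stream/stream_log, because FastAPI does not support mixing pydantic v1 and v2 namespaces. LangChain uses the v1 namespace in Pydantic v2; read the LangChain guidelines on Pydantic compatibility to make sure your code works with it. Apart from these limitations, the API endpoints, the playground, and the other features are expected to work as intended.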

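LLM applications often deal with files. Different architectures can be used to implement file processing; at a high level:

Files can be uploaded to the server through a dedicated endpoint and processed with a separate endpoint.
Files can be uploaded either by value (the bytes of the file) or by reference (for example, an s3 url to the file content).
The processing endpoint can be blocking or non-blocking.
If significant processing is required, it can be offloaded to a dedicated process pool.

You should decide which architecture is appropriate for your application. Currently, to upload a file by value to a runnable, use base64 encoding for the file (multipart/form-data is not supported yet).

The LangServe examples show how to use base64 encoding to send a file to a remote runnable. Remember that you can always upload files by reference (for example, an s3 url) or upload them as multipart/form-data to a dedicated endpoint.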
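Uploading by value amounts to a base64 round trip. The sketch below shows the client-side encoding and the server-side decoding with the standard library only; the sample bytes are made up, and the field names "file" and "num_chars" simply mirror the FileProcessingRequest model shown later in this section, so treat them as illustrative rather than required.

```python
import base64
import json

# Stand-in for file contents read with open(path, "rb").read().
file_bytes = b"%PDF-1.4 example bytes"

# Client side: bytes -> base64 text that fits in a JSON body.
encoded = base64.b64encode(file_bytes).decode("utf-8")
payload = json.dumps({"input": {"file": encoded, "num_chars": 100}})

# Server side: decode the string back into the original bytes.
decoded = base64.b64decode(json.loads(payload)["input"]["file"])
print(decoded == file_bytes)
```

Base64 increases the payload size by roughly a third, which is one reason large files are usually better uploaded by reference.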

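Input and output types are defined on all runnables. You can access them through the input_schema and output_schema properties. LangServe uses these types for validation and documentation. If you want to override the default inferred types, you can use the with_types method.

Here is a toy example that illustrates the idea: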

from typing import Any

from fastapi import FastAPI
from langchain.schema.runnable import RunnableLambda
from langserve import add_routes

app = FastAPI()


def func(x: Any) -> int:
    """Mistyped function that should accept an int but accepts anything."""
    return x + 1


# Override the inferred input type so the server validates inputs as int
runnable = RunnableLambda(func).with_types(
    input_type=int,
)

add_routes(app, runnable)

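If you want the data to be deserialized into a pydantic model rather than the equivalent dict representation, inherit from CustomUserType. At the moment, this type only works server-side and is used to specify the desired decoding behavior. When a model inherits from this type, the server keeps the decoded value as a pydantic model instead of converting it into a dict.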

from fastapi import FastAPI
from langchain.schema.runnable import RunnableLambda
from langserve import add_routes
from langserve.schema import CustomUserType

app = FastAPI()


class Foo(CustomUserType):
    bar: int


def func(foo: Foo) -> int:
    """Sample function that expects a Foo type which is a pydantic model"""
    assert isinstance(foo, Foo)
    return foo.bar


add_routes(app, RunnableLambda(func), path="/foo")

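The playground lets you define custom widgets for your runnable from the backend. A widget is specified at the field level and shipped as part of the JSON schema of the input type. A widget must contain a key called type, whose value is one of a well-known list of widgets. The other widget keys are associated with values that describe paths in a JSON object.

General schema: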

type JsonPath = number | string | (number | string)[];
type NameSpacedPath = { title: string; path: JsonPath }; // Using title to mimic json schema, but can use namespace
type OneOfPath = { oneOf: JsonPath[] };

type Widget = {
  type: string // Some well-known type (e.g., base64file, chat, etc.)
  [key: string]: JsonPath | NameSpacedPath | OneOfPath;
};

The base64file widget creates a file upload input in the UI playground for files that are uploaded as base64-encoded strings. Here is an example:

try:
    from pydantic.v1 import Field
except ImportError:
    from pydantic import Field

from langserve import CustomUserType


# ATTENTION: Inherit from CustomUserType instead of BaseModel otherwise
# the server will decode it into a dict instead of a pydantic model.
class FileProcessingRequest(CustomUserType):
    """Request including a base64 encoded file."""

    # The extra field is used to specify a widget for the playground UI.
    file: str = Field(..., extra={"widget": {"type": "base64file"}})
    num_chars: int = 100


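Introduction to LangSmith

LangChain makes it easy to prototype LLM applications and agents. However, taking LLM applications to production can be deceptively difficult. You will likely have to customize and iterate heavily on your prompts, chains, and other components to create a high-quality product.

To help with this process, LangSmith was introduced: a unified platform for debugging, testing, and monitoring your LLM applications.

When does this come in handy? You may find it useful when you want to quickly debug a new chain, agent, or set of tools; visualize how components (chains, LLMs, retrievers, and so on) relate and are used; evaluate different prompts and LLMs for a single component; run a given chain several times over a dataset to make sure it consistently meets a quality bar; or capture usage traces and use LLMs or analytics pipelines to generate insights.

Prerequisites:

Create a LangSmith account and create an API key (see the bottom-left corner of the app).
Familiarize yourself with the platform by looking through the docs.

Now, let's get started!

First, configure your environment variables to tell LangChain to log traces. This is done by setting the LANGCHAIN_TRACING_V2 environment variable to true. You can tell LangChain which project to log to by setting the LANGCHAIN_PROJECT environment variable (if it is not set, runs are logged to the default project). The project is created automatically if it does not exist. You must also set the LANGCHAIN_ENDPOINT and LANGCHAIN_API_KEY environment variables.

Note: you can also use a context manager in Python to log traces: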

from langchain.callbacks.manager import tracing_v2_enabled

with tracing_v2_enabled(project_name="My Project"):
    agent.run("How many people live in canada as of 2023?")

However, in this example we will use environment variables.

%pip install openai tiktoken pandas duckduckgo-search --quiet

import os
from uuid import uuid4

unique_id = uuid4().hex[0:8]
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"Tracing Walkthrough - {unique_id}"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "<YOUR-API-KEY>"  # Update to your API key

# Used by the agent in this tutorial
os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"

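Create the LangSmith client to interact with the API:

from langsmith import Client

client = Client()

Create a LangChain component and log runs to the platform. In this example, we create a ReAct-style agent with access to a general search tool (DuckDuckGo). The agent's prompt can be viewed in the LangChain Hub: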

from langchain import hub
from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.chat_models import ChatOpenAI
from langchain.tools import DuckDuckGoSearchResults
from langchain.tools.render import format_tool_to_openai_function

# Fetches the latest version of this prompt
prompt = hub.pull("wfh/langsmith-agent-prompt:latest")

llm = ChatOpenAI(
    model="gpt-3.5-turbo-16k",
    temperature=0,
)

tools = [
    DuckDuckGoSearchResults(
        name="duck_duck_go"
    ),  # General internet search using DuckDuckGo
]

llm_with_tools = llm.bind(functions=[format_tool_to_openai_function(t) for t in tools])

runnable_agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(
    agent=runnable_agent, tools=tools, handle_parsing_errors=True
)

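We run the agent on several inputs concurrently to reduce latency. Runs are logged to LangSmith in the background, so execution latency is not affected: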

inputs = [
    "What is LangChain?",
    "What's LangSmith?",
    "When was Llama-v2 released?",
    "What is the langsmith cookbook?",
    "When did langchain first announce the hub?",
]

results = agent_executor.batch([{"input": x} for x in inputs], return_exceptions=True)

results[:2]

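Assuming you have set up your environment successfully, your agent traces should show up in the Projects section of the app. Congratulations!

It looks like the agent is not using the tools effectively, though. Let's evaluate this so that we have a baseline.

Besides logging runs, LangSmith also lets you test and evaluate your LLM applications.

In this section, you will use LangSmith to create a benchmark dataset and run AI-assisted evaluators on an agent. You will do so in a few steps:

Create a LangSmith dataset:

Below, we use the LangSmith client to create a dataset from the input questions above and a list of labels. You will use these later to measure the performance of a new agent. A dataset is a collection of examples, which are simply input-output pairs that you can use as test cases for your application: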

outputs = [
    "LangChain is an open-source framework for building applications using large language models. It is also the name of the company building LangSmith.",
    "LangSmith is a unified platform for debugging, testing, and monitoring language model applications and agents powered by LangChain",
    "July 18, 2023",
    "The langsmith cookbook is a github repository containing detailed examples of how to use LangSmith to debug, evaluate, and monitor large language model-powered applications.",
    "September 5, 2023",
]

dataset_name = f"agent-qa-{unique_id}"

dataset = client.create_dataset(
    dataset_name,
    description="An example dataset of questions over the LangSmith documentation.",
)

for query, answer in zip(inputs, outputs):
    client.create_example(
        inputs={"input": query}, outputs={"output": answer}, dataset_id=dataset.id
    )

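Initialize a new agent to benchmark:

LangSmith lets you evaluate any LLM, chain, agent, or even a custom function. Conversational agents are stateful (they have memory); to make sure this state is not shared between dataset runs, we pass in a chain_factory (that is, a constructor) function that initializes a fresh agent for each call: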

# Since chains can be stateful (e.g. they can have memory), we provide
# a way to initialize a new chain for each row in the dataset. This is done
# by passing in a factory function that returns a new chain for each row.
def agent_factory(prompt):
    llm_with_tools = llm.bind(
        functions=[format_tool_to_openai_function(t) for t in tools]
    )
    runnable_agent = (
        {
            "input": lambda x: x["input"],
            "agent_scratchpad": lambda x: format_to_openai_function_messages(
                x["intermediate_steps"]
            ),
        }
        | prompt
        | llm_with_tools
        | OpenAIFunctionsAgentOutputParser()
    )
    return AgentExecutor(agent=runnable_agent, tools=tools, handle_parsing_errors=True)
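Why a factory matters is easiest to see with a toy stateful object. In the sketch below, StatefulAgent and evaluate are hypothetical stand-ins (not LangSmith APIs): a shared instance accumulates history across rows, while a factory gives every row a clean instance.

```python
from typing import Callable, List


class StatefulAgent:
    """Hypothetical agent that remembers every input it has seen."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def run(self, text: str) -> int:
        self.history.append(text)
        return len(self.history)  # how many inputs this instance has seen


def evaluate(factory: Callable[[], StatefulAgent], rows: List[str]) -> List[int]:
    # Build a fresh agent per row so no state leaks between examples.
    return [factory().run(row) for row in rows]


rows = ["What is LangChain?", "What's LangSmith?"]

shared = StatefulAgent()
shared_counts = [shared.run(row) for row in rows]
fresh_counts = evaluate(StatefulAgent, rows)
print(shared_counts, fresh_counts)
```

With the shared instance, the second row sees the first row's history; with the factory, every row starts from an empty history, so each test case is evaluated independently.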

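Configure evaluation:

Manually comparing the results of chains in the UI is effective, but it can be time-consuming. It helps to use automated metrics and AI-assisted feedback to evaluate your component's performance: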

from langchain.evaluation import EvaluatorType
from langchain.smith import RunEvalConfig

evaluation_config = RunEvalConfig(
    evaluators=[
        EvaluatorType.QA,
        EvaluatorType.EMBEDDING_DISTANCE,
        RunEvalConfig.LabeledCriteria("helpfulness"),
        RunEvalConfig.LabeledScoreString(
            {
                "accuracy": """
Score 1: The answer is completely unrelated to the reference.
Score 3: The answer has minor relevance but does not align with the reference.
Score 5: The answer has moderate relevance but contains inaccuracies.
Score 7: The answer aligns with the reference but has minor errors or omissions.
Score 10: The answer is completely accurate and aligns perfectly with the reference."""
            },
            normalize_by=10,
        ),
    ],
    custom_evaluators=[],
)
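The normalize_by=10 argument rescales the 1 to 10 rubric score into the 0 to 1 range by dividing it by 10, which makes it easier to read next to other feedback values. A minimal sketch of that arithmetic follows; normalize_score is a hypothetical helper written for illustration, not LangChain's own code.

```python
def normalize_score(raw_score, normalize_by=10):
    """Divide a rubric score by its maximum so it lands in the 0 to 1 range."""
    if not 1 <= raw_score <= normalize_by:
        raise ValueError(f"score {raw_score} is outside 1..{normalize_by}")
    return raw_score / normalize_by


# The five anchor points named in the accuracy rubric.
rubric_scores = [1, 3, 5, 7, 10]
normalized = [normalize_score(score) for score in rubric_scores]
print(normalized)  # [0.1, 0.3, 0.5, 0.7, 1.0]
```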

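Run the agent and evaluators:

Use the run_on_dataset function (or the asynchronous arun_on_dataset) to evaluate your model. This will:

1. Fetch example rows from the specified dataset.
2. Run your agent (or any custom function) on each example.
3. Apply the evaluators to the resulting run traces and the corresponding reference examples to generate automated feedback.

The results will be visible in the LangSmith app: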

import functools

from langchain.smith import run_on_dataset

chain_results = run_on_dataset(
    dataset_name=dataset_name,
    # A factory, so that every dataset row gets a freshly built agent.
    llm_or_chain_factory=functools.partial(agent_factory, prompt=prompt),
    evaluation=evaluation_config,
    verbose=True,
    client=client,
    project_name=f"runnable-agent-test-5d466cbc-{unique_id}",
    tags=[
        "testing-notebook",
        "prompt:5d466cbc",
    ],
)
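The three steps that run_on_dataset performs can be pictured as a plain loop. The sketch below is a toy model of that flow under simplifying assumptions, not LangSmith's implementation: toy_run_on_dataset, exact_match, and the two-row dataset are all hypothetical, and nothing is sent to any service.

```python
def toy_run_on_dataset(examples, chain_factory, evaluators):
    """Run a fresh chain on every example and score each output."""
    results = []
    for example in examples:                     # 1. fetch example rows
        chain = chain_factory()                  # fresh chain per row
        prediction = chain(example["inputs"])    # 2. run the chain
        feedback = {                             # 3. apply the evaluators
            name: evaluate(prediction, example["outputs"])
            for name, evaluate in evaluators.items()
        }
        results.append({"prediction": prediction, "feedback": feedback})
    return results


def exact_match(prediction, reference):
    """Score 1.0 when the output equals the reference, else 0.0."""
    return float(prediction["output"] == reference["output"])


def upper_chain_factory():
    # A trivial "chain" that upper-cases its input.
    return lambda inputs: {"output": inputs["input"].upper()}


# Placeholder dataset for illustration only.
examples = [
    {"inputs": {"input": "a"}, "outputs": {"output": "A"}},
    {"inputs": {"input": "b"}, "outputs": {"output": "x"}},
]

results = toy_run_on_dataset(examples, upper_chain_factory, {"exact_match": exact_match})
scores = [row["feedback"]["exact_match"] for row in results]
print(scores)  # [1.0, 0.0]
```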

Now that we have test run results, we can make changes to the agent and benchmark them. Let's try again with a different prompt and see the results:

candidate_prompt = hub.pull("wfh/langsmith-agent-prompt:39f3bbd0")

chain_results = run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=functools.partial(agent_factory, prompt=candidate_prompt),
    evaluation=evaluation_config,
    verbose=True,
    client=client,
    project_name=f"runnable-agent-test-39f3bbd0-{unique_id}",
    tags=[
        "testing-notebook",
        "prompt:39f3bbd0",
    ],
)
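Once both test projects have feedback, comparing them comes down to subtracting average scores metric by metric. Here is a minimal sketch of that comparison. The compare_feedback helper and the numbers are made up for illustration; real averages would come from LangSmith, and whether a positive delta is an improvement depends on the metric (for a distance metric, lower is better).

```python
def compare_feedback(baseline, candidate):
    """Return candidate-minus-baseline deltas for metrics present in both runs."""
    shared_keys = sorted(set(baseline) & set(candidate))
    return {key: round(candidate[key] - baseline[key], 4) for key in shared_keys}


# Made-up averages purely for illustration.
baseline_scores = {"helpfulness": 0.6, "accuracy": 0.5}
candidate_scores = {"helpfulness": 0.8, "accuracy": 0.5, "embedding_distance": 0.1}

deltas = compare_feedback(baseline_scores, candidate_scores)
print(deltas)  # {'accuracy': 0.0, 'helpfulness': 0.2}
```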

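LangSmith lets you export data to common formats such as CSV or JSONL directly in the web app. You can also use the client to fetch runs for further analysis, to store them in your own database, or to share them with others. Let's fetch the run traces from the evaluation run: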

runs = client.list_runs(project_name=chain_results["project_name"], execution_order=1)

# After some time, these will be populated.
client.read_project(project_name=chain_results["project_name"]).feedback_stats
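If you want to keep fetched runs outside LangSmith, JSONL (one JSON object per line) is a simple target format. The sketch below uses only the standard library. write_jsonl and the flattened records are hypothetical; real run objects carry many more fields and would need to be converted to plain dictionaries first.

```python
import io
import json


def write_jsonl(records, stream):
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical flattened run records for illustration only.
records = [
    {"input": "question 1", "output": "answer 1"},
    {"input": "question 2", "output": "answer 2"},
]

# An in-memory buffer keeps the example self-contained; a real export
# would use open("runs.jsonl", "w", encoding="utf-8") instead.
buffer = io.StringIO()
written = write_jsonl(records, buffer)
lines = buffer.getvalue().splitlines()
print(written, len(lines))  # 2 2
```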

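This was a quick start guide, but there are many more ways to use LangSmith to speed up your development flow and produce better results.

For more information on how to get the most out of LangSmith, check out the LangSmith documentation.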

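Level up with Nanonets

While LangChain is a valuable tool for integrating large language models (LLMs) with your applications, it may face limitations when it comes to enterprise use cases. Let's explore how Nanonets goes beyond LangChain to address these challenges:

1. Comprehensive data connectivity:
LangChain offers connectors, but it may not cover all the workspace apps and data formats that businesses rely on. Nanonets provides data connectors for over 100 widely used workspace apps, including Slack, Notion, Google Suite, Salesforce, Zendesk, and more. It also supports unstructured data types such as PDFs, TXT files, images, audio files, and video files, as well as structured data types such as CSVs, spreadsheets, MongoDB, and SQL databases.

2. Task automation for workspace apps:
While text and response generation works well, LangChain's capabilities are limited when it comes to using natural language to perform tasks across applications. Nanonets offers trigger/action agents for the most popular workspace apps, allowing you to set up workflows that listen for events and perform actions. For example, you can automate email responses, CRM entries, SQL queries, and more through natural language commands.

3. Real-time data sync:
LangChain fetches static data through its data connectors, which may not keep up with changes in the source database. In contrast, Nanonets keeps data sources synchronized in real time, so you are always working with the latest information.

4. Simplified configuration:
Configuring the elements of a LangChain pipeline, such as retrievers and synthesizers, can be a complex and time-consuming process. Nanonets streamlines this by providing optimized data ingestion and indexing for each data type, all handled in the background by the AI assistant. This reduces the burden of fine-tuning and makes the system easier to set up and use.

5. Unified solution:
Unlike LangChain, which may require a separate implementation for each task, Nanonets serves as a one-stop solution for connecting your data with LLMs. Whether you need to create LLM applications or AI workflows, Nanonets offers a unified platform for these needs.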

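Nanonets AI Workflows

Nanonets Workflows is a secure, multi-purpose AI assistant that simplifies the integration of your knowledge and data with LLMs and facilitates the creation of no-code applications and workflows. It offers an easy-to-use user interface, making it accessible to both individuals and organizations.

To get started, you can schedule a call with one of our AI experts, who can provide a personalized demo and trial of Nanonets Workflows tailored to your specific use case.

Once set up, you can use natural language to design and execute complex applications and workflows powered by LLMs, integrating seamlessly with your apps and data.

Supercharge your team with Nanonets AI to create apps and integrate your data with AI-driven applications and workflows, so your team can focus on what truly matters.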

Source: https://nanonets.com/blog/langchain/

Timestamp: November 15, 2023

