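Tool calls (aka function calls)
Learn how to implement tool calls, manage the agentic loop, and integrate human-in-the-loop interactions using the Firebase AI Logic SDK.
Although large language models (LLMs) are trained on a vast amount of internet data, they don't know everything. They only know about the public internet as of their training cutoff date, and nothing newer. They also don't know anything about your private data or your organization's. And even for things they do know, they sometimes get confused.
For these scenarios and many others, we typically give an LLM one or more tools.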
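The definition of a tool
A tool has a name, a description, and a JSON schema that specifies the format of the input data the LLM supplies when it "calls" the tool. For example, if we prompt an LLM to "reduce the carbs in grandma's classic all-American breakfast recipe", it won't know what grandma's recipe is unless we provide a "lookupRecipe" tool that takes a query string and looks up recipes.
Conceptually, a tool is something we make available to the LLM for when it needs specific data or services. The LLM calls a tool by replying to the app's request with a specially formatted message that means "tool call". The tool call message contains the tool's name and its JSON arguments. The app handles the tool call and bundles the result into the next LLM request, which the LLM then responds to.
This back-and-forth can go on for a while. An app can configure a model instance with any number of tools (although LLMs do better with a small, focused set of tools whose functions don't overlap). The LLM can bundle any number of tool calls into a response and receive any number of tool results in a request. A message stack, which holds the history of request/response pairs, ties the multiple round trips of prompts and tool calls together.
When the tool calls are done, the LLM returns its final response, for example: "Here's grandma's classic all-American breakfast recipe, updated to be high in protein and low in carbs..."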
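To make the round trip concrete, here's a minimal, SDK-free sketch in plain Dart. The `ToolDefinition` and `ToolCall` classes, the `handleToolCall` function, and the recipe data are all hypothetical stand-ins invented for illustration; they aren't types from any real library.

```dart
import 'dart:convert';

/// Hypothetical stand-in for a tool definition: a name, a description, and a
/// JSON schema describing the arguments the LLM must supply.
class ToolDefinition {
  const ToolDefinition(this.name, this.description, this.parameters);
  final String name;
  final String description;
  final Map<String, Object?> parameters;

  Map<String, Object?> toJson() => {
        'name': name,
        'description': description,
        'parameters': parameters,
      };
}

/// Hypothetical stand-in for the "tool call" message an LLM replies with.
class ToolCall {
  const ToolCall(this.name, this.args);
  final String name;
  final Map<String, Object?> args;
}

const lookupRecipe = ToolDefinition(
  'lookupRecipe',
  'Looks up a recipe by a free-text query.',
  {
    'type': 'object',
    'properties': {
      'query': {'type': 'string', 'description': 'What to search for.'},
    },
    'required': ['query'],
  },
);

/// The app-side handler: runs the tool and returns a JSON-encodable result.
Map<String, Object?> handleToolCall(ToolCall call) {
  switch (call.name) {
    case 'lookupRecipe':
      // A real app would query a data store here; this recipe is made up.
      return {'query': call.args['query'], 'recipe': 'eggs, bacon, toast'};
    default:
      return {'error': 'Unknown tool: ${call.name}'};
  }
}

void main() {
  // 1. The app sends the prompt along with the tool definitions.
  print(jsonEncode(lookupRecipe.toJson()));
  // 2. The LLM replies with a tool call instead of a final answer.
  const call = ToolCall('lookupRecipe', {'query': "grandma's breakfast"});
  // 3. The app runs the tool and bundles the result into the next request.
  print(jsonEncode(handleToolCall(call)));
}
```

Returning an `error` entry for an unknown tool, rather than throwing, gives the LLM something it can read and react to in the next request.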
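Gemini functions
In the Firebase AI Logic SDK, tools are called "functions", but they're the same thing. In the sample, the clue solver model is configured with a function for looking up the details of a word. If the LLM wants details about a word to help with solving, calling that function fetches the data from the Free Dictionary API. A lookup for "tool" returns JSON like this (truncated):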
[
  {
    "word": "tool",
    "phonetic": "/tuːl/",
    "phonetics": [
      {
        "text": "/tuːl/",
        "audio": "https://api.dictionaryapi.dev/media/pronunciations/en/tool-uk.mp3",
        "sourceUrl": "https://commons.wikimedia.org/w/index.php?curid=94709459",
        "license": {
          "name": "BY-SA 4.0",
          "url": "https://creativecommons.org/licenses/by-sa/4.0"
        }
      }
    ],
    "meanings": [
      {
        "partOfSpeech": "noun",
        "definitions": [
          {
            "definition": "A mechanical device intended to make a task easier.",
            "synonyms": [],
            "antonyms": [],
            "example": "Hand me that tool, would you? I don't have the right tools to start fiddling around with the engine."
          },
          ...
The app has a Dart function that performs the lookup:
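// Imports this snippet relies on (they belong at the top of the file).
import 'dart:convert';

import 'package:http/http.dart' as http;

// Look up the metadata for a word in the dictionary API.
Future<Map<String, dynamic>> _getWordMetadataFromApi(String word) async {
  final url = Uri.parse(
    'https://api.dictionaryapi.dev/api/v2/entries/en/${Uri.encodeComponent(word)}',
  );
  final response = await http.get(url);
  // Wrap the payload so the model receives either a result or an error.
  return response.statusCode == 200
      ? {'result': jsonDecode(response.body)}
      : {'error': 'Could not find a definition for "$word".'};
}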
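The lookup function hands the whole payload to the model. If you wanted to inspect it on the app side instead, a small helper could pull out just the parts of speech. This is a sketch under assumptions: `partsOfSpeech` is a hypothetical helper that isn't part of the sample, and the inline JSON is a trimmed, illustrative payload shaped like the response above.

```dart
import 'dart:convert';

/// Collects the distinct parts of speech from a payload shaped like the
/// sample response above: a list of entries, each with a `meanings` list.
/// Anything that doesn't match that shape is skipped rather than thrown on.
Set<String> partsOfSpeech(Object? payload) {
  final result = <String>{};
  if (payload is! List) return result;
  for (final entry in payload) {
    if (entry is! Map) continue;
    final meanings = entry['meanings'];
    if (meanings is! List) continue;
    for (final meaning in meanings) {
      if (meaning is Map && meaning['partOfSpeech'] is String) {
        result.add(meaning['partOfSpeech'] as String);
      }
    }
  }
  return result;
}

void main() {
  // Trimmed, illustrative payload; not a verbatim API response.
  final payload = jsonDecode('''
[{"word": "tool", "meanings": [
  {"partOfSpeech": "noun", "definitions": []},
  {"partOfSpeech": "verb", "definitions": []}
]}]''');
  print(partsOfSpeech(payload)); // {noun, verb}
}
```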
The model is configured with the lookup function when it's initialized:
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  systemInstruction: Content.text(clueSolverSystemInstruction),
  tools: [
    Tool.functionDeclarations([
      FunctionDeclaration(
        'getWordMetadata',
        'Gets grammatical metadata for a word, like its part of speech. '
        'Best used to verify a candidate answer against a clue that implies a '
        'grammatical constraint.',
        parameters: {
          'word': Schema(SchemaType.string, description: 'The word to look up.'),
        },
      ),
    ]),
  ],
);
For reliability, it's best to also list the tools in the system instruction:
static String get clueSolverSystemInstruction =>
'''
You are an expert crossword puzzle solver.
...
### Tool: `getWordMetadata`
You have a tool to get grammatical information about a word.
**When to use:**
- This tool is most helpful as a verification step after you have a likely answer.
- Consider using this tool when a clue contains a grammatical hint that could be ambiguous.
- **Good candidates for verification:**
- Clues that seem to be verbs (e.g., "To run," "Waving").
- Clues that are adverbs (e.g., "Happily," "Quickly").
- Clues that specify a plural form.
- **Try to avoid using the tool for:**
- Simple definitions (e.g., "A small dog").
- Fill-in-the-blank clues (e.g., "___ and flow").
- Proper nouns (e.g., "Capital of France").
**Function signature:**
```json
${jsonEncode(_getWordMetadataFunction.toJson())}
```
''';
Now, when the app makes a request, the model has a tool it can use whenever it thinks it would help. To support tool calls, we need to implement an agentic loop.
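The agentic loop
LLMs are functionally stateless, which means you must supply all of the data they need on every request. For a request that consists only of a prompt and any files you want to send, the Firebase AI Logic SDK exposes the generateContent method on your model instance.
Tool calls, however, require keeping a message history that holds the initial prompt along with the response/request pairs that make up the tool calls and tool results. To support this, Firebase AI Logic provides a "chat" object that collects the history. We use it to build the agentic loop:
- Start a chat session to hold the message history across multiple request/response pairs
- Collect the tool results for any tool calls the model's response contains
- Bundle the tool results into a new request
- Loop until the model gives a response with no tool calls
- Return the text accumulated across all of the responses
Here's that algorithm implemented as an extension method on the GenerativeModel class, so that we can call it just like generateContent: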
extension on GenerativeModel {
  Future<String> generateContentWithFunctions({
    required String prompt,
    required Future<Map<String, dynamic>> Function(FunctionCall) onFunctionCall,
  }) async {
    // Use a chat session to support multiple request/response pairs, which is
    // needed to support function calls.
    final chat = startChat();
    final buffer = StringBuffer();
    var response = await chat.sendMessage(Content.text(prompt));

    while (true) {
      // Append the response text to the buffer.
      buffer.write(response.text ?? '');

      // If the response contains no function calls, we're done.
      if (response.functionCalls.isEmpty) break;

      // Append a newline to separate responses.
      buffer.write('\n');

      // Execute all function calls, reporting failures back to the model.
      final functionResponses = <FunctionResponse>[];
      for (final functionCall in response.functionCalls) {
        try {
          functionResponses.add(
            FunctionResponse(
              functionCall.name,
              await onFunctionCall(functionCall),
            ),
          );
        } catch (ex) {
          functionResponses.add(
            FunctionResponse(functionCall.name, {'error': ex.toString()}),
          );
        }
      }

      // Send the function results and get the next response.
      response = await chat.sendMessage(
        Content.functionResponses(functionResponses),
      );
    }

    return buffer.toString();
  }
}
The method takes a prompt and a callback for handling specific tool calls. The sample uses the callback to handle the word lookup function:
await _clueSolverModel.generateContentWithFunctions(
  prompt: getSolverPrompt(clue, length, pattern),
  onFunctionCall: (functionCall) async => switch (functionCall.name) {
    'getWordMetadata' => await _getWordMetadataFromApi(
      functionCall.args['word'] as String,
    ),
    _ => throw Exception('Unknown function call: ${functionCall.name}'),
  },
);
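To watch the loop's control flow without a network call, the same logic can be run against a scripted fake. The sketch below is an assumption-laden illustration in plain Dart: `FakeCall`, `FakeResponse`, `FakeChat`, and `runLoop` are hypothetical stand-ins rather than Firebase AI Logic types, and the canned responses are invented.

```dart
/// Hypothetical, SDK-free stand-ins used only to exercise the loop logic.
class FakeCall {
  const FakeCall(this.name, this.args);
  final String name;
  final Map<String, Object?> args;
}

class FakeResponse {
  const FakeResponse({this.text, this.calls = const []});
  final String? text;
  final List<FakeCall> calls;
}

/// A scripted "model": returns the next canned response on every send and
/// records what it was sent, the way a chat session accumulates history.
class FakeChat {
  FakeChat(this._script);
  final List<FakeResponse> _script;
  final history = <Object?>[];
  var _next = 0;

  Future<FakeResponse> send(Object? message) async {
    history.add(message);
    return _script[_next++];
  }
}

/// Same control flow as the extension method above, against the fake chat.
Future<String> runLoop(
  FakeChat chat,
  String prompt,
  Future<Map<String, Object?>> Function(FakeCall) onCall,
) async {
  final buffer = StringBuffer();
  var response = await chat.send(prompt);
  while (true) {
    buffer.write(response.text ?? '');
    if (response.calls.isEmpty) break; // no tool calls: final answer
    buffer.write('\n');
    final results = <Map<String, Object?>>[];
    for (final call in response.calls) {
      try {
        results.add({'name': call.name, 'response': await onCall(call)});
      } catch (ex) {
        results.add({'name': call.name, 'response': {'error': '$ex'}});
      }
    }
    response = await chat.send(results); // tool results become the request
  }
  return buffer.toString();
}

Future<void> main() async {
  final chat = FakeChat(const [
    FakeResponse(calls: [FakeCall('getWordMetadata', {'word': 'tool'})]),
    FakeResponse(text: 'TOOL'),
  ]);
  final text = await runLoop(
    chat,
    'Clue: "Hammer, e.g." (4 letters)',
    (call) async => {'partOfSpeech': 'noun'},
  );
  print(text.trim()); // TOOL
  print(chat.history.length); // 2
}
```

The first turn has no text, so the accumulated string starts with the separator newline; that's why the usage example trims it.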
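Structured output makes LLMs easy to program against, but it's tools that turn an LLM into an "agent" (see the section on interaction modes for more on this).
Structured output and tool calls
Combining structured output with tool calls is powerful. In the sample, the clue solver has a tool for looking up word details. It's also asked to return JSON containing the solution and a confidence score, both of which are shown in the app's task list.
Unfortunately, at the time of writing, using structured output and functions together in the Firebase AI Logic SDK throws an exception: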
Function calling with a response mime type: 'application/json' is unsupported
As a (hopefully temporary) workaround, the sample removes the structured output configuration and emulates structured output with a tool named returnResult:
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  systemInstruction: Content.text(clueSolverSystemInstruction),
  tools: [
    Tool.functionDeclarations([
      ...,
      FunctionDeclaration(
        'returnResult',
        'Returns the final result of the clue solving process.',
        parameters: {
          'answer': Schema(
            SchemaType.string,
            description: 'The answer to the clue.',
          ),
          'confidence': Schema(
            SchemaType.number,
            description: 'The confidence score in the answer from 0.0 to 1.0.',
          ),
        },
      ),
    ]),
  ],
);
The returnResult function is also described in the system instruction:
static String get clueSolverSystemInstruction =>
'''
You are an expert crossword puzzle solver.
...
### Tool: `returnResult`
You have a tool to return the final result of the clue solving process.
**When to use:**
- Use this tool when you have a final answer and confidence score to return. You
must use this tool exactly once, and only once, to return the final result.
**Function signature:**
```json
${jsonEncode(_returnResultFunction.toJson())}
```
''';
When the model calls returnResult, the sample caches the result, and solveClue reads it after generateContentWithFunctions returns:
// Buffer for the result of the clue solving process.
final _returnResult = <String, dynamic>{};

// Cache the return result of the clue solving process via a function call.
// This is how we get JSON responses from the model with functions, since the
// model cannot return JSON directly when tools are used.
Map<String, dynamic> _cacheReturnResult(Map<String, dynamic> returnResult) {
  assert(_returnResult.isEmpty);
  _returnResult.addAll(returnResult);
  return {'status': 'success'};
}

Future<ClueAnswer?> solveClue(Clue clue, int length, String pattern) async {
  // Clear the return result cache; this is where the result will be stored.
  _returnResult.clear();

  // Generate JSON response with functions and schema.
  await _clueSolverModel.generateContentWithFunctions(
    prompt: getSolverPrompt(clue, length, pattern),
    onFunctionCall: (functionCall) async => switch (functionCall.name) {
      'getWordMetadata' => ...,
      'returnResult' => _cacheReturnResult(functionCall.args),
      _ => throw Exception('Unknown function call: ${functionCall.name}'),
    },
  );

  // Use the structured output that the LLM passed to returnResult.
  assert(_returnResult.isNotEmpty);
  return ClueAnswer(
    answer: _returnResult['answer'] as String,
    confidence: (_returnResult['confidence'] as num).toDouble(),
  );
}
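Dart asserts are ignored in production builds, so if the model never calls returnResult, or calls it with arguments of the wrong type, the casts in solveClue would throw. One hedged alternative is to validate the arguments explicitly and return null on a mismatch, which fits solveClue's nullable return type. In this sketch, `ParsedAnswer` and `parseReturnResult` are hypothetical names, not part of the sample.

```dart
/// Hypothetical result type mirroring the `answer`/`confidence` schema.
class ParsedAnswer {
  const ParsedAnswer(this.answer, this.confidence);
  final String answer;
  final double confidence;
}

/// Returns a parsed answer, or null when the arguments don't match the
/// schema. The confidence is clamped into the documented 0.0 to 1.0 range.
ParsedAnswer? parseReturnResult(Map<String, Object?> args) {
  final answer = args['answer'];
  final confidence = args['confidence'];
  if (answer is! String || answer.isEmpty) return null;
  if (confidence is! num) return null;
  return ParsedAnswer(answer, confidence.clamp(0.0, 1.0).toDouble());
}

void main() {
  final ok = parseReturnResult({'answer': 'TOOL', 'confidence': 0.9});
  print('${ok?.answer} ${ok?.confidence}'); // TOOL 0.9

  // Missing confidence: rejected instead of throwing on a cast.
  print(parseReturnResult({'answer': 'TOOL'})); // null
}
```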
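Combining structured output and tool calls with Firebase AI Logic takes a little extra work, but the results are worth it!
Human in the loop
So far, we've seen tools used to gather data and to format output. We can also use them to bring a human into the process.
For example, the sample sometimes passes in a pattern that the solution should follow, such as "_R_Y", and the model proposes an answer that doesn't fit it, such as "RENT". A conflict like this is a good time to ask the user for help.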
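What counts as a conflict can be pinned down in a few lines. The sketch below is an app-side illustration only: `matchesPattern` is a hypothetical helper, and it assumes `_` marks an unknown cell and that spaces in the pattern are insignificant.

```dart
/// Returns true when [answer] fits [pattern], where `_` matches any letter.
/// Spaces in the pattern are ignored, so `_ R _ Y` and `_R_Y` are equivalent.
bool matchesPattern(String answer, String pattern) {
  final cells = pattern.replaceAll(' ', '').toUpperCase();
  final letters = answer.toUpperCase();
  if (cells.length != letters.length) return false;
  for (var i = 0; i < cells.length; i++) {
    // A known cell must hold exactly the same letter as the answer.
    if (cells[i] != '_' && cells[i] != letters[i]) return false;
  }
  return true;
}

void main() {
  print(matchesPattern('RENT', '_R_Y')); // false
  print(matchesPattern('ARMY', '_ R _ Y')); // true
}
```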
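This is called "human in the loop", and it's another way for humans and LLMs to collaborate. Flutter and the Firebase AI Logic SDK make it easy to implement. First, the sample defines a function and configures the model: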
// The new function to let the LLM resolve solution conflicts
static final _resolveConflictFunction = FunctionDeclaration(
  'resolveConflict',
  'Asks the user to resolve a conflict between the letter pattern and the '
  'proposed answer. Use this BEFORE calling returnResult if the answer you '
  'want to propose does not match the letter pattern.',
  parameters: {
    'proposedAnswer': Schema(
      SchemaType.string,
      description: 'The answer the LLM wants to suggest.',
    ),
    'pattern': Schema(
      SchemaType.string,
      description: 'The current letter pattern from the grid.',
    ),
    'clue': Schema(SchemaType.string, description: 'The clue text.'),
  },
);

// Pass the new tool to the model for solving clues.
final _clueSolverModel = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  systemInstruction: Content.text(clueSolverSystemInstruction),
  tools: [
    Tool.functionDeclarations([
      ...
      _resolveConflictFunction,
    ]),
  ],
);

// Let the LLM know that it has a new tool.
static String get clueSolverSystemInstruction =>
'''
You are an expert crossword puzzle solver.
...
### Tool: `resolveConflict`
You have a tool to ask the user to resolve a conflict.
**When to use:**
- Use this tool **BEFORE** `returnResult` if your proposed answer conflicts with the provided letter pattern.
- For example, if the pattern is `_ R _ Y` and you want to suggest `RENT` (which fits the clue), there is a conflict at the second letter (`R` vs `E`). You should call `resolveConflict(proposedAnswer: "RENT", pattern: "_ R _ Y", clue: "...")`.
- The tool will return the user's decision (either your proposed answer or a new one). You should then use that result to call `returnResult`.
**Function signature:**
```json
${jsonEncode(_resolveConflictFunction.toJson())}
```
''';
Now, when the model detects a conflict, it calls the tool:
// Handle the LLM's request to resolve the conflict
await _clueSolverModel.generateContentWithFunctions(
  prompt: getSolverPrompt(clue, length, pattern),
  onFunctionCall: (functionCall) async => switch (functionCall.name) {
    ...
    'resolveConflict' => await _handleResolveConflict(
      functionCall.args,
      onConflict,
    ),
  },
);

// Show the dialog to gather the user's input
Future<Map<String, dynamic>> _handleResolveConflict(
  Map<String, dynamic> args,
  Future<String> Function(String clue, String proposedAnswer, String pattern)?
  onConflict,
) async {
  final proposedAnswer = args['proposedAnswer'] as String;
  final pattern = args['pattern'] as String;
  final clue = args['clue'] as String;

  if (onConflict != null) {
    final result = await onConflict(clue, proposedAnswer, pattern);
    return {'result': result};
  }

  return {'result': proposedAnswer};
}
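The sample handles the tool through its implementation of the onConflict callback, which calls showDialog to gather data from the user. All of this happens in the middle of the agentic loop, but that's fine: the model isn't waiting; it has already sent its response to the app's initial request. The user can take their time in the UI while the sample awaits the Future returned by showDialog. When the user is done, the model picks up where it left off, using the message history and the latest request (in this case, the user data gathered interactively).
A modal dialog is a simple way to put a human in the loop, but it's not the only way to do it in Flutter. If you prefer, you can use a Completer instance and set app state that puts the app into a "gathering data from the user" mode. Once the app has the data, it can call complete on the Completer and resume the agentic loop.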
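A minimal sketch of the Completer approach might look like the following. `ConflictPrompt` and its members are hypothetical names made up for this example; in a real app the `isWaiting` flag would typically live in whatever state management the app already uses.

```dart
import 'dart:async';

/// Hypothetical app-state holder that parks the agentic loop until the UI
/// supplies the user's decision.
class ConflictPrompt {
  Completer<String>? _pending;

  /// True while the app should show its "gather data from the user" UI.
  bool get isWaiting => _pending != null;

  /// Called from the tool handler; the returned future completes when the
  /// user answers.
  Future<String> askUser() {
    final completer = Completer<String>();
    _pending = completer;
    return completer.future;
  }

  /// Called by the UI when the user submits a decision.
  void submit(String answer) {
    final completer = _pending;
    _pending = null;
    completer?.complete(answer);
  }
}

Future<void> main() async {
  final prompt = ConflictPrompt();
  final pending = prompt.askUser(); // the tool handler would await this
  print(prompt.isWaiting); // true
  prompt.submit('ARMY'); // for example, from a button's onPressed
  print(await pending); // ARMY
  print(prompt.isWaiting); // false
}
```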
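Alternatively, since you control the agentic loop, you can check for calls to a "special" function that signals the need to gather data from the user. This kind of special function is sometimes called an "interrupt", and once you have the user's data you can "resume" the conversation with the model.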
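One way to picture the interrupt check is as a partition of a turn's tool calls into those the app can run right away and those that must wait for the user. This is only a sketch: `PendingCall`, `interruptTools`, and `partitionCalls` are hypothetical names, and treating resolveConflict as the interrupt is an assumption made for the example.

```dart
/// Hypothetical, SDK-free record of a tool call requested by the model.
class PendingCall {
  const PendingCall(this.name, this.args);
  final String name;
  final Map<String, Object?> args;
}

/// Tools that need the user before they can produce a result (assumed).
const interruptTools = {'resolveConflict'};

/// Splits one turn's tool calls into calls to run now and calls that
/// interrupt the loop until the user has responded.
({List<PendingCall> runNow, List<PendingCall> needsUser}) partitionCalls(
  List<PendingCall> calls,
) {
  final runNow = <PendingCall>[];
  final needsUser = <PendingCall>[];
  for (final call in calls) {
    (interruptTools.contains(call.name) ? needsUser : runNow).add(call);
  }
  return (runNow: runNow, needsUser: needsUser);
}

void main() {
  final split = partitionCalls(const [
    PendingCall('getWordMetadata', {'word': 'rent'}),
    PendingCall('resolveConflict', {'proposedAnswer': 'RENT', 'pattern': '_R_Y'}),
  ]);
  print(split.runNow.map((c) => c.name).toList()); // [getWordMetadata]
  print(split.needsUser.map((c) => c.name).toList()); // [resolveConflict]
}
```

When `needsUser` is non-empty, the loop can stop, let the app gather the data, and later send the results for all of the turn's calls in a single request.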
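Remember, LLMs are stateless. The model isn't waiting on you, so you can handle the agentic loop in whatever way makes the most sense for your app. Whether it's a minute later or a month later, you can always come back to the LLM with the updated message history and a new prompt.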
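Because everything the model needs travels in the request, coming back later mostly means keeping the message history somewhere durable. The sketch below shows the idea with plain JSON. `Message`, `saveHistory`, and `loadHistory` are hypothetical, SDK-free stand-ins; how the real SDK's content types would be serialized and fed back into a chat session isn't covered here.

```dart
import 'dart:convert';

/// Hypothetical, SDK-free message record: a role plus a JSON-encodable body.
class Message {
  const Message(this.role, this.body);
  final String role;
  final Map<String, Object?> body;

  Map<String, Object?> toJson() => {'role': role, 'body': body};

  static Message fromJson(Map<String, Object?> json) => Message(
        json['role'] as String,
        (json['body'] as Map).cast<String, Object?>(),
      );
}

/// Serializes the history; jsonEncode calls toJson on each message.
String saveHistory(List<Message> history) => jsonEncode(history);

/// Rebuilds the history from a string produced by [saveHistory].
List<Message> loadHistory(String saved) => [
      for (final item in jsonDecode(saved) as List)
        Message.fromJson((item as Map).cast<String, Object?>()),
    ];

void main() {
  final saved = saveHistory(const [
    Message('user', {'text': 'Solve: "Hammer, e.g." (4 letters)'}),
    Message('model', {'functionCall': 'resolveConflict'}),
  ]);

  // ...a minute or a month later...
  final restored = loadHistory(saved);
  print(restored.length); // 2
  print(restored.last.role); // model
}
```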