Tool calls (aka function calls)

Learn how to implement tool calls, manage the agentic loop, and integrate human-in-the-loop interactions using the Firebase AI Logic SDK.

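While large language models (LLMs) are trained on data from across the internet, they don't know everything. They know only the public internet as of their training cutoff date and nothing newer. They also know nothing about private data that belongs to you or your organization. Even with information they do know, they sometimes get confused.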

For these scenarios and many others, we typically give the LLM one or more tools.

Definition of a tool

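A tool has a name, a description, and a JSON schema that defines the format of the input data when the LLM "calls" the tool. For example, if we prompt the LLM to "reduce the carbs in Grandma's classic all-American breakfast recipe", it won't know what Grandma's recipe is unless we provide a "lookupRecipe" tool that takes a query string and looks up recipes.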

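Conceptually, a tool is something we give the LLM to use when it needs specific data or a service. The LLM calls a tool by replying to the app's request with a specially formatted message that means "tool call". The tool call message contains the tool's name and its JSON arguments. The app handles the tool call and bundles the result into the next LLM request, which the LLM then responds to.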

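This can go on for a while. The app can configure a model instance with any number of tools (although LLMs do better with a small set of focused tools whose functionality doesn't overlap). The LLM can bundle any number of tool calls into its response and receive any number of tool results in a request. The round trips of prompts and tool calls are tied together by a stack of messages that makes up the history of request/response pairs.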

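When the tool calls are done, the LLM returns its final response, for example: "Here's Grandma's classic all-American breakfast recipe, updated to be high in protein and low in carbs..."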
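To make the round trip concrete, here is a minimal sketch of the message stack for the recipe example. The message shape (`role`, `toolCall`, `toolResult`), the `lookupRecipe` helper, and the recipe data are all placeholders invented for illustration; they are not the wire format or API of any particular SDK.

```dart
import 'dart:convert';

/// Hypothetical stand-in for the app's recipe lookup; returns placeholder data.
Map<String, Object?> lookupRecipe(String query) => {
      'title': "Grandma's Classic All-American Breakfast",
      'ingredients': ['pancakes', 'eggs', 'bacon', 'hash browns'],
    };

/// Builds the message stack for one tool-call round trip, oldest first.
List<Map<String, Object?>> buildHistory() {
  final history = <Map<String, Object?>>[
    {'role': 'user', 'text': "Reduce the carbs in Grandma's breakfast recipe."},
    // Instead of answering, the model replies with a tool call.
    {
      'role': 'model',
      'toolCall': {
        'name': 'lookupRecipe',
        'args': {'query': "Grandma's classic all-American breakfast"},
      },
    },
  ];

  // The app runs the requested tool and pushes the result onto the stack.
  final call = history.last['toolCall'] as Map<String, Object?>;
  final args = call['args'] as Map<String, Object?>;
  history.add({
    'role': 'tool',
    'toolResult': {
      'name': call['name'],
      'result': lookupRecipe(args['query'] as String),
    },
  });

  // The next request carries the whole stack, so the model can now answer.
  return history;
}

void main() {
  print(const JsonEncoder.withIndent('  ').convert(buildHistory()));
}
```

The point of the sketch is that the app, not the model, owns the stack: each request replays the full history, including the tool call and its result.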

Gemini functions

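In the Firebase AI Logic SDK, tools are called "functions", but they're the same thing. In the sample, the clue solver model is configured with a function that looks up details about a word. If the LLM needs details about a word to help it solve a clue, it can call the function, which fetches data from the Free Dictionary API.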

json
[
  {
    "word": "tool",
    "phonetic": "/tuːl/",
    "phonetics": [
      {
        "text": "/tuːl/",
        "audio": "https://api.dictionaryapi.dev/media/pronunciations/en/tool-uk.mp3",
        "sourceUrl": "https://commons.wikimedia.org/w/index.php?curid=94709459",
        "license": {
          "name": "BY-SA 4.0",
          "url": "https://creativecommons.org/licenses/by-sa/4.0"
        }
      }
    ],
    "meanings": [
      {
        "partOfSpeech": "noun",
        "definitions": [
          {
            "definition": "A mechanical device intended to make a task easier.",
            "synonyms": [],
            "antonyms": [],
            "example": "Hand me that tool, would you?   I don't have the right tools to start fiddling around with the engine."
          },
...

The app has a Dart function that performs the lookup:

dart
import 'dart:convert';

import 'package:http/http.dart' as http;

// Look up the metadata for a word in the dictionary API.
Future<Map<String, dynamic>> _getWordMetadataFromApi(String word) async {
  final url = Uri.parse(
    'https://api.dictionaryapi.dev/api/v2/entries/en/${Uri.encodeComponent(word)}',
  );

  final response = await http.get(url);
  return response.statusCode == 200
      ? {'result': jsonDecode(response.body)}
      : {'error': 'Could not find a definition for "$word".'};
}

The model is configured with the lookup function when it's initialized:

dart
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  systemInstruction: Content.text(clueSolverSystemInstruction),
  tools: [
    Tool.functionDeclarations([
      FunctionDeclaration(
        'getWordMetadata',
        'Gets grammatical metadata for a word, like its part of speech. '
        'Best used to verify a candidate answer against a clue that implies a '
        'grammatical constraint.',
        parameters: {
          'word': Schema(SchemaType.string, description: 'The word to look up.'),
        },
      ),
    ]),
  ],
);

For reliability, it's a good idea to also list the tools in the system instruction:

dart
static String get clueSolverSystemInstruction =>
    '''
You are an expert crossword puzzle solver.

...

### Tool: `getWordMetadata`

You have a tool to get grammatical information about a word.

**When to use:**
- This tool is most helpful as a verification step after you have a likely answer.
- Consider using this tool when a clue contains a grammatical hint that could be ambiguous.
- **Good candidates for verification:**
  - Clues that seem to be verbs (e.g., "To run," "Waving").
  - Clues that are adverbs (e.g., "Happily," "Quickly").
  - Clues that specify a plural form.
- **Try to avoid using the tool for:**
  - Simple definitions (e.g., "A small dog").
  - Fill-in-the-blank clues (e.g., "___ and flow").
  - Proper nouns (e.g., "Capital of France").

**Function signature:**
```json
${jsonEncode(_getWordMetadataFunction.toJson())}
```
''';

Now when the app makes a request, the model has a tool that it can use whenever it considers it helpful. To support tool calls, we need to implement an agentic loop.

Agentic loop

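LLMs are functionally stateless, which means you must supply all the data they need with every request. For requests that consist only of a prompt and any files you want to send, the Firebase AI Logic SDK exposes the generateContent method on your model instance.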

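Tool calls, however, require keeping the message history that makes up the initial prompt, along with the response/request pairs that make up the tool calls and tool results. To support this, Firebase AI Logic provides a "chat" object that gathers the history. We use it to build the agentic loop:

  • Start a chat session to hold the message history across multiple request/response pairs
  • Collect a tool result for each tool call in the model's response
  • Bundle the tool results into a new request
  • Loop until the model responds without any tool calls
  • Return the text accumulated across all responses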

Here's that algorithm implemented as an extension method on the GenerativeModel class, so that we can call it just like generateContent:

dart
extension on GenerativeModel {
  Future<String> generateContentWithFunctions({
    required String prompt,
    required Future<Map<String, dynamic>> Function(FunctionCall) onFunctionCall,
  }) async {
    // Use a chat session to support multiple request/response pairs, which is
    // needed to support function calls.
    final chat = startChat();
    final buffer = StringBuffer();
    var response = await chat.sendMessage(Content.text(prompt));

    while (true) {
      // Append the response text to the buffer.
      buffer.write(response.text ?? '');

      // If no function calls were collected, we're done
      if (response.functionCalls.isEmpty) break;

      // Append a newline to separate responses.
      buffer.write('\n');

      // Execute all function calls
      final functionResponses = <FunctionResponse>[];
      for (final functionCall in response.functionCalls) {
        try {
          functionResponses.add(
            FunctionResponse(
              functionCall.name,
              await onFunctionCall(functionCall),
            ),
          );
        } catch (ex) {
          functionResponses.add(
            FunctionResponse(functionCall.name, {'error': ex.toString()}),
          );
        }
      }

      // Send the function results and get the model's next response
      response = await chat.sendMessage(
        Content.functionResponses(functionResponses),
      );
    }

    return buffer.toString();
  }
}

The method takes a prompt and a callback for handling specific tool calls. The sample uses the callback to handle the word lookup function:

dart
await _clueSolverModel.generateContentWithFunctions(
  prompt: getSolverPrompt(clue, length, pattern),
  onFunctionCall: (functionCall) async => switch (functionCall.name) {
    'getWordMetadata' => await _getWordMetadataFromApi(
      functionCall.args['word'] as String,
    ),
    _ => throw Exception('Unknown function call: ${functionCall.name}'),
  },
);

Structured output makes LLMs easy to program against, but it's tools that turn an LLM into an "agent" (for more on this, see the "Interaction modes" section).

Structured output and tool calls

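Combining structured output with tool calls is powerful. In the sample, the clue solver has a tool for looking up word details. It's also asked to return JSON containing the solution and a confidence score, both of which are shown in the app's task list.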

App task list showing crossword clues followed by bold answers and
confidence scores in parentheses

Unfortunately, at the time of writing, using structured output and functions together in the Firebase AI Logic SDK throws an exception:

Function calling with a response mime type: 'application/json' is unsupported

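As a (hopefully temporary) workaround for this issue, the sample removes the structured output configuration and instead emulates structured output with a tool named returnResult: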

dart
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  systemInstruction: Content.text(clueSolverSystemInstruction),
  tools: [
    Tool.functionDeclarations([
      ...,
      FunctionDeclaration(
        'returnResult',
        'Returns the final result of the clue solving process.',
        parameters: {
          'answer': Schema(
            SchemaType.string,
            description: 'The answer to the clue.',
          ),
          'confidence': Schema(
            SchemaType.number,
            description: 'The confidence score in the answer from 0.0 to 1.0.',
          ),
        },
      ),
    ]),
  ],
);

The returnResult function is also described in the system instruction:

dart
static String get clueSolverSystemInstruction =>
    '''
You are an expert crossword puzzle solver.

...

### Tool: `returnResult`

You have a tool to return the final result of the clue solving process.

**When to use:**
- Use this tool when you have a final answer and confidence score to return. You
must use this tool exactly once, and only once, to return the final result.

**Function signature:**
```json
${jsonEncode(_returnResultFunction.toJson())}
```
''';

When the model calls returnResult, the sample caches the result, and solveClue reads it after the call to generateContentWithFunctions returns.

dart
// Buffer for the result of the clue solving process.
final _returnResult = <String, dynamic>{};

// Cache the return result of the clue solving process via a function call.
// This is how we get JSON responses from the model with functions, since the
// model cannot return JSON directly when tools are used.
Map<String, dynamic> _cacheReturnResult(Map<String, dynamic> returnResult) {
  assert(_returnResult.isEmpty);
  _returnResult.addAll(returnResult);
  return {'status': 'success'};
}

Future<ClueAnswer?> solveClue(Clue clue, int length, String pattern) async {
  // Clear the return result cache; this is where the result will be stored.
  _returnResult.clear();

  // Generate JSON response with functions and schema.
  await _clueSolverModel.generateContentWithFunctions(
    prompt: getSolverPrompt(clue, length, pattern),
    onFunctionCall: (functionCall) async => switch (functionCall.name) {
      'getWordMetadata' => ...,
      'returnResult' => _cacheReturnResult(functionCall.args),
      _ => throw Exception('Unknown function call: ${functionCall.name}'),
    },
  );

  // Use the structured output that the LLM passed to returnResult.
  assert(_returnResult.isNotEmpty);
  return ClueAnswer(
    answer: _returnResult['answer'] as String,
    confidence: (_returnResult['confidence'] as num).toDouble(),
  );
}

Although combining structured output and tool calls with Firebase AI Logic takes a little extra work, the results are worth it!
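Because the emulated structured output arrives as plain function-call arguments, nothing guarantees that the model supplied both fields or used the expected types, and Dart's `assert` is stripped from release builds. The following is a minimal sketch of validating the arguments before use. The `parseReturnResult` helper is hypothetical, and the `ClueAnswer` class here only mirrors how the sample constructs it; neither is taken from the sample's source.

```dart
/// Hypothetical shape that mirrors how the sample constructs ClueAnswer.
class ClueAnswer {
  const ClueAnswer({required this.answer, required this.confidence});
  final String answer;
  final double confidence;
}

/// Returns null when the arguments are missing or have the wrong type,
/// rather than relying on `assert`, which only runs in debug builds.
ClueAnswer? parseReturnResult(Map<String, Object?> args) {
  final answer = args['answer'];
  final confidence = args['confidence'];
  if (answer is! String || answer.trim().isEmpty) return null;
  if (confidence is! num) return null;
  return ClueAnswer(
    answer: answer.trim(),
    // Clamp in case the model strays outside the documented 0.0 to 1.0 range.
    confidence: confidence.clamp(0.0, 1.0).toDouble(),
  );
}

void main() {
  final ok = parseReturnResult({'answer': 'PREY', 'confidence': 0.9});
  print('${ok?.answer} (${ok?.confidence})');

  // A call that omits the confidence score yields null instead of throwing.
  print(parseReturnResult({'answer': 'PREY'}));
}
```

Returning null fits the nullable return type of solveClue shown above and lets the caller decide whether to retry or show the clue as unsolved.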

Human in the loop

So far, we've seen tools used for gathering data and formatting output. We can also use them to bring a human into the process.

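For example, sometimes the sample passes in a pattern that the solution should follow, such as "_R_Y", but the model proposes an answer that doesn't fit it, such as "RENT". A conflict like this is a good time to ask the user for help.

Crossword Companion app displaying a Conflict Detected dialog asking for
user input to resolve a clue pattern

This is called "human in the loop", and it's another way for humans and LLMs to collaborate. Flutter and the Firebase AI Logic SDK make it easy to implement. First, the sample defines a function and configures the model: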

dart

// The new function to let the LLM resolve solution conflicts
static final _resolveConflictFunction = FunctionDeclaration(
  'resolveConflict',
  'Asks the user to resolve a conflict between the letter pattern and the '
  'proposed answer. Use this BEFORE calling returnResult if the answer you '
  'want to propose does not match the letter pattern.',
  parameters: {
    'proposedAnswer': Schema(
      SchemaType.string,
      description: 'The answer the LLM wants to suggest.',
    ),
    'pattern': Schema(
      SchemaType.string,
      description: 'The current letter pattern from the grid.',
    ),
    'clue': Schema(SchemaType.string, description: 'The clue text.'),
  },
);

// Pass the new tool to the model for solving clues.
final _clueSolverModel = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  systemInstruction: Content.text(clueSolverSystemInstruction),
  tools: [
    Tool.functionDeclarations([
      ...
      _resolveConflictFunction,
    ]),
  ],
);

// Let the LLM know that it has a new tool.
static String get clueSolverSystemInstruction =>
    '''
You are an expert crossword puzzle solver.

...

### Tool: `resolveConflict`

You have a tool to ask the user to resolve a conflict.

**When to use:**
- Use this tool **BEFORE** `returnResult` if your proposed answer conflicts with the provided letter pattern.
- For example, if the pattern is `_ R _ Y` and you want to suggest `RENT` (which fits the clue), there is a conflict at the second letter (`R` vs `E`). You should call `resolveConflict(proposedAnswer: "RENT", pattern: "_ R _ Y", clue: "...")`.
- The tool will return the user's decision (either your proposed answer or a new one). You should then use that result to call `returnResult`.

**Function signature:**
```json
${jsonEncode(_resolveConflictFunction.toJson())}
```
''';

Now, when the model detects a conflict, it calls the tool:

dart
// handle the LLM's request to resolve the conflict
await _clueSolverModel.generateContentWithFunctions(
  prompt: getSolverPrompt(clue, length, pattern),
  onFunctionCall: (functionCall) async => switch (functionCall.name) {
    ...
    'resolveConflict' => await _handleResolveConflict(
      functionCall.args,
      onConflict,
    ),
  },
);

// Show the dialog to gather the user's input
Future<Map<String, dynamic>> _handleResolveConflict(
  Map<String, dynamic> args,
  Future<String> Function(String clue, String proposedAnswer, String pattern)?
  onConflict,
) async {
  final proposedAnswer = args['proposedAnswer'] as String;
  final pattern = args['pattern'] as String;
  final clue = args['clue'] as String;

  if (onConflict != null) {
    final result = await onConflict(clue, proposedAnswer, pattern);
    return {'result': result};
  }

  return {'result': proposedAnswer};
}

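The sample handles the tool through an implementation of the onConflict callback, which calls showDialog to gather data from the user. This all happens in the middle of the agentic loop, but that's fine: the model isn't waiting, because it has already sent its response to the app's original request. The user can take their time with the UI while the sample awaits the Future returned by showDialog. When the user is done, the model picks up where it left off using the message history and the latest request (in this case, the interactively gathered user data).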

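A modal dialog is a simple way to do human in the loop, but it's not the only way to do it in Flutter. If you prefer, you can use a Completer instance and set app state that puts the app into a "gather data from the user" mode. Once the app has the data, it can call complete on the Completer and resume the agentic loop.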
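As a rough illustration of the Completer approach, the sketch below uses only `dart:async`. The `ConflictPrompt` class and its member names are hypothetical and do not come from the sample. In a real Flutter app, `isWaiting` would drive the UI, for example by notifying listeners so that an input panel appears.

```dart
import 'dart:async';

/// Hypothetical app-state holder; the names are illustrative only.
class ConflictPrompt {
  Completer<String>? _pending;

  /// True while the app is in "gather data from the user" mode.
  bool get isWaiting => _pending != null;

  /// Called from the tool-call handler; the agentic loop awaits this Future.
  Future<String> askUser() {
    final completer = Completer<String>();
    _pending = completer;
    return completer.future;
  }

  /// Called by the UI once the user has chosen an answer.
  void submit(String answer) {
    final completer = _pending;
    if (completer == null) return; // Nothing is waiting, so ignore the call.
    _pending = null;
    completer.complete(answer);
  }
}

Future<void> main() async {
  final prompt = ConflictPrompt();

  // The tool handler starts waiting; nothing is sent to the model yet.
  final answer = prompt.askUser();
  print('Waiting for user: ${prompt.isWaiting}');

  // Later, the UI delivers the user's choice...
  prompt.submit('PREY');

  // ...and the loop resumes with a tool result to send back to the model.
  print({'result': await answer});
}
```

Compared with a modal dialog, this keeps the waiting state in the app's own model, so any widget can render the prompt and complete it.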

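Alternatively, since you control the agentic loop, you can check for a call to a "special" function that signals the need to gather data from the user. This kind of special function is sometimes called an "interrupt", and once you have the user's data, you can "resume" the conversation with the model.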
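One possible shape for the interrupt check is sketched below in plain Dart. `ToolCall`, `StepResult`, `handleCalls`, and the `interruptNames` set are all hypothetical names for this illustration; a real loop would work with the SDK's own function-call type and would persist the message history alongside the pending call.

```dart
/// Minimal stand-in for a tool call; the real SDK has its own FunctionCall type.
class ToolCall {
  const ToolCall(this.name, this.args);
  final String name;
  final Map<String, Object?> args;
}

/// Outcome of handling the tool calls in one model response.
sealed class StepResult {}

/// Every call was handled; the results can be sent back to the model.
class Completed extends StepResult {
  Completed(this.results);
  final List<Map<String, Object?>> results;
}

/// An interrupt call was found; the loop pauses until the user responds.
class Interrupted extends StepResult {
  Interrupted(this.call, this.resultsSoFar);
  final ToolCall call;
  final List<Map<String, Object?>> resultsSoFar;
}

/// Tool names treated as interrupts (an assumption for this sketch).
const interruptNames = {'resolveConflict'};

StepResult handleCalls(
  List<ToolCall> calls,
  Map<String, Object?> Function(ToolCall call) runTool,
) {
  final results = <Map<String, Object?>>[];
  for (final call in calls) {
    // Stop at the first interrupt and keep what has been computed so far.
    if (interruptNames.contains(call.name)) return Interrupted(call, results);
    results.add({'name': call.name, 'response': runTool(call)});
  }
  return Completed(results);
}

void main() {
  final step = handleCalls(const [
    ToolCall('getWordMetadata', {'word': 'rent'}),
    ToolCall('resolveConflict', {'proposedAnswer': 'RENT', 'pattern': '_R_Y'}),
  ], (call) => {'status': 'ok'});

  switch (step) {
    case Interrupted(:final call, :final resultsSoFar):
      print('Paused on ${call.name}; ${resultsSoFar.length} result(s) saved.');
    case Completed(:final results):
      print('Ready to send ${results.length} result(s) back.');
  }
}
```

When the user's answer arrives, the app would add it as the result for the interrupted call, append it to the saved results, and send them all to the model to resume.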

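Remember, LLMs are stateless. The model isn't waiting on you, so you can handle the agentic loop in whatever way makes the most sense for your app. Whether it's a minute later or a month later, you can always come back to the LLM with the updated message history and a new prompt.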