# GitHub Copilot Chat Integration
This guide describes how `@selfagency/llm-stream-parser` can be integrated with GitHub Copilot Chat extensions and chat hosts.
## Goals
- Provide composable parsing primitives for streaming LLM responses in Copilot Chat
- Enable structured output extraction (thinking, tool calls, JSON schemas)
- Support extensible stream processing for chat-based workflows
- Maintain compatibility with multiple model providers (Claude, GPT, local models via Ollama)
## Integration Patterns

### Usage with `LLMStreamProcessor`
```typescript
const processor = new LLMStreamProcessor({
  parseThinkTags: true,
  scrubContextTags: true,
  knownTools: new Set(['search', 'edit_file']),
});

processor.on('thinking', delta => {
  // Stream thinking to UI in real-time
  updateThinkingPanel(delta);
});

processor.on('text', delta => {
  // Stream content to UI
  updateContentPanel(delta);
});

processor.on('tool_call', call => {
  // Execute tool calls
  executeToolInCopilot(call.name, call.parameters);
});

// Events fire as each chunk is processed
for await (const chunk of chatStream) {
  processor.process({
    content: chunk.content,
    thinking: chunk.thinking,
    done: chunk.done,
  });
}

// Get final accumulated state
const final = processor.accumulatedMessage;
```

### Streaming in Chat UI
Process chunks immediately without buffering:
```typescript
import { ThinkingParser, createXmlStreamFilter } from '@selfagency/llm-stream-parser';

const thinking = new ThinkingParser({ openingTag: '<think>', closingTag: '</think>' });
const filter = createXmlStreamFilter({ enforcePrivacyTags: true });

for await (const chunk of chatStream) {
  // Extract thinking and regular content
  const [thinkingDelta, contentDelta] = thinking.addContent(chunk);
  if (thinkingDelta) {
    updateThinkingPanel(thinkingDelta);
  }

  // Filter context blocks before display
  const filtered = filter.write(contentDelta);
  if (filtered) {
    updateChatDisplay(filtered);
  }
}

// Finalize streams: flush buffered thinking, then pass any remaining
// content through the filter before closing it
const [finalThinking, finalContent] = thinking.flush();
updateThinkingPanel(finalThinking);
const finalFiltered = filter.write(finalContent);
if (finalFiltered) {
  updateChatDisplay(finalFiltered);
}
const tail = filter.end();
if (tail) {
  updateChatDisplay(tail);
}
```

### Tool Call Routing
Extract and execute structured tool calls:
```typescript
import { extractXmlToolCalls } from '@selfagency/llm-stream-parser';

const response = await chatCompletion(messages);
const toolCalls = extractXmlToolCalls(
  response,
  new Set(['search_codebase', 'edit_file', 'run_tests', 'execute_command']),
);

for (const call of toolCalls) {
  const result = await executeToolInHost(call.name, call.parameters);
  // Feed result back to chat context
  messages.push({ role: 'user', content: `Tool ${call.name} returned: ${result}` });
}
```

### Schema Validation with Retry
Validate structured outputs and prompt for repairs:
```typescript
import { parseJson, validateJsonSchema, buildRepairPrompt } from '@selfagency/llm-stream-parser';

const schema = {
  type: 'object',
  properties: {
    suggestions: {
      type: 'array',
      items: { type: 'string' },
    },
  },
};

let response = await chatCompletion(messages);
let parsed = parseJson(response);
if (parsed === null) {
  console.error('Failed to parse JSON');
  return;
}

let validation = validateJsonSchema(JSON.stringify(parsed), schema);
if (!validation.success) {
  // Build repair prompt
  const repairPrompt = buildRepairPrompt({
    failedOutput: response,
    error: validation.errors[0],
    schema,
    originalPrompt: messages[messages.length - 1].content,
  });

  // Ask model to fix
  messages.push({ role: 'user', content: repairPrompt });
  response = await chatCompletion(messages);
  parsed = parseJson(response);
  if (parsed === null) {
    console.error('Failed to parse JSON on retry');
    return;
  }
  validation = validateJsonSchema(JSON.stringify(parsed), schema);
}

if (validation.success) {
  return validation.data;
}
```

## Model-Specific Considerations
### Claude (Anthropic)

- Supports `<think>...</think>` natively
- Tool use via XML format
- Use `ThinkingParser` with default settings

```typescript
const processor = new LLMStreamProcessor({
  modelId: 'claude-opus', // Auto-detects thinking tags
  parseThinkTags: true,
  knownTools: new Set(toolNames),
});
```

### GPT Models (OpenAI)
- Supports `<think>...</think>` format via system prompts
- Tool calls via function calling (separate from the response)
- May need to parse tool calls from response text

```typescript
const processor = new LLMStreamProcessor({
  thinkingOpenTag: '<think>',
  thinkingCloseTag: '</think>',
  parseThinkTags: true,
  knownTools: new Set(toolNames),
});
```

### Local Models (Ollama)
- Varies by model; check model-specific documentation
- Common patterns: `<think>...</think>`, `<reasoning>...</reasoning>`
- Configure tag mapping in processor options

```typescript
const processor = new LLMStreamProcessor({
  modelId: 'deepseek', // Auto-detects for known models
  parseThinkTags: true,
  knownTools: new Set(toolNames),
});

// Custom configuration for unknown models
const processor2 = new LLMStreamProcessor({
  thinkingOpenTag: '<reasoning>',
  thinkingCloseTag: '</reasoning>',
  parseThinkTags: true,
  knownTools: new Set(toolNames),
});
```

## Safety Invariants
- **Only process trusted context** - Context blocks extracted as elevated context should only come from known sources
- **Enable privacy scrubbing** - Keep privacy-related XML tags scrubbed by default to avoid leaking sensitive data
- **Enforce limits** - Use `maxJsonDepth` and `maxJsonKeys` to prevent DoS via deeply nested structures
- **Validate at boundaries** - Always validate structured output at the chat→tool interface
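Taken together, these invariants amount to one hardened processor configuration. A minimal sketch, assuming the option names used elsewhere in this guide (`scrubContextTags`, `maxJsonDepth`, `maxJsonKeys`) and a host-defined `trustedToolNames` set:

```typescript
const processor = new LLMStreamProcessor({
  scrubContextTags: true, // keep privacy/context tags scrubbed before display
  parseThinkTags: true,
  knownTools: new Set(trustedToolNames), // only tools from known sources
  maxJsonDepth: 10,   // reject pathologically nested JSON
  maxJsonKeys: 1000,  // cap key count to bound memory use
});
```

Structured output flowing from this processor to tool execution should still pass through schema validation, as in the retry example above.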
## Feature Flags and Rollout
For safe integration into existing Copilot Chat hosts:
```typescript
// Feature flag for gradual rollout
const useLLMStreamParser = features.isEnabled('@selfagency/llm-stream-parser');
const processor = useLLMStreamParser ? new LLMStreamProcessor(config) : legacyParsingPath(config);

// Run both paths in tests for parity verification
if (process.env.VERIFY_PARITY) {
  const legacyResult = legacyParsingPath(response);
  const newResult = processor.flush();
  if (JSON.stringify(legacyResult) !== JSON.stringify(newResult)) {
    logParityMismatch('streams_not_equal', { legacyResult, newResult });
  }
}
```

## Performance Tips
- **Stream processing** - Process chunks immediately instead of buffering entire responses
- **Subpath imports** - Only import what you need: `import { ThinkingParser } from '@selfagency/llm-stream-parser/thinking'`
- **Limit tuning** - Adjust `maxJsonDepth` and `maxJsonKeys` based on expected response types
- **Caching** - Cache parsed schemas and processors across multiple chat turns
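The caching tip can be sketched with a small per-session cache. `ProcessorCache` and its eviction policy are illustrative, not part of the package:

```typescript
// Illustrative per-session cache: reuse a configured processor (or parsed
// schema) across chat turns instead of rebuilding it on every message.
class ProcessorCache<T> {
  private cache = new Map<string, T>();

  constructor(
    private factory: () => T,
    private maxEntries = 100,
  ) {}

  get(sessionId: string): T {
    let entry = this.cache.get(sessionId);
    if (entry === undefined) {
      if (this.cache.size >= this.maxEntries) {
        // Evict the oldest entry; Map preserves insertion order.
        const oldest = this.cache.keys().next().value;
        if (oldest !== undefined) this.cache.delete(oldest);
      }
      entry = this.factory();
      this.cache.set(sessionId, entry);
    }
    return entry;
  }

  evict(sessionId: string): void {
    this.cache.delete(sessionId);
  }
}
```

A host would typically construct this once, e.g. `new ProcessorCache(() => new LLMStreamProcessor(config))`, and call `evict` when a chat session ends.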
## Debugging

Enable diagnostics for stream processing using the `onWarning` hook:
```typescript
const processor = new LLMStreamProcessor({
  onWarning: (message, context) => {
    console.log(`[@selfagency/llm-stream-parser] ${message}`, context);
  },
  ...config,
});
```