# Model Module Overview
The ARKOS Model Module provides an asynchronous interface for communicating with LLMs using the AsyncOpenAI client. It's designed to work with SGLANG servers and any OpenAI-compatible endpoint.

## Core Components
- **ArkModelLink**: Main class for LLM communication using AsyncOpenAI
- **Message Classes**: Pydantic models for different message types
- **Async Support**: Non-blocking I/O for better performance
- **Streaming**: Real-time token streaming for a responsive UX
## Architecture
### Message Classes
All messages extend the base `Message` Pydantic model:
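As a dependency-free sketch of the hierarchy (the real module uses Pydantic; stdlib dataclasses stand in here, and the field names are assumptions), including the conversion to OpenAI's `{"role", "content"}` format described later:

```python
from dataclasses import dataclass

@dataclass
class Message:
    """Base message type; subclasses fix the role."""
    content: str

@dataclass
class SystemMessage(Message):
    role: str = "system"

@dataclass
class UserMessage(Message):
    role: str = "user"

@dataclass
class AssistantMessage(Message):
    role: str = "assistant"

def to_openai_format(messages):
    """Convert Message objects to the dicts the OpenAI chat API expects."""
    return [{"role": m.role, "content": m.content} for m in messages]
```

For example, `to_openai_format([UserMessage("hi")])` yields `[{"role": "user", "content": "hi"}]`.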
## ArkModelLink Class
### Initialization
### Configuration Options
| Parameter | Default | Description |
|---|---|---|
| `model_name` | `"tgi"` | Model identifier for the API |
| `base_url` | `"http://0.0.0.0:30000/v1"` | LLM server endpoint |
| `max_tokens` | `1024` | Maximum response tokens |
| `temperature` | `0.7` | Creativity (0-2) |
## Core Methods
### generate_response()

Main method for getting LLM responses:

### generate_stream()

Stream tokens as they're generated:

### make_llm_call()

Low-level method for API calls:

## Structured Output

Use JSON schemas for structured responses:
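As a sketch of the idea (the exact wiring through `make_llm_call()` depends on the implementation), the schema is defined once and the model's text output is parsed and checked against it; the schema, function name, and sample output below are illustrative:

```python
import json

# Illustrative schema; real schemas live with the calling code.
WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
}

def parse_structured(raw: str, schema: dict) -> dict:
    """Parse the model's text output and do a minimal required-keys
    check (a full validator could use the jsonschema package)."""
    data = json.loads(raw)
    missing = [k for k in schema.get("required", []) if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

# Example with a hand-written stand-in for a model response:
result = parse_structured('{"city": "Oslo", "temperature_c": 3.5}', WEATHER_SCHEMA)
```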
Messages are automatically converted to OpenAI format:

## Integration with Agent
The Agent module uses ArkModelLink through `call_llm()`:
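The delegation might look like the following; `StubModel` and everything except the `call_llm()` name are invented for the sketch so it runs without a server:

```python
import asyncio

class StubModel:
    """Stand-in for ArkModelLink so the sketch runs offline."""
    async def generate_response(self, messages):
        return "stub reply"

class Agent:
    def __init__(self, model):
        self.model = model

    async def call_llm(self, messages):
        # The Agent routes all LLM traffic through the model link.
        return await self.model.generate_response(messages)

reply = asyncio.run(Agent(StubModel()).call_llm(
    [{"role": "user", "content": "hi"}]
))
```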
## AsyncOpenAI Client
The module uses AsyncOpenAI internally:

## Configuration via YAML
Configure the LLM endpoint in `config_module/config.yaml`:
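A fragment along these lines, with key names assumed to mirror the constructor parameters:

```yaml
# config_module/config.yaml -- key names are assumptions
model:
  model_name: tgi
  base_url: http://0.0.0.0:30000/v1
  max_tokens: 1024
  temperature: 0.7
```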
## Error Handling
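One hedged pattern for the practices listed below: wrap the awaited call in a timeout and fall back on failure. The helper name, timeout value, and fallback strings are illustrative:

```python
import asyncio

async def safe_generate(model, messages, timeout_s: float = 30.0) -> str:
    """Call the model with a timeout and graceful fallbacks."""
    try:
        return await asyncio.wait_for(
            model.generate_response(messages), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        return "The model took too long to respond."
    except Exception:
        # Connection errors, malformed output, etc.
        return "The model is currently unavailable."

class _Flaky:
    """Stub that always fails, standing in for a dead server."""
    async def generate_response(self, messages):
        raise ConnectionError("server down")

fallback = asyncio.run(safe_generate(_Flaky(), []))
```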
## Streaming Implementation
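The core of a streaming loop is an async generator over the chunked response. The chunk shape below follows the OpenAI streaming format (`chunk.choices[0].delta.content`); the fake stream is a stand-in so the sketch runs offline:

```python
import asyncio
from types import SimpleNamespace

async def stream_tokens(stream):
    """Yield text deltas from an OpenAI-style chunk stream."""
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta

async def _fake_stream():
    # Stub chunks mimicking the OpenAI streaming shape; the final
    # chunk carries no content, as real streams often end.
    for text in ["Hel", "lo", None]:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
        )

async def _collect():
    return [t async for t in stream_tokens(_fake_stream())]

tokens = asyncio.run(_collect())  # ["Hel", "lo"]
```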
## SGLANG Server
The model module is designed to work with SGLANG:

### Verify Server
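One quick check, assuming the default endpoint from the configuration table:

```shell
# List served models via the OpenAI-compatible API
curl http://0.0.0.0:30000/v1/models
```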
## Testing
Basic test example:

## Best Practices
- Use async/await: All LLM calls should be awaited
- Set appropriate timeouts: Prevent hanging on slow responses
- Handle errors gracefully: Catch exceptions and provide fallbacks
- Use streaming for UX: Better user experience for long responses
- Validate schemas: Test JSON schemas before production use
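The basic test example mentioned under Testing can be sketched as a pytest-style async check; `StubLink` stands in for ArkModelLink so the test runs without a live server:

```python
import asyncio

class StubLink:
    """Stand-in for ArkModelLink with a canned response."""
    async def generate_response(self, messages):
        return "pong"

def test_generate_response():
    reply = asyncio.run(StubLink().generate_response(
        [{"role": "user", "content": "ping"}]
    ))
    assert reply == "pong"

test_generate_response()
```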
## Troubleshooting
### Connection refused
Ensure SGLANG server is running:
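A typical launch command (the model path is a placeholder; adjust to your deployment):

```shell
python -m sglang.launch_server --model-path <your-model> --port 30000
```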
### Slow responses
Check GPU utilization and model loading:
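For example:

```shell
nvidia-smi   # GPU memory and utilization while the model serves requests
```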
### JSON parsing errors
Verify your schema matches expected output:
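When parsing fails, surface the raw model output before giving up so schema mismatches are visible; a minimal pattern (the helper name is illustrative):

```python
import json

def try_parse(raw: str):
    """Return parsed JSON, or None after logging the raw output."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        print(f"Unparseable model output ({err}): {raw!r}")
        return None

ok = try_parse('{"a": 1}')
bad = try_parse("not json")
```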