Testing Guide
Comprehensive testing is coming soon! This guide outlines the planned testing strategy for ARKOS.
Current Testing Status
ARKOS currently has only basic model functionality tests. A comprehensive testing suite with unit tests, integration tests, and system tests is planned for future releases.
Planned Testing Philosophy (Coming soon!)
ARKOS will follow a test-driven development (TDD) approach with comprehensive coverage across all modules. We aim for >80% code coverage with meaningful tests.
Current Test Structure
```
tests/
└── test_model_module.py    # Basic model tests only
```
Planned Test Structure (Coming soon!)
```
# Future comprehensive test structure (Coming soon!)
tests/
├── conftest.py                      # Shared fixtures and configuration
├── unit/                            # Unit tests for individual components
│   ├── test_state/
│   ├── test_memory/
│   ├── test_model/
│   └── test_tool/
├── integration/                     # Integration tests
│   ├── test_agent_flow.py
│   ├── test_memory_persistence.py
│   └── test_model_routing.py
├── system/                          # End-to-end system tests
│   ├── test_complete_conversation.py
│   └── test_multi_agent.py
├── performance/                     # Performance and load tests
│   ├── test_memory_performance.py
│   └── test_model_latency.py
└── fixtures/                        # Test data and mocks
    ├── sample_conversations.json
    └── mock_responses.yaml
```
Setting Up Testing
Current Testing Setup
Basic testing only - comprehensive testing is coming soon!

```bash
# Current basic testing (if available)
python -m pytest tests/test_model_module.py
```
Planned Testing Setup (Coming soon!)
```bash
# Future comprehensive testing setup (Coming soon!)
pip install pytest pytest-asyncio pytest-cov pytest-mock pytest-benchmark httpx
```
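The CI workflow later in this guide installs test dependencies from a requirements-test.txt file. A minimal sketch of that file, assuming the packages listed above plus pytest-timeout (used by the timeout marker in the async tests below) and pytest-xdist (needed for the parallel -n option under Running Tests):

```
# requirements-test.txt (hypothetical sketch)
pytest
pytest-asyncio
pytest-cov
pytest-mock
pytest-benchmark
pytest-timeout
pytest-xdist
httpx
```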
Configuration
Create pytest.ini in the project root:
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*

# Markers
markers =
    unit: Unit tests
    integration: Integration tests
    system: System tests
    slow: Slow tests
    requires_gpu: Tests requiring GPU
    requires_api_key: Tests requiring API keys

# Coverage
addopts =
    --cov=.
    --cov-report=html
    --cov-report=term-missing
    --verbose

# Async
asyncio_mode = auto
```
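With these markers registered, individual tests can declare them and be skipped or selected accordingly. A minimal sketch, assuming a hypothetical environment variable name for the API key:

```python
# Illustrative use of the markers declared above
import os

import pytest


@pytest.mark.integration
@pytest.mark.requires_api_key
@pytest.mark.skipif(
    "ARKOS_API_KEY" not in os.environ,  # hypothetical variable name
    reason="API key not configured",
)
def test_provider_call_with_real_key():
    """Runs only when an API key is available; filtered via -m flags."""
    assert os.environ["ARKOS_API_KEY"]
```

Tests marked this way can then be filtered with commands like pytest -m "integration and not slow" (see Running Tests below).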
Writing Tests (Coming soon!)
The following test examples are planned for future implementation.
Planned Unit Tests
Future unit tests will test individual components in isolation:
```python
# tests/unit/test_state/test_state_handler.py
import pytest

from state_module.state_handler import StateHandler
from state_module.state import StateContext


class TestStateHandler:
    @pytest.fixture
    def handler(self):
        """Create state handler for testing"""
        return StateHandler(yaml_path="tests/fixtures/test_state_graph.yaml")

    @pytest.fixture
    def context(self):
        """Create test context"""
        return StateContext(user_input="test input")

    def test_initial_state(self, handler):
        """Test that handler starts with correct initial state"""
        initial = handler.get_initial_state()
        assert initial.name == "greeting"
        assert not initial.is_terminal

    def test_state_transition(self, handler, context):
        """Test state transitions work correctly"""
        current_state = handler.get_initial_state()
        next_state = handler.get_next_state(current_state.name, context)
        assert next_state is not None
        assert next_state.name == "waiting_for_input"

    @pytest.mark.parametrize("input_text,expected_state", [
        ("hello", "greeting_response"),
        ("help", "help_state"),
        ("exit", "goodbye"),
    ])
    def test_conditional_transitions(self, handler, input_text, expected_state):
        """Test conditional state transitions"""
        context = StateContext(user_input=input_text)
        next_state = handler.get_next_state("router", context)
        assert next_state.name == expected_state
```
Planned Integration Tests (Coming soon!)
Future integration tests will test component interactions:
```python
# tests/integration/test_agent_flow.py
import pytest
from unittest.mock import Mock, patch

from agent_module.agent import Agent
from model_module.ArkModelNew import ArkModel
from memory_module.memory import Memory


class TestAgentFlow:
    @pytest.fixture
    def agent(self):
        """Create agent with mocked dependencies"""
        model = Mock(spec=ArkModel)
        memory = Mock(spec=Memory)
        return Agent(model=model, memory=memory)

    @pytest.mark.asyncio
    async def test_conversation_flow(self, agent):
        """Test complete conversation flow"""
        # Mock model responses
        agent.model.generate.return_value = Mock(
            content="Hello! How can I help you?",
            success=True
        )
        # Mock memory operations
        agent.memory.search.return_value = []

        # Process user message
        response = await agent.process_message("Hello")

        # Verify interactions
        assert agent.model.generate.called
        assert agent.memory.store.called
        assert "Hello" in response

    @patch('model_module.ArkModelNew.ArkModel')
    @patch('memory_module.memory.Memory')
    def test_error_recovery(self, mock_memory, mock_model):
        """Test agent recovers from errors gracefully"""
        # Simulate model failure then success
        mock_model.generate.side_effect = [
            Exception("API Error"),
            Mock(content="Recovered response")
        ]

        agent = Agent(model=mock_model, memory=mock_memory)
        response = agent.process_message("test")

        assert response == "Recovered response"
        assert mock_model.generate.call_count == 2
```
Planned Async Tests (Coming soon!)
Future async tests will test asynchronous operations:
```python
# tests/unit/test_model/test_async_generation.py
import asyncio

import pytest

from model_module.ArkModelNew import ArkModel


class TestAsyncGeneration:
    @pytest.fixture
    async def model(self):
        """Create model for async testing"""
        return ArkModel(provider="mock")

    @pytest.mark.asyncio
    async def test_async_generate(self, model):
        """Test async text generation"""
        response = await model.agenerate("Test prompt")
        assert response.content is not None
        assert response.success

    @pytest.mark.asyncio
    async def test_concurrent_requests(self, model):
        """Test multiple concurrent requests"""
        prompts = ["Prompt 1", "Prompt 2", "Prompt 3"]
        tasks = [model.agenerate(p) for p in prompts]
        responses = await asyncio.gather(*tasks)

        assert len(responses) == 3
        assert all(r.success for r in responses)

    @pytest.mark.asyncio
    @pytest.mark.timeout(5)  # requires the pytest-timeout plugin
    async def test_streaming(self, model):
        """Test async streaming responses"""
        chunks = []
        async for chunk in model.astream("Generate story"):
            chunks.append(chunk)

        assert len(chunks) > 0
        complete = "".join(chunks)
        assert len(complete) > 0
```
Testing Patterns
Fixtures
Create reusable test components:
```python
# tests/conftest.py
import shutil
import tempfile
from pathlib import Path

import pytest


@pytest.fixture(scope="session")
def test_data_dir():
    """Provide test data directory"""
    return Path(__file__).parent / "fixtures"


@pytest.fixture
def temp_dir():
    """Create temporary directory for test"""
    temp = tempfile.mkdtemp()
    yield Path(temp)
    shutil.rmtree(temp)


@pytest.fixture
def mock_llm_response():
    """Mock LLM response factory"""
    def _make_response(content="Test response", tokens=10):
        return {
            "content": content,
            "prompt_tokens": tokens,
            "completion_tokens": tokens * 2,
            "total_tokens": tokens * 3
        }
    return _make_response


@pytest.fixture(scope="function")
def clean_memory():
    """Provide clean memory instance"""
    from memory_module.memory import Memory

    memory = Memory(backend="memory")  # In-memory for tests
    yield memory
    memory.clear()  # Cleanup
```
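A test can request any of these fixtures simply by listing them as arguments. A minimal sketch exercising temp_dir and mock_llm_response together (the file name and test module are arbitrary):

```python
# Illustrative consumer of the conftest.py fixtures above
import json


def test_export_mock_response(temp_dir, mock_llm_response):
    """Round-trip a mocked LLM response through a temporary file."""
    response = mock_llm_response(content="Hi there", tokens=5)
    out_file = temp_dir / "response.json"

    out_file.write_text(json.dumps(response))
    loaded = json.loads(out_file.read_text())

    assert loaded["content"] == "Hi there"
    assert loaded["total_tokens"] == 15  # tokens * 3 from the factory
```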
Mocking
Mock external dependencies:
```python
# tests/unit/test_tool/test_weather_tool.py
import pytest
from unittest.mock import patch, Mock

from tool_module.weather import WeatherTool


class TestWeatherTool:
    @patch('requests.get')
    def test_get_weather(self, mock_get):
        """Test weather API call with mocked response"""
        # Mock API response
        mock_get.return_value = Mock(
            status_code=200,
            json=lambda: {
                "temp": 72,
                "condition": "Sunny",
                "humidity": 45
            }
        )

        tool = WeatherTool(api_key="test")
        result = tool.get_current("Boston")

        assert result.success
        assert result.data["temp"] == 72
        mock_get.assert_called_once()

    @patch('requests.get')
    def test_api_error_handling(self, mock_get):
        """Test handling of API errors"""
        mock_get.return_value = Mock(
            status_code=500,
            text="Internal Server Error"
        )

        tool = WeatherTool(api_key="test")
        result = tool.get_current("Boston")

        assert not result.success
        assert "error" in result.error.lower()
```
Parametrized Tests
Test multiple scenarios efficiently:
```python
# tests/unit/test_memory/test_memory_types.py
import pytest

from memory_module.memory import MemoryEntry, MemoryType


class TestMemoryTypes:
    @pytest.mark.parametrize("memory_type,expected_decay,expected_capacity", [
        (MemoryType.SENSORY, 0.99, 100),
        (MemoryType.WORKING, 0.5, 9),
        (MemoryType.SHORT_TERM, 0.1, 1000),
        (MemoryType.LONG_TERM, 0.001, None),
    ])
    def test_memory_characteristics(self, memory_type, expected_decay, expected_capacity):
        """Test memory type characteristics"""
        entry = MemoryEntry(
            content="Test",
            type=memory_type
        )
        assert entry.decay_rate == pytest.approx(expected_decay, 0.01)
        if expected_capacity:
            assert memory_type.max_capacity == expected_capacity
```
Performance Testing
Benchmark Tests
```python
# tests/performance/test_memory_performance.py
import pytest

from memory_module.memory import Memory, MemoryEntry


class TestMemoryPerformance:
    @pytest.fixture
    def large_memory(self):
        """Create memory with many entries"""
        memory = Memory(backend="sqlite")
        for i in range(1000):
            memory.store(MemoryEntry(content=f"Memory {i}"))
        return memory

    @pytest.mark.benchmark
    def test_search_performance(self, benchmark, large_memory):
        """Benchmark memory search performance"""
        result = benchmark(
            large_memory.search,
            "Memory 500",
            k=10
        )
        assert len(result) > 0

    @pytest.mark.benchmark
    def test_store_performance(self, benchmark, large_memory):
        """Benchmark memory storage performance"""
        entry = MemoryEntry(content="New memory")
        benchmark(large_memory.store, entry)
```
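Assuming pytest-benchmark is installed as listed in the setup section, benchmark runs can be isolated from the rest of the suite:

```bash
# Run only tests that use the benchmark fixture
pytest tests/performance/ --benchmark-only

# Or select by marker (add "benchmark" to the markers list in pytest.ini)
pytest -m benchmark
```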
Load Testing
```python
# tests/performance/test_model_load.py
import pytest
from concurrent.futures import ThreadPoolExecutor


class TestModelLoad:
    @pytest.mark.slow
    def test_concurrent_load(self, model):
        """Test model under concurrent load"""
        num_requests = 100

        def make_request(i):
            return model.generate(f"Request {i}")

        with ThreadPoolExecutor(max_workers=10) as executor:
            futures = [
                executor.submit(make_request, i)
                for i in range(num_requests)
            ]
            results = [f.result() for f in futures]

        assert len(results) == num_requests
        success_rate = sum(1 for r in results if r.success) / num_requests
        assert success_rate > 0.95  # 95% success rate
```
Test Coverage (Coming soon!)
Planned Coverage Testing
```bash
# Future coverage commands (Coming soon!)
pytest --cov=. --cov-report=html                  # Full HTML report (written to htmlcov/)
pytest --cov=state_module tests/unit/test_state/  # Coverage for a single module
pytest --cov=. --cov-report=term-missing          # Show missing lines in the terminal
```
Planned Coverage Goals
| Module | Target | Current Status |
|---|---|---|
| State | 85% | Coming soon! |
| Memory | 80% | Coming soon! |
| Model | 75% | Basic tests only |
| Tool | 80% | Coming soon! |
| Agent | 85% | Coming soon! |
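To keep CI honest about these targets, pytest-cov can fail the run when coverage drops below a threshold. A sketch of the planned addopts block with the overall 80% goal enforced:

```ini
# pytest.ini - extend the addopts shown earlier
addopts =
    --cov=.
    --cov-report=term-missing
    --cov-fail-under=80
```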
CI/CD Integration
GitHub Actions Workflow
```yaml
# .github/workflows/test.yml
name: Tests

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11"]

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-test.txt

      - name: Run tests
        run: |
          pytest tests/ -v --cov=. --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          fail_ci_if_error: true
```
Testing Best Practices
Guidelines
- Test Isolation: Each test should be independent
- Clear Names: Test names should describe what they test
- Arrange-Act-Assert: Follow the AAA pattern (see the sketch after this list)
- One Assertion: Generally one assertion per test
- Fast Tests: Keep unit tests under 100ms
- Mock External: Mock all external dependencies
- Test Behavior: Test behavior, not implementation
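A minimal sketch of the Arrange-Act-Assert pattern, reusing the clean_memory fixture and the Memory API assumed in the earlier examples:

```python
# Illustrative AAA-style test built on the earlier fixture sketches
from memory_module.memory import MemoryEntry


def test_store_makes_entry_searchable(clean_memory):
    # Arrange: build the entry to store
    entry = MemoryEntry(content="User prefers dark mode")

    # Act: store it in a clean memory instance
    clean_memory.store(entry)

    # Assert: the entry can be retrieved again
    results = clean_memory.search("dark mode", k=1)
    assert len(results) == 1
```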
Common Pitfalls
Avoid these common testing mistakes:
- Testing private methods directly
- Over-mocking leading to meaningless tests
- Ignoring edge cases
- Not testing error paths
- Brittle tests that break with refactoring
Running Tests
Current Testing Commands
```bash
# Run current basic tests (if available)
python -m pytest tests/ -v
```
Planned Test Commands (Coming soon!)
```bash
# Future comprehensive test commands (Coming soon!)
pytest                          # Run all tests
pytest -v                       # Verbose output
pytest tests/unit/test_state/   # Specific test directory
pytest -k "memory"              # Tests matching a pattern
pytest -m "unit and not slow"   # Marked tests
pytest --log-level=DEBUG        # With debug logging
pytest -x                       # Stop on first failure
pytest --lf                     # Run failed tests from last run
pytest -n 4                     # Parallel execution (requires pytest-xdist)
```
Debugging Tests
Using PDB
```python
def test_complex_logic():
    data = process_data()
    import pdb; pdb.set_trace()  # Debugger breakpoint
    assert data.is_valid()
```
Verbose Logging
```python
import logging


def test_with_logging(caplog):
    with caplog.at_level(logging.DEBUG):
        result = complex_function()

    assert "Expected log message" in caplog.text
    assert result == expected_value
```