
Testing Guide

Comprehensive testing is coming soon! This guide outlines the planned testing strategy for ARKOS.

Current Testing Status

ARKOS currently has only basic model functionality tests. A comprehensive testing suite with unit tests, integration tests, and system tests is planned for future releases.

Planned Testing Philosophy (Coming soon!)

ARKOS will follow a test-driven development (TDD) approach with comprehensive coverage across all modules. We aim for >80% code coverage with meaningful tests.
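
As a minimal sketch of that TDD loop, the test below would be written first and only pass once the implementation exists. The tool_module.calculator module and its add() function are hypothetical names used purely for illustration:
# Hypothetical TDD example: this import fails until add() is implemented
from tool_module.calculator import add

def test_add_returns_sum():
    assert add(2, 3) == 5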

Current Test Structure

tests/
└── test_model_module.py     # Basic model tests only

Planned Test Structure (Coming soon!)

# Future comprehensive test structure (Coming soon!)
tests/
├── conftest.py              # Shared fixtures and configuration
├── unit/                    # Unit tests for individual components
│   ├── test_state/
│   ├── test_memory/
│   ├── test_model/
│   └── test_tool/
├── integration/             # Integration tests
│   ├── test_agent_flow.py
│   ├── test_memory_persistence.py
│   └── test_model_routing.py
├── system/                  # End-to-end system tests
│   ├── test_complete_conversation.py
│   └── test_multi_agent.py
├── performance/             # Performance and load tests
│   ├── test_memory_performance.py
│   └── test_model_latency.py
└── fixtures/                # Test data and mocks
    ├── sample_conversations.json
    └── mock_responses.yaml

Setting Up Testing

Current Testing Setup

Only basic testing is available today; the comprehensive suite is coming soon!
# Current basic testing (if available)
python -m pytest tests/test_model_module.py

Planned Testing Setup (Coming soon!)

# Future comprehensive testing setup (Coming soon!)
pip install pytest pytest-asyncio pytest-cov pytest-mock pytest-benchmark httpx
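
The CI workflow later in this guide installs a requirements-test.txt file. A plausible version of that file, mirroring the packages above (plus pytest-xdist for the parallel runs and pytest-timeout for the timeout marker shown later), could look like this:
# requirements-test.txt (illustrative; pin versions as needed)
pytest
pytest-asyncio
pytest-cov
pytest-mock
pytest-benchmark
pytest-xdist       # enables `pytest -n 4`
pytest-timeout     # enables @pytest.mark.timeout(...)
httpx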

Configuration

Create a pytest.ini file in the project root:
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*

# Markers
markers =
    unit: Unit tests
    integration: Integration tests
    system: System tests
    slow: Slow tests
    requires_gpu: Tests requiring GPU
    requires_api_key: Tests requiring API keys

# Coverage
addopts =
    --cov=.
    --cov-report=html
    --cov-report=term-missing
    --verbose

# Async
asyncio_mode = auto
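
The markers declared above are attached to tests with @pytest.mark decorators. A small illustrative example (the test bodies are placeholders):
# tests/unit/test_marker_usage.py (illustrative)
import pytest

@pytest.mark.unit
def test_fast_pure_logic():
    assert sum([1, 2, 3]) == 6

@pytest.mark.integration
@pytest.mark.slow
def test_cross_module_path():
    # Selected by `pytest -m integration`, excluded by `-m "not slow"`
    assert True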

Writing Tests (Coming soon!)

The following test examples are planned for future implementation.

Planned Unit Tests

Future unit tests will exercise individual components in isolation:
# tests/unit/test_state/test_state_handler.py
import pytest
from state_module.state_handler import StateHandler
from state_module.state import StateContext

class TestStateHandler:
    @pytest.fixture
    def handler(self):
        """Create state handler for testing"""
        return StateHandler(yaml_path="tests/fixtures/test_state_graph.yaml")

    @pytest.fixture
    def context(self):
        """Create test context"""
        return StateContext(user_input="test input")

    def test_initial_state(self, handler):
        """Test that handler starts with correct initial state"""
        initial = handler.get_initial_state()
        assert initial.name == "greeting"
        assert not initial.is_terminal

    def test_state_transition(self, handler, context):
        """Test state transitions work correctly"""
        current_state = handler.get_initial_state()
        next_state = handler.get_next_state(current_state.name, context)

        assert next_state is not None
        assert next_state.name == "waiting_for_input"

    @pytest.mark.parametrize("input_text,expected_state", [
        ("hello", "greeting_response"),
        ("help", "help_state"),
        ("exit", "goodbye")
    ])
    def test_conditional_transitions(self, handler, input_text, expected_state):
        """Test conditional state transitions"""
        context = StateContext(user_input=input_text)
        next_state = handler.get_next_state("router", context)
        assert next_state.name == expected_state

Planned Integration Tests (Coming soon!)

Future integration tests will verify how components interact:
# tests/integration/test_agent_flow.py
import pytest
from unittest.mock import Mock, patch
from agent_module.agent import Agent
from model_module.ArkModelNew import ArkModel
from memory_module.memory import Memory

class TestAgentFlow:
    @pytest.fixture
    def agent(self):
        """Create agent with mocked dependencies"""
        model = Mock(spec=ArkModel)
        memory = Mock(spec=Memory)
        return Agent(model=model, memory=memory)

    @pytest.mark.asyncio
    async def test_conversation_flow(self, agent):
        """Test complete conversation flow"""
        # Mock model responses
        agent.model.generate.return_value = Mock(
            content="Hello! How can I help you?",
            success=True
        )

        # Mock memory operations
        agent.memory.search.return_value = []

        # Process user message
        response = await agent.process_message("Hello")

        # Verify interactions
        assert agent.model.generate.called
        assert agent.memory.store.called
        assert "Hello" in response

    @pytest.mark.asyncio
    @patch('model_module.ArkModelNew.ArkModel')
    @patch('memory_module.memory.Memory')
    async def test_error_recovery(self, mock_memory, mock_model):
        """Test agent recovers from errors gracefully"""
        # Simulate model failure then success
        mock_model.generate.side_effect = [
            Exception("API Error"),
            Mock(content="Recovered response")
        ]

        agent = Agent(model=mock_model, memory=mock_memory)
        response = await agent.process_message("test")

        assert response == "Recovered response"
        assert mock_model.generate.call_count == 2
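
If Agent.process_message awaits the model internally (as the awaited call in test_conversation_flow suggests), the mock should be an AsyncMock so the awaited call returns a value rather than a bare Mock. A sketch under that assumption, using the async agenerate method from the async examples below:
# tests/integration/test_agent_flow_async.py (illustrative sketch)
import pytest
from unittest.mock import AsyncMock, Mock
from agent_module.agent import Agent
from model_module.ArkModelNew import ArkModel
from memory_module.memory import Memory

@pytest.mark.asyncio
async def test_conversation_flow_with_async_model():
    """AsyncMock makes the mocked model call awaitable (assumes Agent awaits agenerate)"""
    model = AsyncMock(spec=ArkModel)
    model.agenerate.return_value = Mock(content="Hello! How can I help you?", success=True)
    memory = Mock(spec=Memory)
    memory.search.return_value = []

    agent = Agent(model=model, memory=memory)
    response = await agent.process_message("Hello")

    model.agenerate.assert_awaited_once()
    assert "Hello" in response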

Planned Async Tests (Coming soon!)

Future async tests will cover asynchronous operations:
# tests/unit/test_model/test_async_generation.py
import pytest
import asyncio
from model_module.ArkModelNew import ArkModel

class TestAsyncGeneration:
    @pytest.fixture
    async def model(self):
        """Create model for async testing"""
        return ArkModel(provider="mock")

    @pytest.mark.asyncio
    async def test_async_generate(self, model):
        """Test async text generation"""
        response = await model.agenerate("Test prompt")
        assert response.content is not None
        assert response.success

    @pytest.mark.asyncio
    async def test_concurrent_requests(self, model):
        """Test multiple concurrent requests"""
        prompts = ["Prompt 1", "Prompt 2", "Prompt 3"]

        tasks = [model.agenerate(p) for p in prompts]
        responses = await asyncio.gather(*tasks)

        assert len(responses) == 3
        assert all(r.success for r in responses)

    @pytest.mark.asyncio
    @pytest.mark.timeout(5)
    async def test_streaming(self, model):
        """Test async streaming responses"""
        chunks = []
        async for chunk in model.astream("Generate story"):
            chunks.append(chunk)

        assert len(chunks) > 0
        complete = "".join(chunks)
        assert len(complete) > 0

Testing Patterns

Fixtures

Create reusable test components:
# tests/conftest.py
import pytest
from pathlib import Path
import tempfile
import shutil

@pytest.fixture(scope="session")
def test_data_dir():
    """Provide test data directory"""
    return Path(__file__).parent / "fixtures"

@pytest.fixture
def temp_dir():
    """Create temporary directory for test"""
    temp = tempfile.mkdtemp()
    yield Path(temp)
    shutil.rmtree(temp)

@pytest.fixture
def mock_llm_response():
    """Mock LLM response factory"""
    def _make_response(content="Test response", tokens=10):
        return {
            "content": content,
            "prompt_tokens": tokens,
            "completion_tokens": tokens * 2,
            "total_tokens": tokens * 3
        }
    return _make_response

@pytest.fixture(scope="function")
def clean_memory():
    """Provide clean memory instance"""
    from memory_module.memory import Memory
    memory = Memory(backend="memory")  # In-memory for tests
    yield memory
    memory.clear()  # Cleanup
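
A short example of how these fixtures might be consumed in a test; the store/search calls mirror the Memory API used elsewhere in this guide:
# tests/unit/test_memory/test_fixture_usage.py (illustrative)
from memory_module.memory import MemoryEntry

def test_store_and_search_roundtrip(clean_memory, mock_llm_response):
    """clean_memory and mock_llm_response come from tests/conftest.py above"""
    response = mock_llm_response(content="Remember the launch date")
    clean_memory.store(MemoryEntry(content=response["content"]))

    results = clean_memory.search("launch date", k=1)
    assert len(results) == 1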

Mocking

Mock external dependencies:
# tests/unit/test_tool/test_weather_tool.py
import pytest
from unittest.mock import patch, Mock
from tool_module.weather import WeatherTool

class TestWeatherTool:
    @patch('requests.get')
    def test_get_weather(self, mock_get):
        """Test weather API call with mocked response"""
        # Mock API response
        mock_get.return_value = Mock(
            status_code=200,
            json=lambda: {
                "temp": 72,
                "condition": "Sunny",
                "humidity": 45
            }
        )

        tool = WeatherTool(api_key="test")
        result = tool.get_current("Boston")

        assert result.success
        assert result.data["temp"] == 72
        mock_get.assert_called_once()

    @patch('requests.get')
    def test_api_error_handling(self, mock_get):
        """Test handling of API errors"""
        mock_get.return_value = Mock(
            status_code=500,
            text="Internal Server Error"
        )

        tool = WeatherTool(api_key="test")
        result = tool.get_current("Boston")

        assert not result.success
        assert "error" in result.error.lower()

Parametrized Tests

Test multiple scenarios efficiently:
# tests/unit/test_memory/test_memory_types.py
import pytest
from memory_module.memory import MemoryEntry, MemoryType

class TestMemoryTypes:
    @pytest.mark.parametrize("memory_type,expected_decay,expected_capacity", [
        (MemoryType.SENSORY, 0.99, 100),
        (MemoryType.WORKING, 0.5, 9),
        (MemoryType.SHORT_TERM, 0.1, 1000),
        (MemoryType.LONG_TERM, 0.001, None),
    ])
    def test_memory_characteristics(self, memory_type, expected_decay, expected_capacity):
        """Test memory type characteristics"""
        entry = MemoryEntry(
            content="Test",
            type=memory_type
        )

        assert entry.decay_rate == pytest.approx(expected_decay, 0.01)
        if expected_capacity:
            assert memory_type.max_capacity == expected_capacity
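
Individual cases can also be wrapped in pytest.param to give them readable ids and attach markers, so slow cases can be filtered with the markers defined earlier. A small variant, assuming MemoryEntry exposes the type it was created with:
# tests/unit/test_memory/test_param_ids.py (illustrative variant)
import pytest
from memory_module.memory import MemoryEntry, MemoryType

@pytest.mark.parametrize(
    "memory_type",
    [
        pytest.param(MemoryType.SENSORY, id="sensory"),
        pytest.param(MemoryType.WORKING, id="working"),
        pytest.param(MemoryType.LONG_TERM, id="long-term", marks=pytest.mark.slow),
    ],
)
def test_entry_keeps_its_type(memory_type):
    entry = MemoryEntry(content="Test", type=memory_type)
    assert entry.type is memory_type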

Performance Testing

Benchmark Tests

# tests/performance/test_memory_performance.py
import pytest
from memory_module.memory import Memory, MemoryEntry

class TestMemoryPerformance:
    @pytest.fixture
    def large_memory(self):
        """Create memory with many entries"""
        memory = Memory(backend="sqlite")
        for i in range(1000):
            memory.store(MemoryEntry(content=f"Memory {i}"))
        return memory

    @pytest.mark.benchmark
    def test_search_performance(self, benchmark, large_memory):
        """Benchmark memory search performance"""
        result = benchmark(
            large_memory.search,
            "Memory 500",
            k=10
        )
        assert len(result) > 0

    @pytest.mark.benchmark
    def test_store_performance(self, benchmark, large_memory):
        """Benchmark memory storage performance"""
        entry = MemoryEntry(content="New memory")
        benchmark(large_memory.store, entry)
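
pytest-benchmark also adds command-line options for controlling when benchmarks run; for example, they can be run on their own or skipped entirely during normal test runs:
# Running benchmarks selectively (pytest-benchmark options)
pytest tests/performance/ --benchmark-only
pytest --benchmark-skip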

Load Testing

# tests/performance/test_model_load.py
import pytest
from concurrent.futures import ThreadPoolExecutor

class TestModelLoad:
    @pytest.mark.slow
    def test_concurrent_load(self, model):
        """Test model under concurrent load (assumes a shared `model` fixture from conftest.py)"""
        num_requests = 100

        def make_request(i):
            return model.generate(f"Request {i}")

        with ThreadPoolExecutor(max_workers=10) as executor:
            futures = [
                executor.submit(make_request, i)
                for i in range(num_requests)
            ]
            results = [f.result() for f in futures]

        assert len(results) == num_requests
        success_rate = sum(1 for r in results if r.success) / num_requests
        assert success_rate > 0.95  # 95% success rate

Test Coverage (Coming soon!)

Planned Coverage Testing

# Future coverage commands (Coming soon!)
pytest --cov=. --cov-report=html                   # Full suite with an HTML report (htmlcov/)
pytest --cov=state_module tests/unit/test_state/   # Coverage for a single module
pytest --cov=. --cov-report=term-missing           # Show uncovered lines in the terminal

Planned Coverage Goals

Module   Target   Current Status
State    85%      Coming soon!
Memory   80%      Coming soon!
Model    75%      Basic tests only
Tool     80%      Coming soon!
Agent    85%      Coming soon!

CI/CD Integration

GitHub Actions Workflow

# .github/workflows/test.yml
name: Tests

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11"]

    steps:
    - uses: actions/checkout@v3

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}

    - name: Cache dependencies
      uses: actions/cache@v3
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}

    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install -r requirements-test.txt

    - name: Run tests
      run: |
        pytest tests/ -v --cov=. --cov-report=xml

    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.xml
        fail_ci_if_error: true

Testing Best Practices

Guidelines

  1. Test Isolation: Each test should be independent
  2. Clear Names: Test names should describe what they test
  3. Arrange-Act-Assert: Follow the AAA pattern (see the example after this list)
  4. One Assertion: Generally one assertion per test
  5. Fast Tests: Keep unit tests under 100ms
  6. Mock External: Mock all external dependencies
  7. Test Behavior: Test behavior, not implementation
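
A minimal sketch of the Arrange-Act-Assert pattern from guideline 3, reusing the clean_memory fixture and MemoryEntry usage shown earlier:
def test_store_adds_entry(clean_memory):
    # Arrange: build the entry to store
    from memory_module.memory import MemoryEntry
    entry = MemoryEntry(content="AAA example")

    # Act: perform the single behavior under test
    clean_memory.store(entry)

    # Assert: verify the observable outcome
    assert len(clean_memory.search("AAA example", k=1)) == 1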

Common Pitfalls

Avoid these common testing mistakes:
  • Testing private methods directly
  • Over-mocking leading to meaningless tests
  • Ignoring edge cases
  • Not testing error paths
  • Brittle tests that break with refactoring

Running Tests

Current Testing Commands

# Run current basic tests (if available)
python -m pytest tests/ -v

Planned Test Commands (Coming soon!)

# Future comprehensive test commands (Coming soon!)
pytest                        # Run all tests
pytest -v                     # Verbose output
pytest tests/unit/test_state/ # Specific test directory
pytest -k "memory"            # Tests matching pattern
pytest -m "unit and not slow" # Marked tests
pytest --log-level=DEBUG      # With debug logging
pytest -x                     # Stop on first failure
pytest --lf                   # Run failed tests from last run
pytest -n 4                   # Parallel execution (requires pytest-xdist)

Debugging Tests

Using PDB

def test_complex_logic():
    data = process_data()
    import pdb; pdb.set_trace()  # Debugger breakpoint
    assert data.is_valid()
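
Pytest can also open the debugger for you without editing the test; both of the following are standard pytest options:
pytest --pdb      # Drop into pdb when a test fails or errors
pytest --trace    # Break at the start of each selected test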

Verbose Logging

import logging

def test_with_logging(caplog):
    with caplog.at_level(logging.DEBUG):
        result = complex_function()

    assert "Expected log message" in caplog.text
    assert result == expected_value

Next Steps