
Quick Start Guide

This guide will help you set up ARKOS and create your first intelligent agent with persistent memory.

Prerequisites

Before you begin, ensure you have:
  • Python 3.8 or higher
  • Git
  • PostgreSQL database (Supabase recommended)
  • 8GB+ RAM recommended
  • GPU optional but recommended for local models

Installation

1. Clone the Repository

git clone https://github.com/SGIARK/arkos.git
cd arkos

2. Install Dependencies

pip install -r requirements.txt

3. Configure Your Environment

Create a .env file in the project root:
cp .env.example .env
Edit .env with your configuration:
# Database Connection (REQUIRED)
# Format: postgresql://user:password@host:port/database
DB_URL=postgresql://postgres:your-password@localhost:54322/postgres

# Hugging Face Token (OPTIONAL - for gated models)
HF_TOKEN=

# MCP Server Credentials (OPTIONAL - for tool integrations)
GOOGLE_OAUTH_CREDENTIALS=
GOOGLE_CALENDAR_MCP_TOKEN_PATH=
BRAVE_API_KEY=

# OpenAI API Key (REQUIRED by mem0, but can be placeholder)
OPENAI_API_KEY=sk-placeholder
The DB_URL environment variable is required. ARKOS uses PostgreSQL for storing conversation context and Supabase for vector memory.
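
To confirm the connection string works before going further, you can run a quick check from Python. This is a minimal sketch, not part of ARKOS itself, and assumes psycopg2 and python-dotenv are installed (pip install psycopg2-binary python-dotenv):

# check_db.py -- confirm that DB_URL points at a reachable PostgreSQL instance.
# Minimal sketch; not part of ARKOS. Assumes psycopg2-binary and python-dotenv.
import os

import psycopg2
from dotenv import load_dotenv

load_dotenv()  # pick up DB_URL from .env in the current directory

with psycopg2.connect(os.environ["DB_URL"]) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print("Database reachable:", cur.fetchone())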

Starting the Inference Engine

Check if LLM Server is Already Running

Since ARKOS is often deployed on shared servers, check if the LLM server is already running:
# Check if port 30000 is in use
lsof -i :30000

# Or verify it's responding
curl http://localhost:30000/v1/models
If you see output, the LLM server is already running - you can skip starting it.

Starting the LLM Server (if not running)

The project uses SGLang to run the Qwen 2.5-7B-Instruct model:
bash model_module/run.sh
This starts the SGLang server on port 30000 in a Docker container with GPU support. Wait for the “server started” messages before continuing.
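
If you are scripting the setup, you can poll the server until it answers instead of watching the logs. A minimal sketch, assuming the requests library is installed; any HTTP client works:

# wait_for_llm.py -- poll the SGLang server until /v1/models responds.
# Minimal sketch; assumes the requests library is installed.
import time

import requests

URL = "http://localhost:30000/v1/models"
for _ in range(60):
    try:
        resp = requests.get(URL, timeout=2)
        if resp.ok:
            print("LLM server is up:", resp.json())
            break
    except requests.ConnectionError:
        pass
    time.sleep(5)  # model loading can take several minutes
else:
    raise SystemExit("LLM server did not come up on port 30000")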

Starting the Embedding Server (if not running)

The project uses Hugging Face Text Embeddings Inference (TEI) to run the Qwen 2 1.5B-Instruct embedding model:
bash model_module/run_tei.sh
This starts the TEI embedding server on port 4444 in a Docker container with GPU support. Wait for the server to report that it is ready.
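
To confirm the embedding server is serving vectors, you can send it a single string. A minimal sketch, assuming the standard TEI /embed route on port 4444:

# check_embeddings.py -- request one embedding from the TEI server.
# Minimal sketch; assumes the default TEI /embed route on port 4444.
import requests

resp = requests.post(
    "http://localhost:4444/embed",
    json={"inputs": "hello world"},
    timeout=10,
)
resp.raise_for_status()
embedding = resp.json()[0]  # one vector per input string
print(f"Embedding dimension: {len(embedding)}")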

Running Your First Agent

1. Start the API Server

python base_module/app.py
This starts the FastAPI server on port 1111, providing the /v1/chat/completions endpoint.

2. Run the Test Interface

In another terminal:
python base_module/main_interface.py
This provides an interactive CLI to test the agent. Type your messages and press Enter. Type exit or quit to stop.

Basic Usage Examples

Using the OpenAI-Compatible API

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1111/v1",
    api_key="not-needed"  # Local deployment
)

response = client.chat.completions.create(
    model="ark-agent",
    messages=[
        {"role": "user", "content": "Hello! What can you help me with?"}
    ]
)

print(response.choices[0].message.content)

Streaming Responses

stream = client.chat.completions.create(
    model="ark-agent",
    messages=[{"role": "user", "content": "Tell me about ARKOS"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Using Memory Directly

from memory_module.memory import Memory
from model_module.ArkModelNew import UserMessage, AIMessage, SystemMessage

# Initialize memory system
memory = Memory(
    user_id="alice",
    session_id=None,  # Auto-generates session ID
    db_url="postgresql://...",
    use_long_term=True  # Enable Mem0 vector memory
)

# Store a message
memory.add_memory(UserMessage(content="My favorite color is blue"))

# Retrieve recent context
context = memory.retrieve_short_memory(turns=5)

# Retrieve relevant long-term memories
long_term = memory.retrieve_long_memory(context=context)
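
To feed the retrieved context back into the agent, something along these lines could work. This is a hedged sketch continuing the example above; it assumes the returned message objects expose role and content attributes, so check memory_module and ArkModelNew for the actual types:

# Hedged sketch only: turn retrieved messages into an OpenAI-style history.
# Assumes each message object exposes .role and .content -- verify against
# the actual classes in model_module/ArkModelNew before relying on this.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1111/v1", api_key="not-needed")

history = [{"role": m.role, "content": m.content} for m in context]
history.append({"role": "user", "content": "What is my favorite color?"})

response = client.chat.completions.create(model="ark-agent", messages=history)
print(response.choices[0].message.content)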

Configuration Files

Main Configuration (config_module/config.yaml)

app:
  host: "0.0.0.0"
  port: 1111
  reload: false
  system_prompt: "You are a helpful AI assistant."

llm:
  base_url: "http://localhost:30000/v1"

database:
  url: "${DB_URL}"

memory:
  user_id: "default_user"
  use_long_term: false  # Set to true to enable Mem0

state:
  graph_path: "state_module/state_graph.yaml"

# MCP Server Configuration (optional)
mcp_servers:
  google-calendar:
    transport: stdio
    command: npx
    args: ["-y", "@anthropic/google-calendar-mcp"]

State Graph Configuration (state_module/state_graph.yaml)


initial: agent_reply

states:
  ask_user:
    description: "state used for input from user"
    type: user 
    transition:
      next: [agent_reply]

  agent_reply:
    description: "state used for your reasoning"
    type: agent
    transition:
      next: [ask_user, use_tool]

  use_tool:
    description: "state used for tool use"
    type: tool
    transition: 
      next: [agent_reply]
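
Since every transition must point at a defined state, it can be worth validating the graph after editing it. A minimal sketch, not part of ARKOS itself, assuming PyYAML is installed:

# validate_state_graph.py -- check that every transition targets a defined state.
# Minimal sketch; not part of ARKOS itself. Assumes PyYAML is installed.
import yaml

with open("state_module/state_graph.yaml") as f:
    graph = yaml.safe_load(f)

states = graph["states"]
assert graph["initial"] in states, "initial state must be defined under states"

for name, spec in states.items():
    for target in spec["transition"]["next"]:
        assert target in states, f"{name} -> {target}: undefined state"

print(f"State graph OK: {len(states)} states, initial = {graph['initial']}")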

Testing Your Setup

Health Check

curl http://localhost:1111/health
Expected response:
{"status": "ok", "llm_server": "running", "port": 1111}

Test Chat Completion

curl -X POST http://localhost:1111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ark-agent",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Common Issues and Solutions

Database connection errors

Ensure your DB_URL is correctly set in .env:
# Check if PostgreSQL is accessible
psql $DB_URL -c "SELECT 1"
Make sure the conversation_context table exists in your database.

LLM server not responding

Check if the SGLang server is running on port 30000:
curl http://localhost:30000/v1/models
If it is not running, start it with bash model_module/run.sh.

CUDA or GPU not available

Ensure PyTorch is installed with CUDA support:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Port already in use

Find and kill the process using the port:
lsof -i :1111
kill -9 <PID>
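
If several things seem broken at once, a small script can check the services from this guide in one pass. A minimal sketch; the ports and endpoints are taken from the sections above, and the TEI /health route is an assumption:

# diagnose.py -- one-shot check of the services this guide starts.
# Minimal sketch; the TEI /health route is an assumption, adjust if needed.
import os

import requests
from dotenv import load_dotenv

load_dotenv()

checks = {
    "LLM server (SGLang, :30000)": "http://localhost:30000/v1/models",
    "Embedding server (TEI, :4444)": "http://localhost:4444/health",
    "ARKOS API (:1111)": "http://localhost:1111/health",
}

for name, url in checks.items():
    try:
        ok = requests.get(url, timeout=3).ok
    except requests.RequestException:
        ok = False
    print(f"{name:<32} {'OK' if ok else 'NOT RESPONDING'}")

print("DB_URL set:", bool(os.getenv("DB_URL")))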

Next Steps

Getting Help