

Agent Run

Full AI agent response pipeline: fetches conversation context, assembles RAG context, resolves variables, runs an OpenAI completion, sends the response via GHL, and records the transaction.

Volume: ~300K runs/week | Auth: Authorization: Bearer <INTERNAL_API_KEY>

Request

POST /api/v1/agent-run
Content-Type: application/json
Authorization: Bearer <INTERNAL_API_KEY>

Body Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| `conversation_id` | string | Yes | Internal conversation ID |
| `assistant_id` | string | Yes | Internal assistant ID |
| `additional_instructions` | string | No | Extra instructions appended to the system prompt |

Responses

Completed

{
  "status": "completed",
  "runId": "uuid",
  "messageId": "ai_msg_id",
  "model": "gpt-4.1",
  "tokens": {
    "prompt_tokens": 500,
    "completion_tokens": 100,
    "total_tokens": 600
  },
  "finishReason": "stop",
  "charged": true
}

Stale Job (duplicate run detected)

{
  "status": "stale_job",
  "runId": "uuid"
}

Conversation Not Found (404)

{ "error": "Conversation not found" }

Assistant Not Found (404)

{ "error": "Assistant not found" }

No Access Token (400)

{ "error": "No access token for location" }

GHL Send Failure

{
  "status": "error",
  "error": "Failed to send message"
}

Pipeline Error

{
  "status": "error",
  "runId": "uuid",
  "error": "Internal error"
}

Test with cURL

curl -X POST http://localhost:3000/api/v1/agent-run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_INTERNAL_API_KEY" \
  -d '{
    "conversation_id": "conv_123",
    "assistant_id": "asst_456",
    "additional_instructions": "Respond in a friendly tone"
  }'
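The same request can be built programmatically. A standard-library Python sketch mirroring the cURL example above (the base URL and key are the same placeholders):

```python
import json
import urllib.request

def build_agent_run_request(conversation_id: str, assistant_id: str,
                            additional_instructions: str = "",
                            base_url: str = "http://localhost:3000",
                            api_key: str = "YOUR_INTERNAL_API_KEY"):
    """Build the POST /api/v1/agent-run request from the documented fields."""
    body = json.dumps({
        "conversation_id": conversation_id,
        "assistant_id": assistant_id,
        "additional_instructions": additional_instructions,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/agent-run",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# To send:
#   with urllib.request.urlopen(build_agent_run_request(...)) as resp:
#       result = json.load(resp)  # e.g. {"status": "completed", ...}
```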

Pipeline Flow

  1. Parallel fetch: conversation (with messages + subAccount), assistant
  2. Validate conversation, assistant, access token
  3. Resolve OpenAI key (workspace key or backup)
  4. Create job marker (prevents duplicate runs)
  5. Early stale-job check
  6. Parallel: RAG query (Pinecone with KB filter), GHL contact fetch, system variable filling
  7. Build system prompt with RAG context, contact info, variables, replacements
  8. OpenAI chat completion (with optional tools)
  9. Misspelling simulation (if configured)
  10. Post-completion stale-job re-check
  11. Send response via GHL
  12. Save AI message to DB, remove AI Replying tag
  13. Update conversation, create transaction
  14. Parallel: parse extractions, ingestion update
  15. Clean up job marker
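The ordering above can be sketched as an async skeleton with every external call stubbed out. This is illustrative only; the function and parameter names are hypothetical, not the service's real code:

```python
import asyncio
import uuid

async def agent_run(conversation_id: str, assistant_id: str, io) -> dict:
    """Illustrative pipeline skeleton; `io` provides all external calls."""
    # 1. Parallel fetch: conversation and assistant
    conversation, assistant = await asyncio.gather(
        io.get_conversation(conversation_id),
        io.get_assistant(assistant_id),
    )
    # 2. Validation
    if conversation is None:
        return {"error": "Conversation not found"}
    if assistant is None:
        return {"error": "Assistant not found"}
    # 4-5. Job marker + early stale-job check
    run_id = str(uuid.uuid4())
    io.create_job_marker(conversation_id, run_id)
    if io.is_stale(conversation_id, run_id):
        return {"status": "stale_job", "runId": run_id}
    # 6. Parallel: RAG query and GHL contact fetch
    rag_context, contact = await asyncio.gather(
        io.rag_query(assistant, conversation),
        io.get_contact(conversation),
    )
    # 7-8. Build prompt and run the completion
    reply = await io.complete(assistant, conversation, rag_context, contact)
    # 10. Post-completion stale-job re-check
    if io.is_stale(conversation_id, run_id):
        return {"status": "stale_job", "runId": run_id}
    # 11. Send response via GHL
    await io.send_via_ghl(conversation, reply)
    # 15. Clean up job marker
    io.cleanup_job_marker(conversation_id, run_id)
    return {"status": "completed", "runId": run_id, **reply}
```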

Key Behaviors

  • Duplicate prevention: Job marker + dual stale-check (before and after completion)
  • Key resolution: Uses workspace OpenAI key if valid, falls back to backup key
  • KB filtering: Pinecone queries are filtered by assistant's knowledge base IDs
  • Timezone: Reads from GHL contact, falls back to America/New_York, handles invalid IANA IDs
