# Agent Run

`POST /api/v1/agent-run`

AI agent response pipeline — RAG, prompt building, completion, GHL send.

Full AI agent response pipeline: fetches conversation context, assembles RAG context, resolves variables, runs an OpenAI completion, sends the response via GHL, and records the transaction.

**Volume:** ~300K runs/week | **Auth:** `Authorization: Bearer <INTERNAL_API_KEY>`
## Request

```http
POST /api/v1/agent-run
Content-Type: application/json
Authorization: Bearer <INTERNAL_API_KEY>
```

### Body Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| `conversation_id` | string | ✅ | Internal conversation ID |
| `assistant_id` | string | ✅ | Internal assistant ID |
| `additional_instructions` | string | ❌ | Extra instructions appended to the system prompt |
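For callers in TypeScript, a minimal client for this endpoint might look like the following sketch. The `baseUrl` parameter, the `runAgent` name, and the response typing are illustrative assumptions, not part of the API contract; only the path, headers, and body fields come from the spec above.

```typescript
// Request body for POST /api/v1/agent-run, per the Body Parameters table.
interface AgentRunRequest {
  conversation_id: string;
  assistant_id: string;
  additional_instructions?: string; // optional
}

// Union of the documented response shapes (fields not always present are optional).
interface AgentRunResponse {
  status?: "completed" | "stale_job" | "error";
  runId?: string;
  messageId?: string;
  error?: string;
}

// Hypothetical helper: baseUrl and apiKey are supplied by the caller.
async function runAgent(
  baseUrl: string,
  apiKey: string,
  req: AgentRunRequest
): Promise<AgentRunResponse> {
  const res = await fetch(`${baseUrl}/api/v1/agent-run`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(req),
  });
  return (await res.json()) as AgentRunResponse;
}
```

Callers should branch on `status` (`completed` vs. `stale_job` vs. `error`) rather than on HTTP status alone, since some failures are reported in the body.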
## Responses

### Completed

```json
{
  "status": "completed",
  "runId": "uuid",
  "messageId": "ai_msg_id",
  "model": "gpt-4.1",
  "tokens": {
    "prompt_tokens": 500,
    "completion_tokens": 100,
    "total_tokens": 600
  },
  "finishReason": "stop",
  "charged": true
}
```

### Stale Job (duplicate run detected)

```json
{
  "status": "stale_job",
  "runId": "uuid"
}
```

### Conversation Not Found (404)

```json
{ "error": "Conversation not found" }
```

### Assistant Not Found (404)

```json
{ "error": "Assistant not found" }
```

### No Access Token (400)

```json
{ "error": "No access token for location" }
```

### GHL Send Failure

```json
{
  "status": "error",
  "error": "Failed to send message"
}
```

### Pipeline Error

```json
{
  "status": "error",
  "runId": "uuid",
  "error": "Internal error"
}
```

## Test with cURL
```bash
curl -X POST http://localhost:3000/api/v1/agent-run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_INTERNAL_API_KEY" \
  -d '{
    "conversation_id": "conv_123",
    "assistant_id": "asst_456",
    "additional_instructions": "Respond in a friendly tone"
  }'
```

## Pipeline Flow
- Parallel fetch: conversation (with messages + subAccount), assistant
- Validate conversation, assistant, access token
- Resolve OpenAI key (workspace key or backup)
- Create job marker (prevents duplicate runs)
- Early stale-job check
- Parallel: RAG query (Pinecone with KB filter), GHL contact fetch, system variable filling
- Build system prompt with RAG context, contact info, variables, replacements
- OpenAI chat completion (with optional tools)
- Misspelling simulation (if configured)
- Post-completion stale-job re-check
- Send response via GHL
- Save AI message to DB, remove AI Replying tag
- Update conversation, create transaction
- Parallel: parse extractions, ingestion update
- Clean up job marker
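The job-marker and dual stale-check steps above can be sketched as follows. This uses an in-memory `Map` purely for illustration; the real pipeline would need a shared store so markers are visible across workers, and the function names (`markJob`, `isStale`, `runPipeline`) are hypothetical.

```typescript
// conversationId -> latest runId. Illustrative in-memory store only.
const jobMarkers = new Map<string, string>();

// Record this run as the newest for its conversation.
function markJob(conversationId: string, runId: string): void {
  jobMarkers.set(conversationId, runId);
}

// A run is stale once a newer run has claimed the conversation.
function isStale(conversationId: string, runId: string): boolean {
  return jobMarkers.get(conversationId) !== runId;
}

// Dual stale-check: once early, once after the (expensive) completion,
// so a run superseded mid-completion never sends a duplicate reply.
function runPipeline(conversationId: string, runId: string): string {
  markJob(conversationId, runId);
  if (isStale(conversationId, runId)) return "stale_job"; // early check
  // ... RAG query, prompt build, OpenAI completion happen here ...
  if (isStale(conversationId, runId)) return "stale_job"; // post-completion re-check
  return "completed";
}
```

The second check is the important one: the completion call is slow, so a newer inbound message can arrive while it runs, and the re-check is what prevents the superseded run from sending via GHL.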
## Key Behaviors
- **Duplicate prevention:** Job marker plus a dual stale-check (before and after completion)
- **Key resolution:** Uses the workspace OpenAI key if valid, falls back to the backup key
- **KB filtering:** Pinecone queries are filtered by the assistant's knowledge base IDs
- **Timezone:** Read from the GHL contact, falling back to `America/New_York`; invalid IANA IDs are handled
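The timezone fallback can be sketched like this, assuming a Node/TypeScript runtime with `Intl` available; `resolveTimezone` and its parameter are hypothetical names, and `contactTimezone` stands in for whatever the GHL contact record provides.

```typescript
const DEFAULT_TZ = "America/New_York";

// Return the contact's timezone if it is a valid IANA ID, else the default.
function resolveTimezone(contactTimezone?: string): string {
  if (!contactTimezone) return DEFAULT_TZ;
  try {
    // Intl.DateTimeFormat throws a RangeError for invalid IANA timezone IDs,
    // which is a cheap way to validate without a timezone database dependency.
    new Intl.DateTimeFormat("en-US", { timeZone: contactTimezone });
    return contactTimezone;
  } catch {
    return DEFAULT_TZ;
  }
}
```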