

Agent Run

Full AI agent response pipeline: fetches conversation context, assembles RAG context, resolves variables, runs an OpenAI completion, sends the response via GHL, and records the transaction.

Volume: ~300K runs/week | Auth: Authorization: Bearer <INTERNAL_API_KEY>

Request

POST /api/v1/agent-run
Content-Type: application/json
Authorization: Bearer <INTERNAL_API_KEY>

Body Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| `conversation_id` | string | Yes | Internal conversation ID |
| `assistant_id` | string | Yes | Internal assistant ID |
| `additional_instructions` | string | No | Extra instructions appended to the system prompt |

Responses

Completed

{
  "status": "completed",
  "runId": "uuid",
  "messageId": "ai_msg_id",
  "model": "gpt-4.1",
  "tokens": {
    "prompt_tokens": 500,
    "completion_tokens": 100,
    "total_tokens": 600
  },
  "finishReason": "stop",
  "charged": true
}

Stale Job (duplicate run detected)

{
  "status": "stale_job",
  "runId": "uuid"
}

Conversation Not Found (404)

{ "error": "Conversation not found" }

Assistant Not Found (404)

{ "error": "Assistant not found" }

No Access Token (400)

{ "error": "No access token for location" }

GHL Send Failure

{
  "status": "error",
  "error": "Failed to send message"
}

Pipeline Error

{
  "status": "error",
  "runId": "uuid",
  "error": "Internal error"
}

Test with cURL

curl -X POST http://localhost:3000/api/v1/agent-run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_INTERNAL_API_KEY" \
  -d '{
    "conversation_id": "conv_123",
    "assistant_id": "asst_456",
    "additional_instructions": "Respond in a friendly tone"
  }'
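The same request can be built programmatically. A standard-library Python sketch mirroring the cURL example above (the base URL and key are the same placeholders):

```python
import json
import urllib.request

def build_agent_run_request(conversation_id: str, assistant_id: str,
                            additional_instructions: str = "",
                            base_url: str = "http://localhost:3000",
                            api_key: str = "YOUR_INTERNAL_API_KEY"):
    """Build the POST /api/v1/agent-run request from the documented fields."""
    body = json.dumps({
        "conversation_id": conversation_id,
        "assistant_id": assistant_id,
        "additional_instructions": additional_instructions,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/agent-run",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# To send:
#   with urllib.request.urlopen(build_agent_run_request(...)) as resp:
#       result = json.load(resp)  # e.g. {"status": "completed", ...}
```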

Pipeline Flow

  1. Parallel fetch: conversation (with messages + subAccount), assistant
  2. Validate conversation, assistant, access token
  3. Resolve OpenAI key (workspace key or backup)
  4. Create job marker (prevents duplicate runs)
  5. Early stale-job check
  6. Parallel: RAG query (Pinecone with KB filter), GHL contact fetch, system variable filling
  7. Build system prompt with RAG context, contact info, variables, replacements
  8. OpenAI chat completion (with optional tools)
  9. Misspelling simulation (if configured)
  10. Post-completion stale-job re-check
  11. Send response via GHL
  12. Save AI message to DB, remove AI Replying tag
  13. Update conversation, create transaction
  14. Parallel: parse extractions, ingestion update
  15. Clean up job marker
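The ordering above can be sketched as an async skeleton with every external call stubbed out. This is illustrative only; the function and parameter names are hypothetical, not the service's real code:

```python
import asyncio
import uuid

async def agent_run(conversation_id: str, assistant_id: str, io) -> dict:
    """Illustrative pipeline skeleton; `io` provides all external calls."""
    # 1. Parallel fetch: conversation and assistant
    conversation, assistant = await asyncio.gather(
        io.get_conversation(conversation_id),
        io.get_assistant(assistant_id),
    )
    # 2. Validation
    if conversation is None:
        return {"error": "Conversation not found"}
    if assistant is None:
        return {"error": "Assistant not found"}
    # 4-5. Job marker + early stale-job check
    run_id = str(uuid.uuid4())
    io.create_job_marker(conversation_id, run_id)
    if io.is_stale(conversation_id, run_id):
        return {"status": "stale_job", "runId": run_id}
    # 6. Parallel: RAG query and GHL contact fetch
    rag_context, contact = await asyncio.gather(
        io.rag_query(assistant, conversation),
        io.get_contact(conversation),
    )
    # 7-8. Build prompt and run the completion
    reply = await io.complete(assistant, conversation, rag_context, contact)
    # 10. Post-completion stale-job re-check
    if io.is_stale(conversation_id, run_id):
        return {"status": "stale_job", "runId": run_id}
    # 11. Send response via GHL
    await io.send_via_ghl(conversation, reply)
    # 15. Clean up job marker
    io.cleanup_job_marker(conversation_id, run_id)
    return {"status": "completed", "runId": run_id, **reply}
```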

Key Behaviors

  • Duplicate prevention: Job marker + dual stale-check (before and after completion)
  • Key resolution: Uses workspace OpenAI key if valid, falls back to backup key
  • KB filtering: Pinecone queries are filtered by assistant's knowledge base IDs
  • Timezone: Reads from GHL contact, falls back to America/New_York, handles invalid IANA IDs
