
Building Agents at AngelList

Production AI agents through better engineering, not better LLMs

Oct 20, 2025 · 14 min read

TL;DR

AngelList Intelligence is an AI agent built for our customer relations team to better serve GPs and LPs. AngelList has scaled rapidly, resulting in customer context splintering across Slack threads, email conversations, legal documents, CRM systems, and various knowledge bases like Guru and Notion. Our team was spending hours hunting through disparate systems to understand customer history, legal terms, and past decisions.

Rather than cobbling together existing solutions, we custom-engineered our agent as a standalone AI system that pulls context from multiple sources and provides intelligent, venture-specific responses directly in Slack and Front, where our team lives. We built production-grade streaming infrastructure in Go, created custom OpenSearch indices for unified data retrieval, and discovered that evals were far less critical than adoption metrics and user feedback.

The result: 65% of all customer queries are now assisted by AngelList Intelligence, enabling our customer relations team to scale as queries that previously took hours of research now get accurate, document-backed responses in minutes.

Use Cases: Multi-Source Intelligence in Action

An example of a customer query our agent handles: "The way the account managers conduct Indian deployments recently changed. Can you gather context and see how we should conduct deployments for CustomerX?"

To handle a query like this, the agent needs to:

  • Review knowledge base documentation to find the current deployment procedures
  • Search Slack threads and Front email conversations to get information on how we have handled these deployments for other customers
  • Pull account information from our CRM to understand CustomerX's specific deployment requirements and history
  • Synthesize all the information into clear, actionable deployment guidance tailored to this customer

What previously required an employee to hunt through multiple communication channels, synthesize scattered information, and manually update documentation now happens automatically in minutes.

Build vs Buy: Why Generic AI Tools Fall Short in VC

Venture Capital Domain Complexity: The nuanced language of legal docs and our product offerings requires contextual understanding that generic AI lacks. Furthermore, we wanted control over response style, tone, and formatting to match our customer success team's communication standards rather than relying on generic AI personality traits.

Multi-Source Intelligence: Customer support in VC requires synthesizing information across completely different data types as shown in the example above. Tools like Notion AI work well within their own ecosystem, but can't reason across our fragmented knowledge landscape.

Native Integration Requirements: Our team lives in Slack and Front. Adding another dashboard or web interface would create friction that kills adoption. We also made sure to include real-time streaming responses on all platforms, so that the customer service team gets their answers as quickly as possible.

Data Autonomy: With customer legal documents and sensitive investment data, we needed complete control over how information is indexed, retrieved, and secured. Generic AI services couldn't provide the data governance and customization we required.

Our Philosophy: AI Layer vs Data Layer

Through building AngelList Intelligence, we've developed a strong conviction that effective AI agents are primarily a systems engineering problem, not an AI research problem.

The AI layer is commoditized — we simply use the best available language models (currently GPT-5 and Claude 4 Sonnet). The real differentiator is in the systems built around a language model.

Technical Architecture: Three Key Engineering Decisions

Our agent’s architecture is shaped by three core systems engineering decisions:

  • building a production-grade user experience
  • controlling data pipeline infrastructure
  • implementing pragmatic evaluation strategies

These decisions directly address the challenges of building AI agents that handle unpredictable, domain-specific queries in production environments.

Production-Grade User Experience Through Go Architecture

Go as the Orchestration Engine: We chose a Go stack for several critical reasons:

  • Concurrency for Multi-Source Queries: Our agent often needs to query 5-10 different systems simultaneously (Slack, legal documents, CRM, external APIs). Go's goroutines make parallel data fetching intuitive and performant (see the sketch after this list).
  • Streaming API Integration: Go's channels provide clean patterns for handling streaming LLM responses and implementing real-time GraphQL subscriptions back to users in Slack. Slack doesn't natively support streaming AI responses, so we engineered a solution using Socket Mode connections with message updates that show our agent's thinking process in real-time.
  • AI-Assisted Development: Coding agents like Cursor and Claude Code excel at writing clean, idiomatic Go code with proper error handling and concurrent patterns. This significantly accelerates our small team's development velocity, especially when implementing complex integrations and maintaining consistent code quality across the rapidly evolving AngelList Intelligence codebase.
  • Deployment Simplicity: Go compiles to a single binary, eliminating the complex deployment pipelines our previous Node.js system required.
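As a rough illustration of the first two points, here is a minimal sketch of the fan-out pattern: each source is queried in its own goroutine, and progress messages flow back over a channel that a separate consumer can drain (for example, to update the Slack thread). The Source and Result types are illustrative placeholders, not our actual interfaces.

package agent

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Result and Source are hypothetical placeholders for this sketch.
type Result struct {
	Source string
	Docs   []string
}

type Source interface {
	Name() string
	Search(ctx context.Context, query string) ([]string, error)
}

// fanOutSearch queries every source in parallel and reports progress
// over the progress channel, which the caller must drain concurrently
// (e.g. a goroutine that edits the Slack message in place).
func fanOutSearch(ctx context.Context, query string, sources []Source, progress chan<- string) []Result {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()

	results := make(chan Result, len(sources))
	var wg sync.WaitGroup
	for _, s := range sources {
		wg.Add(1)
		go func(s Source) {
			defer wg.Done()
			docs, err := s.Search(ctx, query)
			if err != nil {
				progress <- fmt.Sprintf("%s failed: %v", s.Name(), err)
				return
			}
			progress <- fmt.Sprintf("%s returned %d documents", s.Name(), len(docs))
			results <- Result{Source: s.Name(), Docs: docs}
		}(s)
	}
	wg.Wait()
	close(results)

	var all []Result
	for r := range results {
		all = append(all, r)
	}
	return all
}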

Event-Driven Architecture: We use Temporal for workflow orchestration and NATS for real-time event streaming. When a customer query comes in through Slack or Front, it triggers a workflow that authenticates the user, performs tool calls to retrieve knowledge, and produces a comprehensive response, all while providing real-time updates to the user.
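A minimal sketch of what such a workflow could look like with Temporal's Go SDK; the activity names and types here are illustrative assumptions (registered separately on a worker), not our production code.

package agent

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// Query, User, and Document are simplified placeholder types.
type Query struct {
	UserID   string
	ThreadID string
	Text     string
}

type User struct{ ID string }

type Document struct{ Title, Text string }

// CustomerQueryWorkflow orchestrates one customer query end to end:
// authenticate the user, retrieve knowledge via tool calls, synthesize
// an answer, and post it back to the originating thread.
func CustomerQueryWorkflow(ctx workflow.Context, q Query) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 2 * time.Minute,
	})

	var user User
	if err := workflow.ExecuteActivity(ctx, "AuthenticateUser", q.UserID).Get(ctx, &user); err != nil {
		return err
	}

	var docs []Document
	if err := workflow.ExecuteActivity(ctx, "RetrieveKnowledge", q).Get(ctx, &docs); err != nil {
		return err
	}

	var answer string
	if err := workflow.ExecuteActivity(ctx, "SynthesizeResponse", q, docs).Get(ctx, &answer); err != nil {
		return err
	}

	// Interim progress updates are published separately (e.g. over NATS);
	// this final activity posts the complete answer to the thread.
	return workflow.ExecuteActivity(ctx, "PostResponse", q.ThreadID, answer).Get(ctx, nil)
}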

Data Pipeline Control Through Custom Search Infrastructure

Platform API Limitations That Forced Custom Solutions

Everyone has access to GPT-5 and Claude 4 Sonnet—the differentiation comes from data quality and retrieval capabilities. Our initial approach relied on vendor APIs, but each platform's search limitations quickly became bottlenecks:

  • Slack's Search API imposes strict rate limiting on the search and get conversation message APIs, which meant our agent's performance was bottlenecked by an external service.
  • Notion's API only supports title-based search, missing the vast majority of a document's content.
  • Jira's REST API requires JQL (Jira Query Language) for complex searches, but there's no reliable way to translate natural language queries into programmatic JQL that properly weights recency and relevance.

These limitations meant our agent couldn't access the contextual depth required for the queries our account managers receive, forcing us to build our own comprehensive search infrastructure.

OpenSearch Implementation: From Query to Complex Retrieval

OpenSearch provides distributed search with complete control over indexing, scoring, and retrieval logic. When users ask "Find recent discussions in the product-feedback slack channel on featureX," the language model analyzes the query and constructs structured arguments for our Slack search tool (channel: "product-feedback", terms: "featureX", temporal: "recent", etc.) that translate into sophisticated OpenSearch operations:

  • Multi-field search: Queries channel names, conversation text, parent messages, and reply threads simultaneously
  • Fuzzy matching: Handles typos and variations in terminology
  • Filtering: Applies hard filters (such as the channel name) extracted from the query
  • Temporal scoring: Detects "recent" in queries and applies Gaussian time decay to prioritize newer conversations

The example query above would be translated to the following OpenSearch parameters:

{
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "featureX",
                "fields": ["channel_name", "full_conversation_text", "parent_message.text", "replies.text"],
                "type": "best_fields",
                "fuzziness": "AUTO"
              }
            }
          ],
          "filter": [
            {
              "fuzzy": {
                "channel_name": {
                  "value": "product-feedback",
                  "fuzziness": "1"
                }
              }
            }
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "updated_at": {
              "origin": "now",
              "scale": "30d",
              "decay": 0.3
            }
          }
        }
      ],
      "boost_mode": "multiply",
      "score_mode": "sum"
    }
  },
  "sort": [
    { "_score": "desc" },
    { "updated_at": "desc" }
  ]
}

This search architecture can be easily extended to support various forms of filtering, scoring adjustments, and aggregations. Additional filters for customer IDs, document types, or sentiment scores can be added based on query analysis.
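To make that concrete, here is a simplified sketch of how the structured tool arguments might be assembled into a request body like the one above; the SlackSearchArgs shape and the builder are hypothetical, not our exact implementation.

package agent

// SlackSearchArgs is a hypothetical shape for the arguments the
// language model produces for our Slack search tool.
type SlackSearchArgs struct {
	Channel string `json:"channel"`
	Terms   string `json:"terms"`
	Recent  bool   `json:"recent"`
}

// buildSlackQuery assembles an OpenSearch request body from the tool
// arguments, mirroring the JSON example above.
func buildSlackQuery(args SlackSearchArgs) map[string]any {
	must := []any{map[string]any{
		"multi_match": map[string]any{
			"query":     args.Terms,
			"fields":    []string{"channel_name", "full_conversation_text", "parent_message.text", "replies.text"},
			"type":      "best_fields",
			"fuzziness": "AUTO",
		},
	}}

	filter := []any{}
	if args.Channel != "" {
		filter = append(filter, map[string]any{
			"fuzzy": map[string]any{
				"channel_name": map[string]any{"value": args.Channel, "fuzziness": "1"},
			},
		})
	}

	query := map[string]any{
		"bool": map[string]any{"must": must, "filter": filter},
	}

	// Apply Gaussian time decay only when the model flagged the query
	// as temporal ("recent", "last week", ...).
	if args.Recent {
		query = map[string]any{
			"function_score": map[string]any{
				"query": query,
				"functions": []any{map[string]any{
					"gauss": map[string]any{
						"updated_at": map[string]any{"origin": "now", "scale": "30d", "decay": 0.3},
					},
				}},
				"boost_mode": "multiply",
				"score_mode": "sum",
			},
		}
	}

	return map[string]any{
		"size":  10,
		"query": query,
		"sort":  []any{map[string]any{"_score": "desc"}, map[string]any{"updated_at": "desc"}},
	}
}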

Retrieval Methods: Keyword vs Vector Search

Our implementation strategically chooses search approaches based on both the knowledge base being queried and the query characteristics:

Keyword Search: We found BM25 retrieval through OpenSearch (as in the snippet above) highly performant for most knowledge base searches, especially legal document searches, customer lookups, and regulatory compliance queries, where precision trumps similarity. When users mention specific customer names, product features, or legal terminology, keyword search tends to work better.

Vector Search Applications: Using embeddings for retrieval tends to work better for queries that require analysis across CRM communications and call transcripts. When users search for “positive customer feedback,” vector search retrieves semantically similar phrases such as “great experience overall” or “delighted with the product,” capturing nuances that keyword search overlooks.
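For the vector path, here is a sketch of what a k-NN retrieval body against an OpenSearch index could look like; the embedding field name and the embed helper are assumptions for illustration.

package agent

import "context"

// embed is a placeholder for whatever embedding model produces the
// query vector (e.g. a call to an external embeddings API).
func embed(ctx context.Context, text string) ([]float32, error) {
	// ... call the embedding model here ...
	return nil, nil
}

// buildVectorQuery returns an OpenSearch k-NN request body that
// retrieves the transcript chunks closest to the query embedding.
// "embedding" is an assumed vector field name.
func buildVectorQuery(vector []float32) map[string]any {
	return map[string]any{
		"size": 10,
		"query": map[string]any{
			"knn": map[string]any{
				"embedding": map[string]any{
					"vector": vector,
					"k":      10,
				},
			},
		},
	}
}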

Our Pragmatic Approach To Evals

Evaluation is critical for AI agents because they operate in production environments where failures directly impact user trust and business operations. However, traditional evaluation methods assume predictable query patterns and comprehensive test coverage. Our customers' queries are highly varied, so constructing comprehensive evals that represent the full distribution of queries while rapidly shipping new features is hard.

We initially created an evaluation set of basic tests that serve as unit tests to ensure the agent can access data sources and combine information across platforms. They catch functional regressions, verify reasoning and prompt-instruction adherence, and confirm that data retrieval mechanisms are operating as expected. While essential for maintaining system health, these evaluations focus on technical functionality rather than response quality or user satisfaction.
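For flavor, here is a sketch of what one of these checks might look like as a table-driven Go test; the runAgent harness and response type are stand-ins for our actual entry points.

package agent_test

import (
	"context"
	"strings"
	"testing"
)

// agentResponse and runAgent are stand-ins for the real test harness.
type agentResponse struct {
	Answer string
	Tools  []string
}

func (r agentResponse) usedTool(name string) bool {
	for _, t := range r.Tools {
		if t == name {
			return true
		}
	}
	return false
}

// runAgent would invoke the real agent; stubbed here for the sketch.
func runAgent(ctx context.Context, query string) agentResponse {
	return agentResponse{}
}

// TestAgentSmoke runs canned queries and asserts the expected data
// source was touched and the key fact surfaced in the answer.
func TestAgentSmoke(t *testing.T) {
	cases := []struct {
		name, query, wantTool, wantInAnswer string
	}{
		{"slack lookup", "Find recent discussions on featureX", "slack_search", "featureX"},
		{"crm lookup", "Summarize CustomerX's deployment history", "crm_lookup", "CustomerX"},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			resp := runAgent(context.Background(), tc.query)
			if !resp.usedTool(tc.wantTool) {
				t.Errorf("expected tool %q to be invoked", tc.wantTool)
			}
			if !strings.Contains(resp.Answer, tc.wantInAnswer) {
				t.Errorf("answer missing %q", tc.wantInAnswer)
			}
		})
	}
}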

Our primary signal comes from weekly query volume, which reveals genuine adoption patterns. In addition, users leave Slack reactions (thumbs up or down) on the threads where our agent was looped in. By correlating negative feedback with the specific tools used in those responses, we can quickly identify problem areas. This approach lets actual user behavior drive engineering priorities, rather than trying to predict what matters through synthetic benchmarks.
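As a rough sketch of that correlation step, one might tally negative reactions against the tools recorded in each response's trace; the store below is a simplified assumption, not our actual pipeline.

package agent

import "sync"

// FeedbackStore counts negative reactions per tool so we can see which
// retrieval paths drive bad answers.
type FeedbackStore struct {
	mu       sync.Mutex
	negative map[string]int // tool name -> thumbs-down count
}

func NewFeedbackStore() *FeedbackStore {
	return &FeedbackStore{negative: make(map[string]int)}
}

// RecordReaction is called when a Slack reaction lands on one of the
// agent's threads; toolsUsed comes from the logged trace of that reply.
func (f *FeedbackStore) RecordReaction(reaction string, toolsUsed []string) {
	if reaction != "-1" { // Slack's name for the thumbs-down emoji
		return
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	for _, tool := range toolsUsed {
		f.negative[tool]++
	}
}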

The Compound AI Advantage: Years of AI Investment Paying Dividends

Our agent's ability to reason over customer legal documents builds on our earlier AI initiatives. Over the past year and a half, we invested in building a comprehensive document intelligence foundation, creating a parsing system that can ingest large volumes of legal and financial documents—like SAFEs, convertible notes, and share purchase agreements—and extract structured metadata. While this system was originally designed to support internal deal processing and compliance workflows, it eventually proved incredibly useful for our agent.

At the same time, we enriched our CRM by creating structured data like issue categories and call transcript summaries, which gave our agent better tools for retrieving and surfacing relevant customer information to account managers.

Results & What's Next

Key Impact: 65% of all customer queries are now assisted by AngelList Intelligence, enabling our customer relations team to scale without proportional headcount increases. For investor communications, the ability to provide immediate responses backed by sources has made AngelList Intelligence essential for our team's GP and LP relationship management.

What’s Next? We’re expanding our agent's capabilities to connect to more data sources and set up a data flywheel that continuously improves the agent’s outputs. We’re also building out a more robust evaluation framework (informed by our metrics), which will unlock experimentation with different tool setups, embedding models, and OpenSearch configurations, and allow us to systematically optimize performance over time.

The future of AI agents isn't about building better language models—it's about building better systems that understand your domain deeply and integrate seamlessly into how work actually gets done.

Acknowledgements

AngelList Intelligence is the result of close collaboration across our engineering and customer success teams. This work wouldn't have been possible without James Pozdena, Kerry Jones, and Thibaut LaBarre, who architected and built the core agent infrastructure, search systems, and production integrations alongside me.

Special thanks to our customer success team, especially Archit Dhar and Catherine Nguyen for their continuous feedback, willingness to experiment with early versions, and invaluable insights that shaped the agent into a tool they actually want to use every day.

