Azure OpenAI Assistants API: Code Interpreter, File Search & Function Calling
You ask GPT-4 to calculate a compound interest schedule — it gets it slightly wrong. You ask it a question about your internal runbook — it confidently cites a policy that does not exist. You want it to trigger a deployment — it has no way to reach your systems.
These are not model failures. They are architectural gaps. The standard Chat Completions API is stateless and prompt-only: no tools, no persistent memory, no way to act on the world.
The Azure OpenAI Assistants API was built to close those gaps. It gives the AI a sandboxed Python execution environment, a managed RAG pipeline over your private documents, and a structured interface for calling your own backend functions — all within a persistent conversation thread it manages for you.
This article walks through all three tools in Node.js with working code, real output examples, and the production details that tutorials usually skip.
The full working source for all three examples is on GitHub: azizjarrar/azure-openai-tools
Contents
- The Building Blocks
- Setting Up Azure AI Foundry
- Authentication: The Right Way (Keyless)
- Tool 1: Code Interpreter
- Tool 2: File Search (RAG with Vector Stores)
- Tool 3: Function Calling (Custom Tools)
- Comparison: Which Tool for Which Problem?
- SDK Method Reference
- What to Watch Out For
The Building Blocks
Before diving into each tool, you need a solid mental model of the four primitives that every Assistants API application uses.
Assistant
An Assistant is a named AI persona that you configure once. It holds the model you want to use, the system instructions that define how it behaves, and the list of tools it is allowed to call.
Think of it as the blueprint — you create it once per application, not once per conversation.
const assistant = await openai.beta.assistants.create({
name: "My Assistant",
instructions: "You are a helpful assistant.",
model: "gpt-4.1",
tools: [{ type: "code_interpreter" }],
});
// Save assistant.id — reuse it across all conversations
Thread
A Thread is a single conversation session. Every message in a conversation — from the user and from the assistant — gets appended to the same Thread. The API maintains the full context for you, so you do not have to manually manage a message history array like you do with the standard Chat Completions API.
One new user session → one new Thread. Simple.
const thread = await openai.beta.threads.create();
// Save thread.id against the user's session
Message
A Message is a single turn in the conversation. You add a user Message to the Thread before each run, and the AI adds its assistant Messages after each run completes.
await openai.beta.threads.messages.create(thread.id, {
role: "user",
content: "What is 12 factorial?",
});
Run
A Run is the execution event. When you want the AI to process the current Thread and respond, you create a Run against a Thread + Assistant pair. The API processes everything asynchronously — you ask for a run, get back a run ID, and then poll (or use a helper method) until it finishes.
const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
assistant_id: assistant.id,
});
// run.status === "completed" when done
The status flow is:
queued → in_progress → (requires_action) → completed
                              ↑
               only happens with Function Calling
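In code, a poll loop only needs to know which statuses are terminal. A tiny helper (the grouping is mine; the status names come from the Run object):

```javascript
// Terminal states: the run is over and will not change again.
const TERMINAL_STATES = new Set(["completed", "failed", "cancelled", "expired"]);

function isRunFinished(run) {
  return TERMINAL_STATES.has(run.status);
}

// requires_action means the run is paused, waiting for YOUR tool outputs.
function needsToolOutput(run) {
  return run.status === "requires_action";
}
```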
The simple path through a Run (no Function Calling) is easier to see than describe:

Add user message   →  threads.messages.create()
Create Run         →  threads.runs.createAndPoll()
AI processing      →  run.status = 'in_progress'
Run complete       →  run.status = 'completed'
Fetch response     →  threads.messages.list()

The second path, with Function Calling, pauses at requires_action between "AI processing" and "Run complete"; Tool 3 walks through it in detail.
Setting Up Azure AI Foundry
Before writing any code you need an Azure OpenAI resource with a deployed model. Here is the exact path through the portal.
1. Create a Project
Go to ai.azure.com and sign in with your Azure account.
- Click New project
- Give it a name and select or create a Hub (the hub is the top-level resource that holds billing, networking, and access control — one hub can contain many projects)
- Click Create
2. Deploy a Model
Inside your project, go to My assets → Models + endpoints → Deploy model.
- Select gpt-4.1 (or gpt-4o if 4.1 is not available in your region)
- Give the deployment a name — this is important. The name you type here (e.g. gpt-4.1) is what goes in the model field of assistants.create(). It does not have to match the model family name, but keeping them the same avoids confusion
- Set a token-per-minute limit appropriate for your usage and click Deploy
3. Grab Your Endpoint
Once the deployment is live, click on it to open the detail page. You need two values:
- Endpoint URL — looks like https://<your-resource-name>.openai.azure.com/
- Deployment name — whatever you typed in step 2
Add the endpoint to your environment:
export AZURE_OPENAI_ENDPOINT="https://<your-resource-name>.openai.azure.com/"
4. Grant Yourself Access (for Keyless Auth)
The examples in this article use DefaultAzureCredential instead of an API key. For this to work locally, your Azure account needs the Cognitive Services OpenAI User role on the Azure OpenAI resource.
In the Azure Portal, go to your OpenAI resource → Access control (IAM) → Add role assignment → search for Cognitive Services OpenAI User → assign it to your account.
Then log in via the CLI:
az login
That is all the setup needed. The same role assignment on a Managed Identity covers your production environment without any secret management.
Authentication: The Right Way (Keyless)
All three examples use DefaultAzureCredential from the @azure/identity package instead of hardcoding an API key. This is the recommended pattern for any serious Azure application.
import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);
Why this matters:
DefaultAzureCredential works in multiple environments automatically. Locally it picks up your az login session. In production on an Azure service (Container App, VM, App Service) it uses the resource's Managed Identity — no secrets stored anywhere, no rotation needed, no accidental leaks in environment variable dumps.
The scope value tells Azure which service the token is valid for. Cognitive Services (which includes Azure OpenAI) uses https://cognitiveservices.azure.com/.default.
You wire it into the SDK like this:
const openai = new AzureOpenAI({
endpoint: process.env.AZURE_OPENAI_ENDPOINT,
azureADTokenProvider: azureADTokenProvider,
apiVersion: "2024-05-01-preview",
});
The apiVersion: "2024-05-01-preview" is required — the Assistants API is not available on the stable GA versions of the Azure OpenAI API.
Tool 1: Code Interpreter
File: code_interpreter.js — A math tutor that writes and runs Python to answer questions.
What it Does
The Code Interpreter tool gives the AI access to a secure, sandboxed Python execution environment hosted by OpenAI. The AI writes Python code, runs it inside the sandbox, sees the result, and uses that result to form its answer.
This is not the AI pretending to calculate — it is actually executing code. Ask it to plot a graph and it will write matplotlib code and return an image. Ask it to solve a differential equation and it will run sympy. The sandbox is stateful within a single Thread, so variables defined in one run persist in the next.
The Setup
const assistant = await openai.beta.assistants.create({
name: "Math Tutor",
instructions: "You are a helpful math tutor. Write and run code to answer math questions.",
model: "gpt-4.1",
tools: [{ type: "code_interpreter" }],
});
const thread = await openai.beta.threads.create();
That single { type: "code_interpreter" } line is all it takes to grant the AI a Python sandbox. No configuration needed.
The Chat Loop
while(true) {
const userMessage = await rl.question("Ask a math question: ");
if (userMessage.toLowerCase() === 'exit') break;
// 1. Add the user's message to the thread
await openai.beta.threads.messages.create(thread.id, {
role: "user",
content: userMessage,
});
// 2. Execute the run and wait for it to complete
const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
assistant_id: assistant.id,
});
// 3. Fetch and print the assistant's response
if (run.status === 'completed') {
const messages = await openai.beta.threads.messages.list(run.thread_id);
for (const message of messages.data.reverse()) {
if (message.role === "assistant") {
console.log(message.content[0].text.value);
}
}
}
}
Key detail — createAndPoll: This is a convenience method that creates the Run and automatically polls the API every second until the run reaches a terminal state (completed, failed, cancelled, etc.). It removes the manual polling loop. For Code Interpreter this is safe to use because there is no requires_action state to handle — the AI manages the sandbox internally.
Key detail — message ordering: messages.list() returns messages newest-first. .reverse() puts them in natural chronological order so the conversation prints correctly.
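A related convenience: if you only want the newest assistant reply rather than reprinting the whole conversation each turn, take it from the front of the (newest-first) list. A small sketch; the helper name is mine, but the object shape matches what messages.list() returns:

```javascript
// Returns the text of the most recent assistant message in a
// messages.list() result (ordered newest-first), or null if none exists.
function latestAssistantText(messages) {
  const msg = messages.data.find((m) => m.role === "assistant");
  if (!msg) return null;
  const textPart = msg.content.find((part) => part.type === "text");
  return textPart ? textPart.text.value : null;
}
```

Called right after a completed run, this returns just the answer the model produced on that turn.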
When to Use It
- Any task where accurate computation matters: statistics, finance, data analysis
- Generating charts or visualizations from data
- Running simulations or iterative calculations
- Anywhere you cannot trust the model to compute in its head reliably
What the Output Looks Like
Ask "What is the definite integral of x² from 0 to 3?" and the AI does not guess. It writes Python, runs it, and explains the result:
> What is the definite integral of x² from 0 to 3?
To solve this I'll use sympy to compute it exactly.
import math
from sympy import *
x = symbols('x')
result = integrate(x**2, (x, 0, 3))
print(result) # → 9
The definite integral of x² from 0 to 3 is 9.
Analytically: [x³/3] from 0 to 3 = 27/3 - 0 = 9. ✓
The model wrote the code, executed it in the sandbox, got 9, then used that confirmed result in its answer. No hallucinated arithmetic.
Tool 2: File Search (RAG with Vector Stores)
File: file_search.js — A document assistant that searches a private knowledge base.
What it Does
The File Search tool implements a managed Retrieval-Augmented Generation (RAG) pipeline. You upload documents, Azure OpenAI automatically chunks and embeds them into a Vector Store, and then the AI uses semantic search to retrieve relevant passages before answering a question.
This solves the core problem with LLMs and private data: the model was not trained on your internal documents. File Search bridges that gap without you having to build a chunking pipeline, manage an embedding model, or run a vector database yourself.
The Setup
// 1. Create the Vector Store (a named index that holds your documents)
const vectorStore = await openai.vectorStores.create({
name: "Secure Knowledge Base"
});
// 2. Upload a file and wait for it to be processed (chunked + embedded)
const fileStream = fs.createReadStream("vault_info.txt");
await openai.vectorStores.fileBatches.uploadAndPoll(vectorStore.id, {
files: [fileStream]
});
// 3. Create the assistant and link it to the vector store
const assistant = await openai.beta.assistants.create({
name: "Security Guard",
instructions: "You are a secure document assistant. Use the file_search tool to answer questions.",
model: "gpt-4.1",
tools: [{ type: "file_search" }],
tool_resources: {
file_search: { vector_store_ids: [vectorStore.id] }
}
});
Key detail — uploadAndPoll: Uploading a file is not instantaneous. The service needs to chunk the text, run it through the embedding model, and index it. uploadAndPoll blocks until this processing is complete so you do not start asking questions against a half-indexed store.
Key detail — tool_resources: This is how you connect a Vector Store to an Assistant. The vector_store_ids array can hold multiple stores — the AI will search across all of them.
How Retrieval Works (Under the Hood)
When a Run is created with File Search enabled, the pipeline is:
User question
↓
Question is embedded into a vector
↓
Vector Store is searched for semantically similar chunks
↓
Top matching chunks are injected into the AI's context window
↓
AI generates an answer grounded in the retrieved text
The AI never "sees" the whole document at once. It only sees the chunks that are relevant to the current question. This makes File Search practical even for very large document sets that would not fit in a context window.
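To make the retrieval step concrete, here is a toy version of the search the managed pipeline performs, with hand-made 3-dimensional vectors standing in for real embeddings (the real service also handles chunking, thresholds, and ranking for you):

```javascript
// Cosine similarity between two vectors of equal length.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank document chunks by similarity to the query vector, keep the top k.
function topChunks(queryVec, chunks, k = 2) {
  return [...chunks]
    .sort((x, y) => cosine(queryVec, y.vec) - cosine(queryVec, x.vec))
    .slice(0, k)
    .map((c) => c.text);
}

const chunks = [
  { text: "vault access policy", vec: [1, 0, 0] },
  { text: "cafeteria menu",      vec: [0, 1, 0] },
  { text: "badge requirements",  vec: [0.9, 0.1, 0] },
];
// A query about vault access points mostly along the first axis:
topChunks([1, 0, 0], chunks, 2); // → ["vault access policy", "badge requirements"]
```

Only those top chunks reach the model's context window, which is why the approach scales to document sets far larger than any context limit.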
When to Use It
- Internal Q&A bots (HR policies, runbooks, compliance docs)
- Customer support bots grounded in your product documentation
- Legal or financial document analysis
- Any application where the AI must answer from a specific corpus, not general training data
What the Output Looks Like
Ask a question and the AI retrieves the relevant chunk before answering. By default the response includes inline citation markers:
> Who is allowed to access the vault?
Access to the vault is restricted to personnel with security clearance level 3 or
above, and requires two-factor authentication for every session. All access
attempts are logged and reviewed weekly【4:0†vault_info.txt】.
The 【4:0†vault_info.txt】 is a citation injected by File Search pointing back to the source chunk. Strip it in your UI with:
const clean = message.replace(/【[^】]*】/g, "").trim();
Tool 3: Function Calling (Custom Tools)
File: functions_tool.js — A security operations assistant that can query live data and trigger real actions.
This is the most powerful and complex of the three tools. It is also where the full Run lifecycle becomes visible.
What it Does
Function Calling lets you extend the AI with your own arbitrary logic. You define a set of "tools" as JSON schemas — their name, what they do, and what parameters they accept. The AI decides when to call them, constructs the arguments, and sends them back to you. You execute your own code and return the result. The AI then uses that result to form its final answer.
Critically: the AI never runs your code directly. It only sends a structured request. You control execution.
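That control point is worth making explicit in code. A minimal dispatch sketch (the handler-table pattern and error shape are my own conventions, not part of the API): only names you registered can ever execute, and malformed arguments fail safely.

```javascript
// Execute a requested tool call, but only if the name is on the allowlist.
function dispatchToolCall(handlers, name, rawArgs) {
  if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
    return JSON.stringify({ error: `unknown tool: ${name}` });
  }
  try {
    return handlers[name](JSON.parse(rawArgs));
  } catch (err) {
    // Bad JSON or a handler failure becomes data the model can read,
    // instead of an exception that crashes your server.
    return JSON.stringify({ error: String(err) });
  }
}
```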
Defining the Tools
Each function is described with a JSON schema. The description field is the most important part — it is what the AI reads to decide whether to call this function for a given user request.
{
type: "function",
function: {
name: "getVulnerabilityCount",
description: "Returns the number of vulnerabilities found in a specific environment.",
parameters: {
type: "object",
properties: {
systemName: {
type: "string",
enum: ["production", "staging", "legacy"]
}
},
required: ["systemName"]
}
}
}
The enum constraint tells the AI which values are valid. This prevents hallucinated parameter values — the AI is forced to pick from your allowed list.
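The schema constrains what the model should send, but it is still wise to re-validate server-side before executing; treat the schema as advisory. A defensive check against the getVulnerabilityCount schema above (the helper is my own addition):

```javascript
// Must match the enum in the function's JSON schema.
const ALLOWED_SYSTEMS = ["production", "staging", "legacy"];

// Parse and validate the model-supplied arguments before dispatching.
function parseVulnArgs(rawArgs) {
  const args = JSON.parse(rawArgs);
  if (!ALLOWED_SYSTEMS.includes(args.systemName)) {
    throw new Error(`systemName must be one of: ${ALLOWED_SYSTEMS.join(", ")}`);
  }
  return args;
}
```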
The Local Handlers
These are the actual JavaScript functions that run on your server:
const mockFunctions = {
getVulnerabilityCount: (args) => {
const data = { production: 5, staging: 0, legacy: 104 };
// Use ?? (not ||) so staging's real count of 0 isn't reported as "unknown"
return JSON.stringify({ count: data[args.systemName.toLowerCase()] ?? "unknown" });
},
blockIPAddress: (args) => {
console.log(`[FIREWALL ACTION] Blocking traffic from ${args.ipAddress}`);
return JSON.stringify({ status: "success", blocked_ip: args.ipAddress });
}
// ...
};
In a real application these would call Azure Monitor APIs, a SQL database, a firewall SDK — anything. The mock just returns static data so the pattern is clear without needing live Azure resources.
The Full Run Lifecycle
This is where Function Calling differs from the other two tools. Because the AI needs to pause and wait for your local code to return data, the Run goes through an additional requires_action state.
// Step 1: Create the Run (non-blocking this time — we need to poll manually)
let run = await openai.beta.threads.runs.create(thread.id, { assistant_id: assistant.id });
// Step 2: Poll until it's no longer actively processing
while (run.status === 'queued' || run.status === 'in_progress') {
await new Promise(resolve => setTimeout(resolve, 1000));
run = await openai.beta.threads.runs.retrieve(run.id, { thread_id: thread.id });
}
// Step 3: Handle tool calls if the AI requested one
if (run.status === 'requires_action') {
const toolCalls = run.required_action.submit_tool_outputs.tool_calls;
const toolOutputs = toolCalls.map(tc => {
const args = JSON.parse(tc.function.arguments);
const result = mockFunctions[tc.function.name](args);
return {
tool_call_id: tc.id, // Must match — the AI uses this to correlate results
output: result
};
});
// Step 4: Return the results and wait for the AI to finish its response
run = await openai.beta.threads.runs.submitToolOutputsAndPoll(run.id, {
thread_id: thread.id,
tool_outputs: toolOutputs
});
}
// Step 5: Print the final response
if (run.status === 'completed') {
const messages = await openai.beta.threads.messages.list(thread.id);
console.log(messages.data[0].content[0].text.value);
}
Key detail — tool_call_id: When you submit tool outputs back to the API, each output must include the tool_call_id from the original request. The AI may have requested multiple functions in a single run — this ID is how it matches each result to the correct call.
Key detail — parallel tool calls: The API can request multiple function calls in a single requires_action event. The toolCalls.map(...) pattern handles this naturally — it processes every request in the array and submits all results in one batch.
Key detail — why runs.create instead of createAndPoll here: createAndPoll is fine for Code Interpreter and File Search because those tools are handled server-side. For Function Calling, createAndPoll would stop at requires_action anyway — but using runs.create and manual polling makes the lifecycle explicit and easier to understand.
Heads up — alternative auth pattern: In some edge cases the AzureOpenAI SDK client has path-construction bugs with certain API preview versions. If you hit unexpected 404s or missing auth headers, you can bypass the SDK's auth layer entirely with a custom fetch override on the base OpenAI class:

const openai = new OpenAI({
  apiKey: "NA",
  baseURL: `${process.env.AZURE_OPENAI_ENDPOINT}openai/`,
  defaultQuery: { "api-version": "2024-05-01-preview" },
  defaultHeaders: { "OpenAI-Beta": "assistants=v2" },
  async fetch(url, options) {
    const tokenResponse = await credential.getToken(scope);
    const headers = new Headers(options.headers);
    headers.set("Authorization", `Bearer ${tokenResponse.token}`);
    options.headers = headers;
    return fetch(url, options);
  },
});

This achieves the same security outcome as azureADTokenProvider — it just injects the token at the HTTP layer instead of through the SDK abstraction.
When to Use It
- Any situation where the AI needs to read live data (not static documents)
- Taking real-world actions: triggering deployments, creating tickets, blocking IPs
- Multi-step workflows where the AI orchestrates calls to different internal services
- Building AI agents that interact with your existing APIs
What the Output Looks Like
Ask "How many vulnerabilities does the legacy system have?" and watch the full cycle: the AI pauses, calls your function, receives the data, then forms a grounded answer:
> How many vulnerabilities does the legacy system have?
[Server] getVulnerabilityCount called with: { systemName: 'legacy' }
The legacy system currently has 104 known vulnerabilities — far higher than
production (5) and staging (0). I'd recommend prioritizing a security audit
of the legacy environment before the next release cycle.
The AI did not know the number. It requested it from your function, received { count: 104 }, and used that confirmed value to form its response. The [Server] line is your local handler logging the call — it never touches the OpenAI API.
Comparison: Which Tool for Which Problem?
| Scenario | Tool |
|---|---|
| The AI needs to calculate, process data, or run code | Code Interpreter |
| The AI needs to answer questions from your private documents | File Search |
| The AI needs to read live data or take real-world actions | Function Calling |
| The AI needs multiple capabilities at once | Combine all three |
One Assistant can have all three tools enabled simultaneously. The AI picks the right one based on the user's request.
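Concretely, the create-call configuration for a do-everything assistant is just the union of the earlier setups. A sketch: the vector store ID is a placeholder, and the function schema is abbreviated from Tool 3.

```javascript
// One assistant, three tools. The model routes each request to the right one.
const allToolsConfig = {
  name: "Ops Copilot",
  model: "gpt-4.1", // your deployment name from Azure AI Foundry
  instructions:
    "Use code_interpreter for math, file_search for policy questions, " +
    "and functions for live operational data.",
  tools: [
    { type: "code_interpreter" },
    { type: "file_search" },
    {
      type: "function",
      function: {
        name: "getVulnerabilityCount", // full schema shown in Tool 3
        description: "Returns the number of vulnerabilities in an environment.",
        parameters: { type: "object", properties: {}, required: [] },
      },
    },
  ],
  tool_resources: {
    file_search: { vector_store_ids: ["<your-vector-store-id>"] },
  },
};
// const assistant = await openai.beta.assistants.create(allToolsConfig);
```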
SDK Method Reference
Every method used across the three examples, with a quick explanation of what it does and when to reach for it.
| Method | What It Does | When to Use It |
|---|---|---|
| DefaultAzureCredential() | Tries a chain of credential sources in order: environment variables, workload identity, managed identity, az login session | Always — it works locally and in production without changing code |
| getBearerTokenProvider(credential, scope) | Wraps a credential into a function that returns a fresh token on demand | When passing auth to the AzureOpenAI SDK constructor via azureADTokenProvider |
| beta.assistants.create({ model, instructions, tools }) | Creates a named AI persona with a model, system prompt, and tool permissions | Once per application — not once per conversation |
| beta.threads.create() | Opens a new conversation session the API will manage | At the start of each new user session |
| beta.threads.messages.create(threadId, { role, content }) | Appends a message to an existing thread | Before every run — this is how you send user input |
| beta.threads.runs.createAndPoll(threadId, { assistant_id }) | Starts a run and blocks until it reaches a terminal state | When using Code Interpreter or File Search — tools the API handles server-side with no requires_action pause |
| beta.threads.runs.create(threadId, { assistant_id }) | Starts a run and returns immediately without waiting | When using Function Calling — you need to poll manually so you can handle requires_action |
| beta.threads.runs.retrieve(runId, { thread_id }) | Fetches the current status of a run | Inside a manual poll loop to check if a run has reached requires_action or completed |
| beta.threads.runs.submitToolOutputsAndPoll(runId, { thread_id, tool_outputs }) | Sends your function results back to the API and waits for the final response | After executing your local functions in response to a requires_action event |
| beta.threads.messages.list(threadId) | Returns all messages in a thread, newest first | After a run completes to read the assistant's response |
| vectorStores.create({ name }) | Creates a named index that will hold your document embeddings | Once per knowledge base — before uploading any files |
| vectorStores.fileBatches.uploadAndPoll(vectorStoreId, { files }) | Uploads files, chunks and embeds them, and blocks until indexing is complete | When adding documents to a vector store before the assistant starts answering questions |
What to Watch Out For
Thread persistence: Threads persist on the API indefinitely. In production you need a strategy for storing Thread IDs against user sessions (database, cookie, etc.) and for deleting old threads to control costs.
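A minimal version of that strategy, with an in-memory Map standing in for a real database (the helper takes the thread factory as a parameter so the pattern is clear without a live client):

```javascript
// Map user sessions to thread IDs so each user keeps one conversation.
const sessionThreads = new Map();

async function getOrCreateThreadId(sessionId, createThread) {
  if (!sessionThreads.has(sessionId)) {
    // e.g. createThread = () => openai.beta.threads.create()
    const thread = await createThread();
    sessionThreads.set(sessionId, thread.id);
  }
  return sessionThreads.get(sessionId);
}
```

In production, persist the mapping in a database and schedule deletion of stale threads to keep storage costs bounded.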
Run timeouts: Runs that take longer than 10 minutes are automatically cancelled. Long-running function calls that do heavy processing should be offloaded asynchronously — return a job ID to the AI and let it poll.
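One way to implement that hand-off is a pair of function tools backed by a job registry: one starts the work and returns a job ID immediately, the other reports status when the AI polls on a later turn. All names here are hypothetical:

```javascript
// In-memory job registry (use a real queue/database in production).
const jobs = new Map();
let nextJobId = 1;

// Tool 1: kick off the slow work and return immediately.
function startLongScan(args) {
  const jobId = `job_${nextJobId++}`;
  jobs.set(jobId, { status: "running", result: null });
  // The real work happens asynchronously; mocked here with a timer.
  setTimeout(() => jobs.set(jobId, { status: "done", result: { findings: 3 } }), 50);
  return JSON.stringify({ job_id: jobId, status: "running" });
}

// Tool 2: let the AI check on the job in a later run.
function checkJob(args) {
  const job = jobs.get(args.job_id);
  return JSON.stringify(job ?? { status: "unknown" });
}
```

The run that triggered the work completes well inside the timeout, and the AI can report "the scan is running, job_1" to the user.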
File Search citation noise: By default, File Search injects citation references like 【4:0†source】 into the response text. You can suppress these by setting include: [] on the message list call, or strip them with a regex in your UI.
Model deployment names: The model field in assistants.create() must exactly match the Deployment Name you configured in Azure AI Foundry — not the model family name. If your deployment is named gpt-4-1-prod, that is what goes in the model field.
Cost model: Code Interpreter charges per session. File Search charges per GB of vector store storage per day. Function Calling itself has no additional cost beyond the token usage. Factor this in when choosing tools.
Key Takeaways
The Assistants API is what you reach for when a single chat completion is not enough — when you need stateful conversation, real computation, private document grounding, or the ability to act on the world.
The three tools are not competing options. They are complementary layers:
- Code Interpreter gives the AI a brain that can compute accurately
- File Search gives the AI memory over your private knowledge
- Function Calling gives the AI hands to interact with your systems
The authentication pattern is the same across all three: DefaultAzureCredential with the Cognitive Services scope. Get this right once and it works everywhere from your laptop to production.
Aziz Jarrar
Full Stack Engineer