I had a requirement to track LLM and Content Understanding token usage within
a multi-tenant application for downstream customer billing, rather than
relying solely on Application Insights.
I thought of using the AI gateway in Azure API Management in front of the Azure OpenAI / Foundry endpoints. Specifically:
- Expose AI endpoints via APIM (e.g., Language Model APIs / Foundry)
- Use policies such as llm-emit-token-metric (but this seems tightly coupled to App Insights).
- Worst case: custom policies to intercept responses, capture token usage metadata (prompt, completion, total tokens) and emit usage events to Event Hub from APIM via log-to-eventhub
- Then process these events via a worker to persist usage records to our billing datastore.
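To make the "capture token usage metadata" step a bit more concrete, here is a minimal sketch in plain C#. It assumes the backend returns an OpenAI-style JSON body with a `usage` object containing `prompt_tokens`, `completion_tokens` and `total_tokens`. `TokenUsage` and `TokenUsageParser` are hypothetical names used only for illustration, and inside an actual APIM policy expression the same logic would likely be written against Newtonsoft.Json's `JObject` instead.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical record for illustration; not part of any Azure SDK.
public sealed record TokenUsage(int PromptTokens, int CompletionTokens, int TotalTokens);

public static class TokenUsageParser
{
    // Returns null when the body is empty, is not JSON, or has no "usage" object.
    public static TokenUsage? TryParse(string? responseBody)
    {
        if (string.IsNullOrWhiteSpace(responseBody))
        {
            return null;
        }

        try
        {
            using JsonDocument doc = JsonDocument.Parse(responseBody);
            JsonElement root = doc.RootElement;

            if (root.ValueKind != JsonValueKind.Object
                || !root.TryGetProperty("usage", out JsonElement usage)
                || usage.ValueKind != JsonValueKind.Object)
            {
                return null;
            }

            return new TokenUsage(
                ReadInt(usage, "prompt_tokens"),
                ReadInt(usage, "completion_tokens"),
                ReadInt(usage, "total_tokens"));
        }
        catch (JsonException)
        {
            // Non-JSON bodies (e.g. binary or streamed content) carry no usage we can read here.
            return null;
        }
    }

    // Missing or non-numeric counters are treated as 0 (an assumption of this sketch).
    private static int ReadInt(JsonElement obj, string name) =>
        obj.TryGetProperty(name, out JsonElement value)
        && value.ValueKind == JsonValueKind.Number
        && value.TryGetInt32(out int number)
            ? number
            : 0;
}
```

Calling `TokenUsageParser.TryParse(body)` on a successful response yields the three counters, which can then be attached to a usage event together with the tenant identifier.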
I believe this is a common requirement, but I couldn't find a better solution, so I proceeded with custom policies.
I decided to try it with Content Understanding first, as it felt like the more challenging case.
I didn't go through the AI gateway in Azure API Management path; instead, I just added a REST API to APIM.
{
  "openapi": "3.0.1",
  "info": {
    "title": "Test Foundry API",
    "description": "",
    "version": "1.0"
  },
  "servers": [{
    "url": "https://<SOME_APIM>.com/test-foundry-api"
  }],
  "paths": {
    "/{*path}": {
      "post": {
        "summary": "POST",
        "operationId": "post",
        "parameters": [{
          "name": "*path",
          "in": "path",
          "required": true,
          "schema": {
            "type": ""
          }
        }],
        "responses": {
          "200": {
            "description": ""
          }
        }
      },
      "get": {
        "summary": "GET",
        "operationId": "get",
        "parameters": [{
          "name": "*path",
          "in": "path",
          "required": true,
          "schema": {
            "type": ""
          }
        }],
        "responses": {
          "200": {
            "description": ""
          }
        }
      }
    }
  },
  "components": {
    "securitySchemes": {
      "apiKeyHeader": {
        "type": "apiKey",
        "name": "Ocp-Apim-Subscription-Key",
        "in": "header"
      },
      "apiKeyQuery": {
        "type": "apiKey",
        "name": "subscription-key",
        "in": "query"
      }
    }
  },
  "security": [{
    "apiKeyHeader": []
  }, {
    "apiKeyQuery": []
  }]
}
Now the most important part: I added the following All Operations policy. Here, instead of sending messages to Event Hub, I am sending them to Service Bus using send-service-bus-message (Sending messages to Azure Service Bus from Azure API Management) for testing purposes.
<policies>
    <inbound>
        <base />
        <set-variable name="tenantId" value="@(context.Request.Headers.GetValueOrDefault("x-tenant-id", "unknown"))" />
        <set-backend-service base-url="https://<SOME_FOUNDRY>.services.ai.azure.com" />
    </inbound>
    <backend>
        <forward-request buffer-request-body="true" />
    </backend>
    <outbound>
        <base />
        <set-header name="Operation-Location" exists-action="override">
            <value>@{
                var location = context.Response.Headers.GetValueOrDefault("Operation-Location", "");
                if (string.IsNullOrEmpty(location)) { return location; }
                var uri = new Uri(location);
                var req = context.Request.OriginalUrl;
                return req.Scheme + "://" + req.Host + "/" + context.Api.Path + uri.PathAndQuery;
            }</value>
        </set-header>
        <choose>
            <when condition="@(context.Response.StatusCode >= 200 && context.Response.StatusCode < 300)">
                <set-variable name="body" value="@(context.Response.Body.As<string>(preserveContent: true))" />
                <choose>
                    <when condition="@{
                        var text = (string)context.Variables["body"];
                        if (string.IsNullOrEmpty(text) || !text.TrimStart().StartsWith("{")) { return false; }
                        var json = Newtonsoft.Json.Linq.JObject.Parse(text);
                        var statusToken = json["status"];
                        var status = statusToken == null ? string.Empty : ((string)statusToken).ToLowerInvariant();
                        return status == "succeeded" || status == "completed" || status == "failed";
                    }">
                        <send-service-bus-message topic-name="sbt-test-usage-tracking" namespace="<SOME_SERVICEBUS_NAMESPACE>.servicebus.windows.net" client-id="<SOME_MANAGED_IDENTITY_CLIENT_ID>">
                            <payload>@{
                                var json = Newtonsoft.Json.Linq.JObject.Parse((string)context.Variables["body"]);
                                var operationIdToken = json["id"];
                                var analyzerIdToken = json["result"]?["analyzerId"];
                                var statusToken = json["status"];
                                return new Newtonsoft.Json.Linq.JObject(
                                    new Newtonsoft.Json.Linq.JProperty("tenantId", (string)context.Variables["tenantId"]),
                                    new Newtonsoft.Json.Linq.JProperty("eventType", "cu-analysis-completed"),
                                    new Newtonsoft.Json.Linq.JProperty("requestId", context.RequestId.ToString()),
                                    new Newtonsoft.Json.Linq.JProperty("operationId", operationIdToken == null ? string.Empty : (string)operationIdToken),
                                    new Newtonsoft.Json.Linq.JProperty("analyzerId", analyzerIdToken == null ? string.Empty : (string)analyzerIdToken),
                                    new Newtonsoft.Json.Linq.JProperty("status", statusToken == null ? string.Empty : (string)statusToken),
                                    new Newtonsoft.Json.Linq.JProperty("usage", json["usage"] ?? new Newtonsoft.Json.Linq.JObject()),
                                    new Newtonsoft.Json.Linq.JProperty("timestamp", DateTime.UtcNow.ToString("o"))
                                ).ToString();
                            }</payload>
                        </send-service-bus-message>
                    </when>
                </choose>
            </when>
        </choose>
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
Important points:
- forward-request with buffer-request-body="true": Needed to forward binary PDF request bodies
- Operation-Location header rewrite: Routes SDK polling back through APIM so that the outbound policy also fires on the polling responses
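The Operation-Location rewrite is easier to follow with concrete values. Below is a small standalone sketch that mirrors the string logic of the set-header expression outside of APIM. `OperationLocationRewriter` is a hypothetical helper, and the host names and result path in the usage note are made-up illustrative values, not taken from a real response.

```csharp
#nullable enable
using System;

public static class OperationLocationRewriter
{
    // Keeps the backend's path and query, but swaps in the gateway's scheme,
    // host and API path prefix, so that the client polls through the gateway.
    public static string Rewrite(string? backendLocation, string gatewayScheme, string gatewayHost, string apiPath)
    {
        if (string.IsNullOrEmpty(backendLocation))
        {
            // Nothing to rewrite: the response is not a long-running operation.
            return string.Empty;
        }

        var uri = new Uri(backendLocation);

        // apiPath has no leading slash here (e.g. "test-foundry-api"),
        // matching how the policy concatenates "/" + context.Api.Path.
        return gatewayScheme + "://" + gatewayHost + "/" + apiPath + uri.PathAndQuery;
    }
}
```

For example, `Rewrite("https://foundry.example.com/contentunderstanding/analyzerResults/abc123?api-version=1", "https", "apim.example.com", "test-foundry-api")` gives `https://apim.example.com/test-foundry-api/contentunderstanding/analyzerResults/abc123?api-version=1`, so the SDK's next poll hits APIM rather than the Foundry endpoint directly.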
Then I used the Azure Content Understanding .NET Client (Azure Content Understanding Client Library for .NET) to trigger an analysis and poll for the result.
// NOTE: Endpoint is now APIM
string endpoint = "https://<SOME_APIM>.com/test-foundry-api";

ContentUnderstandingClientOptions contentUnderstandingClientOptions = new();
contentUnderstandingClientOptions.AddPolicy(new TenantHeaderPolicy("<SOME_TENANT_ID>"), HttpPipelinePosition.PerCall);

ContentUnderstandingClient contentUnderstandingClient =
    new(new Uri(endpoint), new DefaultAzureCredential(), contentUnderstandingClientOptions);

// TODO: Trigger analysis and poll
// REFER: https://jaliyaudagedara.blogspot.com/2026/03/azure-content-understanding-client.html

sealed class TenantHeaderPolicy(string tenantId) : HttpPipelineSynchronousPolicy
{
    public override void OnSendingRequest(HttpMessage message)
    {
        Console.WriteLine($"Calling: {message.Request.Method} {message.Request.Uri}");
        message.Request.Headers.SetValue("x-tenant-id", tenantId);
        message.Request.Headers.SetValue("Ocp-Apim-Trace", "true");
    }
}
Looked promising.
Figure: Service Bus Message
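On the consuming side, a worker would need to turn that message body back into something the billing datastore can store. The following is only a sketch under assumptions: `UsageEvent` and `UsageEventParser` are hypothetical names, the property names are the ones the policy payload above emits, and `usage` is kept as raw JSON because its inner shape is not shown in this post.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.Json;

// Hypothetical billing-side shape; property names follow the policy payload.
public sealed record UsageEvent(
    string TenantId,
    string EventType,
    string RequestId,
    string OperationId,
    string AnalyzerId,
    string Status,
    string UsageJson,
    DateTimeOffset Timestamp);

public static class UsageEventParser
{
    // Parses one message body; throws JsonException if the body is not valid JSON.
    public static UsageEvent Parse(string messageBody)
    {
        using JsonDocument doc = JsonDocument.Parse(messageBody);
        JsonElement root = doc.RootElement;

        // Keep "usage" as raw JSON so the worker does not depend on its inner fields.
        string usageJson = root.TryGetProperty("usage", out JsonElement usage)
            ? usage.GetRawText()
            : "{}";

        // The policy writes the timestamp with the round-trip ("o") format.
        DateTimeOffset.TryParse(
            ReadString(root, "timestamp"),
            CultureInfo.InvariantCulture,
            DateTimeStyles.RoundtripKind,
            out DateTimeOffset timestamp);

        return new UsageEvent(
            ReadString(root, "tenantId"),
            ReadString(root, "eventType"),
            ReadString(root, "requestId"),
            ReadString(root, "operationId"),
            ReadString(root, "analyzerId"),
            ReadString(root, "status"),
            usageJson,
            timestamp);
    }

    // Missing or non-string properties become empty strings, like in the policy.
    private static string ReadString(JsonElement obj, string name) =>
        obj.TryGetProperty(name, out JsonElement value) && value.ValueKind == JsonValueKind.String
            ? value.GetString() ?? string.Empty
            : string.Empty;
}
```

A Service Bus subscriber could call `UsageEventParser.Parse(body)` for each received message and persist the resulting record keyed by `TenantId` and `OperationId`. Deduplicating on `OperationId` is probably worth considering, since every poll that returns a terminal status will emit a message.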
Hope this helps.
Happy Coding.
Regards,
Jaliya