
DBA handbook — Cosmos Optimizer step-by-step

Walk through every pane of the Cosmos Optimizer with a real production cluster. For each screen: what it shows, how to test it, what to expect, and how to troubleshoot when something is off.

§0 Prerequisites

Before opening the Optimizer, finish the setup guide. Without credentials, every metrics pane shows NO SOURCE and falls back to Local samples (only what NoSqlStudio itself queried).

  1. Follow Cosmos monitoring setup end-to-end: Service Principal, RBAC role assignment, Resource ID, Cloud Credentials.
  2. Verify with az login --service-principal and az monitor metrics list that the SP can read metrics for your Cosmos account.
  3. Open NoSqlStudio → connect to the Cosmos account → open Tools → Cosmos DB → Cosmos Optimizer (or Ctrl+Alt+Shift+O).
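The CLI check in step 2 can be sketched as two argument lists, one per command. This is a minimal sketch, not a NoSqlStudio feature: the helper names `login_args` and `metrics_args` are hypothetical, every value passed in is a placeholder you must supply, and the choice of `TotalRequestUnits`, a one-minute interval and `Total` aggregation is an assumption about what you want to verify.

```python
from typing import List


def login_args(client_id: str, client_secret: str, tenant_id: str) -> List[str]:
    """Arguments for signing in to the Azure CLI as the Service Principal."""
    return [
        "az", "login", "--service-principal",
        "--username", client_id,
        "--password", client_secret,
        "--tenant", tenant_id,
    ]


def metrics_args(resource_id: str, metric: str = "TotalRequestUnits") -> List[str]:
    """Arguments for reading one Azure Monitor metric of the Cosmos account.

    resource_id is the full ARM Resource ID of the Cosmos account (placeholder).
    """
    return [
        "az", "monitor", "metrics", "list",
        "--resource", resource_id,
        "--metric", metric,
        "--interval", "PT1M",
        "--aggregation", "Total",
    ]


if __name__ == "__main__":
    # Print only the metrics command; the login command contains the secret.
    print(" ".join(metrics_args("<cosmos-account-resource-id>")))
```

Each list can be handed to `subprocess.run(args, check=True)`; a non-zero exit code from the second command usually means the role assignment is missing or has not propagated yet.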

§1 5-minute smoke test — confirms everything is wired

Goal: in 5 minutes, confirm Azure Monitor is feeding real data to NoSqlStudio. If any step fails, jump to §3 Troubleshooting.

  1. Connect to your Cosmos account in NoSqlStudio. Confirm you see the database tree in the sidebar.
  2. Open Cosmos Optimizer. The intro banner should show a CONFIGURE DATA SOURCES button if credentials are missing, or the pane content directly if they are saved.
  3. Go to RU Budget. The source-mode banner should read AZURE MONITOR — aggregate only in green within 2 seconds. Click Refresh. Within 1-2 minutes you should see non-zero Consumed (avg).
  4. Click each tab (Query Cost, Hot Partitions, Index Policy, Throughput Optimizer, Throttling RCA, Diagnostic Logs, Composite Index, PITR Restore, Migration Wizard). None should throw a red error.
  5. Open the DevTools console (F12) — should NOT contain ChainedTokenCredential authentication failed or monitor client is null.
Expected
All ten panes load, RU Budget shows real RU/s within 2 minutes, no auth errors in console. If yes, you are production-ready.

§2 Per-pane walkthroughs

One section per pane. Each has: screenshot (when available), what it shows, step-by-step test, expected result on a busy production cluster, and troubleshooting notes for common issues.

§2.1

RU Budget

FREE TIER
RU Budget pane: live consumption, utilisation, 24h projection, per-collection table

Live view of RU/s consumed by your Cosmos account vs the budget you set. Holt-Winters forecast for 24h. Recommendation card comparing manual / autoscale / serverless / reserved.

Test steps

  1. Set RU/s budget to your account's provisioned throughput (e.g. 5000 for a 5k RU/s database-level provisioning).
  2. Click Refresh.
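  3. Observe Consumed (avg): should be non-zero on an active cluster. Compare against the value you see in the Azure Portal under Metrics. NormalizedRUConsumption is reported as a percentage, so for a raw RU comparison use TotalRequestUnits, the metric referenced in the troubleshooting note below.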
  4. Check Utilisation badge color: green = under 65%, yellow = 65–85%, red = over 85%. Red means urgent scale-up consideration.
  5. Read the Recommendation card. Common outputs: RESERVED-3Y for steady workloads, AUTOSCALE for spiky ones, SERVERLESS for low-volume bursts.
  6. Scroll to Per collection (preview). With Azure Monitor only, you see a single instance row aggregating everything; a per-namespace breakdown requires Log Analytics (PAID tier).
Expected
On a busy cluster: 495 RU/s consumed avg, 9.9% utilisation of the 5000 budget, WITHIN BUDGET badge in green. Recommendation: RESERVED-3Y with $32.9K/mo savings projected.
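The utilisation figure and badge colour from the test steps reduce to a few lines of arithmetic. The sketch below uses the thresholds stated above (green under 65%, yellow 65–85%, red over 85%); the function names are hypothetical and the pane's actual rounding is not documented here.

```python
def utilisation_pct(consumed_ru_s: float, budget_ru_s: float) -> float:
    """Consumed RU/s as a percentage of the configured RU/s budget."""
    if budget_ru_s <= 0:
        raise ValueError("budget must be positive")
    return 100.0 * consumed_ru_s / budget_ru_s


def badge_colour(pct: float) -> str:
    """Map a utilisation percentage onto the badge colours from the test steps."""
    if pct < 65:
        return "green"
    if pct <= 85:
        return "yellow"
    return "red"


if __name__ == "__main__":
    pct = utilisation_pct(495, 5000)  # the figures from the Expected example
    print(f"{pct:.1f}% -> {badge_colour(pct)}")
```

With the Expected figures (495 RU/s against a 5000 RU/s budget) this yields 9.9% and a green badge.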
Troubleshoot
0 RU/s with a green AZURE MONITOR banner: either the cluster was genuinely idle in the 15-minute window or the adapter is mapping the wrong metric. Confirm with az monitor metrics list --metric TotalRequestUnits; if the CLI shows non-zero values, file a NoSqlStudio bug.
NO SOURCE banner: credentials are not registered. Reload the app (Ctrl+R), re-open Cloud Credentials, and click Save again.
§2.2

Query Cost Inspector

ALWAYS-ON
Query Cost pane: ring buffer of per-query RU costs captured from x-ms-request-charge

Captures the x-ms-request-charge response header from every query you run through NoSqlStudio. Zero Azure API calls — pure client-side instrumentation, always available.

Test steps

  1. Open a Shell or Query tab against the Cosmos collection.
  2. Run a query against the collection, for example a find with a filter.
  3. Return to the Query Cost tab. Top query shapes should populate with the shape, total RU, average RU, p95, max, and count.
  4. Run the same shape 5+ times to populate the count and p95 properly.
  5. Use the filter input top-right to narrow to ns or shape substring.
Expected
Each new query appears immediately (no Azure delay). The ring buffer caps at 500 entries (oldest dropped) and survives until the app restart — not persisted to disk.
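The capped buffer and the per-shape columns (total, average, p95, max, count) can be modelled as below. This is a sketch under assumptions, not NoSqlStudio's implementation: the class name `QueryCostBuffer` is hypothetical, and the nearest-rank p95 is one common definition that may differ from the one the pane uses.

```python
import math
from collections import defaultdict, deque


class QueryCostBuffer:
    """In-memory ring buffer of (namespace, shape, RU charge) samples."""

    def __init__(self, capacity: int = 500) -> None:
        # deque with maxlen drops the oldest entry once capacity is reached.
        self._entries = deque(maxlen=capacity)

    def __len__(self) -> int:
        return len(self._entries)

    def record(self, ns: str, shape: str, request_charge: float) -> None:
        """Store the x-ms-request-charge value observed for one query."""
        self._entries.append((ns, shape, float(request_charge)))

    def top_shapes(self) -> list:
        """Aggregate per (ns, shape), most expensive total first."""
        grouped = defaultdict(list)
        for ns, shape, charge in self._entries:
            grouped[(ns, shape)].append(charge)
        rows = []
        for (ns, shape), charges in grouped.items():
            charges.sort()
            rank = math.ceil(0.95 * len(charges))  # nearest-rank p95 (assumption)
            rows.append({
                "ns": ns,
                "shape": shape,
                "count": len(charges),
                "total_ru": sum(charges),
                "avg_ru": sum(charges) / len(charges),
                "p95_ru": charges[rank - 1],
                "max_ru": charges[-1],
            })
        rows.sort(key=lambda row: row["total_ru"], reverse=True)
        return rows
```

With only one or two samples per shape, p95 equals the maximum, which is why step 4 asks for 5+ runs of the same shape.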
Troubleshoot
Queries run but don't appear: confirm you are running the query through NoSqlStudio (not through mongosh in another window or from application code). The capture hook only sits on NoSqlStudio's DataService.
§2.3

Hot Partition Detector

ALWAYS-ON
Hot Partitions pane: heatmap of doc count by partition key + skew detection

Active probe that groups documents by your partition key and ranks the resulting partitions by doc count. Spots skew (one partition holding 40%+ of docs) before it becomes a 429 throttle storm.

Test steps

  1. Fill Namespace: db.coll format, e.g. catalog.products.
  2. Fill Partition key path: the field name you used as shardKey when creating the collection, e.g. tenantId.
  3. Click Scan. This runs a $group aggregation on Cosmos — costs RU proportional to collection size. On a 200 GB collection expect 100-500 RU.
  4. Read the spread badge in the top-right: green = even distribution, yellow = moderate skew, red = severe skew (top partition > 40% of docs).
  5. Use the heatmap to spot specific hot partitions. Hover any cell for the exact key value + doc count.
Expected
For a tenant-sharded collection with hundreds of tenants: green spread (no tenant dominates). For an org-sharded collection where one big customer is 60% of data: red badge, one cell deep red on the heatmap.
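The scan boils down to a $group by partition key followed by a share calculation. The sketch below shows one plausible shape of that pipeline and the severe-skew test (top partition above 40% of documents). The exact pipeline NoSqlStudio sends is not documented here, so treat `group_pipeline` as an assumption; the yellow threshold is not stated in this handbook and is left out.

```python
from typing import Dict, List


def group_pipeline(partition_key: str) -> List[dict]:
    """Aggregation that counts documents per partition key value."""
    return [
        {"$group": {"_id": f"${partition_key}", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


def top_share(counts: Dict[str, int]) -> float:
    """Fraction of all documents held by the largest partition."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / total


def is_severe_skew(counts: Dict[str, int], threshold: float = 0.40) -> bool:
    """True when the top partition holds more than `threshold` of the documents."""
    return top_share(counts) > threshold


if __name__ == "__main__":
    org_sharded = {"big-customer": 600, "org-b": 200, "org-c": 200}
    print(top_share(org_sharded), is_severe_skew(org_sharded))
```

The `org_sharded` sample mirrors the second Expected case: one customer at 60% of the data trips the red badge.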
Troubleshoot
No DataService bound = connection lost or plugin not loaded for this connection. Reconnect.
Scan errors out with 16500: TooManyRequests = your collection lacks enough RU to run the aggregation right now. Increase RU temporarily or scan during low traffic.
§2.4

Visual Index Policy Editor

ALWAYS-ON
Index Policy editor: schema tree with include/exclude checkboxes + JSON preview

Cosmos indexes every property by default — each write costs 1 RU per indexed property. Excluding never-queried paths can cut write RU consumption 30-70% on document-heavy collections.

Test steps

  1. Fill Namespace, e.g. catalog.products.
  2. Click Sample schema. NoSqlStudio reads 100 docs and builds a tree of all field paths.
  3. Uncheck any field that NEVER appears in a WHERE / filter clause — think imageBase64, fullText, metadata.audit.*, etc.
  4. Click Save draft. JSON preview updates with the proposed indexingPolicy.
  5. cosmosSite.dbaHandbook.indexPolicy.step5
Expected
On a typical e-commerce products collection (50+ fields, only 5 queried) you can drop indexing on ~40 paths. Write cost falls from ~50 RU/insert to ~10 RU/insert.
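To make the JSON preview concrete, the sketch below turns a list of unchecked fields into an indexing policy that indexes everything except those paths. It assumes the editor emits the Cosmos DB core indexing-policy layout (`includedPaths` / `excludedPaths` with `/?` for a single value and `/*` for a subtree); the exact JSON NoSqlStudio produces may differ, and `build_indexing_policy` is a hypothetical name.

```python
import json
from typing import List


def build_indexing_policy(excluded_fields: List[str]) -> dict:
    """Index every path except the given dotted field names."""
    excluded = []
    for field in excluded_fields:
        path = "/" + field.replace(".", "/")
        if path.endswith("/*"):
            excluded.append({"path": path})         # whole subtree
        else:
            excluded.append({"path": path + "/?"})  # single scalar value
    return {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": excluded,
    }


if __name__ == "__main__":
    policy = build_indexing_policy(["imageBase64", "fullText", "metadata.audit.*"])
    print(json.dumps(policy, indent=2))
```

Excluding by path rather than including by path keeps new, unanticipated fields indexed by default, which is the safer direction when query patterns are still changing.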
Troubleshoot
Sample schema returns empty tree = collection is empty (no documents to infer schema from). Insert a representative doc first.
§2.5

Throughput Optimizer + Bill Simulator

FREE TIER
Throughput Optimizer: workload classifier + bill comparison across 5 modes + Reserved Capacity quote

Classifies your workload (STEADY / SPIKY / CYCLIC / RAMP) from the last 15 min of metrics, then simulates monthly bill across the 5 Cosmos billing modes side-by-side. Reserved Capacity quote at the bottom.

Test steps

  1. Click Refresh at top-right.
  2. Read the Workload classification card. Common outputs: STEADY (low variance, CV < 0.4), SPIKY (CV > 0.8, recommend autoscale), CYCLIC (predictable daily peaks, recommend reserved with scheduled scale).
  3. Compare the 5 Monthly bill cards. The cheapest with RECOMMENDED badge is the algorithm's pick.
  4. Look for THROTTLED red banner on the manual card: indicates your current p95 exceeds the simulated tier and would cause 429s.
  5. Scroll to Reserved capacity quote: concrete RU/s number to commit to + monthly savings vs manual.
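The CV thresholds in step 2 can be reproduced from raw RU/s samples. This is a minimal sketch that assumes CV means population standard deviation divided by mean; the classifier's real inputs and its CYCLIC / RAMP logic (which need time-pattern analysis) are not described in this handbook, so values between the two thresholds are returned as INDETERMINATE, a label invented for this example.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (0.0 for an idle series)."""
    avg = mean(samples)
    if avg == 0:
        return 0.0
    return pstdev(samples) / avg


def classify(samples: Sequence[float]) -> str:
    """Apply the STEADY (< 0.4) and SPIKY (> 0.8) thresholds from the test steps."""
    cv = coefficient_of_variation(samples)
    if cv < 0.4:
        return "STEADY"
    if cv > 0.8:
        return "SPIKY"
    return "INDETERMINATE"  # hypothetical label; CYCLIC / RAMP are not modelled here


if __name__ == "__main__":
    print(classify([100, 110, 90, 105, 95]))  # low variance
    print(classify([10, 10, 10, 10, 500]))    # one large burst
```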
Expected
STEADY workload (CV=0.27, p95=962 RU/s) → RESERVED-3Y recommended at $38.14/mo vs manual $58.68/mo → $20.54/mo saved, about $246/yr. For a 1000-tenant SaaS that's real margin.
Troubleshoot
Known bug (May 2026): autoscale card sometimes shows misformatted total (e.g. $41.9K/mo) due to a unit error in the simulator. The other 4 cards are correct. Fix pending — track on the project board.
§2.6

Throttling RCA

FREE TIER
Throttling RCA: 429 count, peaks, sparkline + correlated suspect queries

Counts 429 (rate-limited) responses in the recent window via Azure Monitor, then correlates each burst with queries captured in the Query Cost ring buffer. Surfaces the query shape with highest cumulative RU during the 429 window — the likely culprit.

Test steps

  1. Click Refresh. Read Total 429s, 429 events, and Peak / interval.
  2. Set the Window ± minutes input (default 5). Lower it for tighter correlation, raise it if no queries fell in the window.
  3. Scroll to Suspect queries. Each row = one 429 burst with the most expensive query shape that ran ± window minutes from the burst, with a confidence score (0-1).
  4. If empty: open the Query Cost Inspector tab and run traffic. Without query captures, RCA has nothing to correlate.
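The correlation described above (most expensive shape within ± window minutes of a burst) can be sketched as follows. The capture tuple layout and the name `suspect_shape` are assumptions for illustration, and the confidence score is omitted because its formula is not documented here.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

Capture = Tuple[datetime, str, float]  # (captured_at, query shape, RU charge)


def suspect_shape(
    burst_at: datetime,
    captures: Iterable[Capture],
    window_minutes: int = 5,
) -> Optional[Tuple[str, float]]:
    """Return the shape with the highest cumulative RU near a 429 burst."""
    window = timedelta(minutes=window_minutes)
    totals = defaultdict(float)
    for captured_at, shape, charge in captures:
        if abs(captured_at - burst_at) <= window:
            totals[shape] += charge
    if not totals:
        return None  # nothing captured in the window: widen it or run traffic
    shape = max(totals, key=totals.get)
    return shape, totals[shape]
```

A `None` result corresponds to the empty Suspect queries case in step 4: the buffer holds no captures close enough to the burst.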
Expected
A busy production cluster showing 372.9K total 429s in 15 events (peak 41.7K) is significantly throttled. After running representative queries for ~5 minutes you should see Suspect queries populated with confidence > 0.5 for at least one shape.
Troubleshoot
429 count high but Suspect queries empty = traffic is going through the app servers (not NoSqlStudio). Either run the known-bad query in the Shell to capture it, OR upgrade to Log Analytics (PAID) for full per-shape RCA via KQL.
§2.7

Diagnostic Logs (KQL templates)

PAID TIER
Diagnostic Logs: 10 ready-made KQL templates against MongoRequests table

10 ready-made Kusto queries against the Cosmos Diagnostic Logs in your Log Analytics workspace. Requires the PAID tier (Diagnostic Settings enabled, Log Analytics workspace ID configured).

Test steps

  1. Verify the AZURE MONITOR banner shows upgrade available: Log Analytics. If you haven't plugged in the workspace ID, this pane shows NO SOURCE and the Run in app button is disabled.
  2. Click any template card (e.g. Slow queries — last 1 hour) to load its KQL into the editor.
  3. Click Edit to tweak the KQL inline (e.g. change ago(1h) to ago(24h)).
  4. Click Run in app. Results table renders below within 2-5 seconds.
  5. For long-form analysis, click Open in portal to deep-link the query into the Azure Portal KQL editor.
Expected
Slow queries template returns 50 rows of the slowest Mongo requests with durationMs column. Top entries usually correlate with the same shapes in Throttling RCA.
Troubleshoot
Empty results despite known load = Diagnostic Settings on the Cosmos account may not have the MongoRequests log category enabled, or fewer than 5 minutes have passed since it was enabled.
403 Forbidden = SP needs the Log Analytics Reader role on the workspace.
§2.8

Composite Index Recommender

ALWAYS-ON
Composite Index: detects multi-field query shapes lacking compound indexes + JSON snippet generator

Watches the Query Cost ring buffer, identifies query shapes touching 2+ fields without a corresponding composite index, projects ~60% RU savings, emits the JSON snippet to paste into Cosmos's indexingPolicy.

Test steps

  1. Set Min queries to consider = 5 (default). Threshold for considering a shape worth recommending.
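  2. Run queries that filter or sort on 2+ fields in a Shell or Query tab, repeating each shape at least as many times as the Min queries threshold.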
  3. Return here. The Recommendations list fills with shape + projected RU saving + JSON snippet.
  4. Click Copy JSON on any recommendation, paste into your Cosmos collection's indexingPolicy via collMod.
Expected
cosmosSite.dbaHandbook.compositeIndex.expected
Troubleshoot
No multi-field query patterns captured yet = Query Cost ring buffer is empty or only single-field shapes were captured. Run more diverse queries.
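A recommendation consists of a composite index definition plus a projected RU figure. The sketch below assumes the snippet follows the Cosmos DB core `compositeIndexes` layout (a list of index definitions, each an ordered list of path/order pairs) and applies the flat ~60% saving quoted above; both the layout NoSqlStudio actually emits and the function names are assumptions.

```python
from typing import List


def composite_index_snippet(fields: List[str], order: str = "ascending") -> dict:
    """Build a compositeIndexes fragment covering the given fields in order."""
    if len(fields) < 2:
        raise ValueError("a composite index needs at least two fields")
    definition = [
        {"path": "/" + field.replace(".", "/"), "order": order}
        for field in fields
    ]
    return {"compositeIndexes": [definition]}


def projected_avg_ru(current_avg_ru: float, saving: float = 0.60) -> float:
    """Average RU per query after applying the projected saving."""
    return current_avg_ru * (1.0 - saving)


if __name__ == "__main__":
    print(composite_index_snippet(["category", "price"]))
    print(projected_avg_ru(100.0))
```

Field order matters in a composite index, so pass the fields in the order the query filters and sorts on them.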
§2.9

Point-in-Time Restore

WIZARD
PITR Restore wizard: form for account / timestamp / target / region + CLI preview

Disaster-recovery wizard. Picks a point in the Cosmos continuous-backup window (7 or 30 days depending on configured policy), generates the az cosmosdb restore CLI command, optionally executes it via the host shell.

Test steps

  1. Fill Account (the Cosmos account name).
  2. Set the Restore timestamp via slider or ISO-8601 input. Default: now minus 1 hour.
  3. Fill Target account name (must be new — cannot overwrite source) and Region.
  4. Click Validate plan. NoSqlStudio runs az cosmosdb show-backup-information to confirm a backup exists at that timestamp. Returns green check or red error.
  5. Click Run via host shell (az-cli) when ready. Restore takes 30-90 minutes for typical collections.
Expected
For a real DR drill: validate plan succeeds within 5 seconds, shows backup size + timestamp confirmation. Actual restore runs async — monitor in the portal.
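The CLI preview can be approximated by assembling the az cosmosdb restore arguments from the wizard fields. This is a sketch, not the wizard's code: `restore_args` and `default_restore_timestamp` are hypothetical names, and the resource group parameter is an assumption added here because the command needs one even though the form fields listed above do not mention it.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def default_restore_timestamp(now: Optional[datetime] = None) -> str:
    """ISO-8601 timestamp for 'now minus 1 hour' in UTC, the wizard default."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(hours=1)).replace(microsecond=0).isoformat()


def restore_args(
    resource_group: str,
    account: str,
    target_account: str,
    location: str,
    restore_timestamp: str,
) -> List[str]:
    """Arguments for restoring `account` into a new `target_account`."""
    if target_account == account:
        raise ValueError("target account must be new; the source cannot be overwritten")
    return [
        "az", "cosmosdb", "restore",
        "--resource-group", resource_group,
        "--account-name", account,
        "--target-database-account-name", target_account,
        "--restore-timestamp", restore_timestamp,
        "--location", location,
    ]


if __name__ == "__main__":
    ts = default_restore_timestamp()
    print(" ".join(restore_args("<rg>", "<account>", "<new-account>", "<region>", ts)))
```

Printing the command before running it gives you the same review step the wizard's preview provides.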
Troubleshoot
Account does not have continuous backup enabled = you have periodic backup mode. PITR requires --backup-policy-type Continuous at account creation OR a migration to continuous mode.
§2.10

Migration Wizard Atlas ↔ Cosmos

WIZARD
Migration Wizard: bidirectional Atlas ↔ Cosmos with translation matrix + script generator

Bidirectional migration assistant. Picks source + target (Atlas → Cosmos or Cosmos → Atlas), shows translation matrix (features that become unsupported), projects monthly bill on the destination, generates the mongosh script.

Test steps

  1. Choose Source and Target: Atlas → Cosmos or vice-versa.
  2. Read the Translation matrix: features that don't survive the move (e.g. Atlas Search → Cosmos: works via NoSqlStudio multi-DB engines locally, not natively; Cosmos PITR → Atlas: replaced by Atlas cluster snapshots).
  3. Check Bill projection: estimated monthly cost on the target based on your current workload.
  4. Click Generate script. Outputs a mongosh script with insertMany batches and progress reporting.
  5. For real migration, dry-run on a test collection first (Cosmos charges write RU for inserts during migration!).
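The batching-with-progress pattern behind the generated script can be illustrated outside mongosh. The sketch below is not the wizard's output: it is a Python rendering of the same idea, where `insert_many` is any callable you supply (for a dry run, a function that only counts), and the batch size of 1000 is an arbitrary assumption.

```python
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents, preserving order."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def migrate(
    docs: Iterable[dict],
    insert_many: Callable[[List[dict]], object],
    size: int = 1000,
    report: Callable[[str], None] = print,
) -> int:
    """Send documents to the target in batches and report running progress."""
    copied = 0
    for batch in batches(docs, size):
        insert_many(batch)
        copied += len(batch)
        report(f"copied {copied} documents")
    return copied
```

Smaller batches spread the write RU more evenly on the Cosmos side, at the cost of more round trips; a dry run against a test collection is the quickest way to find a size that does not trigger 429s.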
Expected
Translation matrix lists ~5-10 features with status (works/dropped/replaced). Bill projection within ±30% of actual on simple workloads.
Note
For multi-TB migrations, use Azure Data Factory or Cosmos's native migration tool instead — this wizard targets smaller / point migrations and DR scenarios.
§2.11

Cloud Credentials

ALWAYS-ON
Cloud Credentials: AWS DocDB + Azure Service Principal + Log Analytics workspace ID

The control panel for what data sources feed every other pane. Three tiers: TIER C local (always on), TIER B Azure Monitor (FREE), TIER A Log Analytics (PAID).

Test steps

  1. Confirm the TIER B card shows CONFIGURED badge if you saved credentials.
  2. Sensitive fields (Resource ID, Tenant ID, Client ID, Client Secret, Workspace ID) appear masked as •••••••XXXX with an Edit button. Click Edit to reveal + change, then Mask to hide again.
  3. After Save: the source-mode banner on every metrics pane flips to AZURE MONITOR in under 100 ms (via internal event), or within 15 s (via poll fallback).
  4. Clear button (red) wipes credentials for this connection from disk + removes the bridge — everything reverts to NO SOURCE / Local samples.
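The masked display in step 2 (•••••••XXXX) amounts to showing only the last four characters. The sketch below is an assumption about that behaviour, including what happens for very short values; `mask_secret` is a hypothetical name.

```python
def mask_secret(value: str, visible: int = 4, bullets: int = 7) -> str:
    """Hide a credential, keeping only its last `visible` characters."""
    if len(value) <= visible:
        # Too short to reveal anything safely (assumed behaviour).
        return "•" * bullets
    return "•" * bullets + value[-visible:]


if __name__ == "__main__":
    print(mask_secret("00000000-0000-0000-0000-00000000abcd"))
```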
Expected
The Cloud Credentials pane shows green status dots next to each saved credential, and the Operations Center can talk to Log Analytics + Azure Monitor without errors.
Note
Credentials are stored encrypted via Electron safeStorage (OS keychain on macOS, DPAPI on Windows, libsecret on Linux). The file lives at %APPDATA%/NoSqlStudio Dev Local/CloudCredentials/<connectionId>.bin.

§3 Troubleshooting matrix

Symptom: NO SOURCE banner on RU Budget
Likely cause: Bridge not registered for this connection (renderer reload after credential save).
Fix: Open Cloud Credentials, click Save again. Lazy-register fires automatically on workspace mount in normal operation; force it via re-save when needed.

Symptom: Banner green AZURE MONITOR but 0 RU/s
Likely cause: Auth chain failed silently (DefaultAzureCredential fallback) OR metric mapping bug.
Fix: Open DevTools (F12). Look for [azure-cosmos-adapter] logs. If you see ChainedTokenCredential failed, the Client Secret is empty in Cloud Credentials; fill it in. If you see metrics.list 401/403, the SP lacks the Monitoring Reader role.

Symptom: Cloud Credentials Save toast green, but RU Budget still shows NO SOURCE after reload
Likely cause: Adapter SDK packages (@azure/arm-monitor, @azure/monitor-query) missing in node_modules.
Fix: Verify with node -e "require('@azure/arm-monitor')". If missing, run npm install --no-save @azure/arm-monitor @azure/monitor-query in the Compass repo (declared as optionalDependencies so npm sometimes skips them).

Symptom: Cannot find module 'semver/preload' error at build time
Likely cause: Node 24 × semver 6.x mismatch in hadron-build / @mongodb-js/devtools-github-repo.
Fix: Create shim: node_modules/semver/preload.js with module.exports = require('./semver.js'). Or downgrade to Node 22.21.1 (the version Compass workflows test against).

Symptom: Diagnostic Logs (KQL) shows NO SOURCE
Likely cause: Log Analytics workspace ID not configured (PAID tier unwired).
Fix: Decide: do you need per-shape KQL? If yes, enable Diagnostic Settings on Cosmos → stream to Log Analytics, paste the workspace GUID into Cloud Credentials. If no, accept that this pane is locked; Throttling RCA and RU Budget still work via Azure Monitor.

Symptom: Query Cost / Composite Index empty
Likely cause: No queries run through NoSqlStudio yet (or all queries were via mongosh outside the app).
Fix: Run actual queries against your collection inside NoSqlStudio (Shell or Query tab). The cost capture hook lives on NoSqlStudio's DataService, so it only sees what passes through.

Symptom: Autoscale card shows nonsense like $41.9K/mo
Likely cause: Known bug in the bill simulator's autoscale calculation.
Fix: Other 4 cards (manual / serverless / reserved-1y / reserved-3y) are accurate. Ignore the autoscale number; a fix is planned for the next release.

Symptom: All panes scroll cut off mid-content
Likely cause: LeafyGreen <Tab> doesn't propagate height; this was a layout bug before May 25 2026.
Fix: Update to the latest NoSqlStudio build. The fix is in scrollStyles with explicit maxHeight: calc(100vh - 320px).

Symptom: Hot Partitions scan returns 16500: TooManyRequests
Likely cause: Aggregation throttled because cluster lacks RU headroom right now.
Fix: Temporarily bump RU/s, run scan, scale back. OR schedule the scan for off-peak window.

Symptom: Throttling RCA "Suspect queries" empty despite high 429 count
Likely cause: Traffic comes from app servers / other clients, not NoSqlStudio.
Fix: Either replay known-bad queries in NoSqlStudio shell to populate the ring buffer, OR upgrade to Log Analytics (PAID) for full per-shape correlation via KQL.