Overview
The on-prem collector is a lightweight agent that runs within your network. It clones repositories, analyzes commits using an LLM, and streams only structured metadata (knowledge graphs, summaries, metrics) back to the Navigara cloud. Source code never leaves your infrastructure.

The collector connects to the Navigara API over a persistent gRPC stream. It receives work assignments (which repos/commits to analyze), processes them locally, and sends back structured results. If the connection drops, it automatically reconnects with exponential backoff and replays any buffered results.

Prerequisites
- Docker Engine 24+ and Docker Compose v2+ on a Linux host (Ubuntu 24.04 LTS or Debian 13+ recommended)
- Network access to your Git repositories (GitHub, GitLab, Bitbucket, or self-hosted)
- Outbound HTTPS to the Navigara API (app.navigara.com:443)
- LLM API endpoint (see LLM Configuration below)
Hardware Requirements
| Scale | CPU | Memory | Disk |
|---|---|---|---|
| Small (up to 500K commits) | 8 vCPU | 16 GB | 200 GB SSD |
| Medium (up to 5M commits) | 16 vCPU | 32 GB | 500 GB SSD |
| Large (up to 50M commits) | 32 vCPU | 64 GB | 1 TB SSD |
Disk is used for temporary Git clones during analysis. The collector caches cloned repositories to speed up subsequent analyses — SSD storage is recommended.
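The sizing table above can be expressed as a small lookup, which is handy when scripting capacity checks. This is a minimal sketch: the `Tier` type and `recommend_tier` function are hypothetical names introduced here, and only the numbers come from the table.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    max_commits: int
    vcpu: int
    memory_gb: int
    disk: str


# Values copied from the Hardware Requirements table.
TIERS = [
    Tier("Small", 500_000, 8, 16, "200 GB SSD"),
    Tier("Medium", 5_000_000, 16, 32, "500 GB SSD"),
    Tier("Large", 50_000_000, 32, 64, "1 TB SSD"),
]


def recommend_tier(total_commits: int) -> Tier:
    """Return the smallest tier whose commit ceiling covers total_commits."""
    if total_commits < 0:
        raise ValueError("total_commits must be non-negative")
    for tier in TIERS:
        if total_commits <= tier.max_commits:
            return tier
    # The table does not list a tier above Large, so this sketch refuses to guess.
    raise ValueError("commit count exceeds the Large tier in the sizing table")
```

For example, `recommend_tier(2_000_000)` returns the Medium row (16 vCPU, 32 GB memory, 500 GB SSD).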
Git Provider Authentication
The collector supports multiple authentication methods depending on your Git provider and security requirements. All credentials are configured in the Navigara dashboard and forwarded to the collector on demand over the encrypted gRPC stream; no static Git tokens are stored on the collector host.

Option 1: Navigara GitHub App (Recommended for GitHub)
Install the Navigara GitHub App on your GitHub organization. The Navigara cloud automatically generates short-lived installation tokens and sends them to the collector over the gRPC stream. No static credentials are stored on the collector.

How it works:
- Install the Navigara GitHub App on your GitHub organization (or specific repositories)
- Add the repositories in the Navigara dashboard
- The backend generates scoped installation tokens on demand and sends them to the collector
- Tokens are short-lived and automatically rotated
- No static tokens to manage or rotate
- Fine-grained repository access (select specific repos during app installation)
- Works with both GitHub.com and GitHub Enterprise
This is the recommended approach for GitHub users.
Option 2: Provider Tokens, End-to-End Encrypted (All Providers)
For any provider not covered by the GitHub App, use a personal access token, encrypted end-to-end so that Navigara can never read it. You generate a keypair on the collector host, encrypt the token locally, and paste the resulting blob into the dashboard. Navigara stores and forwards that blob verbatim; there is no server-side key and no server-side decryption. The collector holds the private key and decrypts each token in memory, only at the moment it authenticates to the Git host. The plaintext token and the private key never leave your network.

This applies to the on-prem collector specifically. In a full on-premises deployment the entire platform already runs inside your network, so the token never leaves your infrastructure regardless; paste it directly without the encryption steps.
Create a personal access token
For GitHub, create a fine-grained personal access token with:
- Repository access: Select the repositories you want to analyze
- Permissions: Contents (read), Pull requests (read), Metadata (read)
Generate a keypair on the collector host
The key-generation command writes private.pem (mode 0600) and public.pem to /opt/navigara/keys and prints the key fingerprint. Keep private.pem on this host; it is never shared.

Give the collector the private key
Mount the key and set COLLECTOR_PRIVATE_KEY_PATH in your docker-compose.yml, then restart:

Already have a plaintext token saved? Migrate by minting a fresh PAT at your Git host, encrypting it, pasting the blob, then revoking the old token. This gives a clean rotation with no downtime.
The collector accepts multiple private keys (comma-separated paths in COLLECTOR_PRIVATE_KEY_PATH, or use COLLECTOR_PRIVATE_KEY_PEM for inline PEM) and selects the matching key per blob by fingerprint, so you can roll a new key before re-encrypting old tokens. If a blob can't be decrypted (wrong key, tampering, or a paste error), the collector drops it and fails the task loudly; it is never forwarded to the Git host.
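The per-blob key selection described above can be pictured as a lookup keyed by fingerprint. The sketch below is illustrative only: the fingerprint scheme (SHA-256 over the public key PEM bytes), the `DecryptionError` type, and the function names are all assumptions, since the collector's actual blob format is not documented here. It models key selection and the fail-loudly behavior, not the decryption itself.

```python
import hashlib


class DecryptionError(Exception):
    """Raised when no loaded key matches a blob; the task should fail."""


def fingerprint(public_key_pem: bytes) -> str:
    # ASSUMPTION: SHA-256 over the PEM bytes. The real scheme may differ.
    return hashlib.sha256(public_key_pem).hexdigest()


def build_keyring(keypairs):
    """Map fingerprint -> private key for every (public_pem, private_pem) pair."""
    return {fingerprint(pub): priv for pub, priv in keypairs}


def select_private_key(keyring, blob_fingerprint: str) -> bytes:
    """Pick the private key matching the blob, or fail without falling back."""
    try:
        return keyring[blob_fingerprint]
    except KeyError:
        raise DecryptionError(
            f"no private key loaded for fingerprint {blob_fingerprint[:12]}"
        ) from None
```

During a rotation the keyring would hold both the old and the new key, so blobs encrypted against either public key still resolve until the old tokens are re-encrypted.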
Installation
1. Prepare the host
2. Generate a Collector API Token
In the Navigara dashboard, go to Settings → API Tokens and create a new API token. This token authenticates the collector with the Navigara backend. Copy it; you'll need it in the next step.

3. Configure the deployment
Create the deployment directory:

docker-compose.yml:
.env file:
LLM Configuration
Navigara requires an LLM API endpoint for AI-powered commit analysis. Supported providers:

| Provider | Model | Notes |
|---|---|---|
| Anthropic | Claude Sonnet / Claude Haiku | Recommended — best quality/cost ratio |
| Google Vertex AI | Gemini 2.5 Flash | Good cost/performance ratio |
| OpenAI | GPT-5.4 | Widely available |
- Anthropic (Recommended)
- Google Vertex AI
- OpenAI
- Self-hosted (OpenAI-compatible)
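If requests go through Azure Foundry, an internal gateway, or another endpoint rather than the provider's public API, set LLM_API_URL to that endpoint instead. Do not include /v1 in the URL; the SDK appends /v1/messages itself.

4. Start the collector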
5. Add repositories
Once the collector is running, add repositories through the Navigara dashboard:
- Go to Settings → Repositories → Add Repository
- Select your Git provider and authenticate (if using cloud-managed tokens or GitHub App)
- Select the repositories to analyze
- The collector will automatically begin processing
Configuration Reference
| Variable | Default | Description |
|---|---|---|
| SERVER_ADDR | app.navigara.com:443 | Navigara API gRPC address |
| COLLECTOR_TLS | true | Enable TLS for gRPC connection |
| COLLECTOR_API_KEY | - | API token generated in Settings → API Tokens (required) |
| COLLECTOR_ID | hostname | Unique identifier for this collector (defaults to container hostname) |
| MAX_WORKERS | 10 | Number of concurrent commit analysis workers |
| LLM_PROVIDER | anthropic | LLM provider: anthropic, openai, or genai |
| LLM_MODEL | claude-sonnet-4-20250514 | Model name |
| LLM_API_KEY | - | API key for the LLM provider |
| LLM_API_URL | - | LLM endpoint URL. Required for anthropic (e.g. https://api.anthropic.com) and openai. Also used for Azure Foundry, internal gateways, and self-hosted models. |
| GOOGLE_PROJECT | - | GCP project ID (required for genai provider) |
| GOOGLE_LOCATION | global | GCP location (for genai provider) |
| WORK_DIR | /tmp/git-analysis | Directory for temporary Git clones |
| COLLECTOR_PRIVATE_KEY_PATH | - | Path to a PEM private key for encrypted tokens. Comma-separate multiple files for rotation. |
| COLLECTOR_PRIVATE_KEY_PEM | - | Inline PEM private key(s) for encrypted tokens. Alternative to COLLECTOR_PRIVATE_KEY_PATH. |
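Before starting the collector it can help to sanity-check the environment against the rules in this table. The following is a rough pre-flight sketch, not part of the collector: `check_config` is a hypothetical helper, and the /v1 check is applied only to the anthropic provider on the assumption that the /v1/messages note elsewhere on this page refers to that SDK.

```python
from typing import Mapping

# Defaults copied from the Configuration Reference table.
DEFAULTS = {
    "SERVER_ADDR": "app.navigara.com:443",
    "COLLECTOR_TLS": "true",
    "MAX_WORKERS": "10",
    "LLM_PROVIDER": "anthropic",
    "LLM_MODEL": "claude-sonnet-4-20250514",
    "GOOGLE_LOCATION": "global",
    "WORK_DIR": "/tmp/git-analysis",
}


def check_config(env: Mapping[str, str]) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    # Empty values are treated as unset so the defaults still apply.
    cfg = {**DEFAULTS, **{k: v for k, v in env.items() if v}}
    problems = []

    if not cfg.get("COLLECTOR_API_KEY"):
        problems.append("COLLECTOR_API_KEY is required")

    provider = cfg["LLM_PROVIDER"]
    if provider not in ("anthropic", "openai", "genai"):
        problems.append(f"LLM_PROVIDER '{provider}' is not anthropic, openai, or genai")
    if provider in ("anthropic", "openai") and not cfg.get("LLM_API_URL"):
        problems.append(f"LLM_API_URL is required for {provider}")
    if provider == "genai" and not cfg.get("GOOGLE_PROJECT"):
        problems.append("GOOGLE_PROJECT is required for genai")

    # ASSUMPTION: the "/v1" rule applies to the anthropic provider only.
    url = cfg.get("LLM_API_URL", "").rstrip("/")
    if provider == "anthropic" and url.endswith("/v1"):
        problems.append("LLM_API_URL should not include /v1")

    workers = cfg["MAX_WORKERS"]
    if not workers.isdigit() or int(workers) < 1:
        problems.append("MAX_WORKERS must be a positive integer")

    return problems
```

Calling `check_config(dict(os.environ))` on the collector host (after `import os`) would list any missing or inconsistent settings before the container starts.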
Running Multiple Collectors
You can run multiple collector instances for higher throughput or geographic distribution. Each collector must have a unique COLLECTOR_ID. The Navigara backend distributes work across connected collectors with affinity routing: it prefers sending work to a collector that already has a repository cached locally.
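Affinity routing of this kind can be pictured as a two-level preference: cache hit first, then some tie-breaker. The sketch below is a guess at the shape of such a policy, not the backend's actual scheduler. The `Collector` type, the `pick_collector` function, and the "fewest active tasks" tie-breaker are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Collector:
    collector_id: str
    cached_repos: set = field(default_factory=set)
    active_tasks: int = 0


def pick_collector(collectors, repo: str) -> Collector:
    """Prefer a collector that has repo cached; break ties by load, then ID."""
    if not collectors:
        raise ValueError("no collectors connected")
    # False sorts before True, so collectors with a cache hit come first.
    return min(
        collectors,
        key=lambda c: (repo not in c.cached_repos, c.active_tasks, c.collector_id),
    )
```

Under this policy a busy collector with the repository cached still wins over an idle one without it, which avoids a fresh clone at the cost of queueing.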
Network Requirements
The collector host must have outbound access to the following services:

| Service | Purpose | Endpoint |
|---|---|---|
| Navigara API | Work assignment and result streaming | app.navigara.com:443 |
| Git provider | Repository cloning and fetching | github.com, gitlab.com, or your self-hosted instance |
| LLM API | AI-powered commit analysis | api.anthropic.com, api.openai.com, or Vertex AI endpoints |
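A quick way to confirm the outbound rules in the table above is a TCP reachability check from the collector host. This is a minimal sketch using only the Python standard library; `check_endpoints` is a hypothetical helper, and port 443 for the Git and LLM hosts is an assumption (HTTPS). A successful TCP connect does not prove that a proxy will keep a long-lived gRPC stream open.

```python
import socket


def check_endpoints(endpoints, connect=socket.create_connection, timeout=5.0):
    """Try a TCP connection to each (host, port); return {"host:port": bool}."""
    results = {}
    for host, port in endpoints:
        label = f"{host}:{port}"
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            results[label] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results[label] = False
    return results


# Endpoints from the table; adjust for your Git host and LLM provider.
DEFAULT_ENDPOINTS = [
    ("app.navigara.com", 443),
    ("github.com", 443),
    ("api.anthropic.com", 443),
]
```

Running `check_endpoints(DEFAULT_ENDPOINTS)` on the host returns a per-endpoint map, which makes it easy to see which firewall rule is missing.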
Upgrades
Troubleshooting
| Issue | Solution |
|---|---|
| connection refused | Verify outbound access to app.navigara.com:443 from the host |
| authentication failed | Check that COLLECTOR_API_KEY matches the token configured in Navigara |
| LLM analysis failing | Verify LLM_API_KEY and that the host can reach the LLM endpoint |
| Collector keeps reconnecting | Check logs for specific errors; ensure the gRPC stream is not being terminated by a proxy or firewall |
| Slow analysis | Increase MAX_WORKERS (ensure sufficient CPU/memory) or add a second collector |
| High disk usage | The collector caches Git clones in WORK_DIR; restart the container or clear the volume to reclaim space |
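When reading logs for the "Collector keeps reconnecting" case, it helps to know what the reconnect behavior described in the Overview looks like: delays that grow exponentially between attempts, and results that stay buffered until the stream accepts them. The sketch below illustrates that pattern in general terms. The base delay, cap, and the `ResultBuffer` class are assumptions, not the collector's actual values or code.

```python
from collections import deque


def backoff_delays(base=1.0, cap=60.0, factor=2.0):
    """Yield base, base*factor, base*factor**2, ... never exceeding cap."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * factor)


class ResultBuffer:
    """Hold results until the stream accepts them, preserving order."""

    def __init__(self):
        self._pending = deque()

    def send(self, result, stream_send):
        # Buffer first, so a failed send never loses the result.
        self._pending.append(result)
        self.flush(stream_send)

    def flush(self, stream_send):
        # stream_send may raise ConnectionError; the unsent item stays queued.
        while self._pending:
            stream_send(self._pending[0])
            self._pending.popleft()
```

A caller would catch `ConnectionError`, sleep for the next value from `backoff_delays()`, reconnect, and call `flush` to replay what was buffered. Production implementations commonly add random jitter to the delay so that many collectors do not reconnect at the same moment.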

