
Overview

The on-prem collector is a lightweight agent that runs within your network. It clones repositories, analyzes commits using an LLM, and streams only structured metadata (knowledge graphs, summaries, metrics) back to the Navigara cloud. Source code never leaves your infrastructure.
The collector connects to the Navigara API over a persistent gRPC stream. It receives work assignments (which repos/commits to analyze), processes them locally, and sends back structured results. If the connection drops, it automatically reconnects with exponential backoff and replays any buffered results.
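To picture the reconnect behavior, here is a minimal sketch of an exponential backoff schedule. The collector's real base delay, cap, and any jitter are not documented here, so the 1-second base and 60-second cap are assumptions chosen only for illustration, and `backoff_delay` is a hypothetical helper.

```shell
# backoff_delay ATTEMPT [BASE] [CAP]: prints the wait in seconds before
# reconnect attempt ATTEMPT (0-based). The delay doubles on each attempt
# and is clamped to CAP. BASE=1 and CAP=60 are assumed values, not the
# collector's documented settings.
backoff_delay() (
  attempt=$1
  base=${2:-1}
  cap=${3:-60}
  delay=$base
  i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
)
```

With the assumed defaults, `backoff_delay 3` prints 8 and every attempt from the sixth retry onward prints 60.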

Prerequisites

  • Docker Engine 24+ and Docker Compose v2+ on a Linux host (Ubuntu 24.04 LTS or Debian 13+ recommended)
  • Network access to your Git repositories (GitHub, GitLab, Bitbucket, or self-hosted)
  • Outbound HTTPS to the Navigara API (app.navigara.com:443)
  • LLM API endpoint — see LLM Configuration below
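The version requirements above can be checked before installing anything else. This is a sketch rather than an official preflight tool: `version_ge` and `preflight` are hypothetical helper names, and the comparison relies on `sort -V` from GNU coreutils.

```shell
# version_ge A B: succeeds when version A is at least version B.
# Relies on `sort -V` (GNU coreutils), available on the Linux hosts above.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# preflight: compares the installed Docker Engine and Compose versions
# with the minimums listed above (Engine 24, Compose 2).
preflight() {
  engine=$(docker version --format '{{.Server.Version}}') || return 1
  compose=$(docker compose version --short) || return 1
  version_ge "$engine" 24 || { echo "Docker Engine $engine is older than 24" >&2; return 1; }
  version_ge "${compose#v}" 2 || { echo "Docker Compose $compose is older than v2" >&2; return 1; }
  echo "Docker Engine $engine and Compose $compose meet the minimums"
}
```

Run `preflight` on the collector host after installing Docker. It returns non-zero if either tool is missing or too old.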

Hardware Requirements

Scale | CPU | Memory | Disk
Small (up to 500K commits) | 8 vCPU | 16 GB | 200 GB SSD
Medium (up to 5M commits) | 16 vCPU | 32 GB | 500 GB SSD
Large (up to 50M commits) | 32 vCPU | 64 GB | 1 TB SSD
Disk is used for temporary Git clones during analysis. The collector caches cloned repositories to speed up subsequent analyses — SSD storage is recommended.
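As a quick way to read the sizing table against a candidate host, the sketch below maps nominal CPU, memory, and disk figures to the largest tier they satisfy. `largest_tier` is a hypothetical helper, and treating 1 TB as 1000 GB is an assumption.

```shell
# largest_tier CPUS MEM_GB DISK_GB: prints the largest tier from the
# hardware table whose CPU, memory, and disk minimums are all met.
# 1 TB is treated as 1000 GB here (an assumption).
largest_tier() (
  cpus=$1
  mem=$2
  disk=$3
  if [ "$cpus" -ge 32 ] && [ "$mem" -ge 64 ] && [ "$disk" -ge 1000 ]; then
    echo "Large"
  elif [ "$cpus" -ge 16 ] && [ "$mem" -ge 32 ] && [ "$disk" -ge 500 ]; then
    echo "Medium"
  elif [ "$cpus" -ge 8 ] && [ "$mem" -ge 16 ] && [ "$disk" -ge 200 ]; then
    echo "Small"
  else
    echo "below Small"
  fi
)
```

Pass the host's nominal figures, for example `largest_tier 16 32 500`. Values reported by the operating system are usually slightly below the nominal ones, so round them up before comparing.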

Git Provider Authentication

The collector supports multiple authentication methods depending on your Git provider and security requirements. All credentials are configured in the Navigara dashboard and forwarded to the collector on demand over the encrypted gRPC stream, so no static Git tokens are stored on the collector host.

Option 1: GitHub App

Install the Navigara GitHub App on your GitHub organization. The Navigara cloud then generates short-lived installation tokens and sends them to the collector over the gRPC stream.
How it works:
  1. Install the Navigara GitHub App on your GitHub organization (or specific repositories)
  2. Add the repositories in the Navigara dashboard
  3. The backend generates scoped installation tokens on demand and sends them to the collector
  4. Tokens are short-lived and automatically rotated
Advantages:
  • No static tokens to manage or rotate
  • Fine-grained repository access (select specific repos during app installation)
  • Works with both GitHub.com and GitHub Enterprise
This is the recommended approach for GitHub users.

Option 2: Provider Tokens, End-to-End Encrypted (All Providers)

For any provider not covered by the GitHub App, use a personal access token — encrypted end-to-end so that Navigara can never read it. You generate a keypair on the collector host, encrypt the token locally, and paste the resulting blob into the dashboard. Navigara stores and forwards that blob verbatim — there is no server-side key and no server-side decryption. The collector holds the private key and decrypts each token in memory, only at the moment it authenticates to the Git host. The plaintext token and the private key never leave your network.
This applies to the on-prem collector specifically. In a full on-premises deployment the entire platform already runs inside your network, so the token never leaves your infrastructure regardless — paste it directly without the encryption steps.

1. Create a personal access token

Create a fine-grained personal access token with:
  • Repository access: Select the repositories you want to analyze
  • Permissions: Contents (read), Pull requests (read), Metadata (read)

2. Generate a keypair on the collector host

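# NAVIGARA_VERSION must be set in this shell; the .env file is read only by Docker Compose
export NAVIGARA_VERSION=0.11.8
docker run --rm -v /opt/navigara/keys:/keys \
  europe-docker.pkg.dev/navigara-images/public/vision-collector:${NAVIGARA_VERSION} \
  ./collector keygen --out /keys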
Writes private.pem (mode 0600) and public.pem to /opt/navigara/keys, and prints the key fingerprint. Keep private.pem on this host — it is never shared.
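Before mounting the key, it can be worth confirming that the private key is present and still has the restrictive mode described above. A minimal sketch, assuming GNU `stat` (standard on the recommended Linux hosts); `check_private_key` is a hypothetical helper name.

```shell
# check_private_key FILE: succeeds only if FILE exists and its
# permission bits are exactly 600. Uses GNU stat (-c '%a').
check_private_key() {
  [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
  mode=$(stat -c '%a' "$1") || return 1
  [ "$mode" = "600" ] || { echo "$1 has mode $mode, expected 600" >&2; return 1; }
}
```

For example, `check_private_key /opt/navigara/keys/private.pem` prints nothing and returns 0 when the file is in place with mode 0600.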

3. Give the collector the private key

Mount the key and set COLLECTOR_PRIVATE_KEY_PATH in your docker-compose.yml, then restart:
services:
  collector:
    environment:
      COLLECTOR_PRIVATE_KEY_PATH: /etc/navigara/keys/private.pem
    volumes:
      - ./keys/private.pem:/etc/navigara/keys/private.pem:ro

4. Encrypt the token

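# Requires NAVIGARA_VERSION in the shell environment (for example: export NAVIGARA_VERSION=0.11.8)
# printf '%s' writes the token without a trailing newline in every POSIX shell
printf '%s' 'ghp_yourtoken' | docker run --rm -i -v /opt/navigara/keys:/keys \
  europe-docker.pkg.dev/navigara-images/public/vision-collector:${NAVIGARA_VERSION} \
  ./collector encrypt-token --public-key /keys/public.pem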
Copy the navigara-enc-v1:… blob it prints.
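Typing the token inline leaves it in your shell history. The sketch below reads it from a hidden prompt instead, runs the same documented command, and checks that the result carries the `navigara-enc-v1:` prefix before you paste it. Both function names are hypothetical, and the sketch assumes the command prints only the blob on standard output.

```shell
# is_encrypted_blob TEXT: succeeds when TEXT starts with the
# navigara-enc-v1: prefix and has at least one character after it.
is_encrypted_blob() {
  case $1 in
    navigara-enc-v1:?*) return 0 ;;
    *) return 1 ;;
  esac
}

# encrypt_from_prompt: reads the token without echoing it, pipes it to the
# documented encrypt-token command, and prints the blob if it looks valid.
# Assumes NAVIGARA_VERSION is set and that stdout contains only the blob.
encrypt_from_prompt() {
  printf 'Token: ' >&2
  stty -echo
  IFS= read -r token
  stty echo
  printf '\n' >&2
  blob=$(printf '%s' "$token" | docker run --rm -i -v /opt/navigara/keys:/keys \
    "europe-docker.pkg.dev/navigara-images/public/vision-collector:${NAVIGARA_VERSION}" \
    ./collector encrypt-token --public-key /keys/public.pem) || return 1
  unset token
  is_encrypted_blob "$blob" || { echo "output does not look like an encrypted blob" >&2; return 1; }
  printf '%s\n' "$blob"
}
```

Run `encrypt_from_prompt` in an interactive terminal on the collector host and copy the line it prints.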

5. Paste the blob into the dashboard

Add the blob as the provider token under Settings → Integrations when connecting your Git account. Navigara stores it as-is and your next analysis run uses it automatically.
Already have a plaintext token saved? Migrate by minting a fresh PAT at your Git host, encrypting it, pasting the blob, then revoking the old token — a clean rotation with no downtime.
The collector can load several private keys at once (comma-separate paths in COLLECTOR_PRIVATE_KEY_PATH, or use COLLECTOR_PRIVATE_KEY_PEM for inline PEM) and selects the matching key per blob by fingerprint, so you can roll a new key before re-encrypting old tokens. If a blob can’t be decrypted — wrong key, tampering, or a paste error — the collector drops it and fails the task loudly; it is never forwarded to the Git host.

Installation

1. Prepare the host

# Install Docker
curl -fsSL https://get.docker.com | sh
# Let the current user run docker without sudo (log out and back in for this to take effect)
sudo usermod -aG docker "$USER"

# Install Docker Compose plugin (if not included)
sudo apt-get install docker-compose-plugin

# Verify
docker compose version

2. Generate a Collector API Token

In the Navigara dashboard, go to Settings → API Tokens and create a new API token. This token authenticates the collector with the Navigara backend. Copy it — you’ll need it in the next step.

3. Configure the deployment

Create the deployment directory:
mkdir -p /opt/navigara && cd /opt/navigara
Create docker-compose.yml:
services:
  collector:
    image: ${COLLECTOR_IMAGE:-europe-docker.pkg.dev/navigara-images/public/vision-collector:${NAVIGARA_VERSION}}
    container_name: navigara-collector
    init: true
    env_file:
      - .env
    environment:
      SERVER_ADDR: ${SERVER_ADDR:-app.navigara.com:443}
      COLLECTOR_TLS: "true"
      COLLECTOR_ID: ${COLLECTOR_ID:-collector-1}
      COLLECTOR_API_KEY: ${COLLECTOR_API_KEY}
      MAX_WORKERS: ${COLLECTOR_MAX_WORKERS:-10}
      LLM_PROVIDER: ${LLM_PROVIDER:-anthropic}
      LLM_MODEL: ${LLM_MODEL:-claude-sonnet-4-20250514}
      WORK_DIR: /tmp/git-analysis
    volumes:
      - collector-workdir:/tmp/git-analysis
      # Uncomment if using Vertex AI with a service account key file
      # - ./vertex-ai-key.json:/etc/navigara/vertex-ai-key.json:ro
    restart: unless-stopped

volumes:
  collector-workdir:
    name: navigara-collector-workdir
Create a .env file:
# Version
NAVIGARA_VERSION=0.11.8

# Collector image (override to use your own registry)
# COLLECTOR_IMAGE=your-registry.example.com/navigara/vision-collector:${NAVIGARA_VERSION}

# Collector authentication (generate in Navigara dashboard → Settings → API Tokens)
COLLECTOR_API_KEY=<your-collector-api-token>

# Collector identity (optional; the docker-compose.yml above falls back to collector-1 when this is unset)
# COLLECTOR_ID=collector-1

# LLM Configuration — see "LLM Configuration" section below for all providers
LLM_PROVIDER=anthropic                 # anthropic (recommended) | openai | genai
LLM_MODEL=claude-sonnet-4-20250514     # Model name for your provider
LLM_API_KEY=<your-llm-api-key>        # API key for the LLM provider
LLM_API_URL=https://api.anthropic.com # Required for anthropic/openai. For Azure Foundry, internal gateways, or self-hosted, point this at your endpoint.

# OpenAI (uncomment if using openai provider)
# LLM_PROVIDER=openai
# LLM_MODEL=gpt-5.4
# LLM_API_KEY=<your-openai-api-key>

# Google Vertex AI (uncomment if using genai provider)
# LLM_PROVIDER=genai
# LLM_MODEL=gemini-2.5-flash
# GOOGLE_PROJECT=<your-gcp-project>
# GOOGLE_LOCATION=global
# GOOGLE_APPLICATION_CREDENTIALS=/etc/navigara/vertex-ai-key.json  # Required if host is not authenticated via gcloud

# Concurrency (default: 10 parallel workers)
# COLLECTOR_MAX_WORKERS=10
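A common mistake is starting the collector with a template placeholder still in .env. The sketch below flags any named variable that is missing, empty, or still set to a `<...>` placeholder. `check_env_file` is a hypothetical helper, and which variables count as required depends on your LLM provider.

```shell
# check_env_file FILE NAME...: returns non-zero if any NAME is missing from
# FILE, empty, or still holds a <placeholder> value from the template.
check_env_file() {
  file=$1
  shift
  status=0
  for name in "$@"; do
    # Take the last uncommented assignment of NAME, if any.
    line=$(grep -E "^${name}=" "$file" | tail -n 1)
    value=${line#*=}
    case $value in
      ''|'<'*)
        echo "$name is unset or still a placeholder" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```

For the Anthropic setup above, `check_env_file .env NAVIGARA_VERSION COLLECTOR_API_KEY LLM_API_KEY` is a reasonable starting point. Because the file holds secrets, restricting it with `chmod 600 .env` is also sensible.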

LLM Configuration

Navigara requires an LLM API endpoint for AI-powered commit analysis. Supported providers:
Provider | Model | Notes
Anthropic | Claude Sonnet / Claude Haiku | Recommended (best quality/cost ratio)
Google Vertex AI | Gemini 2.5 Flash | Good cost/performance ratio
OpenAI | GPT-5.4 | Widely available

4. Start the collector

docker compose up -d
Verify the collector is running and connected:
docker compose logs -f collector
You should see output indicating a successful connection:
connected to server at app.navigara.com:443
registered as collector-1 with 10 workers
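For unattended installs, the same check can be scripted instead of reading the log by eye. This sketch assumes the real log lines contain the phrases shown in the sample output above, which may differ between versions; `collector_registered` is a hypothetical helper.

```shell
# collector_registered: reads log text on stdin and succeeds only if it
# contains both phrases from the sample output ("connected to server at"
# and "registered as"). The exact wording is an assumption.
collector_registered() {
  logs=$(cat)
  printf '%s\n' "$logs" | grep -q 'connected to server at' &&
    printf '%s\n' "$logs" | grep -q 'registered as'
}
```

Usage: `docker compose logs --no-color collector | collector_registered && echo "collector is up"`.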

5. Add repositories

Once the collector is running, add repositories through the Navigara dashboard:
  1. Go to Settings → Repositories → Add Repository
  2. Select your Git provider and authenticate (if using cloud-managed tokens or GitHub App)
  3. Select the repositories to analyze
  4. The collector will automatically begin processing

Configuration Reference

Variable | Default | Description
SERVER_ADDR | app.navigara.com:443 | Navigara API gRPC address
COLLECTOR_TLS | true | Enable TLS for gRPC connection
COLLECTOR_API_KEY | (none) | API token generated in Settings → API Tokens (required)
COLLECTOR_ID | hostname | Unique identifier for this collector (defaults to container hostname)
MAX_WORKERS | 10 | Number of concurrent commit analysis workers
LLM_PROVIDER | anthropic | LLM provider: anthropic, openai, or genai
LLM_MODEL | claude-sonnet-4-20250514 | Model name
LLM_API_KEY | (none) | API key for the LLM provider
LLM_API_URL | (none) | LLM endpoint URL. Required for anthropic (e.g. https://api.anthropic.com) and openai. Also used for Azure Foundry, internal gateways, and self-hosted models.
GOOGLE_PROJECT | (none) | GCP project ID (required for genai provider)
GOOGLE_LOCATION | global | GCP location (for genai provider)
WORK_DIR | /tmp/git-analysis | Directory for temporary Git clones
COLLECTOR_PRIVATE_KEY_PATH | (none) | Path to a PEM private key for encrypted tokens. Comma-separate multiple files for rotation.
COLLECTOR_PRIVATE_KEY_PEM | (none) | Inline PEM private key(s) for encrypted tokens. Alternative to COLLECTOR_PRIVATE_KEY_PATH.

Running Multiple Collectors

You can run multiple collector instances for higher throughput or geographic distribution. Each collector must have a unique COLLECTOR_ID. The Navigara backend distributes work across connected collectors with affinity routing — it prefers sending work to a collector that already has a repository cached locally.
# docker-compose.yml with two collectors
services:
  collector-1:
    image: europe-docker.pkg.dev/navigara-images/public/vision-collector:${NAVIGARA_VERSION}
    container_name: navigara-collector-1
    init: true
    env_file: .env
    environment:
      COLLECTOR_ID: collector-1
      # ... other env vars
    restart: unless-stopped

  collector-2:
    image: europe-docker.pkg.dev/navigara-images/public/vision-collector:${NAVIGARA_VERSION}
    container_name: navigara-collector-2
    init: true
    env_file: .env
    environment:
      COLLECTOR_ID: collector-2
      # ... other env vars
    restart: unless-stopped
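Since every collector needs its own COLLECTOR_ID, a copy-pasted service block is easy to get wrong. The sketch below lists any ID that appears more than once in a Compose file. `duplicate_collector_ids` is a hypothetical helper and only understands literal `COLLECTOR_ID: value` lines, not values supplied through variable interpolation.

```shell
# duplicate_collector_ids FILE: prints each COLLECTOR_ID value that appears
# more than once in FILE; prints nothing when all IDs are unique.
duplicate_collector_ids() {
  sed -n 's/^[[:space:]]*COLLECTOR_ID:[[:space:]]*//p' "$1" | sort | uniq -d
}
```

Usage: `duplicate_collector_ids docker-compose.yml`. Empty output means the IDs are unique.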

Network Requirements

The collector host must have outbound access to the following services:
Service | Purpose | Endpoint
Navigara API | Work assignment and result streaming | app.navigara.com:443
Git provider | Repository cloning and fetching | github.com, gitlab.com, or your self-hosted instance
LLM API | AI-powered commit analysis | api.anthropic.com, api.openai.com, or Vertex AI endpoints
No inbound ports need to be opened. The collector initiates all connections outbound.
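Outbound reachability can be spot-checked from the collector host with a short sketch like the one below. It assumes a netcat (`nc`) build that supports `-z` and `-w`, and the helper names are hypothetical. A successful TCP connect does not prove that a proxy or firewall will leave a long-lived gRPC stream open.

```shell
# endpoint_host / endpoint_port: split a HOST:PORT string.
endpoint_host() { echo "${1%:*}"; }
endpoint_port() { echo "${1##*:}"; }

# check_endpoint HOST:PORT: attempts a TCP connection with a 5-second
# timeout and reports the result. Requires nc with -z and -w support.
check_endpoint() {
  host=$(endpoint_host "$1")
  port=$(endpoint_port "$1")
  if nc -z -w 5 "$host" "$port" 2>/dev/null; then
    echo "ok    $1"
  else
    echo "FAIL  $1"
    return 1
  fi
}
```

Usage: `check_endpoint app.navigara.com:443`, then repeat for your Git host and LLM endpoint, for example `check_endpoint github.com:443` and `check_endpoint api.anthropic.com:443`.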

Upgrades

cd /opt/navigara

# Update NAVIGARA_VERSION in .env, then:
docker compose pull
docker compose up -d
The collector is stateless — it can be stopped and restarted at any time without data loss. In-progress work is automatically reassigned by the Navigara backend.
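The version bump in .env can be scripted so that upgrades are repeatable. A minimal sketch: `set_version` is a hypothetical helper that rewrites only the NAVIGARA_VERSION line and leaves the rest of the file, including its permissions, untouched.

```shell
# set_version FILE VERSION: replaces the NAVIGARA_VERSION=... line in FILE.
# Fails without changing anything if the line is not present.
set_version() {
  file=$1
  version=$2
  grep -q '^NAVIGARA_VERSION=' "$file" || {
    echo "NAVIGARA_VERSION not found in $file" >&2
    return 1
  }
  tmp=$(mktemp) || return 1
  # Write through a temp file, then copy back so FILE keeps its mode and owner.
  sed "s/^NAVIGARA_VERSION=.*/NAVIGARA_VERSION=${version}/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage, with `<new-version>` standing in for the release you are moving to: `set_version .env <new-version> && docker compose pull && docker compose up -d`.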

Troubleshooting

Issue | Solution
connection refused | Verify outbound access to app.navigara.com:443 from the host
authentication failed | Check that COLLECTOR_API_KEY matches the token configured in Navigara
LLM analysis failing | Verify LLM_API_KEY and that the host can reach the LLM endpoint
Collector keeps reconnecting | Check logs for specific errors; ensure the gRPC stream is not being terminated by a proxy or firewall
Slow analysis | Increase MAX_WORKERS (ensure sufficient CPU/memory) or add a second collector
High disk usage | The collector caches Git clones in WORK_DIR; restart the container or clear the volume to reclaim space