## Overview

The on-prem collector is a lightweight agent that runs within your network. It clones repositories, analyzes commits using an LLM, and streams only structured metadata (knowledge graphs, summaries, metrics) back to the Navigara cloud. Source code never leaves your infrastructure.

The collector connects to the Navigara API over a persistent gRPC stream. It receives work assignments (which repos/commits to analyze), processes them locally, and sends back structured results. If the connection drops, it automatically reconnects with exponential backoff and replays any buffered results.

## Prerequisites
- Docker Engine 24+ and Docker Compose v2+ on a Linux host (Ubuntu 24.04 LTS or Debian 13+ recommended)
- Network access to your Git repositories (GitHub, GitLab, Bitbucket, or self-hosted)
- Outbound HTTPS to the Navigara API (`app.navigara.com:443`)
- LLM API endpoint (see LLM Configuration below)
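A quick way to sanity-check outbound reachability from the host before installing (bash-only, using `/dev/tcp`; adjust the host list to your Git provider and LLM endpoint):

```shell
for host in app.navigara.com github.com api.anthropic.com; do
  # Open a TCP connection to port 443 with a 5-second timeout
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/443" 2>/dev/null; then
    echo "${host}:443 reachable"
  else
    echo "${host}:443 NOT reachable"
  fi
done
```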
## Hardware Requirements
| Scale | CPU | Memory | Disk |
|---|---|---|---|
| Small (up to 500K commits) | 8 vCPU | 16 GB | 200 GB SSD |
| Medium (up to 5M commits) | 16 vCPU | 32 GB | 500 GB SSD |
| Large (up to 50M commits) | 32 vCPU | 64 GB | 1 TB SSD |
Disk is used for temporary Git clones during analysis. The collector caches cloned repositories to speed up subsequent analyses — SSD storage is recommended.
## Git Provider Authentication

The collector supports multiple authentication methods depending on your Git provider and security requirements. You can combine methods; for example, use the Navigara GitHub App for some repositories and a personal access token for others.

### Option 1: Navigara GitHub App (Recommended for GitHub)
Install the Navigara GitHub App on your GitHub organization. The Navigara cloud automatically generates short-lived installation tokens and sends them to the collector over the gRPC stream. No static credentials are stored on the collector.

How it works:

1. Install the Navigara GitHub App on your GitHub organization (or specific repositories)
2. Add the repositories in the Navigara dashboard
3. The backend generates scoped installation tokens on demand and sends them to the collector
4. Tokens are short-lived and automatically rotated

Benefits:

- No static tokens to manage or rotate
- Fine-grained repository access (select specific repos during app installation)
- Works with both GitHub.com and GitHub Enterprise
This is the recommended approach for GitHub users.
### Option 2: Personal Access Token (All Providers)
Pass a personal access token (PAT) directly to the collector via environment variables. The collector uses this token for all Git operations (clone, fetch) and API calls (repo discovery, PR fetching, user lookup). GitHub, GitLab, and Bitbucket are supported; the steps below cover GitHub.

Create a fine-grained personal access token with the following permissions:
- Repository access: Select the repositories you want to analyze
- Permissions: `Contents` (read), `Pull requests` (read), `Metadata` (read)
Set the token in your `.env` file.

Benefits:

- Simple setup: just one environment variable
- Works with any Git provider (GitHub, GitLab, Bitbucket, self-hosted)
- Full control over token scope and lifetime
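For example, a `.env` entry for a GitHub PAT (token values are placeholders; `GITHUB_TOKEN` and `GITLAB_TOKEN` are the variables the collector reads, per the Configuration Reference):

```env
# PAT-based authentication (placeholder values)
GITHUB_TOKEN=github_pat_XXXXXXXX
# GITLAB_TOKEN=glpat-XXXXXXXX      # or a GitLab PAT for GitLab repos
```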
### Option 3: Cloud-Managed Tokens
Add Git provider tokens directly in the Navigara dashboard (Settings → Integrations). The Navigara backend securely stores the tokens and sends them to the collector on demand over the encrypted gRPC stream.

How it works:

1. Add your Git provider token in the Navigara dashboard
2. The backend stores the token encrypted at rest
3. When the collector needs to access a repository, the backend sends the token over the gRPC stream
4. The collector uses the token for that specific operation, then discards it

Benefits:

- Tokens are managed centrally in the dashboard
- No secrets stored on the collector host
- Supports all providers (GitHub, GitLab, Bitbucket)
Cloud-managed tokens and environment variable tokens can be used together. Environment variable tokens act as a fallback when no cloud-managed token matches a repository URL.
## Installation
### 1. Prepare the host
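This page does not spell out the host preparation; a typical setup on a fresh Ubuntu or Debian host might look like the following (using Docker's official convenience script; adapt to your distribution's packaging policy):

```shell
# Install Docker Engine and the Compose plugin
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Verify versions meet the prerequisites (Engine 24+, Compose v2+)
docker --version
docker compose version
```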
### 2. Generate a Collector API Token
In the Navigara dashboard, go to Settings → API Tokens and create a new API token. This token authenticates the collector with the Navigara backend. Copy it; you'll need it in the next step.

### 3. Configure the deployment
Create the deployment directory, then add the following files.

`docker-compose.yml`:
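The compose file itself is not shown in this extract; below is a minimal sketch. The image name `navigara/collector` and the volume layout are assumptions; verify them against the official instructions. Environment variables follow the Configuration Reference.

```yaml
# docker-compose.yml (minimal sketch; image name is an assumption)
services:
  collector:
    image: navigara/collector:latest
    restart: unless-stopped
    env_file: .env
    volumes:
      - git-cache:/tmp/git-analysis   # WORK_DIR default; persists the clone cache
volumes:
  git-cache:
```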
`.env` file:
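A sketch of the corresponding `.env`, using variables from the Configuration Reference (all values are placeholders):

```env
# Required: collector token from Settings → API Tokens
COLLECTOR_API_KEY=nvg_XXXXXXXX

# LLM provider (Anthropic shown; see LLM Configuration)
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-20250514
LLM_API_KEY=sk-ant-XXXXXXXX

# Optional: Git PAT if not using the GitHub App or cloud-managed tokens
# GITHUB_TOKEN=github_pat_XXXXXXXX

# Tuning
MAX_WORKERS=10
```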
## LLM Configuration

Navigara requires an LLM API endpoint for AI-powered commit analysis. Supported providers:

| Provider | Model | Notes |
|---|---|---|
| Anthropic | Claude Sonnet / Claude Haiku | Recommended — best quality/cost ratio |
| Google Vertex AI | Gemini 2.5 Flash | Good cost/performance ratio |
| OpenAI | GPT-5.4 | Widely available |
- Anthropic (Recommended)
- Google Vertex AI
- OpenAI
- Self-hosted (OpenAI-compatible)
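The provider entries above correspond to configuration tabs whose contents are not shown in this extract; the sketches below use the variables from the Configuration Reference. Model names and values are placeholders or assumptions; check your provider's current model list.

```env
# Anthropic (recommended)
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-20250514
LLM_API_KEY=sk-ant-XXXXXXXX

# Google Vertex AI (genai provider)
# LLM_PROVIDER=genai
# LLM_MODEL=gemini-2.5-flash
# GOOGLE_PROJECT=my-gcp-project
# GOOGLE_LOCATION=global

# OpenAI
# LLM_PROVIDER=openai
# LLM_MODEL=...               # model name per your OpenAI account
# LLM_API_KEY=sk-XXXXXXXX

# Self-hosted (OpenAI-compatible endpoint)
# LLM_PROVIDER=openai
# LLM_API_URL=https://llm.internal.example.com/v1
# LLM_API_KEY=XXXXXXXX
```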
### 4. Start the collector
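Starting is the standard Compose flow, run from the deployment directory:

```shell
docker compose up -d      # start the collector in the background
docker compose logs -f    # follow the logs to confirm it connects
```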
### 5. Add repositories

Once the collector is running, add repositories through the Navigara dashboard:

- Go to Settings → Repositories → Add Repository
- Select your Git provider and authenticate (if using cloud-managed tokens or GitHub App)
- Select the repositories to analyze
- The collector will automatically begin processing
## Configuration Reference
| Variable | Default | Description |
|---|---|---|
| `SERVER_ADDR` | `app.navigara.com:443` | Navigara API gRPC address |
| `COLLECTOR_TLS` | `true` | Enable TLS for gRPC connection |
| `COLLECTOR_API_KEY` | — | API token generated in Settings → API Tokens (required) |
| `COLLECTOR_ID` | hostname | Unique identifier for this collector (defaults to container hostname) |
| `MAX_WORKERS` | `10` | Number of concurrent commit analysis workers |
| `LLM_PROVIDER` | `anthropic` | LLM provider: `anthropic`, `openai`, or `genai` |
| `LLM_MODEL` | `claude-sonnet-4-20250514` | Model name |
| `LLM_API_KEY` | — | API key for the LLM provider |
| `LLM_API_URL` | — | Custom LLM endpoint (for self-hosted models) |
| `GOOGLE_PROJECT` | — | GCP project ID (required for `genai` provider) |
| `GOOGLE_LOCATION` | `global` | GCP location (for `genai` provider) |
| `GITHUB_TOKEN` | — | GitHub PAT for repository access |
| `GITLAB_TOKEN` | — | GitLab PAT for repository access |
| `WORK_DIR` | `/tmp/git-analysis` | Directory for temporary Git clones |
## Running Multiple Collectors

You can run multiple collector instances for higher throughput or geographic distribution. Each collector must have a unique `COLLECTOR_ID`. The Navigara backend distributes work across connected collectors with affinity routing: it prefers sending work to a collector that already has a repository cached locally.
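For example, a second instance on another host only needs a distinct `COLLECTOR_ID` in its `.env`; the rest of its configuration can match the first collector's:

```env
COLLECTOR_ID=collector-eu-1   # must be unique per instance
```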
## Network Requirements

The collector host must have outbound access to the following services:

| Service | Purpose | Endpoint |
|---|---|---|
| Navigara API | Work assignment and result streaming | app.navigara.com:443 |
| Git provider | Repository cloning and fetching | github.com, gitlab.com, or your self-hosted instance |
| LLM API | AI-powered commit analysis | api.anthropic.com, api.openai.com, or Vertex AI endpoints |
## Upgrades
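The upgrade procedure is not shown in this extract; with a Compose-based deployment, upgrading typically amounts to pulling the newer image and recreating the container (named volumes, including any clone cache, survive recreation):

```shell
docker compose pull     # fetch the newer collector image
docker compose up -d    # recreate the container with the new image
```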
## Troubleshooting
| Issue | Solution |
|---|---|
| `connection refused` | Verify outbound access to `app.navigara.com:443` from the host |
| `authentication failed` | Check that `COLLECTOR_API_KEY` matches the token configured in Navigara |
| LLM analysis failing | Verify `LLM_API_KEY` and that the host can reach the LLM endpoint |
| Collector keeps reconnecting | Check logs for specific errors; ensure the gRPC stream is not being terminated by a proxy or firewall |
| Slow analysis | Increase `MAX_WORKERS` (ensure sufficient CPU/memory) or add a second collector |
| High disk usage | The collector caches Git clones in `WORK_DIR`; restart the container or clear the volume to reclaim space |
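One way to act on the last row and reclaim the clone cache; the service name `collector` and the default `WORK_DIR` path are assumptions, so adjust them to your own compose file:

```shell
# Clear the clone cache inside the running container, then restart it
docker compose exec collector sh -c 'rm -rf /tmp/git-analysis/*'
docker compose restart collector
```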

