Privacy-First Test Data Platform

Launch production-grade, safe, and realistic test data without moving raw data out of your VPC

Ark discovers PII, masks sensitive fields, and provisions safe environments for engineering and security teams.

Platform

Discover -> Mask -> Subset -> Provision

Deployment

Control plane in SaaS, execution inside your network

Outcome

Safer test databases for CI, QA, and sandbox work

Supported source databases

Ark ships with first-class PostgreSQL and MySQL support today. We only list engines that are available in production.

  • PostgreSQL (available now)
  • MySQL (available now)

Additional engines are on the roadmap — ask us if yours is a priority.

Direct Contrast

The Old Way vs. The Ark Way

Why engineering teams are replacing manual scripts and heavy database copies.

Legacy

The Legacy / Brittle Way

  • Full DB Clones

    Restoring terabytes of production dumps into staging. Expensive cloud costs, slow refreshes, and insecure exposure.

  • Brittle Masking Scripts

    Writing custom SQL scripts that break on schema changes, leaving unmasked data exposed.

  • SaaS Compliance Risks

    Uploading raw database rows to external SaaS runtimes for classification, violating privacy guidelines.

Ark

The Modern / Ark Way

  • Referential Subsetting

    Deploy small, consistent slices (e.g. 5% of users). Quick to load, easy to manage, zero schema breakage.

  • AI-Driven Masking

    Automatic PII discovery inside schemas, free text notes, and nested JSON payloads.

  • VPC-Native Execution

    All data transformations run 100% locally via the VPC Agent. The SaaS control plane only manages orchestration.
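As a rough illustration of what discovery inside nested JSON payloads involves, the sketch below walks a payload recursively, records the path of every string that looks like an email address, and returns a masked copy. It is a minimal Python sketch under stated assumptions, not Ark's classifier: the regex, the `mask_emails` function, the path notation, and the `***@domain` masking format are all hypothetical.

```python
import re

# Hypothetical, deliberately simple email pattern; a real classifier uses many more signals.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")

def mask_emails(value, path="$", findings=None):
    """Return (masked_copy, findings); findings lists the paths that held emails."""
    if findings is None:
        findings = []
    if isinstance(value, dict):
        masked = {k: mask_emails(v, f"{path}.{k}", findings)[0] for k, v in value.items()}
    elif isinstance(value, list):
        masked = [mask_emails(v, f"{path}[{i}]", findings)[0] for i, v in enumerate(value)]
    elif isinstance(value, str) and EMAIL_RE.search(value):
        findings.append(path)
        # Keep the domain so the masked value still looks realistic in tests.
        masked = EMAIL_RE.sub(lambda m: "***@" + m.group(1), value)
    else:
        masked = value
    return masked, findings

payload = {
    "order_id": 17,
    "customer": {"contact": "jane.doe@example.com"},
    "notes": ["call back", "cc bob@example.org please"],
}
masked, findings = mask_emails(payload)
```

The function builds a new structure instead of mutating the input, so the original payload stays untouched for comparison.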

Feature Catalog

Explore Every Feature In Detail

Ark is built to serve platform engineers, security operators, and compliance officers alike.

Referential Integrity Subsetting

Slice large databases while maintaining foreign key relationships so tests never crash on missing relations.
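The idea behind referential subsetting can be sketched in a few lines: pick a slice of parent rows, then keep only the child rows whose foreign keys point at a kept parent. The Python below is a minimal sketch with in-memory rows, not Ark's implementation; the `users`, `orders`, and `order_items` tables, the column names, and the every-Nth sampling rule are hypothetical.

```python
def subset_with_fks(users, orders, order_items, fraction=0.05):
    """Keep a slice of users, then only the child rows that reference kept parents."""
    # Deterministic pick: every Nth user by sorted id (hypothetical sampling rule).
    step = max(1, round(1 / fraction))
    ordered = sorted(users, key=lambda u: u["id"])
    kept_users = [u for i, u in enumerate(ordered) if i % step == 0]
    user_ids = {u["id"] for u in kept_users}

    # Follow foreign keys downward: orders -> users, then order_items -> orders.
    kept_orders = [o for o in orders if o["user_id"] in user_ids]
    order_ids = {o["id"] for o in kept_orders}
    kept_items = [it for it in order_items if it["order_id"] in order_ids]
    return kept_users, kept_orders, kept_items

users = [{"id": i} for i in range(1, 41)]
orders = [{"id": 100 + i, "user_id": (i % 40) + 1} for i in range(80)]
order_items = [{"id": 1000 + i, "order_id": 100 + (i % 80)} for i in range(160)]

sub_users, sub_orders, sub_items = subset_with_fks(users, orders, order_items)
```

Because children are filtered by the ids of parents that were actually kept, no row in the subset references a missing relation.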

CLI-First Automation

Seamlessly trigger masked ephemeral test databases directly in your CI/CD pipeline or local developer shell.

Typed SDKs (Go & JS)

Programmatically automate test data provisioning directly inside internal platforms or custom developer portals.

Testcontainers & Docker

Spin up temporary Postgres, MySQL, or Mongo targets using local containers and populate them with safe subsets.

Developer Workflow

Fits the workflows your teams already have

Start with ark-cli for fast operator execution, then use Go or JavaScript SDKs when provisioning belongs inside platform code, test harnesses, or internal tools.

File: tdm-workflow.sh (bash)

    # authenticate once
    ark-cli login \
      --api-url "https://control.ark.dev" \
      --api-key "$ARK_API_KEY" \
      --tenant-id "$ARK_TENANT_ID"

    # inspect available source configs
    ark-cli configs list

    # provision a masked ephemeral database
    ark-cli testenvs create \
      --config "550e8400-e29b-41d4-a716-446655440000" \
      --wait
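When the login and provisioning steps above need to run from a Python test harness or CI job, one option is to shell out to ark-cli. The sketch below uses only the commands and flags shown in the script; the helper names (`login_cmd`, `create_testenv_cmd`, `provision`) are hypothetical, and it assumes ark-cli is on the PATH and exits non-zero on failure.

```python
import os
import subprocess

def login_cmd(api_url, api_key, tenant_id):
    """Argument list for the login step, mirroring the flags in the shell script."""
    return ["ark-cli", "login",
            "--api-url", api_url,
            "--api-key", api_key,
            "--tenant-id", tenant_id]

def create_testenv_cmd(config_id):
    """Argument list for provisioning a masked ephemeral database and waiting for it."""
    return ["ark-cli", "testenvs", "create", "--config", config_id, "--wait"]

def provision(config_id, env=None):
    """Run login, then create. Raises CalledProcessError if a step exits non-zero
    (assumption: ark-cli signals failure through its exit code)."""
    env = os.environ if env is None else env
    subprocess.run(
        login_cmd("https://control.ark.dev", env["ARK_API_KEY"], env["ARK_TENANT_ID"]),
        check=True,
    )
    subprocess.run(create_testenv_cmd(config_id), check=True)
```

Calling `provision("550e8400-e29b-41d4-a716-446655440000")` would run the same two steps as the script. Passing argument lists rather than a shell string avoids quoting problems with the key and tenant values.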

Control Panel

See the control plane behind the workflow

Browse the real product surfaces used to manage configs, policies, jobs, and prepared environments.

[Gallery: 22 control panel snapshots]

Security Architecture

Zero-trust data flow, simplified

Ark keeps orchestration centralized while execution stays close to the source data inside your environment.

A. Live orchestration flow

[Diagram: orchestration loop]

B. Trust boundary model

Ark SaaS

Metadata Isolation

Our SaaS never sees your actual row-level data. Only schema metadata and job statuses are tracked.

  • Auth & RBAC
  • Job scheduling
  • Compliance logs
  • Policy & metadata
Stores: ark-meta · audit · policies

Your VPC

Safe Agent Model

Runs as a stateless container in your VPC. No inbound ports required; no persistent data storage.

  • AI classification
  • Local masking
  • DB subsetting
  • Test environments
Stores: raw rows · masked dumps · sandboxes

Production access, on your terms

Connect the agent with a read-only database user, or point it at a production replica instead of the primary. Ark only needs SELECT-level access to build subsets — write access to production is never required.
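For PostgreSQL, a read-only user of this kind can be set up with a handful of standard statements. The Python sketch below only generates the SQL text so it can be reviewed before a DBA runs it; the function name and the role, database, and schema names are hypothetical, and it is not taken from Ark's setup guide.

```python
import re

# Only allow plain lowercase identifiers, since they are interpolated into SQL text.
IDENT = re.compile(r"[a-z_][a-z0-9_]*")

def read_only_grants(role, database, schema="public"):
    """Return PostgreSQL statements that give `role` SELECT-only access to one schema."""
    for name in (role, database, schema):
        if not IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"CREATE ROLE {role} LOGIN;",
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};",
    ]

statements = read_only_grants("ark_reader", "shop")
```

Note that `GRANT SELECT ON ALL TABLES` covers only tables that exist when it runs, so tables added later need the grant repeated. The role's password or other authentication method is left to the operator.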

Raw production rows never cross the trust boundary. Governance metadata flows upward to the control plane.

Why Ark

The real cost of doing nothing

Every week without automated data governance is a week of mounting risk, wasted engineering hours, and regulatory exposure. Here's what changes when you deploy Ark.

Risk Reduction

Eliminate data breach exposure

The average data breach costs $4.88M globally — and 82% involve human error. Ark removes the human factor by automating PII discovery, classification, and masking before data ever reaches a test environment.

Productivity

Reclaim up to 50% of engineering time

Engineers and QA teams spend up to half their time locating, preparing, and cleaning test data. Ark provisions masked, referentially-intact datasets on-demand — in minutes, not days.

Compliance

Stay ahead of regulators

Cumulative GDPR fines have exceeded €7.1B. KVKK penalties for data security breaches reach ₺17M per violation. Ark maps every column to KVKK, GDPR, CCPA, and HIPAA automatically, giving auditors the exact legal reference they need.

Data Sovereignty

Zero raw data egress, ever

Unlike SaaS-only tools that upload your rows to external runtimes, Ark's agent runs inside your VPC. Classification, masking, and subsetting happen where your data already lives. Raw data never crosses the network boundary.

AI Privacy

AI-powered, vendor-independent

Ark's triple-engine classifier (schema + content + LLM) runs entirely on local models — Qwen, Gemma, Granite via Ollama. No data sent to OpenAI or any external API. Full accuracy, full privacy.

Auditability

Complete audit trail, out of the box

Every API key creation, rotation, revocation — every classification run, every masking job — is logged with who, when, and what. When auditors ask 'who accessed this data?', you answer in two clicks.

See the transformation

  • Test data prep time: 2 weeks -> 15 minutes
  • PII exposure in staging: Unmasked -> Zero egress
  • Compliance readiness: Manual audit -> Auto-mapped

Ark doesn't just reduce risk — it turns test data into a competitive advantage. Ship faster with realistic data. Sleep better knowing production PII never leaks.

Deployment Ready

Turn test data into a safe, repeatable workflow

Start with the getting started guide, review the architecture, or walk through the product in more detail.

Enterprise-Grade Security & Compliance

SOC 2 Type II Ready · GDPR Compliant · KVKK Compliant · HIPAA Compliant · ISO 27001 Aligned

Ark's agent operates 100% inside your secure VPC. Your actual row-level data never leaves your private network perimeter.
