zero-regex-engine

Customer SRS

Project: vgrep (Void Grep) - Zero-Knowledge Regex Engine

Status: Highly Experimental / Proof of Concept

1. Introduction

1.1  Purpose

The Purpose of this document is to define the functional and non-functional requirements for vgrep (Void Grep) vgrep is a Command-Line Interfaace (CLI) utility designed to perform Regular Expression (Regex) pattern matching over Fully Homomorphically Encrypted (FHE) datasets. It allows a client to search a remote server's logs without exposing the search query to the server, and without the server exposing the raw logs to the client.

1.2 Scope

vgrep replaces standard text-search utilities (like grep or awk) in environments where data privacy is paramount.

  • The Client encrypts a Regex pattern into a multi-dimensional mathematical cipher.
  • The Server evaluates this cipher against an encrypted dataset using a Finite State Machine (FSM) mapped to FHE boolean operations.
  • The Output is an encrypted boolean/string match returned to the client for local decryption.

1.3 Definitions & Acronyms

  • FHE (Fully Homomorphic Encryption): A cryptographic scheme allowing arbitrary computations on ciphertexts, generating an encrypted result that matches the result of operations performed on the plaintext.
  • Zero-Knowledge Oracle: A server that executes logic on data without knowing the input, state, or output.
  • AST (Abstract Syntax Tree): The structural representation of the parsed Regular Expression.
  • Ciphertext Noise: The cryptographic "static" that grows with each FHE operation. If noise exceeds a threshold, decryption fails.

2. Overall Description

2.1  Purpose

vgrep is designed for POSIX-compliant backend systems.

  • Client OS: Linux (Ubuntu/Debian prioritized).
  • Shell Integration: Fully compatible with standard shells (Bash, Zsh) and easily bindable within .zshrc aliases.
  • Session Management: Designed to run persistently in background jobs or detached tmux sessions, as FHE evaluation on large datasets may take significant compute time.
  • Server OS: Linux-based cloud instances equipped with heavy CPU/RAM resources for lattice-math calculations.

System Features (Functional Requirements)

3.1 Local Keypair Generation

  • Description: The system must generate a public evaluation key (for the server) and a private decryption key (safeguarded by the client).
  • Constraint: The private key must never leave the local execution environment.

3.2 Client-Side Regex Parsing & Encryption

  • Description: When the user inputs vgrep "^[0-9]{3}-DARK", the client binary must parse this Regex into an AST.
  • Mechanism: Instead of encrypting the string as a whole, the client maps each character to a u8 ASCII value and encrypts each bit into an FHE cipher state (e.g., using TFHE-rs or Zama's Concrete library).

3.3 Server-Side FHE Evaluation

  • Description: The Python/Rust server must ingest the $Encrypted\_Regex$ and stream it against the $Encrypted\_Logs$.
  • Mechanism: The server executes a Homomorphic Finite State Machine. It translates logic gates (AND, OR, NOT) into polynomial additions and multiplications ($Enc(A) \otimes Enc(B)$).
  • Bootstrapping: The server must periodically apply "bootstrapping" algorithms to reduce Ciphertext Noise during deep recursive regex matching.

3.4 Match Resolution & Decryption

Description: The server must yield an $Encrypted\_Boolean$ (Match Found: True/False) or an $Encrypted\_Line$ without ever knowing the result. The client then decrypts this locally using the private key and prints the plaintext match to stdout

4. Non-Functional Requirements

4.1 Security & Cryptography

  • Post-Quantum Resistance: The underlying encryption must utilize Lattice-based cryptography (e.g., Learning With Errors problem), rendering it resistant to future quantum computing attacks.
  • Data in Memory: The server must physically be unable to dump plaintext memory, as all RAM states during execution remain encrypted.

4.2 Performance Constraints

  • Speed: FHE operations are computationally expensive. Users must be explicitly warned that vgrep will be exponentially slower than standard grep.
  • Optimization: The server must parallelize homomorphic gate evaluations across multiple CPU cores to minimize latency.

4.3 Error Handling

  • Silent Failures Forbidden: If the ciphertext noise ceiling is breached, the system must aggressively throw an exception rather than returning a corrupted false-negative match.
  • Syntax Limits: Complex regex features that require infinite lookarounds or unbounded backtracking must be gracefully rejected by the client parser before encryption.

5. Interface Requirements

5.1 Command-Line Syntax

The CLI must mimic standard Unix conventions to reduce the learning curve.

  • Standard Usage: vgrep [OPTIONS] "PATTERN" [REMOTE_ENDPOINT]
  • Flags:
    • -k, --keypath: Specify the path to the local .pem or .bin private key.
    • -i, --ignore-case: Homomorphically mutate the AST to match upper and lower case bits.
    • -v, --verbose: Print cryptographic metadata (noise levels, gate counts) to stderr.
    • -k, --keypath: Specify the path to the local .pem or .bin private key.
    • -i, --ignore-case: Homomorphically mutate the AST to match upper and lower case bits.
    • -v, --verbose: Print cryptographic metadata (noise levels, gate counts) to stderr.