Project: vgrep (Void Grep) - Zero-Knowledge Regex Engine
Status: Highly Experimental / Proof of Concept
1. Introduction
1.1 Purpose
The Purpose of this document is to define the functional and non-functional requirements for vgrep (Void Grep) vgrep is a Command-Line Interfaace (CLI) utility designed to perform Regular Expression (Regex) pattern matching over Fully Homomorphically Encrypted (FHE) datasets. It allows a client to search a remote server's logs without exposing the search query to the server, and without the server exposing the raw logs to the client.
1.2 Scope
vgrep replaces standard text-search utilities (like grep or awk) in environments where data privacy is paramount.
- The Client encrypts a Regex pattern into a multi-dimensional mathematical cipher.
- The Server evaluates this cipher against an encrypted dataset using a Finite State Machine (FSM) mapped to FHE boolean operations.
- The Output is an encrypted boolean/string match returned to the client for local decryption.
1.3 Definitions & Acronyms
- FHE (Fully Homomorphic Encryption): A cryptographic scheme allowing arbitrary computations on ciphertexts, generating an encrypted result that matches the result of operations performed on the plaintext.
- Zero-Knowledge Oracle: A server that executes logic on data without knowing the input, state, or output.
- AST (Abstract Syntax Tree): The structural representation of the parsed Regular Expression.
- Ciphertext Noise: The cryptographic "static" that grows with each FHE operation. If noise exceeds a threshold, decryption fails.
2. Overall Description
2.1 Purpose
vgrep is designed for POSIX-compliant backend systems.
- Client OS: Linux (Ubuntu/Debian prioritized).
- Shell Integration: Fully compatible with standard shells (Bash, Zsh) and easily bindable within .zshrc aliases.
- Session Management: Designed to run persistently in background jobs or detached tmux sessions, as FHE evaluation on large datasets may take significant compute time.
- Server OS: Linux-based cloud instances equipped with heavy CPU/RAM resources for lattice-math calculations.
System Features (Functional Requirements)
3.1 Local Keypair Generation
- Description: The system must generate a public evaluation key (for the server) and a private decryption key (safeguarded by the client).
- Constraint: The private key must never leave the local execution environment.
3.2 Client-Side Regex Parsing & Encryption
- Description: When the user inputs vgrep "^[0-9]{3}-DARK", the client binary must parse this Regex into an AST.
- Mechanism: Instead of encrypting the string as a whole, the client maps each character to a u8 ASCII value and encrypts each bit into an FHE cipher state (e.g., using TFHE-rs or Zama's Concrete library).
3.3 Server-Side FHE Evaluation
- Description: The Python/Rust server must ingest the $Encrypted\_Regex$ and stream it against the $Encrypted\_Logs$.
- Mechanism: The server executes a Homomorphic Finite State Machine. It translates logic gates (AND, OR, NOT) into polynomial additions and multiplications ($Enc(A) \otimes Enc(B)$).
- Bootstrapping: The server must periodically apply "bootstrapping" algorithms to reduce Ciphertext Noise during deep recursive regex matching.
3.4 Match Resolution & Decryption
Description: The server must yield an $Encrypted\_Boolean$ (Match Found: True/False) or an $Encrypted\_Line$ without ever knowing the result. The client then decrypts this locally using the private key and prints the plaintext match to stdout
4. Non-Functional Requirements
4.1 Security & Cryptography
- Post-Quantum Resistance: The underlying encryption must utilize Lattice-based cryptography (e.g., Learning With Errors problem), rendering it resistant to future quantum computing attacks.
- Data in Memory: The server must physically be unable to dump plaintext memory, as all RAM states during execution remain encrypted.
4.2 Performance Constraints
- Speed: FHE operations are computationally expensive. Users must be explicitly warned that vgrep will be exponentially slower than standard grep.
- Optimization: The server must parallelize homomorphic gate evaluations across multiple CPU cores to minimize latency.
4.3 Error Handling
- Silent Failures Forbidden: If the ciphertext noise ceiling is breached, the system must aggressively throw an exception rather than returning a corrupted false-negative match.
- Syntax Limits: Complex regex features that require infinite lookarounds or unbounded backtracking must be gracefully rejected by the client parser before encryption.
5. Interface Requirements
5.1 Command-Line Syntax
The CLI must mimic standard Unix conventions to reduce the learning curve.
- Standard Usage: vgrep [OPTIONS] "PATTERN" [REMOTE_ENDPOINT]
- Flags:
- -k, --keypath: Specify the path to the local .pem or .bin private key.
- -i, --ignore-case: Homomorphically mutate the AST to match upper and lower case bits.
- -v, --verbose: Print cryptographic metadata (noise levels, gate counts) to stderr.
- -k, --keypath: Specify the path to the local .pem or .bin private key.
- -i, --ignore-case: Homomorphically mutate the AST to match upper and lower case bits.
- -v, --verbose: Print cryptographic metadata (noise levels, gate counts) to stderr.