tsdb Whitepaper

tsdb is a command-line database runner for DOTSV (.dov) files. It accepts a target database file and a plain-text action file, then executes the requested operations with strict conflict detection and atomic writes.

Version: 0.1 Draft
Binary: tsdb
Usage: tsdb <target.dov> <action.txt>


1. Overview


Design principles:

  • Same parser everywhere — the action file format is byte-identical to the DOTSV pending section. No new grammar, no new tokenizer.
  • Stream processing — action files are read line-by-line, never fully loaded into memory.
  • Fail-strict by default — conflicting operations (duplicate insert, missing delete target) produce errors, not silent data loss.

2. Invocation

tsdb target.dov action.txt
Argument     Description
target.dov   The DOTSV database file to operate on
action.txt   Plain-text file containing operations to apply

tsdb reads target.dov via mmap, streams action.txt line-by-line, applies each operation, and writes the result back to target.dov.


3. Action File Format

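An action file is a UTF-8 text file. Each line is one operation: a one-byte opcode, immediately followed by the record's UUID, then zero or more key=value pairs, each preceded by a single tab. The format is identical to the DOTSV pending section.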

Example

# Add two records
+NGk26cHcv001   name=Alice  city=Tokyo  age=30
+NGk26cHdn002   name=Bob    city=Osaka

# Update Alice's city and age
~NGk26cHcv001   city=Kyoto  age=31

# Remove Bob
-NGk26cHdn002

# Upsert Carol (insert if missing, replace if exists)
!EGk26cICK001   name=Carol  city=London

Lines starting with # are comments. Blank lines are ignored.
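Because fields are tab-separated, action lines are also easy to generate from code. The following is a minimal sketch; `format_action_line` is a hypothetical helper, and it assumes keys and values have already been escaped as described in Section 9.

```rust
/// Builds one action-file line: opcode, UUID, then each key=value pair
/// preceded by a tab. Keys and values must already be escaped (Section 9).
fn format_action_line(opcode: char, uuid: &str, pairs: &[(&str, &str)]) -> String {
    let mut line = String::new();
    line.push(opcode);
    line.push_str(uuid);
    for (key, value) in pairs {
        line.push('\t');
        line.push_str(key);
        line.push('=');
        line.push_str(value);
    }
    line.push('\n');
    line
}
```

For example, `format_action_line('+', "NGk26cHcv001", &[("name", "Alice")])` produces the first line of the example above, with real tab characters between fields.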


4. Opcodes

Four single-byte prefixes define all operations:

Prefix  Name    Behavior                                    On Conflict
+       Append  Insert a new record                         Error if UUID exists
-       Delete  Remove a record by UUID                     Error if UUID missing
~       Patch   Update specific KV pairs in a record        Error if UUID missing
!       Upsert  Insert if absent, full replace if present   Never errors

4.1 Append (+)

Inserts a new record. The full set of KV pairs must be provided. If the UUID already exists, tsdb reports an error and aborts.

4.2 Delete (-)

Removes the record with the given UUID. No payload beyond the UUID. If the UUID does not exist, tsdb reports an error.

4.3 Patch (~)

Modifies specific key-value pairs in an existing record. Only the changed pairs are listed. Existing pairs not mentioned are preserved unchanged.

Rules:

  • To update a value: include the key with the new value.
  • To add a new key: include the key with its value.
  • To delete a key: include the key with the tombstone value \x00 (the null byte).

If the UUID does not exist, tsdb reports an error.

4.4 Upsert (!)

If the UUID exists, the entire record is replaced. If the UUID does not exist, the record is inserted. This operation never fails due to presence/absence conflicts.
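The four opcode rules can be summarized as one function over an in-memory map. This sketches the semantics only, not tsdb's file-level strategy (Section 6.2). It assumes records are `BTreeMap<String, String>` and that values are already unescaped, so the `\x00` tombstone arrives as a single null byte. `Op`, `Record`, and `apply` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// One record's key/value pairs (hypothetical in-memory representation).
type Record = BTreeMap<String, String>;

/// The four opcodes, already parsed and unescaped.
enum Op {
    Append(String, Record),
    Delete(String),
    Patch(String, Vec<(String, String)>),
    Upsert(String, Record),
}

/// Tombstone value for Patch after unescaping `\x00`.
const TOMBSTONE: &str = "\0";

/// Applies one operation using the strict-mode conflict rules of Section 4.
fn apply(db: &mut BTreeMap<String, Record>, op: Op) -> Result<(), String> {
    match op {
        Op::Append(uuid, record) => {
            if db.contains_key(&uuid) {
                return Err(format!("append: UUID {uuid} already exists"));
            }
            db.insert(uuid, record);
        }
        Op::Delete(uuid) => {
            if db.remove(&uuid).is_none() {
                return Err(format!("delete: UUID {uuid} not found"));
            }
        }
        Op::Patch(uuid, pairs) => {
            let Some(record) = db.get_mut(&uuid) else {
                return Err(format!("patch: UUID {uuid} not found"));
            };
            for (key, value) in pairs {
                if value == TOMBSTONE {
                    record.remove(&key); // tombstone: drop the key
                } else {
                    record.insert(key, value); // update existing or add new key
                }
            }
        }
        // Full replace if present, insert if absent: never a conflict.
        Op::Upsert(uuid, record) => {
            db.insert(uuid, record);
        }
    }
    Ok(())
}
```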


5. Parsing

The action file parser is a single function — the same one used for the DOTSV pending section:

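/// One parsed line of an action file (or of the DOTSV pending section).
/// All fields borrow from the input line; nothing is copied.
enum Action<'a> {
    Append(&'a str, Vec<(&'a str, &'a str)>),
    Delete(&'a str),
    Patch(&'a str, Vec<(&'a str, &'a str)>),
    Upsert(&'a str, Vec<(&'a str, &'a str)>),
    Comment,
    Blank,
    /// Unknown opcode, or a field without '='. Strict mode aborts on this.
    Malformed,
}

fn parse_action(line: &str) -> Action<'_> {
    if line.is_empty() { return Action::Blank; }
    // Empty if the first char is multi-byte; that byte then falls through to Malformed.
    let rest = line.get(1..).unwrap_or("");
    match line.as_bytes()[0] {
        b'#' => Action::Comment,
        b'-' => Action::Delete(rest.trim_end()),
        op @ (b'+' | b'~' | b'!') => {
            // split() always yields at least one item, so unwrap() cannot panic.
            let mut fields = rest.split('\t');
            let uuid = fields.next().unwrap();
            let Some(kv) = parse_kv(fields) else { return Action::Malformed };
            match op {
                b'+' => Action::Append(uuid, kv),
                b'~' => Action::Patch(uuid, kv),
                _ => Action::Upsert(uuid, kv),
            }
        }
        // Previously mapped to Blank, which silently skipped bad lines.
        _ => Action::Malformed,
    }
}

/// Splits `key=value` fields at the first '='; None if any field lacks one.
fn parse_kv<'a>(fields: impl Iterator<Item = &'a str>) -> Option<Vec<(&'a str, &'a str)>> {
    fields.map(|pair| pair.split_once('=')).collect()
}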

One-byte dispatch, then the same split('\t') path as record parsing. No tokenizer, no lookahead, no state machine.


6. Execution Model

6.1 Processing Pipeline

action.txt ──► line-by-line streaming
               │
               ▼
               parse opcode (1 byte check)
               + split KV (memchr-accelerated)
               │
               ▼
target.dov ◄── apply op to .dov file (mmap)

6.2 Operation Strategies

Operation  Strategy
Append     Binary search for insert position, then write to pending section
Delete     Binary search, then mark in pending section
Patch      Binary search, then in-place overwrite if it fits, else pending patch
Upsert     Binary search, then overwrite or append depending on existence
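Every strategy above starts with a binary search for the UUID. Records are variable-length lines, so one way to search the mmap'd bytes without building an index is to bisect byte offsets and snap each probe to the start of the line it lands in. This is a minimal sketch under three assumptions: the sorted section is a byte slice of newline-terminated records, each record begins with its UUID followed by a tab, and records are ordered by byte-wise UUID comparison. `line_bounds` and `find_record` are hypothetical names, and the pending section would still need to be consulted separately.

```rust
use std::cmp::Ordering;

/// Byte range [start, end) of the line containing offset `pos` (end excludes '\n').
fn line_bounds(data: &[u8], pos: usize) -> (usize, usize) {
    let start = data[..pos].iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    let end = data[pos..].iter().position(|&b| b == b'\n').map_or(data.len(), |i| pos + i);
    (start, end)
}

/// Binary search for `uuid` in a sorted section of newline-terminated records.
/// Ok(line) if found; Err(offset) is where a record with this UUID would be inserted.
fn find_record<'a>(sorted: &'a [u8], uuid: &[u8]) -> Result<&'a [u8], usize> {
    // Invariant: lo and hi are always line starts (or sorted.len()).
    let (mut lo, mut hi) = (0, sorted.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        let (start, end) = line_bounds(sorted, mid);
        let line = &sorted[start..end];
        // The key is the record's first tab-separated field.
        let key = line.iter().position(|&b| b == b'\t').map_or(line, |i| &line[..i]);
        match key.cmp(uuid) {
            Ordering::Equal => return Ok(line),
            Ordering::Less => lo = (end + 1).min(sorted.len()), // skip past this line
            Ordering::Greater => hi = start,
        }
    }
    Err(lo)
}
```

Each probe scans at most one line in each direction, so a lookup costs O(log n) probes, each proportional to one record's length.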

6.3 Compaction

After processing all actions, tsdb checks whether the pending section exceeds a configurable threshold (default: 100 lines). If so, it performs a compaction pass:

  1. Read sorted section sequentially.
  2. Merge pending operations in UUID order.
  3. Write the new sorted section.
  4. Clear the pending section.

This is a single O(n) sequential pass over the file.
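Step 2 is a classic two-pointer merge of two UUID-sorted sequences. This is a minimal sketch that works on in-memory slices rather than streaming over the mmap'd file. It also assumes the pending lines have already been reduced to at most one net change per UUID and sorted by UUID. `Fields`, `Change`, and `compact` are hypothetical names.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// A record's key/value pairs (hypothetical in-memory representation).
type Fields = BTreeMap<String, String>;

/// Net effect of the pending section on one UUID.
enum Change {
    Delete,
    Replace(Fields),                      // from + or !
    Patch(Vec<(String, Option<String>)>), // None = tombstone (remove key)
}

/// Merges the sorted section with UUID-sorted pending changes.
/// Each input is read once, front to back: O(n + m).
fn compact(sorted: &[(String, Fields)], pending: &[(String, Change)]) -> Vec<(String, Fields)> {
    let mut out = Vec::with_capacity(sorted.len() + pending.len());
    let (mut i, mut j) = (0, 0);
    while i < sorted.len() || j < pending.len() {
        let order = match (sorted.get(i), pending.get(j)) {
            (Some((a, _)), Some((b, _))) => a.cmp(b),
            (Some(_), None) => Ordering::Less,
            _ => Ordering::Greater,
        };
        match order {
            // Untouched record: copy it through unchanged.
            Ordering::Less => {
                out.push(sorted[i].clone());
                i += 1;
            }
            // UUID exists only in pending: only a full record creates it.
            Ordering::Greater => {
                if let (uuid, Change::Replace(fields)) = &pending[j] {
                    out.push((uuid.clone(), fields.clone()));
                }
                j += 1;
            }
            // Same UUID in both: apply the change to the existing record.
            Ordering::Equal => {
                let (uuid, base) = &sorted[i];
                match &pending[j].1 {
                    Change::Delete => {}
                    Change::Replace(fields) => out.push((uuid.clone(), fields.clone())),
                    Change::Patch(pairs) => {
                        let mut merged = base.clone();
                        for (key, value) in pairs {
                            match value {
                                Some(v) => { merged.insert(key.clone(), v.clone()); }
                                None => { merged.remove(key); }
                            }
                        }
                        out.push((uuid.clone(), merged));
                    }
                }
                i += 1;
                j += 1;
            }
        }
    }
    out
}
```

A streaming version keeps the same loop and pulls the next sorted record and the next pending change on demand. That is what keeps the pass sequential.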


7. Error Handling

tsdb operates in strict mode by default:

Condition                               Behavior
+ with existing UUID                    Error, abort
- with missing UUID                     Error, abort
~ with missing UUID                     Error, abort
! with any UUID                         Always succeeds
Malformed line in action file           Error, abort
Invalid UUID (not 12-char base62-Gu)    Error, abort
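The last row can be checked cheaply before any data is touched. The sketch below checks only the shape stated in this document: 12 characters from the base62 alphabet (0-9, A-Z, a-z). Any further structure implied by "base62-Gu" is not encoded here, and `is_valid_uuid` is a hypothetical name.

```rust
/// Shape check only: exactly 12 ASCII alphanumeric (base62) characters.
fn is_valid_uuid(s: &str) -> bool {
    s.len() == 12 && s.bytes().all(|b| b.is_ascii_alphanumeric())
}
```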

On error, tsdb reports the line number in the action file and the offending content. The target .dov file is not modified until all actions are validated (atomic write via temp file + rename).
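The temp-file-plus-rename step needs only the standard library. This sketch assumes the target.dov.tmp naming used in Section 8.5 and POSIX rename semantics, under which renaming over an existing file replaces it atomically. `atomic_write` is a hypothetical name.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replaces `target` so readers see either the old file or the new one,
/// never a partial write.
fn atomic_write(target: &Path, contents: &[u8]) -> io::Result<()> {
    // Sibling temp file (target.dov.tmp), so rename() stays on one filesystem.
    let mut tmp_name = target.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = Path::new(&tmp_name);

    let mut file = File::create(tmp)?;
    file.write_all(contents)?;
    file.sync_all()?; // data is on disk before the rename makes it visible
    drop(file); // close before renaming

    fs::rename(tmp, target) // atomic replacement on POSIX filesystems
}
```

For durability across power loss, the parent directory would also need an fsync after the rename. The sketch omits this step.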


8. Concurrency and Queue Management

Multiple tsdb instances can target the same .dov file simultaneously. Coordination uses a lock file that acts as both a kernel-level lock and a human-readable queue manifest.

8.1 Lock File

The lock file uses flock() for atomic metadata access. The lock is held only for microseconds — just long enough to read or update the manifest.

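Why not lock the .dov directly: the atomic write creates a temp file and renames it over target.dov. After rename(), the path points to a new inode. A flock() held on the old file does not carry over, so a process that opens target.dov after the rename sees an unlocked file. The .lock file is never renamed, so a lock on it stays meaningful.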

8.2 Queue Manifest Format

Each line in the lock file represents one queued tsdb instance:

<status>    <process_id>    <uuid1>,<uuid2>,... <timestamp>\n
Field        Spec
Status       EXEC (currently running) or WAIT (queued)
Process ID   16 lowercase hex chars, randomly generated at startup
UUID list    Comma-separated target UUIDs extracted from action file
Timestamp    Unix epoch seconds, refreshed periodically by EXEC

Example with three instances:

EXEC    a1b2c3d4e5f6a7b8    NGk26cHcv001,NGk26cHdn002,EGk26cICK001  1711700000
WAIT    d9e0f1a2b3c4d5e6    NGk26dAa0001,EGk26dBb0001   1711700005
WAIT    f7a8b9c0d1e2f3a4    NGk26eC10001,NGk26eC20001   1711700008
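A manifest line can be parsed into the `QueueEntry` type that Sections 8.6 and 8.7 use. The field names below follow those sections, but the struct definition itself is an assumption. The field separator is also an assumption: the example shows runs of spaces, so the parser splits on any whitespace, while the formatter emits tabs. `parse_manifest_line` and `format_manifest_line` are hypothetical names.

```rust
use std::collections::HashSet;

/// One line of the queue manifest.
#[derive(Debug, PartialEq)]
struct QueueEntry {
    status: String,     // "EXEC" or "WAIT"
    process_id: String, // 16 lowercase hex chars
    uuids: HashSet<String>,
    timestamp: u64, // Unix epoch seconds
}

/// Parses one manifest line; None if any field is missing or malformed.
fn parse_manifest_line(line: &str) -> Option<QueueEntry> {
    let mut fields = line.split_whitespace(); // accepts tabs or spaces
    let status = fields.next()?;
    let process_id = fields.next()?;
    let uuid_list = fields.next()?;
    let timestamp = fields.next()?.parse::<u64>().ok()?;
    let valid_status = status == "EXEC" || status == "WAIT";
    let valid_id = process_id.len() == 16
        && process_id.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if !valid_status || !valid_id || fields.next().is_some() {
        return None;
    }
    Some(QueueEntry {
        status: status.to_string(),
        process_id: process_id.to_string(),
        uuids: uuid_list.split(',').map(str::to_string).collect(),
        timestamp,
    })
}

/// Formats an entry back into one tab-separated manifest line.
fn format_manifest_line(e: &QueueEntry) -> String {
    let mut uuids: Vec<&str> = e.uuids.iter().map(String::as_str).collect();
    uuids.sort_unstable(); // HashSet order is arbitrary; sort for stable output
    format!("{}\t{}\t{}\t{}\n", e.status, e.process_id, uuids.join(","), e.timestamp)
}
```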

8.3 Conflict Detection

Before joining the queue, tsdb pre-scans the action file to collect all target UUIDs into a set. It then checks for set intersection against every existing entry in the lock file.

The rule: your UUID set intersected with any queued UUID set must be empty, or a conflict is reported.

Opcodes are irrelevant. Any queued operation ahead of you may alter the record's state before your turn arrives.

Instance A   Instance B   Same UUID?   Result
+ insert     + insert     yes          B rejected
+ insert     ~ patch      yes          B rejected
~ patch      ~ patch      yes          B rejected
+ insert     + insert     no           Both queue fine

On conflict, tsdb exits immediately without joining the queue.
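The pre-scan streams the action file and needs only the text between the opcode and the first tab on each operation line. This is a minimal sketch. `prescan_uuids` is a hypothetical name, and it deliberately skips malformed lines, assuming the full parse pass will reject them later.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

/// Streams an action file and collects the UUID of every operation line.
/// Only the UUID set is kept in memory, never the whole file.
fn prescan_uuids<R: BufRead>(reader: R) -> io::Result<HashSet<String>> {
    let mut uuids = HashSet::new();
    for line in reader.lines() {
        let line = line?;
        // Only operation lines carry a UUID; '#' comments and blank lines do not.
        let Some(rest) = line.strip_prefix(&['+', '-', '~', '!'][..]) else {
            continue;
        };
        // The UUID runs from just after the opcode to the first tab (or end of line).
        let uuid = rest.split('\t').next().unwrap_or("").trim_end();
        uuids.insert(uuid.to_string());
    }
    Ok(uuids)
}
```

With the set in hand, the overlap test against one manifest entry is `my_uuids.is_disjoint(&entry.uuids)` from the standard `HashSet` API.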

8.4 Two Layers of Validation

Phase 1 — Queue level (before execution):
  Pre-scan action.txt → collect UUID set
  flock(.lock) → read manifest → check UUID set intersection
  ├── overlap found → error, exit, do not queue
  └── no overlap   → append WAIT line, release lock

Phase 2 — Data level (during execution):
  mmap target.dov → apply each opcode
  + on existing UUID  → error
  - on missing UUID   → error
  ~ on missing UUID   → error
  ! on any UUID       → always ok

8.5 Execution Flow

 1. Generate random 16-hex process ID
 2. Pre-scan action.txt → collect all target UUIDs
 3. flock(LOCK_EX) on .lock              (microseconds)
 4. Read .lock manifest
 5. Conflict check (UUID set intersection)
    ├── overlap → release lock, report error, exit
    └── clean  → append WAIT line, release lock
 6. Poll loop: promote self from WAIT to EXEC when clear
 7. Execute: mmap .dov, apply all actions
 8. Write target.dov.tmp → rename to target.dov
 9. flock(LOCK_EX) briefly → remove own entry from .lock
10. Release lock
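Step 6 does not define when the queue is "clear". One plausible rule is first-come-first-served with one writer at a time: promote when no live EXEC entry exists and this instance owns the earliest live WAIT line. This rule is consistent with the single EXEC line in the example manifest and with the whole-file rename in step 8. The sketch encodes only that assumed rule. `Entry` and `may_promote` are hypothetical. The check and the WAIT-to-EXEC rewrite would both run under the flock() from step 3. The sketch also presumes waiting instances keep their own timestamps fresh, so the 30-second rule in Section 8.6 does not evict them.

```rust
/// The parts of a manifest line this check needs (hypothetical view type).
struct Entry<'a> {
    status: &'a str,
    process_id: &'a str,
    stale: bool, // e.g. computed with is_stale() from Section 8.6
}

/// Assumed FIFO rule: at most one EXEC at a time, and the earliest live
/// WAIT line (lines are appended in arrival order) goes next.
fn may_promote(entries: &[Entry<'_>], my_id: &str) -> bool {
    // Stale lines from crashed processes are ignored entirely.
    if entries.iter().filter(|e| !e.stale).any(|e| e.status == "EXEC") {
        return false; // another instance is still rewriting target.dov
    }
    entries
        .iter()
        .filter(|e| !e.stale)
        .find(|e| e.status == "WAIT")
        .map_or(false, |e| e.process_id == my_id)
}
```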

8.6 Crash Recovery

flock() is released automatically by the kernel when a process exits. The dead process's line persists in the manifest but is evicted when any other instance finds a timestamp more than 30 seconds old.

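use std::time::{SystemTime, UNIX_EPOCH};

fn is_stale(entry: &QueueEntry, threshold_secs: u64) -> bool {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH).unwrap().as_secs();
    // saturating_sub: an entry stamped slightly in the future (clock skew
    // between writers) counts as fresh instead of underflowing.
    now.saturating_sub(entry.timestamp) > threshold_secs
}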

8.7 Rust Implementation Sketch

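use fs2::FileExt; // lock_exclusive() / unlock() on a std::fs::File
use std::collections::HashSet;
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Reported when another queued instance already targets one of our UUIDs.
struct ConflictError {
    with_process: String,
    with_status: String,
    overlapping_uuids: Vec<String>,
}

/// enqueue() can fail on I/O as well as on a conflict.
enum EnqueueError {
    Io(io::Error),
    Conflict(ConflictError),
}

impl From<io::Error> for EnqueueError {
    fn from(e: io::Error) -> Self {
        EnqueueError::Io(e)
    }
}

fn enqueue(
    lock_path: &Path,
    my_id: &str,
    my_uuids: &HashSet<String>,
) -> Result<(), EnqueueError> {
    let lock_file = OpenOptions::new()
        .create(true).read(true).write(true)
        .open(lock_path)?;

    // Held only while the manifest is read and, if clean, appended to.
    lock_file.lock_exclusive()?;

    let entries = read_manifest(lock_path)?;
    for entry in &entries {
        // Lines left by crashed processes must not block new work (see 8.6).
        if is_stale(entry, 30) {
            continue;
        }
        let overlap: Vec<String> = entry.uuids.intersection(my_uuids).cloned().collect();
        if !overlap.is_empty() {
            lock_file.unlock()?;
            return Err(EnqueueError::Conflict(ConflictError {
                with_process: entry.process_id.clone(),
                with_status: entry.status.clone(),
                overlapping_uuids: overlap,
            }));
        }
    }

    append_to_manifest(lock_path, "WAIT", my_id, my_uuids)?;
    lock_file.unlock()?;
    Ok(())
}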

9. Escaping

The action file uses the same escaping rules as DOTSV:

Byte   Escaped Form   Reason
\n     \x0A           Record/line delimiter
\t     \x09           Field delimiter
=      \x3D           Key-value separator
\      \\             Escape character itself
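The table translates directly into an encoder and a decoder. This is a minimal sketch. `escape` emits exactly the four forms in the table. `unescape` additionally accepts any `\xHH` sequence; this is an assumption, and it lets the `\x00` tombstone from Section 4.3 decode to a null byte. Both function names are hypothetical.

```rust
/// Escapes a key or value using exactly the four forms in the table above.
fn escape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\x0A"),
            '\t' => out.push_str("\\x09"),
            '=' => out.push_str("\\x3D"),
            _ => out.push(c),
        }
    }
    out
}

/// Reverses escape(). Accepts `\\` and any `\xHH` (two hex digits);
/// returns None for other backslash sequences or if the result is not UTF-8.
fn unescape(escaped: &str) -> Option<String> {
    let bytes = escaped.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            out.push(bytes[i]); // ordinary byte, including parts of multi-byte UTF-8
            i += 1;
            continue;
        }
        match bytes.get(i + 1) {
            Some(b'\\') => {
                out.push(b'\\');
                i += 2;
            }
            Some(b'x') => {
                // Exactly two hex digits must follow.
                let hex = escaped
                    .get(i + 2..i + 4)
                    .filter(|h| h.bytes().all(|b| b.is_ascii_hexdigit()))?;
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 4;
            }
            _ => return None, // unknown escape or trailing backslash
        }
    }
    String::from_utf8(out).ok()
}
```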

10. Workflow Examples

10.1 Bulk Import

# Generate action file from CSV
awk -F',' '{printf "+%s\tname=%s\tcity=%s\n", $1, $2, $3}' data.csv > import.txt
tsdb mydata.dov import.txt

10.2 Targeted Update

printf '~NGk26cHcv001\tstatus=active\n' > action.txt
tsdb mydata.dov action.txt

10.3 Batch Delete

cat > cleanup.txt << 'EOF'
-NGk26cHcv001
-NGk26cHdn002
-EGk26cICK001
EOF
tsdb mydata.dov cleanup.txt

10.4 Git-Friendly Workflow

tsdb users.dov changes.txt
tsdb users.dov --compact
git add users.dov
git commit -m "update user records"

10.5 Concurrent Access

# Terminal 1 — modifies records A, B
tsdb data.dov batch1.txt &

# Terminal 2 — modifies records C, D (no UUID overlap — queues cleanly)
tsdb data.dov batch2.txt &

# Terminal 3 — modifies record A (overlaps with T1 — rejected immediately)
tsdb data.dov batch3.txt
# error: conflict with process a1b2c3d4e5f6a7b8
#   overlapping UUIDs: NGk26cHcv001
#   status: EXEC (running)
#   action: aborted, not queued

11. Design Rationale

Goal               Mechanism
Fast parsing       1-byte opcode dispatch + memchr-accelerated tab split
Zero new syntax    Action format = DOTSV pending section; one parser for everything
Stream processing  Line-by-line read; the action file is never loaded whole (only its UUID set is kept in memory)
Safe by default    Strict mode catches conflicts; atomic write prevents corruption
Concurrent-safe    UUID-level conflict detection; flock-based queue; no long-held global lock
Human-authorable   Plain text, writable by hand, by echo, by awk, by any tool
Composable         Action files can be concatenated, diffed, version-controlled

12. Dependencies

Crate     Purpose
memmap2   Memory-mapped file I/O
memchr    SIMD-accelerated byte search
fs2       Cross-platform file-locking wrapper (flock() on Unix)

Minimal dependency surface. No serde, no async runtime, no allocation-heavy parsing frameworks.