TorsoHPC Technical Documentation

What is TorsoHPC?

TorsoHPC is a high-performance, secure remote linear algebra framework. It solves the problem of "Compute Gravity"—where large-scale scientific problems (millions of unknowns) require specialized hardware (HPC clusters, NVIDIA GPUs, Intel Xeon processors) that isn't typically available on a user's workstation or browser.

Simplicity & Speed: TorsoHPC is designed to be fast and minimal. By using a custom C++ core and zero-overhead binary streaming, it delivers near-native performance over the network.

AI & Heavy Workloads: Beyond standard linear algebra, TorsoHPC's optimized BLAS kernels (SGEMM/SGEMV) and CUDA acceleration make it ideal for heavy AI inference workloads, where massive matrix multiplications must be offloaded to dedicated GPU nodes without the complexity of a full ML framework.

System Architecture

TorsoHPC uses a decoupled architecture where clients define the problem and the remote server provides the "muscle".

[ WEB BROWSER ] <----(V3 Streaming)----> [ FLASK GATEWAY ]
(Zero-RAM UI)                            (Session Mgr)
                                              |
                                              | (Secure Blob)
[ PYTHON CLI ]  <----(V2 Framed)---->    [ COMPUTE SERVER ]
(Data Science)                           (Remote Cluster)
                                         /      |       \
[ C++ CLIENT ]  <----(V2 Framed)----> [INTEL] [NVIDIA] [LAPACK]
(Automation)                           (MKL)   (CUDA)   (DENSE)

Under One Hood

Scientific computing often requires juggling different libraries. TorsoHPC unifies these under a single Secure Blob Protocol. You no longer need to write custom CUDA or LAPACK code; each of the following is one call away:

MKL PARDISO: Direct sparse solver for extreme precision.
FGMRES: Iterative Krylov subspace methods for massive systems.
CUDA / AMGCL: GPU-accelerated algebraic multigrid solvers.
Sparse Eigen: Eigensolvers for structural and vibration analysis.

Multi-Tier Storage

To balance performance and scalability, TorsoHPC supports three distinct storage tiers. Users can toggle these tiers dynamically via the UI or CLI to optimize for microsecond latency or massive horizontal scale.

LOCAL (Default): Standard Load-on-Demand. Blobs are stored on the server's local NVMe drive and loaded/decrypted only when needed. Ideal for one-off computations.
NFS (Shared Filesystem): Optimized for multi-VM cloud deployments. Blobs are stored on a shared network mount (e.g., Google Filestore, AWS EFS), so any compute node can read any user's data without copying it first. This lets you scale horizontally by adding compute nodes.
VRAM (Resident Memory): The LLM-style "Cheat Mode." Pins the decrypted matrix into the server's RAM (or GPU VRAM). Subsequent operations skip disk I/O and decryption entirely, running in microseconds.

Note: If VRAM is full, TorsoHPC will gracefully degrade to LOCAL/NFS loading without failing the request.
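The degradation rule in the note above can be sketched in a few lines of Python. This is only an illustration of the decision, not TorsoHPC's server code: the Tier enum, the resolve_tier function, and the byte-count check are hypothetical names and assumptions introduced here.

```python
from enum import Enum


class Tier(Enum):
    """The three storage tiers named in this section."""
    LOCAL = "LOCAL"
    NFS = "NFS"
    VRAM = "VRAM"


def resolve_tier(requested: Tier, blob_bytes: int, vram_free_bytes: int,
                 fallback: Tier = Tier.LOCAL) -> Tier:
    """Return the tier a blob would be served from (hypothetical helper).

    Assumption: "VRAM is full" means the blob does not fit in the
    remaining resident memory.
    """
    if requested is Tier.VRAM and blob_bytes > vram_free_bytes:
        # Not enough resident memory: degrade instead of failing the request.
        return fallback
    return requested


GIB = 2**30
print(resolve_tier(Tier.VRAM, 4 * GIB, 8 * GIB).value)   # blob fits in memory
print(resolve_tier(Tier.VRAM, 16 * GIB, 8 * GIB).value)  # blob does not fit
```

Whether the fallback target is LOCAL or NFS would depend on where the blob was stored before it was pinned, so the sketch takes it as a parameter.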

Browser Interface

The TorsoHPC Web UI provides a zero-RAM streaming pipeline for managing massive datasets directly from your browser.

Precision Selection: Use the dropdown next to the + UPLOAD button to select FP64, FP32, or FP16 before uploading. Precision is fixed at upload time and cannot be changed afterwards, which keeps CAS signatures consistent.
Storage Toggling: Use the Storage column in the Workspace table to switch tiers. Set a matrix to VRAM for high-frequency iterative solves.
Hardware Fallback: If you request a GPU solver (e.g., KOKKOS) on a server without CUDA, TorsoHPC will automatically trigger an FP32 Backdoor fallback on the CPU, emitting a warning in the server logs while ensuring the compute finishes.
Workspace Sync: Click SYNC WORKSPACE to discover blobs already present on the server. TorsoHPC's CAS ensures you never upload the same data twice.
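When choosing a precision and deciding whether a matrix can be pinned to VRAM, it helps to estimate its size. The sketch below uses the standard element sizes of the three floating-point formats (8, 4, and 2 bytes) for a dense matrix. The helper name dense_footprint_bytes is hypothetical, and the estimate ignores any header or encryption overhead TorsoHPC may add.

```python
# Standard IEEE 754 element sizes in bytes.
BYTES_PER_ELEMENT = {"FP64": 8, "FP32": 4, "FP16": 2}


def dense_footprint_bytes(rows: int, cols: int, dtype: str) -> int:
    """Raw size of a dense rows x cols matrix at the given precision."""
    return rows * cols * BYTES_PER_ELEMENT[dtype]


for dtype in ("FP64", "FP32", "FP16"):
    gib = dense_footprint_bytes(20_000, 20_000, dtype) / 2**30
    print(f"{dtype}: {gib:.2f} GiB")
```

Halving the precision halves the footprint, which can decide whether a matrix fits in resident memory at all.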

Intelligent Caching (CAS)

TorsoHPC implements Content-Addressable Storage (CAS) to solve the bottleneck of large-scale data transfer. Instead of identifying files by name or random UUIDs, TorsoHPC identifies data by its unique Intrinsic Mathematical Signature.

Key Benefits:

Instant File Reuse: If you or any other client has already uploaded a specific matrix, the server recognizes it by its signature. Subsequent "uploads" skip the data transfer entirely.
Zero-Copy Scalability: Workspaces are automatically synchronized. When you refresh the page or connect from a different device, all server-side blobs are discovered and registered automatically.
Reduced Deployment Overhead: Ideal for CI/CD pipelines and automated testing. Testing 10 different solvers on the same 1GB dataset requires only 1GB of total network transfer, not 10GB.
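The deduplication behind these benefits can be illustrated with a small in-memory model. How TorsoHPC computes its signature is not described here, so the sketch assumes a SHA-256 digest over a precision tag plus the raw bytes. BlobStore, signature, and upload are hypothetical names, not part of the TorsoHPC API.

```python
import hashlib


class BlobStore:
    """In-memory stand-in for a content-addressable blob store."""

    def __init__(self):
        self._blobs = {}
        self.bytes_transferred = 0

    @staticmethod
    def signature(data: bytes, dtype: str) -> str:
        # Assumption: the key is a SHA-256 digest of the precision tag and payload.
        h = hashlib.sha256()
        h.update(dtype.encode("ascii") + b"\0")
        h.update(data)
        return h.hexdigest()

    def upload(self, data: bytes, dtype: str = "FP64") -> str:
        key = self.signature(data, dtype)
        if key not in self._blobs:
            # Only content the store has not seen yet is stored and counted.
            self._blobs[key] = data
            self.bytes_transferred += len(data)
        return key


store = BlobStore()
matrix = bytes(1024)  # 1 KiB placeholder payload
keys = {store.upload(matrix) for _ in range(10)}
print(len(keys), store.bytes_transferred)
```

Ten uploads of the same payload produce one key and one transfer. In a real client/server setting, the client would compute the digest locally and ask the server whether the key exists before sending any data.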

Deployment

Spinning up the Server

The Compute Server can run on any Linux node with Intel MKL or CUDA. Use the unified runner:

# Start the compute server
./torsohpc.sh server

Environment Configuration

What is `paths.env`?

The paths.env file is the Single Source of Truth for your network and build configuration. It tells all clients (Web, Python, C++) where the Compute Server is located.

# Example paths.env
TORSO_SERVER_IP=192.168.1.119
TORSO_SERVER_PORT=9000

Note: When you "source" this file or set these variables in your shell, TorsoHPC clients will automatically prioritize them over the default 127.0.0.1 settings.
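The lookup order described in the note can be sketched as follows. This is not pytorso's actual resolution code: the server_address helper is hypothetical, and the default port of 9000 is an assumption taken from the example above (only the 127.0.0.1 default is stated in this document).

```python
import os


def server_address(env=None):
    """Resolve (host, port), preferring the TORSO_* variables over defaults."""
    env = os.environ if env is None else env
    host = env.get("TORSO_SERVER_IP", "127.0.0.1")
    port = int(env.get("TORSO_SERVER_PORT", "9000"))  # default port is assumed
    return host, port


# With the variables from paths.env set:
print(server_address({"TORSO_SERVER_IP": "192.168.1.119",
                      "TORSO_SERVER_PORT": "9000"}))
# With nothing set, the local defaults apply:
print(server_address({}))
```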

Unified CLI (Built-in Tool)

Use the provided runner to solve problems immediately from the terminal. The CLI supports multiple precisions via the --dtype flag and integrates with TorsoHPC's JWT identity system.

Authentication & Identity

The Python client allows you to log in directly from the terminal, saving its session token locally in the owner-only .torso_auth file. Browser and TorsoCAE requests handle authentication automatically.

# Create a new local account
./torsohpc.sh pyexample signup "user@domain.com" "secure_password" --name "Alice"

# Log into an existing account (saves JWT locally)
./torsohpc.sh pyexample login "user@domain.com" "secure_password"

# Manually provide a token to the C++ or Python client
./torsohpc.sh pyexample compute solve A.mtx b.mtx --token "eyJhb..."
./torsohpc.sh example solve A.mtx b.mtx --token "eyJhb..."
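For readers scripting around the CLI, the sketch below shows one common way to keep a token in an owner-only file on a POSIX system. The on-disk format of .torso_auth is not documented here, so storing the bare token as text is an assumption, and save_token and load_token are hypothetical helpers rather than pytorso functions.

```python
import os
import tempfile


def save_token(path: str, token: str) -> None:
    """Write the token to a file that only the owner can read or write."""
    # Creating the file with mode 0o600 avoids a window where it is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(token)
    os.chmod(path, 0o600)  # tighten permissions if the file already existed


def load_token(path: str) -> str:
    with open(path) as fh:
        return fh.read().strip()


with tempfile.TemporaryDirectory() as tmp:
    auth_path = os.path.join(tmp, ".torso_auth")
    save_token(auth_path, "example-token")
    print(load_token(auth_path), oct(os.stat(auth_path).st_mode & 0o777))
```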

Executing Operations

# Standard sparse solve
./torsohpc.sh example solve A.mtx b.mtx PARDISO

# FP16 GEMV with Kokkos pinned to VRAM
./torsohpc.sh example gemv --dtype=FP16 --storage=VRAM large_A.bin vector_x.bin KOKKOS

# Alternative: describe the problem in a JSON configuration file
./torsohpc.sh example --json problem.json
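A problem file for the --json option could be generated from Python as sketched below. The schema of problem.json is not specified in this section. The sketch assumes it uses the same keys as the problem dictionary in the Python Usage section of this document, so treat the field names as an assumption and check them against your installation. write_problem is a hypothetical helper.

```python
import json


def write_problem(path: str, matrix: str, rhs: str, solver: str = "PARDISO") -> dict:
    """Write a solve request to a JSON file and return the dictionary written."""
    problem = {
        "operation": "solve",
        "solver": solver,
        "matrix_A": {"path": matrix, "weight": "Sparse"},
        "vector_B": {"path": rhs, "weight": "Dense"},
    }
    with open(path, "w") as fh:
        json.dump(problem, fh, indent=2)
    return problem


write_problem("problem.json", "A.mtx", "b.mtx")
```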

Native C++ API

Integrate TorsoHPC directly into your own C++ application.

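#include <cstdint>
#include <vector>

#include "torso_client.hpp"

// Initialize client
torso::RemoteTorsoClient client("192.168.1.119", 9000);

// Pack and solve.
// problem_json holds the JSON problem description as a string.
// NOTE: the buffer element type (raw bytes) is an assumption; check torso_client.hpp.
std::vector<std::vector<uint8_t>> buffers = { ... };
auto [meta, results] = client.solve(problem_json, buffers);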

Python Usage

Leverage pytorso for data-science workflows.

import json

from pytorso import RemoteTorsoClient
client = RemoteTorsoClient("192.168.1.119", 9000)

# Define problem
problem = {
    "operation": "solve",
    "solver": "PARDISO",
    "matrix_A": {"path": "large_A.mtx", "weight": "Sparse"},
    "vector_B": {"path": "vector_b.mtx", "weight": "Dense"}
}

# One-call solution
metadata, results = client.solve(json.dumps(problem), [])
