The Future of AI Computing

ThinkBe Hard Circuit AI chips mark a shift in machine learning inference through Atomic Hardware Operations. By eliminating software overhead and executing models as pure circuits, we've achieved performance levels that redefine what's possible in AI acceleration.

Our Reconfigurable Neural Units (RNUs) deliver 330x–655x the performance of leading GPU-based solutions when matched for clock speed, power, and core count, with zero GEMM overhead and true end-to-end circuit-based inference for all modern machine learning models. All that with up to a 95% energy reduction for smaller models.

Technology development: 2021–2025

Universal Architecture

One chip, infinite possibilities. Complete autonomy or GPU replacement.

Versatile Model Support

RNUs support CNNs, Transformers, RNNs, and hybrid models through weight reconfiguration. Our Python-to-circuit compiler converts entire ML pipelines, including preprocessing, into pure circuits.

True Brain on Chip

Complete autonomy with no CPU, OS, or software stack, or operation as a GPU replacement for easier market entry. Same silicon, dual deployment modes for maximum flexibility.

How It Works

From model to inference in three seamless steps

1

Select Your Model

Upload your pre-trained or open-source PyTorch or TensorFlow ML models: LLMs, Vision, Audio, etc. Our platform supports industry-standard formats for seamless integration.

2

Automated Conversion

Our conversion software analyzes your model and prepares it for circuit implementation, either as an accelerator or via end-to-end synthesis. Complex preprocessing is intelligently offloaded unless the chip supports it natively.

3

Deploy & Accelerate

Your model is transferred to the ThinkBe HC AI-Chip for always-on, pure circuit-based inference. Experience unprecedented performance with zero to minimal software overhead.

Our Products

The NEXT GENERATION of dedicated Machine Learning hardware

Balanced Performance
From $7,499

ThinkBe Core-Zero

A custom-rebuilt and redesigned Ubuntu OS with built-in LLM support, powered by an ARM processor with up to 512GB LPDDR5X On-Package Unified Memory. Available in 256GB or 512GB configurations.

LLM support in DOS before bootup
🔗 External AI accelerator for PC or Laptop
🖥️ Standalone PC with low powered ARM core
🧠 End-to-end pure circuit inference
📊 TensorFlow and PyTorch support
🎨 Open source LLM and image generation
CONTACT
Maximum Efficiency
From $1,500

ThinkBe Deck

USB-C-connected ML inference with built-in WiFi for remote server deployment, accessible from anywhere. Available in 64GB or 128GB LPDDR5X On-Package Unified Memory configurations.

🔌 Direct USB-C plug-in connection
📡 Built-in WiFi for remote access
🌐 Remote server capabilities
⚡ Optimized for power efficiency
💼 Brushed aluminum casing
🚀 Portable ML inference
CONTACT
Enterprise Ready
Custom HC Chips

Custom Hardcore

Designed for enterprise, industrial, and consumer hardware businesses requiring always-on, end-to-end ML inference on a custom PCB. Perfect for tight spaces, with built-in or external RAM and memory configurations.

🏭 Industrial hardware integration
🤖 Robotics applications
📱 Consumer hardware products
📐 Custom PCB form factors
⏱️ Always-on inference capability
⚖️ Balanced performance and power
REQUEST DEMO

Benchmarks & Roadmap

FPGA proof-of-concept today, ASIC tomorrow

Our live FPGA demonstration implements a 3-layer Weather-Net CNN end-to-end in ≈30,000 LUT6s (≈1–1.4 M estimated ASIC transistors after synthesis) on a Zynq-7020 device, sustaining 117 FPS @ 100 MHz at just 3.25 W total board power.


Scaling the same RTL (Register Transfer Level, the abstraction at which our hardware design is described) to the 710 MHz Stratix IV GX board delivered 1.47 M FPS (<0.01 ms/frame). Projecting the identical netlist onto a modern 14 nm ASIC indicates only ≈0.25–0.3 W core power with multi-GHz headroom.
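The latency claim is easy to sanity-check from the quoted numbers alone; the following back-of-envelope arithmetic is ours, not a vendor measurement:

```python
# Sanity-check the Stratix IV figures quoted above (our own arithmetic).
fps = 1.47e6       # reported frames per second on the 710 MHz board
clock_hz = 710e6   # Stratix IV GX clock frequency

frame_time_s = 1 / fps              # wall-clock time per frame
cycles_per_frame = clock_hz / fps   # clock cycles available per frame

print(f"{frame_time_s * 1e6:.2f} us/frame")    # ~0.68 us, well under 0.01 ms
print(f"{cycles_per_frame:.0f} cycles/frame")  # ~483 cycles
```

At 1.47 M FPS each frame has roughly 0.68 µs, or about 483 clock cycles, consistent with the "<0.01 ms/frame" figure.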

Throughput in Conventional Terms

A single 3 × 3 convolution over a 1024 × 768 frame needs 9 MACs per pixel – roughly 7.1 M operations per layer. Our 3-layer Weather-Net therefore costs about 21 M MACs per frame.
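The per-frame MAC count above can be reproduced directly from the stated frame size and kernel:

```python
# Reproduce the Weather-Net per-frame MAC count from the figures above.
width, height = 1024, 768
macs_per_pixel = 3 * 3   # one 3x3 convolution kernel per output pixel
layers = 3

macs_per_layer = width * height * macs_per_pixel   # ~7.1 M MACs
macs_per_frame = macs_per_layer * layers           # ~21.2 M MACs

print(f"{macs_per_layer / 1e6:.2f} M MACs per layer")   # 7.08 M
print(f"{macs_per_frame / 1e6:.1f} M MACs per frame")   # 21.2 M
```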


At 117 FPS the FPGA run pushes ≈2.5 G MAC/s (≈5 GFLOP/s if you prefer the floating-point yardstick, counting each MAC as two floating-point operations). The same RTL at 1.47 M FPS on the Stratix reaches ≈31 T MAC/s (≈62 TFLOP/s). Remember: every MAC is hard-wired, so the HC-Chip is ultimately judged by completed operations per second, not peak FLOP/s.
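The throughput figures follow mechanically from the per-frame MAC count; a quick check:

```python
# Convert frame rates to MAC/s and FLOP/s using the ~21.2 M MACs/frame
# derived above (our own arithmetic).
macs_per_frame = 1024 * 768 * 9 * 3   # ~21.2 M MACs for 3-layer Weather-Net

zynq_fps, stratix_fps = 117, 1.47e6
zynq_macs = macs_per_frame * zynq_fps        # ~2.5 G MAC/s
stratix_macs = macs_per_frame * stratix_fps  # ~31 T MAC/s

# 1 MAC = 1 multiply + 1 add = 2 floating-point operations
print(f"{zynq_macs / 1e9:.2f} G MAC/s (~{2 * zynq_macs / 1e9:.0f} GFLOP/s)")
print(f"{stratix_macs / 1e12:.1f} T MAC/s (~{2 * stratix_macs / 1e12:.0f} TFLOP/s)")
```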

| Platform | Logic Util. | Clock | Throughput | Fabric Power (est.)* |
|---|---|---|---|---|
| Zynq-7020 FPGA | 30 k / 53 k LUT (56%) | 100 MHz | 117 FPS | ~2–3 W |
| Stratix IV GX FPGA | 30 k / 230 k LC (13%) | 710 MHz | 1.47 M FPS | ~5–10 W |
| 14 nm ASIC (est.) | ~1–1.4 M Tr | 4.0 GHz | 6.0 M FPS | 0.25–0.3 W |

* Estimated dynamic plus leakage power of the programmable fabric actually exercised; our demos instantiate no ARM cores, hard DSP blocks, or I/Os beyond camera input, so these figures exclude the auxiliary silicon seen in board-level measurements.

Stage-1 Roadmap

  • Full Python → Circuit compiler with 10× code-size reduction.
  • Native support for MLPs, CNNs and Transformer variants (Stage 1 goal).
  • First ASIC tape-out & live demo on a deeper, larger mainstream model.
  • Expanding library for generative-image models (UNet, VAE, Diffusion) and advanced hybrids.
  • Dedicated complex pre-processing chiplets to eliminate residual CPU steps we can't currently handle.

Transistor Density

FPGA LUTs contain configuration SRAM and routing that do not translate 1-to-1 into ASIC area. When mapped to standard-cell logic, our 30 k-LUT netlist synthesizes to roughly 1–1.4 million physical transistors and hosts 3 first-generation RNU cores. Allowing headroom for on-chip SRAM, NoC fabric, and future arithmetic expansion, we budget ≈200 k transistors per next-gen RNU.


On a 30-billion-transistor 2 nm device, dedicating 35% of the silicon to compute clusters would therefore enable ≈50,000–60,000 RNU cores—still two orders of magnitude above the scale required for current frontier models, and leaving the majority of the die for memory macros, I/O and power delivery.
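The core-count budget follows from the transistor budget above; a quick check of the arithmetic:

```python
# Back-of-envelope RNU core budget for a 2 nm-class die, using the
# ~200 k-transistor-per-RNU budget stated above (our own estimate).
die_transistors = 30e9        # 30-billion-transistor device
compute_fraction = 0.35       # 35% of silicon dedicated to compute clusters
transistors_per_rnu = 200e3   # budget per next-gen RNU core

rnu_cores = die_transistors * compute_fraction / transistors_per_rnu
print(f"~{rnu_cores:,.0f} RNU cores")   # ~52,500, inside the 50-60 k range
```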