ThinkBe Hard Circuit AI chips represent a shift in machine learning inference through Atomic Hardware Operations. By eliminating software overhead and implementing pure circuit execution, we've achieved unprecedented performance levels that redefine what's possible in AI acceleration.
Our Reconfigurable Neural Units deliver 330x–655x the performance of leading GPU-based solutions when matched for clock speed, power, and core count, with zero GEMM overhead and true end-to-end circuit-based inference for all modern machine learning models. Smaller models see up to a 95% energy reduction.
Technology development: 2021–2025
One chip, infinite possibilities. Complete autonomy or GPU replacement.
RNUs support CNNs, Transformers, RNNs, and hybrid models through weight reconfiguration. Our Python-to-circuit compiler converts entire ML pipelines, including preprocessing, into pure circuits.
Run with complete autonomy, with no CPU, OS, or software stack, or operate as a GPU replacement for easier market entry. Same silicon, dual deployment modes for maximum flexibility.
From model to inference in three seamless steps
Upload your pre-trained or open-source PyTorch or TensorFlow ML models: LLMs, Vision, Audio, etc. Our platform supports industry-standard formats for seamless integration.
Our conversion software analyzes and prepares your model for circuit implementation, either as an accelerator or through end-to-end synthesis. Complex preprocessing is intelligently offloaded unless the chip supports it natively.
Your model is transferred to the ThinkBe HC AI-Chip for always-on, pure circuit-based inference. Experience unprecedented performance with zero to minimal software overhead.
The NEXT GENERATION of dedicated Machine Learning hardware
A custom-rebuilt Ubuntu OS with built-in LLM support, powered by an ARM processor with up to 512GB of LPDDR5X On-Package Unified Memory. Available in 256GB or 512GB configurations.
High-performance PCI Express ML inference card, available in 48GB, 96GB, or 192GB HBM3E On-Package configurations. RNU core count and clock frequency are still being finalized. More information coming soon.
USB-C-connected ML inference with built-in WiFi for remote server deployment, accessible from anywhere. Available in 64GB or 128GB LPDDR5X On-Package Unified Memory configurations.
Designed for enterprise, industrial, and consumer hardware businesses requiring always-on, end-to-end ML inference on a custom PCB. Perfect for tight spaces, with built-in or external RAM and memory configurations.
FPGA proof-of-concept today, ASIC tomorrow
Our live FPGA demonstration implements a 3-layer Weather-Net CNN end-to-end in ≈30,000 LUT6s (≈1–1.4 M estimated ASIC transistors after synthesis) on a Zynq-7020 device, sustaining 117 FPS @ 100 MHz at just 3.25 W total board power.
Scaling the same RTL (Register Transfer Level, the abstraction at which our circuit design is described in a hardware description language) to the 710 MHz Stratix IV GX board delivered 1.47 M FPS (<0.01 ms/frame). Projecting the identical netlist onto a modern 14 nm ASIC indicates only ≈0.25–0.3 W core power with multi-GHz headroom.
A single 3 × 3 convolution over a 1024 × 768 frame needs 9 MACs per pixel, roughly 7.1 M MACs per layer. Our 3-layer Weather-Net therefore costs about 21 M MACs per frame.
At 117 FPS the FPGA run pushes ≈2.5 G MAC/s (≈5 GFLOP/s, counting each MAC as two floating-point operations). The same RTL at 1.47 M FPS on Stratix reaches ≈31 T MAC/s (≈62 TFLOP/s). Remember: every MAC is hard-wired, so the HC-Chip is ultimately judged by completed operations per second, not peak FLOPs.
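The arithmetic behind these figures can be reproduced in a few lines (frame size, kernel size, layer count, and frame rates as given above; each MAC counted as two floating-point operations):

```python
# MAC count for a 3x3 convolution over a 1024x768 frame, assuming one
# kernel applied at every pixel (stride 1, no channel expansion).
WIDTH, HEIGHT = 1024, 768
KERNEL_MACS = 3 * 3            # 9 multiply-accumulates per output pixel
LAYERS = 3

macs_per_layer = WIDTH * HEIGHT * KERNEL_MACS   # 7,077,888  ≈ 7.1 M
macs_per_frame = macs_per_layer * LAYERS        # 21,233,664 ≈ 21 M

def mac_rate(fps: float) -> float:
    """Completed MACs per second at a given sustained frame rate."""
    return fps * macs_per_frame

zynq = mac_rate(117)         # Zynq-7020 @ 100 MHz
stratix = mac_rate(1.47e6)   # Stratix IV GX @ 710 MHz

print(f"Zynq:    {zynq / 1e9:.1f} G MAC/s  (~{2 * zynq / 1e9:.0f} GFLOP/s)")
print(f"Stratix: {stratix / 1e12:.1f} T MAC/s (~{2 * stratix / 1e12:.0f} TFLOP/s)")
# → Zynq:    2.5 G MAC/s  (~5 GFLOP/s)
# → Stratix: 31.2 T MAC/s (~62 TFLOP/s)
```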
| Platform | Logic Util. | Clock | Throughput | Fabric Power (est.)* |
|---|---|---|---|---|
| Zynq-7020 FPGA | 30 k / 53 k LUT (56%) | 100 MHz | 117 FPS | ~2 – 3 W |
| Stratix IV GX FPGA | 30 k / 230 k LC (13%) | 710 MHz | 1.47 M FPS | ~5 – 10 W |
| 14 nm ASIC (est.) | ~1–1.4 M transistors | 4.0 GHz | 6.0 M FPS | 0.25 – 0.3 W |
* Estimated dynamic + leakage power of the programmable fabric actually exercised; our demos instantiate no ARM cores, hard DSP blocks, or I/Os beyond camera input, so these figures exclude the auxiliary silicon seen in board-level measurements.
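Dividing each platform's throughput by the midpoint of its estimated power range gives a rough efficiency picture. A minimal sketch, assuming the midpoints are representative (the power figures above are fabric estimates, not measurements):

```python
# Rough fabric-power efficiency (MAC/s per watt) from the table above,
# using the midpoint of each estimated power range. Illustrative only.
MACS_PER_FRAME = 3 * 1024 * 768 * 9   # ≈ 21.2 M, from the Weather-Net math

platforms = {
    # name: (sustained FPS, (min est. W, max est. W))
    "Zynq-7020":     (117,    (2.0, 3.0)),
    "Stratix IV GX": (1.47e6, (5.0, 10.0)),
    "14 nm ASIC":    (6.0e6,  (0.25, 0.3)),
}

for name, (fps, (lo, hi)) in platforms.items():
    watts = (lo + hi) / 2
    eff = fps * MACS_PER_FRAME / watts
    print(f"{name:14s} ~{eff:.2e} MAC/s per W")
```

On these assumptions the projected ASIC lands several orders of magnitude above the FPGA prototypes in MAC/s per watt, which is the comparison the fabric-power footnote is meant to enable.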
FPGA LUTs contain configuration SRAM and routing that do not translate 1-to-1 into ASIC area. When mapped to standard-cell logic, our 30 k-LUT netlist synthesizes to roughly 1–1.4 million physical transistors and hosts 3 first-generation RNU cores. Allowing headroom for on-chip SRAM, NoC fabric, and future arithmetic expansion, we budget ≈200 k transistors per next-gen RNU.
On a 30-billion-transistor 2 nm device, dedicating 35% of the silicon to compute clusters would therefore enable ≈50,000–60,000 RNU cores, still two orders of magnitude above the scale required for current frontier models, and leaving the majority of the die for memory macros, I/O, and power delivery.
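The core-count estimate follows directly from the stated transistor budget; a minimal sketch of the arithmetic:

```python
# Back-of-envelope RNU core budget on the hypothetical 2 nm die
# described above. Illustrative arithmetic only.
DIE_TRANSISTORS = 30e9        # 30-billion-transistor device
COMPUTE_FRACTION = 0.35       # 35% of silicon for compute clusters
TRANSISTORS_PER_RNU = 200e3   # budgeted per next-gen RNU core

cores = DIE_TRANSISTORS * COMPUTE_FRACTION / TRANSISTORS_PER_RNU
print(f"~{cores:,.0f} RNU cores")   # → ~52,500 RNU cores
```

The point estimate of ~52,500 sits inside the ≈50,000–60,000 range quoted above; the spread reflects uncertainty in the per-core transistor budget.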