for 15 years.

We deliver robust optimization and take the guesswork out of Cost of Ownership.

We have turnkey solutions that beat your competition and we have staying power.

Stillwater Supercomputing, Inc. started with a vision to deliver sophisticated high performance computing solutions with simple human values at heart. We make the extraordinary not only achievable but sustainable.

We grow and unlock potential while respecting the planet we inhabit and the community we are proud to be a part of.

During our time here we delivered High Performance Computing and Cloud Infrastructure solutions to organizations large and small, helped under-served communities and collaborated with universities across three continents.

We met friends, like-minded innovators, supporters and allies.

Wonderful people from all walks of life united by the goal to go faster, further, together.

Still water runs deep.

Together, we celebrate 15 years of innovation and collaboration.

Together, we democratize high performance computing.

Together, we Accelerate Innovation™.

Achieve near zero-latency

of real-time, embedded and control systems.

Take the guesswork out of Total Cost of Ownership. Run cloud analytics and security services at lower cost.

Reduce Cost per Operation of transactional systems in traditional banking and on the blockchain.

Best-in-class energy savings so you can innovate and be mindful of our planet at the same time.

Machine Learning, Industry 4.0, Telecommunication, Finance,

Cyber Security and Defense markets.

If your use case isn't listed here, please contact us for a free consultation.

We've been trusted by

turnkey solutions from edge to cloud

with our next generation platform. We did all the hard work so you don't have to - cut down latency and boost productivity with just a few lines of code.

The Open-KL run-time is a modern virtual machine for knowledge processing applications. It encapsulates the data structures and operators that cover the math and computer science of intelligent systems operators, and presents high level abstractions for use by the application. The run-time is aware of the underlying hardware, and will dispatch a low level algorithm to execute the operator that is matched to the characteristics of the machine.
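
As a rough illustration of hardware-aware dispatch, the sketch below keeps a table of operator implementations keyed by device and picks the first one that matches the machine. It is not the OpenKL API: the registry, the `register` decorator, the `dispatch` function, and the device labels are all hypothetical names chosen for this example, and the "gpu" kernel is only a stand-in that reuses the CPU code.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical registry: (operator name, device kind) -> implementation.
_KERNELS: Dict[Tuple[str, str], Callable] = {}


def register(op: str, device: str):
    """Decorator that records an implementation of `op` for `device`."""
    def wrap(fn: Callable) -> Callable:
        _KERNELS[(op, device)] = fn
        return fn
    return wrap


@register("dot", "cpu")
def dot_cpu(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain sequential reference implementation.
    return sum(x * y for x, y in zip(a, b))


@register("dot", "gpu")
def dot_gpu(a: Sequence[float], b: Sequence[float]) -> float:
    # Stand-in for an accelerator kernel; it simply reuses the CPU path.
    return dot_cpu(a, b)


def dispatch(op: str, preferred_devices: Sequence[str], *args):
    """Run `op` on the first preferred device that has a registered kernel.

    Returns a (device, result) pair so the caller can see what was chosen.
    """
    for device in preferred_devices:
        kernel = _KERNELS.get((op, device))
        if kernel is not None:
            return device, kernel(*args)
    raise LookupError(f"no kernel registered for operator {op!r}")


# No "kpu" kernel is registered here, so the call falls through to "gpu".
device, value = dispatch("dot", ["kpu", "gpu", "cpu"], [1, 2, 3], [4, 5, 6])
```

The application only names the operator; which low-level kernel runs is decided by the table and the device preference order.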

A key problem created by hardware acceleration is a new asymmetry between computational resources. In the standard stored program machine model, processing resources are instruction driven and access a shared, flat memory space. Coordination between computational resources is managed by instruction stream barriers and pipes. Since there is symmetry among the processing elements, each thread of execution assumes the same model of computation. On asymmetric, hardware accelerated platforms, however, threads of execution have a very specific context, with very different performance and power characteristics. This creates the problem of coordination and collaboration between different computational resources, typically the central processor and the hardware accelerator.

This coordination and collaboration tries to minimize both power consumption and computational time. The minimization problem is the same for all accelerators, so a common run-time that manages it is advantageous.
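
One simple way to picture this minimization is a weighted cost over predicted run time and predicted energy, evaluated per device. The sketch below assumes such predictions are available; the `Estimate` record, the `pick_device` function, the weights, and all numbers are made up for illustration and do not describe how any particular run-time makes the choice.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Estimate:
    device: str
    seconds: float  # predicted run time, including data movement
    joules: float   # predicted energy for the same work


def pick_device(estimates: Iterable[Estimate],
                time_weight: float = 1.0,
                energy_weight: float = 1.0) -> str:
    """Return the device with the lowest weighted time-plus-energy cost."""
    def cost(e: Estimate) -> float:
        return time_weight * e.seconds + energy_weight * e.joules
    return min(estimates, key=cost).device


# Illustrative numbers only, not measurements.
candidates = [
    Estimate("cpu", seconds=2.0, joules=40.0),
    Estimate("gpu", seconds=0.5, joules=90.0),
    Estimate("kpu", seconds=0.8, joules=10.0),
]

fastest = pick_device(candidates, time_weight=1.0, energy_weight=0.0)
most_frugal = pick_device(candidates, time_weight=0.0, energy_weight=1.0)
```

Because the cost function does not care what kind of accelerator produced the estimate, the same selection logic can serve every device, which is the argument for a common run-time.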

High Performance

Knowledge processing operators, such as machine learning and sensor fusion, are complex algorithms.

OpenKL provides finely tuned parallel implementations that work with CPU, GPU, KPU, and in the elastic cloud.

Elastic Cloud Enabled

When applying knowledge processing techniques on Big Data, you'll want to leverage scalable cloud platforms.

OpenKL provides implementations that set up and tear down clusters, in the cloud if needed.

High-touch Support

Expert assistance when you need it.

Application-tailored precision and dynamic range for Deep Learning, DSP, HPC, and IoT workloads.

Deep Learning applications have highlighted the inefficiencies of the IEEE floating point format. Both Google and Microsoft have jettisoned IEEE floating point for their AI cloud services to gain two orders of magnitude better performance over their competitors. Similarly, AI applications for mobile and embedded applications have moved away from IEEE floating point to optimize performance per Watt.

However, Deep Learning applications are hardly the only applications that expose the limitations of IEEE floating point. Cloud scale, IoT, embedded, control, and HPC applications are also limited by the inefficiencies of the format. As NVIDIA, Google, and Microsoft have demonstrated, a simple change to a new number system can improve scale and cost of these applications by orders of magnitude, and create completely new application and service domains.

When performance and/or power efficiency are differentiating attributes for an application, the complexity of IEEE floats simply can't compete with number systems that are tailored to the needs of the application. Posits are a tapered floating point format, designed to replace IEEE floating point and provide a more robust computational arithmetic for the reals. The Stillwater Universal Number library provides application developers a ready-to-use arithmetic library to incorporate this new number system in their applications. To get started, simply clone the library and follow the README.

The core limitations of IEEE floating point are caused by two key problems of the format:

- inefficient representation of the reals
- inability to reproduce results across different concurrency environments

The complete list of issues that are holding back IEEE floating point formats:

- **Wasted Bit Patterns** - 32-bit IEEE floating point has around sixteen million ways to represent NaN (Not-A-Number), while 64-bit floating point has around nine quadrillion. A NaN is an exception value that represents an undefined or invalid result, such as zero divided by zero, so there is no reason to allocate that many encodings to it.
- **Mathematically Incorrect** - The format specifies two zeroes, a negative and a positive zero, which behave differently.
  - Rounding after each operation breaks the associative and distributive laws of arithmetic.
  - This loss of associative and distributive behavior is problematic for reproducibility. The problem is particularly acute for embedded and control applications that need to behave predictably, for example, control systems in autonomous vehicles.
- **Overflows to ± inf and underflows to 0** - Overflowing to ± inf increases the relative error by an infinite factor, while underflowing to 0 loses sign information.
- **Unused dynamic range** - The dynamic range of double precision floats is a whopping 2^2047, whereas most numerical software is architected to operate around 1.0.
- **Complicated Circuitry** - Denormalized floating point numbers have a hidden bit of 0 instead of 1. This creates a host of special handling requirements that complicate compliant hardware implementations.
- **No Gradual Overflow and Fixed Accuracy** - If accuracy is defined as the number of significand bits, IEEE floating point has fixed accuracy for all numbers except denormalized numbers, because the number of significand digits is fixed. Denormalized numbers have fewer significand digits as the value approaches zero, a result of their zero hidden bit. They fill the underflow gap (i.e. the gap between zero and the least non-zero values). The counterpart of gradual underflow is gradual overflow, which does not exist in IEEE floating point.
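
Several of these behaviors are easy to observe in any language that uses IEEE 754 doubles. The short Python sketch below is only an illustration of the points in the list; the variable names are arbitrary.

```python
import math

# Rounding after every operation breaks associativity:
# the same three numbers summed in a different order give different bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
associative = left == right

# Two zeroes: they compare equal, yet they carry different signs.
zeros_compare_equal = 0.0 == -0.0
sign_of_negative_zero = math.copysign(1.0, -0.0)

# Overflow saturates to infinity instead of staying near the largest value.
overflowed = 1e308 * 10.0

# Underflow: halving the smallest denormalized double rounds to zero.
underflowed = 5e-324 / 2.0

print(associative, zeros_compare_equal, sign_of_negative_zero)
print(overflowed, underflowed)
```

The first comparison is the one that matters for reproducibility: a parallel reduction that changes the order of additions can change the result.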

In contrast, the posit number system is designed to be efficient, symmetric, and mathematically correct in any concurrency environment. Avoiding any special cases, such as denormalized numbers, yields a more efficient execution pipeline and higher performance per Watt.

- **Economical** - No bit patterns are redundant. There is one representation for infinity, denoted ± inf, and one for zero. All other bit patterns are valid, distinct, non-zero real numbers. ± inf serves as a replacement for NaN.
- **Mathematically Elegant** - There is only one representation for zero, and the encoding is symmetric around 1.0. Associative and distributive laws are supported through deferred rounding via the quire, enabling reproducible linear algebra algorithms in any concurrency environment.
- **Tapered Accuracy** - Tapered accuracy means that values with small exponents have more digits of accuracy and values with large exponents have fewer digits of accuracy. This concept was first introduced by Morris (1971) in his paper "Tapered Floating Point: A New Floating-Point Representation".
- **Parameterized precision and dynamic range** - Posits are defined by a size, *nbits*, and the number of exponent bits, *es*. This gives system designers the freedom to pick the precision and dynamic range the application requires. For example, for AI applications we may pick 5 or 6 bit posits without any exponent bits to improve performance. For embedded DSP applications, such as 5G base stations, we may select a 16 bit posit with one exponent bit to improve performance per Watt.
- **Simpler Circuitry** - There are only two special cases, Not a Real and Zero. No denormalized numbers, overflow, or underflow.
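
To make the roles of *nbits* and *es* concrete, here is a minimal Python sketch that decodes a posit bit pattern into an exact rational value. It follows the commonly published posit layout (sign, regime run, up to *es* exponent bits, fraction) and is not code from the Stillwater Universal Number library; the function name `decode_posit` is hypothetical, and the single exception pattern is returned as `None`.

```python
from fractions import Fraction
from typing import Optional


def decode_posit(bits: int, nbits: int, es: int) -> Optional[Fraction]:
    """Decode an nbits-wide posit with es exponent bits.

    Returns an exact Fraction, or None for the one exception pattern
    (a 1 followed by all zeros).
    """
    mask = (1 << nbits) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (nbits - 1):
        return None

    negative = bool(bits >> (nbits - 1))
    if negative:
        bits = (-bits) & mask  # negative posits are stored in two's complement

    body = format(bits, f"0{nbits}b")[1:]  # drop the sign bit

    # Regime: a run of identical bits, ended by the opposite bit or the end.
    first = body[0]
    run = len(body) - len(body.lstrip(first))
    k = run - 1 if first == "1" else -run

    rest = body[run + 1:]  # skip the bit that terminates the regime run

    # Exponent: up to es bits; bits cut off by the regime count as zero.
    exponent = int(rest[:es].ljust(es, "0"), 2) if es > 0 else 0

    # Fraction: whatever is left, with an implicit leading 1.
    frac_bits = rest[es:]
    fraction = Fraction(int(frac_bits, 2), 1 << len(frac_bits)) if frac_bits else Fraction(0)

    scale = k * (1 << es) + exponent  # each regime step is worth 2**es binades
    magnitude = (1 + fraction) * Fraction(2) ** scale
    return -magnitude if negative else magnitude


one = decode_posit(0b01000000, nbits=8, es=0)
one_and_a_half = decode_posit(0b01010000, nbits=8, es=0)
largest = decode_posit(0b01111111, nbits=8, es=0)
```

Note how a long regime run leaves no room for fraction bits: that is the tapered accuracy described above, with the most fraction bits available for values near 1.0.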

This library is a bit-level arithmetic reference implementation of the evolving Universal Number Type III (posit and valid) standard. The library provides a faithful posit arithmetic layer for any C/C++/Python environment.

As a reference library, there is extensive test infrastructure to validate the arithmetic, and there is a host of utilities to become familiar with the internal workings of posits and valids.


Template C++ Library

A header-only C++ template library makes it trivial to integrate into your computational software. Many software packages have gone before you, including Eigen, MTL4, G+SMO, and ODE, so you are in good company.

Accurate

The library models the arithmetic at the bit-level and is the validation vehicle for our posit-enabled tensor processor hardware.

Fully Parameterized

The library provides a complete set of posit configurations, ranging from the very small, **posit<2,0>**, to the very large, **posit<256,5>**.
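
To get a feel for what these configurations mean, the dynamic range of a posit can be estimated from *nbits* and *es* alone. The sketch below uses the commonly published relation that the largest positive posit is useed^(nbits-2), with useed = 2^(2^es), and that the smallest positive posit is its reciprocal. The helper names are hypothetical and the code is independent of the library itself.

```python
import math
from typing import Tuple


def posit_range(nbits: int, es: int) -> Tuple[int, int]:
    """Return (log2 of smallest positive posit, log2 of largest posit)."""
    max_scale = (nbits - 2) * (1 << es)  # useed**(nbits-2) expressed as a power of 2
    return -max_scale, max_scale


def decades(nbits: int, es: int) -> float:
    """Dynamic range expressed in orders of magnitude (powers of ten)."""
    low, high = posit_range(nbits, es)
    return (high - low) * math.log10(2)


for nbits, es in [(2, 0), (8, 0), (16, 1), (32, 2), (256, 5)]:
    low, high = posit_range(nbits, es)
    print(f"posit<{nbits},{es}>: 2^{low} .. 2^{high}, "
          f"about {decades(nbits, es):.0f} decades")
```

At the small end, posit<2,0> can only represent zero, ±1, and the exception value, so its positive range collapses to the single value 1.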

