nCompass Tech

Reliable, scalable & fast inference of any HuggingFace model

170 followers

Visit website
With the nCompass AI inference platform, you get deployments with reliable uptime, custom GPU kernels for fast inference, and model performance and health monitoring, built for production deployments of any AI model available on HuggingFace.
Company Info
ncompass.tech
Y Combinator
Launched in 2025 · View 1 launch
Forum: p/ncompass-tech
Launch tags: API • Artificial Intelligence • Tech

Launch Team: Garry Tan, Diederik Vink, Vinay Maniam


Aditya Rajagopal
nCompass Tech
Maker
📌

Hello Product Hunt! We're excited to launch the nCompass AI inference platform for reliable, scalable and fast inference of any HuggingFace model available! We're looking forward to having you build your AI apps on top of our system.

We have three connected products that we're launching today:

Public Inference API with no enforced rate limits

Perfect for startups and developers prototyping AI features or evaluating open source alternatives to GPT / Claude

Managed Inference Platform

Perfect for SMBs and Enterprises who want to deploy their own AI models reliably with speed, scalability and observability built in.

White-labelled AI Inference Stack

Perfect for Enterprises with strict compliance needs and datacenters looking to set up their own AI-as-a-Service offerings.


Below is an overview of each of these:

=====


Public Inference API with no enforced rate limits
Currently we have two state-of-the-art multimodal models set up: Gemma 3 27B and Llama 4 Maverick. The interface is fully OpenAI-compatible and self-serve. This means all you have to do is change your API key, model name and base URL in your existing stack, and you'll be able to capitalize on open source models that are potentially up to 18x cheaper and 2x faster than their closed source equivalents. View your usage and performance metrics live via our dashboard. Every sign-up gets some free credits to try out the system, so give it a go right now by signing up here (https://app.ncompass.tech).
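
For example, a drop-in swap with the OpenAI Python SDK looks roughly like the sketch below. The base URL and model identifier are placeholders that aren't listed in this post, so copy the real values from your dashboard after signing up.

from openai import OpenAI

# Placeholder credentials and endpoint: substitute the API key, base URL and
# model name shown in the nCompass dashboard (https://app.ncompass.tech).
client = OpenAI(
    api_key="YOUR_NCOMPASS_API_KEY",
    base_url="https://api.ncompass.tech/v1",  # hypothetical endpoint
)

response = client.chat.completions.create(
    model="google/gemma-3-27b-it",  # hypothetical identifier for Gemma 3 27B
    messages=[{"role": "user", "content": "Say hello from nCompass!"}],
)
print(response.choices[0].message.content)

Because the interface is OpenAI-compatible, the rest of your existing client code should keep working unchanged.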

=====

The next two products are currently not self-serve, but they're ready to go; you just have to speak to us, as there are some manual steps involved in the onboarding process.

=====

Managed Inference Platform
Pick any HuggingFace model you want and we deploy it on dedicated hardware that we manage. We handle picking the best inference engine and hardware for the deployment, and we give you separate dev and prod clusters. Deploying the model to the dev cluster is one click, and promoting it to production once you're ready is just another. It really is that simple.

The best part about all of this is that we package each model you want to deploy with our custom optimized inference engine. If there's a model that we currently don't have optimizations for, we'll build those GPU kernels to ensure that you can run as many requests as possible on the minimum number of GPUs.

Why? Because we want AI to be cost-effective. We're not looking to sell GPUs; we want to bring you fast and scalable inference.

=====

White-labelled AI Inference Stack
This is basically all of the previous offering, but deployed on your infrastructure with your branding, plus extra admin console views so you can manage and monitor your users.

=====

So why did we start this? Well, all of us on the nCompass team are experts in hardware acceleration and wanted to apply our expertise to improve AI inference performance. We believe that AI model use is going to be ubiquitous in the future, and there's still plenty of performance to eke out of GPUs to make using AI models at scale and in production both reliable and cost-effective. We don't believe what you need is 72 GPU clusters, so we're ensuring you can meet your AI inference requirements on existing infrastructure.

We'd love for you to sign up and try out our API. Alternatively, if you're looking for a dedicated, no-queue production AI inference deployment, please do reach out by booking a call or just emailing us at hello@ncompass.tech.

If you’ve tried it out, we'd love to hear your feedback on what did or didn't work. We're constantly trying to improve our offering :)

2mo ago
Chris Parsonson

@aditya_rajagopal awesome work!!

2mo ago
Aditya Rajagopal
nCompass Tech
Maker

@chris_parsonson1 Thank you!

2mo ago
Tanmay Parekh
All the best for the launch @aditya_rajagopal & team!
2mo ago
Aditya Rajagopal
nCompass Tech
Maker
@parekh_tanmay Thank you!
2mo ago
Tomás Hernando Kofman
Not Diamond
Huge nCompass fan—congrats guys!

2mo ago