Under Active Development: Our current priority is building the best models, not the perfect website! Please bear with us as we improve both.

    Speech-To-Text.
    Open Source, Fast & Accurate.

    Low-footprint models (70 MB) that run on any device or server. Apache-licensed source code, permissive CC-BY-SA models, and affordable commercial models.

    The Kroko.AI Advantage

    Blazing Fast on CPU

    Achieve over 10x real-time transcription speed on a single, ordinary CPU core. No expensive, power-hungry GPUs needed.
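
    To make the "10x real-time" figure concrete: a speed factor is the audio duration divided by the wall-clock time spent transcribing it, so 60 seconds of audio processed in 6 seconds is 10x real time. The sketch below shows one way to measure that on your own hardware. It is a minimal illustration, not Kroko code: `speed_factor`, `measure_speed_factor`, and the `transcribe` callable are hypothetical names standing in for whatever engine call you use.

```python
import time
from typing import Callable


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real time the audio was processed."""
    if audio_seconds <= 0 or processing_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / processing_seconds


def measure_speed_factor(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to a user-supplied transcribe() and return the speed factor.

    `transcribe` is a placeholder (assumption): wrap your own engine call in it.
    """
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return speed_factor(audio_seconds, elapsed)


if __name__ == "__main__":
    # 60 s of audio transcribed in 6 s of wall-clock time.
    print(speed_factor(60.0, 6.0))
```

    For a fair single-core number, pin the process to one core and average over several files, since a single short clip is dominated by model load and warm-up time.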

    Small Footprint, Any Device

    We specialize in lightweight models that run efficiently everywhere—from mobile devices and web browsers to on-premise servers.

    Uncompromised Data Privacy

    With 100% on-premise and on-device deployment, your sensitive audio data never leaves your infrastructure.

    Global Language Support

    Built for a global audience with support for 12+ languages, and our community is helping us add more all the time.

    Built for Innovators

    Scenario: The Kroko.AI Advantage
    SaaS & Mobile Apps: On-device voice control, dictation, or captions.
    Privacy-Conscious Industries: Ultimate privacy and compliance with hallucination-free local transcripts.
    Call Centers: Real-time, on-site or on-device Agent Assist or Quality Assurance.
    Communications: Visual voicemail, IVRs, voicebots.
    Open Source Projects: Feel free to use only the CC-BY-SA models or offer both.
    Media: Live, low-latency subtitles.

    How Do We Compare?

    Official, standardized benchmarking is a complex process, especially given the variety of metrics and abbreviations involved. While we formalize our results, here are our transparent, at-a-glance findings based on extensive internal testing.

    Compared to Whisper v3 Large

    Accuracy: Our models show a lower Word Error Rate (WER) across most tested languages (except for English).
    Real-time Streaming: Kroko offers true, low-latency streaming. Whisper's streaming is simulated and has higher latency.
    Reliability: Kroko does not hallucinate. We prioritize factual transcripts, avoiding the invented sentences Whisper can produce.
    Performance & Size: Our models are significantly smaller, download faster, and are more efficient on CPU.
    Features: Whisper offers auto-translation, which Kroko does not.

    Compared to NVIDIA Nemo

    These are preliminary findings pending more extensive testing.

    General Recognition: Kroko appears to have slightly better general recognition in our initial tests.
    Specialized Names: Nemo currently scores better on foreign names, especially on datasets like FLEURS where they are prevalent.
    CPU Streaming: Kroko is built for efficient, low-latency streaming directly on CPU, a key design focus.
    Features: Some Nemo models can translate languages; Kroko does not.

    A Note on Benchmarks: We do not recommend relying solely on "people reading Wikipedia" benchmarks like Common Voice or FLEURS, as they are often not representative of real conversational speech. We strongly encourage you to run your own comparison on your own data. (More information on why common benchmarks fail is coming soon in a dedicated blog post.)
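
    If you want to run that comparison yourself, Word Error Rate is simple to compute: it is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the engine's output, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration rather than our evaluation code. The function name `word_error_rate` is ours for this example, and the normalization (lowercasing and whitespace splitting only) is an assumption; real evaluations usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

    Apply the same normalization to every engine you compare, and score each on the same recordings of your own conversational audio, so the numbers are directly comparable.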

    Ready to Start Building?

    Read the Docs

    Get in Touch

    Have questions about our models or need custom solutions? We're here to help.