Summary:
Sesame releases the CSM-1B model, powering the realistic voice assistant Maya.
The model features 1 billion parameters and is licensed under Apache 2.0 for commercial use.
Encodes audio as RVQ (residual vector quantization) codes, a technique also used in Google's SoundStream and Meta's EnCodec.
Lacks significant safeguards, relying on an honor system to prevent misuse.
Maya speaks with natural speech patterns and can be interrupted mid-conversation.
Sesame's New AI Model
Sesame, the AI startup behind the impressively realistic voice assistant Maya, has unveiled CSM-1B, the model that drives it. The model has 1 billion parameters and is licensed under Apache 2.0, allowing commercial use with minimal restrictions.
What is CSM-1B?
The CSM-1B model is designed to generate RVQ audio codes from both text and audio inputs. RVQ, or residual vector quantization, is a technique for encoding audio into discrete tokens, and it underpins several recent neural audio codecs, including Google's SoundStream and Meta's EnCodec.
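To make the idea concrete, here is a minimal sketch of residual vector quantization in Python. It is not Sesame's, SoundStream's, or EnCodec's implementation; the codebooks, dimensions, and function names are invented for illustration. Each stage picks the nearest code to whatever residual the previous stage left behind, so a frame of audio features collapses into a short list of discrete tokens.

```python
import numpy as np

def residual_vector_quantize(x, codebooks):
    """Quantize a feature vector with a stack of codebooks: each stage
    encodes the residual left by the previous stage and emits one token."""
    tokens, residual = [], x.copy()
    for codebook in codebooks:                       # one pass per quantizer stage
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                  # nearest code in this stage
        tokens.append(idx)
        residual = residual - codebook[idx]          # pass the remainder onward
    return tokens

def decode(tokens, codebooks):
    """Reconstruct the vector by summing the selected code from each stage."""
    return sum(cb[i] for cb, i in zip(codebooks, tokens))

# Toy example: 3 quantizer stages, 8 codes each, over 4-dimensional frames.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]
frame = rng.normal(size=4)
codes = residual_vector_quantize(frame, codebooks)
print(codes, np.linalg.norm(frame - decode(codes, codebooks)))
```

Adding more stages shrinks the reconstruction error while the representation stays a handful of integers per frame, which is what makes these codes convenient targets for a language-model-style generator.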
Technical Backbone
CSM-1B leverages a model from Meta’s Llama family, complemented by an audio decoder component. While a fine-tuned variant of CSM powers Maya, the base model is capable of producing a variety of voices, although it has not been specifically tuned for any single voice.
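For readers curious what "a Llama-family backbone plus an audio decoder" could look like in outline, below is a toy PyTorch sketch. It is purely illustrative and assumes nothing about Sesame's actual code: the class name, layer sizes, and per-codebook heads are hypothetical stand-ins for a transformer that consumes interleaved text and audio tokens and predicts the next frame's RVQ codes.

```python
import torch
import torch.nn as nn

class ToySpeechModel(nn.Module):
    """Illustrative only: a transformer backbone over interleaved text/audio
    tokens, topped by one projection per RVQ codebook level that predicts the
    next audio frame's code at that level. All names and sizes are made up."""
    def __init__(self, vocab_size=4096, d_model=512, n_layers=4,
                 n_codebooks=8, codebook_size=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        # A stack of linear heads plays the role of a simple "audio decoder".
        self.audio_heads = nn.ModuleList(
            nn.Linear(d_model, codebook_size) for _ in range(n_codebooks)
        )

    def forward(self, token_ids):
        h = self.backbone(self.embed(token_ids))
        # Logits over each codebook for the final position (the next frame).
        return [head(h[:, -1]) for head in self.audio_heads]

model = ToySpeechModel()
logits_per_codebook = model(torch.randint(0, 4096, (1, 16)))
print([tuple(l.shape) for l in logits_per_codebook])  # 8 heads of (1, 1024)
```

The real system is far larger and trained on speech data, but the shape of the problem is the same: predict discrete RVQ tokens, then run them through a codec decoder to get a waveform.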
Limitations and Concerns
Notably, Sesame has not disclosed the training data used for CSM-1B. The model also lacks meaningful safeguards, relying on an honor system to deter developers from using it to mimic voices without consent or to create misleading content.
Real-World Testing
In a recent demo, cloning a voice took less than a minute, allowing easy generation of speech on sensitive topics. This has raised concerns, echoing warnings from Consumer Reports about the lack of safeguards in popular AI voice cloning tools.
The Vision Behind Sesame
Founded by Brendan Iribe, co-creator of Oculus, Sesame has gained attention for its lifelike assistant technology. Maya and the other assistant, Miles, can speak with natural disfluencies and even be interrupted during speech, emulating human-like interactions.
Additionally, Sesame is working on AI glasses designed for all-day wear, built around its own models. The company has secured funding from notable investors including Andreessen Horowitz, Spark Capital, and Matrix Partners.