11/6/2025

Qlean Dataset Releases Japanese Three-Speaker, Speaker-Separated Daily Conversation Corpus for Conversational AI

Visual Bank Inc. (Tokyo, Japan; CEO Saneyuki Nagai) has announced the release of the “Japanese Three-Speaker Speaker-Separated Daily Conversation Audio Corpus” through its AI-training-data solution, Qlean Dataset, developed under its subsidiary Amana Images Inc.

This dataset contains real-world Japanese speech recordings of natural conversations among three speakers — a male customer, a female customer, and a female store clerk — in a café setting.
It includes four types of audio files, one speaker-separated track for each of the three speakers plus a mixed version of all three, making it widely applicable to Automatic Speech Recognition (ASR), speaker separation AI, and multimodal or voice-based generative AI foundation models such as audio-integrated LLMs.

Because the recordings include natural responses, overlapping speech, and ambient sounds, the dataset is ideal for evaluating ASR and dialogue generation accuracy, and for training customer-service AI, educational support AI, and conversational LLMs in realistic environments.

▶ About Qlean Dataset: https://qleandataset.visual-bank.co.jp/en

About the “AI Data Recipe” of Qlean Dataset

The "AI Data Recipe" lineup within Qlean Dataset comprises its commercially available original datasets.
They are designed for flexible combination based on usage, accuracy, and delivery requirements, and include both annotated and non-annotated data. Each dataset can be customized or expanded to meet specific needs.
Through partnerships with organizations such as Chiba Lotte Marines and Toyo Keizai Inc., as well as domestic and international networks and new recording projects, Qlean Dataset continues to expand its lineup.
This approach significantly reduces the workload required for data collection and preparation in AI development and accelerates project execution.

▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup

Overview of the Newly Released Dataset

Data type: Audio
Subjects: Japanese nationals — one male customer, one female customer, one female store clerk
Format: WAV audio files
Notes:
– Recorded duration: approximately 7 minutes per audio file
– Scene: Everyday conversation and café order scenario
– Speaker-separation breakdown: male customer, female customer, female store clerk, and a full mixed version of all three speaking together
Sample details URL: https://qleandataset.visual-bank.co.jp/en/lineup/pn-032
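
As an illustration of how the four WAV tracks of one recording might be inspected after delivery, the following is a minimal Python sketch using only the standard library. The track file names, the one-directory-per-recording layout, and the expectation that all four tracks have the same length are assumptions made for this example; this announcement does not describe the actual file naming or alignment of the corpus.

```python
import wave
from pathlib import Path

# Hypothetical track names: the actual file naming of the corpus is not
# described in this announcement.
TRACKS = ("male_customer", "female_customer", "female_clerk", "mixed")


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        return rate, wf.getnchannels(), wf.getnframes() / rate


def check_session(session_dir, tolerance_s=0.5):
    """Check that all four tracks exist and have near-identical durations.

    Assumes (not confirmed by the announcement) that the separated tracks
    and the mixed track of one recording are the same length.
    Returns a dict mapping track name to duration in seconds.
    """
    durations = {}
    for name in TRACKS:
        path = Path(session_dir) / f"{name}.wav"
        if not path.is_file():
            raise FileNotFoundError(path)
        durations[name] = wav_info(path)[2]
    if max(durations.values()) - min(durations.values()) > tolerance_s:
        raise ValueError(f"track durations differ: {durations}")
    return durations
```

Under these assumptions, calling check_session on a directory holding one recording of about 7 minutes would return four durations of roughly 420 seconds each.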

Use Case Examples of the Dataset

Speech recognition and speaker separation model improvement
Because it captures actual café dialogue among three speakers, including speech overlaps, ambient noise, and intonation differences, the dataset can be used to verify and optimise models for speaker-separated ASR, source localisation, and multi-speaker speech recognition.
Natural Japanese conversational AI training
Because it includes natural conversation flows such as requests, confirmations, and responses in everyday speech style and timing, the dataset supports training of chatbots and concierge AIs for customer-service and retail settings, as well as conversational generation models for Japanese.
Development of emotion recognition and vocal-feature-analysis AI
The dataset allows analysis of differences in speech tone — for example, the clerk’s polite register or shifts in customer emotion — making it suitable for speech-emotion recognition, paralinguistic analysis, and acoustic-feature extraction research in human-centred AI domains.
Japanese language education and communication-training AI
As a collection of natural Japanese conversational data, the corpus is also valuable for Japanese-language learning AI for non-native speakers, pronunciation-practice applications, and customer-service training materials, where it provides real-world conversation examples with cultural context.
Strengthening audio understanding in LLMs / multimodal AI
For Japanese LLMs and voice-based multimodal models, the dataset can help improve understanding of conversation structure after audio-to-text conversion, and can serve as benchmark data for voice-dialogue LLMs.

Features of Qlean Dataset

All datasets are rights-cleared and commercially usable, collected with full participant consent and in compliance with international privacy requirements.
Datasets are delivered through the flexible “AI Data Recipe” lineup, enabling rapid deployment and customizable dataset creation.

Contact form: https://qleandataset.visual-bank.co.jp/en/contact
Service site: https://qleandataset.visual-bank.co.jp/en

About Visual Bank Inc.

Visual Bank Inc. is a next-generation data infrastructure company committed to “unleashing the potential of all data.”
The company operates THE PEN, an AI-powered assistance tool for manga artists, and wholly owns Amana Images Inc., which provides the AI training data service Qlean Dataset.
Visual Bank has been recognized in national R&D programs and continues to advance initiatives toward real-world AI implementation.

CEO: Saneyuki Nagai
Address: C-Cube Minami Aoyama Bldg. 6F, 7-1-7 Minami Aoyama, Minato-ku, Tokyo 107-0062
Corporate website: https://visual-bank.co.jp/en/
Amana Images overview: https://qleandataset.visual-bank.co.jp/en/company-overview
