1/9/2026
Qlean Dataset Launches a Japanese Educational Dialogue Speech Corpus for AI Development

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai), through its subsidiary Amana Images Inc., has begun offering a “Japanese Two-Speaker Education-Themed Dialogue Speech Corpus and Transcripts” via its AI training data solution, Qlean Dataset.
This dataset is designed to support the development of speech- and language-based AI technologies, including automatic speech recognition (ASR), natural language processing (NLP), and large language models (LLMs).
The dataset consists of Japanese dialogue audio in which two speakers discuss topics related to education, career guidance, learning environments, and personal decision-making. Each recording is provided with aligned transcripts reflecting the spoken content.
All conversations are unscripted and progress naturally through questions, responses, and the sharing of experiences. Speaker turn-taking and contextual references are preserved, making the dataset suitable for evaluating dialogue understanding and conversational structure.
Qlean Dataset provides rights-cleared data for both research and commercial AI development. This dataset is offered to support researchers and developers who require Japanese educational dialogue data for evaluation and validation purposes.
Overview of the Japanese Two-Speaker Education Dialogue Corpus
| Data Type | Voice, text |
|---|---|
| Subject Attributes | Men and women in their 20s to 50s |
| Data Format | Audio data: WAV |
| Recording Time | Total: approximately 883 hours (approximately 5 to 60 minutes per audio segment) |
| Audio Sampling Rate | 44.1 kHz |
| Target Scenes | Japanese audio recordings featuring dialogues between two speakers on topics related to education, learning, and career paths |
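As a rough illustration of how a recipient might check delivered audio against the figures in the overview table, the sketch below reads a WAV file's sampling rate and duration with Python's standard `wave` module. The function names and the 5 to 60 minute bounds are assumptions made for this example (the table gives the segment length only approximately), and the sketch assumes uncompressed PCM WAV files; it is not part of any Qlean Dataset tooling.

```python
import wave

# Sampling rate stated in the overview table (44.1 kHz).
EXPECTED_RATE_HZ = 44_100


def describe_wav(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return rate, frames / rate


def matches_spec(path, min_minutes=5, max_minutes=60):
    """Check the sampling rate and an approximate duration window.

    The minute bounds are illustrative defaults, since the published
    segment length is only approximate.
    """
    rate, seconds = describe_wav(path)
    in_window = min_minutes * 60 <= seconds <= max_minutes * 60
    return rate == EXPECTED_RATE_HZ and in_window
```

Calling `matches_spec("segment.wav")` on a hypothetical file returns `True` only when both checks pass; loosening `min_minutes` and `max_minutes` is reasonable if shorter or longer segments are expected.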
Use Case Examples for the Japanese Two-Speaker Education Dialogue Corpus
Research Applications
Evaluation and Analysis of ASR Models Using Dialogue Speech
Dialogue audio related to education and career guidance can be used to analyze Japanese speech recognition accuracy and error patterns under conditions involving speaker alternation and interactive responses.
Dialogue Understanding Research in Education and Career Guidance Contexts
Dialogue transcripts related to career choices and learning policies can be used to study dialogue understanding and contextual analysis methods, including topic transitions and opinion formation processes.
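For the ASR evaluation use case described above, one common way to quantify recognition accuracy for Japanese is the character error rate (CER), since Japanese text is not space-delimited into words. The following is a minimal sketch, assuming the reference transcript and the ASR output are available as plain strings; the function names are hypothetical, and real evaluations usually also normalize punctuation and character variants, which is omitted here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / reference length, ignoring whitespace."""
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)
```

Computing the rate per speaker turn rather than per recording would be one way to look at error patterns around speaker alternation, provided the transcripts mark turns.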
Industrial Applications
Validation of Dialogue AI for Education and Career Counseling
The dataset can be used as evaluation data for intent understanding and response design in conversational AI and chatbots designed for education and career consultation scenarios.
Preliminary Evaluation of Japanese Dialogue Processing in LLMs
Dialogue text that includes values and decision-making related to education and learning can be used to evaluate Japanese dialogue handling capabilities and contextual retention performance of LLMs.
Additional Practical Use Cases
Dialogue Quality Evaluation for Education and Learning Support Services
Dialogue audio covering topics such as career selection, entrance examinations, and parenting policies can serve as reference data for evaluating the naturalness and flow of conversations in education-related consultation services.
Speech Recognition Evaluation for Education Support Contact Centers
Dialogue audio containing education-specific vocabulary and topics can be used to assess speech recognition and transcription accuracy for inquiry handling and consultation desk scenarios.
About Qlean Dataset
Qlean Dataset is a commercial-use-ready AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank Inc.
It supports a wide range of data types, including images, videos, audio, 3D assets, and text, enabling both research and commercial AI development in a legally safe environment.
Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continues to expand its specialized, industry-focused lineup known as the “AI Data Recipe.”
By reducing the operational burden of data collection and preparation, Qlean Dataset helps organizations establish AI development environments that are both legally compliant and risk-free.
▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup




Key Features of Qlean Dataset
Existing datasets deliverable within one business day
Custom data collection and recording services available
▶ Contact: https://qleandataset.visual-bank.co.jp/en/contact
About Visual Bank Inc.
Visual Bank Inc. is a Tokyo-based startup building next-generation data infrastructure to enhance AI development capabilities under the mission “Unlocking Data Accessibility.”
The company operates THE PEN, an AI-assisted creative tool for manga artists, and the Qlean Dataset service.
Its subsidiaries include Amana Images Inc., one of Japan’s largest stock photo providers; Qlean Dataset, which leads research and development in AI data; and THE PEN Inc., which provides an AI-assisted creative tool for manga artists.
CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Corporate Site: https://visual-bank.co.jp/en
Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview





