1/9/2026

Qlean Dataset Launches a Japanese Educational Dialogue Speech Corpus for AI Development

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai), through its subsidiary Amana Images Inc., has begun offering a “Japanese Two-Speaker Education-Themed Dialogue Speech Corpus and Transcripts” via its AI training data solution, Qlean Dataset.

This dataset is designed to support the development of speech- and language-based AI technologies, including automatic speech recognition (ASR), natural language processing (NLP), and large language models (LLMs).

The dataset consists of Japanese dialogue audio in which two speakers discuss topics related to education, career guidance, learning environments, and personal decision-making. Each recording is provided with an aligned transcript reflecting the spoken content.

All conversations are unscripted and progress naturally through questions, responses, and the sharing of experiences. Speaker turn-taking and contextual references are preserved, making the dataset suitable for evaluating dialogue understanding and conversational structure.

Qlean Dataset provides rights-cleared data for both research and commercial AI development. This dataset is offered to support researchers and developers who require Japanese educational dialogue data for evaluation and validation purposes.

Overview of the Japanese Two-Speaker Education Dialogue Corpus

Data Type

Voice, text

Subject Attributes

Men and women in their 20s to 50s

Data Format

Audio data: wav
Text data: txt

Recording Time

Total: Approximately 883 hours (approximately 5-60 minutes per audio segment)

Sampling Rate

44.1 kHz

Target Scenes

・Japanese audio recordings featuring dialogues between two speakers on topics related to education, learning, and career paths
・Conversations covering themes such as teacher certification, future planning, entrance examinations, educational policies, and the role of social media

Sample

https://qleandataset.visual-bank.co.jp/en/lineup/pn-016
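
As a rough illustration of how a recipient might check delivered audio against the specifications above, the following Python sketch reads a WAV header and reports its sampling rate and duration. It is not part of the Qlean Dataset offering. The function names are hypothetical, and the mono, 16-bit PCM layout of the stand-in file is an assumption, since the overview states only the WAV format and the 44.1 kHz rate.

```python
import io
import wave

EXPECTED_RATE_HZ = 44_100  # sampling rate stated in the overview


def describe_wav(source):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def make_silent_wav(seconds, rate=EXPECTED_RATE_HZ):
    """Build a silent WAV in memory as a stand-in for a real recording.

    Mono and 16-bit are assumptions made for this example only.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buffer.seek(0)
    return buffer


rate, channels, duration = describe_wav(make_silent_wav(2))
print(rate == EXPECTED_RATE_HZ, channels, duration)
```

In practice, `describe_wav` would be called with the path of each delivered file, and any file whose rate differs from `EXPECTED_RATE_HZ` would be flagged for review.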

Use Case Examples for the Japanese Two-Speaker Education Dialogue Corpus 

Research Applications

  • Evaluation and Analysis of ASR Models Using Dialogue Speech
    Dialogue audio related to education and career guidance can be used to analyze Japanese speech recognition accuracy and error patterns under conditions involving speaker alternation and interactive responses.

  • Dialogue Understanding Research in Education and Career Guidance Contexts
    Dialogue transcripts related to career choices and learning policies can be used to study dialogue understanding and contextual analysis methods, including topic transitions and opinion formation processes.
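
For readers considering the ASR evaluation use case, the sketch below shows one common way to score a recognizer's output against a reference transcript: the character error rate (CER), which is often preferred for Japanese because the text has no whitespace word boundaries. This is an illustrative example only, not a method prescribed by Qlean Dataset. The function names and the sample sentences are hypothetical, and real evaluations typically also normalize punctuation and character variants before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance / number of reference characters.

    Whitespace is removed from both strings before comparison.
    """
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical transcript line and recognizer output (not taken from the dataset).
reference = "進路について相談したい"
hypothesis = "進路について相談しない"
print(f"CER: {character_error_rate(reference, hypothesis):.3f}")
```

Here the hypothesis differs from the 11-character reference by a single substituted character, so the CER is 1/11.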

Industrial Applications 

  • Validation of Dialogue AI for Education and Career Counseling
    The dataset can be used as evaluation data for intent understanding and response design in conversational AI and chatbots designed for education and career consultation scenarios.

  • Preliminary Evaluation of Japanese Dialogue Processing in LLMs
    Dialogue text that includes values and decision-making related to education and learning can be used to evaluate Japanese dialogue handling capabilities and contextual retention performance of LLMs.

Additional Practical Use Cases 

  • Dialogue Quality Evaluation for Education and Learning Support Services
    Dialogue audio covering topics such as career selection, entrance examinations, and parenting policies can serve as reference data for evaluating the naturalness and flow of conversations in education-related consultation services.

  • Speech Recognition Evaluation for Education Support Contact Centers
    Dialogue audio containing education-specific vocabulary and topics can be used to assess speech recognition and transcription accuracy for inquiry handling and consultation desk scenarios.

About Qlean Dataset

Qlean Dataset is a commercial-use-ready AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank Inc.
It supports a wide range of data types, including images, videos, audio, 3D assets, and text, enabling both research and commercial AI development in a legally safe environment.
Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continues to expand its specialized, industry-focused lineup known as the “AI Data Recipe.”
By reducing the operational burden of data collection and preparation, Qlean Dataset helps organizations establish AI development environments that are both legally compliant and risk-free.

▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup

Key Features of Qlean Dataset

  • Existing datasets deliverable within one business day

  • Custom data collection and recording services available

▶ Contact: https://qleandataset.visual-bank.co.jp/en/contact

About Visual Bank Inc.

Visual Bank Inc. is a Tokyo-based startup building Next-Generation Data infrastructure to enhance AI development capabilities under the mission “Unlocking Data Accessibility.”
The company operates THE PEN, an AI-assisted creative tool for manga artists, and the Qlean Dataset service.
Its subsidiaries include Amana Images Inc., one of Japan’s largest stock photo providers; Qlean Dataset, which leads research and development in AI data; and THE PEN Inc., which provides an AI-assisted creative tool for manga artists.

CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Corporate Site: https://visual-bank.co.jp/en
Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview

    © amanaimages inc.