Qlean Dataset Launches a Japanese Two-Speaker Social & Cultural Dialogue Audio Dataset with Transcripts

1/20/2026

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai), through its subsidiary Amana Images Inc., has launched a new dataset within its AI training data solution, Qlean Dataset. The dataset is titled “Japanese Two-Speaker Social & Cultural Dialogue Audio Corpus with Transcripts” and is designed to support the development of speech- and language-based AI systems, including Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Large Language Models (LLMs).

This dataset is a new addition to AI Data Recipe, Qlean Dataset’s machine learning lineup. It features Japanese dialogue audio in which two speakers, one male and one female, hold natural conversations on everyday social and cultural topics such as daily life, relationships, personal values, work, and living environments. Each recording is paired with an accurately aligned transcript.

All conversations are recorded without scripts, allowing speakers to exchange opinions freely at a natural pace. The dataset captures realistic dialogue structures, including turn-taking, backchannel responses, topic shifts, and expressions of agreement, hesitation, and empathy, reflecting real-world conversational dynamics.

Dataset Overview: Japanese Two-Speaker Social & Cultural Dialogue Audio Corpus with Transcripts

Data Types	Audio, Text
Speaker Attributes	Japanese speakers, male and female, aged 20s to 50s
Data Formats	Audio: mp3 / wav; Text: txt / json / csv
Total Duration	Approximately 450 hours (each recording runs approximately 5 to 60 minutes)
Audio Sampling Rate	44.1 kHz / 48 kHz
Conversation Scenarios	・Japanese two-speaker dialogues on everyday social and cultural topics, including daily life, personal values, and social context ・Unscripted, naturally flowing conversations with backchannel responses, topic shifts, and concrete examples
Sample Details	https://qleandataset.visual-bank.co.jp/lineup/pn-017
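
For teams that receive the wav files, the figures in the overview can double as a simple intake check. The following is a minimal sketch and not part of the Qlean Dataset deliverable: the function names are hypothetical, it uses only Python's standard `wave` module (so it covers uncompressed wav files, not mp3), and it treats the approximate 5 to 60 minute range as hard limits, which a real check would likely loosen.

```python
import wave

# Values taken from the overview table; the published figures are approximate,
# so treating them as strict limits is an assumption of this sketch.
EXPECTED_SAMPLE_RATES = {44100, 48000}
MIN_SECONDS = 5 * 60
MAX_SECONDS = 60 * 60


def read_wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for an uncompressed WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return sample_rate, duration


def matches_overview(sample_rate, duration_seconds):
    """Check one recording against the listed sampling rates and length range."""
    return (
        sample_rate in EXPECTED_SAMPLE_RATES
        and MIN_SECONDS <= duration_seconds <= MAX_SECONDS
    )
```

A typical use would be to call `read_wav_info` on each file and pass the result to `matches_overview`, flagging any recording that falls outside the listed specifications for manual review.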

Use Case Examples for the Japanese Two-Speaker Social & Cultural Dialogue Dataset

Research Applications

Analyzing Value Expression and Opinion Exchange Structures in Japanese Dialogue
Using dialogue audio and transcripts related to daily life, human relationships, and perspectives on work, this dataset can support research in linguistics and information science focused on value-based expressions, opinion conflicts, and consensus-building processes. It is suitable for evaluating utterance understanding and semantic analysis within conversational context.

Industrial Applications

Evaluating Everyday Conversation and Value-Based Responses in Conversational AI
Natural dialogue data related to daily life, working styles, and interpersonal relationships can be used to evaluate empathetic responses, opinion-based replies, and conversation continuity in conversational AI systems and chatbots. The dataset is suitable for assessing dialogue scenarios involving opinion exchange, which differ from standard FAQ-style interactions.

Evaluating Conversational Context Understanding and Response Generation in Japanese LLMs
Dialogue texts that include personal experiences and perspectives can be used to evaluate and fine-tune Japanese LLMs for context retention, handling topic transitions, and generating responses to value-laden utterances.

Other Practical Applications

Educational Material for Communication Design and Dialogue Analysis
The combination of dialogue audio and transcripts on everyday social topics can be used as educational material for analyzing dialogue structures and the progression of opinion exchange. The dataset is suitable for educational use in learning the alignment between spoken language and text.

About Qlean Dataset

Qlean Dataset is a commercial-use-ready AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank Inc.
It supports a wide range of data types, including images, videos, audio, 3D assets, and text, enabling both research and commercial AI development in a legally safe environment.
Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continues to expand its specialized, industry-focused lineup known as the “AI Data Recipe.”
By reducing the operational burden of data collection and preparation, Qlean Dataset helps organizations establish AI development environments that are both legally compliant and risk-free.

▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup

Key Features of Qlean Dataset

Existing datasets deliverable within one business day
Custom data collection and recording services available

▶ Contact: https://qleandataset.visual-bank.co.jp/en/contact

About Visual Bank Inc.

Visual Bank Inc. is a Tokyo-based startup building next-generation data infrastructure to enhance AI development capabilities under the mission “Unlocking Data Accessibility.”
The company operates THE PEN, an AI-assisted creative tool for manga artists, and the Qlean Dataset service.
Its subsidiaries include Amana Images Inc., one of Japan’s largest stock photo providers; Qlean Dataset, which leads research and development in AI data; and THE PEN Inc., which provides an AI-assisted creative tool for manga artists.

CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Corporate Site: https://visual-bank.co.jp/en
Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview
