2/12/2026
Qlean Dataset Launches a Japanese Single-Speaker Read Speech Corpus with Transcripts

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai), through its subsidiary amanaimages Inc., has released a new dataset under its AI training data solution, Qlean Dataset: a Japanese single-speaker read speech corpus with transcripts for speech- and language-based AI development.
The dataset consists of Japanese read-aloud audio on business, self-improvement, and practical topics, paired with accurate transcripts. The content includes explanatory and instructional material—such as descriptions of work processes, structured thinking, and procedural guidance—reflecting speech intended to convey meaning and knowledge rather than simple narration.
Recorded in a stable read-speech format, the corpus features long-form, logically structured audio that supports evaluation of speech recognition as well as downstream tasks including spoken language understanding, summarization, and response generation. All data is provided as part of Qlean Dataset’s original lineup, AI Data Recipe, and is fully rights-cleared for both research and commercial AI development.
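Each audio file comes with a transcript, so one common way to use the corpus for speech recognition evaluation is to score a recognizer's output against the reference transcript with character error rate (CER). For Japanese, CER is usually preferred over word error rate because the text is not space-delimited. The release does not specify an evaluation protocol, so the following is only an illustrative sketch. The helper names `edit_distance`, `normalize`, and `cer` are hypothetical, and the NFKC normalization and whitespace stripping are assumptions, not part of the dataset specification.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete the reference character r
                curr[j - 1] + 1,         # insert the hypothesis character h
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            )
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    # Assumption: NFKC folds full-width/half-width variants, and whitespace is
    # dropped because Japanese transcripts are not space-delimited.
    return "".join(unicodedata.normalize("NFKC", text).split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substituted character in a seven-character reference.
    print(cer("今日は会議です", "今日は会義です"))
```

In practice, the reference would be a transcript from the corpus and the hypothesis the recognizer's output for the matching audio file. Because some files run for over two hours, long recordings are often segmented before scoring; how to segment them is left open here.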
Dataset Overview
Japanese Single-Speaker Read Speech Corpus (Business, Self-Development, and Practical Topics)

| Item | Details |
|---|---|
| Data Types | Audio, Text |
| Speaker Attributes | Japanese |
| Data Formats | Audio: MP3 |
| Recording Length | Approximately 30 seconds to 160 minutes per file |
| Sampling Rate | 44.1 kHz / 48 kHz |
| Recorded Scenarios | ・A single speaker reading aloud texts from business books, self-development materials, and practical or instructional publications ・Read speech that explains procedures or organizes concepts while being spoken |
Use Case Examples
Research Applications
Evaluation of Japanese Speech-Based Language Understanding Models
Using read-aloud business and practical texts as source material, the dataset can be used to evaluate comprehension accuracy and inference behavior in speech–language models that take Japanese speech as input and perform tasks such as content understanding, summarization, and question answering.
Multimodal Research Based on Speech–Text Alignment
By leveraging paired audio and transcripts of identical content, researchers can analyze the relationship between spoken expression and textual structure, as well as the impact of speech information on language understanding.
Industrial Applications
Validation of Foundation Models for Voice-Enabled Business Support AI
For AI products designed to understand and process business knowledge or procedural explanations via voice input, the dataset can be used to evaluate recognition and comprehension performance on domain-relevant Japanese speech.
LLM Fine-Tuning with Speech-Derived Japanese Text
The dataset supports quality evaluation for summarization and answer generation by LLMs trained on Japanese text derived from speech, particularly for explanatory content and logically structured narratives.
About Qlean Dataset
Qlean Dataset is a commercial-use-ready AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank Inc.
It supports a wide range of data types, including images, videos, audio, 3D assets, and text, enabling both research and commercial AI development in a legally safe environment.
Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continues to expand its specialized, industry-focused lineup known as the “AI Data Recipe.”
By reducing the operational burden of data collection and preparation, Qlean Dataset helps organizations establish AI development environments that are both legally compliant and risk-free.
▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup




Key Features of Qlean Dataset
Existing datasets deliverable within one business day
Custom data collection and recording services available
▶ Contact: https://qleandataset.visual-bank.co.jp/en/contact
About Visual Bank Inc.
Visual Bank Inc. is a Tokyo-based startup building next-generation data infrastructure to enhance AI development capabilities under the mission “Unlocking Data Accessibility.”
The company operates THE PEN, an AI-assisted creative tool for manga artists, and the Qlean Dataset service.
Its subsidiaries include Amana Images Inc., one of Japan’s largest stock photo providers; Qlean Dataset, which leads research and development in AI data; and THE PEN Inc., which provides the AI-assisted creative tool for manga artists.
CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Corporate Site: https://visual-bank.co.jp/en
Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview





