1/28/2026

Qlean Dataset Launches a Japanese Single-Speaker Music-Themed Audio Dataset with Transcripts

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai), through its subsidiary amanaimages Inc., has launched a new dataset under its AI training data solution, Qlean Dataset: the Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts.

The dataset is designed to support the development and evaluation of speech- and language-based AI systems, including ASR, NLP, and LLMs.

This dataset is a new addition to Qlean Dataset’s machine learning lineup, AI Data Recipe.
It features Japanese audio recordings in which a single speaker delivers extended, monologue-style speech on music-related topics such as artists, songs, musical experiences, genres, and cultural background. Each recording is paired with accurate transcripts that reflect the spoken content.

All recordings are conducted without strict scripting, allowing speakers to express their thoughts naturally in continuous speech.
As a result, the dataset is well suited for evaluating speech recognition, discourse continuity, vocabulary usage, and language understanding in AI systems that process long-form spoken input.

Qlean Dataset provides AI development data for both research and commercial use, with rights clearance and usage conditions carefully organized.
This dataset is offered to support reliable evaluation environments using Japanese speech and text data in the music and cultural content domain.

Dataset Overview: Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts

Data Types: Audio, Text
Speaker Attributes: Japanese speakers, male and female, aged 20s to 50s
Data Formats: mp3 / wav (audio); txt / json / csv (text)
Total Duration: Approximately 210 hours (each recording ranges from approximately 5 to 60 minutes)
Audio Sampling Rate: 44.1 kHz / 48 kHz
Recorded Scenarios: Single-speaker scenes in which the speaker continuously explains or discusses music-related topics
Sample Details: https://qleandataset.visual-bank.co.jp/en/lineup/pn-012

Use Case Examples for the Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts

- Research Use Cases

  • Evaluation of Japanese ASR models with domain-specific vocabulary

    This dataset can be used to evaluate ASR models on continuous single-speaker speech that includes domain-specific terms, proper nouns, and titles related to cultural fields such as music, comics, and film. It enables assessment of how consistently models recognize explanatory and evaluative speech over extended segments.

  • Evaluation of language understanding models for review-style audio content

    For audio content such as music reviews and artist commentary delivered from an individual's perspective, the dataset can be used to evaluate NLP and LLM tasks downstream of speech recognition, including content understanding, key point extraction, and summary generation.

  • Validation of voice-based recommendation and search systems

    Because the speech contains titles, artist names, and evaluative expressions, the dataset can serve as evaluation data for voice-input search and recommendation systems that extract, classify, and relate cultural content.

- Industrial Use Cases 

  • Subtitle generation and summarization for cultural audio content

    The dataset can be applied to the evaluation of speech processing functions for educational and informational use cases, such as subtitle generation and overview creation for explanatory audio related to music, film, and comics.
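
As an illustration of the ASR evaluation use case above, the following is a minimal sketch of character error rate (CER), a metric commonly used for Japanese because the text is not whitespace-delimited into words. It is not part of Qlean Dataset or its tooling: the function names (normalize, edit_distance, cer) and the normalization choices (NFKC, whitespace removal) are assumptions made for this example, and the sample sentences are invented rather than taken from the corpus.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFKC-normalize and drop all whitespace (an assumed, simple normalization)."""
    return "".join(unicodedata.normalize("NFKC", text).split())


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Invented example pair; a real evaluation would use a dataset transcript
    # as the reference and the ASR model output as the hypothesis.
    reference = "今日は音楽の話をします"
    hypothesis = "今日は音楽の話をしました"
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

For long-form recordings such as those described here, a score like this would typically be computed per recording or per segment and then aggregated, so that recognition quality can be compared across the length of a monologue.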

About Qlean Dataset

Qlean Dataset is a commercial-use-ready AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank Inc.
It supports a wide range of data types, including images, videos, audio, 3D assets, and text, enabling both research and commercial AI development in a legally safe environment.
Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continues to expand its specialized, industry-focused lineup known as the “AI Data Recipe.”
By reducing the operational burden of data collection and preparation, Qlean Dataset helps organizations establish AI development environments that are both legally compliant and risk-free.

▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup

Key Features of Qlean Dataset

  • Existing datasets deliverable within one business day

  • Custom data collection and recording services available

▶ Contact: https://qleandataset.visual-bank.co.jp/en/contact

About Visual Bank Inc.

Visual Bank Inc. is a Tokyo-based startup building Next-Generation Data infrastructure to enhance AI development capabilities under the mission “Unlocking Data Accessibility.”
The company operates THE PEN, an AI-assisted creative tool for manga artists, and the Qlean Dataset service.
Its subsidiaries include Amana Images Inc., one of Japan’s largest stock photo providers; Qlean Dataset, which leads research and development in AI data; and THE PEN Inc., which develops an AI-assisted creative tool for manga artists.

CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Corporate Site: https://visual-bank.co.jp/en
Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview

    © amanaimages inc.