2/18/2026
Qlean Dataset Launches Japanese Single-Speaker Horror Story Read-Aloud Speech Corpus with Transcripts

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai), through its subsidiary amanaimages Inc., has launched a new dataset under its AI training data solution, Qlean Dataset: a Japanese single-speaker, horror-themed read-aloud speech corpus with transcripts. The corpus is designed for speech- and language-based AI development and research, including Automatic Speech Recognition (ASR), speech understanding, and Large Language Models (LLMs).
This dataset consists of Japanese audio recordings in which a native Japanese speaker reads horror and ghost-story texts aloud, paired with transcripts that faithfully reflect the spoken content. As the narrative progresses, the speaker naturally expresses tension and unease, capturing emotionally nuanced delivery in addition to structured read speech. The recordings therefore include both stable narration and continuous, emotion-infused speech suitable for advanced speech modeling.
Because horror storytelling relies heavily on prosody, pauses, and tonal shifts closely tied to narrative context, the dataset supports not only sentence-level speech recognition but also long-context speech understanding and language model training. As the corpus is recorded in a single-speaker format, it is well suited for model evaluation without speaker separation, as well as controlled analysis of speech and language behavior under fixed speaker conditions.
Qlean Dataset provides training data structured for both research and commercial AI development, with rights and usage conditions clearly organized to support real-world deployment. This corpus is designed for validation, evaluation, and training phases in speech and language AI development and is offered as part of Qlean Dataset’s original data lineup, AI Data Recipe.
Dataset Overview: Japanese Single-Speaker Horror-Themed Read-Aloud Speech Corpus with Transcripts
| Data Types | Audio, Text |
|---|---|
| Speaker Attribute | Japanese |
| File Formats | Audio: mp3 |
| Recording Length | Approximately 30 seconds to 90 minutes per audio file |
| Sampling Rate | 44.1 kHz / 48 kHz |
| Scene Description | A single speaker reading horror or ghost-story texts with emotional expression; narrative delivery characterized by tension and unease |
| Sample Details | |
Use Case Examples
Research Applications
Evaluation of ASR and Speech Understanding Models for Long-Form Audio Input
The continuous narrative structure of horror storytelling enables evaluation of long-utterance recognition accuracy and analysis of error patterns across extended contextual speech in ASR systems.
Context Retention Assessment for Language Models Using Speech Input
The corpus can be used to evaluate how LLMs or speech understanding models handle narrative context retention and semantic comprehension when processing ASR outputs derived from extended storytelling audio.
Industrial Applications
Validation Data for Conversational AI and Voice Generation Systems
The emotionally expressive speech, including prosodic variation and pauses, can be used to evaluate input comprehension and output quality in conversational AI and speech synthesis systems.
Pre-Deployment Testing for Call Center and Voice UI Processing Models
Continuous speech containing emotional nuance supports validation of recognition stability and operational risk assessment for voice UI systems and speech processing infrastructure.
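As an illustration of the recognition-accuracy evaluation mentioned in the use cases above, the following is a minimal sketch of how character error rate (CER) could be computed between a reference transcript and an ASR output. CER is a common choice for Japanese because the text has no whitespace word boundaries. The function names and the sample sentence are hypothetical and are not part of the dataset or of any Qlean Dataset tooling; the sketch also assumes that only whitespace is stripped, with no punctuation or script normalization.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character edit distance / reference length (whitespace ignored)."""
    # Assumption: whitespace is removed before comparison; punctuation is kept.
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: the ASR output drops one character of a 10-character line.
reference = "夜の学校は静かだった"
hypothesis = "夜の学校は静かだた"
cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.2f}")
```

For recordings of up to 90 minutes, the same function could be applied per segment as well as to the full transcript, so that error patterns can be compared across different points in the narrative.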
About Qlean Dataset
Qlean Dataset is a commercial-use-ready AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank Inc.
It supports a wide range of data types, including images, videos, audio, 3D assets, and text, enabling both research and commercial AI development in a legally safe environment.
Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continues to expand its specialized, industry-focused lineup known as the “AI Data Recipe.”
By reducing the operational burden of data collection and preparation, Qlean Dataset helps organizations establish AI development environments that are both legally compliant and risk-free.
▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup




Key Features of Qlean Dataset
Existing datasets deliverable within one business day
Custom data collection and recording services available
▶ Contact: https://qleandataset.visual-bank.co.jp/en/contact
About Visual Bank Inc.
Visual Bank Inc. is a Tokyo-based startup building Next-Generation Data infrastructure to enhance AI development capabilities under the mission “Unlocking Data Accessibility.”
The company operates THE PEN, an AI-assisted creative tool for manga artists, and the Qlean Dataset service.
Its subsidiaries include Amana Images Inc., one of Japan’s largest stock photo providers; Qlean Dataset, which leads research and development in AI data; and THE PEN Inc., which provides an AI-assisted creative tool for manga artists.
CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Corporate Site: https://visual-bank.co.jp/en
Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview





