3/17/2026

Qlean Dataset: High-Fidelity Japanese Reading Corpus for Foreign Literature

Visual Bank Inc. (Minato-ku, Tokyo; Saneyuki Nagai, Representative Director and CEO), through its subsidiary amana images inc., is pleased to announce the release of the "Japanese Reading Corpus of Foreign Literature" under its AI training data solution, Qlean Dataset. This dataset is specifically optimized for improving Automatic Speech Recognition (ASR) accuracy and training Text-to-Speech (TTS) models with natural prosody.

The dataset focuses on Japanese translations of foreign literary works, featuring high-quality audio by a single native Japanese speaker who narrates descriptive scenes and philosophical passages in a calm, consistent tone. It comprehensively covers "written-style" Japanese, characterized by the sophisticated syntax and complex grammatical structures unique to translated literature and distinct from everyday conversational speech. This makes it an ideal resource for research and development in long-context speech analysis and Natural Language Processing (NLP) involving advanced vocabulary. The consistent vocal quality supports training models that reproduce narrative text clearly and engagingly.

This release is part of the "AI Data Recipe," Qlean Dataset’s lineup of original data assets designed for AI development. It is intended for use across various development phases, from generating professional narrations for audiobooks to validating context-dependent ASR engines. Visual Bank and amana images remain committed to supporting global AI research and development by providing high-quality Japanese visual and linguistic assets.

Dataset Overview: Japanese Reading Corpus of Foreign Literature

Data Types: Audio (MP3), Text (TXT)

Subject Attribute: Japanese native speaker

Recording Length: 30 seconds to 90 minutes per file

Audio Sampling Rate: 44.1 kHz / 48 kHz

Target Scenes:

  • Reading Japanese translations of foreign literary works
  • Narration of stories and philosophical thoughts in a calm, steady tone

Sample Details: https://qleandataset.visual-bank.co.jp/en/lineup

Potential Use Cases

Research & Academia

  • Accuracy Validation for Long-Context ASR Models: 

    The dataset serves as a benchmark to measure how well ASR models maintain context when transcribing Japanese literary prose, which often features long sentences, inversions, and complex modifiers.
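    As an illustration of how such a benchmark might be scored, the sketch below computes character error rate (CER), a metric commonly used for Japanese ASR because the language is written without spaces between words. The function names, the normalization choices, and the sample sentence are assumptions made for this example; they are not part of the dataset or its documentation.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Assumed normalization: NFKC folding, then remove all whitespace."""
    return "".join(unicodedata.normalize("NFKC", text).split())


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Made-up sentence pair for illustration only (not taken from the corpus).
    reference = "彼は静かに本を閉じた"
    hypothesis = "彼は静かに本を閉じる"
    print(character_error_rate(reference, hypothesis))
```

    For long recordings, the score is usually computed per segment and then aggregated, so that one misaligned passage does not dominate the result.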

Industrial Applications

  • Development of Specialized Narrative TTS Engines: 

    This corpus provides high-quality ground truth data for training TTS models in the entertainment and media sectors. It is ideal for producing audiobooks or automated news narration that requires expressive, evocative speech without excessive emotional bias.

Education & Social Implementation (EdTech & Accessibility)

  • Pronunciation Assessment and Listening Support for Japanese Learners: 

    With standard, polite Japanese pronunciation as ground truth, this data can be used to build AI that corrects learners' pronunciation, or to provide natural, low-fatigue narration in assistive reading devices for visually impaired users.

  • Fine-Tuning LLMs for Literary Context Understanding: 

    Pairing structured literary text with its corresponding audio supports fine-tuning specialized models for better summary generation and translation of sophisticated literary expressions.
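    A minimal sketch of the pairing step follows. It assumes that each MP3 file and its transcript share the same base file name. This naming convention is hypothetical, and the actual layout of the delivered files should be confirmed with the provider. The function name `pair_by_stem` is likewise invented for this example.

```python
from pathlib import PurePath


def pair_by_stem(filenames):
    """Match audio and text files that share a base name.

    Assumes (hypothetically) that "chapter01.mp3" is transcribed in
    "chapter01.txt". Files without a counterpart are skipped.
    """
    audio = {}
    text = {}
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix == ".mp3":
            audio[path.stem] = name
        elif suffix == ".txt":
            text[path.stem] = name
    # Keep only the stems present on both sides, in a stable order.
    shared = sorted(audio.keys() & text.keys())
    return [(audio[stem], text[stem]) for stem in shared]


if __name__ == "__main__":
    # Illustrative file names only.
    files = ["chapter02.mp3", "chapter01.txt", "chapter01.mp3", "chapter02.txt"]
    for audio_file, text_file in pair_by_stem(files):
        print(audio_file, text_file)
```

    The resulting pairs can then be written to whatever manifest format the training pipeline expects.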

About Qlean Dataset

Qlean Dataset is a commercially cleared AI training data solution provided by amana images inc., a subsidiary of Visual Bank Group. The platform offers diverse data formats including image, video, audio, 3D, and text, as well as a specialized AI Data Recipe lineup developed through collaborations with major media organizations and data rights holders.

URL: https://qleandataset.visual-bank.co.jp/en

About Visual Bank Inc.

Visual Bank Group is a technology company developing data infrastructure and AI solutions that support advanced AI development. The company operates THE PEN, an AI tool for manga creators, and its subsidiary, amana images inc., provides commercial digital content and AI training data solutions, including Qlean Dataset. Visual Bank is also a selected participant in GENIAC, a Japanese government initiative supporting the advancement of next-generation AI technologies.

CEO: Saneyuki Nagai
Website: https://visual-bank.co.jp/en

    © amanaimages inc.