1/8/2026

Qlean Dataset Launches a Japanese Business Document & Form Dataset for AI Development

Qlean Dataset, an AI training data solution operated by Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai), has launched a new collection of business document and form datasets designed for the development and research of large language models (LLMs), optical character recognition (OCR), and multimodal AI systems.


This dataset is provided as a new addition to Qlean Dataset’s machine learning dataset lineup, AI Data Recipe. It consists of document data commonly used in real-world business processes, including resumes, CVs, receipts, application forms, and questionnaires.

The documents are provided in PDF and image formats and retain characteristics typical of real business documents, such as diverse layout structures, embedded text, and variation in field placement and formatting. These characteristics reflect real-world input conditions that are difficult to reproduce using plain text data alone.

As generative AI and workflow automation continue to be implemented across industries, understanding and processing unstructured documents accumulated within organizations has become a critical challenge in AI development. At the same time, business documents often contain personal or contractual information, requiring careful consideration of data rights and handling when used as training data.

This dataset provides business documents and forms that have been organized specifically for AI development purposes, enabling realistic training and evaluation of document understanding and information extraction models. Drawing on its experience in providing AI training data, Visual Bank continues to prepare datasets suitable for both research and commercial development.

Overview of the Business Document & Form Dataset

Document Examples: Resumes, CVs, receipts, application forms, questionnaires, and more
Data Formats: PDF / JPEG / PNG
Sample Details: https://qleandataset.visual-bank.co.jp/en/lineup/ds-047

Use Case Examples for the Business Document & Form Dataset

[Research Applications]

  • Document Structure and Layout Analysis
    The dataset can be used to research and evaluate document structure analysis and layout understanding models by focusing on field placement and layout patterns commonly found in business documents.

  • Validation of Information Extraction and Question Answering Models
    Through tasks such as extracting specific information from resumes or application forms, the dataset supports accuracy evaluation of information extraction and question answering models based on NLP and LLM technologies.

[Industrial Applications]

  • Development of Document Processing AI (OCR / IDP)
    The dataset can be used in the development and validation of OCR and Intelligent Document Processing (IDP) systems, covering end-to-end processes from text recognition to structured field extraction for receipts and application forms.

  • Evaluation of Document Understanding in Internal LLM Systems
    It can serve as evaluation data for assessing comprehension accuracy and response validity when business documents are used as inputs in internal document search systems or AI-powered business support chatbots.

About Qlean Dataset

Qlean Dataset is a commercial-use-ready AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank Inc.
It supports a wide range of data types, including images, videos, audio, 3D assets, and text, enabling both research and commercial AI development in a legally safe environment.
Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continues to expand its specialized, industry-focused lineup, “AI Data Recipe.”
By reducing the operational burden of data collection and preparation, Qlean Dataset helps organizations establish AI development environments that are both legally compliant and risk-free.

▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup

Key Features of Qlean Dataset

  • Existing datasets deliverable within one business day

  • Custom data collection and recording services available

▶ Contact: https://qleandataset.visual-bank.co.jp/en/contact

About Visual Bank Inc.

Visual Bank Inc. is a Tokyo-based startup building next-generation data infrastructure to enhance AI development capabilities under the mission “Unlocking Data Accessibility.”
The company operates THE PEN, an AI-assisted creative tool for manga artists, and the Qlean Dataset service.
Its subsidiaries include Amana Images Inc., one of Japan’s largest stock photo providers; Qlean Dataset, which leads research and development in AI data; and THE PEN Inc., the company behind an AI-assisted creative tool for manga artists.

CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Corporate Site: https://visual-bank.co.jp/en
Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview

    © amanaimages inc.