# Chunkr

## About

Parse PDFs, images, and spreadsheets into LLM-ready HTML/Markdown or JSON. Features include OCR, layout detection, reading order, bounding boxes, citations, and schema-based extraction.

- Verified: Yes

## Services

### AI-Powered Data & Document Analysis
- [AI Document and Data Analysis](https://bilarna.com/ai/ai-powered-data-and-document-analysis/ai-document-and-data-analysis)

### Document Processing & Data Extraction
- [Document Parsing & Data Extraction](https://bilarna.com/ai/document-processing-and-data-extraction/document-parsing-and-data-extraction)

## Frequently Asked Questions

**Q: What types of documents can be processed by document intelligence APIs?**
A: Document intelligence APIs can process PDFs, images, and spreadsheets, among other document types. They extract structured data from complex documents using techniques such as OCR (Optical Character Recognition), layout detection, and schema-based extraction. The result is machine-readable output in HTML, Markdown, or JSON, ready for further analysis or for use with large language models.
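
As a rough illustration of what schema-based extraction means, the sketch below filters raw extracted fields against a declared schema and coerces each value to its expected type. The schema, field names, and the `apply_schema` helper are hypothetical and chosen for illustration only; they are not taken from any provider's API.

```python
import json

# Hypothetical schema (an assumption for this sketch): field name -> expected type.
INVOICE_SCHEMA = {"invoice_number": str, "total": float, "currency": str}


def apply_schema(raw_fields, schema):
    """Keep only the fields named in the schema and coerce each value.

    Fields that are missing, or whose value cannot be coerced to the
    declared type, are set to None.
    """
    result = {}
    for name, expected_type in schema.items():
        value = raw_fields.get(name)
        try:
            result[name] = expected_type(value) if value is not None else None
        except (TypeError, ValueError):
            result[name] = None
    return result


# Made-up raw fields, as an extraction step might produce them.
raw = {"invoice_number": "INV-001", "total": "1250.50", "notes": "n/a"}
record = apply_schema(raw, INVOICE_SCHEMA)
print(json.dumps(record, indent=2))
```

The `notes` field is dropped because the schema does not declare it, and `currency` comes back as `None` because it was never extracted.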

**Q: How does OCR technology enhance data extraction from documents?**
A: OCR (Optical Character Recognition) converts scanned paper documents and images into editable, searchable text. In document intelligence systems, OCR recognizes and digitizes the text inside images and scanned PDFs, which makes available textual information that would otherwise be inaccessible to automated processing. Combined with layout detection and schema-based extraction, OCR lets document intelligence APIs parse complex documents and convert them into structured formats such as JSON or HTML.
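
To show why OCR output usually needs a layout step, here is a minimal sketch that puts recognized words into reading order using only their bounding boxes. It assumes a single-column page and a simple top-to-bottom, left-to-right heuristic; the `Word` class, the `reading_order` function, and the coordinates are hypothetical, and production layout detection is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float       # left edge of the bounding box
    y: float       # top edge of the bounding box
    height: float  # box height, used as the line-grouping scale


def reading_order(words, line_tolerance=0.5):
    """Group words into lines by vertical position, then sort each line by x."""
    lines = []
    for word in sorted(words, key=lambda w: w.y):
        # Join the current line if this word's top is close to the line's first word.
        if lines and abs(word.y - lines[-1][0].y) <= line_tolerance * word.height:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


# Made-up OCR output, deliberately out of order.
words = [
    Word("world", x=60, y=11, height=12),
    Word("Hello", x=10, y=10, height=12),
    Word("line", x=70, y=40, height=12),
    Word("Second", x=10, y=41, height=12),
]
ordered = reading_order(words)
print(ordered)
```

Multi-column pages, tables, and rotated text would defeat this heuristic, which is one reason OCR is paired with dedicated layout detection.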

**Q: What output formats are commonly supported by document parsing APIs?**
A: Document parsing APIs typically support output formats that are easy to integrate and process further. HTML and Markdown preserve the document's structure and suit web or text-based applications. JSON is also widely supported because it is a flexible, structured format well suited to programmatic access. With these formats, developers can turn complex documents into machine-readable data for analytics, machine learning, or input to large language models.
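
The relationship between these formats can be sketched with a small example that renders the same parsed content as both Markdown and JSON. The segment structure (`type` and `text` keys) and the `to_markdown` helper are assumptions made for illustration, not the output format of any specific API.

```python
import json

# Hypothetical parsed output: segments already in reading order.
segments = [
    {"type": "title", "text": "Quarterly Report"},
    {"type": "paragraph", "text": "Revenue grew in Q3."},
    {"type": "list_item", "text": "North America"},
]


def to_markdown(segments):
    """Render segments as Markdown, one block per segment."""
    # Segment types without a prefix (such as paragraphs) are emitted as plain text.
    prefixes = {"title": "# ", "list_item": "- "}
    return "\n\n".join(prefixes.get(s["type"], "") + s["text"] for s in segments)


markdown = to_markdown(segments)   # human- and LLM-readable text
as_json = json.dumps(segments, indent=2)  # structured form for programs

print(markdown)
print(as_json)
```

The Markdown rendering keeps the visible structure for text-based consumers, while the JSON form keeps each segment's type available for programmatic filtering.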

## Links

- Profile: https://bilarna.com/provider/chunkr
- Structured data: https://bilarna.com/provider/chunkr/agent.json
- API schema: https://bilarna.com/provider/chunkr/openapi.yaml