# ra-mcp (WIP)
## MCPs for Riksarkivet
An MCP server and command-line tools for searching and browsing transcribed historical documents from the Swedish National Archives (Riksarkivet).
## Features
- **Full-text search** across millions of transcribed historical documents
- **Complete page transcriptions** with accurate text extraction from historical manuscripts
- **Reference-based document browsing** using official archive reference codes
- **Contextual search highlighting** to identify relevant content quickly
- **High-resolution image access** to original document scans via IIIF
## Getting Started
### Quick Setup
```bash
# Search for anything - uv will auto-install dependencies
uv run tools/ra.py search "Stockholm"
```
## How to Use
### 1. Search for Keywords
Find documents containing specific words or phrases:
```bash
# Basic search
uv run tools/ra.py search "Stockholm"
# Search with full page transcriptions
uv run tools/ra.py search "trolldom" --context --max-pages 5
# Search without document grouping
uv run tools/ra.py search "vasa" --context --no-grouping --max-pages 3
```
**Options:**
- `--max N` - Maximum search results (default: 50)
- `--max-display N` - Maximum results to display (default: 20)
- `--context` - Show full page transcriptions
- `--max-pages N` - Maximum pages to load context for (default: 10)
- `--no-grouping` - Show pages individually instead of grouped by document
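To illustrate what grouping by document means in practice, here is a minimal sketch that collects page hits under their reference code. It is not the tool's actual implementation: the function name `group_hits_by_document` and the `(reference_code, page_number)` tuple shape are assumptions made for this example.

```python
def group_hits_by_document(hits):
    """Group (reference_code, page_number) hits by reference code.

    Hypothetical helper for illustration; documents keep first-seen order.
    """
    grouped = {}
    for reference_code, page_number in hits:
        grouped.setdefault(reference_code, []).append(page_number)
    return grouped


# Made-up hits; with --no-grouping each tuple would be shown on its own.
hits = [("SE/RA/123", 5), ("SE/RA/456", 2), ("SE/RA/123", 7)]
for reference_code, pages in group_hits_by_document(hits).items():
    print(reference_code, pages)
```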
### 2. Browse Specific Documents
When you find interesting documents, browse them directly:
```bash
# View single page
uv run tools/ra.py browse "SE/RA/123" --page 5
# View page range
uv run tools/ra.py browse "SE/RA/123" --pages "1-10"
# View specific pages with search highlighting
uv run tools/ra.py browse "SE/RA/123" --page "5,7,9" --search-term "Stockholm"
```
**Options:**
- `--page` or `--pages` - Page numbers (e.g., "5", "1-10", "5,7,9")
- `--search-term` - Highlight this term in the text
- `--max-display N` - Maximum pages to display (default: 20)
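The page formats accepted above (`"5"`, `"1-10"`, `"5,7,9"`, and mixes such as `"1,5,10-12"`) can be expanded into a list of page numbers along these lines. This is a sketch only; `parse_page_spec` is a hypothetical name and the tool may parse and validate its input differently.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a page spec such as "1,5,10-12" into sorted, unique page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_spec("1,5,10-12"))
```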
### 3. Get Full Context
See complete pages with surrounding context for better understanding:
```bash
# Find pages with keyword and show full transcriptions
uv run tools/ra.py show-pages "Stockholm" --max-pages 5
# Include surrounding pages for context
uv run tools/ra.py show-pages "trolldom" --context-padding 2
# Show pages individually
uv run tools/ra.py show-pages "vasa" --no-grouping
```
**Options:**
- `--max-pages N` - Maximum pages to display (default: 10)
- `--context-padding N` - Include N pages before/after each hit (default: 1)
- `--no-grouping` - Show pages individually instead of grouped by document
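As a rough picture of what `--context-padding N` does, the sketch below widens each hit page by N pages on both sides and marks which pages are hits and which are only context. The helper name and its `first_page`/`last_page` bounds are assumptions for this example, not the tool's real code.

```python
def expand_with_padding(hit_pages, padding, first_page=1, last_page=None):
    """Return (page, is_hit) pairs covering each hit plus `padding` pages around it."""
    selected = set()
    for page in hit_pages:
        low = max(first_page, page - padding)
        high = page + padding
        if last_page is not None:
            high = min(last_page, high)
        selected.update(range(low, high + 1))
    hit_set = set(hit_pages)
    return [(page, page in hit_set) for page in sorted(selected)]


# Hits on pages 1 and 5 with one page of padding on each side.
for page, is_hit in expand_with_padding([1, 5], padding=1):
    print(page, "hit" if is_hit else "context")
```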
## Output Features
### Search Results
- **Grouped by document** for better context
- **Institution and date** information
- **Page numbers** with search hits
- **Snippet previews** with keyword highlighting
- **Browse command examples** for further exploration
### Full Page Display
- **Complete transcriptions** from ALTO XML
- **Keyword highlighting** in yellow
- **Document metadata** (title, date, hierarchy)
- **Direct links** to images, ALTO XML, and Bildvisning
- **Context pages** marked clearly
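Keyword highlighting of this kind can be approximated with a case-insensitive regular expression. The sketch below wraps matches in ANSI escape codes for a yellow background; that choice is an assumption for illustration, and the tool may render its highlights in another way.

```python
import re

YELLOW_BACKGROUND = "\033[43m"  # ANSI escape code, assumed for this sketch
RESET = "\033[0m"


def highlight(text, term, start=YELLOW_BACKGROUND, end=RESET):
    """Wrap every case-insensitive occurrence of `term` in start/end markers."""
    if not term:
        return text
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # A function replacement keeps the original casing of each match.
    return pattern.sub(lambda match: f"{start}{match.group(0)}{end}", text)


print(highlight("Brev från Stockholm till stockholms slott", "Stockholm"))
```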
### Links Provided
- **ALTO XML** - Full transcription data
- **IIIF Images** - High-resolution document images
- **Bildvisning** - Interactive viewer with search highlighting
- **Collections & Manifests** - IIIF metadata
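For readers who want to work with the ALTO XML links directly: in the ALTO format, text is stored in `String` elements (in their `CONTENT` attribute) nested inside `TextLine` elements. The sketch below turns such a file into plain text lines. The sample document is invented for this example, and matching on local tag names is a way to stay independent of whichever ALTO namespace version the files actually use.

```python
import xml.etree.ElementTree as ET

# Invented sample; real ALTO files carry far more layout detail.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Stockholm"/><SP/><String CONTENT="den"/></TextLine>
    <TextLine><String CONTENT="5"/><String CONTENT="maj"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_to_lines(alto_xml):
    """Return one string per TextLine, joining its String CONTENT values with spaces."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        words = [word for word in words if word]
        if words:
            lines.append(" ".join(words))
    return lines


print("\n".join(alto_to_lines(SAMPLE_ALTO)))
```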
## Examples
### Basic Workflow
1. **Search for a keyword:**
```bash
uv run tools/ra.py search "Stockholm"
```
2. **Get full context for interesting hits:**
```bash
uv run tools/ra.py search "Stockholm" --context --max-pages 3
```
3. **Browse specific documents:**
```bash
uv run tools/ra.py browse "SE/RA/123456" --page "10-15" --search-term "Stockholm"
```
### Advanced Usage
```bash
# Comprehensive search with context
uv run tools/ra.py show-pages "handelsbalansen" --context-padding 2 --max-pages 8
# Targeted document browsing
uv run tools/ra.py browse "SE/RA/760264" --pages "1,5,10-12" --search-term "export"
# Large search with selective display
uv run tools/ra.py search "järnväg" --max 100 --max-display 30
```
## Technical Details
### Riksarkivet APIs & Data Sources
This tool integrates with multiple Riksarkivet APIs to provide comprehensive access to historical documents:
#### Current Integrations
- **[Search API](https://data.riksarkivet.se/api/records)** - Primary endpoint for full-text search across transcribed materials ([Documentation](https://github.com/Riksarkivet/dataplattform/wiki/Search-API))
- **[IIIF Collections](https://lbiiif.riksarkivet.se/collection/arkiv)** - Access to digitized document collections via IIIF standard ([Documentation](https://github.com/Riksarkivet/dataplattform/wiki/IIIF))
- **[ALTO XML](https://sok.riksarkivet.se/dokument/alto)** - Structured text transcriptions with precise positioning data
- **[IIIF Images](https://lbiiif.riksarkivet.se)** - High-resolution document images with zoom and cropping capabilities
- **[Bildvisning](https://sok.riksarkivet.se/bildvisning)** - Interactive document viewer with search highlighting
- **[OAI-PMH](https://oai-pmh.riksarkivet.se/OAI)** - Metadata harvesting for archive records and references ([Documentation](https://github.com/Riksarkivet/dataplattform/wiki/OAI-PMH))
#### Additional Resources
The [Riksarkivet Data Platform Wiki](https://github.com/Riksarkivet/dataplattform/wiki) provides comprehensive documentation for building additional MCP integrations.
#### Experimental Features
- **[Förvaltningshistorik](https://forvaltningshistorik.riksarkivet.se/Index.htm)** - Semantic search interface (under evaluation)
- **AI-Riksarkivet HTRflow** - Handwritten text recognition pipeline (PyPI package)
## Troubleshooting
### Common Issues
1. **No results found**: Try broader search terms or check spelling
2. **Page not loading**: Some pages may not have transcriptions available
3. **Network timeouts**: The tool includes retry logic, but very slow connections may still time out
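For context on the timeout behaviour, retry logic is often written as a loop with exponential backoff. The sketch below shows that general pattern only; the attempt count, delays, and exception types are assumptions and do not describe the tool's actual settings.

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation`, retrying on timeouts/connection errors with doubling delays."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in operation that times out twice before succeeding.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "page text"


print(with_retries(flaky_fetch, base_delay=0.01))
```

A connection that stays slow through every attempt still ends in an error, which matches the timeout case described above.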
### Getting Help
```bash
uv run tools/ra.py --help
uv run tools/ra.py search --help
uv run tools/ra.py browse --help
uv run tools/ra.py show-pages --help
```
## MCP Server Development
### Running the MCP Server
```bash
# Install dependencies
uv sync && uv pip install -e .
# Run the main MCP server (stdio)
cd src/ra_mcp && python server.py
# Run with SSE/HTTP transport on port 8000
cd src/ra_mcp && python server.py --http
```
### Testing with MCP Inspector
Use the [MCP Inspector](https://github.com/modelcontextprotocol/inspector) to test and debug the MCP server:
```bash
# Test the server interactively
npx @modelcontextprotocol/inspector uv run python src/ra_mcp/server.py
```
The MCP Inspector provides a web interface to test server tools, resources, and prompts during development.

___
## Current MCP Server Implementation
```text
This server provides access to the Swedish National Archives (Riksarkivet) through multiple APIs.
SEARCH-BASED WORKFLOW (start here):
- search_records: Search for content by keywords (e.g., "coffee", "medical records")
- get_collection_info: Explore what's available in a collection
- get_all_manifests_from_pid: Get all image batches from a collection
- get_manifest_info: Get details about a specific image batch
- get_manifest_image: Download specific images from a batch
- get_all_images_from_pid: Download all images from a collection
URL BUILDING TOOLS:
- build_image_url: Build IIIF Image URLs with custom parameters
- get_image_urls_from_manifest: Get all URLs from an image batch
- get_image_urls_from_pid: Get all URLs from a collection
TYPICAL WORKFLOW:
1. search_records("your keywords") → find PIDs
2. get_collection_info(pid) → see what's available
3. get_manifest_info(manifest_id) → explore specific image batch
4. get_manifest_image(manifest_id, image_index) → download specific image
Example PID: LmOmKigRrH6xqG3GjpvwY3
```
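As background for the URL building tools, IIIF Image API URLs follow the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below assembles such a URL; it is not the server's `build_image_url` implementation. The function name `iiif_image_url`, the identifier `example-image-id`, and the `max` size keyword (Image API 3.0; version 2 servers use `full`) are assumptions for illustration.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", image_format="jpg"):
    """Assemble a IIIF Image API URL from its path components."""
    # The identifier is percent-encoded so that characters such as "/" stay in one segment.
    encoded_identifier = quote(identifier, safe="")
    return (
        f"{base_url.rstrip('/')}/{encoded_identifier}"
        f"/{region}/{size}/{rotation}/{quality}.{image_format}"
    )


# "example-image-id" is a placeholder, not a real Riksarkivet identifier.
print(iiif_image_url("https://lbiiif.riksarkivet.se", "example-image-id"))
print(iiif_image_url("https://lbiiif.riksarkivet.se", "example-image-id",
                     region="0,0,1000,1000", size="500,"))
```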
___