# sources.toml Reference

The `sources.toml` file defines which data sources the indexer processes. It is located at `data/config/sources.toml`.
## File Structure

```toml
[defaults]
schema_version = "1.0"
data_dir = "data/fixtures" # or "data/sources" for production

[[sources]]
name = "pinboard"
enabled = true
db_path = "pinboard/chronicle/chronicle.db"
category = "reading"

[[sources.queries]]
table = "events"
entry_type = "bookmark"
action_type = "BookmarkAction"
object_type = "WebPage"
sql = """
SELECT
  source_id as external_id,
  json_extract(object, '$.name') as title,
  json_extract(object, '$.description') as content,
  end_time as occurred_at,
  json_extract(object, '$.url') as url
FROM events
WHERE type = 'BookmarkAction'
"""
```

## Defaults Section
| Field | Required | Description |
|---|---|---|
| `schema_version` | No | Config format version (currently `"1.0"`) |
| `data_dir` | No | Base directory for source databases. Defaults to `data/sources`. Use `data/fixtures` for testing. |
## Source Configuration

Each `[[sources]]` block defines a data source.
| Field | Required | Description |
|---|---|---|
| `name` | Yes | Unique identifier for the source |
| `enabled` | No | Whether to process this source. Defaults to `true` |
| `db_path` | Yes | Path to the SQLite database, relative to `data_dir` |
| `category` | No | Category for filtering. See categories below |
| `default_visibility` | No | Visibility for items without explicit visibility. Options: `public`, `unlisted`, `private`, `secret` |
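Putting the fields together, a source entry that uses every option might look like this (the name and paths here are illustrative, not part of the default configuration):

```toml
[[sources]]
name = "example-source"          # unique identifier
enabled = true                   # defaults to true if omitted
db_path = "example/example.db"   # relative to data_dir
category = "notes"               # see categories below
default_visibility = "private"   # applied when items carry no visibility
```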
### Categories

| Category | Use for |
|---|---|
| `reading` | Bookmarks, highlights, RSS |
| `music` | Listening history, scrobbles |
| `social` | Posts, messages, replies |
| `comms` | Email, chat, DMs |
| `productivity` | Tasks, time tracking |
| `browse` | Browser history, searches |
| `notes` | Notes, documents |
| `photos` | Images, screenshots |
| `location` | Check-ins, GPS logs |
| `code` | Commits, issues, PRs |
| `ai` | AI conversations |
| `curation` | Collections, boards |
| `video` | Watch history |
| `calendar` | Events, meetings |
## Query Configuration

Each source can have one or more `[[sources.queries]]` blocks.
| Field | Required | Description |
|---|---|---|
| `table` | No | Table name used for validation |
| `entry_type` | Yes | Type of entry (e.g., `"bookmark"`, `"listen"`, `"note"`) |
| `action_type` | No | Schema.org action type (e.g., `"BookmarkAction"`) |
| `object_type` | No | Schema.org object type (e.g., `"WebPage"`) |
| `sql` | Yes | SQL query to extract events |
## Required SQL Columns

Your SQL query should return these columns:
| Column | Required | Description |
|---|---|---|
| `external_id` | Yes | Unique ID within the source. Also accepts `source_id` or `id` |
| `title` | Yes | Display title |
| `content` | No | Full text for search indexing |
| `occurred_at` | Yes | When the event happened. Also accepts `timestamp`, `created_at`, `date`, etc. |
| `url` | No | Link to the original item |
| `visibility` | No | Item visibility level |
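As a sketch, a minimal query satisfying the required columns could look like this. The `items` table and its column names are hypothetical; only the output aliases matter:

```sql
-- Hypothetical source table; aliases map source columns onto the expected names.
SELECT
  id         AS external_id,  -- required: unique within this source
  name       AS title,        -- required: display title
  NULL       AS content,      -- optional: return NULL if you have no body text
  created_at AS occurred_at,  -- required: any supported timestamp format
  NULL       AS url           -- optional: link to the original item
FROM items
```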
## Timestamp Handling

The indexer automatically parses many timestamp formats:

- Unix timestamps: `1704067200` (seconds) or `1704067200000` (milliseconds)
- ISO 8601: `2024-01-01T00:00:00Z`
- Date strings: `2024-01-01`
- Various formats: `Jan 1, 2024`, `01/01/2024`, etc.
If parsing fails, the event uses the current time with a warning.
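If you prefer to normalize timestamps in SQL rather than rely on auto-detection, SQLite's built-ins can do the conversion. A sketch, with illustrative column names (`created_at`, `ts_ms`):

```sql
-- ISO 8601 datetime string -> Unix seconds
SELECT strftime('%s', created_at) AS occurred_at FROM items;

-- Millisecond Unix timestamp -> seconds (both forms are accepted; shown for explicitness)
SELECT CAST(ts_ms / 1000 AS INTEGER) AS occurred_at FROM items;
```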
## Aggregation

For high-volume sources (like time tracking), you can enable aggregation:

```toml
[[sources]]
name = "timing"
enabled = true
db_path = "timing/chronicle/chronicle.db"

[sources.aggregation]
strategy = "daily"
key_fields = ["title"]
```

Available strategies:
- `daily` — Group events by date + key fields (e.g., 2.2M records → ~50K daily aggregates)
- `hourly` — Group events by hour + key fields
- `session` — Group events with gaps less than `time_bucket` seconds (default 30 minutes)
| Field | Description |
|---|---|
| `strategy` | Aggregation period: `daily`, `hourly`, `session`, or `none` |
| `key_fields` | Fields to group by |
| `time_bucket` | For `session` strategy: gap threshold in seconds (default 1800 = 30 min) |
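For example, a session strategy with a custom gap threshold might look like this (the field values are illustrative):

```toml
[sources.aggregation]
strategy = "session"
key_fields = ["title"]
time_bucket = 900  # events less than 15 minutes apart join the same session
```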
## Configured Sources

The default configuration includes 23 sources:
### Reading

- pinboard — Bookmarks from Pinboard
- readwise — Highlights from Readwise

### Music

- spotify — Listening history from Spotify
- lastfm — Scrobbles from Last.fm
- apple-podcasts — Podcast episodes
### Messaging

- imessage — iMessage conversations
- linkedin — LinkedIn messages
### Productivity

- things — Tasks from Things app
- timing — App usage from Timing
### Browse

- safari — Safari browsing history
- chrome — Chrome history (from Google Takeout)
- google-search — Search queries

### Notes

- apple-notes — Notes from Apple Notes
- notion — Pages from Notion
### Photos

- apple-photos — Photos with metadata
### Location

- foursquare — Check-ins

### Code

- github — GitHub activity

### AI

- claude — Claude conversation exports
### Curation

- arena — Are.na blocks and channels
### Social

- twitter-* — Twitter/X archive (supports multiple accounts)

### Video

- youtube — YouTube watch history
### Calendar

- gcal — Google Calendar events
## Example: Adding a Custom Source

Here’s a complete example for adding a custom notes database:

```toml
[[sources]]
name = "my-notes"
enabled = true
db_path = "my-notes/notes.db"
category = "notes"
default_visibility = "private"

[[sources.queries]]
table = "notes"
entry_type = "note"
action_type = "CreateAction"
object_type = "NoteDigitalDocument"
sql = """
SELECT
  id as external_id,
  title,
  body as content,
  strftime('%s', created_at) as occurred_at,
  NULL as url
FROM notes
WHERE deleted_at IS NULL
ORDER BY created_at DESC
"""
```

Key points:
- Use `strftime('%s', ...)` to convert datetime columns to Unix timestamps
- Filter out deleted items in the WHERE clause
- Return `NULL` for optional columns you don’t have
- Set `default_visibility` to control indexing
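Before wiring a query into `sources.toml`, it can help to dry-run it with the `sqlite3` CLI. This sketch builds a throwaway database matching the hypothetical notes schema above and runs the extraction query against it:

```shell
# Create a scratch database with the example schema and one row
rm -f /tmp/notes-test.db
sqlite3 /tmp/notes-test.db "
CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, body TEXT,
                    created_at TEXT, deleted_at TEXT);
INSERT INTO notes (title, body, created_at)
VALUES ('Hello', 'First note', '2024-01-01 12:00:00');"

# Dry-run the extraction query; each row should show external_id, title,
# content, and occurred_at as Unix seconds
sqlite3 /tmp/notes-test.db "
SELECT id as external_id, title, body as content,
       strftime('%s', created_at) as occurred_at
FROM notes WHERE deleted_at IS NULL;"
```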
## Validating Configuration

Check your configuration before indexing:

```shell
cd packages/otso-indexer
cargo run --release -- validate
```

This checks:
- TOML syntax
- Required fields
- Table existence
- SQL syntax (via EXPLAIN)
## CLI Reference

```shell
# Build all enabled sources
cargo run --release -- build

# Build a specific source
cargo run --release -- build --source pinboard

# Rebuild search index only (skip event store)
cargo run --release -- build --meili-only

# Show statistics
cargo run --release -- stats

# Validate configuration
cargo run --release -- validate

# Full rebuild from event store
cargo run --release -- rebuild
```