

DungBeetle is a lightweight, single binary distributed job server designed for queuing and asynchronously executing large numbers of SQL read jobs (eg: reports) against SQL databases. When the read jobs are executed, the results are written to separate ephemeral results databases (where the results of every job is its own dedicated table), enabling faster retrieval.
A prominent usecase is user facing report generation in applications where requests can be queued and reports returned asynchronously without overloading large source databases.
Features
- Supports MySQL, PostgreSQL, ClickHouse as source databases.
- Supports MySQL and PostgreSQL as result / cache databases for job results.
- Standalone server that exposes HTTP APIs for managing jobs and groups of jobs (list, post, status check, cancel jobs).
- Reads SQL queries from .sql files and registers them as jobs ready to be queued.
- Supports multi-process, multi-threaded, asynchronous distributed job queueing via a common broker backend (Redis, AMQP etc.)
Usecase
Consider an application with a very large SQL database. When there are several thousand concurrent users requesting reports from an application connected to it, every second of I/O delay during query execution can bottleneck the application and the database, causing a snowball effect.
Instead, user requests for report generations can be deferred to a job queue in the backend, there by immediately freeing up the frontend application. The reports are presented to users as they're executed (frontend polls the job's status and prevents the user from sending any more queries). DungBeetle server and worker instances also act as traffic control and prevent the primary database from being indundated with requests.
Once the reports are generated (SQL queries finish executing), it's natural for users to further transform the results by slicing, sorting, and filtering, generating additional queries to the primary database. To offset this load, these subsequent queries can be sent to the smaller, much faster results cache database. These results are of course ephemeral and can be thrown away or expired.

Concepts
Task
A task is a named SQL query that is loaded into the server on startup. Tasks are defined in .sql files in the simple goyesql format. Such queries are self-contained and produce the desired final output with neatly named columns. They can take arbitrary positional arguments for execution. A task can be attached to one or more specific databases defined in the configuration using the -- db: tag. In case of multiple databases, the query will be executed against a random one from the list, unless a specific database is specified in the API request (db). A -- queue: tag to always route the task to a particular queue, unless it's overriden by the queue param when making a job request. A -- results: tag specifies the results backend to which the results of a task will be written. If there are multiple result backends specified, the results are written a random one.