## System Architecture Design

Before explaining the architecture of the scheduling system, let's first go over the terms commonly used in it.

### 1. Glossary

**DAG**: Short for Directed Acyclic Graph. Tasks in a workflow are assembled into a directed acyclic graph, which is
traversed topologically, starting from the nodes with zero in-degree and continuing until no successor nodes remain.
An example is shown below:

![about-glossary](../../../img/new_ui/dev/about/glossary.png)
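
The traversal described above is, in general terms, Kahn's algorithm for topological sorting. The following is a
minimal Java sketch of that generic technique, not DolphinScheduler's own code; the class name `DagTraversal` and the
adjacency-map representation (each task name mapped to its direct successors) are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating Kahn's algorithm; not part of DolphinScheduler.
final class DagTraversal {

    /**
     * Returns the task names in a topological order.
     * edges maps each task to its direct successors.
     * Throws IllegalArgumentException if the graph contains a cycle.
     */
    static List<String> topologicalOrder(Map<String, List<String>> edges) {
        // Count incoming edges for every node, including nodes that only appear as successors.
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String node : edges.keySet()) {
            inDegree.putIfAbsent(node, 0);
        }
        for (List<String> successors : edges.values()) {
            for (String s : successors) {
                inDegree.merge(s, 1, Integer::sum);
            }
        }

        // Traversal starts from every node with zero in-degree.
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((node, deg) -> {
            if (deg == 0) ready.add(node);
        });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            // Finishing a node removes its outgoing edges; successors whose
            // in-degree drops to zero become ready to run.
            for (String s : edges.getOrDefault(node, List.of())) {
                if (inDegree.merge(s, -1, Integer::sum) == 0) {
                    ready.add(s);
                }
            }
        }

        // Nodes on a cycle never reach zero in-degree, so they are never visited.
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("graph contains a cycle");
        }
        return order;
    }
}
```

The final size check is what enforces the "acyclic" part: a cycle leaves some nodes with a non-zero in-degree forever,
so a traversal that does not visit every node signals an invalid graph.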

**Process definition**: A **DAG** built visually by dragging task nodes and establishing associations between them.

**Process instance**: A process instance is the instantiation of a process definition and can be generated by a
manual start or a scheduled trigger. Each run of a process definition generates a new process instance.

**Task instance**: An instantiation of a task node within a process definition, representing a specific execution
of that task.
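
To make the definition/instance distinction concrete, here is a simplified Java model. All type names
(`ProcessDefinition`, `ProcessInstance`, `TaskInstance`, `TriggerType`, `InstanceFactory`) are hypothetical and do not
reflect the project's actual entity classes; for simplicity, the sketch creates one task instance per task node as
soon as the process instance starts.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model for illustration; not DolphinScheduler's real entity classes.
enum TriggerType { MANUAL, SCHEDULED }

// The reusable template: a named DAG of task nodes.
record ProcessDefinition(String name, List<String> taskNodes) {}

// One concrete execution of a single task node inside one process instance.
record TaskInstance(long processInstanceId, String taskNode) {}

// One concrete run of a definition, tagged with how it was triggered.
record ProcessInstance(long id, ProcessDefinition definition, TriggerType trigger,
                       List<TaskInstance> tasks) {}

final class InstanceFactory {
    private static final AtomicLong NEXT_ID = new AtomicLong(1);

    /** Every run of a definition produces a fresh process instance with its own task instances. */
    static ProcessInstance run(ProcessDefinition definition, TriggerType trigger) {
        long id = NEXT_ID.getAndIncrement();
        List<TaskInstance> tasks = definition.taskNodes().stream()
                .map(node -> new TaskInstance(id, node))
                .toList();
        return new ProcessInstance(id, definition, trigger, tasks);
    }
}
```

Running the same definition twice, once manually and once on a schedule, yields two distinct process instances that
share one definition.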

**Task type**: Currently supports SHELL, SQL, SUB_WORKFLOW, PROCEDURE, MR, SPARK, PYTHON, and DEPENDENT (dependency)
tasks, with dynamic plug-in extension planned. Note: a **SUB_WORKFLOW** task must be associated with another workflow
definition, which is itself a separate process definition that can be started and executed independently.

**Scheduling method**: The system supports scheduled triggering (based on cron expressions) and manual triggering.
Supported command types are: start workflow, start execution from the current node, resume a fault-tolerant workflow,
resume a paused process, start execution from the failed node, complement, timing, rerun, pause, stop, and resume
waiting thread. Of these, **Resume fault-tolerant workflow** and **Resume waiting thread** are used internally by
scheduling control and cannot be invoked externally.
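
A minimal sketch of how the command types and the internal-only restriction might be modeled. The constant names,
the `externallyInvocable` flag, and the `CommandGate` class are assumptions for illustration, mapped one-to-one onto
the command types listed above; consult the project source for the real enum.

```java
// Illustrative command types; names and the flag are assumptions, not the project's actual enum.
enum CommandType {
    START_PROCESS(true),                    // start workflow
    START_CURRENT_TASK_PROCESS(true),       // start execution from the current node
    RECOVER_TOLERANCE_FAULT_PROCESS(false), // resume fault-tolerant workflow (internal only)
    RECOVER_SUSPENDED_PROCESS(true),        // resume a paused process
    START_FAILURE_TASK_PROCESS(true),       // start execution from the failed node
    COMPLEMENT_DATA(true),                  // complement
    SCHEDULER(true),                        // timing
    REPEAT_RUNNING(true),                   // rerun
    PAUSE(true),
    STOP(true),
    RECOVER_WAITING_THREAD(false);          // resume waiting thread (internal only)

    private final boolean externallyInvocable;

    CommandType(boolean externallyInvocable) {
        this.externallyInvocable = externallyInvocable;
    }

    boolean isExternallyInvocable() {
        return externallyInvocable;
    }
}

// Hypothetical entry point for commands arriving from outside the scheduler (e.g. an API call).
final class CommandGate {

    /** Rejects command types reserved for internal scheduling control. */
    static void submitExternal(CommandType type) {
        if (!type.isExternallyInvocable()) {
            throw new IllegalArgumentException(type + " is reserved for internal scheduling control");
        }
        // Enqueuing the accepted command is omitted in this sketch.
    }
}
```

Keeping the restriction on the enum itself means any external entry point can enforce it with a single check.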

**Scheduled**: The system uses the **Quartz** distributed scheduler and supports visual generation of cron expressions.

**Dependencies**: Besides simple **DAG** dependencies between predecessor and successor nodes, the system also
provides **task dependent** nodes, which support dependencies **between processes**.

**Priority**: Supports priority settings for both process instances and task instances. If no priority is specified,
the system defaults to a first-in, first-out (FIFO) execution order.
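
Priority ordering with a FIFO fallback can be sketched with a priority queue whose comparator breaks ties by arrival
order. The five `Priority` levels, the `TaskQueue` class, and the choice of `MEDIUM` as the default for an unspecified
priority are assumptions for illustration, not the scheduler's actual queue.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical priority levels; declaration order is the run order (HIGHEST first).
enum Priority { HIGHEST, HIGH, MEDIUM, LOW, LOWEST }

// Hypothetical queue; not DolphinScheduler's implementation.
final class TaskQueue {
    private record Entry(String task, Priority priority, long seq) {}

    private final AtomicLong sequence = new AtomicLong();

    // Order by priority first, then by arrival sequence so equal priorities run FIFO.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
            Comparator.comparing(Entry::priority).thenComparingLong(Entry::seq));

    /** Enqueues a task; a null priority falls back to MEDIUM (an assumption of this sketch). */
    void offer(String task, Priority priority) {
        Priority p = (priority == null) ? Priority.MEDIUM : priority;
        queue.add(new Entry(task, p, sequence.getAndIncrement()));
    }

    /** Returns the next task to run, or null when the queue is empty. */
    String poll() {
        Entry e = queue.poll();
        return (e == null) ? null : e.task();
    }
}
```

When no task specifies a priority, every entry compares equal on priority and the sequence number alone decides, so
the queue degrades to plain FIFO.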

**Email alert**: Supports emailing **SQL task** query results, email alerts for process instance run results, and
fault-tolerance alert notifications.

**Failure strategy**: For workflows with parallel task execution, the system provides two failure handling strategies.
**Continue** means that if a task fails, the system keeps executing the other parallel tasks until they complete,
regardless of the failure. The overall process is marked as failed only after all parallel tasks have finished running.
**End** means that as soon as a task fails, the system marks the process as failed and terminates any parallel tasks
that are still running.
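
The two failure strategies can be illustrated with a standard `ExecutorService`. This is a hedged sketch, not the
scheduler's implementation: `ParallelRunner`, `FailureStrategy`, and `ProcessState` are hypothetical names, each task is
a `Callable<Boolean>` that returns `true` on success, and terminating a running task is modeled as
`Future.cancel(true)`, which only interrupts the task's thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

enum FailureStrategy { CONTINUE, END }

enum ProcessState { SUCCESS, FAILURE }

// Hypothetical runner illustrating the CONTINUE and END strategies.
final class ParallelRunner {

    static ProcessState run(List<Callable<Boolean>> tasks, FailureStrategy strategy) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            // Completion service hands back futures in the order the tasks finish.
            ExecutorCompletionService<Boolean> done = new ExecutorCompletionService<>(pool);
            List<Future<Boolean>> futures = new ArrayList<>();
            for (Callable<Boolean> task : tasks) {
                futures.add(done.submit(task));
            }
            boolean anyFailed = false;
            for (int i = 0; i < tasks.size(); i++) {
                boolean ok;
                try {
                    ok = Boolean.TRUE.equals(done.take().get());
                } catch (ExecutionException e) {
                    ok = false; // a task that throws counts as failed
                }
                if (!ok) {
                    anyFailed = true;
                    if (strategy == FailureStrategy.END) {
                        // END: fail immediately and interrupt tasks that are still running.
                        futures.forEach(f -> f.cancel(true));
                        return ProcessState.FAILURE;
                    }
                    // CONTINUE: keep waiting for the remaining tasks.
                }
            }
            return anyFailed ? ProcessState.FAILURE : ProcessState.SUCCESS;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return ProcessState.FAILURE;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Reacting to results in completion order means that under `END` the first failure is noticed without waiting for
slower tasks, while under `CONTINUE` the verdict is only reached after every task has reported.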

**Complement**: Backfills historical data. Supports two complement modes, **interval parallel** and **serial**, and
two date selection methods, **date range** and **date enumeration**.
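
A sketch of the two date selection methods and of splitting the selected dates into intervals. The `Complement` class
is hypothetical, and the assumption that interval-parallel mode splits the dates into contiguous, near-equal intervals
(with serial mode as the single-interval case, processed date by date) is illustrative rather than taken from the
source.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper; not DolphinScheduler's complement implementation.
final class Complement {

    /** Date range selection: every day from start to end, both inclusive. */
    static List<LocalDate> fromRange(LocalDate start, LocalDate end) {
        List<LocalDate> dates = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            dates.add(d);
        }
        return dates;
    }

    /** Date enumeration selection: explicitly picked dates, de-duplicated and sorted. */
    static List<LocalDate> fromEnumeration(List<LocalDate> picked) {
        return new ArrayList<>(new TreeSet<>(picked));
    }

    /**
     * Splits the dates into at most `parallelism` contiguous intervals of near-equal size.
     * parallelism = 1 yields a single interval, i.e. the serial case.
     */
    static List<List<LocalDate>> intervals(List<LocalDate> dates, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        int n = dates.size();
        int parts = Math.min(parallelism, n);
        List<List<LocalDate>> result = new ArrayList<>();
        int from = 0;
        for (int i = 0; i < parts; i++) {
            // Spread the remainder over the first (n % parts) intervals.
            int size = n / parts + (i < n % parts ? 1 : 0);
            result.add(new ArrayList<>(dates.subList(from, from + size)));
            from += size;
        }
        return result;
    }
}
```

For example, a five-day range split with a parallelism of 2 gives one interval of three days and one of two days, and
each interval could then be backfilled independently.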

### 2. Module introduction

- dolphinscheduler-master: master module, providing workflow management and orchestration.

- dolphinscheduler-worker: worker module, providing task execution management.

- dolphinscheduler-alert: alert module, providing the AlertServer service.

- dolphinscheduler-api: web application module, providing the ApiServer service.

- dolphinscheduler-common: common module, containing shared constants, enumerations, utility classes, data structures,
  and base classes.

- dolphinscheduler-dao: data access module, providing operations such as database access.

- dolphinscheduler-extract: extract module, providing the master/worker/alert SDKs.

- dolphinscheduler-service: service module, containing the Quartz, ZooKeeper, and log client access services so that
  the server and api modules can call them easily.

- dolphinscheduler-ui: front-end module.

### Summary

From the scheduling perspective, this article gives a preliminary introduction to the architecture principles and
implementation ideas of DolphinScheduler, a distributed big data workflow scheduling system. To be continued.

