Widely distributed open network resources are the norm in today's network environments. These resources form a complex grouping of networks, servers, desktops and applications, each with its own requirements and functions.
The Nsys framework infrastructure is a distributed network environment in which connected network nodes provide system resources for computation. Automating the distribution of computing tasks is the next logical step in the framework's evolution, and it is the reason the Nsys Workload Scheduler was designed.
The Nsys Workload Scheduler is a new component of the Nsys Framework Infrastructure. Its main goal is to automate the distribution of tasks across the framework infrastructure. The tasks it distributes can serve system management, distributed computing, monitoring, or any other purpose for which a task can do a specific job.
The NsysDaemon is the main system service responsible for managing a network node. The daemon provides no special functionality beyond the base system support needed to keep it operational (e.g. loading and unloading plug-ins, daemon management including remote upgrades, and access to host system resources). New functionality is added by Management Agent plugins.
The Management Agent is the first of the two major subsystems hosted within the daemon. It is responsible for plugin management (component Manager), for providing access to host resources (component Core), for interacting with other daemons through a Web Service, and for cooperating with the Data Processor subsystem.
The second subsystem, the Data Processor, is responsible for processing the data of a specific Management Agent plugin. The NeuralBag is the basic unit of communication; it carries the data to be processed inside the Data Processor. The results are written to a storage, for instance a database, or sent to another daemon for further processing.
The daemon uses bags to exchange information with other daemons in the framework infrastructure. The Management Agent accepts bags from, and sends bags to, the infrastructure through the Grid component.
For data processing, each Management Agent plugin can use stages, which represent the individual phases of the Data Processor. Bags are passed from stage to stage as processing moves through the phases.
A good example of Data Processor stages is a plugin that implements a web crawler with three stages. The first stage downloads a web page and stores its content in a bag, the second stage analyzes the page using the data received in the bag, and the last stage writes the results of the analysis to a storage.
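The three-stage crawler can be sketched in Python as a chain of functions that each receive and return a bag. This is a minimal illustration, not the Nsys API: the bag is modelled as a plain dictionary, the names `run_pipeline`, `download_stage`, `analyze_stage` and `store_stage` are hypothetical, and the download is faked so the example runs without network access.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a NeuralBag: a dictionary of named values.
Bag = Dict[str, object]
Stage = Callable[[Bag], Bag]


def download_stage(bag: Bag) -> Bag:
    # A real plugin would fetch bag["url"]; this sketch uses a fixed page.
    bag["content"] = "<html><a href='/a'></a><a href='/b'></a></html>"
    return bag


def analyze_stage(bag: Bag) -> Bag:
    # Analyze the page carried in the bag: here, just count the links.
    bag["link_count"] = str(bag["content"]).count("<a ")
    return bag


def store_stage(storage: List[Bag]) -> Stage:
    # Build a stage that appends the analysis result to the given storage.
    def stage(bag: Bag) -> Bag:
        storage.append({"url": bag["url"], "link_count": bag["link_count"]})
        return bag
    return stage


def run_pipeline(bag: Bag, stages: List[Stage]) -> Bag:
    # Each stage receives the bag produced by the previous stage.
    for stage in stages:
        bag = stage(bag)
    return bag


storage: List[Bag] = []
run_pipeline({"url": "http://example.com"},
             [download_stage, analyze_stage, store_stage(storage)])
print(storage)  # [{'url': 'http://example.com', 'link_count': 2}]
```

Keeping each stage a pure "bag in, bag out" function makes it easy to add, remove or reorder phases without touching the others.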
2.0 Nsys Workload Scheduler
The scheduler's functionality is implemented by two plugins. The first, Nsys.TaskScheduler, is the entry point to the scheduler cluster for applications that need to schedule tasks for execution on the resources available in the cluster. The data collected from tasks (the “results”) are stored on a database server, as are the task definitions. The second plugin, Nsys.TaskEngine, is a task execution engine within the cluster of task engines managed by the scheduler.
The Task Scheduler is the entry point to the task distribution infrastructure and is the scheduler's implementation within the daemon. The Task Scheduler subsystem consists of four components.
The Task Scheduler subsystem is responsible for:
- dispatching tasks to the resources available in the task engine cluster (component Dispatcher),
- distributing load among the task engines in the cluster (component Task Engines Manager),
- monitoring task execution in the task engines (component Task Engines Heartbeat),
- transferring the data “results” collected in the task engines' storage to the central storage on the scheduler, where the Data Processor performs the final processing (component Result Storage).
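One plausible policy for the Task Engines Manager is to send each waiting task to the least-loaded engine that still has free capacity. The text does not specify the scheduler's actual load plan, so the sketch below, with its hypothetical `TaskEngine` record, `capacity` field and `plan_dispatch` function, is only an assumption used to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskEngine:
    # Hypothetical view of a registered task engine as the scheduler sees it.
    name: str
    capacity: int  # assumed maximum number of concurrent tasks
    running: List[str] = field(default_factory=list)

    @property
    def load(self) -> float:
        return len(self.running) / self.capacity


def plan_dispatch(tasks: List[str], engines: List[TaskEngine]) -> Dict[str, str]:
    """Assign each waiting task to the least-loaded engine with free capacity."""
    plan: Dict[str, str] = {}
    for task in tasks:
        candidates = [e for e in engines if len(e.running) < e.capacity]
        if not candidates:
            break  # remaining tasks stay in the Waiting queue
        engine = min(candidates, key=lambda e: e.load)  # ties go to the first engine
        engine.running.append(task)
        plan[task] = engine.name
    return plan


engines = [TaskEngine("local", capacity=2), TaskEngine("remote-1", capacity=4)]
print(plan_dispatch(["t1", "t2", "t3", "t4"], engines))
# {'t1': 'local', 't2': 'remote-1', 't3': 'remote-1', 't4': 'local'}
```

Using relative load rather than a raw task count lets a larger engine absorb proportionally more work.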
The Task Engine is a task execution engine within the cluster of task engines managed by the scheduler. The Task Engine subsystem is responsible for:
- dispatching tasks to a Batchman (component Dispatcher),
- managing the set of Batchmans running on a node (component Batchmans Manager),
- monitoring task execution in the Batchmans (component Batchmans Heartbeat),
- storing the collected data “results” in a storage (component Result Storage).
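A heartbeat component such as Batchmans Heartbeat can be reduced to tracking when each Batchman last reported in. The sketch below assumes a fixed heartbeat interval and treats a Batchman as unresponsive after three missed beats; the class name `BatchmanHeartbeat` and the three-beat threshold are illustrative assumptions, and times are passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List


class BatchmanHeartbeat:
    """Hypothetical monitor: a Batchman is unresponsive if no heartbeat
    has arrived within interval * missed_limit seconds."""

    def __init__(self, interval: float, missed_limit: int = 3) -> None:
        self.interval = interval
        self.missed_limit = missed_limit
        self.last_seen: Dict[str, float] = {}

    def beat(self, batchman: str, now: float) -> None:
        # Record the time of the latest heartbeat received from a Batchman.
        self.last_seen[batchman] = now

    def unresponsive(self, now: float) -> List[str]:
        # Batchmans whose last heartbeat is older than the allowed window.
        limit = self.interval * self.missed_limit
        return sorted(b for b, t in self.last_seen.items() if now - t > limit)


hb = BatchmanHeartbeat(interval=5.0)
hb.beat("batchman-1", now=100.0)
hb.beat("batchman-2", now=112.0)
print(hb.unresponsive(now=120.0))  # ['batchman-1']
```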
The Batchman executes the task requests dispatched by the Task Engine. Its Task Processor component can run tasks in 1:N mode, where several tasks share one thread, or in 1:1 mode, where each task runs in a dedicated thread. The 1:N mode (Light Weight Process, LWP) suits some scenarios better, while the 1:1 mode (Heavy Weight Process, HWP) provides a dedicated environment for computing tasks. Running tasks are monitored by the Task Heartbeat for timeout conditions.
A task executing within a Batchman should not expect any state (no preexisting initialization) to persist between invocations. A task is defined by implementing the Task interface. Individual task implementations are part of a custom plugin and are deployed to the daemon over the network.
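The difference between the two execution modes can be approximated with Python's `concurrent.futures`: one shared worker thread for 1:N, one thread per task for 1:1. This is a sketch under stated assumptions, not the Batchman's implementation; `run_tasks` is hypothetical, and its timeout check is a simplified stand-in for the Task Heartbeat that limits how long the monitor waits on each task.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Dict

TaskFn = Callable[[], object]


def run_tasks(tasks: Dict[str, TaskFn], mode: str, timeout: float) -> Dict[str, str]:
    """Run tasks in '1:N' mode (all tasks share one worker thread) or '1:1'
    mode (one dedicated thread per task) and report each task's outcome."""
    workers = 1 if mode == "1:N" else max(1, len(tasks))
    outcomes: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, fut in futures.items():
            try:
                # Simplified heartbeat: wait at most `timeout` seconds per task.
                fut.result(timeout=timeout)
                outcomes[name] = "Finished"
            except FutureTimeout:
                # Python threads cannot be killed: a queued task is dropped,
                # a running one is only marked and left to finish.
                fut.cancel()
                outcomes[name] = "Canceled"
            except Exception:
                outcomes[name] = "Failed"
    return outcomes


tasks = {
    "fast": lambda: 1,
    "slow": lambda: time.sleep(0.3),
    "bad": lambda: 1 / 0,
}
print(run_tasks(tasks, mode="1:1", timeout=0.1))
```

With the same tasks in 1:N mode all three share one thread, so the slow task delays the ones queued behind it and `bad` may be reported as Canceled rather than Failed. That contention is what the dedicated 1:1 environment avoids.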
A task is a single unit of execution on a task engine. Each task execution in the system has a life cycle made up of the following states: Created, Waiting, Running, Finished, Canceled, Suspended and Failed.
When the scheduler picks up a task for execution, the task's initial state is Created. Once the task is queued for execution, its state is updated to Waiting. The scheduler's Dispatcher monitors the queue for waiting tasks and, based on the load plan of the task engine cluster, submits each task to a task engine. Before submitting the task, the scheduler changes its state to Running.
The Task Engine dispatches the task to a Batchman, where it is executed. If the task executes successfully, its state changes to Finished. If the task is canceled by the user, or if the Task Heartbeat detects that the task has timed out or is not responding, its state is updated to Canceled. The state changes to Suspended in a similar way.
If an error is detected during dispatching or during task execution, the state is changed to Failed.
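The life cycle above maps naturally onto a small state machine. The transition table below encodes only the moves the text describes; because the text does not say how a task leaves Suspended, the sketch gives it no outgoing transitions. `TaskState`, `TRANSITIONS` and `advance` are illustrative names, not part of Nsys.

```python
from enum import Enum
from typing import Dict, Set


class TaskState(Enum):
    CREATED = "Created"
    WAITING = "Waiting"
    RUNNING = "Running"
    FINISHED = "Finished"
    CANCELED = "Canceled"
    SUSPENDED = "Suspended"
    FAILED = "Failed"


# Allowed transitions as described in the text; states without an entry
# (Finished, Canceled, Suspended, Failed) have no outgoing transitions here.
TRANSITIONS: Dict[TaskState, Set[TaskState]] = {
    TaskState.CREATED: {TaskState.WAITING},
    TaskState.WAITING: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.FINISHED, TaskState.CANCELED,
                        TaskState.SUSPENDED, TaskState.FAILED},
}


def advance(current: TaskState, new: TaskState) -> TaskState:
    """Return the new state, or raise if the life cycle forbids the move."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


state = TaskState.CREATED
for step in (TaskState.WAITING, TaskState.RUNNING, TaskState.FINISHED):
    state = advance(state, step)
print(state.value)  # Finished
```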
Each task has an execution priority. Three priority types are available (Low, Normal, High), and the default for all tasks is Normal. The scheduler also supports a custom priority, in which a task is assigned a number and the scheduler executes tasks from the lowest to the highest number. The custom priority level sits between High and Normal.
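One way to realize this ordering is a heap keyed on a priority rank, treating custom-numbered tasks as a band between High and Normal that is sorted by number. Placing Low after Normal follows from the text; first-in, first-out order among tasks of equal priority is an assumption, and the `TaskQueue` class is hypothetical.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

# Rank of each priority band: High, then custom-numbered, then Normal, then Low.
RANK = {"High": 0, "Custom": 1, "Normal": 2, "Low": 3}


class TaskQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, int, str]] = []
        self._seq = itertools.count()  # FIFO order within the same priority

    def push(self, task: str, priority: str = "Normal",
             number: Optional[int] = None) -> None:
        if priority == "Custom":
            if number is None:
                raise ValueError("custom priority requires a number")
            key = number  # lower numbers run first
        else:
            key = 0
        heapq.heappush(self._heap, (RANK[priority], key, next(self._seq), task))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


q = TaskQueue()
q.push("backup", "Low")
q.push("report")  # defaults to Normal
q.push("reindex", "Custom", number=7)
q.push("alert", "High")
q.push("cleanup", "Custom", number=2)
print([q.pop() for _ in range(len(q))])
# ['alert', 'cleanup', 'reindex', 'report', 'backup']
```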
A task has the following base properties:
- Frequency (the recurrence frequency of the scheduled task)
- Task (the definition of the task to be scheduled)
- State (the current state of the task)
- Timeout (the threshold the Batchman uses to decide whether the task has timed out)
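The base properties can be gathered into a single record. In the sketch below the field types are assumptions: Frequency is modelled as a `timedelta` (with `None` meaning a one-off task) and Timeout as a `timedelta` compared against elapsed run time. `ScheduledTask`, `next_run` and `has_timed_out` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class ScheduledTask:
    # Fields mirror the base properties; their types are assumptions.
    task: Callable[[], object]      # the definition of the task to run
    frequency: Optional[timedelta]  # None = run once (assumed convention)
    timeout: timedelta              # threshold used by the Batchman
    state: str = "Created"

    def next_run(self, last_run: datetime) -> Optional[datetime]:
        """Next scheduled run for a recurring task, or None for a one-off task."""
        return last_run + self.frequency if self.frequency else None

    def has_timed_out(self, started: datetime, now: datetime) -> bool:
        """The check a Batchman could apply against the Timeout property."""
        return now - started > self.timeout


t = ScheduledTask(task=lambda: None, frequency=timedelta(hours=1),
                  timeout=timedelta(minutes=5))
start = datetime(2024, 1, 1, 12, 0)
print(t.next_run(start))                                     # 2024-01-01 13:00:00
print(t.has_timed_out(start, start + timedelta(minutes=6)))  # True
```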
The tasks are distributed for execution to local or remote task engines that are registered with the scheduler.
The goal of task distribution is not only to compute on a node but also to collect the data “results” of the computation. When a task completes, its results are transferred from the Batchman back to the task engine and from there back to the scheduler. The scheduler then notifies the daemon that the task has completed and passes along the results. The collected data are then processed in the Data Processor stages of the plugin that submitted the task to the scheduler.
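The result path, from Batchman to task engine to scheduler to daemon, can be sketched as a chain of plain method calls. Real components would communicate over the network and persist results in Result Storage; here each hop is an in-process call, the classes are hypothetical stand-ins, and the daemon callback simply collects results at the point where the plugin's Data Processor stages would take over.

```python
from typing import Callable, Dict, List

Result = Dict[str, object]


class Batchman:
    def run(self, task_id: str, fn: Callable[[], object]) -> Result:
        # Execute the task and package its output as a result record.
        return {"task": task_id, "data": fn()}


class TaskEngine:
    def __init__(self, batchman: Batchman, name: str) -> None:
        self.batchman = batchman
        self.name = name

    def execute(self, task_id: str, fn: Callable[[], object]) -> Result:
        # Results travel from the Batchman back to the task engine.
        result = self.batchman.run(task_id, fn)
        result["engine"] = self.name
        return result


class Scheduler:
    def __init__(self, engine: TaskEngine,
                 notify_daemon: Callable[[Result], None]) -> None:
        self.engine = engine
        self.notify_daemon = notify_daemon

    def submit(self, task_id: str, fn: Callable[[], object]) -> None:
        # From the task engine back to the scheduler, which notifies the daemon.
        self.notify_daemon(self.engine.execute(task_id, fn))


processed: List[Result] = []  # stands in for the plugin's Data Processor stages
scheduler = Scheduler(TaskEngine(Batchman(), name="engine-1"),
                      notify_daemon=processed.append)
scheduler.submit("disk-usage", lambda: {"free_gb": 42})
print(processed)
# [{'task': 'disk-usage', 'data': {'free_gb': 42}, 'engine': 'engine-1'}]
```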
Task implementations are part of a custom plugin. A task is represented by the Task interface, which provides base methods for initializing, canceling, stopping and executing the task. One example of a task implementation is email configuration, where the task adds or removes email accounts as part of mail server management.
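A Task interface with the four base methods might look like the abstract class below, shown with a hypothetical email-account task. The method signatures, the split between `cancel` (abandon the task) and `stop` (request a graceful stop), and the in-memory account set standing in for a mail server are all assumptions made for illustration.

```python
import threading
from abc import ABC, abstractmethod
from typing import Dict, Set


class Task(ABC):
    """Sketch of a Task interface with the four base methods named in the text."""

    def __init__(self) -> None:
        self._cancel = threading.Event()
        self._stop = threading.Event()

    @abstractmethod
    def initialize(self, params: Dict[str, str]) -> None: ...

    @abstractmethod
    def execute(self) -> object: ...

    def cancel(self) -> None:
        # Abandon the task; execute() is expected to check this flag.
        self._cancel.set()

    def stop(self) -> None:
        # Request a graceful stop; execute() is expected to check this flag.
        self._stop.set()


class EmailAccountTask(Task):
    """Hypothetical mail-server task that adds or removes one account."""

    def __init__(self, accounts: Set[str]) -> None:
        super().__init__()
        self.accounts = accounts  # stand-in for the mail server's account store

    def initialize(self, params: Dict[str, str]) -> None:
        # No state survives between invocations, so everything comes from params.
        self.action = params["action"]
        self.address = params["address"]

    def execute(self) -> object:
        if self._cancel.is_set() or self._stop.is_set():
            return "skipped"
        if self.action == "add":
            self.accounts.add(self.address)
        elif self.action == "remove":
            self.accounts.discard(self.address)
        else:
            raise ValueError(f"unknown action {self.action!r}")
        return sorted(self.accounts)


accounts = {"admin@example.com"}
task = EmailAccountTask(accounts)
task.initialize({"action": "add", "address": "alice@example.com"})
print(task.execute())  # ['admin@example.com', 'alice@example.com']
```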
3.0 Nsys Infrastructure Manager
The Nsys Infrastructure Manager (NSIM) is a web-based application that monitors all running tasks in the framework infrastructure and provides basic task management (the ability to stop, start, cancel or re-run any task). NSIM is a simple application that can also perform configuration actions (database, mail or web server management) and monitor the network (whether a node is up or down, etc.) and/or system activity on a network node.