Architecture Guide
Introduction
Kodda is a state-of-the-art data integration platform built from several components working together around a centralized metadata repository. These components provide web-based design and management interfaces, in-memory agents, a scheduler, and a logger, plus other advanced features.
Architecture Overview
The architecture is organized around a modular metadata repository, which is accessed in application-server mode by the Kodda web interface and by the in-memory agents.
Kodda Server
The master Kodda server is an application server that provides the web-based design and management interfaces. In a single-node configuration, this server can also house the metadata repository, agents, scheduler, and logger.
Kodda Metadata
Kodda leverages a centralized metadata repository to store the details about the mappings, workflows, and other objects you create in your Kodda instance.
Kodda Agents
Kodda agents are lightweight in-memory processes, each of which performs a specialized task. The Kodda server sends work packages (including source and target connection details) to the agents over an integrated service bus.
Kodda Scheduler
The Kodda scheduler is an optional standalone scheduler that manages the timing and recurrence of mappings and workflows.
Kodda Logging
The Kodda logger captures metadata and transactional history for the whole Kodda environment. This centralized service provides a single log interface at any level of Kodda scaling.
Kodda Agents and Scaling
Overview
Kodda supports a multi-node architecture which greatly enhances the performance of both mappings and workflows. To achieve this scaling, Kodda components (namely agents) are installed and run on separate physical or virtual nodes.
Agent Scaling
By installing and running standalone agents on separate nodes, Kodda takes advantage of the additional I/O, memory, and CPU cycles to maximize the throughput of data from source to target.
Agent Types
Kodda has several agent types, each responsible for a different set of tasks:
Default Agent (default): Browses source and target SQL and tests source and target connections.
Generate Agent (genmap): Executes the generate-mappings function.
Workflow Agent (runwf): Executes workflow management tasks.
Mapping Agent (runmap): Executes all target activities of mappings, including DDL and data loading.
Data Extractor Agent (run_node): Executes data extractions from source systems.
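Assuming each agent is started from the command line on its node, a launch might look like the sketch below. The kodda-agent command and its --type flag are illustrative assumptions, not documented names; the type identifiers in parentheses are the ones listed in the table above.

```sh
# Hypothetical launcher and --type flag; the type identifiers
# (default, genmap, runwf, runmap, run_node) come from the table above.
kodda-agent --type=run_node   # Data Extractor Agent on an extraction node
kodda-agent --type=runmap     # Mapping Agent on a loading node
```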
Agent Tuning
To maximize performance in a multi-node environment, Kodda administrators focus on scaling Data Extractor Agents first, followed by Mapping Agents and Workflow Agents as needed. Installing and running Data Extractor and Mapping Agents on as many nodes as possible is the first step in tuning Kodda.
Several parameters affect agent performance. Two options to consider are:
--concurrency=8: The number of agent processes/threads can be changed using the --concurrency argument.
--autoscale=10,3: The autoscaler adds pool processes when there is work to do and removes them when the workload is low (here, a maximum of 10 processes and a minimum of 3). Autoscaling can be slower at first, because process start-up time competes against a fixed level of concurrency, but it saves resources during idle periods.
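Assuming the agents accept these options at start-up, the two settings might be passed as shown below. The kodda-agent command name is a hypothetical placeholder; the flags themselves are the ones described above.

```sh
# Hypothetical launcher name; flags as described above.
# Fixed pool of 8 worker processes:
kodda-agent --type=run_node --concurrency=8

# Alternatively, let the pool autoscale between 3 and 10 processes:
kodda-agent --type=run_node --autoscale=10,3
```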
More agent processes are usually better, but there is a cut-off point beyond which adding processes hurts performance. There is also some evidence that several smaller agents can outperform a single large one: for example, 3 agents with 10 processes each may perform better than 1 agent with 30 processes. Experiment to find the numbers that work best for you; the optimum varies with workload, task run times, system configuration and performance, and other factors.
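As a concrete sketch of the 3 × 10 layout described above, again using the hypothetical kodda-agent launcher, one agent could be started on each of three nodes:

```sh
# Hypothetical: start one agent per node, 10 processes each,
# instead of a single agent with --concurrency=30.
kodda-agent --type=run_node --concurrency=10   # node 1
kodda-agent --type=run_node --concurrency=10   # node 2
kodda-agent --type=run_node --concurrency=10   # node 3
```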