Architecture Guide
Introduction
Kodda is a state-of-the-art data integration platform built from several components working together around a centralized metadata repository. These components provide web-based design and management interfaces, in-memory agents, a scheduler, and a logger, plus other advanced features.
Architecture Overview
The architecture is organized around a modular metadata repository, which is accessed in application-server mode by the Kodda web interface and by the in-memory agents.
Kodda Server
The master Kodda server is an application server that provides the web-based design and management interfaces. In a single-node configuration, this server can also house the metadata repository, agents, scheduler, and logger.
Kodda Metadata
Kodda leverages a centralized metadata repository to store the details about the mappings, workflows, and other objects you create in your Kodda instance.
Kodda Agents
Kodda agents are lightweight in-memory processes, each of which performs a specialized task. The Kodda server sends work packages (including source and target connection details) to the agents over an integrated service bus.
Kodda Scheduler
The Kodda scheduler is an optional standalone scheduler that manages the timing and recurrence of mappings and workflows.
Kodda Logging
The Kodda logger captures metadata and transactional history for the whole Kodda environment. This centralized service provides a single log interface at any level of Kodda scaling.
Kodda Agents and Scaling
Overview
Kodda supports a multi-node architecture which greatly enhances the performance of both mappings and workflows. To achieve this scaling, Kodda components (namely agents) are installed and run on separate physical or virtual nodes.
Agent Scaling
By installing and running standalone agents on separate nodes, Kodda takes advantage of the additional I/O, memory, and CPU cycles to maximize the throughput of data from source to target.
Agent Types
Kodda has several agent types, each responsible for a different set of tasks:
Default Agent (default): Browses source and target SQL and tests source and target connections.
Generate Agent (genmap): Executes the generate-mappings function.
Workflow Agent (runwf): Executes workflow management tasks.
Mapping Agent (runmap): Executes all target activities of mappings, including DDL and data loading.
Data Extractor Agent (run_node): Executes data extractions from source systems.
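Assuming each agent is started from the command line on its node, a launch might look like the sketch below. The kodda-agent command and its --type flag are illustrative assumptions, not documented names; the type identifiers in parentheses are the ones listed in the table above.

```sh
# Hypothetical launcher and --type flag; the type identifiers
# (default, genmap, runwf, runmap, run_node) come from the table above.
kodda-agent --type=run_node   # Data Extractor Agent on an extraction node
kodda-agent --type=runmap     # Mapping Agent on a loading node
```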
Agent Tuning
To maximize performance in a multi-node environment, Kodda administrators focus on scaling Data Extractor Agents first, followed by Mapping Agents and Workflow Agents as needed. Installing and running Data Extractor and Mapping Agents on as many nodes as possible is the first step in tuning Kodda.
Several parameters affect agent performance. Two options to consider are:
--concurrency=8: The number of agent processes/threads can be changed using the --concurrency argument.
--autoscale=10,3: The autoscaler adds pool processes when there is work to do and removes them when the workload is low (here, a maximum of 10 processes and a minimum of 3). Autoscaling can be slower at first, because process start-up time competes against a fixed level of concurrency, but it saves resources during idle periods.
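Assuming the agents accept these options at start-up, the two settings might be passed as shown below. The kodda-agent command name is a hypothetical placeholder; the flags themselves are the ones described above.

```sh
# Hypothetical launcher name; flags as described above.
# Fixed pool of 8 worker processes:
kodda-agent --type=run_node --concurrency=8

# Alternatively, let the pool autoscale between 3 and 10 processes:
kodda-agent --type=run_node --autoscale=10,3
```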
More agent processes are usually better, but there is a cut-off point beyond which adding processes hurts performance. There is also some evidence that several smaller agents can outperform a single large one: for example, 3 agents with 10 processes each may perform better than 1 agent with 30 processes. Experiment to find the numbers that work best for you; the optimum varies with workload, task run times, system configuration and performance, and other factors.
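As a concrete sketch of the 3 × 10 layout described above, again using the hypothetical kodda-agent launcher, one agent could be started on each of three nodes:

```sh
# Hypothetical: start one agent per node, 10 processes each,
# instead of a single agent with --concurrency=30.
kodda-agent --type=run_node --concurrency=10   # node 1
kodda-agent --type=run_node --concurrency=10   # node 2
kodda-agent --type=run_node --concurrency=10   # node 3
```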