# Architecture Guide

## Introduction

Kodda is a data integration platform built from several components that work together around a centralized metadata repository. These components provide web-based design and management interfaces, in-memory agents, a scheduler, and a logger, along with other features.

### Architecture Overview

The architecture is organized around a modular repository, which is accessed in application-server mode by the Kodda web interface and in-memory agents.

### Kodda Server

The master Kodda server is an application server that provides the web-based design and management interfaces. This server can also host the metadata, agents, scheduler, and logger in a single-node configuration.

### Kodda Metadata

Kodda uses a centralized metadata repository to store the details of the mappings, workflows, and other objects you create in your Kodda instance.

### Kodda Agents

Kodda agents are lightweight in-memory processes that each perform specialized tasks. The Kodda server sends work packages (with source and target connections) to the agents along an integrated service bus.
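The dispatch flow described above can be pictured with a small sketch. It is illustrative only: the `WorkPackage` class, its field names, and the use of an in-process `queue.Queue` as a stand-in for the service bus are all assumptions made for this example, not Kodda's actual data model or transport.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class WorkPackage:
    """Hypothetical work package; every field name here is an assumption."""
    task: str               # what the agent should do
    source_connection: str  # identifier of the source connection
    target_connection: str  # identifier of the target connection


# In-process stand-in for the integrated service bus (assumption).
bus = Queue()


def dispatch(package: WorkPackage) -> None:
    """Server side: place a work package on the bus."""
    bus.put(package)


def receive() -> WorkPackage:
    """Agent side: take the next work package off the bus."""
    return bus.get()
```

In this sketch the server calls `dispatch(...)` and an agent calls `receive()`; a real deployment would replace the queue with whatever bus the installation uses.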

### Kodda Scheduler

The Kodda scheduler is an optional standalone scheduler that manages the timing and recurrence of mappings and workflows.

### Kodda Logging

The Kodda logger captures metadata and transactional history for the whole Kodda environment. This centralized service provides a single log interface at any level of Kodda scaling.

## Kodda Agents and Scaling

### Overview

Kodda supports a multi-node architecture which greatly enhances the performance of both mappings and workflows. To achieve this scaling, Kodda components (namely agents) are installed and run on separate physical or virtual nodes.

### Agent Scaling

By installing and running standalone agents on separate nodes, Kodda takes advantage of the additional I/O, memory, and CPU cycles to maximize the throughput of data from source to target.

### Agent Types

Kodda has several agent types which are responsible for various tasks:

| Agent                            | Description                                                                                 |
| -------------------------------- | ------------------------------------------------------------------------------------------- |
| Default Agent (default)          | Agent used to run source and target browsing SQL and to test source and target connections  |
| Generate Agent (genmap)          | Agent used to execute the generate mappings function                                        |
| Workflow Agent (runwf)           | Agent used to execute workflow management tasks                                             |
| Mapping Agent (runmap)           | Agent used to execute all target activities of mappings, including DDL data loading         |
| Data Extractor Agent (run\_node) | Agent used to execute data extractions from source systems                                  |
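As a rough illustration of how the table above divides responsibilities, the sketch below maps tasks to agent types. The agent identifiers (`default`, `genmap`, `runwf`, `runmap`, `run_node`) come from the table; the task names and the `agent_for` helper are hypothetical and exist only for this example.

```python
from enum import Enum


class AgentType(Enum):
    """Agent identifiers as listed in the agent types table."""
    DEFAULT = "default"
    GENERATE = "genmap"
    WORKFLOW = "runwf"
    MAPPING = "runmap"
    DATA_EXTRACTOR = "run_node"


# Hypothetical task names, invented for this example only.
TASK_TO_AGENT = {
    "browse_sql": AgentType.DEFAULT,
    "test_connection": AgentType.DEFAULT,
    "generate_mappings": AgentType.GENERATE,
    "manage_workflow": AgentType.WORKFLOW,
    "load_target": AgentType.MAPPING,
    "extract_source": AgentType.DATA_EXTRACTOR,
}


def agent_for(task: str) -> AgentType:
    """Return the agent type responsible for a task; reject unknown tasks."""
    try:
        return TASK_TO_AGENT[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
```

For example, `agent_for("extract_source")` returns `AgentType.DATA_EXTRACTOR`, whose value is the `run_node` identifier from the table.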

### Agent Tuning

To maximize performance in a multi-node environment, Kodda administrators focus on scaling Data Extractor Agents first, followed by Mapping Agents and Workflow Agents as needed. Installing and running Data Extractor and Mapping Agents on as many nodes as possible is the first step in tuning Kodda.

There are several parameters that affect Agent performance. Two options to consider are:

`--concurrency=8`\
The number of agent processes or threads can be changed using the `--concurrency` argument.

`--autoscale=10,3`\
The autoscaler adds more pool processes when there is work to do and starts removing processes when the workload is low. Autoscaling can initially be slower than fixed concurrency because new processes take time to start, but it saves resources during idle time.
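The autoscaling behaviour described above can be sketched as a simple clamp on the pool size. This is a minimal model, not Kodda's implementation: it assumes the two numbers in `--autoscale=10,3` are the maximum and minimum process counts, in that order (this page does not state the order), and that the pool grows to match the amount of pending work. The function names are hypothetical.

```python
from typing import Tuple


def parse_autoscale(value: str) -> Tuple[int, int]:
    """Split a 'MAX,MIN' string into integers.

    The MAX,MIN order is an assumption; this page does not define it.
    """
    max_text, min_text = value.split(",")
    max_procs, min_procs = int(max_text), int(min_text)
    if min_procs < 0 or max_procs < min_procs:
        raise ValueError(f"invalid autoscale range: {value!r}")
    return max_procs, min_procs


def target_pool_size(pending_tasks: int, max_procs: int, min_procs: int) -> int:
    """Scale the pool with pending work, clamped to [min_procs, max_procs]."""
    return max(min_procs, min(max_procs, pending_tasks))
```

Under these assumptions, an idle agent keeps the minimum number of processes, and a backlog larger than the maximum is served by the maximum number of processes.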

More agent processes are usually better, but there’s a cut-off point beyond which adding more processes hurts performance. There is also some evidence that running multiple agents may perform better than running a single large agent. For example, three agents with 10 processes each may perform better than one agent with 30 processes. You need to experiment to find the numbers that work best for you, as this varies with workload, task run times, your system configuration and performance, and other factors.
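When experimenting, it can help to list the ways a fixed process budget can be split across agents before benchmarking each one. The helper below is a hypothetical convenience for planning such tests; it says nothing about which layout performs best.

```python
from typing import List, Tuple


def layouts(total_processes: int) -> List[Tuple[int, int]]:
    """List (agents, processes_per_agent) pairs that use the budget exactly."""
    if total_processes < 1:
        raise ValueError("total_processes must be at least 1")
    return [
        (agents, total_processes // agents)
        for agents in range(1, total_processes + 1)
        if total_processes % agents == 0  # keep only even splits
    ]
```

For a budget of 30 processes, the result includes both `(1, 30)` and `(3, 10)`, the two configurations compared in the paragraph above.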

