Guide to Data Ingestion in RDA Fabric
1. Ingesting Data into RDAF Using Event Gateway
RDA Event Gateway is an RDA Fabric component that can be deployed in the cloud or at on-premises/edge locations to ingest data from various sources.
RDA Event Gateway supports the following endpoint types:
| Endpoint Type | Protocols | Description |
|---|---|---|
| syslog_tcp | TCP/SSL | Syslog or syslog-like event ingestion via TCP or SSL |
| syslog_udp | UDP | Syslog or syslog-like event ingestion via UDP |
| http | HTTP/HTTPS | JSON- or plain-text-formatted events via webhook; supports HTTP POST and PUT operations |
| tcp_json | TCP/SSL | JSON-encoded messages, one message per line |
| filebeat | HTTP/HTTPS | Elasticsearch Filebeat / Winlogbeat based ingestion of data |
| file | - | Ingestion of data from one or more files or folders |
RDA Event Gateway Endpoint configuration example for Webhook:
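The YAML below is an illustrative sketch of an `http` endpoint definition. The field names match the explanations that follow, but the values (endpoint name, port, stream name) are assumptions, not a verbatim configuration from the platform:

```yaml
# Illustrative sketch only; values are assumptions
- name: webhook-events          # must be unique
  enabled: true
  type: http
  secure: false                 # set true to run in HTTPS mode
  content_type: auto            # auto, json, or text
  port: 8098
  stream: webhook-event-stream  # RDA Stream to publish into
  attrs:
    datasource: webhook         # optional attributes added to each message
```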
Explanation of configuration fields:

name
: Name of the endpoint. Must be unique.

enabled
: If set to false, the Event Gateway shuts down the endpoint.

type
: Type of endpoint. For this example, http.

secure
: If true, runs the endpoint in HTTPS mode.

content_type
: Type of content to expect in the incoming payload. Possible values are 'auto', 'json', and 'text'. If set to auto, the endpoint detects the content type using the Content-Type HTTP header.

port
: TCP port to listen on for data.

stream
: Name of the RDA Stream where the data will be published for further consumption by RDA Pipelines or Persistent Streams.

attrs
: Optional dictionary of attributes that will be added to each message's payload.

The following attributes are automatically inserted into each message by the Event Gateway:

rda_gw_ep_type
: Endpoint type (in this example: 'http')

rda_gw_ep_name
: Endpoint name

rda_gw_timestamp
: Ingestion timestamp in ISO format

rda_content_type
: HTTP Content-Type header value

rda_url
: HTTP URL

rda_path
: Path part of the HTTP URL

rda_gw_client_ip
: IP address of the client that posted the data

rda_user_agent
: User-Agent of the client

rda_stream
: RDA Stream to which this message is being forwarded
Automatic Archival of Data from Event Gateway:
RDA Event Gateway can be configured to automatically archive ingested data using the RDA Log Archive feature.
The following is an example snippet from the main.yml configuration file:
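The original snippet is not reproduced here, so the YAML below is a hypothetical sketch only: every key name is an assumption rather than the platform's documented schema. It is meant to convey the idea that the gateway configuration references a pre-created Log Archive repository (demo_logarchive, as in the note):

```yaml
# Hypothetical sketch -- key names are assumptions, not the documented schema
log_archival:
  enabled: true
  repository: demo_logarchive   # must be pre-created via CLI or RDA Portal
```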
Note
Log Archive repository (demo_logarchive) must be pre-created using the CLI or the RDA Portal.
2. Ingesting Data into RDAF Using Message Queues
RDA Pipelines can continuously ingest data from many types of message queues.
See the pages above for the list of bots available for ingesting data from different types of queues.
3. Ingesting Data into RDAF Using Purpose Built Bots
RDA provides an extensive set of bots to retrieve data from various sources. The following are some of the available integrations:
- APM
- Cloud Infrastructure
- Databases
- File & Object Storages
    - Files & URLs: supports many formats such as CSV, ZIP, GZIP, and Parquet
    - MinIO / S3
- Generic APIs
- Container Infrastructure
- ITOM & Observability
- ITSM
- Infrastructure
    - Arista Bigswitch
    - Cisco ACI
    - Cisco Intersight
    - Cisco IOS
    - Cisco Meraki
    - Cisco NXOS
    - Cisco Support
    - Cisco UCS CIMC
    - Cisco UCS Manager
    - Cisco Unified Call Manager
    - EMC Isilon
    - EMC Unity
    - EMC XtremIO
    - HPE 3Par
    - IBM AIX
    - Linux & Docker
    - NetApp ONTAP 7-Mode
    - NetApp ONTAP C-Mode
    - OpenStack
    - Pure Storage
    - Redfish
    - VMware vCenter
    - Windows
4. Ingesting Data Using Staging Area
RDA Pipelines can continuously ingest data from a staging area (for example, S3 or MinIO). Data can be ingested directly from files in a specified bucket and folder path (or prefix).
A staging area definition specifies where data files are stored so that the data in those files can be ingested into RDA Fabric.
Storage Location
- Staging area definitions are stored in RDAF Object Storage.
- The staging area data can be in RDAF Object Storage or any external storage (S3 or MinIO). For an external staging area, the user needs to create a credential of type stagingarea-ingest for the RDA platform to access the bucket.
Related Bots
Related RDA Client CLI Commands
staging-area-add      Add or update a staging area
staging-area-delete   Delete a staging area
staging-area-get      Get YAML data for a staging area
staging-area-list     List all staging areas
See the RDA CLI Guide for installation instructions.
Sample YAML: For a staging area in the RDA platform
Sample YAML: For an external staging area (S3 or MinIO)
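The sample YAML bodies are not reproduced here. Purely as an illustration of the concepts described above (bucket, prefix, and a stagingarea-ingest credential), an external staging area definition might convey something like the following; every key name and value is an assumption, not the platform's actual schema:

```yaml
# Hypothetical sketch -- key names are assumptions, not the documented schema
name: my-external-staging
type: external                  # data resides in S3 or MinIO
bucket: ingest-bucket
prefix: incoming/events/        # folder path within the bucket
credential_name: my-s3-creds    # credential of type 'stagingarea-ingest'
```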
Managing through RDA Portal
- In the RDA Portal, click on the left menu item Data
- Click on 'View Details' next to Data Staging Area
Managing through RDA Studio
- Studio does not provide a user interface for managing staging areas.
5. Ingesting Data Once from a Location
RDA Pipelines can also ingest data once from a given location (S3 or MinIO). Data can be ingested directly from files in a specified bucket and folder path (or prefix).
For an external location, the user needs to create a credential of type stagingarea-ingest for the RDA platform to access the bucket.
Related Bots
6. Ingesting Data from Kafka
Data can be ingested into persistent streams via Kafka.
When creating a persistent stream, you can specify Kafka as the messaging platform to read data from (and then write to OpenSearch) by adding the following to the Attributes section of the UI:
On the left-side menu bar, click Configuration → RDA Administration → Persistent Streams → Add → Attributes (add the code below) → Save
{
"messaging_platform_settings": {
"platform": "kafka",
"credential_name": "mykafka",
"kafka-params": {
"topics": [
"kafka_topic1",
"kafka_topic2"
],
"auto.offset.reset": "latest"
}
}
}
To add kafka-v2 credentials from the UI: click Configuration → RDA Integrations → Credentials → Add → Save
| Parameter Name | Description |
|---|---|
| credential_name | Name of the credential of type kafka-v2 |
| topics | One or more Kafka topics to receive data from |
| auto.offset.reset | "earliest" or "latest"; default: "latest" |
Related RDA Client CLI Commands: