Data Engineer - WkStrm3 - Syslog-NG

TekWissen India

2 - 5 years

Bengaluru

Posted: 15/04/2026

Job Description

Overview:

TekWissen is a global workforce management provider with operations throughout India and many other countries. The opportunity below is with one of our clients, part of a trusted global innovator of IT and business services headquartered in Tokyo. The client helps customers transform through consulting, industry solutions, business process services, IT modernization, and managed services, enabling them to move confidently into the digital future. Committed to long-term success, the client combines global reach with local attention to serve customers in over 50 countries.


Position: Data Engineer - WkStrm3 - Syslog-NG

Location: Bangalore/Gurgaon

Work Type: Hybrid

Job Type: Full Time


Job Description:

  • The client is seeking a Streaming Integration Engineer to own the two streaming ingestion workstreams of the PNC Bank Hadoop-to-Iceberg POC.
  • This role is responsible for designing and delivering production-grade PySpark Structured Streaming pipelines that ingest data into Apache Iceberg tables operating under specific technical constraints.
  • For example, workstream 2 requires building a Confluent Kafka-to-Iceberg ingestion application using only Apache-supported APIs. PNC will not permit the use of the unsupported Confluent Iceberg Sink Connector.
  • Additionally, workstream 3 requires delivering a syslog-ng-to-Iceberg batch ingestion pipeline via rolling log files, as syslog-ng has no native Iceberg sink.
  • The engineer will work closely with GitHub Copilot to scaffold, iterate on, test, and document the streaming application code, acting as the technical reviewer and subject-matter expert who ensures AI-generated pipelines are production-ready, PNC-compliant, and correctly integrated with the Iceberg catalog and the Protegrity tokenization layer.
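The Apache-only constraint on workstream 2 generally points to Spark's built-in `kafka` source plus the Iceberg Spark runtime, rather than the disallowed Confluent sink connector. A minimal sketch of the configuration such a job would carry is below; every broker address, catalog name, topic, table name, and checkpoint path is an illustrative assumption, not a value from this posting:

```python
# Illustrative configuration for a PySpark Structured Streaming job reading
# from Kafka via Spark's Apache-supported "kafka" source and writing to an
# Iceberg table via the Iceberg Spark runtime. All names are placeholders.
KAFKA_SOURCE_OPTIONS = {
    "kafka.bootstrap.servers": "broker1:9092",  # assumed broker address
    "subscribe": "events",                      # assumed topic
    "startingOffsets": "earliest",
}

ICEBERG_SPARK_CONF = {
    # Register an Iceberg catalog with Spark; the catalog name "lake" is assumed.
    "spark.sql.catalog.lake": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lake.type": "hive",      # or "hadoop"/"rest", per deployment
}

# The streaming job itself would then look roughly like the following
# (pseudocode in comments: it requires a live SparkSession and cluster):
#
#   df = spark.readStream.format("kafka").options(**KAFKA_SOURCE_OPTIONS).load()
#   parsed = df.selectExpr("CAST(value AS STRING) AS raw")  # then parse/tokenize
#   (parsed.writeStream
#          .format("iceberg")
#          .option("checkpointLocation", "/chk/events")     # assumed path
#          .toTable("lake.db.events"))
```

Keeping the source options and catalog properties in plain dictionaries like this makes them easy to unit-test and reuse across environments without touching the job logic.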

Workstream 3: syslog-ng to Iceberg

  • Implement Option 2 (Batch Flow): configure syslog-ng to write JSON-formatted logs to rolling daily files on HDFS/S3-compatible storage, then build a PySpark file-source streaming job to read and parse these files incrementally into an Iceberg table
  • Ensure schema alignment between syslog-ng JSON output fields (HOST, MSGHDR, MSG, and RFC5424 metadata) and the target Iceberg table definition
  • Add production-grade error handling: dead-letter queue for malformed log records, alerting on parse failures, and graceful recovery from checkpoint corruption
  • Write unit tests validating HOST/MSGHDR/MSG field mapping, schema conformance, and ingestion idempotency across overlapping file windows
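The field mapping and dead-letter routing called out above can be sketched in plain Python, independent of the Spark runtime. The field names HOST, MSGHDR, and MSG come from the posting; the `SyslogRecord` shape and error-message format are illustrative assumptions:

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SyslogRecord:
    host: str
    msghdr: str
    msg: str

# syslog-ng JSON output fields named in the job description
REQUIRED_FIELDS = ("HOST", "MSGHDR", "MSG")

def parse_record(line: str) -> Tuple[Optional[SyslogRecord], Optional[str]]:
    """Parse one JSON log line; return (record, None) on success or (None, error)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        return None, f"malformed JSON: {exc}"
    missing = [f for f in REQUIRED_FIELDS if f not in obj]
    if missing:
        return None, f"missing fields: {missing}"
    return SyslogRecord(obj["HOST"], obj["MSGHDR"], obj["MSG"]), None

def route(lines: List[str]) -> Tuple[List[SyslogRecord], List[str]]:
    """Split incoming lines into parsed records and a dead-letter list."""
    good: List[SyslogRecord] = []
    dead: List[str] = []
    for line in lines:
        rec, err = parse_record(line)
        if rec is not None:
            good.append(rec)
        else:
            dead.append(f"{err}: {line!r}")  # dead-letter entry keeps the raw line
    return good, dead
```

In the actual pipeline this logic would live inside the PySpark job (e.g. as a schema-enforced parse plus a filter writing failures to a dead-letter table), but isolating it this way makes the unit tests for field mapping and malformed-record handling trivial to write.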

Cross-Cutting Responsibilities

  • Collaborate with the Data Migration Engineer (Workstream 1) to ensure Iceberg catalog configuration, partition strategies, and compaction schedules are consistent across all three workstreams
  • Review and refine AI-generated PySpark pipelines to ensure they meet PNC coding standards, security requirements, and performance targets
  • Contribute to the POC close-out stakeholder demo, presenting results from both streaming workstreams with live data-flow demonstrations
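For the catalog, partitioning, and compaction consistency point above, the kind of shared settings the workstreams would align on can be sketched as follows. The table properties shown are real Iceberg property names, but their values, the catalog/table names, and the partition column are illustrative assumptions; the compaction call referenced in the comments is Iceberg's `rewrite_data_files` Spark procedure:

```python
# Illustrative shared Iceberg table properties the three workstreams might
# agree on, so that compaction and file sizing behave consistently.
SHARED_TABLE_PROPERTIES = {
    "write.format.default": "parquet",                      # value assumed
    "write.target-file-size-bytes": str(512 * 1024 * 1024), # 512 MiB, assumed
}

# Partitioning and scheduled compaction would typically be applied via Spark
# SQL, e.g. (catalog/table/column names assumed):
#
#   CREATE TABLE lake.db.syslog (...) USING iceberg
#   PARTITIONED BY (days(event_ts));
#
#   CALL lake.system.rewrite_data_files(table => 'db.syslog');
```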

Minimum Skills Required: Log Ingestion & syslog-ng

  • Experience with syslog-ng or equivalent Linux log aggregation tools (rsyslog, Fluentd, Logstash) including configuration of sources, destinations, filters, and output templates
  • Familiarity with RFC5424 syslog format and JSON-based log structuring for downstream analytical consumption
  • Experience designing log-to-data-lake pipelines using file-based intermediary storage (HDFS, S3, Azure Blob) with Spark file-source streaming
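The configuration skills listed above map to a syslog-ng setup along these lines: an RFC5424 source feeding a file destination that writes JSON lines into daily rolling files. This is a sketch only; the listener port, output path, and included fields are assumptions:

```
# Illustrative syslog-ng sketch: RFC5424 source -> JSON lines in daily files.
source s_net {
    syslog(transport("tcp") port(6514));   # assumed RFC5424 listener port
};

destination d_daily_json {
    file(
        "/data/logs/${YEAR}-${MONTH}-${DAY}/syslog.json"   # rolling daily path (assumed)
        template("$(format-json HOST=${HOST} MSGHDR=${MSGHDR} MSG=${MSG})\n")
    );
};

log { source(s_net); destination(d_daily_json); };
```

A downstream Spark file-source streaming job would then watch the dated directories and ingest new files incrementally, which is the batch-flow pattern described for workstream 3.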

TekWissen Group is an equal opportunity employer supporting workforce diversity
