Description
EXPERIENCE: 5–10 Years
Core Responsibilities:
• Pipeline Management: Maintain high-throughput streaming pipelines that ingest logs from various sources (Firewalls, Cloud, Endpoints) and deliver them to a central destination.
• Log Normalization: Write parsers to convert raw, messy logs into standard schemas (e.g., OCSF or ECS) for consistent querying.
• Cost Optimization: Implement routing logic to send "high-value" data to the SIEM and "bulk" data to low-cost Object Storage (Data Lake).
• Data Preparation: Clean and structure data to enable AI/ML detection models and advanced analytics.
Must-Have Skills:
• Data Engineering: Proficiency in Python (for ETL) and SQL (for complex querying).
• Streaming Tech: Experience with Message Queues (e.g., Kafka, Pub/Sub) and stream processing concepts.
• Log Handling: Mastery of Regex and log parsing strategies for standard formats (Syslog, CEF, JSON).
• Storage Architecture: Understanding of Data Lake principles (e.g., Parquet/Avro formats) and how they differ from Data Warehouses.
Preferred / Nice to Have:
• Experience with Vector Databases for storing embeddings.
• Knowledge of Log Observability/Routing tools (middleware that routes logs).
• Familiarity with Big Data frameworks (e.g., Spark, Flink).

Clearwater serves a diverse and growing base of customers across the healthcare ecosystem, including several of the nation’s largest health systems as well as a large universe of regional hospitals, physician practice management groups, digital health and other healthcare technology companies, medical device manufacturers, and business service providers. Our mission is to help those organizations move to a more secure, compliant, and resilient state so they can achieve their mission.