DevOps & Cloud Infrastructure Engineer — IoT / Edge AI Systems
Budget: $15.00 - $25.00
Hourly / Full-time
⭐ 4.98 (53)
United States
python, devops, docker, kubernetes, automated-deployment, cicd, automation-software-release, iot-solutions-design, internet-of-things, mqtt
We are looking for a highly capable DevOps & Cloud Infrastructure Engineer with strong experience in IoT architecture, cloud-connected device fleets, and production infrastructure to help Shyld scale and operate a growing fleet of AI-powered edge devices deployed across healthcare facilities.
This role is ideal for someone who has worked on systems where physical devices, edge compute, cloud services, networking, telemetry, security, and reliability all come together. You should be comfortable thinking beyond traditional cloud infrastructure and designing the architecture needed to securely provision, monitor, update, and operate hundreds or thousands of distributed devices in real-world environments.
Responsibilities
IoT Architecture & Edge Fleet Infrastructure
Design and improve the cloud-to-device architecture for a growing fleet of AI-powered edge devices
Build infrastructure for secure device provisioning, identity management, configuration, and lifecycle management
Support reliable communication between edge devices and cloud services using patterns such as MQTT, Pub/Sub, event streaming, message queues, or similar architectures
Design scalable telemetry pipelines for device health, connectivity, logs, events, and operational status
Help define the architecture for managing device state, software versions, configuration changes, and fleet-wide rollouts
Support scaling from pilot deployments to hundreds or thousands of devices across healthcare facilities
Cloud Infrastructure & Platform Engineering
Design, deploy, and maintain scalable cloud infrastructure on Google Cloud Platform
Manage production services running on Cloud Run, GKE/Kubernetes, Pub/Sub, Cloud SQL, Artifact Registry, IAM, and VPC networking
Build highly reliable backend infrastructure for real-time IoT, AI, and robotics workloads
Optimize cloud performance, uptime, scalability, and operational cost
Edge Device & Fleet Operations
Support deployment and management of Linux-based AI edge devices, including NVIDIA Jetson Orin platforms
Design and improve OTA update pipelines, staged rollouts, rollback systems, remote monitoring, and health checks
Manage device authentication, certificates, secure communication, and access control
Improve device provisioning, onboarding, replacement, and recovery workflows
Build operational tooling to help the team understand fleet health and quickly diagnose field issues
CI/CD & Automation
Build and maintain CI/CD pipelines using GitHub Actions and related tooling
Automate cloud and device infrastructure deployment using Terraform or similar Infrastructure-as-Code tools
Improve release management, deployment safety, staging workflows, canary rollouts, and rollback strategies
Support multi-architecture builds for cloud and edge environments when needed
Observability & Reliability
Build monitoring, alerting, and logging infrastructure across both cloud services and edge devices
Improve visibility into device health, connectivity issues, AI pipelines, cloud services, OTA status, and production incidents
Develop dashboards, alerts, and runbooks for fleet operations
Work with engineering teams to improve system reliability, uptime, and incident response
Lead root-cause investigations for production and field issues
Security & Compliance
Implement cloud, IoT, and device security best practices
Manage secrets, certificates, IAM policies, mTLS, secure communications, and device identity
Support HIPAA- and SOC 2-oriented infrastructure and operational requirements
Improve auditability, access control, logging, and operational security posture
Help ensure that edge devices and cloud services follow least-privilege and secure-by-default principles
Cross-Functional Collaboration
Work closely with AI engineers, robotics engineers, backend developers, product, field operations, and customer success teams
Support deployment readiness for hospitals and enterprise healthcare environments
Translate real-world field issues into infrastructure, monitoring, automation, and reliability improvements