
Software Engineer – Cloud Systems & Automation

Budget: - HOURLY / FULL_TIME ⭐ 5.00 (13) USA

docker, amazon-web-services, kubernetes, git, dynamodb, cicd, cloud-architecture

# Software Engineering Intern – AWS Queue Processing Systems

## Overview

We are building a distributed image-processing platform running on AWS and Kubernetes. The system processes jobs from a queue, executes workloads inside Kubernetes pods, and stores generated assets in S3.

We are looking for a software engineering intern who can take ownership of existing infrastructure, improve reliability, and implement production-grade distributed systems patterns.

The ideal candidate is comfortable working across backend systems, cloud infrastructure, containerized workloads, and production operations.

---

## Responsibilities

* Work with AWS services including SQS, S3, IAM, CloudWatch, and DynamoDB
* Develop Kubernetes-based workers that process tasks from distributed queues
* Implement fault-tolerant job execution and retry mechanisms
* Prevent duplicate processing, race conditions, and data inconsistencies
* Design monitoring, logging, and observability systems
* Write technical documentation and deployment instructions
* Improve scalability, reliability, and performance of the queue-processing architecture
* Collaborate with the team on infrastructure and backend system design

---

## Preferred Skills

* Python, Go, or Node.js
* AWS cloud services (SQS, S3, IAM, CloudWatch, DynamoDB)
* Docker and Kubernetes
* Distributed systems fundamentals
* Concurrency, locking, and fault tolerance concepts
* Git and CI/CD workflows
* API development and backend engineering
* Strong debugging and problem-solving skills

---

## Success Criteria

The candidate can independently modify production systems, safely process queue messages, handle failures correctly, deploy changes to Kubernetes environments, and contribute meaningful improvements to system reliability and scalability.

---

# One-Week Paid Trial Period

All candidates will begin with a one-week paid trial period ($1,000 USD). During this week, candidates will be expected to complete the two engineering projects outlined below.
These projects are designed to evaluate technical ability, ownership, communication, code quality, infrastructure knowledge, and the ability to deliver production-ready systems independently.

Successful completion of the trial period may lead to continuation of the internship and expanded responsibilities within the engineering team.

**Trial Compensation:** $1,000 USD

---

# Trial Project 1: Reliable Queue-Based Image Processing Worker

## Objective

Build a Kubernetes worker that consumes tasks from AWS SQS, executes image-generation jobs, uploads successful outputs to S3, and safely handles failures.

---

## Requirements

### Queue Consumption

* Poll messages from an AWS SQS queue
* Multiple worker pods may run simultaneously
* Each task should be processed exactly once whenever possible

### Successful Execution

When a task completes successfully:

* Upload the generated image to S3
* Store metadata including:
  * Job ID
  * Processing timestamp
  * S3 object path
* Delete the message from SQS

### Failed Execution

When processing fails:

* Do not delete the SQS message
* Allow the message to become visible again
* Support automatic retries
* After a configurable retry limit, send the message to a Dead Letter Queue (DLQ)

### Race Condition Prevention

Implement protection against duplicate processing.

Requirements:

* Two workers must never process the same job simultaneously
* Handle worker or pod crashes gracefully
* Prevent duplicate S3 uploads for the same Job ID

Suggested approaches:

#### Option A: Distributed Locking

Use DynamoDB as a lock table:

* Acquire lock before processing
* Lock expires automatically via TTL

#### Option B: Job State Management

Maintain job states:

* PENDING
* PROCESSING
* COMPLETED
* FAILED

Workers must perform atomic state transitions.
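As an illustration of what an atomic state transition could look like, here is a minimal Python sketch that combines Option B with an expiring lease in the spirit of Option A. It is not a reference solution. `InMemoryJobStore`, `try_claim`, `mark_completed`, `LEASE_SECONDS`, and the item field names are all hypothetical and exist only for this example. The in-memory store stands in for a DynamoDB table so the logic can be run locally; against DynamoDB, the same check would be expressed as a single conditional write (a `ConditionExpression` on the update). The sketch compares the lease expiry timestamp explicitly, on the assumption that TTL-based deletion is not immediate.

```python
import threading
import time

LEASE_SECONDS = 300  # assumed lease length; tune to exceed typical job duration


class ConditionFailed(Exception):
    """The precondition of a conditional write did not hold."""


class InMemoryJobStore:
    """Hypothetical local stand-in for a DynamoDB job table.

    conditional_put mimics a write guarded by a condition: the check and
    the write happen as one atomic step per job ID.
    """

    def __init__(self):
        self._items = {}
        self._mutex = threading.Lock()

    def conditional_put(self, job_id, new_item, condition):
        with self._mutex:
            if not condition(self._items.get(job_id)):
                raise ConditionFailed(job_id)
            self._items[job_id] = new_item


def try_claim(store, job_id, worker_id, now=None):
    """Move a job to PROCESSING; return True only if this worker now owns it."""
    now = time.time() if now is None else now

    def claimable(item):
        if item is None or item["state"] in ("PENDING", "FAILED"):
            return True
        # A crashed worker leaves PROCESSING behind; allow takeover once
        # its lease has expired. COMPLETED jobs are never claimable.
        return item["state"] == "PROCESSING" and item["lease_expires_at"] <= now

    new_item = {
        "state": "PROCESSING",
        "owner": worker_id,
        "lease_expires_at": now + LEASE_SECONDS,
    }
    try:
        store.conditional_put(job_id, new_item, claimable)
    except ConditionFailed:
        return False
    return True


def mark_completed(store, job_id, worker_id, s3_path, now=None):
    """Record success; return False if this worker no longer owns the job."""
    now = time.time() if now is None else now

    def owned(item):
        return (
            item is not None
            and item["state"] == "PROCESSING"
            and item["owner"] == worker_id
        )

    new_item = {
        "state": "COMPLETED",
        "owner": worker_id,
        "s3_path": s3_path,
        "completed_at": now,
    }
    try:
        store.conditional_put(job_id, new_item, owned)
    except ConditionFailed:
        return False
    return True
```

In this sketch a worker would call `try_claim` right after receiving a message and skip the job when it returns `False`. A `False` result from `mark_completed` means another worker took over after the lease expired, so the late worker should not record its result as the job's output.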
### Visibility Timeout Management

The worker must:

* Extend SQS visibility timeouts for long-running jobs
* Prevent another worker from receiving the same message while it is actively being processed

### Kubernetes Deployment

Deploy the worker as:

* Docker container
* Kubernetes Deployment
* Configurable replica count

---

## Deliverables

* Source code
* Dockerfile
* Kubernetes manifests
* README documentation
* Architecture diagram
* Local testing instructions

---

## Bonus

* CloudWatch metrics
* Structured logging
* Horizontal Pod Autoscaler
* OpenTelemetry tracing
* Idempotent upload handling

---

# Trial Project 2: Friend Invitation SMS System

## Objective

Implement an invitation system that allows existing users to invite friends via SMS.

---

## Requirements

### User Flow

1. User enters a friend's phone number
2. User clicks "Invite Friend"
3. Backend validates the phone number
4. System sends an SMS invitation
5. Invitation event is recorded in the database

### SMS Service

Use AWS SNS or Twilio.

The solution should:

* Send SMS messages successfully to valid phone numbers
* Handle provider failures gracefully
* Log delivery attempts
* Prevent duplicate invitations

### Database Schema

Suggested fields:

#### Invitations

* invitation_id
* inviter_user_id
* phone_number
* invite_code
* status
* created_at
* sent_at

### Rate Limiting

Implement protections against abuse:

* Maximum 10 invites per user per day
* Maximum 1 invite per phone number every 24 hours
* Reject invalid phone numbers

### Invitation Codes

Generate unique invite codes.

Example: APP-8F3K92

Referral URL: [https://app.example.com/invite/APP-8F3K92](https://app.example.com/invite/APP-8F3K92)

### SMS Template

Example:

"John invited you to try AppName.
Sign up using this link: [https://app.example.com/invite/APP-8F3K92](https://app.example.com/invite/APP-8F3K92)"

### API Endpoint

POST /api/invitations

Request:

```json
{
  "phoneNumber": "+15551234567"
}
```

Response:

```json
{
  "success": true,
  "inviteCode": "APP-8F3K92"
}
```

### Error Handling

Handle:

* Invalid phone numbers
* SMS provider failures
* Duplicate invitations
* Rate-limit violations

---

## Deliverables

* API implementation
* Database schema
* SMS integration
* Unit tests
* API documentation

---

## Bonus

* Invitation acceptance tracking
* Referral rewards system
* SMS delivery status tracking
* Queue-based SMS sending using SQS

---

## Evaluation Criteria

Candidates will be evaluated on:

* Code quality and maintainability
* System design decisions
* Reliability and fault tolerance
* Documentation quality
* Testing coverage
* AWS and Kubernetes knowledge
* Communication and ownership
* Ability to independently execute and deliver working systems

We care more about engineering judgment, reliability, and problem-solving ability than perfect implementation details.