Senior AWS Infra Engineer — Infrastructure Review of Production Healthcare SaaS Stack

Budget: $400.0 FIXED / ⭐ 5.00 (11) United States

amazon-web-services, devops, aws-lambda, python, amazon-ec2

We're a pre-launch healthcare SaaS startup. We've prepared a detailed 20-page specification of our production AWS infrastructure. We want an experienced engineer to review it and answer a focused set of 20 questions about whether it's a solid, scalable, production-ready foundation, with candid commentary on anything you'd flag. This is a document review, not a hands-on account audit: we're providing a complete written spec.

You're a fit if you have:
- 5+ years building and running production AWS infrastructure
- Direct experience with healthcare or other PHI/regulated workloads on AWS (please describe a specific example)
- Deep familiarity with ECS Fargate, RDS PostgreSQL, CloudFront/WAF, Cognito, IAM least-privilege, and CDK or similar IaC
- The ability to form and defend an independent opinion. We want disagreement and pushback where warranted, not validation.

Scope & deliverable:
- Read the provided spec (plus a few supporting domain docs, available on request)
- Answer 20 specific questions on a Solid/Good/Adequate/Concern scale, each with 1–3 sentences of reasoning
- Provide a short prioritized list of anything you'd address before or shortly after launch
- Join a 15–20 minute call to walk us through your findings and reasoning (required). ChatGPT-written assessments will not be accepted. We want to make sure we are getting YOUR expertise, not AI's, which we can get ourselves.

TIMELINE: We need this completed by Friday of this week. PLEASE ONLY RESPOND if you are prepared to engage immediately and complete the job by Friday.

These are the questions we are asking, so you can see the scope:

1. Foundation & completeness
Q1. For a product like this (web app, backend services, and a database serving business customers), which core infrastructure building blocks are present, and is anything missing or an ill-fitting choice for the job?
Q2. Are the technology choices (cloud hosting, containers, managed database, CDN, auth) current and standard, or are there choices here you'd question?
Q3. Does the setup read as architected by a capable team, or are there signs it was pieced together ad hoc? Point to specifics either way.

2. Scaling with the business
Q4. Can this take on real paying customers as it stands, and grow from a handful to dozens and then hundreds without needing to be rebuilt? Where would the first rebuild pressure show up?
Q5. As usage grows or spikes, what happens? Does it absorb load, or where does it first slow down or fail?
Q6. How will the database hold up as data and concurrent users grow? What's the first bottleneck you'd expect, and at roughly what scale?

3. Security
Q7. Is anything reachable from the open internet that shouldn't be? Identify any exposed entry points.
Q8. How is sensitive data, especially patient health information, handled for encryption, access control, and access logging? Where does it fall short of what you'd want?
Q9. What protections exist against everyday threats (credential stuffing against the login, malicious or junk traffic, etc.), and where are the gaps?
Q10. Given this handles health information, are the protections appropriate for that sensitivity, i.e., the controls a HIPAA-minded reviewer would expect? Name anything you'd consider table stakes that's missing or weak.

4. Reliability & data safety
Q11. If a server or the database fails, what actually happens? Does it keep running and recover, and how do you know the failover and recovery paths would work rather than just being configured? (The spec claims Multi-AZ and PITR; assess whether the evidence supports confidence that recovery would actually succeed.)
Q12. Is the data backed up, and is there a credible basis to believe it could be restored? What would you want to see to be confident?

5. Running it day-to-day
Q13. Is there enough monitoring and alerting that the team would learn about a problem before customers do? Any blind spots?
Q14. Is the infrastructure defined in a repeatable, version-controlled way that is not fragile or dependent on one person, and are changes deployed safely? Flag anything managed by hand or outside the codebase.
Q15. Are operating costs sensible for this stage, with no obvious runaway risks?

6. Judgment & tradeoffs
Q16. Bottom line. How would you characterize this foundation for taking to market with paying customers: ready as-is, ready with minor cleanup, or material concerns? Give your own verdict, not a restatement of the spec's.
Q17. Sequencing. The spec lists a set of open items (a container image with critical/high CVEs, an over-privileged dev role, no AWS Config, no tested DR runbook, default vs. customer-managed encryption keys, and others). Independently of how the spec prioritizes them: which would you block launch on, which would you fix in the first 90 days, and which are genuinely fine to defer? Defend the ordering.
Q18. Deferred risk. The spec argues that the over-privileged dev role (one role that can delete the production database, administer S3 and Cognito, and modify IAM) can be safely deferred because it's the operator's own access path. Do you accept that reasoning? What would you require before signing off on deferring it?
Q19. The thing not in the doc. Reviewing a self-reported spec rather than the live account has limits. What in here would you most want to verify directly before trusting it, and is there anything the document's framing leads you not to ask about that you'd want to see?
Q20. Top priorities. Is there anything not yet covered that you'd want addressed before or shortly after launch?
