
AI-Assisted Data Room File Organizer

Budget: $200 (fixed price) / ⭐ 3.81 (277) / United States

microsoft-excel, data-entry, python, microsoft-word

Create an AI-Assisted Static Data Room Organizer for Real Estate Development Documents.

I need a contractor to build a practical tool that takes a large master folder of mixed project documents and automatically organizes them into a data-room-style folder structure for a real estate development / infrastructure project. The goal is to avoid manually opening and sorting hundreds of files. The tool should process a batch of documents, classify them by subject matter, copy them into the appropriate folders, generate an index, and create a static searchable retrieval system. Original files must not be altered.

Core functionality:
* Accept one or more master input folders containing many files and subfolders.
* Extract text and metadata from each document.
* OCR scanned PDFs and image-based files where possible.
* Classify each file into the correct data-room category using AI/NLP.
* Copy files into a clean output folder structure.
* Generate a CSV/Excel manifest showing original file name/path, new folder location, document type, assigned category, confidence score, short classification explanation, key terms/entities, duplicate status, and review-needed flag.
* Create a static searchable HTML index that can be opened locally without a server.
* Flag low-confidence or unreadable files for human review.
* Detect exact and near-duplicate files.
* Allow the folder taxonomy to be edited and the tool to be rerun.

Required file types: PDF (including scanned PDFs), Word .doc/.docx, Excel .xls/.xlsx, PowerPoint .ppt/.pptx, images .jpg/.png/.tif, text files, and email files .msg/.eml if feasible.
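The duplicate-detection and copy-only requirements above could be met with standard-library Python. The sketch below is one possible approach, not a specified design: it assumes text has already been extracted for each file, uses a SHA-256 digest of the file bytes for exact duplicates, and uses Jaccard similarity over 5-word shingles (with an assumed 0.8 threshold) for near-duplicates. The function names `sha256_of`, `duplicate_report`, and `copy_preserving` are hypothetical.

```python
import hashlib
import re
import shutil
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file bytes in chunks; byte-identical files share a digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def shingles(text: str, k: int = 5) -> set:
    """Set of overlapping k-word tuples from lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Intersection size / union size; two empty (unreadable) texts score 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def duplicate_report(paths, texts, threshold=0.8):
    """Return (exact_groups, near_pairs) for a list of Paths.

    texts maps each Path to its already-extracted text (assumed input).
    """
    digest = {p: sha256_of(p) for p in paths}
    by_hash = defaultdict(list)
    for p in paths:
        by_hash[digest[p]].append(p)
    exact_groups = [sorted(g) for g in by_hash.values() if len(g) > 1]

    sigs = {p: shingles(texts.get(p, "")) for p in paths}
    ordered = sorted(paths)
    near_pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if digest[a] == digest[b]:
                continue  # already reported as an exact duplicate
            score = jaccard(sigs[a], sigs[b])
            if score >= threshold:
                near_pairs.append((a, b, score))
    return exact_groups, near_pairs


def copy_preserving(src: Path, dest_dir: Path) -> Path:
    """Copy (never move) src into dest_dir; the original is only read."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    n = 1
    while dest.exists():  # never overwrite an earlier copy with the same name
        dest = dest_dir / f"{src.stem}_{n}{src.suffix}"
        n += 1
    shutil.copy2(src, dest)  # copy2 also carries over timestamps
    return dest
```

The pairwise comparison is quadratic, which is manageable for hundreds of files; for much larger batches a MinHash/LSH index would be the usual substitute. Comparing extracted text rather than bytes is what lets the near-duplicate check catch, for example, the same memo saved as both .docx and .pdf.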
The initial taxonomy should be editable and include: Admin/Index, Project Overview, Land Control/PSA, Title/Survey/ALTA, Zoning/Land Use/Local Approvals, Environmental/RCRA/BRAC/FOSET, Wetlands/Streams/USACE, Floodplain/Drainage/Stormwater, Geotechnical/Soils, Civil Engineering/Site Planning, Power/Utility/AEP/SWEPCO, BTM Generation/BESS/Energy, Natural Gas, Water/Wastewater, Fiber/Telecom, Permitting/FAST-41/Federal/State, Vendors/Proposals/Budgets, Capital Markets/Investor Materials, Correspondence, and Unclassified Review Queue.

The preferred approach is local-first or hybrid: local text extraction/OCR where possible, local duplicate detection, AI/API classification only where needed, the ability to run in a controlled local environment, and no permanent upload or storage of confidential documents by the contractor.

Possible technologies may include Python, Tesseract OCR, PyMuPDF/pdfplumber, python-docx, openpyxl, python-pptx, sentence-transformers, FAISS/Chroma, the OpenAI API or another LLM classifier, and a lightweight local interface such as Streamlit, Flask, or a simple desktop GUI.

The minimum acceptable UI is a command-line tool with clear instructions and a config file. The preferred UI is a simple local browser or desktop interface where the user can select the input folder, select the output folder, choose/edit the taxonomy, run classification, view progress, open the review queue, export the manifest, generate the static HTML index, and rerun after corrections.

Security requirements:
* Do not modify original files.
* Copy files into output folders.
* Do not upload documents to third-party cloud services unless explicitly enabled.
* If API use is required, clearly disclose what text/metadata is sent externally.
* Do not store API keys in plain text.
* The contractor must not retain client documents.
* Testing should use dummy/sample files unless otherwise approved.
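The editable taxonomy and the confidence/review-queue requirements could be prototyped with a local keyword baseline before any AI/API classifier is added. In this sketch the taxonomy is a JSON document mapping each category to trigger phrases; the phrases, the 0.6 threshold, and the names `load_taxonomy` and `classify` are illustrative assumptions, and only four of the posting's categories are shown. Confidence here is simply the top category's share of all keyword hits, a swapped-in heuristic rather than the embedding or LLM scoring the posting mentions.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical taxonomy file contents: category -> trigger phrases.
# The phrases are illustrative assumptions, not part of the posting.
TAXONOMY_JSON = """
{
  "Land Control/PSA": ["purchase and sale agreement", "psa", "option agreement", "earnest money"],
  "Title/Survey/ALTA": ["title commitment", "alta", "survey", "easement"],
  "Wetlands/Streams/USACE": ["wetland", "usace", "section 404", "jurisdictional"],
  "Geotechnical/Soils": ["geotechnical", "boring log", "soil", "bearing capacity"]
}
"""
REVIEW_QUEUE = "Unclassified Review Queue"


@dataclass
class Classification:
    category: str
    confidence: float
    explanation: str
    review_needed: bool


def load_taxonomy(raw: str) -> dict:
    """Parse the editable JSON taxonomy, lowercasing phrases for matching."""
    return {cat: [p.lower() for p in phrases] for cat, phrases in json.loads(raw).items()}


def count_hits(text: str, phrase: str) -> int:
    # Whole-word match so "psa" does not fire inside unrelated words.
    return len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))


def classify(text: str, taxonomy: dict, min_confidence: float = 0.6) -> Classification:
    text = text.lower()
    if not text.strip():
        return Classification(REVIEW_QUEUE, 0.0, "no extractable text", True)
    scores, matched = {}, {}
    for cat, phrases in taxonomy.items():
        found = {p: n for p in phrases if (n := count_hits(text, p))}
        scores[cat] = sum(found.values())
        matched[cat] = sorted(found)
    total = sum(scores.values())
    if total == 0:
        return Classification(REVIEW_QUEUE, 0.0, "no taxonomy terms matched", True)
    best = max(scores, key=scores.get)  # ties go to the earlier category
    confidence = scores[best] / total  # share of all keyword evidence
    explanation = "matched: " + ", ".join(matched[best])
    if confidence < min_confidence:
        # Route to review but keep the top guess for the human reviewer.
        return Classification(REVIEW_QUEUE, confidence, f"ambiguous; top guess {best} ({explanation})", True)
    return Classification(best, confidence, explanation, False)
```

Because the taxonomy is plain JSON, a user could add a category or phrase and rerun without touching code. The walrus operator requires Python 3.8 or later.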
Deliverables: working tool/app/script, source code, editable taxonomy file, data-room folder output generator, CSV/Excel manifest, static searchable HTML index, duplicate report, review queue report, error log, installation instructions, user guide, and a demo using sample files.

Acceptance criteria: the project succeeds if the tool can process a mixed master folder, classify documents into the data-room taxonomy, copy files without altering originals, generate an audit-friendly manifest, produce a local searchable HTML index, identify duplicates, and flag uncertain files for review.

Proposals should explain: recommended architecture; whether the solution runs locally, in the cloud, or hybrid; confidentiality protections; OCR approach; classification confidence scoring; duplicate detection method; static HTML search index approach; taxonomy editing; timeline; milestones; and similar projects completed.

This is not intended to be a full enterprise data room SaaS platform. I need a practical, reliable static document organization and retrieval tool that can prepare a data-room-style folder system from a large batch of real estate development project documents. The tool must also produce an error report listing documents that cannot be parsed or read for cataloging.
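The manifest and the server-free searchable index could be generated together from the same per-file records. The sketch below writes the CSV with the standard `csv` module and emits a single HTML file that embeds the records as JSON and filters them with a few lines of JavaScript, so it opens from disk without a server. The column names, page layout, and the names `write_manifest` and `write_html_index` are assumptions; each record is assumed to be an in-memory dict whose `new_path` is relative to the output root, so links resolve when `index.html` sits in that root.

```python
import csv
import html
import json
from pathlib import Path
from urllib.parse import quote

# Columns mirror the manifest fields the posting lists; the names are assumptions.
MANIFEST_FIELDS = [
    "original_path", "new_path", "doc_type", "category", "confidence",
    "explanation", "key_terms", "duplicate_status", "review_needed",
]

# Braces in the JavaScript are doubled because the template goes through str.format.
PAGE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
<input id="q" type="search" placeholder="Search name, category, terms..." autofocus>
<table id="t"><thead><tr><th>File</th><th>Category</th><th>Confidence</th><th>Review</th></tr></thead><tbody></tbody></table>
<script>
const DOCS = {data};
const tbody = document.querySelector("#t tbody");
function render(q) {{
  q = q.toLowerCase();
  tbody.replaceChildren();
  for (const d of DOCS) {{
    if (q && !d.haystack.includes(q)) continue;
    const tr = tbody.insertRow();
    const a = document.createElement("a");
    a.href = d.href;
    a.textContent = d.name;
    tr.insertCell().append(a);
    tr.insertCell().textContent = d.category;
    tr.insertCell().textContent = d.confidence;
    tr.insertCell().textContent = d.review ? "yes" : "";
  }}
}}
document.querySelector("#q").addEventListener("input", e => render(e.target.value));
render("");
</script></body></html>
"""


def write_manifest(rows: list, out_csv: Path) -> None:
    """Write one CSV row per processed file."""
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=MANIFEST_FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def write_html_index(rows: list, out_html: Path, title: str = "Data Room Index") -> None:
    """Write a self-contained HTML page with the records embedded as JSON."""
    docs = []
    for r in rows:
        rel = Path(r["new_path"]).as_posix()  # assumed relative to the output root
        docs.append({
            "name": Path(rel).name,
            "href": quote(rel, safe="/"),  # percent-encode spaces etc. in links
            "category": r["category"],
            "confidence": f"{float(r['confidence']):.2f}",
            "review": bool(r["review_needed"]),
            "haystack": " ".join(str(r[k]) for k in ("new_path", "category", "key_terms", "explanation")).lower(),
        })
    data = json.dumps(docs).replace("</", "<\\/")  # keep "</script>" out of the inline script
    out_html.write_text(PAGE.format(title=html.escape(title), data=data), encoding="utf-8")
```

Embedding the data inline, rather than loading a separate .json file with `fetch`, matters because browsers generally block `fetch` requests for pages opened via `file://`.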