
Freelance Developer Needed — Local Python / Excel Document Processing Tool

Budget: $3,000 fixed price / Client rating ⭐ 1.67 (2) / Australia

python, microsoft-excel, data-scraping

We are looking for an experienced freelance developer to build a local document-processing tool for an internal legal costing workflow. The tool will scan a local matter folder containing emails, PDFs, Word documents and attachments, then produce a structured Excel workbook for review.

This is not a web app and does not need to make legal decisions. The purpose is to assist with document organisation, metadata extraction, duplicate flagging, word/page counting, and Excel preparation. The ideal developer will have strong experience with Python automation, Excel reporting, document parsing, and Outlook email files.

Project objective

We want to build a practical MVP that reduces the repetitive administrative work involved in preparing legal costing files. The first version should focus on:
- indexing matter documents
- extracting useful metadata
- linking emails with attachments
- identifying likely duplicates
- counting words/pages where possible
- generating a clean Excel workbook for manual review

The tool should support human review and manual override at all stages.

Required features

The tool should:
- run locally on a Windows computer
- allow the user to select or nominate a matter folder
- scan subfolders
- identify common file types, including PDF files, Word documents, Excel files, Outlook .msg emails and attachments
- extract email metadata where possible, including date, sender, recipient, CC, subject, body preview and attachment names
- extract basic document metadata
- estimate word counts for emails and Word documents where possible
- estimate page counts for PDFs where possible
- link attachments to their parent emails where possible
- flag likely duplicates or possible email-chain duplicates
- export the results into a structured Excel workbook

Excel output required

The Excel workbook should include the following tabs:
- Raw Index: a full list of all files and extracted metadata.
- Review: a clean working sheet for manual review.
- Validation / Warnings: any files that could not be processed, duplicate flags, missing dates, unusual file types, etc.
- Export: a simplified output sheet that can be copied into a final costing document or billing template.

Review tab columns

The Review tab should include columns such as: Include / exclude, Date, File name, File path, File type, Document category, Email subject / document title, Sender, Recipient, CC, Attachment names, Parent email, Word count, Page count, Duplicate flag, Suggested category, Scale item, Units, Draft item description, Final item description, Notes.

The user must be able to manually edit the Excel workbook after it is generated.

Suggested categories

The tool may provide basic suggested categories, such as email, letter, medical report, counsel correspondence, court document, pleading, invoice, attachment and unknown. These are only suggestions. The tool does not need to make final legal or costing decisions.

Important constraints

This project involves confidential legal material, so privacy and security are important. Requirements:
- The tool must run locally.
- Client documents must not be uploaded to external services.
- No client documents may be used for AI training.
- Development and testing should initially use dummy or redacted sample files.
- We require clean, readable source code.
- We require basic handover documentation.
- An NDA may be required before any sample material is provided.

Preferred technical approach

We are open to the developer’s recommendation, but expect the solution may involve:
- Python
- openpyxl or similar for Excel generation
- document parsing libraries for Word/PDF
- Outlook .msg parsing libraries
- a simple desktop interface or clear command-line script

A polished interface is not required for the first version, but the tool should be usable by a non-technical user with clear instructions.

Preferred milestone structure

We are open to discussion, but a possible structure is:
- Proof of concept: scan folder and generate a basic Excel file index.
- Email and document extraction: extract metadata from .msg, Word and PDF files.
- Excel workbook structure: create Raw Index, Review, Validation and Export tabs.
- Attachment linking and duplicate flags: link attachments to parent emails and flag likely duplicates.
- Testing, bug fixes and handover: test with sample folders, fix issues, provide source code and documentation.

Please include the following in your response:
- Relevant experience with Python automation, Excel reporting or document parsing.
- Any experience working with Outlook .msg files.
- Your proposed technical approach.
- What you would include in and exclude from the MVP.
- Proposed milestone structure and pricing.
- Any key risks, limitations or assumptions.
- Confirmation that the tool can run locally without uploading confidential documents.

Ideal candidate

The ideal person is a practical automation developer who understands document-heavy workflows and can build a useful, reliable internal tool without overcomplicating the first version. Experience with legal, accounting, insurance, medical, professional-services or document-heavy workflows would be highly regarded.
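As an illustration only, and not a required design, the folder scan, file-type identification and duplicate flagging in the required features could begin with something like the standard-library sketch below. Several details are assumptions: the names `scan_matter_folder` and `FILE_TYPES`, the extension mapping, and the use of SHA-256 content hashes. The hash approach only catches byte-identical copies of the same file. It does not detect near-duplicates or email-chain duplicates.

```python
import hashlib
from pathlib import Path

# Hypothetical extension-to-type mapping (an assumption; extend as needed).
FILE_TYPES = {
    ".pdf": "PDF",
    ".doc": "Word",
    ".docx": "Word",
    ".xls": "Excel",
    ".xlsx": "Excel",
    ".msg": "Outlook email",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_matter_folder(root: str) -> list[dict]:
    """Walk the folder and all subfolders, returning one index row per file.

    A file whose content hash matches an earlier file (in sorted path order)
    is flagged as an exact duplicate of that earlier file.
    """
    rows = []
    first_seen: dict[str, str] = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # rglob also yields subfolders; skip them
        file_hash = sha256_of(path)
        original = first_seen.setdefault(file_hash, str(path))
        rows.append({
            "file_name": path.name,
            "file_path": str(path),
            "file_type": FILE_TYPES.get(path.suffix.lower(), "Unknown"),
            "size_bytes": path.stat().st_size,
            "sha256": file_hash,
            "duplicate_of": "" if original == str(path) else original,
        })
    return rows
```

A call such as `scan_matter_folder(r"C:\Matters\Example")` would return rows ready for a Raw Index tab. Files of type "Unknown" are natural candidates for the Validation / Warnings tab.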
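For possible email-chain duplicates, word counts and body previews, one simple heuristic is to compare subjects after stripping reply and forward prefixes. The sketch below is hypothetical: `normalise_subject`, `word_count`, `body_preview` and `group_email_chains` are illustrative names, and the prefix list (RE, FW, FWD) is an assumption. It works on text already extracted from .msg files, for example by a .msg parsing library; the extraction step itself is not shown. Matching subjects only suggest a chain, and the result is meant for manual review rather than automatic exclusion.

```python
import re
from collections import defaultdict

# Reply/forward prefixes such as "RE:", "Fw:", "FWD:", possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalise_subject(subject: str) -> str:
    """Strip leading reply/forward prefixes and collapse whitespace."""
    stripped = _PREFIX.sub("", subject or "")
    return " ".join(stripped.split()).lower()


def word_count(text: str) -> int:
    """Rough word count: runs of word characters, allowing inner apostrophes."""
    return len(re.findall(r"\w+(?:'\w+)*", text or ""))


def body_preview(text: str, limit: int = 200) -> str:
    """First `limit` characters of the body, with whitespace collapsed."""
    return " ".join((text or "").split())[:limit]


def group_email_chains(emails: list[dict]) -> dict[str, list[dict]]:
    """Group emails whose normalised subjects match (possible chain duplicates)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for email in emails:
        groups[normalise_subject(email.get("subject", ""))].append(email)
    # Only subjects shared by more than one email are worth flagging.
    return {key: items for key, items in groups.items() if len(items) > 1}
```

Quoted reply text inside a long chain can inflate word counts. A later version might trim quoted history before counting, but that is beyond this sketch.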
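Since openpyxl is one of the suggested options, the four-tab workbook could be generated roughly as below. This is a sketch under stated assumptions, not a specification. The function name `write_workbook` is hypothetical. Rows are assumed to be dicts keyed by the Review column names. The Export columns (Date, Final item description, Units) are a placeholder choice. Excel sheet names cannot contain "/", so the sketch names the third tab "Validation", as in the milestone list. The Export tab is a static snapshot taken at generation time, so edits made later in the Review tab will not flow through unless formulas or a re-export step are added.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

REVIEW_COLUMNS = [
    "Include / exclude", "Date", "File name", "File path", "File type",
    "Document category", "Email subject / document title", "Sender",
    "Recipient", "CC", "Attachment names", "Parent email", "Word count",
    "Page count", "Duplicate flag", "Suggested category", "Scale item",
    "Units", "Draft item description", "Final item description", "Notes",
]
# Placeholder Export columns (an assumption, not taken from the brief).
EXPORT_COLUMNS = ["Date", "Final item description", "Units"]


def write_workbook(rows: list[dict], warnings: list[str], out_path: str) -> None:
    """Write Raw Index, Review, Validation and Export tabs to an .xlsx file."""
    wb = Workbook()
    raw = wb.active
    raw.title = "Raw Index"
    raw_headers = sorted({key for row in rows for key in row})
    raw.append(raw_headers)
    for row in rows:
        raw.append([row.get(h, "") for h in raw_headers])

    review = wb.create_sheet("Review")
    review.append(REVIEW_COLUMNS)
    for row in rows:
        review.append([row.get(col, "") for col in REVIEW_COLUMNS])
    review.freeze_panes = "A2"  # keep the header row visible while scrolling
    last_col = get_column_letter(len(REVIEW_COLUMNS))
    review.auto_filter.ref = f"A1:{last_col}{review.max_row}"

    validation = wb.create_sheet("Validation")
    validation.append(["Warning"])
    for message in warnings:
        validation.append([message])

    export = wb.create_sheet("Export")
    export.append(EXPORT_COLUMNS)
    for row in rows:
        # Skip rows explicitly marked "exclude"; everything else is exported.
        if str(row.get("Include / exclude", "")).strip().lower() != "exclude":
            export.append([row.get(col, "") for col in EXPORT_COLUMNS])

    wb.save(out_path)
```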