Join our team and build your career with us
We are building a spatially-aware, AI-enabled data layer focused on compliance and regulatory intelligence for horizontal construction and infrastructure projects. The system consolidates fragmented regulatory datasets into a structured, continuously updated geospatial data layer integrated with GIS mapping.
This role sits at the intersection of agentic AI and geospatial data engineering, focused on transforming unstructured and distributed inputs into GIS-ready, production-grade datasets.
This is not a role for someone who waits to be told what to do. We want someone who ships things, learns fast, uses AI tools aggressively, and gets excited about solving hard real-world problems with intelligent systems.
Agentic Data Collection System
Design and deploy AI agents that autonomously discover, scrape, and ingest data from government portals, open data platforms, and APIs
Build multi-step agentic pipelines using frameworks such as LangChain, LangGraph, CrewAI, or AutoGen — where agents plan, search, retrieve, validate, and store data with minimal human intervention
Implement intelligent retry logic, source change detection, and self-healing pipelines so data collection runs reliably unattended
Use LLMs to extract structured data from unstructured sources — PDFs, scanned documents, web pages, and regulatory text — turning messy inputs into clean, schema-aligned records
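To illustrate the "messy inputs into clean, schema-aligned records" step, here is a minimal sketch of the output contract such a pipeline targets. In the real system an LLM with structured output would do the extraction from PDFs and regulatory text; this regex-based stand-in (all field names and the sample notice are hypothetical) just shows what a schema-aligned record looks like.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical target schema for a permit record pulled from messy text.
@dataclass
class PermitRecord:
    permit_id: str
    agency: str
    status: str

def extract_record(raw_text: str) -> Optional[PermitRecord]:
    """Parse a semi-structured notice into a schema-aligned record.
    In production an LLM with a JSON schema handles far messier inputs;
    this regex version only demonstrates the output contract."""
    m = re.search(
        r"Permit\s+(?P<permit_id>[A-Z0-9-]+)\s+issued by\s+(?P<agency>[\w\s]+?)"
        r"\s+is\s+(?P<status>active|expired|pending)",
        raw_text,
        re.IGNORECASE,
    )
    if not m:
        return None  # in the real pipeline: route to retry or human review
    return PermitRecord(
        permit_id=m["permit_id"],
        agency=m["agency"].strip(),
        status=m["status"].lower(),
    )

record = extract_record("Notice: Permit TX-2024-118 issued by TxDOT is ACTIVE until 2026.")
```

Whatever does the extraction, downstream stages only ever see validated, typed records like this one.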
AI-Powered Data Matching Engine
Build an intelligent matching tool that cross-references user inputs against scraped regulatory databases using both exact matching and semantic similarity
Use vector embeddings and retrieval-augmented generation (RAG) to suggest likely matches when exact lookups fail
Design the matching engine to be configurable — regulatory rules evolve, and the system must adapt without requiring a full rebuild
Output match results in structured formats usable by the GIS platform and shareable via SharePoint
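The two-stage shape of that engine can be sketched as follows. This toy version substitutes `difflib.SequenceMatcher` for real vector embeddings, and the reference entries are invented; only the exact-then-semantic fallback structure is the point.

```python
from difflib import SequenceMatcher
from typing import Optional

# Toy stand-in for a scraped regulatory database (entries hypothetical).
REGULATORY_DB = {
    "clean water act section 404": "Federal dredge-and-fill permit",
    "nepa categorical exclusion": "Streamlined environmental review",
}

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def match_input(user_input: str, threshold: float = 0.6) -> Optional[tuple[str, float]]:
    """Stage 1: exact match on normalized keys.
    Stage 2: similarity-scored fallback. A production engine would use
    embeddings + a vector store here; the control flow is the same."""
    key = normalize(user_input)
    if key in REGULATORY_DB:
        return key, 1.0
    best = max(REGULATORY_DB, key=lambda k: SequenceMatcher(None, key, k).ratio())
    score = SequenceMatcher(None, key, best).ratio()
    return (best, score) if score >= threshold else None

exact = match_input("Clean Water Act  Section 404")
fuzzy = match_input("NEPA categorical exclusions")
```

Keeping the threshold and the reference set as configuration (not code) is what lets the engine adapt as regulatory rules evolve.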
GIS Data Readiness & Integration
Prepare and transform all collected data into GIS-ready formats: GeoJSON, PostGIS-compatible schemas, Shapefile, or ESRI REST feature service structures
Work closely with the GIS Developer to understand schema requirements, coordinate systems, and layer needs — your data must slot in cleanly without manual cleaning on their side
Automate data refresh cycles so GIS layers stay current without manual intervention
Build lightweight APIs (FastAPI or similar) that serve structured data to the GIS platform on demand
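As a sketch of "GIS-ready," here is the kind of transformation the role involves: tabular records into a GeoJSON FeatureCollection a layer can ingest directly. The field names (`lon`, `lat`, `permit_id`) are hypothetical; real schemas would come from the GIS developer's layer spec.

```python
import json

def records_to_geojson(records: list[dict]) -> dict:
    """Convert tabular records with lon/lat columns into a GeoJSON
    FeatureCollection (coordinates are [longitude, latitude] per the
    GeoJSON spec). Non-coordinate columns become feature properties."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {k: v for k, v in r.items() if k not in ("lon", "lat")},
        }
        for r in records
    ]
    return {"type": "FeatureCollection", "features": features}

geojson = records_to_geojson(
    [{"lon": -97.74, "lat": 30.27, "permit_id": "TX-2024-118", "status": "active"}]
)
serialized = json.dumps(geojson)  # ready to serve from a FastAPI endpoint
```

The same function slots behind a lightweight FastAPI route or a scheduled refresh job without change.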
SharePoint & Power Automate Integration
Connect data pipelines and agent outputs to Microsoft SharePoint — storing datasets, reports, and audit logs in organized document libraries and lists
Build Power Automate flows that trigger agent runs, route outputs, alert stakeholders, and sync records across systems
Implement access control logic that determines what data is public-facing vs. internal, with role-based visibility — manageable without code by non-technical team members
Use the Microsoft Graph API to programmatically manage SharePoint content and automate document workflows
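For a flavor of the Graph API work, here is a sketch of building the endpoint for a simple SharePoint file upload (the small-file PUT pattern, for files under 4 MB). The site ID, folder, and filename are placeholders; auth (a Bearer token) and the actual HTTP call are omitted.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

def sharepoint_upload_url(site_id: str, folder: str, filename: str) -> str:
    """Build the Microsoft Graph URL for a simple file upload into a
    SharePoint document library: PUT this URL with a Bearer token and
    the file bytes as the body. Arguments here are placeholders."""
    path = quote(f"{folder}/{filename}")
    return f"{GRAPH_BASE}/sites/{site_id}/drive/root:/{path}:/content"

url = sharepoint_upload_url(
    "contoso.sharepoint.com,abc123",  # hypothetical site identifier
    "AuditLogs",
    "run-2024-06-01.json",
)
```

Agent runs writing their outputs and audit logs through calls like this is what keeps the document libraries organized and machine-managed.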
Display Control & Visibility Management
Build a configuration layer that lets team members control which datasets and fields are visible publicly vs. restricted — no code required for day-to-day decisions
Implement tagging, flagging, and approval workflows so data publication is deliberate and auditable
Expose controls via a simple admin panel or SharePoint-based interface
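The enforcement side of that configuration layer is small; the point is that the rules live in config a non-technical admin edits (via SharePoint or an admin panel), while code only applies them. Field names and roles below are hypothetical.

```python
# Hypothetical visibility config, edited by admins rather than developers.
# Unknown fields default to "internal" so new data is never leaked by accident.
VISIBILITY_RULES = {
    "permit_id": "public",
    "status": "public",
    "internal_notes": "internal",
}

def filter_record(record: dict, role: str) -> dict:
    """Return only the fields the caller's role may see:
    'public' fields for everyone, 'internal' fields for staff only."""
    allowed = {"public"} if role == "public" else {"public", "internal"}
    return {
        k: v for k, v in record.items()
        if VISIBILITY_RULES.get(k, "internal") in allowed
    }

row = {"permit_id": "TX-2024-118", "status": "active", "internal_notes": "call county"}
public_view = filter_record(row, "public")
staff_view = filter_record(row, "staff")
```

Tagging and approval workflows then gate *when* a record enters the public set; this filter only decides *what* of it is shown.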
AI & Agentic Systems
Hands-on experience building with LLM APIs — OpenAI, Anthropic (Claude), Gemini, or open-source models (Mistral, LLaMA, etc.)
Experience with agentic frameworks — LangChain, LangGraph, AutoGen, CrewAI, or similar — building multi-step pipelines where AI agents plan and execute tasks
Familiarity with RAG (retrieval-augmented generation) architectures — vector databases (Pinecone, Weaviate, ChromaDB, pgvector), embedding models, and semantic search
Understanding of prompt engineering, tool calling/function calling, and structured output from LLMs
Comfort with AI agent patterns: ReAct, Plan-and-Execute, multi-agent collaboration, tool use, memory management
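By "ReAct" we mean the reason-act-observe loop at the core of those frameworks. A minimal sketch, with the LLM policy stubbed out as a hard-coded `decide()` so it stays runnable; in practice that step is an LLM tool-calling response, and `search_tool` would hit a real portal or API.

```python
def search_tool(query: str) -> str:
    """Stand-in tool; a real agent would query a data portal or API."""
    return f"3 results for '{query}'"

def decide(observation: str) -> tuple[str, str]:
    """Thought -> Action. Hard-coded here; normally an LLM reasons over
    the latest observation and returns a tool name plus arguments."""
    if observation == "start":
        return "search", "stormwater permits"
    return "finish", observation

def run_agent(max_steps: int = 5) -> str:
    observation = "start"
    for _ in range(max_steps):  # Reason -> Act -> Observe, until done
        action, arg = decide(observation)
        if action == "finish":
            return arg
        observation = search_tool(arg)
    return observation  # step budget exhausted; return best-so-far

result = run_agent()
```

Plan-and-Execute and multi-agent patterns are elaborations of this same loop: more planners, more tools, shared memory.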
Data Engineering & Scraping
Strong Python skills — data wrangling (Pandas, GeoPandas), API consumption, async programming, and task scheduling
Experience building web scrapers for real-world, complex sources: Playwright, Selenium, Scrapy, BeautifulSoup, or Puppeteer
Ability to extract structured data from unstructured sources — PDFs, HTML tables, scanned documents — using LLMs or parsing libraries
Experience with REST and GraphQL API authentication patterns: OAuth 2.0, API keys, session tokens, rate limiting
Familiarity with geospatial data formats: GeoJSON, WKT, Shapefile, coordinate systems — deep GIS expertise not required
Software Engineering Fundamentals
Clean, readable Python code — you write things others can maintain and extend
API development and integrations: FastAPI or Flask for building lightweight data-serving endpoints
Basic containerization: Docker for packaging and deploying pipelines and services
Version control: Git — branching, pull requests, commit discipline
Cloud basics: deploying services on Azure, AWS, or GCP (any of the three is fine)
Mindset & Ways of Working
You use AI tools aggressively as part of your workflow — Cursor, GitHub Copilot, Claude, or whatever makes you faster
You treat data as a product: schema design, quality checks, and documentation included
You are comfortable working async in a globally distributed, remote-first team
You communicate clearly about what is built, what is blocked, and what needs a decision
You learn fast and are not afraid of domains you don’t know yet — utility regulation, GIS, permitting — you’ll figure it out
Nice to Have
Experience scraping government portal data or public regulatory datasets
Familiarity with ESRI REST APIs, ArcGIS Online, or GeoServer for GIS platform integration
Experience with fuzzy matching, record linkage, or entity resolution systems
Prior work on compliance, regulatory, or permitting platforms
Knowledge of utility industry data or regulatory frameworks (NERC, FERC, EPA, state permit systems)
Exposure to digital twin or infrastructure asset management concepts
Power BI integration with SharePoint or data pipelines
Contributions to open-source AI or data engineering projects