ML Engineer — production training, inference, evaluation, reliability
I build and operate interpretable machine learning systems and scalable geospatial infrastructure for high-stakes decision support. I specialize in leakage-safe evaluation, distributional robustness, and end-to-end delivery from source data to production web products.
Links: dlewis.ai · LinkedIn · Email · GitHub
- Properlytic — consumer real-estate pricing and forecasting with uncertainty fan charts and expanding-window, leakage-safe evaluation.
- Summit Geospatial — high-resolution elevation data products and delivery infrastructure for engineering and hazard workflows.
- PoliBOM — tariff intelligence and compliance workflows for manufacturers (BOM parsing, retrieval, and policy scenario simulation).
- TACC / TDIS support — state-scale climate, flood, and terrain workflows supporting the $40M Texas Disaster Information System and inter-agency resiliency planning.
- Leakage-safe forecasting systems: evaluation harnesses, reliability, and model governance for decisions under uncertainty
- Interpretable ML for decision support: stable, regime-aware attribution and explainability for high-stakes settings
- Geospatial + time-series pipelines: source data → modeling → APIs → web delivery
- Distributed and HPC workflows: performance profiling, scalability, and explicit cost–performance tradeoffs
Consumer property pricing and forecasting platform with interactive uncertainty fan charts.
- Deployed an NYC semi-supervised VAE current-price model using tax and sales records to handle sparsity and noise; achieved 12% holdout error (vs. 8.4% for Zillow in an internal apples-to-apples comparison).
- Shipped a Houston diffusion-based forecasting system with fan charts; achieved ~8% annualized compounding error at best-performing horizons, delivered via production web UI.
High-resolution elevation data and web delivery for engineering and hazard workflows.
- Engineered a seamless statewide Texas DEM mosaic from 70+ elevation datasets, resampling 0.5 m LiDAR sources to 1.2 m with nearest-neighbor interpolation.
- Built the web distribution platform end-to-end (pipeline, tiling, hosting, delivery UX) for planning and engineering use cases.
AI tariff mitigation and trade compliance for manufacturers.
- Specified an agentic workflow and schema definitions for BOM parsing, retrieval, and tariff simulation using embeddings + search, API serving, and operational data stores.
- Directed development of a conversational AI interface to simplify trade workflows, retrieval, and proactive compliance alerts.
State-scale climate, flood, and terrain workflows, supporting the $40M Texas Disaster Information System (TDIS) program.
- Scaled climate and flood models on supercomputers, executing large distributed jobs while managing multi-million-dollar compute budgets and federal partnerships.
- Developed methods to produce high-resolution flood maps from National Water Model outputs for operational response workflows.
- Partnered with federal and state agencies on technical scoping, milestones, and delivery pathways.
Columbia University — Financial Engineering (Research Assistant, Industry-sponsored)
Explainable and distributionally robust ML for forecasting and decision support.
- Built a multi-asset CVAE latent factor model with Skew-T mixture priors; achieved 85% R² vs. commercial SaaS (75%) and Fama-French (58%) on backtested holdout.
- Developed tractable, regime-aware attribution methods to improve stability and interpretability for deep forecasting models.
Columbia University — Electrical Engineering (Research Assistant)
- Contributed math benchmarking and fine-tuning work related to GRPO-style optimization.
Languages: Python, SQL, Bash, TypeScript, C++, Fortran, PL/pgSQL
Modeling: PyTorch, TensorFlow, scikit-learn, transformers, variational inference, SHAP
LLM apps: embeddings, RAG, LangChain/LangGraph, DSPy, FAISS
Data: Pandas, Polars, PyArrow, Parquet, Postgres, PostGIS, Redis, Elasticsearch, Supabase
Infra: Linux/Unix, Docker, Kubernetes, Ray, Spark, Dask, Airflow, CI/CD, observability
Geospatial: GDAL, GeoPandas, Rasterio
Acceleration: CUDA
- Artificial Intelligence for Modeling Complex Systems: Taming the Complexity of Expert Models to Improve Decision Making
- An Intelligent Interface for Integrating Climate, Hydrology, Agriculture, and Socioeconomic Models
- A Semantic Model Catalog to Support Comparison and Reuse
- Website: https://dlewis.ai
- Email: danielhardestylewis@gmail.com
- GitHub: https://github.com/dhardestylewis
- LinkedIn: https://www.linkedin.com/in/dhardestylewis/
- Location: New York, NY