Data Scientist & ML Engineer

Building intelligent systems for real-world impact

I engineer large-scale AI platforms and machine learning systems. Currently at Bridges AI Consulting, I build end-to-end solutions for content understanding and automated decision-making while completing my M.S. in Data Science at CU Boulder.

Nevyn Duarte
5+ Years Experience
$10B+ AUM Supported
01

About

I'm a Data Scientist and AI Engineer with deep experience in building intelligent systems for quantitative finance and large-scale data engineering.

Currently at Bridges AI Consulting, I design and build end-to-end AI platforms that handle unstructured data streams and enable downstream automation. My background includes quantitative equity research at M Science (Jefferies), where I built predictive equity models and automated research pipelines, and risk analysis at Citco for hedge funds with $10B+ AUM.

I hold a B.S. in Mathematics from UT Austin and am completing my M.S. in Data Science at CU Boulder, focusing on deep learning, parallel computing, and advanced statistical modeling.

ML & AI

  • PyTorch
  • TensorFlow
  • scikit-learn
  • HuggingFace
  • XGBoost

Data Engineering

  • PySpark
  • Databricks
  • SQL
  • AWS
  • Docker

Languages

  • Python
  • R
  • Go
  • C++
  • Bash

Finance Tools

  • FactSet
  • Bloomberg
  • Tableau
  • Power BI
  • JMP
02

Experience

Bridges AI Consulting

Dec 2025 – Present
  • Designed and built an end-to-end AI platform for large-scale content understanding and generation
  • Developed scalable ingestion and transformation layers for unstructured data streams
  • Applied machine learning techniques to extract signals, structure information, and enable downstream automation
  • Implemented production-oriented workflows emphasizing reliability, extensibility, and operational efficiency
  • Aligned technical execution with product objectives to deliver real-world business impact
Machine Learning, AI Systems, Data Engineering, Python, NLP, Platform Design, Scalable Systems

M Science (Jefferies) ↗

Quantitative Equity Research Associate
Sep 2022 – Feb 2023
  • Developed predictive equity models using PySpark and SQL on Databricks, analyzing millions of transaction and job-posting records
  • Produced 10+ data-driven research reports with FactSet and REST APIs, accelerating report cycles from quarterly to monthly
  • Automated data extraction and reporting workflows, reducing reporting effort by 20% and enhancing team efficiency
  • Collaborated with senior analysts to deliver alternative-data insights for institutional investors
PySpark, Databricks, Python, SQL, FactSet API

Goodfill ↗

Backend Software Engineer Consultant
Jul 2022 – Sep 2022
  • Developed investor onboarding and FINRA record verification systems using Python and FINRA API integration
  • Engineered backend services in Go with AWS Lambda/SQS/SNS architecture
  • Integrated trading applications with Interactive Brokers Gateway & TWS APIs
  • Deployed containerized infrastructure with Docker to improve scalability and reliability
Go, AWS Lambda, Docker, IB Gateway

Perceive Now ↗

Data Analyst
Jul 2022 – Oct 2022
  • Implemented HuggingFace NLP models on AWS Lambda for zero-shot classification, summarization, and entity extraction to process 50k+ research articles
  • Streamlined REST API pipelines in Python and Jupyter, cutting report generation time from minutes to seconds
  • Enhanced large-scale text processing pipelines, reducing operating costs by 12%
HuggingFace, NLP, AWS Lambda, Python

Citco (Citco Fund Services) ↗

Risk Analysis Intern
May 2022 – Jul 2022
  • Automated AIFMD and Form PF reporting for 20+ hedge funds using Python, VBA, SQL, and PySpark
  • Supported internal risk and valuation models for portfolios exceeding $10B AUM across multiple fund structures
  • Developed Excel analytics dashboards to improve fund performance reporting
PySpark, Risk Modeling, SQL, VBA
Aug 2021 – Jun 2022
Yield Analysis Intern
  • Designed an adaptive outlier detection algorithm using nearest-neighbor ML to flag yield anomalies across 100k+ wafer samples
  • Built interactive Power BI dashboards adopted by 20+ engineers to track production test results and yield analysis
  • Visualized trends in JMP and Python to improve yield analysis accuracy
  • Automated failure alerts with pandas/SciPy/Power Automate for real-time issue resolution
Product Development Intern
  • Built real-time Power BI dashboards to monitor lab testing machine status and work orders
  • Integrated 12+ data sources, including Snowflake and MySQL, using Power Automate and Beautiful Soup
  • Automated reporting and anomaly detection workflows in Python to identify machine failures
  • Created shipping and inventory tracking workflows that improved operational efficiency by ~15%
Machine Learning, Python, JMP, Power BI, Snowflake, Automation, MySQL

BNY (The Bank of New York Mellon Corp) ↗

Summer Data Analytics Analyst
Jun 2021 – Aug 2021
  • Designed Tableau, Excel, and Python dashboards for liquidity monitoring, client data visualization, and analytics across Sales and Analytics teams
  • Reduced reconciliation time by 20% for 3+ business units
  • Developed automation scripts in VBA and Python to reduce manual reporting time
  • Contributed to a centralized data warehouse to improve data accuracy across teams
  • Partnered with cross-functional teams to streamline client liquidity and performance analysis
Tableau, Python, Data Analytics, VBA

UT Austin BWI Project ↗

Undergraduate Research Associate
Jan 2019 – Jun 2020
  • Developed trajectory-based person-following systems using DeepSORT tracking and Google's Triplet Loss function
  • Programmed robot navigation in Python, C++, and ROS for autonomous operation
  • Co-authored research paper and received UT CNS Award for Excellence in Computer Science
Computer Vision, Python, C++, ROS, DeepSORT

Order.co (formerly Negotiatus) ↗

Software Engineering Intern
Jun 2017 – Aug 2017
  • Built interactive D3.js and AJAX dashboards for real-time sales data visualization
  • Migrated CRM data from HubSpot to Salesforce through API integrations and automated scripts
  • Enhanced front-end features with HTML, CSS, and JavaScript, plus RSpec tests in Ruby on Rails
D3.js, Ruby on Rails, JavaScript, Salesforce API

stae ↗

Software Engineering Intern
Jun 2016 – Sep 2016
  • Designed interactive web dashboards with PostgreSQL, React, D3.js, and Node.js for client data visualization
  • Redesigned company web pages to align with updated product interfaces and branding
  • Authored comprehensive developer documentation on Linux systems for team onboarding
React, D3.js, Node.js, PostgreSQL
03

Education

Master of Science — Data Science
2023 – Present
GPA: 3.72
Machine Learning
  • Probabilistic Modeling for Data Science
  • Introduction to Deep Learning
  • Unsupervised Algorithms in Machine Learning
  • Introduction to Machine Learning: Supervised Learning
Statistical Learning for Data Science
  • Trees, SVM, and Unsupervised Learning
  • Resampling, Selection, and Splines
  • Regression and Classification
Generative AI
  • Advances in Generative AI (in development)
  • Modern Applications of Generative AI (in development)
  • Introduction to Generative AI
Natural Language Processing: Deep Learning Meets Linguistics Specialization
  • Word-Level and Low Analysis for NLP
  • Deep Learning for Natural Language Processing
  • Fundamentals of Language Processing
Data Mining Foundations and Practice
  • Data Mining Pipeline
  • Data Mining Methods
  • Data Mining Project
Statistical Modeling for Data Science
  • Generalized Linear Models and Nonparametric Regression
  • ANOVA and Experimental Design
  • Modern Regression Analysis in R
Data Science Foundations: Data Structures and Algorithms Pathway
  • Dynamic Programming, Greedy Algorithms
  • Algorithms for Searching, Sorting, and Indexing
  • Trees and Graphs: Basics
Data Science Foundations: Statistical Inference
  • Hypothesis Testing for Data Science
  • Probability Theory: Foundation for Data Science
  • Statistical Inference for Estimation in Data Science
Databases
  • The Structured Query Language (SQL)
  • Relational Database Design
Vital Skills for Data Scientists
  • Fundamentals of Data Visualization
  • Ethical Issues in Data Science
  • Cybersecurity for Data Science
  • Data Science as a Field
B.S. Mathematics
2018 – 2022
  • Advanced Research in Autonomous Intelligent Robotics
  • AI Robotics Research
  • Applied Statistics in R
  • Statistics & Probability for Computer Science
  • Matrices & Linear Algebra
  • Real Analysis
  • Advanced Calculus for Applications

UT CNS Undergraduate Research Award

High School Diploma
2014 – 2018
  • Algorithms and Data Structures in Python
  • AP Computer Science A
  • AP Calculus BC
  • AP Calculus AB
Activities and societies:
  • Computer Club Co-President
  • CyberPatriot Team Leader
  • FIRST Robotics Team Leader
  • Extemporaneous Speaker in the Hearn Speech and Debate Society
  • Catalyst (Community Service) Club Member
Data Science Foundations: Data Structures and Algorithms Specialization
Coursera
04

Projects

Quantitative Finance

Equity Prediction Models

Built predictive models for equity research using alternative data including transaction records and job postings. Processed millions of data points using PySpark on Databricks.

PySpark, ML, Alt Data

NLP

Research Article Processor

Deployed HuggingFace NLP models on serverless infrastructure to analyze 50k+ research articles with improved classification accuracy.

HuggingFace, AWS Lambda, NLP
05

Research & Foundations

Research philosophy

My work is grounded in statistical theory, machine learning research, and systems design. I actively study academic papers, textbooks, and practitioner research to inform how I design models, evaluate risk, and build production-ready systems.

I organize readings by domain to mirror how I approach problems in practice: theory first, empirical validation second, and engineering constraints always in view.

Reading status taxonomy

  • Completed: Core understanding established.
  • In Progress: Actively working through.
  • Revisited / Reference: Used for quick refreshers.
  • Applied in Projects: Directly influenced model or system choices.

Statistical Learning & Inference

The Elements of Statistical Learning

Why it matters: Formalizes bias-variance tradeoffs and model diagnostics for high-stakes prediction.

Application: Used to justify regularization and model complexity in equity prediction pipelines.

Applied in Projects
All of Statistics

Why it matters: Reinforces probabilistic foundations for inference and experimental design.

Application: Revisited for uncertainty quantification and hypothesis testing workflows.

Revisited / Reference

Machine Learning & AI

Deep Learning

Why it matters: Provides the theoretical grounding for representation learning and optimization dynamics.

Application: Informs architecture selection and training stability checks for NLP models.

Completed
On Calibration of Modern Neural Networks

Why it matters: Highlights reliability gaps between accuracy and confidence.

Application: Evaluating post-hoc calibration for classification confidence in production.

In Progress

Quantitative Finance

Active Portfolio Management

Why it matters: Frames alpha generation and risk budgeting in a systematic way.

Application: Used to translate research factors into portfolio constraints.

Applied in Projects
Advances in Financial Machine Learning

Why it matters: Focuses on leakage, backtest overfitting, and financial time-series reality.

Application: Revisited when designing validation for alternative data models.

Revisited / Reference

Software Engineering & Systems

Designing Data-Intensive Applications

Why it matters: Clarifies tradeoffs in distributed systems and data pipelines.

Application: Influenced event-driven ingestion and storage decisions in ML pipelines.

Completed
Hidden Technical Debt in Machine Learning Systems

Why it matters: Explains why production ML systems degrade without rigorous engineering discipline.

Application: Applied to monitoring, validation, and pipeline resilience checks.

Applied in Projects

Forward-looking research interests

  • Representation learning for structured and tabular data
  • Model interpretability under regulatory constraints
  • Time-series forecasting under non-stationarity
  • Risk-aware optimization and decision-making
06

Blog & Technical Commentary

Writing is how I pressure-test ideas, document tradeoffs, and translate research into production decisions. These posts are structured like internal design reviews and research memos.

Project Deep Dive

Designing equity prediction models that survive data drift

A walkthrough of how I set up evaluation, leakage checks, and monitoring for alternative data signals in production research pipelines.

  • Problem framing and signal selection
  • Validation strategy for non-stationary data
  • Engineering guardrails to prevent silent failure
Drafting
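
To make the validation bullet above concrete, here is a minimal sketch of a walk-forward split with an embargo gap, one common way to validate models on time-ordered, non-stationary data. It is an illustration under stated assumptions, not the pipeline the post describes: the function name `walk_forward_splits` and its parameters (`n_folds`, `test_size`, `embargo`) are hypothetical, and rows are assumed to be sorted by time already.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int,
    n_folds: int,
    test_size: int,
    embargo: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for time-ordered rows.

    Every test window sits strictly after its training window, and
    `embargo` rows between the two are dropped so that labels built from
    look-ahead horizons cannot leak into training. (Hypothetical helper.)
    """
    if n_folds < 1 or test_size < 1 or embargo < 0:
        raise ValueError("n_folds and test_size must be >= 1, embargo >= 0")
    first_test_start = n_samples - n_folds * test_size
    if first_test_start - embargo < 1:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_end = test_start - embargo  # exclusive upper bound of training rows
        train_idx = list(range(train_end))  # expanding training window
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(20, n_folds=3, test_size=4, embargo=2):
        print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

With 20 rows, three folds, a test window of 4, and an embargo of 2, the first fold trains on rows 0 to 5 and tests on rows 8 to 11. The training window expands with each fold, so later folds see more history but never any row at or after the embargo boundary.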
Research → Practice

Calibration is not optional in financial classification

Translating neural network calibration research into decision-making thresholds for risk-aware models.

  • Why accuracy hides confidence issues
  • Temperature scaling for deployment
  • Aligning prediction confidence with action
Outline in progress
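
As a rough illustration of the temperature scaling bullet above: the technique divides a trained model's logits by a single scalar T, chosen on held-out data to minimize negative log-likelihood, which changes confidence without changing the predicted class. The sketch below is standard-library Python under stated assumptions. The helper names (`softmax`, `mean_nll`, `fit_temperature`) and the toy logits are hypothetical, and a log-spaced grid search stands in for the numerical optimizer one would normally use.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(
    logits: Sequence[Sequence[float]], labels: Sequence[int], temperature: float
) -> float:
    """Average negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for row, y in zip(logits, labels):
        scaled = [z / temperature for z in row]
        peak = max(scaled)
        # log-sum-exp keeps the computation stable for large logits
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)


def fit_temperature(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    lo: float = 0.05,
    hi: float = 20.0,
    n_grid: int = 200,
) -> float:
    """Pick the temperature on a log-spaced grid that minimizes held-out NLL.

    Grid search is a simplification chosen for readability (assumption).
    """
    candidates = [lo * (hi / lo) ** (i / (n_grid - 1)) for i in range(n_grid)]
    return min(candidates, key=lambda t: mean_nll(logits, labels, t))


if __name__ == "__main__":
    # Toy held-out set: very confident logits, but one of four predictions is wrong.
    val_logits = [[5.0, 0.0], [5.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
    val_labels = [0, 0, 1, 1]
    t = fit_temperature(val_logits, val_labels)
    print("fitted temperature:", round(t, 2))
    print("confidence before:", round(softmax(val_logits[0])[0], 3))
    print("confidence after: ", round(softmax(val_logits[0], t)[0], 3))
```

On the toy data the model is right three times out of four while reporting near-certain confidence, so the fitted temperature comes out above 1 and pulls the top-class probability down toward the observed accuracy. Any decision threshold should be set on the scaled probabilities, not the raw ones.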
Technical Opinion

Most ML pipelines fail silently — here is how I audit them

A practical checklist for monitoring data contracts, model drift, and feature integrity in production ML systems.

  • Pipeline health as a first-class metric
  • Detecting leakage and training skew
  • Designing actionable alerts
Drafting
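
Two of the checks named above, data contracts and drift, can be sketched in a few lines. This is an illustrative example only, not the checklist from the post: the contract format (`min`, `max`, `max_null_rate` per column), the function names, and the sample rows are all assumptions, and the population stability index (PSI) is used here as one common drift score among several.

```python
import math
from typing import Dict, List, Optional, Sequence


def check_contract(
    rows: Sequence[Dict[str, Optional[float]]],
    contract: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return human-readable contract violations; an empty list means the batch passes."""
    if not rows:
        return ["batch is empty"]
    violations = []
    for col, rule in contract.items():
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        null_rate = 1.0 - len(present) / len(values)
        if null_rate > rule.get("max_null_rate", 0.0):
            violations.append(f"{col}: null rate {null_rate:.2f} exceeds limit")
        if present and min(present) < rule.get("min", -math.inf):
            violations.append(f"{col}: value below minimum")
        if present and max(present) > rule.get("max", math.inf):
            violations.append(f"{col}: value above maximum")
    return violations


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a new sample, using equal-width bins
    derived from the reference range. Out-of-range values go to the edge bins."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # floor empty bins at eps so the logarithm stays defined
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    batch = [{"price": 10.0, "volume": 5.0}, {"price": None, "volume": -1.0}]
    contract = {"price": {"min": 0.0, "max_null_rate": 0.1}, "volume": {"min": 0.0}}
    print(check_contract(batch, contract))

    reference = [float(v) for v in range(100)]
    shifted = [v + 50.0 for v in reference]
    print(population_stability_index(reference, shifted))
```

Returning a list of violations rather than raising on the first one lets an alert carry the full picture for a batch. The PSI value is only meaningful against a threshold the team has agreed on; a frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, but that cutoff is a convention and should be tuned per feature.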

Let's connect

I'm currently exploring opportunities in ML Engineering, Quantitative Finance, and Data Science. If you're building something interesting, I'd love to hear about it.

nevynduarte@gmail.com