Himanshu Pant

About

Engineer at heart.

Data Engineer with expertise in large-scale data infrastructure, real-time processing, and AI/ML systems. I've migrated 1B+ insurance claims to distributed cloud architectures, eliminated a 90-minute production bottleneck, and built ML retention pipelines that cut churn by 20%.

Track record of rapid impact across insurance, sports analytics, and healthcare - promoted within 6 months at Super Six Sports Gaming and awarded the SPOT Recognition Award for delivering critical data infrastructure under tight deadlines.

🏥 Healthcare, ⚽ Sports Analytics, 🏦 Insurance & Banking, 📊 Retention & Segmentation

100M-1B

Claims / Records Processed

M.S.

Software Engineering (AI Specialization) - Arizona State University

Tech I work with

Languages

Python, SQL, JavaScript, PySpark

Data & ML

Apache Spark, Databricks, Airflow, LangGraph, Snowflake, dbt, Great Expectations

Cloud & Infra

AWS (EMR, S3, Lambda, SQS, ECR), Azure, Docker, Jenkins, CI/CD

Frontend

React, Next.js, Node.js, HTML/CSS

Databases

PostgreSQL, SQL Server, Oracle, NoSQL, Vector DBs

Tools

Git, Power BI, Streamlit, Jira

Achievements

Highlights.

🏆

DEVHACKS 2026 - 1st Place

Won Track 1 with MeetFlow - intelligent task orchestration converting meeting transcripts into capacity-aware ticket assignments using LLM-powered analysis, competing against 100+ teams.

☁️

AWS Certified Data Engineer - Associate

Passed DEA-C01 (May 2026) - validated expertise in data pipeline design, ETL optimization, AWS Glue/EMR/Redshift, and implementing data quality frameworks at scale.

📊

Microsoft Certified: Fabric Data Engineer Associate

Passed DP-700 (May 2026) - validated expertise in Microsoft Fabric data engineering, data warehousing, and cloud data solutions.

🏆

HackASU - FairCharge

Built a medical-bill audit pipeline at HackASU that uses Claude Vision to detect overcharges and billing violations - targeting the kind of errors that average $1,300+ per hospital bill across the industry.

🏅

SPOT Award - Exceptional Delivery

Recognized at Super Six Sports Gaming for exceptional delivery and cross-team collaboration on critical product features.

Certifications

Industry Credentials.

AWS Certified Data Engineer - Associate

Amazon Web Services • DEA-C01

Validated expertise in designing, building, and maintaining data pipelines using AWS services including Glue, EMR, Redshift, Kinesis, and implementing data quality frameworks at scale.

Microsoft Certified: Fabric Data Engineer Associate

Microsoft • DP-700

Certified in Microsoft Fabric data engineering, data warehousing, modeling, and implementing end-to-end analytics solutions on the Azure platform.

Experience

Where I've worked.

MyEdMaster

AI / Data Engineer (Industry Capstone)

Jan 2026 - Apr 2026

Tempe, AZ

Built a stateful, multi-agent RAG system using LangGraph and a Qdrant vector DB, achieving sub-2s query latency across 10K+ documents
Engineered a FastAPI + Node.js backend (Dockerized) with an optimized 5-node retrieval pipeline
Authored the QnA service evaluation framework, debugged the agentic graph execution layer, and shipped layered technical documentation adopted by the partner team

Python, LangGraph, Qdrant, FastAPI, Node.js, Docker

EXL Services

Consultant II, Data Engineering

Jul 2023 - Mar 2024

Delivered across two client engagements: a major US auto insurer and a global toy manufacturer.

Gurugram, India

Eliminated a 90-minute production bottleneck by optimizing PySpark ETL on AWS Glue/MWAA processing 100M+ records per batch, cutting runtimes by 66% via partition pruning and predicate pushdown
Architected an S3 data lake and Snowflake warehouse with a Great Expectations data-quality framework across 15+ sources, maintaining 100% SLA compliance; automated delivery via Jenkins CI/CD and Terraform IaC
Architected end-to-end ETL integrating ADLS, SharePoint, APIs, and flat files into structured mart layers using advanced dbt (Jinja macros, snapshots for SCD, incremental models) with custom data-quality checks
Built dbt documentation with column-level lineage and added vacuum/optimization hooks reducing data-processing time; mentored 2 junior engineers

PySpark, AWS Glue, MWAA, Snowflake, S3, dbt, Great Expectations, Terraform, Jenkins
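
To illustrate the partition pruning and predicate pushdown mentioned above, here is a minimal sketch in plain Python rather than PySpark. It is a toy model of the idea, not the production pipeline: the `PARTITIONS` data, the `scan` function, and the field names are all hypothetical.

```python
from datetime import date

# Hypothetical partitioned dataset: partition key (event date) -> rows.
PARTITIONS = {
    date(2024, 1, 1): [{"claim_id": 1, "amount": 120.0}, {"claim_id": 2, "amount": 900.0}],
    date(2024, 1, 2): [{"claim_id": 3, "amount": 75.0}],
    date(2024, 1, 3): [{"claim_id": 4, "amount": 1500.0}, {"claim_id": 5, "amount": 40.0}],
}


def scan(partitions, start, end, min_amount):
    """Return matching rows plus the number of partitions actually read."""
    partitions_read = 0
    matches = []
    for part_date, rows in partitions.items():
        # Partition pruning: skip whole partitions outside the date range
        # without touching any of their rows.
        if not (start <= part_date <= end):
            continue
        partitions_read += 1
        # Predicate pushdown: apply the row filter while reading, so
        # non-matching rows never reach later stages.
        matches.extend(row for row in rows if row["amount"] >= min_amount)
    return matches, partitions_read


rows, read = scan(PARTITIONS, date(2024, 1, 2), date(2024, 1, 3), min_amount=100.0)
print(read, [r["claim_id"] for r in rows])
```

In a real engine the savings come from skipped file reads; the counter here only makes the skipped partition visible.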

Super Six Sports Gaming

Data Engineer

Aug 2022 - Jul 2023

Promoted within 6 months based on performance.

Gurugram, India

Built and owned the core data ingestion layer: multi-source batch ETL from 10+ REST APIs and S3, loading 500K+ daily records of sports-event data into MongoDB
Reduced user churn 20% by building a production ML retention pipeline (78% accuracy, validated against a control group) that detected behavioral anomalies in time-series user activity and triggered automated Python/SQL workflows
Added automated validation and redundancy checks that eliminated data-integrity errors in routine spot checks
Engineered ML feature pipelines with SCD Type 1/2 dimensional modeling, significantly reducing experiment cycle time through automation

Python, SQL, PySpark, MongoDB, S3, REST APIs, Scikit-Learn
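
The SCD Type 2 modeling mentioned above can be sketched in a few lines of plain Python. This is an illustrative model only: the row layout (`key`, `attrs`, `valid_from`, `valid_to`, `is_current`) and the `apply_scd2` helper are assumptions, not the schema or code used in production.

```python
from datetime import date


def _new_version(key, attrs, effective):
    """Build an open-ended current row for a dimension key."""
    return {"key": key, "attrs": attrs, "valid_from": effective,
            "valid_to": None, "is_current": True}


def apply_scd2(dimension, incoming, effective):
    """Return a new dimension with SCD Type 2 history applied.

    dimension: list of row dicts (see _new_version for the layout)
    incoming:  dict mapping key -> latest attrs from the source
    effective: date on which the incoming values take effect
    """
    result = []
    current_keys = set()
    for row in dimension:
        if not row["is_current"]:
            result.append(dict(row))  # history rows are never modified
            continue
        current_keys.add(row["key"])
        new_attrs = incoming.get(row["key"])
        if new_attrs is None or new_attrs == row["attrs"]:
            result.append(dict(row))  # unchanged, or absent from this load
            continue
        # Changed: close the old version and open a new current one.
        result.append({**row, "valid_to": effective, "is_current": False})
        result.append(_new_version(row["key"], new_attrs, effective))
    # Keys with no current row yet are inserted as brand-new members.
    for key, attrs in incoming.items():
        if key not in current_keys:
            result.append(_new_version(key, attrs, effective))
    return result


dim = [_new_version(1, {"tier": "free"}, date(2023, 1, 1))]
out = apply_scd2(dim, {1: {"tier": "pro"}, 2: {"tier": "free"}}, date(2023, 6, 1))
for row in out:
    print(row["key"], row["attrs"], row["valid_from"], row["valid_to"], row["is_current"])
```

A Type 1 update would instead overwrite `attrs` in place and keep no history row.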

Futurense Technologies

Associate Data Engineer

Oct 2021 - Jul 2022

Bangalore, India

Spearheaded migration of pipelines processing 1B+ insurance claims from legacy SAS (manual proc-SQL batches) to Apache Spark on Azure Databricks, cutting batch time from 6+ hours to 50 minutes via broadcast joins and partitioning
Automated previously manual recurring HCP-targeting reports with a Python ETL layer on AWS Athena
Validated migration with comprehensive PySpark + SQL reconciliation frameworks; zero data-integrity issues reported post-migration

Apache Spark, Azure Databricks, PySpark, Python, SQL, AWS Athena
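
As a rough illustration of the reconciliation idea above, the sketch below compares a source and a target dataset by key and by a per-row fingerprint. It uses plain Python for brevity; the `reconcile` and `row_fingerprint` helpers and the sample rows are hypothetical, and a real framework would run the same comparison in PySpark or SQL.

```python
import hashlib


def row_fingerprint(row):
    """Hash a row's columns in a stable order so equal rows hash equally."""
    canonical = "|".join(f"{col}={row[col]}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key):
    """Report keys that are missing, unexpected, or different in the target."""
    src = {row[key]: row_fingerprint(row) for row in source_rows}
    tgt = {row[key]: row_fingerprint(row) for row in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


legacy = [
    {"claim_id": 1, "amount": 100},
    {"claim_id": 2, "amount": 250},
    {"claim_id": 3, "amount": 75},
]
migrated = [
    {"claim_id": 1, "amount": 100},
    {"claim_id": 2, "amount": 205},  # value drifted during migration
    {"claim_id": 4, "amount": 10},   # row that should not exist
]
report = reconcile(legacy, migrated, key="claim_id")
print(report)
```

An empty report on all three lists is the condition a migration sign-off would look for.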

Koron Projects Limited

Data Analyst

Oct 2018 - Jul 2021

New Delhi, India

Built 15 Power BI dashboards with advanced analytics for executive leadership tracking project costs, timelines, and profitability across $50M+ in annual construction projects
Consolidated cost data from SQL Server, Oracle, and MySQL via SQL and stored procedures, removing 20 hours/month of manual effort

Power BI, SQL Server, Oracle, MySQL, SQL

Projects

Selected work.

MeetFlow - Hackathon Winner

Led the LLM pipeline & orchestration layer

Intelligent task orchestration that converts meeting transcripts into actionable tickets. Analyzes via GPT-4o-mini, checks team capacity through Taiga, recommends smart reassignment for overloaded members, and notifies via Slack.

Python, Streamlit, OpenAI, Taiga API, Slack
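
The capacity-aware assignment step can be pictured with a small sketch. This is not MeetFlow's code: the `Member` class, the `assign` function, and the reassignment rule (fall back to whoever has the most free hours) are assumptions made for illustration, and the LLM, Taiga, and Slack calls are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    capacity_hours: float
    assigned_hours: float = 0.0

    @property
    def free_hours(self):
        return self.capacity_hours - self.assigned_hours


def assign(tasks, members):
    """Assign (title, suggested_owner, estimate_hours) tasks to members.

    The suggested owner is kept when they have room; otherwise the task goes
    to the member with the most free hours, or stays unassigned (None).
    """
    by_name = {m.name: m for m in members}
    assignments = {}
    for title, owner, hours in tasks:
        candidate = by_name.get(owner)
        if candidate is None or candidate.free_hours < hours:
            # Suggested owner is unknown or overloaded: try to reassign.
            candidate = max(members, key=lambda m: m.free_hours)
        if candidate.free_hours < hours:
            assignments[title] = None  # nobody has enough capacity
            continue
        candidate.assigned_hours += hours
        assignments[title] = candidate.name
    return assignments


team = [Member("Ana", 8, assigned_hours=7), Member("Ben", 8, assigned_hours=2)]
tasks = [("Fix login", "Ana", 3), ("Write docs", "Ben", 2), ("Migrate DB", "Ana", 5)]
result = assign(tasks, team)
print(result)
```

Tasks are processed in transcript order here; a different ordering or a smarter matching rule could change which tasks end up unassigned.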

TrustSeal - Hackathon

VillageHacks 2026 | Notary Everyday Track

A multi-provider AI pipeline that verifies identity documents against the AAMVA 2020 standard, replacing manual ID inspection with a scored, auditable report. Claude-based agents independently analyze the document, a local PDF417 decode acts as ground truth, and a judge agent resolves conflicts before generating an APPROVE / REVIEW / REJECT recommendation.

Python, Anthropic Claude, Multi-Agent, PDF417, FastAPI, Document Verification
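
One way to picture the judge step is the sketch below, which treats the decoded barcode as ground truth and compares each agent's reading against it. The decision policy (no agent matches means REJECT, partial agreement means REVIEW) is a hypothetical rule chosen for illustration, as are the `judge` function and field names; the actual project may score and weigh conflicts differently.

```python
def judge(agent_reads, barcode, fields):
    """Compare agent readings with barcode ground truth and recommend an action.

    agent_reads: list of dicts, one per agent, mapping field -> extracted value
    barcode:     dict of field -> value decoded from the PDF417 barcode
    fields:      field names to check
    """
    hard, soft = [], []
    for field in fields:
        truth = barcode.get(field)
        values = [read.get(field) for read in agent_reads]
        if truth is None:
            # No ground truth for this field: only flag agent disagreement.
            if len(set(values)) > 1:
                soft.append(field)
            continue
        matches = sum(1 for value in values if value == truth)
        if matches == 0:
            hard.append(field)   # every agent contradicts the barcode
        elif matches < len(values):
            soft.append(field)   # agents disagree with each other
    if hard:
        decision = "REJECT"
    elif soft:
        decision = "REVIEW"
    else:
        decision = "APPROVE"
    return {"decision": decision, "hard_mismatches": hard, "soft_mismatches": soft}


barcode = {"name": "JANE DOE", "dob": "1990-01-01"}
reads = [
    {"name": "JANE DOE", "dob": "1990-01-01"},
    {"name": "JANE DOE", "dob": "1990-01-07"},  # one agent misread the date
]
verdict = judge(reads, barcode, fields=("name", "dob"))
print(verdict)
```

Returning the mismatched fields alongside the decision keeps the recommendation auditable.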

FairCharge - Hackathon

Solo builder | HackASU Claude AI Builder Hackathon

A medical-bill audit pipeline that reads your bill, identifies every charge, benchmarks against CMS Medicare pricing data for your state, detects overcharges and billing violations, and generates a ready-to-send dispute letter. Built to target the kind of errors that average $1,300+ per hospital bill across the industry.

Python, Claude Vision, SapBERT, ChromaDB, SQLite, Streamlit, CMS Data
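
The benchmarking step can be sketched as a comparison of billed amounts against reference prices. Everything below is illustrative: the `audit` function, the `CODE_*` labels, the dollar amounts, and the 1.5x tolerance are made-up placeholders, not real CMS rates or FairCharge's actual thresholds.

```python
def audit(charges, benchmarks, tolerance=1.5):
    """Flag charges billed above tolerance x the reference price.

    charges:    list of (code, billed_amount) tuples read from the bill
    benchmarks: dict of code -> reference price (placeholder values here)
    tolerance:  hypothetical multiplier above which a charge is flagged
    """
    flagged, unbenchmarked = [], []
    for code, billed in charges:
        reference = benchmarks.get(code)
        if reference is None:
            unbenchmarked.append(code)  # no reference price to compare against
            continue
        if billed > reference * tolerance:
            flagged.append({
                "code": code,
                "billed": billed,
                "reference": reference,
                "excess": round(billed - reference, 2),
            })
    return flagged, unbenchmarked


bill = [("CODE_A", 300.0), ("CODE_B", 90.0), ("CODE_C", 50.0)]
reference_prices = {"CODE_A": 100.0, "CODE_B": 80.0}
flagged, unbenchmarked = audit(bill, reference_prices)
print(flagged, unbenchmarked)
```

The flagged entries carry the numbers a dispute letter would need to cite; codes without a reference price are reported separately rather than silently passed.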

MakeLifeEasy - Hackathon

LA Hacks 2026 | with Hemakshi Pandey

An AI assistant that executes real actions across Gmail, Calendar, Notion, Jira, and GitHub through a single plain-English conversation. It routes intent through a LangGraph workflow to pull data in parallel and take real actions like moving calendar events, sending emails, creating tasks, and prioritizing work.

Python, LangGraph, Fetch.ai, Multi-Agent, API Integration

Serverless Face Recognition - Academic

A serverless, multistage face recognition system using edge computing. IoT clients send video frames processed through decoupled detection and recognition stages via event-driven architecture - scalable, real-time identification without persistent servers.

AWS Lambda, SQS, ECR, Docker, PyTorch, OpenCV, Edge Computing

LifeSync - Live

Full-stack productivity platform with Google OAuth, Focus Score tracking, daily task management, Pomodoro timer, and motivational micro-challenges.

React, Node.js, Google OAuth, REST API

The Pause Button - Live

A developer wellness app with mixable ambient soundscapes, 10+ casual mini-games, and a calming UI - because even builders need a break.

React, JavaScript, Web Audio API, CSS Animations

Feature Store with Time Travel - Industry

Centralized feature store using Apache Iceberg with time-travel, enabling ML teams to train, test, and deploy from a single source of truth.

Apache Iceberg, Spark, Python, ML Infra
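
The time-travel idea can be shown with a toy snapshot table in plain Python. This is a conceptual model only, not Apache Iceberg's API: the `SnapshotTable` class is hypothetical and stores full copies per commit, whereas Iceberg tracks snapshots through table metadata.

```python
import bisect


class SnapshotTable:
    """Append-only table where every commit creates a readable snapshot."""

    def __init__(self):
        self._timestamps = []  # commit times, strictly increasing
        self._snapshots = []   # table state as of each commit

    def commit(self, ts, rows):
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("commit timestamps must be strictly increasing")
        self._timestamps.append(ts)
        self._snapshots.append(dict(rows))

    def read(self, as_of=None):
        """Return the latest state, or the state at or before `as_of`."""
        if not self._timestamps:
            raise LookupError("table has no snapshots")
        if as_of is None:
            return dict(self._snapshots[-1])
        # Find the last commit whose timestamp is <= as_of.
        index = bisect.bisect_right(self._timestamps, as_of) - 1
        if index < 0:
            raise LookupError("no snapshot at or before the requested time")
        return dict(self._snapshots[index])


features = SnapshotTable()
features.commit(100, {"user_1": 0.2})
features.commit(200, {"user_1": 0.5, "user_2": 0.9})

print(features.read(as_of=150))  # state used when a model was trained
print(features.read())           # latest state
```

Reading `as_of` a training timestamp is what lets a model be retrained or audited against exactly the feature values it originally saw.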

SAS to Spark MigrationIndustry

Migrated 50 legacy SAS workflows to Spark, cutting runtime from 6+ hours to under 50 minutes via partitioning, broadcast joins, and caching.

Apache Spark, PySpark, SAS, Data Engineering

I build things that scale.

Let's build something.