Himanshu Pant

AWS & Microsoft Certified Data Engineer

I build things that scale.

Data Engineer specializing in large-scale ETL/ELT pipelines, real-time data infrastructure, and AI/ML systems. Built production platforms processing 100M-1B records with 66-86% performance improvements across insurance, sports analytics, and healthcare domains.

Open to opportunities · Tempe, AZ · himanshupant.dev

Engineer at heart.

Data Engineer with expertise in large-scale data infrastructure, real-time processing, and AI/ML systems. I've migrated 1B+ records to distributed architectures, eliminated 90-minute production bottlenecks, and built real-time pipelines handling 500K+ daily events with 99.9% accuracy.

Track record of rapid impact across insurance, sports analytics, and healthcare: promoted within 6 months at Super Six and received SPOT Recognition Award for delivering critical data infrastructure under tight deadlines.

๐Ÿฅ Healthcareโšฝ Sports Analytics๐Ÿฆ Insurance & Banking๐Ÿ“Š Retention & Segmentation
100M-1B Records Processed
M.S. Software Engineering (AI Specialization), Arizona State University

Tech I work with

Languages
Python, SQL, JavaScript, PySpark
Data & ML
Apache Spark, Databricks, Airflow, LangChain, Snowflake, Great Expectations
Cloud & Infra
AWS (EMR, S3, Lambda, SQS, ECR), Azure, Docker, Jenkins, CI/CD
Frontend
React, Next.js, Node.js, HTML/CSS
Databases
PostgreSQL, SQL Server, Oracle, NoSQL, Vector DBs
Tools
Git, Power BI, Streamlit, Jira

Highlights.

๐Ÿ†

DEVHACKS 2026 โ€” 1st Place

Won Track 1 with MeetFlow โ€” intelligent task orchestration converting meeting transcripts into capacity-aware ticket assignments using LLM-powered analysis, competing against 100+ teams.

โ˜๏ธ

AWS Certified Data Engineer Associate

Passed DEA-C01 (May 2026) โ€” validated expertise in data pipeline design, ETL optimization, AWS Glue/EMR/Redshift, and implementing data quality frameworks at scale.

📊

Microsoft Certified: Fabric Analytics Engineer

Passed DP-700 (May 2026), validating expertise in Microsoft Fabric analytics engineering, data warehousing, and cloud data solutions.

๐Ÿ†

HackASU โ€” FairCharge

Built a medical bill audit pipeline at HackASU that uses Claude Vision + SapBERT to detect overcharges, flagging $1,300+ in average billing errors per hospital bill.

๐Ÿ…

SPOT Award โ€” Exceptional Delivery

Recognized at Super Six Sports Gaming for exceptional delivery and cross-team collaboration on critical product features.


Industry Credentials.

AWS Certified Data Engineer – Associate

Amazon Web Services • DEA-C01

Validated expertise in designing, building, and maintaining data pipelines using AWS services including Glue, EMR, Redshift, and Kinesis, and in implementing data quality frameworks at scale.

Microsoft Certified: Fabric Analytics Engineer Associate

Microsoft • DP-700

Certified in Microsoft Fabric analytics engineering, data warehousing, data modeling, and implementing end-to-end analytics solutions on the Azure cloud platform.


Where I've worked.

AI Engineer

MyEdMaster

Industry Capstone Project (ASU SER517)

Jan 2026 – Apr 2026 · Legal Tech / AI

Remote

  • Developed a multi-agent, stateful RAG system for personalized legal guidance using LangGraph with a Qdrant vector DB, achieving sub-2s query latency across 10K+ documents and 95% relevance accuracy
  • Built a Dockerized FastAPI + Node.js backend supporting 100+ concurrent sessions with sub-200ms response times through optimized vector retrieval
Python, LangChain, LangGraph, Qdrant, FastAPI, Docker

Data Engineer II

EXL Services

Jul 2023 – Mar 2024 · Insurance

Gurugram, India · Remote

  • Drove a 40% BI query performance improvement by architecting an S3 data lake and Snowflake warehouse across 8 datasets, serving 500K+ daily users and processing high-frequency insurance claim events
  • Eliminated a 90-minute production bottleneck by optimizing PySpark ETL pipelines on AWS Glue/MWAA that process 100M records per batch, achieving a 66% runtime reduction via partition pruning and Spark query optimization
  • Built a production anomaly detection layer using Great Expectations across 15+ data sources, with CloudWatch monitoring and MWAA retries, maintaining 100% SLA compliance
PySpark, AWS Glue, MWAA, Snowflake, S3, Great Expectations, CloudWatch

Data Engineer

Super Six Sports Gaming

Aug 2022 – Jul 2023 · Sports Analytics

Gurugram, India · On-site

  • Reduced user churn by 20% by building a production ML retention pipeline (78% accuracy) using Scikit-Learn and statistical modeling to identify behavioral anomalies in time-series user activity data, then deploying automated Python/SQL re-engagement campaigns
  • Built and owned the core data ingestion layer: multi-source batch ETL pipelines ingesting from 10+ REST APIs and S3, loading 500K+ daily records of football, soccer, and basketball data into MongoDB on a sub-hourly refresh cadence. This enabled near-real-time analytics for product and marketing teams, with 99.9% data accuracy maintained via cross-source validation and data profiling
  • Engineered ML feature pipelines with SCD Type 1/2 dimensional modeling and automated feature engineering, reducing ML experiment cycle time by 35% through optimized time-series data storage patterns
Python, PySpark, SQL, MongoDB, Scikit-Learn, REST APIs
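For readers unfamiliar with the SCD Type 2 pattern named above, here is a minimal, illustrative sketch in plain Python. It is not the production pipeline: the `DimRow` record, the `apply_scd2` helper, and the `user_id`/`plan` fields are all hypothetical, chosen only to show how a changed attribute closes the current row and opens a new versioned one.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical schema)."""
    user_id: int
    plan: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(history, user_id, new_plan, as_of):
    """Return a new history list with an SCD Type 2 update applied.

    If the tracked attribute changed, the current row is closed at `as_of`
    and a new current row is appended. Unchanged values leave history as-is.
    """
    current = next(
        (r for r in history if r.user_id == user_id and r.is_current), None
    )
    if current is not None and current.plan == new_plan:
        return list(history)  # no change, nothing to version

    # Close the current row (if any) and keep every other row untouched.
    updated = [
        replace(r, valid_to=as_of) if r is current else r for r in history
    ]
    updated.append(DimRow(user_id, new_plan, valid_from=as_of))
    return updated


history = [DimRow(1, "free", date(2023, 1, 1))]
history = apply_scd2(history, 1, "premium", date(2023, 3, 1))
history = apply_scd2(history, 1, "premium", date(2023, 4, 1))  # no-op
```

A Type 1 update, by contrast, would simply overwrite `plan` in place and keep no history.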

Associate Data Engineer

Futurense Technologies

Oct 2021 – Jul 2022 · Healthcare

Bangalore, India · Remote

  • Spearheaded migration of 1B+ record oncology pipelines from legacy SAS to Apache Spark on Azure Databricks, cutting batch processing time by 86% (from 6+ hours to 50 minutes) through broadcast joins and strategic partitioning
  • Automated bi-weekly HCP targeting reports by building a Python ETL pipeline querying AWS Athena, reducing manual effort from 4–5 hours to 15 minutes (about 95% time savings)
  • Delivered 99.5% data accuracy post-migration via comprehensive PySpark and SQL validation frameworks with statistical reconciliation
Azure Databricks, Apache Spark, PySpark, AWS Athena, Python, SQL
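As an illustration of what a post-migration reconciliation check can look like, the sketch below compares row counts and a column total between a legacy extract and a migrated extract. The `reconcile` function, the `amount` column, and the tolerance are assumptions made for this example, not details of the actual validation framework.

```python
import math


def reconcile(legacy_rows, migrated_rows, column, rel_tol=1e-9):
    """Compare row counts and the sum of one numeric column.

    Returns a dict of check name -> bool; all True means the two
    extracts agree on these (deliberately simple) aggregate checks.
    """
    legacy_total = math.fsum(row[column] for row in legacy_rows)
    migrated_total = math.fsum(row[column] for row in migrated_rows)
    return {
        "row_count_matches": len(legacy_rows) == len(migrated_rows),
        "column_sum_matches": math.isclose(
            legacy_total, migrated_total, rel_tol=rel_tol
        ),
    }


# Toy data standing in for the legacy and migrated outputs.
legacy = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 4.5}]
migrated = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 4.5}]
report = reconcile(legacy, migrated, "amount")
```

At real scale the same aggregates would be computed inside the engine (for example with Spark or SQL) and only the summary values compared.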

Data Analyst

Koron Projects Limited

Oct 2018 – Jul 2021 · Construction & Infrastructure

Gurugram, India · On-site

  • Built and maintained 15 Power BI dashboards with advanced analytics for executive leadership, tracking project costs, timelines, and profitability across $50M+ in annual construction projects
  • Consolidated cost data from multiple enterprise sources, including SQL Server, Oracle, and MySQL, via SQL queries and stored procedures, automating monthly executive reporting and reducing manual effort by 20 hours/month
Power BI, SQL Server, Oracle, MySQL, Python, Excel

Selected work.


Let's build something.

Open to Data Engineer, ML Engineer, and Backend Engineer roles focused on large-scale data infrastructure, real-time systems, and AI/ML platforms. Willing to relocate anywhere in the US for the right opportunity.
