Publications and writing

Peer-reviewed research at top NLP venues and in operations research and optimization. Technical writing and tutorials live on the blog.

Preprints

2026

SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data

Shashank Kapadia, Deep Narayan Mishra, Sujal Reddy Alugubelli, Ajay Kumar, Swapnil Yadav, Rishi Bhatia | arXiv preprint (cs.DC, cs.LG)

Presents SURGE, a streaming GPU encoding system deployed in production to generate embeddings for over 800M texts across 40,000 logical partitions. Contributes (i) a cost model predicting throughput within 2% across three encoders spanning a 15× parameter range, (ii) a memory-safety bound enabling a streaming two-threshold policy with peak memory O(B_min + n_max) rather than O(N), and (iii) a φ/CV decision framework characterizing when the pattern applies. On 10M texts with 4 NVIDIA L4 GPUs, SURGE delivers 26,413 texts/s — matching fixed-batch throughput while using 12.6× less memory and 68× faster time-to-first-output, with crash recovery at SuperBatch granularity.

ML Systems, GPU Optimization, Distributed Computing, Embeddings

Conference Publications

2026

LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference

Shashank Kapadia, Deep Narayan Mishra, Sujal Reddy Alugubelli, Haoan Wang, Saipraveen Vabbilisetty, Rishi Bhatia, Anupriya Sharma | ACL 2026 (Industry Track), 64th Annual Meeting of the Association for Computational Linguistics

Establishes that layer-aligned distillation and convergence-based early exit are fundamentally incompatible under standard deployment, and introduces LEAP — an auxiliary training objective requiring no architectural changes that augments standard distillation with a single constraint ensuring intermediate layers approximate final-layer representations. LEAP-MiniLM achieves 1.61× wall-clock speedup at θ=0.95 with 91.9% of samples exiting by layer 7, where standard distilled models achieve zero effective speedup. Validated on STS-B (0.760 ± 0.006) and BEIR retrieval benchmarks, with operational guidance for production deployment.

NLP, Efficient Inference, Knowledge Distillation, Early Exit

Journal Publications

2022

Joint Robust Optimization of Bed Capacity, Nurse Staffing, and Care Access Under Uncertainty

Dominic J. Breuer, Shashank Kapadia, Nadia Lahrichi, James C. Benneyan | Annals of Operations Research, 312(2), 673–689

Develops two robust optimization models to plan effective bed and nurse resource allocation in clinical units under stochastic admission volumes and lengths of stay. Compares ellipsoidal, budgeted, and data-driven formulations, introducing uncertainty sets based on least-squares ellipsoidal fitting that produce superior solutions in practice.

Operations Research, Healthcare, Robust Optimization

2019

Determining the Optimal Collection Period for Returned Products in a Stochastic Environment

Nizar Zaarour, Emanuel Melachrinoudis, Shashank Kapadia, Hokey Min | International Journal of Logistics Systems and Management, 33(1), 42–58

Develops a mathematical framework to optimize the timing for collecting returned merchandise at initial collection points before shipment to a centralized return facility. Models daily returns as a Poisson process to minimize operational costs and uncertainty while maintaining service quality in reverse logistics.

Supply Chain, Stochastic Optimization, Reverse Logistics