Research

Optimizing Einstein-Summation Subprograms

[Jan, 2022 - Present] Advised by Prof. Andreas Klöckner

Einstein-summations provide a simple notation to express a wide-range of Linear-Algebra primitives. Achieving roofline FlOp-throughput for these operations still remains challenging. We employ a combination of techniques from auto-tuning to pattern-matching to generate efficient OpenCL code for computational kernels containing expressions that have einsum-like memory access patterns.

Array Programming Languages and Intermediate Representations

[Sept, 2020 - Present] Advised by Prof. Andreas Klöckner

Array Programming Languages have been an important vehicle for driving scientific applications from as early as the 1960s. Besides providing a close-to-math expressibility, their intermediate representations are closer to SIMD architectures making it easier to engineer optimizing compilers targeting such hardwares. Pytato provides one such IR that lowers \(n-d\) array programs to computation graphs comprising of pure-Array Ops that can be targeted to OpenCL / CUDA / JAX.

Near-Roofline Discontinuous Galerkin Action Operators

[Sept, 2020 - Present] Advised by Prof. Andreas Klöckner

Discontinuous Galerkin operator applications comprise of many fine-grained array operations that can push them into the memory-bound regime. With kernel and loop fusion we can bump up the workload's Arithmetic Intensity, however, performing fusion might also negatively affect the kernel's performance by inhibiting device's latency hiding abilities by further introducing dependency edges and increasing the working set size of the inner loops. In the MIRGE framework we address such trade-offs for such kernels on GPU systems.

Finite Element Assembly on GPUs

[June, 2018 - Jan, 2021] Advised by Prof. Andreas Klöckner

Evaluation of Finite Element operators result in a diverse set of computational kernels making it a difficult problem to find one optimization strategy that achieves near-peak performance for all the kernels on GPUs. We solve this problem by using high level code generation tools that select the optimization strategy based on the loop structure of the kernel.

Abstractions for High Performance Code Generation

[Aug, 2017 - Present] Advised by Prof. Andreas Klöckner

Designing a programming abstraction for a high performance system is very critical in determining the performance and maintainability of the final application. We address these issues primarily by extending Loopy Intermediate Representation, which is a high-level IR targeted for user-specified kernel transformations. The main question we intend to answer here is – "*What is the minimal set of transformation primitives that can transform any scientific-computing kernel to peak its performance?*"

Solving Eikonal Equations on Unstructured Grids

[Dec, 2016 - June 2017] Advised by Prof. S Baskar, IIT Bombay

Characteristic Fast Marching Method is widely used in solving the Eikonal equations, however previous work had been only formulated for structured grids. We developed a solver that extended the algorithm for unstructured grids as well. Used the solver to solve known problems in literature with skew grids so that the activity of the solution could be efficiently observed in the region of activity

Discontinuous Galerkin Framework for solving Hyperbolic PDEs

[Dec,2015 - May 2017] Advised by Prof. Shiva Gopalakrishnan, IIT Bombay

We developed a C++ library for solving Hyperbolic Equations through Discontinuous Galerkin ("DG") methods on structured grids. Performed a series of convergence tests to verify that the framework satisfied hp−convergence. Eventually, used the framework to simulate problems in Fluid Dynamics like the dam-break problem using high order DG elements.

Flow Induced Reconfiguration of Aquatic Vegetation

[Feb, 2015 - Aug, 2016] Advised by Prof. Rajneesh Bharadwaj, IIT Bombay Corrected the existing models for Fluid Structure Interaction for a Flexible plate by including the Skin friction coefficient in the computations. Implemented a "Predictor-Corrector" based Finite Difference scheme for the computation of coefficient of drag on the plate