ICS 2024: International Conference on Supercomputing

June 4-7, 2024

International Conference Hall, Kyoto University, Kyoto, Japan

ICS 2024 accepted papers (as of May 6th)


The following papers have been accepted for publication in ICS 2024. Congratulations!

Session 2: Best Paper Nominees


  • DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs
  • Yelai Feng (College of Electronic Engineering, College of Computer Science and Technology, National University of Defense Technology); Huaixi Wang (College of Electronic Engineering, National University of Defense Technology); Yining Zhu (Ningbo Institute of Technology, Zhejiang University); Xiandong Liu (PerfXLab (Beijing) Technologies Co., Ltd); Hongyi Lu (College of Computer Science and Technology, National University of Defense Technology); Qing Liu (College of Electrical and Computer Engineering, Technical University of Munich)

  • Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing
  • Durga Keerthi Mandarapu, Vani Nagarajan, Artem Pelenitsyn, Milind Kulkarni (Purdue University)

  • Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications
  • Bennett Cooper (Clemson University); Thomas R.W. Scogland (Lawrence Livermore National Lab); Rong Ge (Clemson University)

  • FuseIM: Fusing Probabilistic Traversals for Influence Maximization on Exascale Systems
  • Reece Neff, Mostafa Eghbali Zarch (North Carolina State University); Marco Minutoli, Mahantesh Halappanavar, Antonino Tumeo (Pacific Northwest National Laboratory); Ananth Kalyanaraman (Washington State University); Michela Becchi (North Carolina State University)

  • An Autonomous Parallelization of Transformer Model Inference on Heterogeneous Edge Devices
  • Juhyeon Lee (Sogang University); Insung Bahk (Hallym University); Hoseung Kim (Sungkyunkwan University); Sinjin Jeong (Pusan University); Suyeon Lee (Georgia Institute of Technology); Donghyun Min (Sogang University)

Session 3A: Memory and Storage Systems


  • LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators
  • Chengtao Lai, Zhongchun Zhou (The Hong Kong University of Science and Technology); Akash Poptani (Indian Institute of Technology Dharwad); Wei Zhang (The Hong Kong University of Science and Technology)

  • HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory
  • Qi Shao (Chalmers University of Technology); Angelos Arelakis (ZeroPoint Technologies); Per Stenström (Chalmers University of Technology, ZeroPoint Technologies)

  • NUCAlloc: Fine-Grained Block Placement in Hashed Last-Level NUCA Caches
  • Raveendra Soori (Cloud Software Group); Shreyas Prabhu (Apple); Harpreet Singh Chawla (Texas A&M University); Michael Ferdman (Stony Brook University)

  • Exploiting Vector Code Semantics for Efficient Data Cache Prefetching
  • Francesc Martínez Palau, Martí Torrents (Barcelona Supercomputing Center); Adrià Armejach (Barcelona Supercomputing Center, Universitat Politècnica de Catalunya); Marc Casas (Barcelona Supercomputing Center)

Session 3B: Emerging supercomputing applications


  • Real-time High-resolution X-Ray Computed Tomography
  • Du Wu (Tokyo Institute of Technology, RIKEN-CCS); Peng Chen (National Institute of Advanced Industrial Science and Technology, RIKEN-CCS); Xiao Wang, Isaac Lyngaas (Oak Ridge National Laboratory); Takaaki Miyajima (Meiji University); Toshio Endo (Tokyo Institute of Technology); Satoshi Matsuoka, Mohamed Wahib (RIKEN-CCS)

  • RayJoin: Fast and Precise Spatial Join
  • Liang Geng (The Ohio State University); Rubao Lee (Freelance); Xiaodong Zhang (The Ohio State University)

  • Differentiating Set Intersections in Maximal Clique Enumeration by Function and Subproblem Size
  • Hans Vandierendonck (Queen's University Belfast)

  • Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs
  • Xiao Fu, Weiling Yang, Dezun Dong, Xing Su (National University of Defense Technology)

Session 5A: Reliability, dependability and availability


  • Minimizing Coherence Error via Dynamic Decoupling
  • Soheil Khadirsharbiyani, Movahhed Sadeghi (Pennsylvania State University); Mostafa Eghbali Zarch (NC State University); Mahmut Taylan Kandemir (Pennsylvania State University)

  • Soft Error Resilience at Near-Zero Cost
  • Jianping Zeng, Shao-Yu Huang (Purdue University); Jiuyang Liu (Huazhong University of Science and Technology); Changhee Jung (Purdue University)

  • Understanding GPU Memory Corruption at Extreme Scale: The Summit Case Study
  • Vladyslav Oles (Oak Ridge National Laboratory); Anna Schmedding (William & Mary); George Ostrouchov, Woong Shin (Oak Ridge National Laboratory); Evgenia Smirni (William & Mary); Christian Engelmann (Oak Ridge National Laboratory)

  • Input Range Generation for Compiler-Induced Numerical Inconsistencies
  • Dolores Miao (University of California, Davis); Ignacio Laguna (Lawrence Livermore National Laboratory); Cindy Rubio-González (University of California, Davis)

Session 5B: Heterogeneous software: GPUs and domain specific accelerators


  • Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs
  • Andreas L. Plesner (ETH Zurich); Hans Henrik Brandenborg Sørensen, Søren Hauberg (Technical University of Denmark)

  • RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
  • Benjamin Brock (Intel Corporation); Aydın Buluç, Katherine Yelick (University of California, Berkeley)

  • Distributed Ranges: A Model for Distributed Data Structures, Algorithms, and Views
  • Benjamin Brock, Robert Cohn, Suyash Bakshi, Tuomas Karna, Jeongnim Kim, Mateusz Nowak, Łukasz Ślusarczyk, Kacper Stefanski, Timothy G. Mattson (Intel Corporation)

  • Stencil Computation with Vector Outer Product
  • Wenxuan Zhao (Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences); Liang Yuan (Institute of Computing Technology, Chinese Academy of Sciences); Baicheng Yan, Penghao Ma (Huawei Technologies Co., Ltd); Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences); Long Wang, Zhe Wang (Huawei Technologies Co., Ltd)

Session 6A: Cloud and ML Systems Efficiency


  • YMIR: A Scheduler for Foundation Model Fine-tuning Workloads in Datacenters
  • Wei Gao (S-Lab, Nanyang Technological University); Weiming Zhuang, Minghao Li (Nanyang Technological University); Peng Sun (SenseTime & Shanghai AI Lab); Yonggang Wen, Tianwei Zhang (Nanyang Technological University)

  • DeepHYDRA: Resource-Efficient Time-Series Anomaly Detection in Dynamically-Configured Systems
  • Franz Kevin Stehle (Heidelberg University & CERN); Wainer Vandelli, Giuseppe Avolio, Felix Zahn (CERN); Holger Fröning (Heidelberg University)

  • An Efficient and Scalable Approach to Build Co-occurrence Matrix for DNN's Embedding Layer
  • Quentin Petit (Université Paris-Saclay & Huawei Technologies France); Chong Li (Paris Distributed and Parallel Technologies Lab, Huawei Technologies France); Nahid Emad (Maison de la Simulation & LI-PaRAD, Université Paris-Saclay)

  • Scheduling for Cyber-Physical Systems with Heterogeneous Processing Units under Real-World Constraints
  • Justin McGowen, Ismet Dagli, Neil T. Dantam, Mehmet E. Belviranli (Colorado School of Mines)

Session 6B: Accelerator Designs


  • SLIDEX: A Novel Architecture for Sliding Window Processing
  • Raúl Taranco, José-María Arnau, Antonio González (Universitat Politècnica de Catalunya)

  • Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
  • Zhengang Li (Northeastern University); Alec Lu (Simon Fraser University); Yanyue Xie, Zhenglun Kong (Northeastern University); Mengshu Sun (Beijing University of Technology); Hao Tang (ETH Zurich); Zhong Jia Xue (Simon Fraser University); Peiyan Dong (Northeastern University); Caiwen Ding (University of Connecticut); Yanzhi Wang, Xue Lin (Northeastern University); Zhenman Fang (Simon Fraser University)

  • CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers
  • Sungmin Yun, Hwayong Nam, Kwanhee Kyung, Jaehyun Park (Seoul National University); Byeongho Kim (Samsung Electronics); Yongsuk Kwon (Seoul National University); Eojin Lee (Inha University); Jung Ho Ahn (Seoul National University)

  • NeOCNN: NTT-enabled Optical Convolution Neural Network Accelerator
  • Xianbin Li, Yinyi Liu, Fan Jiang, Chengeng Li, Yuxiang Fu, Wei Zhang (Hong Kong University of Science and Technology); Jiang Xu (Microelectronics Thrust, Hong Kong University of Science and Technology)

Session 8A: Supercomputing Software and Security


  • sys-sage: A Unified Representation of Dynamic Topologies & Attributes on HPC Systems
  • Stepan Vanecek, Martin Schulz (Chair of Computer Architecture and Parallel Systems)

  • RTT: Reuse Time Tracking for Use-After-Free Detection
  • Yubo Du, Yanan Guo, Youtao Zhang, Jun Yang (University of Pittsburgh)

  • Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures
  • Shilpa Babalad, Shirish K Shevade, Matthew Jacob Thazhuthaveetil, R Govindarajan (Indian Institute of Science)

  • Matrix-free SBP-SAT finite difference methods and the multigrid preconditioner on GPUs
  • Alexandre Chen, Brittany A. Erickson (University of Oregon); Jeremy E. Kozdon (Naval Postgraduate School); Jee Choi (University of Oregon)

Session 8B: Interconnects and Networks


  • SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications
  • Pouya Haghi (University of Rochester); Cheng Tan (Microsoft); Anqi Guo (Boston University); Chunshu Wu (University of Rochester); Dongfang Liu (Rochester Institute of Technology); Ang Li (Pacific Northwest National Laboratory); Anthony Skjellum (Tennessee Tech University); Tong Geng (University of Rochester); Martin Herbordt (Boston University)

  • CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes
  • Mert Hidayetoglu (Stanford University); Simon Garcia de Gonzalo (Sandia National Laboratories); Elliott Slaughter (SLAC National Accelerator Laboratory); Yu Li (University of Illinois at Urbana-Champaign); Christopher Zimmer (Oak Ridge National Laboratory); Tekin Bicer (Argonne National Laboratory); Bin Ren (William & Mary); William Gropp (University of Illinois at Urbana-Champaign); Wen-Mei Hwu (Nvidia Research / University of Illinois at Urbana-Champaign); Alex Aiken (Stanford University)

  • gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters
  • Jiajun Huang (University of California, Riverside); Sheng Di (Argonne National Laboratory); Xiaodong Yu (Stevens Institute of Technology); Yujia Zhai, Jinyang Liu (University of California, Riverside); Yafan Huang (University of Iowa); Ken Raffenetti, Hui Zhou (Argonne National Laboratory); Kai Zhao (Florida State University); Xiaoyi Lu (University of California, Merced); Zizhong Chen (University of California, Riverside); Franck Cappello, Yanfei Guo, Rajeev Thakur (Argonne National Laboratory)

  • Enhanced UGAL Routing Schemes for Dragonfly Networks
  • Ram Sharan Chaulagain, Xin Yuan (Florida State University)

Session 9A: Machine learning systems


  • A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations
  • Mingyi Li, Junmin Xiao, Kewei Zhang, Zhiheng Lin, Chaoyang Shui (Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences); Ke Meng (Alibaba Group); Zehua Wang, Yunfei Pang, Guangming Tan (Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences)

  • AUTOSCHED: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads
  • Wei Gao (S-Lab, Nanyang Technological University); Xu Zhang (Chongqing University); Shan Huang (Nanyang Technological University); Shangwei Guo (Chongqing University); Peng Sun (SenseTime & Shanghai AI Lab); Yonggang Wen, Tianwei Zhang (Nanyang Technological University)

  • Sylva: Sparse Embedded Adapters via Hierarchical Approximate Second-Order Information
  • Baorun Mu (University of Toronto, Vector Institute, CentML); Christina Giannoula (University of Toronto, CentML); Shang Wang, Gennady Pekhimenko (University of Toronto, Vector Institute, CentML)

  • Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment
  • Hanxian Huang (University of California San Diego); Xin Chen (Intel Corporation); Jishen Zhao (University of California San Diego)

Session 9B: Software Design for Accelerators


  • FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogeneous Graph Neural Networks
  • Keren Zhou (George Mason University); Karthik Ganapathi Subramanian, Po-Hsun Lin (North Carolina State University); Matthias Fey (Kumo.AI); Binqian Yin (George Mason University); Jiajia Li (North Carolina State University)

  • SNOOPIE: A Multi-GPU Communication Profiler and Visualizer
  • Mohammad Kefah Taha Issa, Muhammad Aditya Sasongko, Ilyas Turimbetov, Javid Baydamirli, Doğan Sağbili, Didem Unat (Koç University)

  • RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection
  • Yifei Li (Alibaba Group); Bole Zhou (Independent); Jiejing Zhang, Xuechao Wei, Yinghan Li, Yingda Chen (Alibaba Group)

  • Accelerated Auto-Tuning of GPU Kernels for Tensor Computations
  • Chendi Li, Yufan Xu, Sina Mahdipour Saravani, P. Sadayappan (University of Utah)