# 2024 IEEE 67th International Midwest Symposium on Circuits and Systems (MWSCAS 2024)

# Springfield, Massachusetts, USA 11-14 August 2024

**Pages 1-711** 



IEEE Catalog Number: CFP24MID-POD

**ISBN:** 979-8-3503-8718-6

# Copyright © 2024 by the Institute of Electrical and Electronics Engineers, Inc. All Rights Reserved

*Copyright and Reprint Permissions*: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved.

\*\*\* This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.

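| Identifier             | Value             |
|------------------------|-------------------|
| Part Number            | CFP24MID-POD      |
| ISBN (Print-On-Demand) | 979-8-3503-8718-6 |
| ISBN (Online)          | 979-8-3503-8717-9 |
| ISSN                   | 1548-3746         |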

## Additional Copies of This Publication Are Available From:

Curran Associates, Inc., 57 Morehouse Lane, Red Hook, NY 12571 USA. Phone: (845) 758-0400; Fax: (845) 758-2633; E-mail: curran@proceedings.com; Web: www.proceedings.com



# **TABLE OF CONTENTS**

## Monday, August 12, 2024

## Session A1L-A: Analog Circuits & Systems

**Chair:** Mengting Yan, *Analog Devices* **Co-Chair:** Glenn Cowan, *Concordia University* **Time:** Monday, August 12, 2024, 8:00 - 9:30 **Location:** Room 1

#### A 40Gb/S Multi-Band Wireline Receiver Analog Front-End for 50.4dB Channel Loss Compensation ...... 1

Mingzhe Liu, Yongzhen Chen, Jiangfeng Wu, Cuixia Wang

Tongji University, China

**Abstract:** This paper presents a serial-link receiver analog front-end (AFE) designed for high-data-rate, high-channel-loss applications. The AFE is based on a continuous-time linear equalizer (CTLE) and comprises three peaking-amplifier stages and two variable-gain amplifier (VGA) stages. Every stage uses a transconductance–transimpedance (Gm-TIA) structure for bandwidth extension. The AFE compensates for a 50.4 dB insertion-loss FR4 channel at 40 Gb/s without a feedforward equalizer (FFE) or decision feedback equalizer (DFE). The circuit is designed in 28 nm CMOS technology and achieves a figure of merit (FoM) of 0.045 pJ/bit/dB.
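The FoM can be turned back into an implied power figure. The sketch below assumes the FoM is defined as power divided by (data rate × compensated channel loss), which the pJ/bit/dB unit suggests but the abstract does not state; `implied_power_w` is a hypothetical helper.

```python
def implied_power_w(fom_pj_per_bit_db: float, rate_bps: float, loss_db: float) -> float:
    """Invert FoM = P / (rate * loss); the FoM definition is an assumption."""
    return fom_pj_per_bit_db * 1e-12 * rate_bps * loss_db

# Values from the abstract: 0.045 pJ/bit/dB, 40 Gb/s, 50.4 dB channel loss
p = implied_power_w(0.045, 40e9, 50.4)
print(f"implied AFE power: {p * 1e3:.1f} mW")
```

Under that assumption the AFE would draw roughly 91 mW.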

National Institute of Astrophysics, Optics and Electronics INAOE, Mexico

**Abstract:** In this paper, a high-linearity current-mode instrumentation amplifier (CMIA) with tunable gain is presented. The proposed circuit uses complementary transconductors at the input and a class AB output stage, which increases both the slew rate and the linear range. The gain is set by non-conventional current mirrors that allow continuous adjustment through a control current. The circuit was designed in a standard 0.18 µm CMOS process with a 1.8 V supply and offers a wide tunable gain range from 25 to 46 dB, with 33.4 µW power consumption at maximum gain. The proposed CMIA also provides high linearity, with a total harmonic distortion (THD) of -50 dB for a 2.5 Vpp differential output signal at 1 kHz.
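The dB figures in the abstract translate to linear ratios as follows. This is a minimal sketch treating both the gain and the THD as amplitude ratios (20·log10), which is the usual convention; the helper names are illustrative.

```python
def db_to_linear_gain(gain_db: float) -> float:
    """Amplitude (voltage or current) gain in dB to a linear ratio: 10^(dB/20)."""
    return 10 ** (gain_db / 20)

def thd_db_to_percent(thd_db: float) -> float:
    """THD expressed in dB (amplitude ratio) to percent."""
    return 100 * 10 ** (thd_db / 20)

gain_min, gain_max = db_to_linear_gain(25), db_to_linear_gain(46)
print(f"gain range: {gain_min:.1f} to {gain_max:.1f}")  # 17.8 to 199.5
print(f"THD: {thd_db_to_percent(-50):.3f} %")           # 0.316 %
```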

Sara Radfar, Glenn Cowan

Concordia University, Canada

**Abstract:** The nonlinearity of an optical receiver depends mainly on the final stages of the main amplifiers. The Cherry-Hooper (CH) amplifier is popular because of its broad bandwidth, but it may not offer sufficient linearity, especially for PAM-4 optical links. The authors previously designed a highly linear PAM-4 optical receiver using gm/gm inverter-based amplifiers, which are known for their superior linearity, and applied the interleaving active feedback (IAFB) technique to compensate for their lower bandwidth. However, at smaller input voltage amplitudes, that design did not achieve better linearity than the open-loop structure without IAFB. This paper addresses the issue by modifying the active feedback loops with voltage dividers. The proposed design operates at 50 Gb/s from a 1 V supply and achieves a total harmonic distortion (THD) of about 0.3% for a 600 mVpp differential output swing.
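THD, the linearity metric quoted above, is the RMS sum of the harmonic amplitudes divided by the fundamental. The sketch below measures it from a simulated output record; the cubic nonlinearity `y = x + c*x^3` is a hypothetical stand-in for an amplifier, not the authors' circuit.

```python
import numpy as np

def thd_from_samples(y: np.ndarray, fund_bin: int, n_harmonics: int = 9) -> float:
    """THD = sqrt(sum of squared harmonic amplitudes) / fundamental amplitude.

    Assumes coherent sampling (an integer number of cycles in the record), so the
    fundamental and each harmonic fall exactly on one FFT bin."""
    spectrum = np.abs(np.fft.rfft(y))
    harmonics = np.array([spectrum[k * fund_bin] for k in range(2, n_harmonics + 1)
                          if k * fund_bin < len(spectrum)])
    return float(np.sqrt(np.sum(harmonics ** 2)) / spectrum[fund_bin])

# Hypothetical weakly nonlinear stage driven by a coherently sampled sine
n, cycles, c = 1024, 5, 0.01
x = np.sin(2 * np.pi * cycles * np.arange(n) / n)
thd = thd_from_samples(x + c * x ** 3, fund_bin=cycles)
print(f"THD = {100 * thd:.3f} %")  # analytic (c/4) / (1 + 3c/4), about 0.248 %
```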

Mahmood A. Mohammed, Ahmed S. Emara, Gordon W. Roberts

McGill University, Canada

**Abstract:** This paper presents an in-depth analysis of the slew rate (SR) of scalable multi-stage CMOS operational transconductance amplifiers (OTAs). These OTAs use low-frequency (LF) zeros in their frequency compensation to address the stability issues encountered when gain stages are cascaded in closed-loop configurations. Because of these LF zeros, however, their SR deviates from conventional SR equations. This paper therefore examines the SR behavior of scalable OTAs and develops a new set of SR equations, which are validated through simulation and measurement. The design is fabricated in a TSMC 65 nm CMOS process.
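For reference, the conventional SR relation that the abstract says no longer holds is SR = I/C, with full-power bandwidth SR/(2πV<sub>p</sub>). This sketch shows only that textbook baseline, not the authors' new equations; the bias current, capacitance, and amplitude are hypothetical.

```python
import math

def conventional_slew_rate(i_bias_a: float, c_f: float) -> float:
    """Textbook SR = I / C (V/s): the maximum current charging the dominant capacitance."""
    return i_bias_a / c_f

def full_power_bandwidth_hz(sr_v_per_s: float, v_peak: float) -> float:
    """Highest sine frequency of amplitude v_peak that avoids slewing: SR / (2*pi*Vp)."""
    return sr_v_per_s / (2 * math.pi * v_peak)

sr = conventional_slew_rate(20e-6, 1e-12)  # hypothetical 20 uA into 1 pF
print(f"SR = {sr / 1e6:.0f} V/us, FPBW = {full_power_bandwidth_hz(sr, 0.5) / 1e6:.2f} MHz")
```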


Michele Noviello<sup>1</sup>, Ruben Garvi<sup>1</sup>, Andres Quintero<sup>2</sup>, Susana Paton<sup>1</sup>

<sup>1</sup>Universidad Carlos III de Madrid, Spain; <sup>2</sup>Infineon Technologies AG, Austria

**Abstract:** This work describes the transient and steady-state responses of a voltage-controlled oscillator (VCO)-based analog front-end for capacitive sensors that measure the distance between their plates. A complete VCO-based capacitance-to-digital converter can be obtained by processing the VCO output with a frequency-to-digital converter. The front-end consists mainly of a ring oscillator whose input is regulated in a closed loop by switched-capacitor (SC) feedback. The SC network also couples an unbiased capacitive sensor to the readout without additional biasing circuitry. This manuscript investigates the ripple generated by the switching nature of the architecture, the average behaviour under small-signal variations, and the main noise contributions. The extracted model is validated through transient simulations and can support design and stability studies.
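Two pieces surrounding this front-end can be sketched independently: the sensor's capacitance as a function of plate distance, and a frequency-to-digital converter that turns the VCO output into a number. The sketch assumes an ideal parallel-plate model and a simple gated counter; the plate geometry and frequencies are hypothetical, and the paper's SC loop is not modeled.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m (approximate)

def parallel_plate_capacitance(area_m2: float, gap_m: float, eps_r: float = 1.0) -> float:
    """Ideal parallel-plate model C = eps0*eps_r*A/d (fringing ignored)."""
    return EPS0 * eps_r * area_m2 / gap_m

def counter_fdc(f_vco_hz: float, t_gate_s: float) -> int:
    """Gated-counter frequency-to-digital conversion: complete VCO cycles in the window."""
    return math.floor(f_vco_hz * t_gate_s)

c = parallel_plate_capacitance(1e-6, 10e-6)  # hypothetical 1 mm^2 plates, 10 um gap
print(f"C = {c * 1e12:.3f} pF")              # 0.885 pF
print(counter_fdc(1.5e6, 1e-3))              # 1500 counts for 1.5 MHz over 1 ms
```

Because C scales as 1/d, a VCO whose frequency tracks C yields counts that vary with plate distance; the counter's resolution is one cycle per gate window.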

## Session A1L-B: Digital Circuits & Systems 1

**Chair:** Randall Geiger, *Iowa State University* **Time:** Monday, August 12, 2024, 8:00 - 9:30 **Location:** Room 2

Khalid Alammari<sup>1</sup>, Majid Ahmadi<sup>1</sup>, Arash Ahmadi<sup>2</sup>

<sup>1</sup>University of Windsor, Canada; <sup>2</sup>Carleton University, Canada

**Abstract:** In this paper, memristor-based logic designs for a full adder circuit are proposed using the MAGIC (Memristor-Aided Logic) and MRL (Memristor Ratioed Logic) methods. The MAGIC-based full adder is a memristor-only circuit with 25 memristors mapped into a memristive crossbar structure, while the MRL-based full adder is implemented with 18 memristors and 4 MOSFETs on a CMOS-memristor platform. Both designs were implemented and simulated in Cadence Virtuoso, and the simulation results confirm their functionality.
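MAGIC is commonly used to evaluate NOR in a crossbar, so a full adder can be composed from NOR alone. The logic-level sketch below verifies such a composition against the full-adder truth table; it is an illustration only and does not reproduce the paper's 25-memristor mapping or its gate ordering.

```python
def nor(a: int, b: int) -> int:
    """NOR: a primitive that MAGIC can evaluate in a memristive crossbar."""
    return int(not (a or b))

def full_adder_nor(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder built only from NOR gates (logic level, not a crossbar mapping)."""
    not_a, not_b = nor(a, a), nor(b, b)
    a_and_b = nor(not_a, not_b)             # AND via De Morgan
    a_xor_b = nor(nor(a, b), a_and_b)       # XOR = NOR(NOR(a, b), AND(a, b))
    not_x, not_c = nor(a_xor_b, a_xor_b), nor(cin, cin)
    x_and_c = nor(not_x, not_c)
    s = nor(nor(a_xor_b, cin), x_and_c)     # sum = XOR(a ^ b, cin)
    n = nor(a_and_b, x_and_c)
    cout = nor(n, n)                        # carry = OR(a&b, (a^b)&cin)
    return s, cout

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, full_adder_nor(a, b, c))
```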

#### An Efficient and Configurable Hardware Architecture of Polynomial Modular Operation for

Jiahao Lu<sup>1</sup>, Jiaming Zhang<sup>1</sup>, Zhixiang Luo<sup>1</sup>, Aobo Li<sup>1</sup>, Tianze Huang<sup>1</sup>, Dongsheng Liu<sup>1</sup>, Chi Cheng<sup>2</sup>

<sup>1</sup>Huazhong University of Science and Technology, China; <sup>2</sup>China University of Geosciences, China

**Abstract:** The global migration towards post-quantum cryptography (PQC) is accelerating to protect communications against the upcoming quantum threat. However, resource-constrained IoT communication devices limit the deployment of multiple PQC algorithms, making it hard to meet the varied security requirements of IoT devices. In this paper, an efficient hardware architecture with a customized instruction format is proposed that supports both CRYSTALS-Kyber and Dilithium. Following a maximize-reuse methodology, a configurable polynomial modular arithmetic unit executes all required modes of polynomial modular operations in a pipeline. Implemented on UltraScale+ and Artix-7 platforms, the proposed architecture consumes 2310 and 4028 equivalent slices at maximum frequencies of 280 MHz and 99 MHz, respectively. Compared with state-of-the-art Kyber-only and Dilithium-only designs, this work supports both algorithms simultaneously, achieves the lowest area-time (AT) value for Dilithium, and a competitive AT value for Kyber.
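Both Kyber and Dilithium compute in the ring Z<sub>q</sub>[x]/(x<sup>256</sup> + 1), differing mainly in the modulus q, which is why one configurable arithmetic unit can serve both. The sketch below uses schoolbook negacyclic multiplication with q as a parameter; real hardware, including presumably this design, uses the number-theoretic transform (NTT) instead, so this is for clarity only.

```python
KYBER_Q = 3329         # CRYSTALS-Kyber modulus (n = 256)
DILITHIUM_Q = 8380417  # CRYSTALS-Dilithium modulus (n = 256)

def poly_add(a, b, q):
    """Coefficient-wise addition mod q."""
    return [(x + y) % q for x, y in zip(a, b)]

def poly_mul_negacyclic(a, b, q):
    """Schoolbook product in Z_q[x]/(x^n + 1): x^n wraps around as -1."""
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length")
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] = (c[k] + ai * bj) % q
            else:
                c[k - n] = (c[k - n] - ai * bj) % q  # wrap with sign flip
    return c

# x * x^3 = x^4 = -1 in Z_q[x]/(x^4 + 1)
print(poly_mul_negacyclic([0, 1, 0, 0], [0, 0, 0, 1], KYBER_Q))  # [3328, 0, 0, 0]
```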

Seoul National University, Korea

**Abstract:** This paper surveys notable recent research on design and technology co-optimization (DTCO) with multi-bit flip-flop cells from a design automation perspective. Multi-bit flip-flop (MBFF) cells are widely used in low-power design, but they have critical limitations that must be overcome. Because individual flip-flops are grouped and internal clock inverters are shared, MBFF cells waste layout space and reduce flexibility in optimizing both interconnect and timing. Specifically, the DTCO research surveyed in this paper covers (1) exploiting the diverse structures of MBFF cells to optimize interconnect, (2) using the empty space in MBFF cells to optimize timing, and (3) debanking MBFF cell instances to trade off power against timing.

#### A Novel Multi-Error-Lock-Trace (MELT) Test Structure for SET/SEU Characterization of Radiation-Hardened-by-Design Cells ...... 37

Rouli Fang<sup>1</sup>, Kwen-Siong Chong<sup>2</sup>, Kyaw Zwa Lwin Ne<sup>2</sup>, Wei Shu<sup>2</sup>, Sun Yang Tay<sup>1</sup>, Joseph Sylvester Chang<sup>2</sup>

<sup>1</sup>Nanyang Technological University, Singapore; <sup>2</sup>Zero-Error Systems Pte Ltd, Singapore

**Abstract:** The design flow for a radiation-hardened-by-design (RHBD) cell library includes verifying the soft error rate (SER) of its logic cells under (heavy-ion) irradiation. Unlike a standard (non-RHBD) cell library, an RHBD library needs test structures to ascertain its radiation hardness. We propose a novel test structure design strategy built on a Multi-Error-Lock-Trace mechanism, which can temporarily lock a single-event-transient or single-event-upset error into registers and then trace it back to the location where it occurred. We implement this methodology with three test structures to verify 15 RHBD cells. The strategy yields substantial savings: compared with the reported approach of assigning individual I/O pads to each device under test (DUT), our method reduces the number of I/O pads by 80%, translating to a >95% reduction in test die area. Compared with the other reported approach of multiplexing I/O pads for each DUT, our method reduces the number of test runs at each linear-energy-transfer level by 80%.
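The lock-then-trace idea can be illustrated with a behavioral model: each DUT has a sticky error flag that is set on the first mismatch and records when it happened, so the error location can be read out afterwards over few pads. The class `ErrorLockTrace` and its methods are hypothetical; this is not the authors' circuit.

```python
class ErrorLockTrace:
    """Hypothetical behavioral model: one sticky error flag per DUT, plus the cycle at
    which the first error was latched, so error locations can be read back later."""

    def __init__(self, n_duts: int):
        self.locked = [False] * n_duts
        self.first_cycle = [None] * n_duts

    def clock(self, cycle: int, observed, expected) -> None:
        for dut, (obs, exp) in enumerate(zip(observed, expected)):
            if obs != exp and not self.locked[dut]:
                self.locked[dut] = True        # lock: flag stays set once an error is seen
                self.first_cycle[dut] = cycle

    def trace(self):
        """Return (dut_index, first_error_cycle) for every DUT that latched an error."""
        return [(d, self.first_cycle[d]) for d, hit in enumerate(self.locked) if hit]

melt = ErrorLockTrace(n_duts=4)
melt.clock(0, [0, 1, 0, 1], [0, 1, 0, 1])  # no upsets
melt.clock(1, [0, 0, 0, 1], [0, 1, 0, 1])  # upset on DUT 1
melt.clock(2, [0, 0, 0, 0], [0, 1, 0, 1])  # DUT 1 still wrong, new upset on DUT 3
print(melt.trace())                        # [(1, 1), (3, 2)]
```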

Maxwell Phillips, Ahmed Ammar, Firas Hassan

Ohio Northern University, United States

**Abstract:** Binary decoders are ubiquitous in digital computing circuits, particularly for memory addressing, multiplexing, and demultiplexing. A common construction is the coincident row-column structure, in which two smaller decoders intersect over an array of AND gates to generate the output of the decoder as a whole. While this structure works well for the input sizes typical of modern computers (under 128 bits), it leaves room for improvement at high precisions such as 1024 bits and beyond. This work proposes three multi-level architectures for binary decoders suited to high-precision inputs, generalizing the existing row-column architecture. We compare these structures with the row-column method, a tree-based structure, and other single-level decoder constructions, and analyze complexity (cost in lookup tables or transistors) as a function of input precision. Finally, we recommend decoder architectures for different input precisions and implementation technologies (FPGA or ASIC).
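The coincident row-column structure described above can be modeled in a few lines: split the input into high (row) and low (column) halves, decode each with a smaller decoder, and AND every row/column pair. This is a functional sketch of the baseline only, not the proposed multi-level architectures; the function names are illustrative.

```python
def one_hot(value: int, width_bits: int) -> list[int]:
    """Single-level decoder: 2**width_bits outputs, exactly one high."""
    return [int(i == value) for i in range(1 << width_bits)]

def row_column_decode(value: int, n_bits: int) -> list[int]:
    """Coincident row-column decoder built from two smaller decoders and an AND array."""
    col_bits = n_bits // 2
    row_bits = n_bits - col_bits
    rows = one_hot(value >> col_bits, row_bits)
    cols = one_hot(value & ((1 << col_bits) - 1), col_bits)
    # Output index r * 2**col_bits + c is the AND of row line r and column line c.
    return [r & c for r in rows for c in cols]

print(row_column_decode(6, 4).index(1))  # 6
```

The sub-decoders have only 2<sup>n/2</sup> outputs each, but the AND array still has 2<sup>n</sup> gates, which is why the cost grows sharply at high precisions.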

## Session A1L-C: Circuits & Systems for Communications 1

**Chair:** Negar Reiskarimian, *Massachusetts Institute of Technology* **Co-Chair:** Sungho Kim, *The University of Rhode Island* **Time:** Monday, August 12, 2024, 8:00 - 9:30 **Location:** Room 3

#### Capacitor Selection and Sizing in Linear and Exponential Integrated Charge Pumps ...... 47

Masoud Askariraad, Stefano Gregori

University of Guelph, Canada

**Abstract:** This paper presents a method for optimizing the sizes of the flying capacitors when different types of devices are used in the same integrated charge pump. Using multiple device types, rather than a single type rated for the capacitor with the highest working voltage, saves area. Analytic models of linear and exponential charge pumps are used to estimate the improvements and reveal design trade-offs. Simulations including parasitic effects, together with a case study, show which circuit performs better, informing the designer's choices.
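The area argument can be illustrated with a toy selection procedure: estimate each flying capacitor's working voltage, then pick the densest device rated for it. The sketch assumes an idealized no-load model (about k·V<sub>in</sub> on the k-th capacitor of a linear pump, 2<sup>k-1</sup>·V<sub>in</sub> for an exponential pump) and a hypothetical device menu; it is not the paper's analytic model.

```python
# Hypothetical capacitor menu: (name, max working voltage in V, density in fF/um^2).
# Lower-voltage devices are assumed denser, which is what makes mixing types pay off.
DEVICE_MENU = [("cap_1v8", 1.8, 8.0), ("cap_3v3", 3.3, 4.0), ("cap_5v", 5.0, 2.0)]

def stage_voltages(vin: float, n_stages: int, topology: str) -> list[float]:
    """Idealized (assumed) working voltage of each flying capacitor."""
    if topology == "linear":
        return [k * vin for k in range(1, n_stages + 1)]
    if topology == "exponential":
        return [2 ** (k - 1) * vin for k in range(1, n_stages + 1)]
    raise ValueError(f"unknown topology: {topology}")

def select_devices(voltages, cap_f: float):
    """Per capacitor, pick the densest device rated for its voltage; return the
    chosen names and the total area in um^2."""
    names, area = [], 0.0
    for v in voltages:
        rated = [d for d in DEVICE_MENU if d[1] >= v]
        if not rated:
            raise ValueError(f"no device rated for {v} V")
        name, _, density = max(rated, key=lambda d: d[2])
        names.append(name)
        area += cap_f * 1e15 / density  # fF / (fF/um^2) = um^2
    return names, area

for topo in ("linear", "exponential"):
    names, area = select_devices(stage_voltages(1.2, 3, topo), 10e-12)
    print(topo, names, f"{area:.0f} um^2")
```

With this toy menu, three 10 pF capacitors built only from the 5 V device would need 15000 µm², versus 8750 µm² when types are mixed.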

Saeed Zeinolabedinzadeh

Arizona State University, United States

**Abstract:** This paper presents a wideband, fully integrated 28 GHz transceiver in 0.13 µm SiGe BiCMOS technology. Integrating a phase shifter, power amplifier, low-noise amplifier, and differential mixer, the transceiver is well suited to transmit-receive distributed beamforming. The transmitter has a 5 GHz 3 dB bandwidth and 22.5 dB of gain. With 360° phase rotation over 5 GHz and a phase step below 1°, the transceiver can reduce the impact of phase-shift error in beamforming to near zero. At 28 GHz, the transmitter P1dB and Psat output powers are 6.13 dBm and 10.2 dBm, respectively. The receiver, with a differential IF output and 2 GHz bandwidth, has a conversion gain of 21 dB at 100 MHz with 5.3 dBm LO power and -34.5 dBm RF power.
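Why a sub-1° phase step makes phase error negligible can be checked numerically: with a 1° step the quantization error is at most 0.5° per element, and the coherent-combining loss of an array is 20·log10(|Σ e<sup>jε<sub>k</sub></sup>|/N). This is a minimal sketch assuming ideal amplitudes and only phase errors.

```python
import cmath
import math

def array_gain_loss_db(phase_errors_deg) -> float:
    """Coherent-combining loss of an N-element array with residual phase errors:
    20*log10(|sum exp(j*err)| / N). Zero errors give 0 dB."""
    n = len(phase_errors_deg)
    total = sum(cmath.exp(1j * math.radians(e)) for e in phase_errors_deg)
    return 20 * math.log10(abs(total) / n)

# Worst-case 0.5 deg quantization error with alternating sign across 4 elements
print(f"{array_gain_loss_db([0.5, -0.5, 0.5, -0.5]):.5f} dB")  # -0.00033 dB
```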

Brookhaven National Laboratory, United States

**Abstract:** This paper describes a power-efficient long-range on-chip data link for wafer-scale monolithic active pixel sensors (MAPS). Inter-symbol interference (ISI) due to limited channel bandwidth is reduced by a multi-bit transmitter that supports line coding and pulse shaping. Power consumption is minimized by a DC-coupled link design with current reuse between transmitter and receiver. Comparator offset is digitally calibrated by a custom state machine to enable ultra-low signal swings. Circuit-level simulations in a 65 nm CMOS process show a total power consumption of 596 µW for a 160 Mb/s link operating across a 10 cm on-chip channel, corresponding to a link figure of merit (FoM) of 37.3 fJ/bit/mm.
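The link FoM follows from the three numbers in the abstract: energy per bit (power divided by data rate) normalized to channel length. A one-function check:

```python
def link_fom_fj_per_bit_mm(power_w: float, rate_bps: float, length_mm: float) -> float:
    """Energy per bit normalized to channel length: P / (rate * length), in fJ/bit/mm."""
    return power_w / rate_bps / length_mm * 1e15

print(f"{link_fom_fj_per_bit_mm(596e-6, 160e6, 100):.2f} fJ/bit/mm")  # 37.25
```

Reproducing the reported 37.3 fJ/bit/mm this way confirms that the FoM normalizes by the full 10 cm (100 mm) channel.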

Southern Methodist University, United States

**Abstract:** This paper presents a novel CAN bus transceiver. An inherent auxiliary data channel for data authentication is embedded in the transmitted primary data via phase modulation. Enhanced rail converters provide single-rail to dual-rail data conversion, and vice versa, for data transmission. The rail converters not only preserve the phase information of the transmitted data across PVT corners but also significantly suppress phase errors caused by mismatch between the dual-rail signals, making the phase information reliable for authentication.

Ramin Javadi, Tejasvi Anand

Oregon State University, United States

**Abstract:** This paper presents a new encoding scheme for PAM-4 signaling that reduces the inter-symbol interference (ISI) caused by bandwidth-limited channels. The proposed encoding converts any consecutive identical symbols (CIS) in the input data into an alternative transition, which mitigates ISI and widens the eye opening at the receiver compared with unencoded PAM-4. The encoding is validated through transistor-level simulations in 16 nm FinFET technology at 56 Gb/s, using two different PRQS-15 data patterns over two channels (6 dB and 9 dB channel loss). Simulations show more than a 2x improvement in both vertical and horizontal eye opening with the proposed encoding.

## Session A1L-D: Millimeter Wave Circuits

**Chair:** Najme Ebrahimi, *Northeastern University* **Co-Chair:** Junning Jiang, *Nvidia* **Time:** Monday, August 12, 2024, 8:00 - 9:30 **Location:** Room 4


**Abstract:** Object detection and classification have attracted great interest in recent years. They help enhance efficiency, security, and safety in a broad range of applications, including traffic management, surveillance, autonomous driving, and pedestrian safety. Camera-based detection is common but performs poorly in adverse weather. To address this problem, this paper uses millimeter-wave (mmWave) radar for multi-class car detection and classification. The hardware implementation uses a Texas Instruments IWR1843BOOST radar, and the system operates on 3D point cloud data from the mmWave radar. Three car classes are considered: SUV, sedan, and hatchback. For classification, two deep neural network models are applied: a convolutional neural network (CNN) and a residual network (ResNet). After both models were trained on the dataset, ResNet achieved the higher accuracy, 80%.
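What distinguishes ResNet from a plain CNN is the skip connection: each block outputs relu(x + F(x)), so its layers learn a correction to the identity. The sketch uses dense layers in NumPy for brevity; real ResNets use convolutions and batch normalization, and the feature size and weights here are hypothetical.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Dense residual block y = relu(x + W2 @ relu(W1 @ x)): the skip connection adds
    the input back, so the two layers only model a correction F(x)."""
    return relu(x + w2 @ relu(w1 @ x))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)  # hypothetical feature vector, e.g. from a radar point cloud
w1 = rng.standard_normal((32, 16)) * 0.1
w2 = rng.standard_normal((16, 32)) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)  # (16,)
```

With zero weights the block reduces to relu(x), which is part of why very deep residual stacks remain trainable.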

#### A 24–28.5GHz Compact GaN/SiC MMIC Power Amplifier with 39% Peak PAE Supporting 5G 400MHz Down-Link Signal ...... 76

Mohammad Moussa, Ayssar Serhan, Pascal Reynier, Dominique Morche, Alexandre Giry

CEA-Leti, France

**Abstract:** This paper presents the design and experimental characterization of a compact two-stage monolithic microwave integrated circuit (MMIC) power amplifier (PA) implemented in a 150 nm GaN/SiC high-electron-mobility transistor (HEMT) technology. Under CW excitation, the PA achieves 31.2-31.9 dBm of saturated output power (P<sub>sat</sub>), 36.7-39.2% peak PAE, and 18.6-20.5 dB of small-signal gain across the n258 (24.25-27.5 GHz) band. Under 100/400 MHz 5G QPSK downlink signals at 26 GHz and without digital pre-distortion (DPD), the PA exhibits high linearity, achieving -24.5/-25.1 dBc ACLR and 24.8/24.5 dB EVM at an average P<sub>out</sub> of 24.2/20 dBm and a PAE of 22.1/11%. Its compact size (2 mm<sup>2</sup>) and high performance advance the state of the art (SOTA) for GaN MMIC PAs operating in the 5G n258 band.
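Power-added efficiency (PAE), the headline metric here, subtracts the input drive from the output power before dividing by DC power. The sketch uses a hypothetical operating point (30 dBm out, 20 dB gain, 2.5 W DC), not measured data from the paper.

```python
def dbm_to_w(p_dbm: float) -> float:
    """dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000

def pae(p_out_dbm: float, p_in_dbm: float, p_dc_w: float) -> float:
    """Power-added efficiency: (P_out - P_in) / P_DC."""
    return (dbm_to_w(p_out_dbm) - dbm_to_w(p_in_dbm)) / p_dc_w

# Hypothetical operating point: 30 dBm out, 10 dBm in (20 dB gain), 2.5 W of DC power
print(f"PAE = {100 * pae(30, 10, 2.5):.1f} %")  # 39.6 %
```

At around 20 dB of gain, PAE is within about 1% of drain efficiency, so the two-stage PA's gain matters for the quoted PAE.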

Mehdi Khoee, Amin Pourvali Kakhki, Ammar B. Kouki

École de technologie supérieure ÉTS, Canada

**Abstract:** In this paper, an embedded calibration structure for millimeter-wave (mm-wave) multi-probe reflectometers is presented. The structure is built from on-chip components, eliminating the need for the commonly used bulky and costly mm-wave on-wafer calibration equipment. It can operate concurrently with the reflectometer, enabling compensation for temperature and process variations. As a proof of concept, the reflectometer and calibration units are designed and simulated for E-band in a standard 65 nm CMOS process. The calibration unit, composed of three replicas of the reflectometer and on-chip passive components, occupies 50 µm × 500 µm (0.025 mm<sup>2</sup>). Simulations of complex impedance detection with this structure show maximum magnitude and phase errors of 1.8 dB and 4° at 60 GHz, and 0.9 dB and 2.5° at 90 GHz, respectively.
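A multi-probe reflectometer recovers the complex reflection coefficient Γ from power readings at several points along a line. In an idealized case (normalized incident wave, perfect square-law detectors, probes spaced so the round-trip phases are 0, 2π/3, 4π/3, which corresponds to λ/6 spacing), the constant and conjugate terms cancel and Γ = (1/3)·Σ P<sub>k</sub>e<sup>jθ<sub>k</sub></sup>. Real detectors and probes deviate from this, which is what on-chip calibration such as the paper's must correct; the sketch covers only the ideal model.

```python
import cmath
import math

PROBE_PHASES = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]  # assumed lambda/6 probe spacing

def probe_powers(gamma: complex):
    """Ideal, normalized probe readings P_k = |1 + Gamma * exp(-j*theta_k)|^2."""
    return [abs(1 + gamma * cmath.exp(-1j * t)) ** 2 for t in PROBE_PHASES]

def gamma_from_powers(powers) -> complex:
    """With three equally spaced phases, sum P_k*exp(j*theta_k) = 3*Gamma."""
    return sum(p * cmath.exp(1j * t) for p, t in zip(powers, PROBE_PHASES)) / 3

def impedance(gamma: complex, z0: float = 50.0) -> complex:
    """Load impedance from the reflection coefficient."""
    return z0 * (1 + gamma) / (1 - gamma)

g = gamma_from_powers(probe_powers(0.3 + 0.4j))
print(g, impedance(g))
```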

#### Development and Evaluation of ANN, RBNNs, and GRNNs based Small-Signal

Nazarbayev University, Kazakhstan

**Abstract:** This paper presents an extensive analysis of small-signal behavioral modeling of gallium nitride (GaN) high electron mobility transistors (HEMTs) up to 40 GHz using artificial neural networks (ANNs), radial basis neural networks (RBNNs), and generalized regression neural networks (GRNNs). The study aims to improve accuracy, generalization capability, and speed by tuning hyperparameters through a standard trial-and-error method. The paper also evaluates the models' ease of implementation and their fitting and error behavior under diverse bias conditions. The results show excellent agreement between measured and modelled behavior for the ANN-based models. RBNN-based models show the lowest accuracy, while GRNN-based models are less accurate than ANN-based models but more accurate than RBNN-based ones.
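In its standard (Specht) form, a GRNN is Gaussian-kernel (Nadaraya-Watson) regression: each prediction is a kernel-weighted average of training targets, with a single smoothing parameter σ. This is a generic sketch; the paper's inputs (e.g. frequency and bias) and its σ are not given, so the toy data below are hypothetical.

```python
import numpy as np

def _as_samples(a) -> np.ndarray:
    """Treat a 1-D array as one feature per sample (shape (n, 1))."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a

def grnn_predict(x_train, y_train, x_query, sigma: float) -> np.ndarray:
    """GRNN prediction: Gaussian-weighted average of training targets."""
    xt, xq = _as_samples(x_train), _as_samples(x_query)
    yt = np.asarray(y_train, dtype=float).reshape(len(xt), -1)   # (n_train, n_out)
    d2 = np.sum((xq[:, None, :] - xt[None, :, :]) ** 2, axis=-1)  # (n_query, n_train)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w @ yt) / np.sum(w, axis=1, keepdims=True)           # (n_query, n_out)

x = np.array([0.0, 1.0, 2.0])  # hypothetical normalized input
y = np.array([0.0, 10.0, 20.0])
print(grnn_predict(x, y, np.array([0.5, 1.0]), sigma=0.1))
```

σ is effectively the only hyperparameter, which makes GRNNs quick to fit but sensitive to that one choice.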

Maryam Fatima, Peter Haring Bolívar, Bhaskar Choubey

Universität Siegen, Germany

Abstract: Design of Active 300 GHz Mixer in SiGe 130 nm Technology

## Session A1L-E: Sensor Circuits & Systems 1

**Chair:** Ruolin Zhou, *University of Massachusetts Dartmouth* **Co-Chair:** Prafull Purohit, *Brookhaven National Laboratory* **Time:** Monday, August 12, 2024, 8:00 - 9:30 **Location:** Room 5


Leila Sharara<sup>1</sup>, Jonas Gillner<sup>2</sup>, Roy Taylor<sup>1</sup>, Klaus Thelen<sup>2</sup>, Lubna Alazzawi<sup>1</sup>, Mohammed Ismail<sup>1</sup>

<sup>1</sup>Wayne State University, United States; <sup>2</sup>Hochschule Ruhr West, Germany

**Abstract:** This research presents a solution for preventing in-vehicle pediatric heatstroke by integrating advanced sensors, connectivity, and intelligent algorithms. The proposed system continuously monitors key parameters, including occupancy status, temperature, and the child's respiratory rate (RR). A hybrid detection approach combining force-sensing resistor (FSR) sensors and motion sensors improves accuracy. To improve efficiency, the system uses Simplified Frequency Analysis (SFA), which processes data up to 80 times faster than conventional methods and suits real-time applications. The processed data is transmitted via a mobile application, enabling immediate caregiver notification and intervention. The system is engineered to withstand temperatures from -40 °C to +85 °C with a standby current of only 104 µA, and has the potential not only to save lives but also to become a standard safety feature.


Brandon Gresham, Josh Blowers, Steven Sandoval, Wei Tang

New Mexico State University, United States

**Abstract:** This paper presents a novel event-driven method for speech signal sensing, pre-processing, and compression based on Dynamic Predictive Sampling. The method converts the input analog waveform into a non-uniformly sampled event sequence in which each event, a turning point of the analog signal, carries both amplitude and timestamp data. The event sequence can be reconstructed without losing the morphology of the input analog signal. Because turning points are selected during analog-to-digital conversion, the circuit produces much less data. This paper studies the trade-off between the compression factor and speech recognition accuracy. Simulation results show that total data throughput can be reduced by 87% while preserving speech quality sufficient for speech recognition. An integrated circuit implementing Dynamic Predictive Sampling has been designed and simulated for the speech-sensing task. The method reduces computing overhead and data throughput, making it well suited to future low-power embedded voice recognition systems.
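The turning-point idea can be illustrated in the digital domain: keep only the samples where the slope changes sign (plus the endpoints) as (timestamp, amplitude) events, and reconstruct by interpolating between them. This is a simplified sketch operating on already-sampled data; the paper selects turning points in the analog domain during conversion, and flat plateaus are ignored here.

```python
import numpy as np

def turning_point_events(x: np.ndarray):
    """Keep the endpoints and local extrema (slope sign changes) as (timestamp, amplitude)."""
    slope = np.sign(np.diff(x))
    idx = [0] + [i for i in range(1, len(x) - 1) if slope[i - 1] * slope[i] < 0] + [len(x) - 1]
    idx = np.array(idx)
    return idx, x[idx]

def reconstruct(t_events, a_events, n: int) -> np.ndarray:
    """Piecewise-linear reconstruction on the original sample grid."""
    return np.interp(np.arange(n), t_events, a_events)

x = np.array([0, 1, 2, 3, 2, 1, 0, 1, 2], dtype=float)
t, a = turning_point_events(x)
print(t, a)                                       # [0 3 6 8] [0. 3. 0. 2.]
print(np.allclose(reconstruct(t, a, len(x)), x))  # True for piecewise-linear input
```

Nine samples collapse to four events here; for speech, the reconstruction is approximate, which is the compression-versus-accuracy trade-off the paper studies.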


Jonas Gillner<sup>1</sup>, Roy Taylor<sup>2</sup>, Klaus Thelen<sup>1</sup>

<sup>1</sup>Hochschule Ruhr West, Germany; <sup>2</sup>Wayne State University, United States

**Abstract:** This study presents a novel sensor system aimed at preventing child heatstroke in unattended vehicles. The system integrates a 24 GHz continuous-wave (CW) radar designed to detect the subtle breathing signals of infants. A custom 4-by-1 patch array antenna, optimized for this application, is embedded in a purpose-designed PCB. To improve efficiency and accuracy, the system incorporates a low-cost 8-bit microcontroller (MC) and a dedicated hardware filter, which together run a signal-processing algorithm tailored for fast, efficient computation of the child's respiratory rate (RR). The system was evaluated by monitoring a sleeping infant's respiratory patterns in real time, and the collected data was compared against established reference standards to assess the accuracy of the RR measurement, supporting the system's efficacy. The system has also been shown to comply with automotive industry standards, easing integration into existing vehicle platforms. This innovation has significant potential to proactively enhance automotive safety standards.
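One way a low-cost microcontroller can estimate RR from a filtered breathing signal is to count rising zero crossings over a known window. The sketch below is a generic method of that kind, not the paper's algorithm (nor the SFA method mentioned earlier), and it assumes the hardware filter has already removed noise outside the breathing band. The signal is synthetic.

```python
import numpy as np

def respiratory_rate_bpm(signal: np.ndarray, fs_hz: float) -> float:
    """Breaths per minute from rising zero crossings of a zero-mean, band-limited signal."""
    x = signal - np.mean(signal)
    rising = np.count_nonzero((x[:-1] < 0) & (x[1:] >= 0))
    duration_s = len(x) / fs_hz
    return 60.0 * rising / duration_s

fs = 20.0
t = np.arange(int(60 * fs)) / fs                 # 60 s record
breathing = np.sin(2 * np.pi * 0.5 * t + 0.3)    # synthetic 0.5 Hz (30 breaths/min)
print(respiratory_rate_bpm(breathing, fs))       # 30.0
```

Counting crossings needs only comparisons and a counter, which suits an 8-bit MCU better than an FFT.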

#### Design and Optimization of Robust Process Monitors ...... 108

Shiva Sharma<sup>1</sup>, Koushik De<sup>1</sup>, Bhartipudi Sahishnavi<sup>1</sup>, Khanh M. Le<sup>2</sup>, Zia Abbas<sup>1,2</sup>

<sup>1</sup>International Institute of Information Technology Hyderabad, India; <sup>2</sup>Analog Intelligent Design Inc., United States

**Abstract:** This paper presents an approach to efficient process monitor design that robustly tracks P- and N-device process variation while minimizing coupling to external factors such as die temperature variation and local supply voltage drop. The proposed method introduces a comprehensive set of process monitor (PMON) libraries tailored to the device types supported in a target technology node. Central to the method is a novel design automation routine that applies zero temperature coefficient (ZTC) biasing to a ring-oscillator-based process monitor. The paper demonstrates the efficacy and scalability of the method across multiple process nodes, and the generated PMON structures show 50x better process-tracking sensitivity than state-of-the-art solutions.
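ZTC biasing works because, as temperature rises, falling mobility lowers drain current while a falling threshold voltage raises it. At one gate voltage the two effects cancel. The sketch finds that point for a square-law model with hypothetical parameters (mobility ∝ T<sup>-1.5</sup>, V<sub>th</sub> dropping 1 mV/K); it is not the paper's automation routine.

```python
import math

# Hypothetical square-law device parameters (not from the paper)
K0, VTH0, ALPHA, T0, MOBILITY_EXP = 1e-3, 0.45, 1e-3, 300.0, -1.5

def k_and_vth(temp_k: float) -> tuple[float, float]:
    """Gain factor falls as T^-1.5 (mobility); threshold drops by ALPHA per kelvin."""
    return K0 * (temp_k / T0) ** MOBILITY_EXP, VTH0 - ALPHA * (temp_k - T0)

def drain_current(vgs: float, temp_k: float) -> float:
    k, vth = k_and_vth(temp_k)
    return k * max(vgs - vth, 0.0) ** 2

def ztc_vgs(t1: float, t2: float) -> float:
    """Bias where I_D(t1) == I_D(t2): solve sqrt(k1)*(V - Vt1) = sqrt(k2)*(V - Vt2)."""
    (k1, vt1), (k2, vt2) = k_and_vth(t1), k_and_vth(t2)
    s1, s2 = math.sqrt(k1), math.sqrt(k2)
    return (s1 * vt1 - s2 * vt2) / (s1 - s2)

v = ztc_vgs(300.0, 400.0)
print(f"ZTC bias ~ {v:.3f} V")  # about 0.865 V for these parameters
```

Biasing a ring oscillator near this point makes its frequency respond mainly to process (k, V<sub>th</sub>) rather than temperature.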

<sup>1</sup>Baylor University, United States; <sup>2</sup>University of Rhode Island, United States; <sup>3</sup>Naval Undersea Warfare Center, United States

**Abstract:** This study presents an approach to image recognition that combines an event camera (EC) with a stochastic computing (SC) neural network. Its novelty lies in fusing the EC, known for high temporal resolution and reduced motion blur, with SC, which performs arithmetic efficiently using simple logic operations. This pairing offers the potential for power-efficient neural network systems. The EC data, which captures pixel-level intensity changes asynchronously, drives the SC neural network. A sample-and-hold system and a checkerboard filter are incorporated to overcome data sparsity and enhance event locality. Experimental results show a 20% performance improvement when the event frequency is increased, suggesting the promise of this approach for more efficient image recognition systems.
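The "simple logic" behind SC arithmetic is easiest to see in multiplication: a value p ∈ [0, 1] is encoded as a random bitstream with P(bit = 1) = p, and a single AND gate on two independent streams yields a stream encoding the product. A minimal sketch with hypothetical values and stream length:

```python
import random

def to_stream(p: float, length: int, rng: random.Random) -> list[int]:
    """Unipolar stochastic encoding: each bit is 1 with probability p."""
    return [int(rng.random() < p) for _ in range(length)]

def sc_multiply(a: list[int], b: list[int]) -> float:
    """AND of two independent streams: P(a AND b) = P(a) * P(b); decode by counting ones."""
    return sum(x & y for x, y in zip(a, b)) / len(a)

rng = random.Random(42)
n = 100_000
est = sc_multiply(to_stream(0.5, n, rng), to_stream(0.6, n, rng))
print(f"{est:.3f}")  # close to 0.300; accuracy improves with stream length
```

The trade-off is latency: precision grows only with the square root of stream length, in exchange for near-trivial hardware.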

## Session A1L-F: Artificial Intelligence, Internet of Things & Systems 1

**Chair:** Jose de la Rosa, *IMSE Sevilla Spain* **Co-Chair:** Tian Xia, *University of Vermont* **Time:** Monday, August 12, 2024, 8:00 - 9:30 **Location:** Room 6


#### An AI-Based Approach for Accurate Fall Detection and Prediction using Wearable Sensors ...... 118

Muhammad Azeem Sarwar<sup>1</sup>, Brandon Chea<sup>2</sup>, Max Widjaja<sup>2</sup>, Wala Saadeh<sup>2</sup>

<sup>1</sup>Lahore University of Management Sciences, Pakistan; <sup>2</sup>Western Washington University, United States

**Abstract:** Falls are a paramount concern in elderly care and injury prevention, requiring accurate and timely intervention. This work combines convolutional long short-term memory (ConvLSTM) networks for real-time fall detection with exponential smoothing for early prediction. The proposed solution achieves an F1-score of 0.991 when tested on various public datasets comprising 81 subjects. Beyond real-time detection, the approach adds proactive prediction, forecasting falls before they occur and reducing response time to 1100-1250 ms with an accuracy of 98.3%. Integrating real-time detection with early prediction addresses a gap in traditional systems, improving patient safety and reducing the burden on healthcare.
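Simple exponential smoothing, the prediction component named above, keeps a running estimate s<sub>t</sub> = αx<sub>t</sub> + (1 − α)s<sub>t−1</sub> whose latest value serves as the one-step-ahead forecast. The sketch applies it to a hypothetical acceleration-magnitude series with a hypothetical threshold; the paper's signals, α, and decision rule are not stated.

```python
def exponential_smoothing(series, alpha: float) -> list[float]:
    """s_t = alpha*x_t + (1 - alpha)*s_{t-1}, with s_0 = x_0; the last value is the forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

accel_g = [1.0, 1.0, 1.1, 1.6, 2.4]  # hypothetical acceleration magnitudes in g
forecast = exponential_smoothing(accel_g, alpha=0.5)[-1]
FALL_THRESHOLD_G = 1.5               # hypothetical threshold, not from the paper
print(forecast, forecast > FALL_THRESHOLD_G)
```

A larger α reacts faster to sudden changes, which shortens warning latency but admits more false alarms.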

#### Deep Learning-Based Architecture for RF Frame Detection for CR Applications using Spectrograms ...... 122

A. Rojas<sup>1</sup>, G. Liñán-Cembrano<sup>2</sup>, G. Jovanovic Dolecek<sup>1</sup>, J.M. de la Rosa<sup>2</sup>

<sup>1</sup>National Institute of Astrophysics, Optics and Electronics INAOE, Mexico; <sup>2</sup>Instituto de Microelectrónica de Sevilla, IMSE-CNM (CSIC/Universidad de Sevilla), Spain

**Abstract:** This paper presents a deep learning-based architecture for radio frequency (RF) frame detection using a lightweight object detector originally intended for computer vision tasks. The solution was implemented using an ADALM-PLUTO software-defined radio (SDR) and a Raspberry Pi 4 Model B, an affordable single-board computer (SBC). First, a synthetic spectrogram dataset composed of multiple Wi-Fi and Bluetooth signals is used to perform transfer learning with the latest (v8) YOLO detection model. Training was performed on an NVIDIA RTX 3060 GPU. The trained neural network was then transferred to the Raspberry Pi connected to the SDR and benchmarked using over-the-air signals captured by the SDR in the 2.4 GHz band. A brief comparison with recent RF frame detection works shows that the approach has lower complexity while providing acceptable performance in terms of several object detection metrics and processing time.
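The detector's input is a spectrogram image: frames of the received signal, windowed and Fourier-transformed, with magnitudes stacked over time. A minimal NumPy sketch for a real-valued test tone (for complex IQ samples from an SDR, a two-sided FFT would be used instead); the FFT size, hop, and sample rate are hypothetical, not the paper's settings.

```python
import numpy as np

def spectrogram_db(x: np.ndarray, nfft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram in dB from a Hann-windowed short-time FFT.
    Rows are frequency bins (0..nfft/2), columns are time frames."""
    window = np.hanning(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    frames = np.stack([x[i * hop:i * hop + nfft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20 * np.log10(mag + 1e-12)  # small floor avoids log(0)

fs = 1024.0
t = np.arange(4096) / fs
tone = np.cos(2 * np.pi * 128.0 * t)    # 128 Hz -> bin 128 * 256 / 1024 = 32
spec = spectrogram_db(tone)
print(spec.shape, np.argmax(spec, axis=0)[:4])  # (129, 31) [32 32 32 32]
```

A burst-like Wi-Fi or Bluetooth frame appears in such an image as a bounded rectangle in time and frequency, which is what lets a vision detector like YOLO localize it.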

**Abstract:** Recent developments in neuromorphic computing focus on building integrated circuits and simulating analog circuits in SPICE. However, SPICE falls short in handling the microcontroller-based algorithms and circuits that are essential for on-chip training in in-memory computing. This study introduces SPICE-compatible designs in the Proteus circuit simulator, incorporating a memristor and a neuron model with a sigmoid activation function. We developed a neuromorphic accelerator with 30 memristors and 4 neurons and demonstrated on-chip training and inference with an Arduino microcontroller for binary classification of 'O' and 'X'. The research presents an efficient algorithm for memristor training and synaptic weight adjustment. Peripheral circuits manage the memristors during training and inference, and a method for feeding test data to evaluate network accuracy is introduced. The hardware-software co-design achieves 97% accuracy with 5% noise and 87% accuracy with 20% noise in the input images, highlighting its potential for neuromorphic computing advancements.

University of Virginia, United States

**Abstract:** Researchers are increasingly turning to spiking neurons to improve the energy efficiency of edge machine learning (ML) models. Spiking neural encoding has evolved from traditional methods such as Integrate-and-Fire and Time-to-First-Spike to techniques such as Delta and Sigma-Delta (SD) modulation, which enable sparser, more energy-efficient feature representations. In this work, we introduce the Sigma-Delta-Sigma (SDS) neuron, a noise-invariant spiking neural encoding technique. Our objective with SDS encoding is to develop noise-resilient spiking neural network models that can be trained on ideal features yet still extract features from low-quality data during inference. To assess the robustness of the technique, we implement an ensemble model that shows a 6.2x improvement in robustness when the SNR of the incoming signal is 1 dB. We also implement a Liquid State Machine (LSM) that shows a 3.87% improvement in predictive accuracy over the baseline model when input features degrade from 55 dB to 1 dB SNR.
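Delta encoding, one of the baselines the abstract mentions, emits a +1 or −1 spike only when the signal has moved a full threshold away from the last reconstructed level, so slowly varying inputs produce few spikes. This is a generic sketch of delta encoding, not the proposed SDS neuron, whose internals the abstract does not give; the threshold and test signal are hypothetical.

```python
import numpy as np

def delta_encode(x: np.ndarray, threshold: float) -> list[int]:
    """Emit +1/-1 when the signal moves a full threshold above/below the tracked
    level, else 0. At most one spike per step, so very fast inputs can lag."""
    spikes, level = [], x[0]
    for sample in x:
        s = 0
        if sample - level >= threshold:
            s = 1
        elif level - sample >= threshold:
            s = -1
        level += s * threshold
        spikes.append(s)
    return spikes

def delta_decode(spikes, x0: float, threshold: float) -> np.ndarray:
    """Rebuild the tracked level by accumulating spikes."""
    return x0 + threshold * np.cumsum(spikes)

x = np.sin(2 * np.pi * np.linspace(0, 1, 1000))
spikes = delta_encode(x, 0.05)
err = np.max(np.abs(delta_decode(spikes, x[0], 0.05) - x))
print(np.count_nonzero(spikes), err < 0.05)
```

As long as the per-step change stays below the threshold, the reconstruction error stays below one threshold, while most time steps carry no spike.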

#### Throughput Optimization for Time-Domain Neuromorphic Computing ...... 136

Karsten Bergthold, Hagar Hendy, Cory Merkel, Tejasvi Das Rochester Institute of Technology, United States

**Abstract:** We present a low-power, low-area delay cell topology for time-domain (TD) neuromorphic computing. Each cell computes partial dot-product operations using current summation, and the result is accumulated by chaining delay cells together. An alternative clocking scheme for the delay chains is proposed that requires no additional area and can nearly double the throughput. In addition, TD wave-pipelining is proposed to further improve throughput. These techniques were applied to a 64-input delay chain and to the design of a 3-input TD XOR gate. With the delay chain, a speedup of up to 11.54x was observed. The proposed wave-pipelining provides larger speedups for longer delay chains, such as those used in neural networks.

## Session A2L-A: Other Analog, Mixed-Signal & High-Frequency Circuits & Systems

Chair: Negar Reiskarimian, *Massachusetts Institute of Technology* Co-Chair: Najme Ebrahimi, *Northeastern University* Time: Monday, August 12, 2024, 11:20 - 12:32 Location: Room 1


North Carolina State University, United States

**Abstract:** Physical layer security is gaining significant prominence, offering a robust alternative to encryption-based methods for emerging 5G/6G wireless networks. It has the potential not only to supplement but even to supplant encryption, thus enabling secure, high-speed, and low-latency wireless communication. Physical layer security leverages the inherent features of the wireless channel, such as noise, interference, fading, spatial diversity, and other physical channel conditions, to guarantee successful data decoding for authorized users while thwarting eavesdropping attempts. In this work, we demonstrate spatio-temporal modulation techniques that achieve physical layer security at millimeter-wave frequencies. This invited article presents two related works: the first is a 4-element 71-76 GHz space-time transmitter array, and the second is a 3x2-element 57-64 GHz space-time reflector array. For both, we present and discuss the ability of these arrays to securely establish wireless links to intended receivers in space while garbling the data constellation elsewhere.

## Invited Paper: Dual-Keying Hybrid Modulation for Physical Layer Security in 5G/6G Scalable Array ...... 146

Haoling Li, Najme Ebrahimi

Northeastern University, United States

Abstract: This paper introduces a novel security method for conventional time-modulated arrays, achieving security through time modulation of the antenna elements, which creates amplitude and phase distortions at the side lobes. Previous state-of-the-art approaches have overlooked hardware non-idealities such as switch leakage, which notably impacts harmonic power and diminishes security. To address this, the paper proposes a new dual-keying hybrid modulation technique that merges direct 1-bit amplitude modulation with time modulation. This approach enhances physical layer security by using the switch leakage of the off elements to evaluate the harmonic power and by embedding the amplitude modulation between two QPSK constellations to distort the overall data at the sidelobes.

## Invited Paper: Comparison and Analysis of On-Chip Broadband Integrated Balun ...... 152

Xu Zhu<sup>1</sup>, Hanzhong Xu<sup>1</sup>, Tian Xia<sup>2</sup>, Xudong Wang<sup>1</sup>

<sup>1</sup>Nankai University, China; <sup>2</sup>University of Vermont, United States

Abstract: This paper conducts a comprehensive comparison and analysis of popular on-chip passive broadband baluns, including the transformer voltage/current balun and the Marchand balun, emphasizing their topologies, performance, circuit models, and bandwidth limitations. Subsequently, a novel ultra-wideband on-chip balun is introduced, and its broadband performance is demonstrated through theoretical analysis, simulation, and measurement. The measurement results validate the proposed topology's superior bandwidth and excellent performance in terms of amplitude and phase imbalances.


Running Guo<sup>1</sup>, Stefan Pechmann<sup>1</sup>, Amelie Hagelauer<sup>1</sup>,

<sup>1</sup>Technical University of Munich, Germany; <sup>2</sup>Fraunhofer EMFT, Germany

Abstract: In this paper, 2T2R RRAM devices are used as one memory unit to provide 9 different states, avoiding the narrow margin between neighbouring states of the conventional 1T1R RRAM device. Parasitic capacitors and resistors are extracted from the layout and included in the simulation of a 16 × 16 RRAM array. A corresponding reading circuit based on the StrongARM latch is implemented and analysed. It uses various reference resistors to uniquely encode the 9 states into 6 read-out bits within 48 ns. By regulating the references, the circuit tolerates device and technology variations. The power consumption for one complete reading process is 6.7 µW on average.

## Session A2L-B: System Security

Chair: Magdy Bayoumi, *University of Louisiana* Time: Monday, August 12, 2024, 11:20 - 12:32 Location: Room 2

#### Jitter-Based True Random Number Generator with Dynamic Selection Bit Reconfiguration ...... 162

Prosen Kirtonia, Shelby Williams, Magdy Bayoumi University of Louisiana at Lafayette, United States

Abstract: A compact, lightweight jitter-based True Random Number Generator (TRNG) is proposed in this paper. The key idea behind this design is to use a combination of high-entropy noise sources to select those same noise sources. This is achieved through cross-coupled connections and dynamic selection bit reconfiguration, reducing hardware consumption without compromising the quality of the generated bits. In cryptography and security, a TRNG plays a vital role in generating unique keys for encryption, thereby ensuring the integrity and protection of the system. To introduce jitter as a noise source, the design uses various combinations of odd numbers of inverters to create Ring Oscillators (ROs). For dynamic noise source selection, the design employs multiplexers and a single XNOR logic gate. Finally, the area, speed, power, and energy of the proposed TRNG are measured and calculated using Synopsys Design Compiler (SDC) for the 14 nm, 32 nm, and 45 nm technology nodes. The randomness of this design is tested and validated using the NIST SP800-22 tests. All 16 statistical tests for randomness outlined in the NIST test suite pass without requiring any post-processing circuitry.
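
As an illustration of the kind of check the NIST SP800-22 suite performs, its frequency (monobit) test can be sketched in a few lines. This covers only one test of the suite, not the full battery the authors ran; the function names and the commonly used 0.01 significance level are assumptions of this sketch.

```python
import math


def monobit_p_value(bits):
    """Frequency (monobit) test from NIST SP 800-22.

    Maps bits to +/-1, sums them into S_n, and returns the p-value
    erfc(|S_n| / sqrt(2n)); a biased generator drives |S_n| up and p down.
    """
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def passes_monobit(bits, alpha=0.01):
    """The sequence is judged random (for this test only) if p >= alpha."""
    return monobit_p_value(bits) >= alpha


if __name__ == "__main__":
    sample = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
    print(round(monobit_p_value(sample), 6))  # about 0.527089
```

A sequence such as 100 consecutive ones fails this test immediately, while a balanced sequence passes it; passing the monobit test alone says nothing about patterns that the other SP800-22 tests target.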

## Bistable Physically Unclonable Function with Dynamic Threshold Voltage ...... 167

Sesibhushana Rao Bommana<sup>1</sup>, Sreehari Veeramachaneni<sup>2</sup>, Srinivas MB<sup>1</sup> <sup>1</sup>Birla Institute of Technology and Science, Pilani, India; <sup>2</sup>Gokaraju Rangaraju Institute of Engineering and Technology, India

Abstract: The Physically Unclonable Function (PUF) is a crucial security module within System on Chip (SoC) architecture. Essentially, a PUF serves as the unique identifier of the SoC chip, leveraging CMOS fabrication variations. This study introduces a novel approach called the Dynamic Threshold Voltage (Vt) bistable PUF (DVT PUF), offering superior randomness compared to conventional bistable PUFs such as the SRAM PUF. The innovation lies in replacing conventional bistable PUF gates with a dynamic-Vt Schmitt gate, such as a Schmitt NAND/NOR, thereby enabling an advanced Challenge-Response Pair (CRP) scheme. Unlike traditional bistable PUFs that use reset (RST) as the challenge to trigger metastability, the proposed method adjusts the Schmitt gate's hysteresis based on input challenges from a host, augmenting randomness beyond CMOS fabrication variations. This enhancement results in a 33% improvement in the Hamming Weight (HW) of the PUF compared to traditional bistable PUFs.

Sara Alahmadi<sup>1</sup>, Kasem Khalil<sup>2</sup>, Magdy Bayoumi<sup>1</sup>

<sup>1</sup>University of Louisiana at Lafayette, United States; <sup>2</sup>University of Mississippi, United States

Abstract: Physically Unclonable Functions (PUFs) have been promoted as a lightweight security solution for IoT devices. They have the potential to provide authentication without the need to store a secret key or perform computational cryptography. However, PUFs are vulnerable to modeling attacks, in which an adversary clones the PUF using machine learning. Research efforts focus on improving PUF designs to enhance their resilience against these attacks. In this work, we propose an innovative method to enhance the security of PUFs against modeling attacks. Our approach encodes PUF responses into constant-weight vectors using the enumerative method and then obfuscates these responses through XOR operations. Transmitting random parts of the encoded responses adds further complexity, making it significantly more challenging for adversaries to model the PUF using machine learning techniques. The experimental results demonstrate the efficacy of the proposed method, with modeling attacks reaching at most 56% accuracy.
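
Enumerative coding of constant-weight words can be illustrated with the standard lexicographic ranking of n-bit words that contain exactly w ones. This sketch covers only the encode/decode step; the XOR obfuscation and partial transmission described above are not shown, and the word length and weight in the example are arbitrary assumptions, not the paper's parameters. Function names are hypothetical.

```python
from math import comb


def cw_encode(index, n, w):
    """Map index in [0, C(n, w)) to an n-bit list with exactly w ones (lexicographic)."""
    if not 0 <= index < comb(n, w):
        raise ValueError("index out of range")
    word = []
    for pos in range(n):
        remaining = n - pos - 1
        # Count of valid words that place a 0 here (all w ones fit in the remaining bits).
        zeros_first = comb(remaining, w)
        if index < zeros_first:
            word.append(0)
        else:
            word.append(1)
            index -= zeros_first
            w -= 1
    return word


def cw_decode(word):
    """Inverse of cw_encode: recover the index of a constant-weight word."""
    n, w, index = len(word), sum(word), 0
    for pos, bit in enumerate(word):
        if bit:
            index += comb(n - pos - 1, w)  # skip every word that had a 0 at this position
            w -= 1
    return index


if __name__ == "__main__":
    print(cw_encode(0, 6, 3))  # [0, 0, 0, 1, 1, 1]
```

Every codeword has the same Hamming weight, which removes the weight of a response as a feature an attacker could exploit; the mapping is a bijection, so the verifier can decode exactly.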

#### Ultralightweight PUF-Based Authentication Protocol: Eliminating RNGs and Throttling for Enhanced Security ...... 178

Eric Clark, Majd Safi, Tarek A. Idriss Western Washington University, United States

**Abstract:** Physically Unclonable Functions (PUFs) are a promising technology that offers lightweight security for IoT devices. However, lightweight PUF-based solutions often throttle the device to prevent adversaries from collecting enough information to compromise the device's security. Other solutions use cryptographic primitives or True Random Number Generators (TRNGs) to enhance the device's security. However, implementing a TRNG on a small device can be impractical, and the device's environment can introduce external factors that affect the randomness of the generated numbers. This work presents a novel, ultralightweight PUF-based authentication protocol that offers unlimited authentication and requires no cryptographic primitives or RNGs. The protocol uses challenge transformation and stores secret masks to hide the PUF output. A thorough security analysis is performed by testing the protocol against machine-learning-based attacks. The results show that the protocol can attain high resilience against various threat models without using RNGs or throttling techniques.

## Session A2L-C: Wireless Wireline Communications Circuits & Systems

Chair: Sleiman Bou-sleiman, *Intel* Time: Monday, August 12, 2024, 11:20 - 12:32 Location: Room 3

## A Contingent Decision Equalizer using Analog-in-Time Processing for PAM-4 Wireline Receiver ...... 183

Ahmed Abdelaziz<sup>1,2</sup>, Kevin Du<sup>3</sup>, Mohamed Abouzeid<sup>4</sup>, Tawfiq Musah<sup>1</sup>

<sup>1</sup>The Ohio State University, United States; <sup>2</sup>Intel Corporation, United States;

<sup>3</sup>Advanced Micro Devices, United States; <sup>4</sup>Skyworks Solutions, United States

**Abstract:** This paper presents a new hybrid feedforward equalization technique that moves wireline receiver speed constraints from architectural limitations to circuit realization. This paves the way for continual data rate enhancement with improved circuit performance. The feedforward approach uses multi-stage mixed-mode signal processing to perform incremental equalization. To ensure reliable performance in advanced CMOS processes, time-domain analog signal processing is used. This choice also enables linear energy scaling with data rate and allows proportional power scaling with degree of equalization. A 3-tap realization of the receiver fabricated in 28 nm CMOS process achieved a data rate of 52 Gb/s with an efficiency of 1.0 pJ/b and BER below 1e-4 with moderate ISI.


Sydney Lang, Tawfiq Musah The Ohio State University, United States

**Abstract:** This paper presents a relative jitter measurement circuit for use in wireline authentication receivers. Design considerations and trade-offs are discussed for a 10 Gbps realization in a 28 nm CMOS process. The results show high-accuracy jitter measurement over different channel profiles, with an error of less than 1.25% of the bit period. Simulated power consumption indicates minimal overhead from adding the proposed jitter measurement circuit for authentication in a wireline receiver.

Hanieh Ashrafirad, Virgilio Valente Toronto Metropolitan University, Canada

**Abstract:** The advancement of mm-scale wireless biosensors is crucial to the continuous monitoring of key health metrics, which can support more efficient chronic disease monitoring and individualized treatment. CMOS potentiostats are highly advantageous for developing miniaturized, ultra-low-power biosensors. In this paper, we propose the design of a novel wireless CMOS potentiostat that combines a time-based, current-input SAR readout with a pulse-harmonic modulation wireless transmitter. The potentiostat readout is designed to accommodate input currents between 100 pA and 10 µA with a maximum energy consumption of less than 100 pJ/bit.

#### An Energy-Efficient Analog-to-Pulse Encoding for Short-Range High-Density Wireless Telemetry ...... 198

Mohammad R. Haider, Adam Farr, Showmik Singha, Syed K. Islam

University of Missouri, United States

Abstract: The prevalence of sensors and sensor networks has given rise to the Internet of Things (IoT), transforming modern lifestyles. However, resource-constrained IoT edge devices require inventive sensing, computing, and wireless telemetry strategies to support the massive number of devices in an IoT network. Traditional telemetry and data encoding schemes for short-range sensor networks are limited by network bandwidth, data processing capability, and power constraints imposed by battery energy density. To mitigate these challenges, an energy-efficient architecture is presented for analog orthogonal pulse (AOP) generation and AOP-based data encoding for high-density, spectrum-efficient wireless telemetry. Unlike conventional digital pulse-based encoding, higher-order analog orthogonal pulses are used for spectrum-efficient data encoding. The orthogonal pulse generator contains a reduced number of functional blocks for energy efficiency. The proposed encoding architecture is designed and simulated with the MATLAB-Simulink package and finally embedded into a microcontroller unit. Test results show the successful generation of analog orthogonal pulses and pulse-based signal encoding.

## Session A2L-D: Signal Processing for Wireless Communications

Chair: Jose de la Rosa, *IMSE Sevilla Spain* Co-Chair: Fred Harris, *San Diego State University San Diego* Time: Monday, August 12, 2024, 11:20 - 12:32 Location: Room 4

#### Novel Design of Depth-L Nyquist (N) Filters for Linear and Time Variant Channel Fractionally Spaced Equalizer ...... 203

Alfonso Fernandez-Vazquez<sup>1</sup>, Gordana Jovanovic Dolecek<sup>2</sup>

<sup>1</sup>National Institute of Astrophysics, Optics and Electronics INAOE, Mexico; <sup>2</sup>Instituto Politecnico Nacional, Mexico

**Abstract:** This paper addresses the design of FIR depth-L Nyquist (N) filters and their corresponding biorthogonal partners. The proposed approach is based on the equalization of comb filters. The resulting biorthogonal filters have larger stopband attenuation than the original comb filters. As an application of the proposed filters, we consider the design of fractionally spaced equalizers for linear and time-variant channels.

#### Exploring Digital Communication Concepts to Improve Audio Watermarking System ...... 207

Carlos J. Santin-Cruz, Gordana Jovanovic Dolecek

National Institute of Astrophysics, Optics and Electronics INAOE, Mexico

Abstract: This paper investigates the application of digital communication concepts, specifically spread spectrum and channel coding techniques, to an audio watermarking system that embeds messages in the time domain, with the objective of enhancing system performance. We focus on a spread spectrum-based watermarking system and conduct a thorough examination of the data payload, computational complexity, and imperceptibility of the system using various channel coding techniques. Our approach involves the implementation of three distinct channel coding techniques: extended Golay block code, convolutional code, and Low-Density Parity Check (LDPC) code. The results of our MATLAB simulations are presented in the concluding section, providing a comparative analysis of the benefits and drawbacks of each coding technique.
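
The abstract does not give the code parameters, so as a hedged illustration of the convolutional-coding option, here is a textbook rate-1/2, constraint-length-3 encoder with octal generators (7, 5). These parameters and the function name are assumptions chosen for familiarity, not necessarily those used in the paper.

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_length=3):
    """Rate-1/2 feedforward convolutional encoder (default: textbook (7, 5) octal code).

    The shift register keeps the newest bit in the least significant position;
    each generator selects the register taps that are XORed into one output bit.
    """
    mask = (1 << constraint_length) - 1
    state = 0
    out = []
    # Append K-1 zeros to flush the register back to the all-zero state.
    for b in list(bits) + [0] * (constraint_length - 1):
        state = ((state << 1) | b) & mask
        for g in generators:
            out.append(bin(state & g).count("1") % 2)  # parity of the tapped bits
    return out


if __name__ == "__main__":
    print(conv_encode([1, 0, 1, 1]))  # [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]
```

The output is twice as long as the flushed input, which is the payload cost the comparison in the paper weighs against error-correction strength; decoding (for example with the Viterbi algorithm) is not shown.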

<sup>1</sup>University of California, San Diego, United States; <sup>2</sup>Ramon Space, Israel

**Abstract:** The M-path polyphase analysis channelizer is a filter bank formed by a collection of coupled multirate digital signal processing algorithms. The multirate channelizer uses aliasing, caused by embedded resampling, to translate multiple center-frequency bands to baseband. The filter bank is formed by three core sub-processes: an input commutator, an M-path partition of an oversampled low-pass filter, and an M-point IFFT. Its simplest realization is the maximally decimated filter bank. It outputs M baseband time series from M equally spaced Nyquist zones separated by fs/M. Each time series has a bandwidth and sample rate also equal to fs/M. Modifications to the channelizer are many and include input-to-output resampling ratios of M-to-1, M/2-to-1, and 3M/4-to-1, with corresponding output sample rates of fs/M, 2fs/M, and (4/3)fs/M. The different resampling ratios require minor modifications to the channelizer structure. We present and compare implementations of channelizers with the three resampling ratios cited above and address the motivation and benefits of each option.
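
The three sub-processes of the maximally decimated (M-to-1) case can be sketched directly from their definitions: path p of the commutator receives sample x[mM - p], filters it with every M-th prototype tap starting at h[p], and an M-point inverse DFT across the paths produces the M channel outputs. This is a pure-Python sketch under stated assumptions: a Hamming-windowed-sinc prototype with 8 taps per path, a direct-sum inverse DFT (scaled by M) in place of an optimized IFFT, and hypothetical function names. The M/2-to-1 and 3M/4-to-1 variants are not covered.

```python
import cmath
import math


def lowpass_prototype(num_paths, taps_per_path=8):
    """Hamming-windowed sinc with cutoff fs/(2M) and unit DC gain (assumed design)."""
    length = num_paths * taps_per_path
    center = (length - 1) / 2
    taps = []
    for n in range(length):
        t = n - center
        ideal = 1.0 / num_paths if t == 0 else math.sin(math.pi * t / num_paths) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [c / dc_gain for c in taps]


def polyphase_channelizer(x, h, num_paths):
    """Maximally decimated M-path analysis channelizer (M-to-1 resampling).

    Returns num_paths lists; list k is the baseband time series of the channel
    centered at k*fs/M, at output sample rate fs/M.
    """
    M = num_paths
    # M-path partition of the prototype: path p holds h[p], h[p+M], h[p+2M], ...
    paths = [h[p::M] for p in range(M)]
    outputs = [[] for _ in range(M)]
    for m in range(len(x) // M):
        path_outputs = []
        for p in range(M):
            acc = 0j
            for r, coeff in enumerate(paths[p]):
                # Input commutator: path p is fed x[mM - p]; tap r uses the frame r steps back.
                idx = (m - r) * M - p
                if idx >= 0:
                    acc += coeff * x[idx]
            path_outputs.append(acc)
        # M-point inverse DFT across the paths (direct sum, scaled by M).
        for k in range(M):
            outputs[k].append(sum(path_outputs[p] * cmath.exp(2j * math.pi * k * p / M)
                                  for p in range(M)))
    return outputs
```

As a usage check, a complex tone centered on channel 3 of an 8-channel bank should appear (after the filter transient) with unit magnitude in channel 3 and only stopband leakage in the other channels, each output running at one eighth of the input rate.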

## Session A2L-E: Sensor Circuits & Systems 2

**Chair:** Ruolin Zhou, *University of Massachusetts Dartmouth* **Co-Chair:** Prafull Purohit, *Brookhaven National Laboratory* **Time:** Monday, August 12, 2024, 11:20 - 12:32 **Location:** Room 5

#### Providing Cadence Feedback in Real-Time to Guide Cardiovascular Workouts ...... 218

Levi Rash, Vladimir Prodanov

California Polytechnic State University, San Luis Obispo, United States

Abstract: Cardiovascular workouts offer numerous health benefits, yet beginners often find them challenging to start. Existing wearable technologies, although providing valuable feedback such as heart rate zones, often disrupt the workout flow and distract users because they require interaction with the wearable display. In response, we propose an alternative feedback mechanism: cadence, measured in steps per minute. This feedback mechanism uses multiplicative control to produce the correct cadence for the user's target heart rate (HR). A first-order system was used to model the relationship between HR and cadence. The prototype was implemented on an Arduino, using a force-sensitive resistor (FSR) to measure the user's cadence and a POLAR HR strap to measure the user's HR. The cadence is updated every 12 seconds to allow the user to sync to the cadence metronome provided by an attached buzzer. The proposed system has shown promising experimental results for both moderate-intensity and high-intensity workouts, skipping the transition zone between them. This allows the user to avoid awkward workout intensities between a fast walk and a jog.

Abstract: Current-mode analog front ends for reading biosensors have emerged as a prominent design approach in recent times. This method processes input current signals directly with current-mode blocks instead of converting them to voltage and using voltage-mode blocks. The primary advantages include reduced power consumption, minimized area usage, and increased dynamic range, particularly at low voltage levels. In current-mode design, the current conveyor plays a pivotal role. Therefore, this work focuses on the design and fabrication of a second-generation current conveyor (CCII) to serve as a current buffer in a low-LED photoplethysmography (PPG) or any biopotential readout analog front end with input currents ranging from 1.5 nA to 1000 nA. Our fabricated CCII consumes $1.79\,\mu$W, and the load dependency of $\beta$ is improved by 33% compared with the previous design. In addition, the X node follows the Y node voltage with 10 mV variation over the full input current range. The area of our CCII is $3400\,\mu\mathrm{m}^2$.

#### Graphite based Flexible Thermocouple Array for PCB Thermal Monitoring ...... 228

Kishan Kartha, Alex James

Digital University Kerala, India

Abstract: This paper describes the development and validation of an experimental, cost-effective, and flexible thermocouple crossbar array enabled with the help of deep learning. The thermocouple is constructed using graphite traces and cellulose paper, with individual graphite thermocouples exhibiting a thermopower of $4\,\mu$V/$^{\circ}$C. Integrating a deep learning-based regression model enables the crossbar array to achieve promising performance in localized temperature measurements, with an RMS temperature error close to 0.17. Additionally, the potential of this novel sensor array to streamline sensor circuitry is discussed alongside its application in PCB temperature monitoring and mapping. Results demonstrate its capability to accurately predict temperature variations among different PCB components. The focus on affordability, flexibility, and disposability positions these scalable crossbar arrays as potential solutions for real-time thermal mapping across various industries.

#### Water Current Velocity Measurements by a Magnetometer-Based Tilt ...... 233

Juan Montiel-Caminos, Nieves G. Hernandez-Gonzalez, Javier Sosa, Juan A. Montiel-Nelson

Universidad de Las Palmas de Gran Canaria, Spain

Abstract: A novel instrument based on the tilt-drag principle is developed for measuring the instantaneous velocity of the water current. Instead of obtaining the tilt with an accelerometer, a magnetometer is used. For underwater velocities ranging from 5 cm/s to 50 cm/s, the resolution of the instrument is 8 1/2 bits. The water velocity is computed at the edge, onboard a microcontroller with an Arm Cortex-M0+ core. The error is less than 5%, and the sensitivity is 0.12 cm/s/LSB.

## Session A2L-F: Artificial Intelligence, Internet of Things & Systems 2

Chair: Juan A. Montiel-Nelson, Institute for Applied Microelectronics **Co-Chair:** Sahil Shah, University of Maryland, College Park Time: Monday, August 12, 2024, 11:20 - 12:32 Location: Room 6


Arnab A. Purkayastha<sup>1</sup>, Shobhit Aggarwal<sup>2</sup>

<sup>1</sup>Western New England University, United States; <sup>2</sup>University of North Carolina at Charlotte, United States

Abstract: Predictive maintenance plays a crucial role across diverse domains, ensuring operational efficiency and reliability. However, the increasing need for securing maintenance processes, coupled with the vulnerability of these systems to various attacks, presents significant challenges. This paper explores the application of federated learning (FL) in predictive maintenance, focusing on its advantages in bandwidth efficiency, privacy preservation, reduced power consumption, lower latency, and faster training times. We examine how FL can enhance predictive maintenance systems while addressing security concerns and mitigating potential hardware and software attacks. By analyzing existing works, we aim to highlight the potential of FL in revolutionizing predictive maintenance practices and fostering efficient and robust maintenance strategies across various domains.

Woohong Byun, Jongseok Woo, Saibal Mukhopadhyay Georgia Institute of Technology, United States

Abstract: As large language models continue to advance, handling extended context lengths becomes crucial for improving natural language processing tasks. However, it also introduces significant computational challenges, particularly in managing KV cache memory. To address this, we present a novel Hessian-aware KV cache quantization method for LLMs. This method employs a weight-aware head-wise Hessian analysis and a three-tiered mixed-precision quantization strategy. It assigns high precision to magnitude-based outliers, intermediate precision to Hessian-based important elements, and low precision to the rest, achieving an optimal balance between memory compression and accuracy. Our extensive testing across various LLM configurations and datasets shows that this method achieves significant KV cache compression rates, ranging from 8.21× to 9.7×, while maintaining minimal deviation in accuracy compared to the FP16 baseline. This enhancement enables a remarkable 3.8× higher throughput (tokens/second) than the FP16 baseline, highlighting the practical utility of our approach in addressing computational challenges in LLMs.
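
The tiered idea can be illustrated with a simplified two-tier variant: the largest-magnitude entries are kept at full precision and the remainder is uniformly quantized. The Hessian-based intermediate tier, head-wise grouping, and the paper's bit widths are all omitted or assumed here; this is a generic outlier-aware quantization sketch with hypothetical function names, not the paper's method.

```python
def quantize_uniform(values, bits):
    """Asymmetric uniform quantization to 2**bits levels; returns dequantized values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant input: nothing to quantize
    scale = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / scale) * scale for v in values]


def mixed_precision_quantize(values, bits, num_outliers):
    """Keep the num_outliers largest-magnitude entries exact; quantize the rest."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    outliers = set(order[:num_outliers])
    rest_idx = [i for i in range(len(values)) if i not in outliers]
    rest_q = quantize_uniform([values[i] for i in rest_idx], bits) if rest_idx else []
    result = list(values)
    for i, q in zip(rest_idx, rest_q):
        result[i] = q
    return result
```

Removing the outliers before computing the quantization range is what lets a low bit width cover the remaining values with a small step; leaving them in would stretch the range and inflate the error for every other entry.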

Randy Essikpe, Jennifer Blain Christen

Arizona State University, United States

Abstract: Many remote powerlines lack sufficient wildfire surveillance to enable preventive or mitigation measures, resulting in massive destruction when wildfires reach powerlines. This project seeks to build a multi-sensor embedded system that monitors wildfire-related weather conditions to assess the risk and, in case of an outbreak, alert the appropriate fire management team via a wireless data transfer protocol. The system is intended to be useful at power stations where other safety features are incorporated to reduce the occurrence of fires. The embedded system is based on the Hot-Dry-Windy index, which monitors the fire weather conditions that directly affect the spread of wildfires.
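
The Hot-Dry-Windy index is commonly defined as the product of wind speed and vapor-pressure deficit (VPD). A minimal sketch of how a sensor node might compute it from a single temperature, relative-humidity, and wind-speed reading follows. The Magnus-type saturation vapor pressure formula (Bolton, 1980), the use of near-surface readings only, and the function names are assumptions of this sketch rather than details from the paper; any alert threshold would be a design choice left to the system builder.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Bolton (1980) Magnus-type approximation; temperature in degrees Celsius, result in hPa."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def hot_dry_windy_index(temp_c, rel_humidity_pct, wind_speed_ms):
    """HDW proxy = wind speed (m/s) x vapor-pressure deficit (hPa) from one reading."""
    es = saturation_vapor_pressure_hpa(temp_c)
    vpd = es * (1.0 - rel_humidity_pct / 100.0)  # deficit between saturation and actual vapor pressure
    return wind_speed_ms * vpd
```

Hotter, drier, and windier readings all push the index up, so a node only needs to compare one number against a chosen threshold before triggering a wireless alert.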


Michael Muller, Alexander Tyshka, Max Theisen, Darrin Hanna Oakland University, United States

Abstract: Large language models (LLMs) provide unique, scalable solutions and alternatives in many areas of the natural language space, such as language translation, text generation, and content summarization. However, they require significant computing power to achieve high token throughput, which is costly. This work describes the design space exploration (DSE) performed and the resulting optimal architecture. Based on the DSE, we demonstrate an adapted implementation of the TinyStories LLM that is quantized and implemented using programmable logic and software on a Zynq FPGA running Python on the ARM core. This TinyLLM retains the ability to produce comprehensible text, generating ~3.78 tokens/s, a 37-times speedup over a CPU-based implementation on the same hardware, while consuming just under 1.5 W of power.

## Session A3L-A: Industrial Session 1: MathWorks

**Chair:** Ramnarayan Krishnamurthy, *MathWorks* **Time:** Monday, August 12, 2024, 14:00 - 15:30 **Location:** Room 1

#### **MathWorks Presentation**

Ramnarayan Krishnamurthy

## Session A3L-B: Digital Circuits & Systems 2

**Chair:** Sungho Kim, *The University of Rhode Island* **Co-Chair:** Juan A. Montiel-Nelson, *Institute for Applied Microelectronics* **Time:** Monday, August 12, 2024, 14:00 - 15:30 **Location:** Room 2

## SYCL-Based Acceleration of Canny Edge Detector Algorithm using DPC++ ...... 258

Mahdi Hashemi, Mohammed A.S. Khalid, Roche Christopher University of Windsor, Canada

Abstract: Edge detection is a fundamental task in image processing that has proliferated into various fields. This paper presents a SYCL-based Data Parallel C++ (DPC++) implementation and evaluation of Canny Edge Detection (CED) on CPU-GPU and CPU-FPGA heterogeneous computing platforms. The CED accelerator was implemented and evaluated on two state-of-the-art Intel FPGAs, the Arria 10 GX-1150 and the Stratix 10 SX-2800, and on an Intel UHD P630 GPU, with an Intel Xeon Gold-6128 CPU as the host. Along with a mathematical optimization, design optimizations specific to parallel computing were leveraged for efficiency and speedup. The evaluation reports speedups of 11.6X and 11.7X for the Arria 10 and Stratix 10 FPGAs, respectively, relative to the CPU-only implementation. Both FPGAs also completed the algorithm more than 4 times faster than the GPU. Furthermore, the Arria 10 FPGA is 31 times more energy efficient than the GPU. With an execution time of 0.214 milliseconds to detect edges in a 256x256-pixel image, the recommended dimension for real-time image processing applications, our CPU-FPGA implementation is best suited for real-time applications.


Dyk Chung Nguyen, Yuriy V. Pershin

University of South Carolina, United States

Abstract: We present a fully parallel digital memcomputing solver implemented on a field-programmable gate array (FPGA) board. For this purpose, we have designed an FPGA code that solves the ordinary differential equations associated with digital memcomputing in parallel. A feature of the code is the use of only integer-type variables and integer constants to enhance optimization. Consequently, each integration step in our solver is executed in 96 ns. This method was utilized for difficult instances of the Boolean satisfiability (SAT) problem close to a phase transition, involving up to about 150 variables. Our results demonstrate that the parallel implementation reduces the scaling exponent by about 1 compared to a sequential C++ code on a standard computer. Additionally, compared to C++ code, we observed a time-to-solution advantage of about three orders of magnitude. Given the limitations of FPGA resources, the current implementation of digital memcomputing will be especially useful for solving compact but challenging problems.

#### Software-Hardware Codesign of Ray-Tracing Accelerator for Edge AR/VR with Viewpoint-Focused 3D Construction and Efficient Data Structure ...... 267

Shiyu Guo<sup>1</sup>, Sachin S. Sapatnekar<sup>2</sup>, Jie Gu<sup>1</sup>

<sup>1</sup>Northwestern University, United States; <sup>2</sup>University of Minnesota, United States

Abstract: Physics Based Ray-Tracing (PBRT) rendering generates synthesized images by simulating the real environment using spatial, reflection, refraction, and diffusion information from the scene to achieve photorealism. Although PBRT is widely used in offline rendering, its ray-tracing technique for realistic modeling of light transport faces major difficulties on resource-limited edge devices: the overwhelming computing workload and irregular memory access pattern restrict its real-time use to state-of-the-art GPUs. In this paper, we propose a hardware acceleration solution that incorporates novel techniques, including customized hardware data structure acceleration, scene background information clustering, 3D construction, and an adaptive mixed-precision computing scheme, leading to a low-cost ray-tracing rendering solution suitable for edge devices. Experimental results on an open-source dataset show a 6X reduction in computing workload, a 56X saving in background memory requirements, and 33% power saving compared with the baseline design, enabling hardware adaptation of the PBRT accelerator on a resource-limited edge device.

## Low-Latency Deterministic Multiplier for Stochastic Computing ...... 272

Anwar K. Hussein, N. Sertac Artan

New York Institute of Technology, United States

Abstract: The cost, power consumption, and availability of large-scale computing resources dampen progress in computing. Stochastic Computing (SC) aims to mitigate this issue with more efficient primitives for approximate computing. Yet high latency and low accuracy prevent the adoption of SC. SC accuracy can be improved by replacing Random Number Generators with generators of Low-Discrepancy (LD) sequences, such as Sobol. The large area requirement of these generators is mitigated with Finite-State Machine (FSM) based methods. FSM-based multipliers led to significant gains in accuracy and latency. Nevertheless, FSM-based methods are still impractical, as multiplication still takes up to $2^{2N}$ cycles. Another approach for improving latency is resolution splitting (RS). However, RS does not take advantage of the properties of the operands. In this paper, we propose to merge these two leading approaches for reduced latency. The speed advantage of the proposed approach is demonstrated with an image-filtering task. The proposed approach speeds up multiplication by up to 604X compared to conventional SC and by 2.35X compared to the state-of-the-art SC, at the cost of increased area.
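
The $2^{2N}$-cycle latency can be seen in a generic deterministic SC multiplier. In the unipolar format, an AND gate multiplies two bitstreams; with unary (thermometer) streams and clock division (one stream repeated, the other held for $2^N$ cycles per bit), the product is exact but takes $2^{2N}$ cycles. This is a textbook-style illustration with hypothetical function names, not the FSM-based or resolution-splitting multipliers discussed in the paper.

```python
def unary_stream(value, n_bits):
    """Unary (thermometer) bitstream of length 2**n_bits encoding value / 2**n_bits."""
    return [1 if i < value else 0 for i in range(1 << n_bits)]


def deterministic_sc_multiply(a, b, n_bits):
    """Exact unipolar SC product via clock division.

    Returns (ones, cycles); ones / cycles equals (a / 2**n_bits) * (b / 2**n_bits).
    """
    length = 1 << n_bits
    stream_a = unary_stream(a, n_bits)
    stream_b = unary_stream(b, n_bits)
    ones = 0
    cycles = 0
    for j in range(length):          # stream B advances once every 2**n_bits cycles
        for i in range(length):      # stream A repeats every 2**n_bits cycles
            ones += stream_a[i] & stream_b[j]  # the AND gate is the SC multiplier
            cycles += 1
    return ones, cycles


if __name__ == "__main__":
    print(deterministic_sc_multiply(5, 11, 4))  # (55, 256): 55/256 = (5/16) * (11/16)
```

Every bit of one operand meets every bit of the other exactly once, which guarantees exactness but forces the quadratic cycle count that latency-reduction schemes such as those in the paper aim to cut.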

#### CharLib: An Open Source Standard Cell Library Characterizer ...... 277

Marcus Mellor, James E. Stine

Oklahoma State University, United States

**Abstract:** With the current rise in popularity of open-source silicon design, there is a need for low-cost, high-quality SoC design tools. This paper presents an open-source, Python-based standard cell library characterizer compatible with combinational and sequential cells. The characterizer can be configured from a single YAML file, reducing the complexity of the characterization process for silicon designers and enabling repeatable characterization through easy integration with existing version control tools. The characterizer is evaluated using an open-source PDK, and its timing results are within 1 nanosecond mean absolute error of those from commercial characterization tools.

## Session A3L-C: Circuits & Systems for Communications 2

Chair: Mengting Yan, *Analog Devices* Time: Monday, August 12, 2024, 14:00 - 15:30 Location: Room 3

**Abstract:** As wireline communication links transition from 100 Gb/s to 200 Gb/s per lane, new and complex forward error correction (FEC) architectures have been proposed to achieve acceptably low bit error rates (BERs). It is essential to understand how these architectures impact link performance under various channel conditions. However, existing methods of post-FEC BER analysis are unsuitable for this application. To address this, we present a flexible platform for FPGA-accelerated post-FEC BER analysis of 200 Gb/s wireline systems. This platform can accurately demonstrate post-FEC BERs at the $10^{-12}$ level within a day of simulation time, a speed improvement of 10,000X over software-based simulation platforms.
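As a rough sense of scale for the $10^{-12}$ claim, assuming independent bit errors and a zero-error, 95%-confidence criterion (the abstract does not state which criterion the platform uses), the number of post-FEC bits $N$ that must be observed for a target BER $p$ is:

$$
(1-p)^{N} \approx e^{-Np} \le 1 - \mathrm{CL}
\;\Rightarrow\;
N \ge \frac{-\ln(1-\mathrm{CL})}{p} = \frac{-\ln(0.05)}{10^{-12}} \approx 3.0\times10^{12}\ \text{bits}
$$

Processing that many bits in one day (86,400 s) corresponds to roughly $3.5\times10^{7}$ simulated bits per second; at a 10,000X lower throughput, a software simulation would need on the order of 10,000 days (about 27 years).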

#### Multi-Channel Nonlinearity Mitigation using Neural-Network-Based Feedback

Texas A&M University, United States

Abstract: This research delves into feedback cancellation for multi-channel receivers using neural networks (NNs) via the channel decision passing (CDP) algorithm, targeting the mitigation of receiver nonlinearities. Traditional approaches largely depend on polynomial models to estimate harmonic distortions and intermodulation products arising from analog front-end (AFE) nonlinearities. This paper introduces a more accurate method employing trained NNs. Through extensive simulations on a 3-channel receiver setup, the NN-based CDP algorithm demonstrated superior signal-to-noise ratio (SNR) improvements over conventional multi-channel decision feedback cancellation (MCDFC) and transversal feedforward NN methodologies. When comparing bit-error-rate (BER) performances, the NN-based CDP achieves a similar BER to its polynomial-model-based counterpart for moderate AFE nonlinearities without a significant complexity increase. Importantly, it provides better BER performance under pronounced AFE nonlinearities with a manageable complexity rise, showcasing the potential of NNs in enhancing receiver performance while keeping computational demands in check.

#### Relay Selection Machine Learning-Based for a DF Cooperative System with Energy Harvesting and Signal Space Diversity

Ahmed Ammar<sup>1</sup>, Ahmed Oun<sup>2</sup>, William Martell<sup>1</sup>, M. Ajmal Khan<sup>1</sup>

<sup>1</sup>Ohio Northern University, United States; <sup>2</sup>Ohio University, United States

Abstract: Machine learning techniques have been employed in communication systems to offer resilient and low-complexity solutions. Accordingly, this paper explores the use of machine learning algorithms for real-time relay selection in a cooperative communication system with signal space diversity and multiple energy-harvesting relays. The relay selection criteria involve successful decoding from the source, sufficient energy, and the best channel to the destination. Conventional machine learning algorithms, including Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Support Vector Machines (SVM), are implemented, and the algorithm with the highest accuracy is used for real-time relay selection. Performance is evaluated in terms of outage probability, and the exact outage probability is determined via Monte Carlo simulation. Results indicate that KNN exhibits the highest accuracy, but its performance falls short of closely approximating the exact outage probability due to data complexity, necessitating more advanced classification models such as artificial or deep neural networks to approximate the exact performance accurately.
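The exact outage probability above is obtained by Monte Carlo simulation. As a minimal illustration of that technique only, the sketch below estimates outage for a single Rayleigh-faded link, where the instantaneous SNR is exponentially distributed and a closed form exists to check against. The single-link model, parameter values, and function names are assumptions; the paper's DF relaying, energy-harvesting, and signal-space-diversity model is not reproduced.

```python
import math
import random


def outage_probability_mc(mean_snr: float, snr_threshold: float,
                          trials: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(SNR < threshold) for a Rayleigh-faded link.

    Under Rayleigh fading the instantaneous SNR (linear scale, not dB) is
    exponentially distributed with mean `mean_snr`.
    """
    rng = random.Random(seed)
    outages = sum(
        1 for _ in range(trials)
        if rng.expovariate(1.0 / mean_snr) < snr_threshold
    )
    return outages / trials


def outage_probability_exact(mean_snr: float, snr_threshold: float) -> float:
    """Closed form for the same single link: 1 - exp(-threshold / mean)."""
    return 1.0 - math.exp(-snr_threshold / mean_snr)


# Mean SNR of 10 (10 dB) with a threshold of 1 (0 dB): both values are close to 0.095.
print(outage_probability_mc(10.0, 1.0), outage_probability_exact(10.0, 1.0))
```

In the paper's setting, the same counting loop would wrap the full relay-selection and decoding model, which generally has no simple closed form; that is why simulation serves as the "exact" reference.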


Jawhar S, Sankaran Aniruddhan

Indian Institute of Technology Madras, India

**Abstract:** This paper presents a 6-bit passive switched-type phase shifter operating from 11 to 14 GHz in the Ku frequency band. The phase shifter comprises six phase-shifting blocks with a phase coverage of $354.375^{\circ}$ and a $5.625^{\circ}$ phase resolution for the LSB. The $180^{\circ}/90^{\circ}$, $45^{\circ}/22.5^{\circ}$, and $11.25^{\circ}/5.625^{\circ}$ phase-shifting blocks are implemented using switch-type high-pass/low-pass, bridged T-type, and band-pass/low-pass topologies, respectively. The phase shifter is designed in a TSMC 28nm CMOS technology node. Post-layout simulation results over all 64 states show an insertion loss of $9.7\pm2.3$ dB and an RMS phase error of less than $2^{\circ}$ in the 11-14 GHz range. Across all states and throughout the entire bandwidth, the input/output return loss remains better than 7.5 dB and the noise figure remains below 11.3 dB. The overall size of the phase shifter core is $1.3\text{mm} \times 0.4\text{mm}$.
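The quoted phase coverage follows directly from the binary weighting of the six blocks listed above:

$$
\Delta\phi_{\mathrm{LSB}} = \frac{360^{\circ}}{2^{6}} = 5.625^{\circ},\qquad
\phi_{k} = k\,\Delta\phi_{\mathrm{LSB}},\quad k = 0,1,\dots,63
$$

$$
\phi_{\max} = 5.625^{\circ} + 11.25^{\circ} + 22.5^{\circ} + 45^{\circ} + 90^{\circ} + 180^{\circ} = (2^{6}-1)\,\Delta\phi_{\mathrm{LSB}} = 354.375^{\circ}
$$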

## Session A3L-D: RF Circuits

Chair: Tejasvi Das, *Rochester Institute of Technology* Co-Chair: Mengting Yan, *Analog Devices* Time: Monday, August 12, 2024, 14:00 - 15:30 Location: Room 4


<sup>1</sup>Indian Institute of Technology Madras, India; <sup>2</sup>Molex India Pvt Ltd, India; <sup>3</sup>Texas Instruments India Pvt Ltd, India

**Abstract:** An ultra-wideband direct-conversion transmitter for software-defined radios is presented in this work. The proposed transmitter is designed to operate with up to 200 MHz modulated signals and carrier frequencies ranging from 100 MHz to 12 GHz. A voltage-mode passive mixer performs relatively linear up-conversion with wide bandwidth. Inductive peaking further extends the bandwidth of several circuit blocks to enable operation up to 12 GHz, with the capability of delivering up to +6 dBm saturated power over the entire band. The transmitter is fabricated in a TSMC 65-nm CMOS process, and measurement results up to 2.5 GHz agree well with simulated performance. It occupies an overall die area of 1.5 mm × 1.1 mm, including pads.

#### Performance Analysis of SiGe HBT-Based N-path Receiver without Nonoverlapping LO Signal ...... 305

Pujan K. C. Mishu, Waleed Ahmad, Ebrahim Al Seragi, Saeed Zeinolabedinzadeh *Arizona State University, United States* 

Abstract: This paper presents the performance analysis and comparison between silicon germanium heterojunction bipolar transistor (SiGe HBT)-based and field effect transistor (FET)-based N-path receiver architectures under sinusoidal local oscillator (LO) signals. Although nonoverlapping LO signals are typically advised for the N-path structure to minimize noise figure (NF), generating these signals at millimeter-wave (mmW) frequencies is challenging and can be costly due to high power consumption, activated parasitic capacitance, tuned resonances at mmW frequencies, and design complexity. Conversely, overlapping sinusoidal LO signals can be generated at high frequencies with relative ease. With such an easily implemented sinusoidal LO, the SiGe HBT-based N-path architecture is more resilient than the FET-based architecture against the NF degradation caused mostly by LO overlap. A SiGe HBT-based N-path receiver fabricated in 130 nm SiGe BiCMOS technology achieves a P1dB of -3.9 dBm and an NF of 16.5 dB at 28 GHz with a standard sinusoidal LO signal.

Rizwan Shaik Peerla<sup>1</sup>, Bibhu Datta Sahoo<sup>2</sup>

<sup>1</sup>Green PMU Semi Private Limited, India; <sup>2</sup>University at Buffalo, United States

Abstract: This paper proposes a modular extended-range divider structure for multi-band phase-locked loops. The proposed 2/3 divider cell is designed so that the extended-range divider has the same power consumption and chip area as the conventional multi-modulus divider. The divider is designed in TSMC 65 nm CMOS with a 1.2 V supply and has been verified to work reliably across process, voltage, and temperature variations. The worst-case current, rise time, and fall time at 2.5 GHz are 420 $\mu$A, 12.2 ps, and 10.8 ps, respectively.
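For background, a conventional multi-modulus divider built from $n$ cascaded 2/3 cells (the well-known Vaucher-style chain, not the authors' extended-range structure) divides by $N = 2^{n} + \sum_{i=0}^{n-1} P_i 2^{i}$, where $P_0$ belongs to the first, fastest cell. Its range is therefore limited to $2^{n}$ through $2^{n+1}-1$, which is the limitation extended-range dividers address. A minimal sketch, with a hypothetical function name:

```python
def mmd_division_ratio(p_bits: list[int]) -> int:
    """Division ratio of a conventional chain of n cascaded 2/3 cells.

    p_bits[i] is the programming bit P_i of cell i, where cell 0 is the first
    (highest-frequency) cell. N = 2**n + sum(P_i * 2**i), so a fixed-length
    chain only covers 2**n .. 2**(n+1) - 1.
    """
    n = len(p_bits)
    return (1 << n) + sum(bit << i for i, bit in enumerate(p_bits))


# Four cells: the ratio can only range from 16 to 31.
print(mmd_division_ratio([0, 0, 0, 0]), mmd_division_ratio([1, 1, 1, 1]))  # 16 31
```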

Anindita Roy Chowdhury, Kamlesh Badiyari

NXP Semiconductors, India

**Abstract:** This paper presents a passively gain-boosted two-port N-path bandpass filter, in which gain boosting is achieved by commutating a charge pump. The harmonic transfer function of the proposed filter has been derived and verified through simulations. A prototype four-path passively gain-boosted BPF is designed in a 65 nm CMOS technology, and the simulation results are presented. The center frequency of the filter is tunable from 0.2 GHz to 1.2 GHz. Over the tuning range, the filter has a peak gain between 7.6 dB and 9.8 dB, NF < 4.7 dB, and IIP3 > 6 dBm.

<sup>1</sup>Purdue University, United States; <sup>2</sup>Norwegian University of Science and Technology, Norway

**Abstract:** This paper explores the potential of interfacing a Gallium Nitride High Electron Mobility Transistor (GaN HEMT) with Josephson Junction technology at cryogenic temperatures to achieve high-frequency signal generation. A high-frequency GaN transistor is developed and modeled using Synopsys tools, and it is integrated into a cascade amplifier using Y-parameters extracted from the Synopsys toolkit. Voltage amplification is demonstrated up to 1.2 THz, and power amplification up to 360 GHz. The development of high frequency generation and amplification method

## Session A3L-E: Sensor Circuits & Systems 3

**Chair:** Ruolin Zhou, *University of Massachusetts Dartmouth* **Co-Chair:** Prafull Purohit, *Brookhaven National Laboratory* **Time:** Monday, August 12, 2024, 14:00 - 15:30 **Location:** Room 5

#### An Impedimetric Oscillator based Analog Front-End System and Its Application in

Indian Institute of Technology Kharagpur, India

Abstract: This paper introduces a novel impedimetric-oscillator-based analog front-end (AFE) system. It is essentially a variation of a phase shift oscillator, tailored for measuring phase changes in impedimetric sensors, and thereby serves as a sensor acquisition system. It captures the relative phase between two sensors in the form of its output frequency, effectively compensating the multiplicative non-idealities of the sensor pair and resulting in good linearity. Furthermore, the proposed oscillator architecture can be adapted to any frequency range by tuning component values, ensuring optimal sensor operation. The system's performance has been thoroughly analyzed and simulated in LTspice using the electrical equivalent of a PMMA sensor employed for measuring milk adulteration concentration. Additionally, a hardware prototype has been implemented and tested on PMMA-coated impedimetric sensors in adulterated milk, demonstrating the practical utility of the proposed AFE system for impedimetric sensors.

**Abstract:** This paper describes the design and development of a prototype laboratory test instrument. The instrument is designed to move a special type of camera called an event camera. The embedded system uses a microcontroller to move an Hbot gantry system. The performance of the instrument was evaluated using a VICON motion capture system and an iniVation DAVIS346 COLOR event camera.

Ahmed A. Zakaria<sup>1</sup>, Ahmed Allam<sup>1</sup>, Tanemasa Asano<sup>2</sup>, Adel B. Abdel-Rahman<sup>1</sup>

<sup>1</sup>Egypt-Japan University of Science and Technology, Egypt; <sup>2</sup>Kyushu University, Japan

Abstract: This research explores the impact of resonator proximity on sensitivity in sensor design using Rogers 4360G2 as the substrate material, with a relative permittivity ($\varepsilon_r$) of 6.15. Non-invasive blood glucose detection is a critical area of research, offering benefits such as improved patient comfort and reduced infection risk. By comparing the sensitivity of near and far resonators using S21 measurements, the study evaluates their ability to detect and measure changes in the system. The results indicate that near resonators exhibit higher sensitivity due to improved energy storage and transmission efficiency. In contrast, far resonators, positioned at a greater distance, show reduced sensitivity, which makes subtle variations harder to detect. This research emphasizes the importance of selecting resonator configurations based on specific application requirements. Leveraging the advantages of near resonators enables highly sensitive and accurate sensing systems, enhancing performance and expanding capabilities across various fields.

#### Improving Glucose Sensor Sensitivity with Dual Resonator Placement ...... 340

Ahmed A. Zakaria<sup>1</sup>, Ahmed Allam<sup>1</sup>, Tanemasa Asano<sup>2</sup>, Adel B. Abdel-Rahman<sup>1</sup>

<sup>1</sup>Egypt-Japan University of Science and Technology, Egypt; <sup>2</sup>Kyushu University, Japan

**Abstract:** This study presents a novel design of a planar microwave sensor for non-invasive blood glucose monitoring. The sensor uses two circular complementary split ring resonators (CSRRs) fabricated on a Rogers substrate with a relative permittivity ($\varepsilon_r$) of 6.15 to achieve a characteristic impedance of 50 ohms. The impact of varying the placement distance between the CSRRs on the sensor's sensitivity to different blood glucose concentrations is investigated. Simulation results show that a placement distance of 15 mm between the CSRRs yields the highest sensitivity. These findings highlight the importance of optimizing resonator proximity for enhanced sensor performance. The proposed sensor design offers a promising approach for accurate, non-invasive glucose monitoring, contributing to the development of advanced biosensing technologies for diabetes management.

## Session A4L-A: Industrial Session 2: Red Pitaya

Chair: Žiga Agostini, *Red Pitaya* Time: Monday, August 12, 2024, 16:00 - 17:30 Location: Room 1

**Red Pitaya Presentation**

Žiga Agostini, *Red Pitaya, Slovenia*

## Session A4L-B: Digital Circuits & Systems 3

Chair: Ruolin Zhou, University of Massachusetts Dartmouth Co-Chair: Love Sah, Western New England University Time: Monday, August 12, 2024, 16:00 - 17:30 Location: Room 2


Reyad El-Khazali

Khalifa University, U.A.E.

**Abstract:** This work introduces a new class of fractional-order digital filters of complex orders. The complex-order dynamics are first approximated by frequency-dependent minimum-phase continuous transfer functions within a frequency band. The indirect discretization method is then used to discretize the approximated form of the fractional-order Laplacian integro-differential operators of complex orders,  $s^{(\pm \alpha \pm j\beta)}$ , to obtain a rational z-transfer function  $H(z,\pm \alpha,\pm \beta)$ . The positive (negative) complex orders of the fractional-order differentiators (integrators) provide a leading phase at high frequency, respectively, which add new features for a new class of controllers of complex orders. The main points of this work are verified via numerical simulations.
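Evaluating the complex-order operator on the imaginary axis shows why its phase is frequency dependent. For $\omega > 0$ and the principal branch of the logarithm:

$$
(j\omega)^{\alpha + j\beta}
= e^{(\alpha + j\beta)\left(\ln\omega + j\pi/2\right)}
= \omega^{\alpha}\,e^{-\pi\beta/2}\;e^{\,j\left(\alpha\pi/2 + \beta\ln\omega\right)}
$$

The magnitude changes at $20\alpha$ dB/decade, as for a real order $\alpha$, but the phase $\alpha\pi/2 + \beta\ln\omega$ grows logarithmically with frequency when $\beta > 0$, consistent with the leading phase at high frequency described above.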

Juneeth Kumar Meka, Ranga Vemuri

University of Cincinnati, United States

Abstract: To ensure the efficiency of hardware security applications, it is crucial to conduct a thorough evaluation that includes both regular and edge cases of various circuit features. This diligent approach helps to identify any potential weaknesses or vulnerabilities that could compromise the security of the application. Attributed graph grammars have proven useful in creating interesting and constraint-compliant structures in various design fields. In this paper, we introduce a new framework called Attributed Circuit Transformation (ACT). The ACT framework comprises a new language named the ACT language and the ACT system which are based on attributed graph grammars. The ACT framework can generate synthetic circuits that comply with specific constraints on multiple design parameters, while also being flexible and scalable. This paper provides a detailed discussion of the ACT framework and also demonstrates its usefulness through an example hardware security application involving SCOAP value analysis of the generated circuits.
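The SCOAP analysis mentioned above can be illustrated with Goldstein's standard combinational controllability rules (CC0/CC1). The sketch below is a generic implementation on a tiny gate list, assuming gates are supplied in topological order; it is not part of the ACT framework, and its netlist format and names are hypothetical. Observability (CO) is omitted for brevity.

```python
from typing import Dict, List, Tuple

# A gate is (output_net, gate_type, input_nets); gates must be in topological order.
Gate = Tuple[str, str, List[str]]


def scoap_controllability(primary_inputs: List[str],
                          gates: List[Gate]) -> Dict[str, Tuple[int, int]]:
    """Return {net: (CC0, CC1)} using the combinational SCOAP rules."""
    cc: Dict[str, Tuple[int, int]] = {pi: (1, 1) for pi in primary_inputs}
    for out, kind, ins in gates:
        cc0s = [cc[i][0] for i in ins]
        cc1s = [cc[i][1] for i in ins]
        if kind in ("AND", "NAND"):
            # One 0 forces the AND output to 0; all inputs must be 1 for a 1.
            c0, c1 = min(cc0s) + 1, sum(cc1s) + 1
        elif kind in ("OR", "NOR"):
            # All inputs must be 0 for a 0; one 1 forces the OR output to 1.
            c0, c1 = sum(cc0s) + 1, min(cc1s) + 1
        elif kind in ("NOT", "BUF"):
            c0, c1 = cc0s[0] + 1, cc1s[0] + 1
        else:
            raise ValueError(f"unsupported gate type: {kind}")
        if kind in ("NAND", "NOR", "NOT"):
            c0, c1 = c1, c0  # an inverting output swaps the two costs
        cc[out] = (c0, c1)
    return cc


cc = scoap_controllability(
    ["a", "b", "c"],
    [("g1", "AND", ["a", "b"]), ("g2", "OR", ["g1", "c"]), ("g3", "NOT", ["g2"])],
)
print(cc["g3"])  # (3, 5)
```

Higher values mean a net is harder to set, which is the kind of per-net metric a generator of synthetic benchmark circuits can be constrained against.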

Aymen-Alaeddine Zeghaida, Dinesh Daultani, J.M. Pierre Langlois, Jean Pierre David

Polytechnique Montréal, Canada

Abstract: Modern neural network models consume a large proportion of their power in matrix-vector multiplications. At inference time, the trained network parameters are constant and organized in matrices. Knowing these constant matrices in advance can greatly benefit the throughput, hardware utilization, and energy consumption of neural network implementations. This paper seeks to achieve hardware optimizations and reduce the computational cost of constant matrix multiplication by exploiting redundancies in sub-expressions common to several products and outputs. For FPGA implementation, our results show that the number of lookup tables can be decreased by up to 62% compared with a standard baseline version already optimized by vendor tools.

#### A Novel Delay-Aware Packing Algorithm for FPGA Architecture using RFET ...... 362

Sheng Lu, Liuting Shang, Sungyong Jung, Yichen Zhang, Chenyun Pan

The University of Texas at Arlington, United States

Abstract: Reconfigurable devices are gaining increasing attention as a viable alternative and supplement to traditional CMOS technology. In this paper, we develop a more efficient 2-input look-up table (LUT) based on reconfigurable field-effect transistors (RFETs), which reduces both transistor count and critical-path delay. The cells are organized into regular matrices, known as MClusters, with a fixed interconnection pattern to replace LUTs in field-programmable gate arrays (FPGAs). To utilize this structure more efficiently, we design a SAT-based delay-aware packing algorithm that better maps logic gates onto the MCluster structure. Finally, we combine this algorithm with FPGA simulation tools to form a comprehensive benchmarking flow. Benchmark tests show that, under the optimal design, delay and energy-delay product (EDP) can be reduced by up to 35% and 30%, respectively, compared with traditional CMOS FPGAs.

## Session A4L-C: CMOS RFIC

Chair: Najme Ebrahimi, Northeastern University Co-Chair: Jian Shao, Infineon Technologies Time: Monday, August 12, 2024, 16:00 - 17:30 Location: Room 3

Bruno Barajas<sup>1,2</sup>, Reza Molavi<sup>2</sup>, Shahriar Mirabbasi<sup>2</sup>

<sup>1</sup>Synopsys Inc, Canada; <sup>2</sup>University of British Columbia, Canada

Abstract: This work introduces a design technique to enhance the second-harmonic output in push-push LC VCOs through the use of a custom-engineered nonlinear varactor array. By leveraging the array nonlinearity, the amplitude of the second harmonic is significantly increased while care has been taken to maintain the spectral purity at the output. Post-layout simulation results confirm the boost in the second-harmonic amplitude, suggesting a promising approach for high-frequency CMOS VCO design. This technique can potentially simplify circuit design and improve efficiency in systems requiring enhanced harmonic signals. Using the proposed technique, a low-power LC-VCO is designed, laid out, and simulated in a 65-nm CMOS technology. Based on the post-layout simulation results, the VCO has a frequency tuning range of 17.7% centered at around 60 GHz. It exhibits a relatively large amplitude across the tuning range with a maximum of 130 mVpp at the highest frequency of operation while consuming 4.2 mW from a 1.2-V supply.

Chaoyu Yang<sup>1</sup>, Benqing Guo<sup>1</sup>, Haishi Wang<sup>1</sup>, Yao Wang<sup>2</sup>, Jun Chen<sup>3</sup>

<sup>1</sup>Chengdu University of Information Technology, China; <sup>2</sup>Zhengzhou University, China; <sup>3</sup>Huawei Technologies Co., Ltd., China

**Abstract:** This paper introduces a positive feedback post-distortion (PD) linearization technique. A weakly inverted auxiliary transistor compensates for the third-order nonlinear current in the input common-gate (CG) transistor, enhancing linearity and increasing the overall first-order transconductance for improved gain, while introducing minimal additional noise. An on-chip inductor between the auxiliary and cascaded CG transistors, resonating with the circuit parasitic capacitance, further reduces noise and enhances linearity. A 30-39 GHz CMOS linear low-noise amplifier (LNA) using this technique is implemented in 65 nm CMOS technology with a compact area of 0.15 mm<sup>2</sup>. Operating at 1.2 V, it consumes 11.87 mW. The LNA achieves 12.3-12.7 dB gain, 3.1-3.4 dB NF, and 6.6 dBm IIP3 within this frequency range.

#### A 14.5 GHz Dual-Core Noise-Circulating CMOS VCO with Tripler Transformer Coupling,

Xingyue Liao<sup>1</sup>, Benqing Guo<sup>1</sup>, Huifen Wang<sup>2</sup>

<sup>1</sup>Chengdu University of Information Technology, China; <sup>2</sup>Henan University of Technology, China

**Abstract:** A low-phase-noise voltage-controlled oscillator (VCO) topology is proposed in this paper. Based on three-coil transformer coupling, the oscillator is kept in a class-F low-noise mode, and the coupling of the three-coil transformer forms a noise-circulating structure at the source of the tail current transistor to reduce that transistor's noise. The VCO is extended from single-core to dual-core to reduce phase noise further. In addition, the three-coil transformer coupling structure not only controls the pMOS tail transistor current but also eliminates the common-mode parasitic capacitance of the traditional common-source structure, avoiding common-mode oscillation. In the standard TSMC 65 nm CMOS process, simulation results show a phase noise of -123.6 dBc/Hz at 1 MHz offset and an FoM of 191 dBc/Hz. The core area of the circuit is 0.06 mm<sup>2</sup>, and the current consumption is 30.75 mA from a 1.2 V supply.
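The quoted FoM can be cross-checked with the conventional phase-noise figure of merit, assuming a 14.5 GHz carrier (from the title) and that the 30.75 mA is the total DC current drawn from the 1.2 V supply, so that $P_{\mathrm{DC}} = 36.9$ mW:

$$
\mathrm{FoM} = -\mathcal{L}(\Delta f) + 20\log_{10}\!\frac{f_0}{\Delta f} - 10\log_{10}\!\frac{P_{\mathrm{DC}}}{1\,\mathrm{mW}}
= 123.6 + 20\log_{10}(14500) - 10\log_{10}(36.9)
\approx 123.6 + 83.2 - 15.7 \approx 191\ \mathrm{dBc/Hz}
$$

This agrees with the reported 191 dBc/Hz.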


<sup>1</sup>Chengdu University of Information Technology, China; <sup>2</sup>Huawei Technologies Co., Ltd., China

**Abstract:** In this paper, we propose a broadband balun low-noise amplifier (LNA) employing a two-stage capacitor cross-coupled common gate (CCCG) and a complementary n/pMOS configuration. Built on the traditional common-source (CS) active feedback topology, the two-stage CCCG structure reduces the gain and phase errors of the differential output. Additionally, the complementary n/pMOS configuration doubles the current efficiency while alleviating the voltage headroom at the output. Using the series peaking technique, the gate inductor extends the S11 bandwidth. Applying these techniques, we design a prototype of the LNA in a standard 65 nm CMOS technology. Simulation results show that within the 1-11 GHz bandwidth, we achieve a gain of 19.5 dB and a noise figure (NF) of 1.9-2.7 dB. Gain and phase errors are within 0.15 dB and 0.9°, while the IIP3 reaches -1.4 dBm. The proposed circuit consumes only 9.7 mW and occupies a core area of only 0.104 mm<sup>2</sup>.

#### X-Band Phased-Array Transmitter Architecture with Power Transfer Optimization

**Abstract:** Space-based solar power is a proposed source of uninterrupted renewable energy, with power generated in space and beamed to an earth-based receiver. This paper describes a scalable transmission and calibration system for space-based solar power. The transmitter is composed of a grid of individually controlled subarrays, each containing a 400-element uniform phased array of microstrip patch antennas. Each subarray is further equipped with a phase locked loop to generate the 10 GHz transmission signal, programmable phase shifters to modify beam direction, power amplifiers to drive antennas, and supporting control logic. The control logic provides the interface for a specialized calibration algorithm that utilizes power transmission data to continuously calibrate subarrays for correcting beam direction and minimizing beam width. A behavioral model of the system was then produced, with results indicating over 84% of the radiated power reaching a 2 km by 2 km target 36000 km away. Additionally, the transmitter model was capable of converging the beam to points with an angular resolution far exceeding the phase shifter resolution.
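To illustrate how programmable phase shifters steer a beam, the sketch below computes the progressive phase of a uniform linear array and rounds it to the states of a digital phase shifter. The linear geometry, the half-wavelength spacing in the usage example, the 6-bit resolution, and all names are assumptions for illustration; the paper's subarray layout and calibration algorithm are not reproduced.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def steering_phases_deg(num_elements: int, spacing_m: float, freq_hz: float,
                        steer_deg: float) -> list[float]:
    """Progressive phase (degrees, wrapped to [0, 360)) for a uniform linear array.

    Element n gets phase -2*pi*n*d*sin(theta)/lambda so that all contributions
    add in phase in the direction theta, measured from broadside.
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    k_d_sin = 2 * math.pi * spacing_m * math.sin(math.radians(steer_deg)) / wavelength
    return [math.degrees(-n * k_d_sin) % 360.0 for n in range(num_elements)]


def quantize_deg(phase_deg: float, bits: int) -> float:
    """Round a phase to the nearest state of a hypothetical `bits`-bit phase shifter."""
    lsb = 360.0 / (1 << bits)
    return (round(phase_deg / lsb) * lsb) % 360.0


wavelength = SPEED_OF_LIGHT / 10e9  # 10 GHz carrier, about 3 cm
ideal = steering_phases_deg(4, wavelength / 2, 10e9, 30.0)
print([round(p, 3) for p in ideal])  # [0.0, 270.0, 180.0, 90.0]
print(quantize_deg(100.0, 6))        # 101.25
```

The rounding step is where phase-shifter resolution limits pointing accuracy; a closed-loop calibration such as the one described above can use measured power data to compensate.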

## Session A4L-D: LNAs & Phase Shifters

Chair: Soner Sonmezoglu, Northeastern University Co-Chair: Ruolin Zhou, University of Massachusetts Dartmouth Time: Monday, August 12, 2024, 16:00 - 17:30 Location: Room 4

#### A Digitally Reconfigurable Low-Noise Amplifier with Robust Input Impedance for Machine Learning-Based Receiver Optimizations ...... 392

Minghan Liu, Diptashree Das, Mohammad Abdi, Francesco Restuccia, Marvin Onabajo

Northeastern University, United States

Abstract: The surge in demand for wireless connectivity has strongly incentivized advancements in reconfigurable radio frequency (RF) circuits. Although these circuits offer promising opportunities for machine learning (ML)-based optimization when devices are operating in the field, there is still an increasing need to adjust performance and power consumption over wider ranges, especially to dynamically minimize receiver power consumption when possible. In this paper, we present a novel low-noise amplifier (LNA) topology that dynamically scales power and performance to facilitate the realization of real-time ML methods for receiver optimization. This LNA is designed to avoid any significant input impedance matching degradation despite a wide bias current tuning range used to scale the gain, noise figure (NF), and input third-order intercept point (IIP3). Simulations of the 2.4 GHz LNA design in 65 nm CMOS technology show digitally programmable gain from 17.07 dB to 28.15 dB, NF from 2.56 dB to 5.18 dB, and IIP3 from -14.98 dBm to -9.85 dBm, while maintaining consistent input impedance matching with S11 < -13 dB.

#### A 2-Bit Differential Phase Shifter in 180-nm CMOS with 1.85 dB Insertion Loss for

Pallav Kumar Sah, Ifana Mahbub

The University of Texas at Dallas

**Abstract:** This paper introduces a novel phase shifter (PS) in a standard 180-nm CMOS process, distinguished by its differential 2-bit switch-type design for V-band phased array applications. Owing to its switch-type design, the PS consumes no DC power. The proposed PS uses a differential unit that incorporates both high-pass and low-pass states to significantly reduce the inductor values, achieving a compact size of 700 × 750 $\mu$m<sup>2</sup>. A 2-bit PS based on floating-body-terminal transistors is introduced, resulting in the lowest average insertion loss of < 1.85 dB over the 57.18-61.92 GHz frequency range.

#### Design of a 9.75 – 10.25 GHz Phase Shifter in 180 nm CMOS Process with 360° Phase Tunability for Phased Array System ...... 401

Adnan Basir Patwary, Ifana Mahbub

The University of Texas at Dallas

**Abstract:** To steer the far-field radiation beam, phased array antenna systems require electronic phase shifters to supply different input signal phases to the individual elements. In this paper, a three-stage differential phase shifter compatible with phased array antenna systems operating at 9.75-10.25 GHz (X band) is proposed. The proposed phase shifter is designed in a 180 nm CMOS process and consists of three identical varactor-loaded LC stages. Each single stage uses a Pi-network topology consisting of a fixed inductor and four varactors. The stages are connected via 2-bit PMOS switches, which control the signal routes between the stages. To improve the return loss, input and output matching networks are also designed based on the impedance values from post-layout simulation. The phase response of the phase shifter is set by 6-bit switching of the three stages and fine-tuned through the varactor control voltages. The proposed phase shifter achieves a $0^{\circ} - 360^{\circ}$ phase shift with an almost flat phase response over the 500 MHz bandwidth.


NXP Semiconductors, India

**Abstract:** This paper presents a passive, gain-boosted N-path band-pass filter achieving a sub-1 dB noise figure. It uses discrete-time parametric amplifiers (DTPAs) for amplification, and the DTPAs are cascaded to achieve higher gain. The voltage gain obtained by cascading DTPAs aids impedance translation, achieving matching at the input of the filter. The design and simulation results of the proposed filter are presented. The proposed filter is tunable from 0.5-1.7 GHz with a voltage gain >28 dB and an NF ranging from 0.91-1.4 dB.

#### Parasitic Beam-Switching Antenna Array for mmWave Energy Harvesting in IoMT Application ...... 409

Azamat Bakytbekov, Jawad Ahmad, Mohammad Hashmi

Nazarbayev University, Kazakhstan

Abstract: The millimeter-wave (mmWave) 5G band is very promising for radio frequency (RF) energy harvesting to power the billions of rapidly proliferating IoT devices, owing to its unprecedentedly high effective radiated power. Medium- to high-gain antenna arrays are typically used in mmWave 5G applications; however, their large feeding networks, such as beamforming networks, can cause critical power loss. Ideally, antennas for RF energy harvesting must be simple to avoid unwanted power losses, small enough for seamless integration, and offer wide angular coverage and circular polarization for diverse reception. In this study, we present a parasitic antenna array with a simple beam-switching technique that eliminates the need for complex feeding networks. We also augment and validate the investigation by developing equivalent circuit models of the array, demonstrating the effectiveness of the proposed design. The antenna is designed and optimized at 28 GHz, then fabricated and characterized. The antenna system radiates three independent beams with a $\pm 52^{\circ}$ beam-switching range, a gain of 7.2 dB, and 140° of total angular coverage.

## Session A4L-E: Biomedical Sensors & Data Acquisition

Chair: Tian Xia, University of Vermont Co-Chair: Mengting Yan, Analog Devices Time: Monday, August 12, 2024, 16:00 - 17:30 Location: Room 5

#### Dynamic Slope Detection: A High-Compression Fidelity-Preserving Approach for ECG Signal Acquisition ...... 414

Jorge J. Sáenz-Noval<sup>1,2</sup>, Juan Antonio Leñero-Bardallo<sup>2</sup>, Lionel C. Gontard<sup>1</sup>, Wei Tang<sup>3</sup>

<sup>1</sup>University of Cádiz, Spain; <sup>2</sup>CSIC, Universidad de Sevilla, Spain; <sup>3</sup>New Mexico State University, United States

**Abstract:** This paper introduces a novel Dynamic Slope Detection (DSD) system for acquiring electrocardiogram (ECG) signals. DSD addresses the critical challenge of balancing data storage requirements against signal fidelity, particularly in resource-constrained environments such as wearable devices. The system uses the slope of the ECG signal to guide efficient, adaptive data sampling. Validation on ten samples from the publicly available MIT-BIH Arrhythmia Database confirmed a significant data reduction compared with traditional sampling. The proposed method achieves a compression ratio of up to $12.5\times$ while keeping the RR interval estimation error below $\pm 0.1$ ms.
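The abstract does not spell out the DSD sampling rule. As a minimal sketch, assuming a hypothetical rule in which a sample is kept only when the local slope departs from the slope at the last kept sample by more than a threshold (`slope_thresh`, our own parameter), slope-guided sampling, linear-interpolation reconstruction, and the resulting compression ratio can be illustrated as follows:

```python
import numpy as np


def slope_adaptive_sample(x, slope_thresh):
    """Return indices of retained samples (hypothetical slope-change rule).

    A sample n is kept when the slope of the segment starting at n differs
    from the slope of the current segment by more than slope_thresh. The
    first and last samples are always kept so the signal can be rebuilt.
    """
    x = np.asarray(x, dtype=float)
    keep = [0]
    ref_slope = x[1] - x[0] if len(x) > 1 else 0.0
    for n in range(1, len(x) - 1):
        slope = x[n + 1] - x[n]
        if abs(slope - ref_slope) > slope_thresh:
            keep.append(n)          # slope changed: start a new segment here
            ref_slope = slope
    if len(x) > 1:
        keep.append(len(x) - 1)
    return np.array(keep)


def reconstruct(idx, x, length):
    """Rebuild a full-length signal by linear interpolation between kept samples."""
    return np.interp(np.arange(length), idx, np.asarray(x, dtype=float)[idx])


def compression_ratio(n_total, idx):
    """Original sample count divided by retained sample count."""
    return n_total / len(idx)
```

On a purely piecewise-linear toy signal only the corner points survive, so reconstruction is exact; real ECG data would need the threshold tuned against a fidelity target such as RR-interval error.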

#### 

Derek Goderis, Nelson Sepúlveda, Anna Inohara, Andrew J. Mason Michigan State University, United States

**Abstract:** A long-standing challenge in lab-on-chip and bio-microfluidic sensing is forming reliable, leak-free bonds on the surface of wafers or CMOS chips whose sensing electrodes are typically formed by thin-film metal deposition. This challenge is particularly evident in the design of polydimethylsiloxane (PDMS) flow channels for electrochemical analysis, where the need for high-density electrodes increases the number of metal-PDMS interface points. This work presents a fabrication method for creating leak-free bonding of PDMS on Au electrodes by coating the substrate with a dielectric deposited by low-temperature plasma-enhanced chemical vapor deposition, so that only the sensing area within the channel is exposed. Furthermore, this fabrication process was used to create the first known impact electrochemistry flow cell capable of continuously measuring airborne particulate matter using microfluidics over a surface containing microfabricated gold electrodes. The device's functionality for air pollution monitoring was validated by detecting black carbon particles using impact electrochemistry at 0.8 V.

#### Planar RF Sensor for Single-Cell Measurements ...... 424

Abdulrahman Alghamdi<sup>1</sup>, Saeed Mohammadi<sup>2</sup>

<sup>1</sup>King Saud University, Saudi Arabia; <sup>2</sup>Purdue University, United States

Abstract: An RF parallel coplanar waveguide (CPW) transmission line (TL) for characterizing individual cells is presented. The planar capacitive device, designed and microfabricated on a fused-silica substrate, measures cell impedance inside a microfluidic channel with a narrow region (<50 $\mu$m wide). The narrow region is equipped with micro-pillars that catch cells in the flow and enable visual inspection of the cells. The forward transmission coefficient (S21), measured over a wide band with four cells trapped between the proposed CPW parallel microelectrodes, reveals a 0.5 fF capacitance difference between phosphate-buffered saline (PBS) solution alone and cells suspended in the same PBS solution in the 1-6 GHz and 13-16 GHz frequency ranges.

<sup>1</sup>Stony Brook University, United States; <sup>2</sup>Binghamton University, United States; <sup>3</sup>Western University, Canada

**Abstract:** We propose a peak detection and frequency measurement circuit for integration at the interface of a triboelectric nanogenerator (TENG). The applied force is predicted by measuring the frequency and the peak voltage at the TENG output. We propose a sample-and-hold peak detector circuit with a reset signal for continuous peak detection. The control logic also generates a trigger signal for analog-to-digital conversion of the peak voltage. At the same time, the frequency is recorded by counting the number of peaks in a defined time period. Circuit simulations in 180 nm CMOS technology demonstrate peak conversion with 8-bit resolution over a 1 V range with 500 nW power consumption. The proposed design is amenable to integration in a self-powered load-sensing system in a smart knee implant after total knee replacement (TKR) surgery.
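A behavioral sketch of the described scheme (hold the running peak, trigger an 8-bit conversion when the pulse ends, then reset, and count conversions per window to estimate frequency) is shown below. The reset level, the end-of-pulse trigger condition, and the floor-type quantizer over a 0 to 1 V range are assumptions, not details from the paper.

```python
def quantize_8bit(v, full_scale=1.0):
    """Floor-type 8-bit code for v in [0, full_scale), clipped to 0..255."""
    code = int(v / full_scale * 256)
    return max(0, min(255, code))


def peak_detect(samples, fs, reset_level):
    """Behavioral sample-and-hold peak detector with reset.

    Returns the 8-bit code of each completed pulse peak and the pulse
    frequency (peaks per second) over the recorded window. A pulse still
    in progress at the end of the record is not counted.
    """
    codes = []
    holding = False
    peak = 0.0
    for v in samples:
        if v > reset_level:
            holding = True
            peak = max(peak, v)                # track the running maximum
        elif holding:
            codes.append(quantize_8bit(peak))  # pulse ended: trigger the ADC
            peak, holding = 0.0, False         # reset the held value
    window_s = len(samples) / fs
    return codes, len(codes) / window_s
```

For example, `peak_detect([0, 0.2, 0.5, 0.3, 0, 0, 0.4, 0.8, 0.1, 0], fs=10, reset_level=0.15)` returns `([128, 204], 2.0)`: two pulses in a 1 s window.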

#### A Reconfigurable Bandpass Filter with Ferroelectric Devices for Intracardiac Electrograms Monitoring ...... 433

Jianwei Jia<sup>1</sup>, Zhenge Jia<sup>2</sup>, Omkar Phadke<sup>1</sup>, Gihun Choe<sup>1</sup>, Yiyu Shi<sup>2</sup>, Shimeng Yu<sup>1</sup>

<sup>1</sup>Georgia Institute of Technology, United States; <sup>2</sup>University of Notre Dame, United States

**Abstract:** This paper introduces a novel three-stage bandpass filter for intracardiac electrogram monitoring (IEGM) that employs ferroelectric field-effect transistor (FeFET) technology to allow bandwidth adaptation for personalized medicine. By using the FeFET's channel and gate stack as a programmable resistor and capacitor, respectively, the filter achieves precise cardiac signal isolation tailored to individuals' physiological needs. Built on the GlobalFoundries (GF) 28 nm SLPe process, which features FeFETs, the design offers a broad continuous gain tuning range (22 dB to 82 dB) and bandwidth tuning range (0.1 to 25 Hz for the low cut-off frequency and 10 to 120 Hz for the high cut-off frequency) with an average power consumption of 393 nW, a significant stride in low-power cardiac monitoring. Moreover, the sensitivity to FeFET threshold-voltage mismatch and the noise characteristics are also evaluated.

# Tuesday, August 13, 2024

## Session B1L-A: Linear & Non-Linear Analog & Mixed-Signal Circuits & Systems

Chair: Fei Yuan, *Toronto Metropolitan University* Co-Chair: Mahmoud Ibrahim, *MediaTek* Time: Tuesday, August 13, 2024, 8:00 - 9:30 Location: Room 1

#### 

<sup>1</sup>University of California, Los Angeles, United States; <sup>2</sup>Cairo University, Egypt

Abstract: A new architecture for a pseudo-differential low-power voltage-to-time converter (LP-VTC) circuit is proposed. This LP-VTC architecture suits a wide range of applications, such as biomedical implants and integrated DC-DC voltage converters, and it can also be used in time-based analog-to-digital converters (T-ADCs). A T-ADC converts the noisy analog input signal into a time delay, in the form of either a pulse-position-modulated (PPM) or a pulse-width-modulated (PWM) signal, through a VTC circuit; a time-to-digital converter (TDC) circuit then converts this time delay into a digital code. Low power consumption is achieved through time multiplexing, a low supply voltage with supply control signals, and high-threshold-voltage (Vt) transistors that eliminate the short-circuit power (PSCT) in the proposed design. A prototype of the proposed LP-VTC circuit is implemented in 0.13 µm CMOS technology, where the analog blocks operate from a 0.8 V supply and the digital blocks from a 0.6 V supply. Moreover, the proposed LP-VTC exhibits a wide linear input range at a 2 MS/s sampling rate.

#### CMOS-Memristor Hybrid Design of a Neuromorphic Crossbar Array with Integrated Inference and Training ...... 442

Sarah Johari, Arghavan Mohammadhassani, M.L. Varshika, Anup Das

Drexel University, United States

Abstract: We present a CMOS-memristor hybrid analog design of a neuromorphic crossbar array with integrated inference and training. Each crosspoint of the crossbar includes a memristor that stores a synaptic weight. Integrate-and-fire (IF) neurons are designed using CMOS transistors and placed along the rows and columns of the crossbar. Synaptic weights are learned using the trace-driven spike-timing-dependent plasticity (TrSTDP) rule, in which the trace collected during forward propagation (i.e., inference) is used to compute weight updates. The key novelty of our design is an interface circuit that captures the trace during inference and autonomously controls the learning circuit (designed using memristors) to generate the voltage pulse width needed to update the synaptic weight, without requiring any software or system support. The interface circuit consists of a voltage-to-time converter (VTC), an adder, and a voltage amplifier, all designed using CMOS transistors. We implement the proposed design in Synopsys HSPICE at the 90 nm technology node and thoroughly evaluate the accuracy, latency, area, and power overheads of the interface circuit.
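For readers unfamiliar with trace-based STDP, the sketch below shows a common pair-based formulation with exponentially decaying pre- and postsynaptic traces. It is a software illustration only: the time step, time constant, learning rates, and weight bounds are hypothetical values, and it does not model the paper's TrSTDP parameters or its VTC-based hardware realization.

```python
import math


def trace_stdp(pre_spikes, post_spikes, dt=1e-3, tau=20e-3,
               a_plus=0.01, a_minus=0.012, w0=0.5, w_min=0.0, w_max=1.0):
    """Pair-based STDP driven by exponentially decaying spike traces.

    pre_spikes / post_spikes are equal-length 0/1 sequences, one entry per
    time step of length dt. Returns the final synaptic weight.
    """
    decay = math.exp(-dt / tau)
    x_pre = x_post = 0.0
    w = w0
    for pre, post in zip(pre_spikes, post_spikes):
        x_pre *= decay                 # traces decay every time step
        x_post *= decay
        if pre:
            x_pre += 1.0
            w -= a_minus * x_post      # post-before-pre: depression
        if post:
            x_post += 1.0
            w += a_plus * x_pre        # pre-before-post: potentiation
        w = min(max(w, w_min), w_max)  # keep the weight in range
    return w
```

With a presynaptic spike five steps before a postsynaptic one, the weight grows by `a_plus * exp(-5 * dt / tau)`; reversing the order shrinks it by `a_minus` times the same factor.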

#### 

Toronto Metropolitan University, Canada

Abstract: This paper provides a comparative study of dynamic comparators in low-voltage successive approximation register (SAR) analog-to-digital converters (ADCs). The architecture, comparison time, and power consumption of four widely used dynamic comparators are studied first, followed by an investigation of kickback in these comparators. We show that, although clock kickback is common-mode, the impedance asymmetry of the SAR ADC's digital-to-analog converter (DAC), which arises from the different channel resistances of the DAC switches and manifests only at low supply voltage, gives rise to a differential clock kickback. This differential kickback occurs earlier than, and is stronger than, the output regeneration kickback, and it therefore dictates the kickback of dynamic comparators. The dependence of the clock kickback on the SAR ADC input is also investigated for least significant bit (LSB) conversion, where the input is smallest.

#### A 3-MHz-3-GHz 8-Phase Reset-Free Anti-Harmonic Delay-Locked Loop using

Abstract: This paper reports an 8-phase reset-free anti-harmonic delay-locked loop (DLL). The proposed design overcomes the false and harmonic lock problems by applying the phase difference composition technique, which broadens the range of the phase detector to  $14\pi$ . This range can be further expanded with more delay cells. The voltage-controlled delay cells are designed symmetrically to maintain the duty cycle, so there is no need to attach a duty cycle correction circuit to each clock output. This design does not need external reset or start signals, harmonic lock detectors, false lock detectors, auxiliary loops, replica delay lines, or multiple control voltages. The proposed DLL is implemented in 65-nm CMOS, and its operating frequency ranges from 3 MHz to 3 GHz. At 3 GHz, simulation results indicate a mean phase offset of 0.47 ps, and the DLL-contributed rms jitter is 0.328 ps.

#### 

<sup>1</sup>University of Idaho, United States; <sup>2</sup>Worcester Polytechnic Institute, United States; <sup>3</sup>Military Technical College, Egypt

Abstract: A new low-power, high-precision inverter-based CMOS comparator is introduced in this paper. The inherently high power consumption of the CMOS inverter is controlled by a current-limiting circuit, and a current-modulation technique is proposed to maximize the comparator's gain before it makes its decision. The comparator uses two unfolded inverters and a sampling circuit to complete its operation. The current-modulation circuit uses a current mirror and a switch, adding little power or complexity to the comparator. The circuit was simulated in a standard 180 nm CMOS process, and simulations show up to a 70% improvement in the comparator's gain with very high precision. Only one calibration circuit is required for any number of comparators, so this technique can be used to build high-resolution flash ADCs.

## Session B1L-B: Digital Circuits & Systems 4

Chair: Negar Reiskarimian, *Massachusetts Institute of Technology* Co-Chair: Cheol-Hong Min, *University of St. Thomas* Time: Tuesday, August 13, 2024, 8:00 - 9:30 Location: Room 2

#### Novel Energy-Efficient and Latency-Improved PVT Tolerant Read Scheme for SRAM Design in Video Processing and Machine Learning Applications ...... 460

Soumitra Pal, Jiyue Yang, Stephen Bauer, Puneet Gupta, Sudhakar Pamarti University of California, Los Angeles, United States


Abstract: SRAMs occupy a significant area and therefore consume a substantial portion of the total energy in modern memory-hungry processors. We propose a novel scheme that reduces read energy and latency for video/image processing and machine learning applications, in which the data stored in neighboring SRAM rows are mostly similar. Two rows are read at a time using a new PVT-tolerant sense amplifier that decodes three levels ("00", "11", or "01"/"10"), with the last case triggering a single-row re-read. This reduces the overall number of bitline charge/discharge events and hence the overall read energy and latency. A $128 \times 256$ SRAM block is simulated in 14 nm FinFET technology, and the proposed scheme is observed to consume up to 49.4% less average energy and to achieve 50% shorter latency than a conventional read when the data in both rows are identical.
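A bit-level model of the dual-row read is sketched below, assuming that a mismatch in any column triggers one single-row re-read of row A, after which row B is recovered as the complement of row A in the mismatched columns. Counting array accesses (one for identical rows versus two for a conventional read) is consistent with the 50% latency reduction the abstract reports for identical rows; the function names and the access-count model are illustrative assumptions, not the paper's circuit.

```python
def decode_pair(row_a, row_b):
    """Three-level decode of a simultaneous two-row read, one level per column."""
    levels = []
    for a, b in zip(row_a, row_b):
        if a == b:
            levels.append("11" if a else "00")
        else:
            levels.append("01/10")     # ambiguous: rows differ in this column
    return levels


def dual_row_read(row_a, row_b):
    """Return (data_a, data_b, array_accesses) for the dual-row read scheme."""
    levels = decode_pair(row_a, row_b)  # one combined access
    accesses = 1
    if "01/10" in levels:
        accesses += 1                   # single-row re-read of row A
        out_a = list(row_a)
        # Row B equals row A where the levels agreed and is its complement elsewhere.
        out_b = [a if lv != "01/10" else 1 - a for a, lv in zip(out_a, levels)]
    else:
        out_a = [1 if lv == "11" else 0 for lv in levels]
        out_b = list(out_a)
    return out_a, out_b, accesses
```

Identical rows are recovered from the decoded levels alone in one access, whereas any mismatch costs the same two accesses as a conventional read.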

#### MSPARQ: A RISC-V Vector Processor Array Optimized for Low-Resolution Neural Networks ...... 464

Elisabeth Humblet, Théo Dupuis, Yoan Fournier, MohammadHossein AskariHemmat, François Leduc-Primeau, Jean Pierre David, Yvon Savaria

Polytechnique Montréal, Canada

**Abstract:** This paper explores using a multicore RISC-V vector processor to enhance the performance of convolutional neural networks (CNNs) quantized to low and ultra-low precisions. Optimizing CNN efficiency is crucial in resource-constrained environments such as embedded systems. To this end, we use Sparq, a 64-bit RISC-V RVV 1.0 processor derived from Ara. This paper introduces MSPARQ, its multicore version, developed using the OpenPiton framework and implemented in GlobalFoundries 22FDX FD-SOI technology. The performance of a CNN conv2d kernel is evaluated by emulating MSPARQ on a Xilinx Alveo U280 FPGA board. Results show that, with an adequate mapping of this kernel, a 4-core configuration achieves a speedup of 3.6 to 3.8 times over the single-core configuration, while the sub-byte optimizations in the Sparq core deliver additional speedups of 2.1 and 1.8 times for 1-bit and 2-bit precision, respectively, over the 16-bit implementation of the conv2d operation.
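Sub-byte speedups of this kind generally come from packing many low-precision values into one machine word. As an illustration of the idea, and not of Sparq's actual instructions, the sketch below computes a 1-bit dot product with the standard XNOR-popcount identity: values in {-1, +1} are packed as bits (1 for +1), and dot = 2 * popcount(XNOR(a, b)) - n.

```python
def pack(vec):
    """Pack a {-1, +1} vector into an int bitmask (bit i set means +1)."""
    word = 0
    for i, v in enumerate(vec):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed n-element {-1, +1} vectors."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # XNOR: 1 where the signs agree
    return 2 * bin(agree).count("1") - n   # agreements minus disagreements
```

A single XOR, mask, and population count replace n multiply-accumulates, which is why 1-bit kernels can outrun 16-bit ones on the same datapath.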

#### New Resource Efficient Multi-PUF Techniques ...... 469

Sai Bonagiri, Mohammed A.S. Khalid University of Windsor, Canada

**Abstract:** The physical unclonable function (PUF) is a lightweight hardware solution that offers affordable hardware-based security for electronic devices and systems. However, a significant concern is the vulnerability of PUFs to popular machine learning attacks. To mitigate these vulnerabilities, numerous multi-PUFs (MPUFs) have been proposed that effectively resist such attacks. However, MPUFs demand a considerable amount of resources and exhibit relatively low PUF metric values. To further increase resource efficiency, this paper proposes two new MPUF techniques that aim to balance resource utilization against challenge obfuscation complexity. The key idea is to reduce the number of resources used in a PUF line while increasing complexity during challenge obfuscation. The proposed designs are implemented and verified on the Arty A7-100T FPGA board using the AMD Vivado and Vitis environments. Experimental results show that resource usage is significantly reduced, by 9% and 46.5% for the two proposed techniques, respectively.
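The machine-learning vulnerability mentioned above stems from the well-known additive delay model of the arbiter PUF. In this model the response is the sign of a weighted sum of parity features of the challenge and is therefore linearly separable. The sketch below shows that model plus an XOR combination of several arbiter PUFs, a generic multi-PUF construction used here for illustration only. It is not either of the two techniques proposed in the paper, and any delay weights passed in are hypothetical.

```python
import numpy as np


def parity_features(challenge):
    """Map challenge bits c_0..c_{n-1} to the arbiter-PUF feature vector.

    phi_i = prod_{j >= i} (1 - 2 c_j), plus a constant bias term of 1.
    """
    c = np.asarray(challenge)
    signs = 1 - 2 * c                     # bit 0 -> +1, bit 1 -> -1
    phi = np.cumprod(signs[::-1])[::-1]   # suffix products
    return np.append(phi, 1.0)


def arbiter_response(weights, challenge):
    """Response bit: 1 if the accumulated delay difference is positive."""
    return int(weights @ parity_features(challenge) > 0)


def xor_puf_response(weight_list, challenge):
    """XOR of several arbiter PUFs evaluated on the same challenge."""
    r = 0
    for w in weight_list:
        r ^= arbiter_response(w, challenge)
    return r
```

For instance, weights drawn with `np.random.default_rng(0).normal(size=65)` model one 64-stage arbiter PUF; because a single PUF is a linear classifier on `parity_features`, logistic regression can learn it, which motivates combining several PUFs.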

Rachana Erra, James E. Stine

Oklahoma State University, United States

Abstract: This work analyzes the power consumption of different radix implementations of Montgomery multiplication (MM). Different MM architectures are discussed, and a new modified architecture for the radix-2 processing element (PE) is presented in which conversion from carry-save representation to conventional representation is accomplished without a carry-propagate adder (CPA). The proposed radix-2 architecture also eliminates race conditions by using a register to prevent a processing element from selecting the wrong multiplier bit. In addition, overall architectural power consumption is reduced by applying the popular clock-gating technique to the proposed radix-2 architecture and to the high-radix architecture used in this paper. Power, performance, and area (PPA) analysis is performed in TSMC 28 nm HPC+ to understand the trade-offs between the implementations. The results indicate that the radix-2 architecture occupies less area than the high-radix (i.e., radix-16 and radix-$2^{16}$) implementations, but the high-radix architectures require fewer clock cycles than the radix-2 architecture.
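For reference, the textbook bit-serial radix-2 Montgomery recurrence that such processing elements implement is sketched below. Each iteration adds a_i·B, adds the modulus if the partial sum is odd, and halves the result, yielding A·B·2^(-n) mod N. Python's arbitrary-precision integers hide the carry-save representation and the CPA-free conversion that the paper's PE targets, so this is a functional model only.

```python
def montgomery_radix2(a, b, n_mod, n_bits):
    """Radix-2 Montgomery product a * b * 2**(-n_bits) mod n_mod.

    Requires n_mod odd, n_mod < 2**n_bits, and 0 <= a, b < n_mod.
    """
    s = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1      # scan the multiplier one bit per iteration
        s += a_i * b
        if s & 1:               # make the partial sum even ...
            s += n_mod
        s >>= 1                 # ... so it can be halved exactly
    if s >= n_mod:              # the partial sum stays below 2 * n_mod
        s -= n_mod
    return s
```

To obtain an ordinary modular product, operands are first mapped into the Montgomery domain (x · 2^n mod N). For example, with N = 13 and n = 4, `montgomery_radix2(7, 5, 13, 4)` returns 3, which equals 7 · 5 · 16^(-1) mod 13.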

Toronto Metropolitan University, Canada

**Abstract:** This paper presents edge-triggered bi-directional gated delay line (BDGDL) up-down counters. The proposed counters remove the constraint imposed on pulse-triggered BDGDL counterparts, where the pulse width of the triggering signal must equal the per-stage delay of the BDGDL (a condition that is difficult to satisfy under PVT uncertainty), while preserving their superior area efficiency. The principle, design, operation, and design considerations of the proposed counters are detailed. An 8-bit positive-edge-triggered BDGDL up-down counter is designed in a TSMC 130 nm 1.2 V CMOS technology with a reduced supply voltage of 0.8 V. Simulation results show that the counter performs well at key process and temperature corners.

## Session B1L-C: Power & Sensory Circuits & Systems 1

**Chair:** Sahil Shah, *University of Maryland, College Park* **Time:** Tuesday, August 13, 2024, 8:00 - 9:30 **Location:** Room 3

#### An MPPT-Based Boost Converter with Irradiance-Aware Frequency Selection and Digital ZCS Control for Solar Energy Harvesting ...... 483

Yidong Yuan<sup>1</sup>, Jie Pan<sup>1</sup>, Shaoshuai Huang<sup>2</sup>, Chengqian Shen<sup>2</sup>, Menglian Zhao<sup>2</sup>

<sup>1</sup>Beijing Smartchip Microelectronics Technology Co. Ltd., China; <sup>2</sup>Zhejiang University, China

**Abstract:** This paper presents a solar energy harvesting circuit based on a boost converter. An irradiance-aware frequency selection (IAFS) scheme working together with maximum power point tracking (MPPT) is implemented to improve transmission efficiency. By adopting successive approximation logic, the MPPT control can reach the maximum power point within only a few comparisons. Additionally, a low-power digital zero-current-switching (ZCS) technique is proposed to ensure high conversion efficiency. The proposed converter is implemented in a 180 nm CMOS process. According to simulation results, the ZCS control accuracy reaches 98% at an input voltage of 356 mV, and the peak power efficiency reaches 92% at an input voltage of 380 mV.
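The abstract does not detail the successive-approximation MPPT logic. One way such a search can work is sketched here, assuming that harvested power is strictly unimodal in a hypothetical n-bit control code (for example, a reference that sets the converter's input voltage). Each bit is decided from MSB to LSB with a single comparison of P(code) against P(code - 1), which tells whether the trial code is still on the rising side of the P-V curve.

```python
def sar_mppt(power, n_bits):
    """Successive-approximation search for the code that maximizes power(code).

    Assumes power() is strictly unimodal over codes 0 .. 2**n_bits - 1
    (hypothetical P-V behaviour); uses one comparison per bit.
    """
    code = 0
    for b in reversed(range(n_bits)):
        trial = code | (1 << b)
        if power(trial) > power(trial - 1):   # still on the rising side
            code = trial                      # keep this bit
    return code
```

For a 7-bit code, seven comparisons locate the peak among 128 candidates; for example, `sar_mppt(lambda k: -(k - 37) ** 2, 7)` returns 37.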

## 

Anjali Singh<sup>1</sup>, Mst Shamim Ara Shawkat<sup>2</sup>, Anshu Sarje<sup>1</sup> <sup>1</sup>International Institute of Information Technology Hyderabad, India; <sup>2</sup>Florida International University, United States

Abstract: This paper presents the design and development of heater control for a flexible embedded microheater (7000 $\mu$m × 9000 $\mu$m) for a lab-on-a-chip system. Our work focuses on advancing the PWM-based control system with the goal of reducing the system time constant. We also present the fabrication of an embedded flexible microheater in a PDMS substrate, and we validate the efficacy of the flexible microheater and its control system by using it for glucose detection. The novelty of this work lies in producing a small, flexible, low-cost chip without sophisticated equipment or conventional fabrication techniques.

#### Mutriku's Offshore Wave Power Plant Capture Chamber Model Validation ...... 491

Aitor Garrido, Amparo Villasante, Izaskun Garrido

University of the Basque Country, Spain

Abstract: The EU Energy Roadmap highlighted the potential of ocean wave energy to meet 15% of the EU's energy demand by 2050, reducing CO2 emissions by 136 million metric tons per megawatt-hour. Similarly, the Spanish Renewable Energies Plan underscores the importance of wave energy, particularly in Spain. In this context, oscillating water column (OWC) converters emerge as highly promising wave energy conversion technologies, able to harness ocean energy from on-shore and floating structures. This paper presents an analytical model of the wave capture chamber of a fixed on-shore OWC wave power plant, specifically adapted and parameterized for the Mutriku Marine Offshore Wave Power Plant (Mutriku's MOWC) on the coast of the Spanish Basque Country. The model is validated using both real incoming-wave data measured on-site and experimental output power data generated by the plant.

Mina Adel Azmy, Mostafa G. Ahmed, Sameh Assem Ibrahim Infinilink, Egypt

**Abstract:** A single-inductor, dual-output converter is proposed in this paper. Flying capacitors guarantee that each output receives continuous current in every switching phase, minimizing output ripple and improving efficiency. Hysteretic control is used for the fastest transient response. The converter is implemented in 130 nm technology with wide input and output ranges, and its performance (ripple, efficiency, transient response, etc.) is verified using post-layout simulations.

## Session B1L-D: Power Amplifiers & Filters

Chair: Yuanming Zhu, *Intel* Time: Tuesday, August 13, 2024, 8:00 - 9:30 Location: Room 4

#### Comparison of Various ML and Optimization Algorithms for the Design of Microwave Filters ...... N/A

Kadyrzhan Tortayev, Mohammad Hashmi Nazarbayev University, Kazakhstan

**Abstract:** This work examines the performance of microwave and millimeter-wave filters designed with homotopy, particle swarm optimization (PSO), and genetic algorithm (GA) optimization techniques. The filter types include microstrip low-pass (LPF), band-stop (BSF), and band-pass (BPF) filters, and their measured S-parameters are compared with target values. Results show that the optimized filters meet the target requirements and achieve lower minimum transmission coefficients than the default configurations. In particular, the GA approach performs best at obtaining the intended filter properties for all LPF, BSF, and BPF types.

#### Modeling and Optimization of Peaking Inductors for X-Band Circulator in Standard 0.18 µm CMOS ...... 505

Sakib Reza, Ifana Mahbub

The University of Texas at Dallas, United States

Abstract: The circulator plays a crucial role in full-duplex radio systems, where the major challenge is to isolate the transmitter (TX) channel from the receiver (RX) channel. In this paper, an active circulator based on three bridged-T networks comprising several peaking inductors is designed in a standard 0.18 µm CMOS technology. It boosts the isolation between the TX and RX channels over a wide frequency range. As part of the circulator design, a tuning platform is devised to explore the impact of the peaking inductors' design variables on radio-frequency benchmarks; it regulates the isolation and insertion loss of the proposed circulator through the assessment of scattering parameters. Simulation results indicate a maximum TX-RX isolation of 43.38 dB at 9.6 GHz while maintaining at least 30.08 dB of isolation over the whole X-band. The proposed circulator is therefore a promising candidate for duplexers where high isolation and wide bandwidth are of paramount importance.

#### Spoof Surface Plasmon Polariton based Ultra-Wideband Bandpass Filters ...... 509

K. M. Daiyan<sup>1</sup>, Shaiokh Bin Abi<sup>2</sup>, A. B. M. Harun-Ur Rashid<sup>2</sup>, MST Shamim Ara Shawkat<sup>1</sup>

<sup>1</sup>*Florida International University, United States;* <sup>2</sup>*Bangladesh University of Engineering and Technology, Bangladesh* 

**Abstract:** This paper presents eight bandpass filter designs based on spoof surface plasmon polaritons (SSPPs), with various geometric parameters and structural configurations. Optimal performance is achieved with a groove height mismatch of 1 mm, yielding an ultra-wide transmission band of 37.5-174 GHz (a bandwidth of 136.5 GHz) with signal reflection below -10 dB. The impact of changes in the geometry of the SSPP interconnect on the transmission characteristics is also explored. Through finite element modeling (FEM) simulations, we further investigate how curvature, bending, and structural mismatches of the SSPP connector affect the bandwidth of these filters.

## 

Rafsan Mahin, Ifana Mahbub

The University of Texas at Dallas, United States

**Abstract:** This paper presents the design of a high-gain, high power-added-efficiency (PAE) wideband power amplifier (PA) in the standard 180 nm CMOS process for wireless power transfer applications. The proposed PA uses a single-stage common-source architecture operating in the UWB frequency range (3.1-10.6 GHz). The transistor scaling factor is optimized to balance high transconductance against the insertion loss caused by the limited quality factor of the on-chip inductor, so that a higher overall gain can be achieved. The PA integrates an LC shunt inductor and a series capacitor in the input and output matching networks for optimal impedance matching and to generate class-C drain voltage and current waveforms. The designed PA achieves a -3 dB S21 gain bandwidth of 2.19 GHz (5.29-7.48 GHz) and an output power of 24.4 dBm at the P1dB point with a 3.3 V drain supply. It achieves a peak PAE of 65.6% at 16 dBm input power while also maintaining > 60% PAE over a wide bandwidth of 2 GHz from 5.5 to 6.5 GHz.
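For reference, power-added efficiency is PAE = (P_out - P_in) / P_DC with all powers in watts. A small helper for computing it from dBm readings is sketched below; the operating point in the usage line is hypothetical and not taken from the paper.

```python
def dbm_to_w(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)


def pae(pout_dbm, pin_dbm, pdc_w):
    """Power-added efficiency as a fraction: (Pout - Pin) / PDC."""
    return (dbm_to_w(pout_dbm) - dbm_to_w(pin_dbm)) / pdc_w


# Hypothetical operating point: 30 dBm out, 20 dBm in, 1.8 W of DC power.
print(f"PAE = {pae(30.0, 20.0, 1.8):.1%}")   # prints "PAE = 50.0%"
```

Subtracting the input power is what distinguishes PAE from drain efficiency; at low gain, a large input drive noticeably lowers PAE.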

#### Flexible Multi-Band Reconfigurable Modified Microstrip Antenna ...... 519

Swarnim Sinha, Deeksha, Nitin Shrinivas

International Institute of Information Technology Hyderabad, India

Abstract: A flexible microstrip (patch) antenna consists of a metal patch above a ground plane with a flexible dielectric material in between. The proposed antenna is a modified microstrip antenna measuring 33 × 55 mm<sup>2</sup> on a flexible Kapton polyimide substrate. Tested with a vector network analyzer, the antenna operates at multiple frequencies, 7.27 GHz, 10.61 GHz, and 25.50 GHz, covering the C, X, and K bands, respectively. A slot carved into the antenna divides the microstrip patch in two; the halves are connected using copper tape that acts as a switching diode, enabling reconfigurability of the measured multiband frequencies. The effect on the frequency peaks of reducing the bending radius from 50 mm to 30 mm is also studied.

## Session B1L-E: Neural & Musculoskeletal Signal Interfacing

**Chair:** Junning Jiang, *Nvidia* **Time:** Tuesday, August 13, 2024, 8:00 - 9:30 **Location:** Room 5

#### On-Chip Active Pulse-Clamp Stimulation (APCS) for Rapid Recovery, Charge-Balanced Neural Stimulation ...... 523

FNU Tala, Benjamin C. Johnson Boise State University, United States


Abstract: Rapid and charge-balanced electrical stimulation is imperative for neurostimulation implants aimed at chronic safety and closed-loop use. We present an innovative stimulation technique, Active Pulse-Clamp Stimulation (APCS), designed to ensure dependable charge balance with rapid recovery. APCS has two distinct modes, linear and slewing, both incorporated into the on-chip APCS system. APCS employs discrete-time feedback to sense the residual voltage across the electrode's double-layer capacitance and expedites the settling of the electrode interface either by grounding (slewing) or by clamping with an amplifier (linear). APCS combines the strengths of biphasic stimulation and passive recharge, offering a user-customizable recovery time constant and a guaranteed charge balance for safety. As a proof of concept, we implemented on-chip APCS in a 180 nm CMOS process. We demonstrated combined APCS functionality using a benchtop electrode model and a real clinical deep brain stimulation (DBS) electrode in vitro.

## A 311 nW Integrated Neural Amplifier and Spike Enhancement Filter Achieving

<sup>1</sup>University of California, San Diego, United States; <sup>2</sup>San Diego State University, United States

**Abstract:** In implantable neural monitoring, handling the increasing data volumes from numerous channels is a challenge for transmission. A viable solution is on-chip spike detection. This study introduces a low-power circuit integrating an analog front-end, a spike enhancement filter, and a detector. The amplifier adopts a two-stage operational transconductance design that both linearly filters the biopotential recordings and converts them into current. The spike enhancement filter is designed as a current-mode analog signal processing circuit that uses translinear loops to emulate the underdamped dynamics of a particle in a monostable potential well, described by a second-order differential equation. The spike-enhanced filter output then passes to a spike detector stage that employs hard thresholding. The circuit is designed in TSMC 65 nm CMOS technology. In simulations using a public database, the proposed system demonstrates an average spike detection sensitivity of 98.99% while consuming 311 nW from a 1 V supply, with a compact footprint of 0.0348 mm<sup>2</sup>.
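The paper realizes its filter with current-mode translinear circuits. As a purely numerical sketch of the same idea, the code below integrates the simplest monostable case, a particle in a quadratic well (equivalently, a linear underdamped second-order system) driven by the recording, and follows it with a hard threshold that reports rising-edge crossings. The natural frequency, damping ratio, sampling rate, and threshold are hypothetical choices, and the integrator (semi-implicit Euler) is our own.

```python
import numpy as np


def spike_enhance(x, fs, f0=1000.0, zeta=0.3):
    """Underdamped response y'' + 2*zeta*w0*y' + w0**2*y = w0**2*x.

    This is a particle in a quadratic (monostable) potential well driven by
    the input, integrated with semi-implicit Euler at sample rate fs.
    """
    w0 = 2 * np.pi * f0
    dt = 1.0 / fs
    y = v = 0.0
    out = np.empty(len(x))
    for n, xn in enumerate(x):
        a = w0 ** 2 * (xn - y) - 2 * zeta * w0 * v   # restoring force + damping
        v += a * dt
        y += v * dt
        out[n] = y
    return out


def detect(y, threshold):
    """Hard threshold: indices where y first rises above the threshold."""
    above = y > threshold
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1
```

With `fs = 30000.0`, a 10-sample unit pulse starting at index 100 produces an overshooting response that crosses a 0.5 threshold a few samples later, while a silent input produces no detections.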

Abstract: We developed a miniaturized ($8 \times 8$ mm<sup>2</sup>) wireless, battery-free implant for musculoskeletal stimulation. The implant generates a monophasic voltage of up to 11.9 V in a benchtop test over an air link, and it can produce any desired stimulation protocol in response to commands received via a 2.4 GHz wireless protocol from an external device. An in vivo test demonstrated that the implant can trigger a synchronized limb movement when targeting the gastrocnemius muscle in a rodent, with a measured limb deflection of 15 mm from the resting position. The flexible substrate and the ability to adjust stimulation parameters externally allow the implant to be used in a variety of applications in muscle therapy and cardiac pacing.

#### A 4-Channel 0.23mm<sup>2</sup> Voltage-to-Time Converter AFE with 3.7µVrms Noise and 480nW Galvanic Impulse Uplink ...... 538

Mehdi Bandali, Morgan Riley, Benjamin C. Johnson

Boise State University, United States

Abstract: We present a 4-channel, 15 kS/s voltage-to-time converter (VTC) analog front-end (AFE) with a 0.49 $\mu$W impulse-based galvanic uplink for a peripheral nerve interface. Multiple low-noise, high-data-rate channels are needed to sense compound action potentials and measure their conduction velocity as they propagate along a peripheral nerve. To achieve high energy efficiency under these constraints, the AFE encodes and transmits data as time-domain, charge-balanced impulses through an implantable galvanic link. Each channel consists of an integrator with charge-based sampling and amplification for rapid multiplexing. A shared VTC encodes the amplitude-domain outputs of the integrators into differential time-domain impulses. Since the timing can be synchronized with stimulation, this AFE achieves instant artifact recovery after rail-to-rail stimulation events. We designed this AFE in a 180 nm CMOS process, and simulation results show an SNDR of 60 dB and a noise of 3.7 $\mu$Vrms. Thanks to the new galvanic uplink protocol, this front-end consumes only 11.28 $\mu$W for four channels, including wireless data transmission.

#### 

Ardavan Javid, Rudra Biswas, Sheikh Jawad Ilham, Mehdi Kiani

The Pennsylvania State University, United States

**Abstract:** An ultrasound (US) phased array with electronic steering and focusing capability can enable high-resolution, large-scale US neuromodulation in basic research and clinical experiments. For such applications in different subjects (animals and humans), the phased array electronics should provide flexibility in generating waveforms with different patterns (stimulation parameters), fine delay resolution between channels, and high voltage across the US transducers (for high US pressure output) over an extended duration. This paper presents 16-channel high-voltage phased array electronics for neuromodulation, capable of driving US transducers with pulses of up to 150 V at 5 ns delay resolution while providing a wide range of sonication waveforms. The electronics have been integrated with a custom-made 2 MHz, 16-element US transducer array ($4.3 \times 11.7 \times 0.7$ mm<sup>3</sup>). In measurements, the phased array system achieved up to 6 MPa peak-to-peak US pressure output at a focal depth of 10 mm with a lateral/axial resolution of 0.6/4.67 mm.
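To illustrate what a 5 ns delay resolution means for focusing, the sketch below computes per-element transmit delays for a hypothetical 16-element linear array focused on its axis at 10 mm depth and rounds them to the 5 ns grid. The linear layout, the 0.7 mm pitch, and the 1540 m/s speed of sound (a common soft-tissue value) are assumptions; the abstract does not give the array's element arrangement.

```python
import math


def focus_delays(n_elem, pitch_m, focal_depth_m, c=1540.0, resolution_s=5e-9):
    """Per-element transmit delays (s) focusing a linear array on its axis.

    Elements sit at x = (i - (n_elem - 1) / 2) * pitch_m. The element farthest
    from the focus fires first (delay 0); nearer elements wait so that all
    wavefronts arrive together. Delays are rounded to the driver's grid.
    """
    xs = [(i - (n_elem - 1) / 2) * pitch_m for i in range(n_elem)]
    dists = [math.hypot(x, focal_depth_m) for x in xs]
    d_max = max(dists)
    return [round((d_max - d) / c / resolution_s) * resolution_s for d in dists]


delays = focus_delays(16, 0.7e-3, 10e-3)   # hypothetical 0.7 mm pitch
print([f"{t * 1e9:.0f} ns" for t in delays])
```

With these assumed values, the centre elements wait roughly 0.84 µs (about 167 delay steps) longer than the edge elements, so a 5 ns grid quantizes the focusing profile finely relative to the 500 ns period of a 2 MHz carrier.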

## Session B1L-F: Artificial Intelligence, Internet of Things & Systems 3

**Chair:** Shiuh-hua Chiang, *Brigham Young University* **Time:** Tuesday, August 13, 2024, 8:00 - 9:30 **Location:** Room 6


Marcus F. Nicolas, Dalila B. Megherbi

University of Massachusetts Lowell, United States

Abstract: As the need for edge intelligence, which combines edge computing with artificial intelligence, grows, efficient and accurate algorithms are increasingly needed. Edge computing allows AI applications to execute on edge devices/systems, closest to the hardware devices/systems, without traversing the cloud. The aim is to reduce latency and bandwidth when real-time information is needed, among other benefits. EfficientDet, a deep-learning (DL) object-detection-and-recognition model, was deployed on a Raspberry Pi 4. Notwithstanding the advantages of Edge AI, this paper presents a challenge not generally addressed in many such real-time detection and recognition DL models: although the DL model appeared invariant to scaling, its recognition of rotated objects was not similarly invariant. This paper highlights this challenge, focusing particularly on microprocessor deployment, and offers possible solutions in the hope that future researchers and engineers can design DL models for edge computing that greatly diminish the impact of the object-rotation handicap seen in this real-time object-detection and recognition case study.


Armin Gooran-Shoorakchaly, Sarah Safura Sharif, Yaser Mike Banad University of Oklahoma, United States

Abstract: Memristors, crucial for nonvolatile memory and neuromorphic computing, use resistive switching (RS) based on ion redistribution. Despite efforts to understand RS, especially its set/reset cycles, a comprehensive model explaining its dynamics from formation to cycling is lacking. We introduce a Ta$_2$O$_5$/TaO$_x$ device model that accurately predicts RS behaviors during formation, set, and reset cycles. Validated against DC and pulse measurements in 1R configurations, our model aligns closely with experimental data, proving its effectiveness in capturing RS dynamics. This breakthrough in memristor modeling represents a significant step forward, enabling improved device control and optimization.

#### Enhancing Neuromorphic Computing: A High-Speed, Low-Power Integrate-and-Fire Neuron Circuit Utilizing Nanoscale Side-Contacted Field Effect Diode Technology

Seyedmohamadjavad Motaman, Sarah Safura Sharif, Yaser Mike Banad University of Oklahoma, United States

Abstract: Addressing power efficiency and performance in neuromorphic computing systems is crucial, particularly in minimizing the power usage of neuronal circuits without compromising their ability to rapidly generate spikes. To this end, we introduce the Nano-scale Side-contacted Field Effect Diode (S-FED), a promising solution for reducing power consumption and improving circuit speed, thereby facilitating the implementation of neuron circuits. This paper presents a novel integrate-and-fire (IF) neuron model that harnesses the capabilities of the S-FED. Our model lowers power usage to approximately 44 nW and reduces the energy per spike to 0.964 fJ, setting a new benchmark below current state-of-the-art neuron circuits. Additionally, our model's spiking frequency reaches 20 MHz, surpassing those of the existing neuronal circuits. Finally, we compare the performance of the proposed IF neuron circuit against the forefront of neuron circuit design.

## Prediction of Remaining Useful Life and Cell Temperature for Li-Ion Batteries using TinyML ...... 562

Yuqin Weng<sup>1</sup>, Wenkai Guan<sup>2</sup>, Cristinel Ababei<sup>1</sup>

<sup>1</sup>Marquette University, United States; <sup>2</sup>University of Minnesota, United States

Abstract: In this paper, we develop new tiny machine learning (tinyML) temporal convolutional network (TCN) models for predicting the remaining useful life (RUL) and cell temperature of lithium-ion batteries. The proposed models are developed, trained, optimized, and verified in Python using TensorFlow. Extensive simulation experiments, using datasets from the Battery Archive website and from Sandia National Laboratories (SNL), show that the proposed models provide better results than previous models. Furthermore, the proposed models are converted to TensorFlow Lite for Microcontrollers models and deployed on IoT hardware, specifically the popular Arduino Nano 33 BLE Sense board. Hardware experiments show that the tinyML models are very efficient and provide satisfactory prediction accuracy. The proposed optimized tinyML models could therefore be readily deployed in practical scenarios, such as electric vehicles (EVs), to continuously monitor battery health and temperature in real time.

#### Real-Time Classification of Imperfect Gesture Data Utilizing Convolutional Neural Network for

Kourosh Rahnamai, Jacob Rollins, Luke Moisan, Ryan Kayfus, Neeraj Magotra

Western New England University, United States

Abstract: This paper deals with the development and application of an Artificial Intelligence (AI) algorithm for implementation in IoT edge devices, to make them smarter and more efficient at triaging input information before it is uploaded to the network for further analysis and use. It presents an image-classification Convolutional Neural Network (CNN) geared toward processing efficiency, for eventual implementation in the new generation of powerful, compact, energy-efficient embedded systems suited to IoT edge applications. The algorithm was trained to identify skeleton images in different poses captured with an Xbox 360 Kinect. Various techniques were employed to train and test the CNN, including different datasets and different configurations of the network architecture. The results provide valuable insights into the potential of CNNs for image classification and gesture identification, and highlight the importance of careful design and optimization of the network architecture to achieve performance suitable for IoT edge implementations in medical, surveillance, and other applications.

## Session B2L-A: Testing & Calibration

Chair: Tejasvi Das, Rochester Institute of Technology Co-Chair: Sungho Kim, The University of Rhode Island Time: Tuesday, August 13, 2024, 11:00 - 12:12 Location: Room 1

Isaac Bruce, Emmanuel Nti Darko, Ekaniyere Oko Odion, Matthew Crabb, Degang Chen Iowa State University, United States

Abstract: This paper presents a novel 3-segment interpolating resistor-string ladder DAC with sub-radix/redundancy characteristics in the DAC transfer curve. A simple all-digital calibration algorithm is presented that trades off resolution for linearity. The design significantly reduces the area overhead compared with the conventional resistor-string DAC. A 15-bit version of the DAC using polysilicon resistors is designed in the TSMC 180nm process, and the calibration algorithm is implemented in MATLAB to obtain a 13-bit linear DAC with a worst-case INL of 0.78 LSB across 100 Monte Carlo iterations, with resistor matching at only the 5-bit level.

#### A Trim-Free PVT Invariant Current Reference with 0.48% Process Inaccuracy using V<sub>th</sub> Tracking Approach ...... 576

V.S.S. Pradhith<sup>1</sup>, Ani
Chetan Mittal<sup>1</sup>, Khanh M. Le<sup>2</sup>, Zia Abbas<sup>1,2</sup>

<sup>1</sup>International Institute of Information Technology Hyderabad, India; <sup>2</sup>Analog Intelligent Design Inc., United States

Abstract: MOSFETs are prevalent in diverse electronic applications. However, the sensitivity of the MOSFET threshold voltage to process variation poses a substantial challenge to achieving consistent device performance. This paper proposes a novel approach to this challenge that employs a threshold voltage (Vth) tracking circuit to track the variation in Vth. Our primary objective is to develop a current reference that remains invariant across process, voltage, and temperature (PVT) variations, eliminating the need for an external trimming circuit. This is achieved by carefully integrating a Vth tracking circuit, a CTAT generator (a beta-multiplier with an NMOS load), a beta-multiplier, and a current adder. The design achieves a process variation ($\sigma/\mu$) of 0.43%, a temperature coefficient (Tc) of 210 ppm/°C over a temperature range of -40°C to 120°C, and a line sensitivity of 1.6% over a voltage range of 1.6 V – 3 V.

## A 3.04-fs FOM Hybrid LDO Regulator with Fast Transient Algorithm and Wide Load Range ...... 581

Pierre Leduc<sup>1</sup>, Ximing Fu<sup>2</sup>, Yushi Zhou<sup>1</sup>

<sup>1</sup>Lakehead University, Canada; <sup>2</sup>Dalhousie University, Canada

**Abstract:** This paper presents a novel approach to a fast-response, low-clock-frequency hybrid LDO. The proposed design uses charge distribution to estimate the power-transistor size required to reach the correct output voltage. This method significantly reduces the number of clock cycles needed to reach steady state. In conjunction with the proposed digital algorithm, an analog error amplifier is added to the system to achieve zero-current capability as well as improved voltage regulation at low load currents. Alongside the analog-assisted loop, an inverter-based droop detector is implemented to reduce undershoot and further increase conversion speed. The proposed LDO is designed in a TSMC 180-nm 1.8 V standard CMOS process and analyzed using Spectre with BSIM3 device models. Post-layout simulation confirms that, with a 10 MHz clock, the LDO can provide up to 120 mA at a stable 1.2 V output. A transient response of 1.17 $\mu$s for a full-range load step, with a maximum droop voltage of 253 mV, is obtained.

<sup>1</sup>International Institute of Information Technology Hyderabad, India; <sup>2</sup>Analog Intelligent Design Inc., United States

**Abstract:** The Voltage-Controlled Oscillator (VCO) is a crucial component of a Phase-Locked Loop (PLL) system. At a fixed control voltage, the frequency of a ring-oscillator-based VCO varies by a factor of about 2–3 across process, voltage, and temperature (PVT) conditions. The gain of a ring VCO at a specified target frequency is also a crucial parameter that is significantly affected by PVT variations, thereby impacting the stability and loop bandwidth of the PLL. This paper describes a compensation technique that minimizes variations in ring-VCO gain at the specified target frequency across PVT. A charge-pump PLL operating at 4 GHz is implemented in a 28nm CMOS process. The gain of the ring VCO varies by $\pm 27\%$ across PVT at 4 GHz, which leads to deviations of $\pm 60\%$ in loop bandwidth and $\pm 21\%$ in phase margin (PM). Process trimming reduces the VCO gain variation to $\pm 25\%$. Subsequently, adding a static current with a CTAT characteristic provides temperature compensation over a range of $-40^{\circ}$C to $125^{\circ}$C. The variations in VCO gain, loop bandwidth, and PM are reduced to $\pm 8\%$, $\pm 34\%$, and $\pm 15\%$, respectively.

## Session B2L-B: Neural Signal Acquisition & Processing

**Chair:** Satish Nair, *University of Missouri* **Time:** Tuesday, August 13, 2024, 11:00 - 12:12 **Location:** Room 2

#### Mitigating Effects of Packet Loss in Wireless Neural Data Transfer: Exploring the Influence of Low-Cost Recovery Methods on Spikes and HFOs in iEEG ...... 591

Amir Hossein Ayyoubi<sup>1,2</sup>, Behrang Fazli Besheli<sup>1</sup>, Chandra Prakash Swamy<sup>1</sup>, Jhan L. Okkabaz<sup>1,2</sup>, Michael M. Quach<sup>3</sup>, Daniel J. Curry<sup>3</sup>, Nuri F. Ince<sup>1,2</sup>

<sup>1</sup>Mayo Clinic, United States; <sup>2</sup>University of Minnesota, United States; <sup>3</sup>Texas Children's Hospital, United States

**Abstract:** Wireless transmission of neural data carries a risk of packet loss (PL), which can compromise signal quality or, in extreme cases, cause complete data loss. Handling lost packets is essential to ensure data integrity and preserve vital neural patterns. This study investigates the effect of PL on epilepsy biomarkers, specifically interictal epileptiform spikes and high-frequency oscillations (HFOs), and evaluates the performance of low-computational-cost interpolation methods. We observed that at 5% PL, 95% of spikes and 81% of HFOs could be recovered with linear interpolation, compared with 97% and 86%, respectively, with spline interpolation. Linear interpolation can recover neural events and reduce the noise floor at modest packet loss levels of up to 5% at a lower computational cost. For higher levels of PL, however, more intricate methods such as spline interpolation become necessary.
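As a minimal illustration of the linear-interpolation recovery evaluated above, the sketch below simulates packet loss on a sampled signal and fills each lost sample from its nearest received neighbours. The function names, the use of `None` to mark lost samples, and the hold-nearest-value behaviour at the record edges are assumptions made for illustration; this is not the authors' implementation.

```python
import bisect
from typing import List, Optional, Sequence


def drop_packets(signal: Sequence[float], packet_size: int,
                 lost: Sequence[int]) -> List[Optional[float]]:
    """Hypothetical helper: simulate packet loss by marking every sample
    of each lost packet (indexed from 0) as None."""
    lost_set = set(lost)
    return [None if (i // packet_size) in lost_set else float(v)
            for i, v in enumerate(signal)]


def fill_linear(samples: Sequence[Optional[float]]) -> List[float]:
    """Recover lost samples (None) by linear interpolation between the
    nearest received samples; gaps at either end hold the nearest value."""
    received = [i for i, s in enumerate(samples) if s is not None]
    if not received:
        raise ValueError("no received samples to interpolate from")
    out: List[float] = []
    for i, s in enumerate(samples):
        if s is not None:
            out.append(float(s))
            continue
        # i is not in `received`, so j points at the first received sample after i
        j = bisect.bisect_left(received, i)
        if j == 0:                      # loss at the start of the record
            out.append(float(samples[received[0]]))
        elif j == len(received):        # loss at the end of the record
            out.append(float(samples[received[-1]]))
        else:
            lo, hi = received[j - 1], received[j]
            t = (i - lo) / (hi - lo)
            out.append(samples[lo] + t * (samples[hi] - samples[lo]))
    return out


# Usage: a ramp with packet 1 (samples 2 and 3) lost is recovered,
# since linear interpolation is exact for a linear signal.
ramp = [float(i) for i in range(8)]
recovered = fill_linear(drop_packets(ramp, packet_size=2, lost=[1]))
```

Linear interpolation needs only the two samples bracketing each gap, which is why its cost stays low. A spline variant could instead fit, for example, `scipy.interpolate.CubicSpline` to the received samples at a higher computational cost.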

## Methodology and Toolkit for Spectral Analysis of Neural Signals ...... 596

Ziao Chen, K.C. Ho, Satish S. Nair

University of Missouri, United States

Abstract: We developed a toolkit that implements algorithms to estimate the oscillatory bursts embedded in in vivo neural signals. The in vivo signals are decomposed into two components: the $1/f^{\beta}$ background activity and the oscillatory bursts that ride on it. The bursts themselves can vary in amplitude, frequency, and number of cycles, which makes them challenging to characterize. The first algorithm determines the statistical characteristics of the two components. The second synthesizes each component individually using the estimated characteristics. The third finds the best combination of the two components using the Kullback-Leibler divergence criterion, generating an 'in silico' version of the in vivo local field potential (LFP) whose probability density function matches that of the in vivo signal. The methodology and algorithms are illustrated using a public-domain in vivo dataset. The software tool with the algorithms is available for public download.
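The third step, choosing the combination whose amplitude distribution best matches the recording under the Kullback-Leibler criterion, can be sketched roughly as follows. The fixed histogram bins, the small `eps` smoothing of empty bins, the direction $D_{KL}(\text{in vivo} \,\|\, \text{in silico})$, and all function names are illustrative assumptions; the toolkit's actual implementation may differ.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def histogram_pmf(x: Sequence[float], edges: Sequence[float],
                  eps: float = 1e-9) -> List[float]:
    """Normalized histogram of x over fixed bin edges. An assumed eps is
    added to every bin so empty bins do not make the divergence infinite."""
    n_bins = len(edges) - 1
    counts = [0.0] * n_bins
    for v in x:
        if v == edges[-1]:
            k = n_bins - 1              # include the right edge in the last bin
        else:
            k = bisect.bisect_right(edges, v) - 1
        if 0 <= k < n_bins:             # values outside the edges are ignored
            counts[k] += 1.0
    total = sum(counts) + eps * n_bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def best_match(in_vivo: Sequence[float],
               candidates: Sequence[Sequence[float]],
               edges: Sequence[float]) -> Tuple[int, List[float]]:
    """Hypothetical selection step: index of the candidate (synthetic)
    signal whose amplitude PMF is closest to the recording, plus all scores."""
    p = histogram_pmf(in_vivo, edges)
    scores = [kl_divergence(p, histogram_pmf(c, edges)) for c in candidates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

KL divergence is asymmetric, so the chosen direction matters. Here the recording is treated as the reference distribution, which penalizes synthetic signals that assign low probability to amplitudes actually observed in vivo.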


Tasnim Zaman Adry<sup>1</sup>, Steven D. Gardner<sup>2</sup>, Sazia Afreen Eliza<sup>2</sup>, Mohammad Rafiqul Haider<sup>1</sup> <sup>1</sup>University of Missouri, United States; <sup>2</sup>University of Alabama at Birmingham, United States

Abstract: The exploration of neuromorphic computing, driven by the demand for efficient data-intensive operations, relies on memristive devices for versatile information storage. In this study, a low-cost, nonlinear, current-controlled memristor made from inkjet-printed hexagonal boron nitride (hBN) and graphene is proposed for neuromorphic computing applications. The memristor, fabricated with an Epson XP-960 piezoelectric printer, demonstrates the feasibility of inkjet printing for flexible and cost-effective computing solutions. The research investigates the use of nanoparticle-based inks and explores the potential of inkjet-printed circuits as an eco-friendly and power-efficient alternative. An empirical model is developed using MATLAB cftool to analyze the behavior of the inkjet-printed memristor, with insights complemented by a physical Simscape model. This work contributes to the advancement of flexible, low-cost memristor devices, facilitating their integration into neuromorphic computing architectures and other applications that require versatile information storage.


Michael Benusa, Cheol-Hong Min University of St. Thomas, United States

Abstract: The proposed system measures driver attentiveness using the developed wearable device, with electrodes placed on the driver's temples. Using EEG signals may increase the accuracy of attentiveness detection compared with standalone camera-based systems. Preprocessing was used to remove noise from the measured signals. Features were extracted from the signals, and an optimal feature set was determined by analyzing each feature's effectiveness. A support-vector-based machine-learning algorithm was used to classify the driver's states. To gather data for each state, experiments were conducted on an indoor driving simulator with four subjects. The proposed model's classification accuracy was 84.4% under five-fold cross-validation. In a subject-independent test, the accuracy was 84.2% for the four states (city attentive, city distracted, highway attentive, and highway distracted). When a third classifier category was added to distinguish drowsiness from inattentiveness, the model's accuracy dropped to 72.6%. In this preliminary study, abnormal driver conditions were detected and classified using the proposed system.

## Session B2L-C: Power & Sensory Circuits & Systems 2

Chair: Keith Corzine, UC Santa Cruz Time: Tuesday, August 13, 2024, 11:00 - 12:12 Location: Room 3

Fanyang Li, Wenren Deng, Kaiji Liu

<sup>1</sup>Fuzhou University, China; <sup>2</sup>Tsinghua University, China

Abstract: To address the power-supply challenges of high-performance microprocessors, a highly digitized switching LDO structure is proposed that enables LDO operation under low-voltage, high-current load conditions. Ripple is a significant factor limiting the development of switching LDOs. Therefore, the proposed LDO incorporates a multiphase pulse-width modulation (MPWM) circuit to keep the ripple within 1 mV and to achieve high-precision regulation and a tunable active voltage positioning (AVP) function. Additionally, an analog auxiliary loop is designed to further improve the transient response of the LDO and extend its output load range. Fabricated in 55-nm CMOS, the proposed switching LDO can operate at 0.7–1.2 V and supports a maximum load of 1.2 A. Simulation results show a maximum overshoot of 61 mV for a 0.6 A load-current step with a 40-ns edge in high-precision mode. The measured load regulation is 0.4 mV/A, and the maximum output voltage ripple is 0.87 mV.

#### A Multi-Step Charge Transfer Interface Circuit Extracting Maximum Input Power from

Raghav Bansal, Shouri Chatterjee

Indian Institute of Technology Delhi, India

Abstract: We present a power-efficient interface circuit that extracts the maximum possible input power from a piezoelectric energy harvester using a multi-step charge transfer (MSCT) technique. In the proposed approach, MSCT pre-charges the piezoelectric capacitance, flips the piezoelectric output voltage, and extracts the stored charge in multiple stages. It improves the end-to-end power efficiency of the rectifier by reducing the conduction loss of the power converter stage while extracting the maximum possible energy stored on the intrinsic piezoelectric capacitance for a given CMOS process. The proposed rectifier has been designed and simulated in a 180 nm CMOS node. For the same input power, it extracts 14.39 $\mu$W of output power in one step and 16.34 $\mu$W in four steps. The MSCT achieves a maximum power efficiency of 68.5% with a 47 $\mu$H inductor and 83% with a 220 $\mu$H inductor. Simulation results show that four-step MSCT improves power efficiency by 13.85% compared with single-step charge transfer.

#### A Multi-Objective Optimization Problem for Sizing and Siting of Power System with the

Saeed Aliamooei-Lakeh<sup>1</sup>, Keith Corzine<sup>1</sup>, Leila Parsa<sup>1</sup>, Ricardo de Castro<sup>2</sup>

<sup>1</sup>University of California, Santa Cruz, United States; <sup>2</sup>University of California, Merced, United States

Abstract: This paper addresses a multi-objective optimization problem for sizing and siting power systems that integrate electric vehicles into a distribution system. Vehicle-to-grid contribution, charging stations, renewable energy sources in the form of photovoltaics and wind turbines, and energy storage systems are considered in the proposed approach. A mixed-integer linear programming model is presented for this problem that minimizes the total cost of infrastructure development and power generation while accounting for operational efficiency and sustainability. A strategic placement and sizing of this infrastructure, which can improve grid resiliency and optimally host a high penetration of electric vehicles while integrating renewable energy sources, is also proposed. The suitability of the proposed model is demonstrated through a case study on an IEEE 24-bus distribution power system, showing improvements in cost-effectiveness, energy efficiency, and renewable energy utilization within the power grid.

## Optimal FPGA Implementation of Dense Extended Kalman Filter for Simultaneous Cell State Estimation ...... 623

Luke Nuculaj, Adam Kidwell, Connor Homayouni, Alex Fillmore, Darrin Hanna, Jun Chen

Oakland University, United States

Abstract: This work considers hybrid embedded implementations of the dense extended Kalman filter (DEKF), a novel state estimation technique that reduces the traditional Kalman filter's time complexity from $O(N^3)$ to $O(N)$, to estimate the state of charge (SOC) of 16 series-connected lithium-ion battery cells. Design space exploration (DSE) is used to recommend an optimal hybrid binding, with execution time and energy as the primary objectives. High-level synthesis (HLS) tools are used to inform the DSE and to aid implementation. The performance of each implementation is measured and compared against our MATLAB model of the DEKF and our DSE. Our results show precision and accuracy close to the MATLAB model, with an RMS error near 0.2% and execution times as low as 52 $\mu$s.

## Session B2L-D: Phase Locked Loops

Chair: Mengting Yan, *Analog Devices* Time: Tuesday, August 13, 2024, 11:00 - 12:12 Location: Room 4

**Abstract:** In millimeter-wave communications, one way to mitigate path loss is to use beam steering and beamforming techniques, which improve the gain of the antenna system. Adaptive beamformers, which can steer the antenna radiation pattern toward the position of a given target in real time, are particularly interesting. In this work, we propose a novel architecture for adaptive beam steering based on real-time Angle-of-Arrival (AoA) estimation. The phase shift is introduced in the Local Oscillator (LO) path by means of a modified DDS-PLL structure. The main equations are presented, along with a validation campaign involving Hardware-in-the-Loop (HiL) simulations.

**Abstract:** This paper presents a low-power, low-phase-noise voltage-controlled oscillator (VCO) at X-band frequencies, fabricated in IHP's 130nm SiGe BiCMOS process ($f_T/f_{max}$ of 250/340 GHz). The VCO core is based on a cross-coupled (CC) pair. The phase noise performance of the VCO is improved by a passive LC filter in the tail. The VCO employs a varactor linearization technique to increase the tuning range (TR), and the TR is further widened by a thick-gate-oxide MOS varactor and switched capacitor banks. The VCO core is isolated by cascode buffers, which also provide adequate driving capability. The design achieves a phase noise of -119.25 dBc/Hz at 1 MHz offset at the center frequency and a TR of 15.88%. Including the output balun and pads, the circuit occupies a die area of 2.08 mm$^2$ and consumes 28.7 mW.

**Abstract:** This paper presents a type-II charge-pump PLL operating at 25.6 GHz in 0.13 µm SiGe BiCMOS technology for distributed beamforming applications. The PLL includes a differential Colpitts VCO that generates a highly reliable phase-locked signal. It provides a locking range of 3.5 GHz and a loop bandwidth of 1 MHz with a phase margin of 77 degrees. At a 1 MHz offset, the PLL exhibits a measured phase noise of -94 dBc/Hz.

#### A Fractional Spur Cancellation Technique for Fractional-N Frequency

*Texas A&M University, United States* 

**Abstract:** A fractional spur cancellation technique for fractional-N frequency synthesizers is presented in this paper. A quantitative time-domain analysis is performed to obtain an intuitive understanding of the origin of fractional spurs. Using a dual-loop architecture that generates two feedback phases, one leading and one lagging the reference phase, the proposed technique clamps the reference phase midway between the two feedback phases and cancels the spurs by combining the complementary outputs of the two charge pumps. Simulation results show that the proposed technique improves the fractional spur performance of a fractional-N frequency synthesizer by 49.3 dB.

## Session B2L-E: Neural Interfacing & Systems Modelling

Chair: Negar Reiskarimian, *Massachusetts Institute of Technology* Co-Chair: Mahmoud Ibrahim, *MediaTek* Time: Tuesday, August 13, 2024, 11:00 - 12:12 Location: Room 5

Hunter Bushnell, Isabel Banks, Vinay Guntu, Gregory Glickert, Satish S. Nair University of Missouri, United States

**Abstract:** We report a biophysical network model of the mammalian lower urinary tract, including the neural circuit that controls its closed-loop function. The model reproduced the firing rates and bladder dynamics during a normal fill-and-void cycle. We then adapted the model to reproduce the changes and effects of spinal cord injury, which causes severe dysfunction in micturition and its control.

**Abstract:** Advanced neural implants demand a 10x increase in input channels for diverse neuroscience applications, which requires careful top-level design decisions to attain optimal noise-power-area efficiency. This paper underscores the pivotal role of analog front-ends (AFEs) and the influence of input-referred noise (IRN) on the energy efficiency and area of intelligent, high-channel-count neural implants. We examine IRN in detail, highlighting its impact on signal-to-noise ratio (SNR) and power/area performance. Our study provides insights into selecting the optimal IRN and introduces four operating regions that indicate how to target the IRN to achieve at least a 7x reduction in AFE power and area, offering a pathway toward energy-efficient and precise neural implants.

<sup>1</sup>Carleton University, Canada; <sup>2</sup>University of Windsor, Canada

Abstract: Neural Assembly Computing (NAC) is a framework that employs spiking neural networks (SNNs) to model the computational processes of biological neural cell assemblies. In this paper, we propose a method for designing state machines based on NAC that can generate complex and intelligent behaviors. The resulting straightforward algorithm is designed to discern the sequence of three distinct geometric shapes (circle, triangle, square, and star) using four states aligned with the activity of two neural assemblies. Based on the simulation results, we conclude that NAC can offer a biologically plausible and effective approach to deploying state machines and algorithms.


Rafael Maria, Bruno Sanches, Wilhelmus van Noije

University of São Paulo, Brazil

**Abstract:** This work proposes the design of an area-optimized, low-group-delay (GD) ultra-wideband (UWB) power amplifier (PA) in 65 nm TSMC CMOS technology. The PA employs a resistive shunt-shunt feedback amplifier topology to provide a broadband characteristic. This topology also gives the amplifier good input matching, low group delay, low power, and high linearity. Measurement results show that the proposed PA has an average gain of 6.14 dB over 4–10.6 GHz. The PA achieves excellent phase linearity (i.e., group delay variation) of $\pm 21$ ps, consumes only 20 mW from a 1.6 V supply, and occupies only 0.06 mm$^2$. An output 1-dB compression point (OP1dB) of 4.25 dBm was obtained. With this method, the proposed design has the smallest area and the lowest group delay variation and power consumption among recently reported UWB CMOS PAs.

## Session B2L-F: Smart, Secure IoT

**Chair:** Arnab Purkayastha, *Western New England University* **Time:** Tuesday, August 13, 2024, 11:00 - 12:12 **Location:** Room 6

Suraj Kumar Sah<sup>1</sup>, Love Kumar Sah<sup>2</sup>

<sup>1</sup>Kathmandu University, Nepal; <sup>2</sup>Western New England University, United States

Abstract: In this work, we instrumented runtime instructions to extract real-time memory address information for a program's variables and created a table to use as a tool for enforcing back-edge CFI. Variables declared in a function are an integral part of a program, and this information can be used to verify whether control returns to the same function after it finishes its job, indicating whether control is on a legitimate path. We introduce an adaptive pipeline architecture that modifies the decode and execute stages of the Sim-Outorder simulator in the SimpleScalar toolset. We validate the technique with six selected MiBench benchmark programs. The proposed technique requires 0% additional instructions and detects path bending in all test cases. The memory space requirement is up to 13 KB to maintain the VRT for a program with 324 variables, incurring a 1.98% additional area overhead and a power consumption of 11.65 $\mu$W.

Fernando Garzon<sup>2</sup>, Donna Koechner<sup>1</sup>, Tony Quinones<sup>3</sup>

<sup>1</sup>SensorComm Technologies, Inc., United States; <sup>2</sup>University of New Mexico, United States; <sup>3</sup>University of New Mexico, United States; <sup>4</sup>Bright Path Laboratories, Inc., United States

**Abstract:** The global energy transition is forcing a move away from fossil fuels. There is a significant need for low-cost monitoring solutions that provide early warning of trigger events. This paper provides examples of early warning systems that have been developed and presents results from the AI-powered, IoT-driven system built on the multi-gas emission sensor platform.

Robert L. Brennan, Taylor Lee ON Semiconductor, Canada

Abstract: Computing on local data remotely creates a bottleneck at the cloud, resulting in long latency, so decisions may arrive too late to determine the best course of action. Furthermore, remote servers must share their resources according to a strategy that may not benefit the critical task being controlled. An untimely breakdown in computation may make critical computations difficult, or completely unavailable when out of range, highlighting the need to provide computing intelligence and decision making at the edge. Edge processing has recently been proposed and may be the only reasonable answer. With sufficient computing capability, it can quickly provide decisions from local data, bypassing the latency of a cloud connection. Even in the larger context where cloud computing is required, local computation preprocesses the data, resulting in better utilization of the edge-cloud transmission link. To illustrate this type of capability, an asset-tracking demonstration with real hardware was built at ON Semiconductor.

## Session B3L-A: Select Topics in Analog Testing

Chair: Mengting Yan, *Analog Devices* Co-Chair: Jian Shao, *Infineon Technologies* Time: Tuesday, August 13, 2024, 14:00 - 15:30 Location: Room 1

Ke Huang<sup>1</sup>, Xinqiao Zhang<sup>1,2</sup>, Farinaz Koushanfar<sup>2</sup>

<sup>1</sup>San Diego State University, United States; <sup>2</sup>University of California, San Diego, United States

Abstract: The increasing complexity and miniaturization of semiconductor devices, driven by the forces of globalization, have introduced new vulnerabilities to malicious attacks on analog integrated circuits (ICs). In this paper, we explore a new type of malicious attack on analog ICs, namely Aging Trojans (ATs), which exploit the inherent physical degradation mechanisms of analog ICs and pose a significant threat to their reliability, security, and integrity. We investigate the underlying mechanisms of ATs, detection methods, and mitigation strategies, and we study the physical degradation phenomena, such as Negative Bias Temperature Instability (NBTI), that make analog circuits susceptible to accelerated aging. We discuss unsupervised machine learning techniques for detecting ATs. By analyzing the complexities of ATs in analog ICs and offering insights into effective detection strategies, this paper aims to advance understanding and awareness of this emerging threat. Experimental results show the effectiveness of the proposed machine learning scheme for detecting analog ATs.

Daniel Adjei, Bryce Gadogbe, Degang Chen, Randall Geiger

Iowa State University, United States

Abstract: This paper presents a single-temperature trimming technique for temperature-sensitive circuits, with special emphasis on bandgap voltage references. We first present an approach for estimating the temperature profile of the reference output, $V_{REF}(T)$, using information obtained from single-temperature measurements of multiple node voltages in the circuit-under-test (CUT). Based on this estimation technique, we further propose two single-temperature trimming techniques. We validate the proposed techniques on a typical Kuijk bandgap voltage reference in a standard 180nm CMOS process. Simulation results show that the estimation technique accurately predicts the thermal profile of $V_{REF}$ from room-temperature node voltage measurements alone, with a maximum estimation error of 200ppm over a $165^{\circ}\mathrm{C}$ temperature range. With this estimation performance, the proposed trim techniques yield post-trim temperature coefficients comparable to those obtained by costly traditional two-temperature trim methods.

### Testing and Tuning of RRAM-Based DNNs: A Machine Learning-Assisted Approach ...... 688

Kwondo Ma, Anurup Saha, Suhasini Komarraju, Chandramouli Amarnath, Abhijit Chatterjee

Georgia Institute of Technology, United States

Abstract: Resistive random access memory (RRAM) is highly attractive for use in deep neural networks (DNNs) due to its ability to perform matrix-vector multiplications with ultra-low power consumption, which is essential for implementing DNNs. However, testing and tuning RRAM-based DNNs is challenging due to the vulnerability of RRAM crossbars to manufacturing process variations. This difficulty is compounded by the complexity of modern DNNs and the large number of test stimuli (e.g., images) required to assess performance, such as classification accuracy. Post-manufacture tuning is further complicated by the need to test the DNN through each tuning iteration to ensure convergence to acceptable performance levels. This research proposes a machine learning-assisted alternative testing and tuning framework for DNNs that: (a) allows DNNs to be tested with a limited number of test stimuli (few test images) and (b) enables post-manufacture tuning of complex DNNs to be performed in tens of seconds, as opposed to hours or more. The core methodology, along with associated algorithms and test cases, is described.

Nikhil Sagar Modala<sup>1</sup>, Lakshmanan Balasubramanian<sup>2</sup>, Rubin Parekhji<sup>2</sup>, Sule Ozev<sup>1</sup>

<sup>1</sup>Arizona State University, United States; <sup>2</sup>Texas Instruments Inc., India

**Abstract:** Due to increasing defect rates and the increasing complexity of mixed-signal circuits, evaluating the fault coverage of a given input, known as fault simulation, has become essential. Existing fault simulation tools inject faults and analyze the fault responses at a fine level of detail. However, extending fault simulation to large circuits can be difficult due to the nature of the simulations, especially when digital signals and/or frequency translation are involved. In this paper, we aim to further increase fault simulation efficiency by dropping faults from simulation based on functional equivalence at the sub-block level. We use a fault clustering approach based on the parameters of the fault response model and simulate only one representative fault for each cluster. Experimental results show that hierarchical fault simulation time can be reduced by a factor of 8-10 without sacrificing fault simulation accuracy.

### On Adaptive Feedback Active Noise Control Systems with Online Adaptation of

Nazarbayev University, Kazakhstan

Abstract: This paper focuses on identifying the secondary path (SP) in feedback active noise control (FBANC) systems during online operation. In FBANC, it is not feasible to install a reference microphone to capture the reference signal needed by the (adaptive) active noise control (ANC) filter. Consequently, an appropriate reference signal must be generated electronically, which requires an accurate estimate of the SP. The accuracy of SP estimation (SPE) directly affects the stability and noise reduction capability of ANC systems. Building on prior work that used two adaptive filters, one for ANC and another for SPE, this paper presents a method with a similar structure. The key idea is to tailor a generalized sigmoid function to develop a variable step-size (VSS) adaptive algorithm for the (adaptive) ANC filter. The proposed method achieves improved noise reduction as well as good SPE performance, as evidenced by the detailed numerical results in the paper.

## Session B3L-B: Image Processing

**Chair:** Giovanni Schembra, *University of Catania* **Time:** Tuesday, August 13, 2024, 14:00 - 15:30 **Location:** Room 2

<sup>1</sup>Shanghai Jiao Tong University, China; <sup>2</sup>Yiwu Zhiyuan Research Center of Electronic Technology, China

Abstract: In synthetic aperture radar (SAR) imaging, inaccurate imaging parameters are a common issue that degrades image quality. This paper investigates SAR image quality assessment under inaccurate imaging parameters. We analyze the effectiveness of several existing image quality assessment methods and propose a new one, the Weighted Minimum Deviation Evaluation Function (WMDEF). WMDEF is defined as a weighted sum of four image quality evaluation functions, with the weights obtained by particle swarm optimization. Experimental results demonstrate that, under these conditions, WMDEF outperforms several other image quality assessment methods.

### Improving Classification Accuracy in Paint Surface Inspection using Image Processing and Deep Learning ...... 707

Motoki Sawada, Yuta Fuji, Naoyuki Aikawa

Tokyo University of Science, Japan

**Abstract:** In recent years, research has been conducted on defect detection and classification using image processing and deep learning as alternatives to visual inspection. This paper proposes a method to detect defects by applying embossing and histogram equalization to images of painted surfaces and classify defects using ResNet50. The proposed method is capable of simultaneously classifying multiple defects in an image. Furthermore, we show that the proposed method can improve the accuracy of defect extraction compared to the conventional method.
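
The two preprocessing steps named in this abstract, embossing and histogram equalization, can be illustrated with their textbook forms. This is a minimal pure-Python sketch, not the authors' implementation: the 3×3 emboss kernel, the replicated borders, the order (emboss, then equalize), and the function names `emboss`, `equalize`, and `preprocess` are all assumptions, and the ResNet50 classification stage is omitted.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale image, row-major

# A common 3x3 emboss kernel; this choice is an assumption, not taken from the paper.
EMBOSS_KERNEL = [[-2, -1, 0],
                 [-1,  1, 1],
                 [ 0,  1, 2]]


def emboss(img: Image) -> Image:
    """Apply EMBOSS_KERNEL as a 3x3 correlation with replicated borders, clipped to [0, 255]."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate edge rows
                    xx = min(max(x + dx, 0), w - 1)  # replicate edge columns
                    acc += EMBOSS_KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = min(max(acc, 0), 255)
    return out


def equalize(img: Image) -> Image:
    """Classic global histogram equalization via the cumulative histogram (CDF)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    n = running
    cdf_min = next(c for c in cdf if c > 0)  # CDF value at the darkest occurring level
    if n == cdf_min:  # a single gray level: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) * 255 / (n - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in img]


def preprocess(img: Image) -> Image:
    """Emboss first, then equalize (the order is an assumption)."""
    return equalize(emboss(img))
```

For example, `equalize([[50, 50], [100, 150]])` spreads the three occurring levels across the full 8-bit range, mapping them to 0, 128, and 255. Because the emboss kernel's coefficients sum to 1, flat regions keep their gray level and only edges and texture are emphasized before equalization.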

Andrea Caruso, Giovanni Schembra

University of Catania, Italy

Abstract: Nowadays, Virtual Reality (VR) is becoming a crucial technology in various sectors and has emerged as one of the fastest-growing market segments in recent years. Moreover, continuous advancements in networking and multimedia technologies are increasing interest in several VR applications. Given that 360° video transmission is bandwidth hungry, video compression plays a key role in deploying applications based on it. Since users see only part of the whole 360° image stream, namely the portion they are focusing on, while the part behind them is not seen, a widely adopted way to compress this kind of stream is differentiated compression, with higher quality for the image areas that most influence the perceived quality. However, to the best of our knowledge, perceived quality has never been correlated with the degree of interaction between the user and the viewed scene, or with network delay. To this end, the main objective of this paper is to propose a 360° video streaming framework with differentiated encoding and to analyze the impact of the user's movements while interacting with the viewed scene under different network delays.

### Hardware Implementation of an α-Trimmed Adaptive Median Mean Filter ...... 719

Nathan Clarks, Matthew Schweinfuss, Enass Hriba, Firas Hassan

Ohio Northern University, United States

Abstract: In this paper, we describe a hardware implementation of an α-trimmed adaptive median mean filter capable of restoring images affected by both Gaussian and impulse noise without blurring the reconstructed image. The algorithm was implemented on an FPGA device and achieved real-time processing at 50 frames/second for 1024×768-pixel frames while utilizing 3363 LUTs, 392 FFs, and 1 DSP block.
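
As a software reference point for the filter family named above, the textbook α-trimmed mean filter can be sketched as follows. This is an assumption-laden sketch, not the authors' adaptive median-mean design or their FPGA architecture: here `alpha` is the fraction of sorted samples discarded from each end of the window (α = 0 gives the plain mean, α near 0.5 the median), borders are replicated, and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale image, row-major


def alpha_trimmed_mean(window: List[int], alpha: float) -> int:
    """Sort the window, drop int(alpha * n) samples from each end, and average the rest."""
    s = sorted(window)
    n = len(s)
    t = int(alpha * n)  # number of samples trimmed from each end
    if 2 * t >= n:  # nothing left to average: fall back to the median
        return s[n // 2]
    kept = s[t:n - t]
    return round(sum(kept) / len(kept))


def alpha_trimmed_filter(img: Image, alpha: float = 0.2, radius: int = 1) -> Image:
    """Apply the alpha-trimmed mean over a (2*radius+1)^2 window with replicated borders."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            out[y][x] = alpha_trimmed_mean(window, alpha)
    return out
```

With `alpha=0.2` and a 3×3 window, one sample is trimmed from each end, so a single salt pixel (255) in a flat region of 100s is discarded while the remaining seven samples are still averaged. That mix of median-like outlier rejection and mean-like smoothing is the usual motivation for using this filter family on images corrupted by both impulse and Gaussian noise.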

Concordia University, Canada

Abstract: This paper introduces the concept of combining wavelet scattering with attention mechanisms in ResNet for improved image feature extraction. The proposed architecture integrates a translation-invariant scattering module with an attention-based residual neural network, which is well suited to supervised learning tasks. The framework uses scattering priors to improve network stability against deformations while preserving high-frequency detail for effective object classification. Attention Residual Learning blocks within the network merge attention and residual learning to create interpretable and discriminative representations. Experimental results on subsets of the CIFAR10 dataset validate the effectiveness of the proposed work, especially in handling small datasets. This paper highlights the potential of the proposed network for advancing image classification, particularly in limited-data scenarios.

## Session B3L-C: Advancements in Power & Energy

Chair: Michael Sherif, *NPSU* Time: Tuesday, August 13, 2024, 14:00 - 15:30 Location: Room 3

<sup>1</sup>Ashton Consulting LLC, United States; <sup>2</sup>University of California, Santa Cruz, United States; <sup>3</sup>Naval Postgraduate School, United States;

Abstract: Heat pumps are often mandated and installed in new construction, and existing air conditioning units are now being replaced with heat pumps for winter heating. Options range from heating with a heat pump alone to augmenting it with carbon-based fuel heaters or simple resistive heating. The temperature data and energy prices for a particular region determine the basic cost profile and the breakpoints for transitioning or augmentation. However, minimizing carbon will likely not align with minimizing operational costs. This paper first examines a number of scenarios using data gathered from two facilities, giving the reader options based on the desired optimization parameters. This may be particularly useful where government energy credits are available for all-electric and solar systems, or where carbon mandates must be met. The paper places much of its emphasis on practical solutions.

<sup>1</sup>Ashton Consulting LLC, United States; <sup>2</sup>Naval Postgraduate School, United States

**Abstract:** Diode "ORing" (or steering) is often used to provide battery backup power in many consumer products. For higher-power dc microgrids, such as those found on commercial or military maritime vessels, auctioneering diodes may be used to provide no-break power from multisource paths to vital loads. This paper addresses two issues: (a) the uneven distribution of negative bus current observed when only positive auctioneering diodes are used with multiple loads on the same circuit, and (b) the doubled load voltage observed in one possible double ground fault scenario. An arrangement with anti-parallel diodes on the negative rail provides both balanced rail impedances and a path for fault current.

### A Very High-Power Density Multilevel Inverter using Silicon Carbide (SiC)

<sup>1</sup>University of California, Santa Cruz, United States; <sup>2</sup>Scintillating Solutions LLC, United States

**Abstract:** The objective of this work was to design a motor drive for electric aircraft propulsion using commercial-off-the-shelf components with a power density above 50 kW/kg. A three-level flying capacitor topology was selected and co-simulated in detail with the motor design. The mechanical design, including the thermal management system, was carried out and involved custom-built laminated aluminum buswork. A single-phase prototype was constructed with SiC MOSFET modules and tested up to the design dc voltage of 1.8kV. The gate resistance had to be increased to improve electromagnetic compatibility. The thermal management system was integrated into the cooling loop of the cryo-cooled motor, so the MOSFET junction temperature could be regulated to room temperature at rated operation. The calculated power density of the motor drive is 60 kW/kg.

Osman Saleem<sup>1</sup>, Keith Corzine<sup>1</sup>, Leila Parsa<sup>1</sup>, Ricardo de Castro<sup>2</sup>

<sup>1</sup>University of California, Santa Cruz, United States; <sup>2</sup>University of California, Merced, United States

**Abstract:** The rapid increase in the adoption of Electric Vehicles (EVs) and Zero-Emission Vehicles (ZEVs) in California has led to significant changes in transportation infrastructure, power systems, and power distribution networks. In the first quarter of 2024 alone, 102,507 EVs were sold in California, with total sales for the year projected to exceed 410,000 vehicles if current trends continue. This shift necessitates policy changes, particularly in evacuation procedures. Traditionally, evacuation plans have assumed that evacuees use gasoline vehicles, ignoring the unique challenges posed by EVs and ZEVs, such as the need for charging and the potential impacts of power outages. This study introduces the ZEV Evacuation Readiness Score (ZEV Score), a metric designed to evaluate how well a community can respond to an evacuation scenario involving ZEVs. The effectiveness of the ZEV Score is demonstrated through a case study of Santa Cruz County, California, highlighting its potential application for stakeholders and policymakers.

HeeCheol An, SoYoung Kim

Sungkyunkwan University, Korea

**Abstract:** The rapid increase in smartphone applications and features demands high performance. This increases current consumption, causing the system voltage to drop and leading to power problems in the smartphone. A representative example is the sudden power-off caused by undervoltage lockout (UVLO), which is considered a serious problem because the user notices it immediately. The usual remedy is to lower the clock of the main processing unit, but this decreases performance. This paper proposes a circuit that compensates for the voltage drop caused by momentary peak current by using a boost converter to drive the gate of the buck converter. To verify the proposed circuit, the DC resistance of the smartphone was modeled. Simulation results confirm that the circuit reduces the voltage drop by 298mV when a peak current of 10A occurs. This improvement ensures power stability by preventing UVLO without limiting the performance of the main processing unit.

## Session B3L-D: Recent Advances in Circuits & Systems in Sub-THz

**Chair:** Sudipto Chakraborty, *Texas Instruments* **Time:** Tuesday, August 13, 2024, 14:00 - 15:30 **Location:** Room 4

## Circuit Innovations for D-Band Phased Array Transceivers ...... 754

Syed Mohammad Ashab Uddin, Mohammadreza Abbasi, Liwen Zhong, Wooram Lee *The Pennsylvania State University, United States* 

Abstract: This article presents circuit innovations for the implementation of silicon D-band phased array transceivers. A D-band passive phase shifter, using two parallel transmission lines connected with switches, is introduced for low insertion loss, precise phase control, and compact chip area. A D-band bi-directional common-gate amplifier based on a symmetric inter-stage matching network and a current-reuse technique is proposed to provide gain in both transmit and receive modes within a single amplifier footprint, with low DC power consumption. The proposed building blocks are expected to motivate further development of compact, low-power D-band transceivers for next-generation communications and sensing.

### Pushing the Performance Boundaries of mmWave and Sub-THz Transceiver Circuits through

Rice University, United States

**Abstract:** Passive networks play a crucial role in mmWave transceiver circuit design. Their design innovations can significantly enhance circuit performance and even enable new hardware functionalities that were previously impractical. This paper presents the design of three transceiver building blocks to demonstrate the critical impact of passive network innovations. These designs include a broadband 27-46 GHz LNA, a 47-GHz four-way Doherty PA with over-GHz modulation bandwidth, and a 100-GHz LO generator with a FoM over 190 dBc/Hz. All these designs achieve state-of-the-art performance without modifying the basic circuit topology or transistor operations, highlighting the importance of passive network designs in overcoming the limitations of current transistor technologies.

Aritra Banerjee

University of Illinois Chicago, United States

**Abstract:** Integrated circuits at sub-THz frequencies enable many applications, including high-data-rate wireless communication and high-resolution radar. This paper reviews recent advances in sub-THz IC design, discusses challenges and approaches in technology, circuits, systems, packaging, and measurement, and compares the performance of state-of-the-art implementations in different technologies.

Abstract: Designing RF and mmWave circuits and systems is a tedious process that requires an expert team to balance many requirements and trade-offs. The work is divided into numerous sub-tasks assigned to different engineers, who then design based on their knowledge, intuition, and experience. Meanwhile, artificial intelligence (AI) is driving a paradigm shift in many other fields: AI can not only automate many tasks but also generate solutions on demand. There is therefore ample reason to investigate AI-assisted methods for the design and automation of RF and mmWave circuits and systems. This paper introduces an algorithmic synthesis approach that relies on deep convolutional neural networks (CNNs) to model template-free electromagnetic (EM) structures. Notably, once a CNN model is trained, synthesis takes only minutes. Moreover, the model can be reused for different design targets, such as antennas, matching networks, and filters. These points are illustrated with synthesis and measurement results.

### **Reconfigurable Matched Filtering using Wideband Margin-Computing Correlators:**

**Abstract:** Correlators are fundamental building blocks in radar and communication signal-processing applications. In this paper, a wideband RF correlator based on the margin-computing (MC) paradigm is presented, together with its applications to radar and communication signal processing. The proposed analog correlator replaces traditional multiply-and-accumulate correlators with analog addition-and-thresholding to enable energy-efficient correlation, making it hardware-friendly, scalable, and power-efficient. The prototype IC in 65nm CMOS supports 5GS/s inputs, a large correlation length of 1024, and 8-bit computing accuracy with a high energy efficiency of 152 TOPS/J. Measured results of baseband and direct-RF wavelet processing of impulse radar signatures across varying bandwidths and carrier frequencies are presented. Measured results of code-modulated communication systems using correlator-based signal processing for code synchronization and code despreading are also reported.

## Session B3L-E: AI & Machine Learning in Biomedical Systems

Chair: Junning Jiang, *Nvidia* Co-Chair: Sungho Kim, *The University of Rhode Island* Time: Tuesday, August 13, 2024, 14:00 - 15:30 Location: Room 5

Masoumeh Kalantari-Khandani, Yaser P. Fallah, Azadeh Vosoughi

University of Central Florida, United States

**Abstract:** We examine and propose a super-resolution technique for Magnetic Particle Imaging (MPI) developed using transfer learning and sparse transforms. MPI is a new medical imaging modality that relies on tracking super-paramagnetic nanoparticles and offers a superior alternative to other medical imaging methods. Like other imaging modalities, and despite being faster than other methods, MPI has limited spatial and temporal resolution. Super-resolution techniques can considerably widen the applications of MPI to real-time and in-vivo scanning and analysis. The current lack of MPI data prevents the direct development of machine learning and deep neural network (DNN) based super-resolution methods. To overcome this issue, we study transfer learning and show that training on existing large datasets of natural images and then retraining on a small dataset of MPI images allows a super-resolution DNN to be developed successfully. We also propose using sparsifying transforms in the early stages of the DNN to improve the quality of the inferred higher-resolution image. We show that such networks achieve a 4-5 dB improvement.

### Brain MRI Tumour Localization and Segmentation through Deep Learning ...... 782

Somayeh Davar, Thomas Fevens

Concordia University, Canada

**Abstract:** Segmentation of magnetic resonance imaging (MRI) is an important task in medical imaging, particularly for brain MRIs, where accurate segmentation of anatomical structures is difficult to achieve because tumours often have uneven shapes and fuzzy boundaries. Automating reliable segmentation of the region of interest (ROI) is therefore essential in medical imaging. This work introduces an automated approach for brain tumour segmentation that leverages a Deep Convolutional Neural Network (CNN) to segment brain tumours effectively in T1-weighted contrast-enhanced MRI (CE-MRI) images. The proposed method first operates on the classified images to localize the tumour regions of interest. In the next stage, the algorithm contours the localized tumour boundary for segmentation, using the network together with spatial and channel attention modules. To evaluate the overall system's performance, precision, recall, and the dice similarity coefficient (DSC) were calculated, yielding 0.8757, 0.8804, and 0.8995, respectively. Our approach demonstrates promising results compared to previous methods using the same database.

### Triangular Prism-Based Omnidirectional Imaging System ...... 787

Ali Ali, Moin Diwan, Paolo Reggiani, Bhaskar Choubey

Universität Siegen, Germany

**Abstract:** Omnidirectional images have a significantly broader field of view (FOV) than images from standard cameras. Various methods have been used to achieve an omnidirectional view; ideally, it can be obtained by merging multiple images taken by one or more cameras into a single image. One application is endoscopy, which is used to visualize the internal structures of the human body. However, despite the numerous techniques available for achieving an integrated 360-degree view, processing it in real time is limited by the number of cameras and the data size. In this paper, we present a novel vision system tip design that reduces processing time by using fewer cameras than current designs in the literature.

<sup>1</sup>Lahore University of Management Sciences, Pakistan; <sup>2</sup>Western Washington University, United States

**Abstract:** This paper presents an area- and power-efficient design of a dual-channel electroencephalogram (EEG) based digital processor that can identify the onset of seizures with high accuracy. The hardware design comprises multiple modules, including feature extraction and classification. Neural networks are employed as classifiers because of their exceptional performance. However, these data-driven algorithms degrade the system's performance and increase its power consumption, so a compute-in-memory (CIM) architecture is used to address this problem. The neural network algorithm was evaluated on the open-source CHB-MIT database, achieving 87.1% accuracy, 91.2% sensitivity, and 90.5% specificity. The hardware architecture, in turn, was evaluated in terms of power, area, and latency.

Sudarsini Tekkam Gnanasekar, Kruthi Doddabasappla, Rushi Vyas

University of Calgary, Canada

**Abstract:** Cough is considered a major symptom of most respiratory problems, and since the COVID-19 pandemic, cough detection has become a major area of interest. This paper addresses cough detection with a simple peak detection technique based on Fast Fourier Transform (FFT) analysis of data from an accelerometer worn in a shirt pocket, where smartphones are commonly carried. The methodology measures the peaks on the negative side of the FFT plots, which indicate the phase discontinuity caused by coughs (or its absence). The best results are seen on accelerometer signals along the direction perpendicular to the chest surface (Z axis) and along the vertical height (Y axis) of the test subjects, in frequency bands 1, 2, and 3 (1-2Hz, 2-4Hz, and 4-6Hz). The results indicate that the numbers of peaks in bands 2 and 3 (2-4Hz and 4-6Hz) are significantly higher and detect cough patterns quite consistently in 2 test subjects with heights of 1-1.8 meters and weights of 16-64 kilograms.

## Session B3L-F: Artificial Intelligence, Internet of Things & Systems 4

**Chair:** Najme Ebrahimi, *Northeastern University* **Co-Chair:** Shiuh-hua Chiang, *Brigham Young University* **Time:** Tuesday, August 13, 2024, 14:00 - 15:30 **Location:** Room 6

Puyang Zheng, Dyumaan Arvind, Milutin Stanaćević

Stony Brook University, United States

**Abstract:** We propose an RF energy harvesting system, implemented in 180nm CMOS technology, that operates in the 915 MHz ISM band. The system is designed to optimize the overall power conversion efficiency (PCE) over an input power range of -30 dBm to 0 dBm. The elements of the tunable matching network are selected based on the derived rectifier model. The dual-channel rectifier, comprising two Dickson rectifiers, is designed to achieve 64% PCE across the input power range. A power management unit (PMU), optimized for the operating modes of RF sensors, enables the system to achieve 92% efficiency at -13 dBm. The end-to-end PCE of the RF energy harvester reaches 25.09% at the same input power.
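
The efficiency figures above are easier to read as absolute powers. The short worked conversion below uses only the standard definition P[W] = 1 mW × 10^(P[dBm]/10) and the numbers quoted in the abstract; the helper names are hypothetical, and the output is simple arithmetic, not a measured value from the paper.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts: P = 1 mW * 10^(dBm/10)."""
    return 1e-3 * 10 ** (p_dbm / 10)


def harvested_power_w(p_in_dbm: float, end_to_end_pce: float) -> float:
    """DC power delivered for a given RF input power and end-to-end PCE (a fraction in 0..1)."""
    return end_to_end_pce * dbm_to_watts(p_in_dbm)


if __name__ == "__main__":
    # Input range quoted in the abstract: -30 dBm to 0 dBm, i.e. 1 uW to 1 mW.
    print(f"-30 dBm = {dbm_to_watts(-30) * 1e6:.2f} uW")
    print(f"  0 dBm = {dbm_to_watts(0) * 1e3:.2f} mW")
    # Operating point quoted in the abstract: -13 dBm input, 25.09% end-to-end PCE.
    p_in = dbm_to_watts(-13)
    p_out = harvested_power_w(-13, 0.2509)
    print(f"-13 dBm = {p_in * 1e6:.1f} uW in, {p_out * 1e6:.1f} uW delivered")
```

At -13 dBm the harvester receives about 50.1 µW of RF power, so a 25.09% end-to-end PCE corresponds to roughly 12.6 µW of delivered DC power.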

## Impact of In-Pixel Processing Circuit Non-Idealities on Multi-Object Tracking

<sup>1</sup>Georgia Institute of Technology, United States; <sup>2</sup>University of California, Santa Barbara, United States

Abstract: Deep learning algorithms are robust to small amounts of noise in the input image. Traditionally, image signal processors (ISPs) are used with the CMOS image sensor (CIS) to enhance image quality, at the cost of additional energy and latency. Here, we evaluate an ISP-less CIS architecture for the autonomous driving application in the presence of noise and other in-pixel circuit non-idealities. By integrating in-pixel processing circuits into the CIS, we filter out redundant frames and pass only the critical bit information downstream to the backend processor. Such in-pixel processing does not allow ISP operations to be applied to the captured raw image. To reflect these limitations, we model and apply circuit non-idealities to regenerated raw images, which serve as input to the QDTrack network for multi-object tracking. We evaluate the accuracy loss on the BDD100K dataset and examine its sensitivity to each image processing step. We observe an overall accuracy drop of less than 1.2% in Identification F1-score (IDF1) and 2.1% in Multi-Object Tracking Accuracy (MOTA), suggesting that an ISP-less in-pixel processing circuit is a feasible way to reject 40% of redundant frames directly on the CIS.

### ADARE-HD: Adaptive-Resolution Framework for Efficient Object Detection and Tracking via HD-Computing ...... 311

Mohamed Mejri, Chandramouli Amarnath, Abhijit Chatterjee

Georgia Institute of Technology, United States

Abstract: Efficient, low-energy camera signal processing is critical for battery-supported sensing and surveillance applications. In this research, we develop a video object detection and tracking framework that adaptively down-samples frame pixels to minimize computation and memory costs, and thereby the energy consumed, while maintaining a high level of accuracy. Instead of always operating at the highest (compute-intensive) sensor pixel resolution, video frame (pixel) content is down-sampled spatially to adapt to changing camera environments (size of the tracked object, PSNR of video frames). Object detection and tracking are supported by a novel resolution-aware adaptive hyperdimensional computing framework, which leverages a low-memory-overhead non-linear hypervector encoding scheme tailored for handling multiple degrees of resolution. Previous classification decisions for a moving object, based on its tracking label, are used to improve tracking robustness. Energy savings of up to 1.6 orders of magnitude and compute speedups of up to an order of magnitude are obtained across a range of experiments on benchmark systems.

<sup>1</sup>University of Massachusetts Lowell, United States; <sup>2</sup>University of Louisiana at Lafayette, United States

Abstract: Transformers have emerged at the forefront of training and inference for diverse machine learning tasks, including video processing, image generation and classification, and natural language processing (NLP). Despite their increasing prevalence, a comprehensive framework for implementing them efficiently has been lacking. This study introduces a transformer-based framework that accelerates the image processing of the UNETR (U-shaped neural network transformer) model for a video segmentation task on the Cityscapes dataset. Given the large size of the images in the dataset, we incorporate hyperattention and mixed precision in our design. Our model is trained and profiled on a Google A100 GPU accelerator. Finally, our design is implemented on an FPGA to take advantage of the reconfigurable, high-throughput characteristics of systems-on-chip (SoCs) for image processing. Our results indicate improvements over existing research in this domain.


Zhiwei Zhong, Yuhao Ju, Jie Gu

Northwestern University, United States

Abstract: Physics-embedded neural networks have recently gained significant interest in robotics because they combine data-driven machine learning with physics-based modeling for real-time control. Despite their improved accuracy over black-box neural networks, existing works are limited in handling large ranges of system parameters, extracting latent physical parameters, and meeting real-time latency constraints. This paper proposes enhanced physics-embedded neural network models that overcome the scaling limitation of existing models and the coupling issues in extracting hidden variables, improving model accuracy by more than 95%. A reinforcement-learning-based neural architecture search engine is developed to meet real-time latency constraints on embedded microprocessors, optimize the solution for scaling issues, and enable efficient deployment of physics-embedded neural networks on resource-limited edge devices, searching 3× faster than the exhaustive random search method.

## Session B4L-A: Circuits & Systems for Automotive Applications

**Chair:** Mohammed Ismail, *Wayne State University* **Co-Chair:** Leila Sharara, *Wayne State University* **Time:** Tuesday, August 13, 2024, 16:00 - 17:30 **Location:** Room 1


Yazhong Lu<sup>1</sup>, Lingguang Chen<sup>2</sup>, Sean F. Wu<sup>3</sup>, Yang Zhao<sup>3</sup>, Huijun Li<sup>4</sup> <sup>1</sup>Zhejiang University, China; <sup>2</sup>Signal-Wise, LLC, United States; <sup>3</sup>Wayne State University, United States; <sup>4</sup>URD Consulting, United States

Abstract: Unlike a passenger sedan, a UTV (Utility Terrain Vehicle) suffers from many more NVH (Noise, Vibration, and Harshness) problems because it is often driven on highly rugged outdoor surfaces, where the road excitation forces are typically transient, unexpected, and highly intense. To minimize the NVH problems experienced by the driver and passengers, the vibroacoustic characteristics of a full-size UTV must be thoroughly analyzed. This paper presents vibration analyses of a full-size UTV. Specifically, modal analyses are applied to determine the natural frequencies, natural modes, and damping coefficients of the steering wheel, driver floor panel, and driver seat rail, and the resonance frequencies and modes in these three areas that are critical to driver comfort are identified. These results may be important for design modifications that enhance driver comfort in a UTV.

### An Advanced Cooling Device for Concentrated Photovoltaic Systems

Yimeng Gou<sup>1</sup>, Jinghai Xiao<sup>2</sup>, Zhao Wang<sup>1</sup>, Caisheng Wang<sup>1</sup>

<sup>1</sup>Wayne State University, United States; <sup>2</sup>The Independent Schools Foundation Academy, China

**Abstract:** The proposed phase-change-material-based cooling method keeps both the solar panels and the thermoelectric modules within a preferable temperature range, which extends equipment life, improves energy efficiency, enables secondary electricity generation through the thermoelectric modules, and enhances the safety of the whole system. Compared with the cooling methods currently used for concentrated photovoltaic panels, such as air cooling and water cooling, phase-change thermal storage material offers better stability and lower cost. Experimental studies validate the superior performance of the proposed cooling system over conventional cooling methods.

### High-Efficiency GaN-Based Synchronous Buck Converters for 48V to 1V Direct

Leila Sharara, Mohammed Ismail

Wayne State University, United States

Abstract: This paper explores the design considerations and topologies for a 48V to 1V direct voltage conversion in automotive applications, focusing on Gallium Nitride (GaN) technology and a high switching frequency of 2MHz. Various converter topologies are compared, and the advantages of using GaN technology in high-frequency synchronous buck converters are highlighted. This research outlines design guidelines, identifies challenges, and proposes potential solutions to attain high efficiency, compact size, and adherence to automotive EMC standards. Efficiency optimization is highlighted as a critical factor in minimizing power losses and extending battery life within automotive contexts. The paper further discusses the pivotal role of selecting an appropriate converter topology and control scheme in achieving high efficiency and maintaining stable and reliable operation across diverse load conditions.
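As a back-of-envelope illustration of why direct 48V-to-1V conversion at 2MHz is demanding, the ideal buck-converter relation (lossless converter in continuous conduction assumed; not a result from the paper) gives:

$$
D = \frac{V_\text{out}}{V_\text{in}} = \frac{1\,\text{V}}{48\,\text{V}} \approx 2.08\%,\qquad
T_s = \frac{1}{f_s} = \frac{1}{2\,\text{MHz}} = 500\,\text{ns},\qquad
t_\text{on} = D\,T_s \approx 10.4\,\text{ns}.
$$

An on-time of roughly 10 ns per switching cycle leaves little room for slow switching transitions, which is one reason fast-switching GaN devices are attractive for this conversion ratio.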

## Speeding Up Development of Unmanned Aerial Vehicles (UAV) with MATLAB and Simulink

Moiz Khan

MathWorks, United States

**Abstract:** MATLAB and Simulink give researchers the capabilities to develop algorithms and models, run simulations, analyze data, and deploy code to physical systems, all to speed up development. Researchers working on unmanned aerial vehicles (UAVs) use MATLAB and Simulink to model the UAV system architecture; design flight control algorithms; develop perception and motion planning systems using prebuilt algorithms, sensor models and sensor fusion, and computer vision; and deploy production code directly to onboard processors.

### Battery – Supercapacitor Hybridization for High Performing Vehicles

Gholamabbas Nazri

Wayne State University, United States

Abstract: The success of electric transportation depends solely on the development and implementation of a reliable, high-energy, high-power, safe, and low-cost battery. Among rechargeable batteries, the lithium-ion battery has been shown to provide the highest gravimetric and volumetric energy density. While lithium-ion batteries serve as the most efficient energy storage system, their power capability is limited by slow mass transport of ions across the battery components. A supercapacitor is a high-power device; when hybridized with a battery, it yields a high-power, high-energy system that is ideal for high-performing vehicles. The application of supercapacitors has accelerated in recent years, particularly in high-performing land, marine, and space vehicles. In this work, we provide a brief background on the status of batteries, supercapacitors, and the hybridization of these two electrochemical power sources.

## Session B4L-B: Signal Processing

**Chair:** Cheol-Hong Min, *University of St. Thomas* **Time:** Tuesday, August 13, 2024, 16:00 - 17:30 **Location:** Room 2

### Direct Batch Generation of Optimal Eigenvectors of the Symmetric Kernel of the Discrete Hankel Transform ...... 843

Magdy Tawfik Hanna

Fayoum University, Egypt

**Abstract:** The development of the recently emergent discrete fractional Hankel transform (DFRHT) depends mainly on the generation of orthonormal eigenvectors of the kernel matrix of the discrete Hankel transform (DHT). In order for the DFRHT to approximate its continuous counterpart, the eigenvectors in question should approximate samples of the eigenfunctions of the Hankel transform (HT). The recently developed techniques for the generation of optimal eigenvectors of any unitary kernel matrix can be classified as either indirect (in the sense of first generating initial eigenvectors as a prerequisite for the generation of optimal ones) or direct (in the sense of not needing initial eigenvectors). Moreover, those techniques can be viewed as either batch or sequential. The main objective of this paper is the assessment of the performance of the recently developed direct batch generation algorithms of optimal orthonormal eigenvectors of the symmetric kernel matrix of the DHT.


Mamta Dalal, Steven Sandoval, Joshua Blowers

New Mexico State University, United States

**Abstract:** Traditional linear time-invariant (LTI) systems theory considers systems whose input and output signals are complex-valued (CV) time series. We develop an analogous theory within the framework of geometric algebra (GA) for linear rotation-invariant time-invariant (LRITI) systems, whose input and output signals are vector-valued (VV) time series. To that end, we identify the importance of rotation invariance in the standard formalism and point out that the standard formalism inherits it from the assumption of linearity over CV scalars. We reinterpret the CV product in the traditional convolution definition as a scale-rotation rather than a product. Based on this reinterpretation, we characterize LRITI systems by defining a convolution operation for VV systems that generalizes the traditional convolution using rotors in a GA-based representation.

### Analysis of Oscillation Components in Instantaneous Frequency Estimation by Finite Order 2-D Hilbert Transformer ...... 852

Jun Obara, Sena Hiruma, Naoyuki Aikawa

Tokyo University of Science, Japan

**Abstract:** The Hilbert transformer (HT) is widely used in signal processing, including for estimating instantaneous frequency (IF) in the analysis of non-stationary signals. This paper describes IF estimation using a 2-D HT. An analytic image is generated from the input image and its 2-D HT output, and the instantaneous phase and IF are derived from this analytic image. For a finite-order 1-D HT, it is known that the estimated IF contains components at even multiples of the input signal's angular frequency, caused by approximation errors in the filter design. Similar errors are expected in IF estimation with the 2-D HT. This paper therefore analyzes these errors in the estimated IF both theoretically and through simulation.

### Direct Sequential Generation of Optimal Eigenvectors of the Symmetric Kernel of the

Magdy Tawfik Hanna

Fayoum University, Egypt

**Abstract:** The generation of orthonormal Laguerre-Gaussian-power-like eigenvectors of the symmetric kernel matrix T of the discrete Hankel transform (DHT) is the cornerstone of the discrete fractional Hankel transform (DFRHT). This paper considers the direct sequential generation of the columns of each partition of the modal matrix of T that pertains to one distinct eigenvalue. The research focuses on assessing four direct sequential algorithms: the Gram-Schmidt algorithm (GSA), the modified Gram-Schmidt algorithm (MGSA), the sequential orthogonal Procrustes algorithm (SOPA), and the direct sequential evaluation by constrained optimization algorithm (DSEOA). Although the four algorithms are theoretically equivalent, they behave quite differently numerically, and the differences grow as the order of the square matrix T increases. The assessment shows that the MGSA clearly outperforms both the GSA and the SOPA, and that the DSEOA is by far the most accurate.
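The observation that theoretically equivalent orthogonalization algorithms can differ numerically is easy to reproduce with classical versus modified Gram-Schmidt. The NumPy sketch below is a generic illustration, not the author's DHT-specific code; the 10×10 Hilbert matrix is an assumed stand-in for an ill-conditioned input.

```python
import numpy as np


def classical_gram_schmidt(A):
    """Classical Gram-Schmidt: each column is projected against the ORIGINAL column A[:, j]."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q


def modified_gram_schmidt(A):
    """Modified Gram-Schmidt: each projection uses the partially orthogonalized vector v."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            v -= (Q[:, i] @ v) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q


def orthogonality_loss(Q):
    """Frobenius norm of Q^T Q - I; zero for an exactly orthonormal Q."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))


# Ill-conditioned test input (assumed for illustration): the 10x10 Hilbert matrix.
n = 10
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
loss_cgs = orthogonality_loss(classical_gram_schmidt(A))
loss_mgs = orthogonality_loss(modified_gram_schmidt(A))
print(f"classical GS: {loss_cgs:.2e}   modified GS: {loss_mgs:.2e}")
```

In exact arithmetic both functions return the same Q; in floating point, the classical variant loses orthogonality far more severely on ill-conditioned inputs, which mirrors the GSA-versus-MGSA ranking reported above.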

Natasha Clark<sup>1</sup>, Bettina Milosz<sup>2</sup>, Modou Dibba<sup>3</sup>, Cheol-Hong Min<sup>3</sup>

<sup>1</sup>Midcontinent Independent System Operator (MISO), United States; <sup>2</sup>United Natural Foods, Inc., United States; <sup>3</sup>University of St. Thomas, United States

**Abstract:** This study assessed eight hand gestures using two MetaMotionR (MMR) environmental sensors. With one sensor placed on the top of the hand and the other on the wrist, the experiment aimed to determine the best sensor placement for controlling mobile robots. The signals were analyzed in the time and frequency domains, and time-frequency-domain features were extracted. Using an ensemble KNN classifier, the system achieved 98.6% accuracy on the eight-gesture recognition task. The best placement was found to be the combination of both locations, the top of the hand and the wrist.

## Session B4L-C: Heterogeneous Integration for Advanced Packaging

**Chair:** Antonio de la Serna, *Siemens Government Technologies* **Time:** Tuesday, August 13, 2024, 16:00 - 17:30 **Location:** Room 3


Siemens EDA, United States

Abstract: In the search for more efficient form factors for electronic devices that meet power, performance, and latency requirements, heterogeneous integration has become one of the most promising approaches to meeting the demands of new products with very strict cost, performance, latency, and power requirements. Traditional board-level integration of different components meets cost requirements but not performance and/or latency requirements; likewise, systems-on-chip (SoCs) may meet performance and form-factor requirements but not cost. By increasing the complexity with which silicon dies are integrated, it may be possible to deliver unique functionality at lower cost and with faster time to market. As manufacturing complexity increases, design flows must be updated to accommodate a wide variety of techniques, including wiring redistribution, 2.5D interposers, 3D stacking, plasma dicing, and through-silicon vias (TSVs). In this work, we focus on the parallels between semiconductor processes and packaging processes, and outline a possible path to enhance heterogeneous packaging operations with techniques developed in semiconductor manufacturing.

### Opportunities, Challenges and Mitigations in 3DIC Design, Test, and Analyses ...... 867

Pratyush Kamal, Anthony Mastrojanni, Juan Rey, Antonio de la Serna

Siemens EDA, United States

**Abstract:** As 3D Die-on-Wafer and Wafer-on-Wafer stacking technologies mature, integration density will increase to provide new opportunities for complex homogeneous or heterogeneous integration of logic circuits across multiple materials and through multiple interconnect materials and technologies. This will create opportunities for new advancements in design, verification and analyses tools and design flows, test strategies and test planning, and system level practices.

### Co-Design for Heterogeneous Integration: High Level Decisions to the Rescue ...... 870

Daniel Xing, Abir Akib, Yuntao Liu, Ankur Srivastava

University of Maryland, College Park, United States

**Abstract:** The impact of informed high-level decisions on the performance and power efficiency of semiconductors is well known. In the context of heterogeneous integration for 3D ICs, new innovations are needed that close the loop from architecture to device and from multiphysics to high-level abstraction. Such considerations will be presented.


NHanced Semiconductors, Inc., United States

**Abstract:** The invention of the system-on-chip (SoC) was a huge leap forward from the standard semiconductor circuit board. Today the industry sees another leap forward: advanced packaging (AP). This paradigm shift in yield and performance enables affordable customization, true heterogeneous integration (HI), and a new manufacturing model: Foundry 2.0.

## Session B4L-D: Approximate & Analog Computing

**Chair:** Sahoo Bibhudatta, *SUNY University at Buffalo* **Time:** Tuesday, August 13, 2024, 16:00 - 17:30 **Location:** Room 4

Abstract: Recent innovations in devices, circuits, and systems have resulted in the rapid growth of parallel and in-memory architectures for energy-efficient computing. However, the choice of signal representation within these architectures has received less attention. While in principle the amplitude and time dependence of the signals being processed can use either continuous (analog) or quantized (digital) variables, most modern processors use clocked digital representations that quantize both signal dimensions. Nevertheless, computing with analog variables is known to be fundamentally more resource-efficient for low- and moderate-precision operations, making it attractive for applications, such as inference and real-time control, where high precision is not required. However, difficulties with scalability, programmability, and memory access have impeded the realization of integrated analog computers (ACs) that can exploit these advantages. This paper provides a tutorial-style review of state-of-the-art integrated ACs and their advantages, such as ultra-low latency and high energy efficiency, for selected problems where clocked digital architectures are not resource-efficient.

## A 5T-SRAM based Computing-in-Memory Macro Featuring Partial Sum Boosting and

**Abstract:** Analog computing-in-memory (CIM) has shown great efficiency on multi-bit matrix-vector multiplication (MVM). However, analog approaches accumulate MVM results in the voltage domain, where different accumulation lengths and data sparsity cause distinct signal-swing degradation and variation. Hence, the analog-to-digital converter (ADC) readout circuit requires more power and area to overcome noise and comparator offset. In this paper, we propose a partial sum boosting (PSB) method that provides 4.6× to 22.8× signal-swing enhancement for multi-bit MVM, mitigating the severe ADC overhead of analog CIM. Furthermore, multi-bit 5T-SRAMs with diverse supply voltages realize multi-bit uniform/non-uniform sign-and-magnitude (SNM) weight representation and reduce SRAM static power dissipation by up to 38.8×. The chip was fabricated in 28nm CMOS technology and achieved a measured throughput density of 89.04 TOPS/mm² and an energy efficiency of 219.5 TOPS/W with 3-bit activations, 4-bit SNM weights, and a 5-bit partial-sum readout without truncation.

## Real-Time 5.7 GHz Realization of Multidimensional 2D Hybrid Approximate

Buddhipriya Gayanath<sup>1</sup>, R.J. Cintra<sup>2</sup>, Soumyajit Mandal<sup>3</sup>

<sup>1</sup>Florida International University, United States; <sup>2</sup>Universidade Federal de Pernambuco, Brazil; <sup>3</sup>Brookhaven National Laboratory, United States

**Abstract:** Measuring and perceiving the RF spectrum is key to managing the electromagnetic environment of the future. Multichannel spectrum sensors that measure the RF environment can provide situational awareness across dimensions of space and time, yielding information on modulation, bandwidth, and frequency use across multiple directions of propagation. In this paper, we update our recent work on 32-channel 5.7 GHz spectrum sensing using a multiplierless approximate DFT operating in the spatial domain. In this work, 32 parallel temporal FFTs of 1024 bins each are realized as a parallel-processing systolic array on a Xilinx-based reconfigurable open architecture computing hardware (ROACH-2) FPGA system, enabling fine-grained spectrum monitoring at about 100 kHz resolution across 32 simultaneous digital RF beams.


IBM, United States

**Abstract:** The IBM Z® platform is optimized for processing vast amounts of data and transactions with low latency in a highly virtualized and secured environment. The platform and its microprocessor chip are designed to deliver consistent system performance, throughput, and response times with sustained processor utilization of over 90%. The IBM z16™ design introduced a novel modular scalable cache hierarchy to the industry that is extendable to future platform generations, the adoption of emerging technologies, and new architecture enhancements to meet the needs of the mission-critical workloads that run on the platform. This paper will expand on prior publications discussing this novel design, demonstrate how events flow in the system, and explore the intersection of potential future architectures and technologies that could be integrated into this type of framework from an academic perspective.

## Session B4L-E: Biosensor & Health Monitoring Systems

**Chair:** Keith Corzine, *UC Santa Cruz* **Co-Chair:** Jose de la Rosa, *IMSE Sevilla Spain* **Time:** Tuesday, August 13, 2024, 16:00 - 17:30 **Location:** Room 5

## Use of Wearable Capacitive Touchpads Common in Smartphones to Detect Respiration Rates and Coughs ....... N/A

R. Vyas, V. Gupta

University of Calgary, Canada

Abstract: The timely identification of respiratory distress, often indicated by coughs, has become important for public health readiness and response to pandemics such as COVID-19, SARS, and influenza. Traditional methods of monitoring respiratory health, including hospitalization rates, doctor reports, and wearable sensors, have limitations in real-time reporting and incur extra costs. With smartphones used by 66 out of every 100 people, they are useful tools for various public health initiatives. Our project studies the use of the capacitive touchpad sensors present in smartphones to monitor respiratory patterns and distress. Specifically, this work uses different touchpad scan patterns, orientations, and electrode spacings to study the correlation between capacitive fluctuations and respiratory rates/coughs. Our measurements with a commercial 5 × 6 element capacitive touchpad sensor array (0.8 cm pitch) worn on the chest/pocket register fluctuations of 1-3.5 nF over the baseline for different scan patterns due to cough-related chest-surface movements. Furthermore, when the touchpad is placed tightly on the chest/pocket, this method can also detect breathing rate, with changes of 0.5-4 nF over the baseline.

**Abstract:** We are developing a non-invasive neonatal monitoring device that extends fetal monitoring into the first 30-60 minutes postpartum. The device monitors critical physiological parameters, such as oxygenation, heart rate, and skin pH, which are crucial for the early identification of potential health issues in newborns, including hypoxia and acidosis, forerunners of neonatal encephalopathy and cerebral palsy. An ESP32 microcontroller and a MAX30102 sensor are integrated in a wearable sock form factor. We address the limitations of traditional invasive neonatal monitors by providing continuous, real-time health assessment through seamless wireless connectivity, including Bluetooth. The project focuses not only on the technological development of the device but also on the importance of a non-invasive design for comfort and precise diagnostics to inform healthcare interventions. It underscores the critical role of early detection of and intervention for neonatal compromise, with the potential to transform neonatal health monitoring and improve outcomes for newborn infants worldwide.

### Cough Event Prediction based on Spectral Features and SVM and KNN Machine Learning using

Sudarsini Tekkam Gnanasekar, Kruthi Doddabasappla, Rushi Vyas

University of Calgary, Canada

**Abstract:** Cough is a major symptom of illness and must be addressed efficiently to manage respiratory distress and the long-term effects of respiratory problems. This work proposes an efficient method that analyzes multi-band spectral features of triaxial accelerometer data recorded at different body positions, significantly reducing computational complexity. The Y axis yields better results than the X and Z axes for the chest and stomach positions, whereas for the ear position the Z axis produces a higher feature score for cough activity prediction. Classification was performed using SVM and KNN, and the best performance, a maximum accuracy of 99%, was observed for the ear position. Unlike prior works that use computationally intensive methods such as CNNs, the proposed method applies spectral features with SVM and KNN classifiers to signals from worn triaxial accelerometers.

### A Real-Time Detection of Patient's Aggressive Behaviors on a Smart Sensor ...... 911

Yudai Inoue, Vasily G. Moshnyaga

Fukuoka University, Japan

**Abstract:** Aggression in older people with dementia is one of the most challenging problems of medical care in today's nursing homes. Existing wireless autography-based technologies help caregivers collect data related to the behavioral and psychological symptoms of dementia. However, their inability to process the data in real time reduces the efficiency of patient assessment and affects the quality of care. In this paper, we present a computationally inexpensive technique that identifies aggressive behaviors on a sensor that measures the wrist acceleration of the patient's dominant hand. Experiments show that the technique effectively detects aggressive behaviors in real time and significantly reduces wireless data transmissions.

<sup>1</sup>Fuzhou University, China; <sup>2</sup>Tsinghua University, China

**Abstract:** This paper presents an implantable pressure sensor readout system (IPRO) that can operate at low voltage without an internal low-dropout regulator (LDO). A series of optimizations are made to reduce power consumption. The analog front-end (AFE) and signal processing parts use a current-mode design and a differential pseudo phase-locked loop (DPPLL), which operate stably at low voltage while maintaining a certain degree of accuracy and power rejection. The wireless data transmission part adopts an impulse-radio ultra-wideband (IR-UWB) scheme, whose ultra-low power consumption allows it to be powered by a charge pump. To further reduce power consumption, a minimal encoding circuit suited to the front-end output format is proposed, and a power oscillator (PO) with accelerated startup is used to optimize area and power. The system is implemented in a 180 nm CMOS process with a chip area of 1.1 × 1.0 mm². Measurement results indicate a typical operating voltage of 0.8 V, a typical power consumption of 11.8 µW, and an accuracy of 1.11 mmHg.

## Session B4L-F: Artificial Intelligence, Internet of Things & Systems 5

**Chair:** Juan A. Montiel-Nelson, *Institute for Applied Microelectronics* **Co-Chair:** Fei Yuan, *Toronto Metropolitan University* **Time:** Tuesday, August 13, 2024, 16:00 - 17:30 **Location:** Room 6

### A Hybrid Approach to Defend against Adversarial Evasion Attacks ...... 919

Kuchul Jung, Jongseok Woo, Saibal Mukhopadhyay

Georgia Institute of Technology, United States

**Abstract:** This paper discusses the application of Automatic Modulation Classification (AMC) for addressing spectrum congestion in wireless communication and the vulnerability of deep learning models to adversarial evasion attacks. By focusing on Convolutional Neural Networks (CNN), this study proposes a robust design that employs the STFT algorithm and adversarial learning. Experimental results show an enhanced resilience against adversarial attacks, but the paper concludes that further research is needed to balance model performance with adversarial resistance.


Michael Keyser, Farzad Ordubadi, Armin Tajalli

University of Utah, United States

Abstract: This work begins with an overview of the design space of state-of-the-art analog machine learning accelerators. We highlight how increasing bit precision to gain accuracy decreases energy efficiency in high-accuracy systems. Since analog computations are subject to process variations and mismatch, high-bit-precision accelerators are not realizable without a formal understanding of how process variations and mismatch affect the accuracy of analog accelerators. A mathematical framework is presented to quantify these effects, and a sample convolutional neural network, the VGG11 architecture modified to classify the CIFAR-10 dataset, is used as a test vehicle. The study shows that despite the inherent sensitivity of analog machine learning systems to process variations and mismatch, proper calibration mechanisms that target the most sensitive computations can improve overall network accuracy. The advantage of this approach is that it enables analog accelerators that are highly energy-efficient and fast without sacrificing accuracy.


R. Kanda<sup>1</sup>, N. Onizawa<sup>1</sup>, M. Leonardon<sup>2</sup>, V. Gripon<sup>2</sup>, T. Hanyu<sup>1</sup>

<sup>1</sup>Tohoku University, Japan; <sup>2</sup>IMT Atlantique, France

**Abstract:** This study aims to ensure consistent accuracy throughout the entire design flow for edge AI hardware for few-shot learning by implementing fixed-point data processing in the pre-training and evaluation phases. Specifically, the quantization library Brevitas is used to implement fixed-point data processing, which allows arbitrary bit widths for the integer and fractional parts. Two fixed-point quantization methods available in Brevitas are used: quantization-aware training (QAT) and post-training quantization (PTQ). With Tensil, which is used in the current design flow, the integer and fractional parts must each be 8 bits or each be 16 bits when implemented in hardware, but performance validation shows that accuracy comparable to floating-point operation can be maintained with as few as 6 or 5 bits each, indicating potential for further reduction in computational resources. These results contribute to a versatile design and evaluation environment for edge AI hardware for few-shot learning.
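The integer/fractional bit-width trade-off described above can be illustrated with a plain signed fixed-point rounding function. This is a generic Python sketch, not Brevitas or Tensil code; counting the sign bit among the integer bits, round-half-to-even rounding, saturation at the range limits, and the sample weights are all assumptions made for illustration.

```python
def to_fixed_point(x, int_bits, frac_bits):
    """Round x onto a signed fixed-point grid with `int_bits` integer bits
    (sign bit included, by assumption) and `frac_bits` fractional bits.
    Values outside the representable range saturate."""
    scale = 1 << frac_bits
    total = int_bits + frac_bits
    q_min, q_max = -(1 << (total - 1)), (1 << (total - 1)) - 1
    q = max(q_min, min(q_max, round(x * scale)))  # round() is round-half-to-even
    return q / scale


# Hypothetical weights; every value lies inside the range of all three formats.
weights = [0.3, -1.27, 2.5e-3, 7.9]
for int_bits, frac_bits in [(8, 8), (6, 6), (5, 5)]:
    quantized = [to_fixed_point(w, int_bits, frac_bits) for w in weights]
    max_err = max(abs(w - q) for w, q in zip(weights, quantized))
    print(f"{int_bits}.{frac_bits}: max |error| = {max_err:.5f} (step = {2 ** -frac_bits})")
```

For in-range values the rounding error is at most half a quantization step, so shrinking the fractional width from 8 to 5 bits raises the worst-case error bound from 2^-9 to 2^-6; whether that matters for a given network is exactly what accuracy validation has to establish.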

Twisha Titirsha<sup>1</sup>, Md Maruf Hossain Shuvo<sup>2</sup>, Shahrin Akter<sup>1</sup>, Syed Kamrul Islam<sup>1</sup>

<sup>1</sup>University of Missouri, United States; <sup>2</sup>University of Texas at El Paso, United States

Abstract: Spiking neural networks (SNNs) emulate biological neurons that transmit information through discrete spikes, or pulses of activity. SNNs have found extensive application in neuromorphic computing owing to their efficient cognitive-aware computation, event-driven processing, and robustness to noise and faults. Many applications in pattern recognition, sensory processing, computer vision, etc., require complex networks for enhanced performance and reliability. Realizing such complex SNNs in circuits is challenging because a single array of fully connected neurons cannot easily be scaled up within hardware resource and energy constraints. Contemporary neuromorphic chips address this issue by incorporating small crossbars with time-multiplexed interconnects. However, designing robust and efficient neuromorphic systems based on small crossbars is impractical. Therefore, strategic segmentation and placement of SNNs are crucial for mapping them efficiently onto neuromorphic circuits. This work proposes a hill-climbing optimization method to map SNNs onto neuromorphic circuits while minimizing spike communication on the interconnect.
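A generic hill-climbing formulation of this mapping problem can be sketched as follows: neurons are assigned to fixed-capacity crossbars, the cost is the total spike count on synapses that cross between crossbars, and a swap of two neurons is kept whenever it does not increase that cost. This is a minimal Python illustration under those assumptions, not the authors' algorithm; the swap move, the cost model, and the toy network are all hypothetical.

```python
import itertools
import random


def inter_crossbar_traffic(assign, synapses):
    """Spikes that must travel over the shared interconnect: the summed spike
    counts of synapses whose pre- and post-synaptic neurons sit on different crossbars."""
    return sum(spikes for pre, post, spikes in synapses if assign[pre] != assign[post])


def hill_climb_mapping(num_neurons, synapses, capacity, iters=2000, seed=0):
    """Greedy hill climbing over neuron-to-crossbar assignments.

    Starts from a sequential fill (each crossbar holds `capacity` neurons) and
    repeatedly tries swapping two neurons on different crossbars. Swaps keep
    every crossbar's occupancy unchanged, so the capacity limit always holds.
    A swap is kept if it does not increase inter-crossbar traffic.
    """
    rng = random.Random(seed)
    assign = [n // capacity for n in range(num_neurons)]
    cost = inter_crossbar_traffic(assign, synapses)
    for _ in range(iters):
        a, b = rng.randrange(num_neurons), rng.randrange(num_neurons)
        if assign[a] == assign[b]:
            continue
        assign[a], assign[b] = assign[b], assign[a]
        new_cost = inter_crossbar_traffic(assign, synapses)
        if new_cost <= cost:
            cost = new_cost  # accept improving or sideways moves
        else:
            assign[a], assign[b] = assign[b], assign[a]  # revert worsening moves
    return assign, cost


# Hypothetical toy network: two densely connected groups (even- and odd-numbered
# neurons, interleaved in index order) plus one weak synapse between the groups.
evens, odds = range(0, 8, 2), range(1, 8, 2)
synapses = [(p, q, 10) for group in (evens, odds) for p, q in itertools.combinations(group, 2)]
synapses.append((0, 1, 1))

initial = [n // 4 for n in range(8)]
print("initial traffic:", inter_crossbar_traffic(initial, synapses))
final_assign, final_cost = hill_climb_mapping(8, synapses, capacity=4)
print("final traffic:", final_cost, "assignment:", final_assign)
```

The naive sequential fill splits both groups across the two crossbars; hill climbing regroups them so that only the weak synapse crosses the interconnect. Plain hill climbing can stall in local optima on harder instances, which is why practical mappers often add restarts or perturbations.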

### A Fully-Configurable Digital Spiking Neuromorphic Hardware Design with Variable Quantization and Mixed Precision ...... 937

Shadi Matinizadeh<sup>1</sup>, Arghavan Mohammadhassani<sup>1</sup>, Noah Pacik-Nelson<sup>2</sup>, Ioannis Polykretis<sup>2</sup>, Abhishek Mishra<sup>1</sup>, James Shackleford<sup>1</sup>, Nagarajan Kandasamy<sup>1</sup>, Eric Gallo<sup>2</sup>, Anup Das<sup>1</sup>

<sup>1</sup>Drexel University, United States; <sup>2</sup>Accenture Lab, United States

**Abstract:** We introduce QUANTISENC, a fully configurable digital spiking neuromorphic hardware design that optimizes performance and power consumption. Like other design methodologies, QUANTISENC allows the number of layers and neurons per layer to be configured to generate the hardware architecture for a target machine learning model. QUANTISENC introduces two key novelties. First, it lets the user set, through software, separate quantization and precision policies for the synaptic weights and for the internal state variables of neurons, so the design can be fully configured and optimized for an application's requirements. Second, QUANTISENC allows neuron design parameters to be configured dynamically to explore the trade-off between performance and power consumption at run time. We open-source the Verilog-based design and testbench setup along with the software package to advance research in this field. Using open-source datasets, we show improvements in area, power, and performance over several state-of-the-art designs.

## Wednesday, August 14, 2024

## Session C1L-A: Analog & Mixed-Signal Circuits & Systems 1

**Chair:** Tejasvi Das, *Rochester Institute of Technology* **Co-Chair:** Najme Ebrahimi, *Northeastern University* **Time:** Wednesday, August 14, 2024, 8:00 - 9:30 **Location:** Room 1

<sup>1</sup>Tsinghua University, China; <sup>2</sup>Beijing Smartchip Microelectronics Technology Co. Ltd., China

**Abstract:** This paper presents a high-linearity ADC front-end circuit for single-ended inputs. In a fully-differential-input ADC, the input common-mode voltage of the amplifier/comparator depends on the input. Single-ended inputs therefore degrade the linearity of the amplifier/comparator, which then requires a wide input common-mode range. In the proposed circuit, the summing node is left floating during the sampling phase so that the input common-mode voltage is not sampled. To preserve sampling linearity, a bootstrap circuit keeps the gate-source voltage of the top-plate short switch almost unchanged. The proposed front-end circuit is implemented and validated in a 0.18µm CMOS process. At a sampling rate of 125Msps, the circuit achieves 84.4dBc SFDR for a single-ended input at the Nyquist frequency with an amplitude of 2.2VPP.

Hanna Iva Busse, Gonçalo Rodrigues, Diogo M. Caetano, Marcelino Santos, Jorge Fernandes

INESC MN, Instituto Superior Técnico, Universidade de Lisboa, Portugal

Abstract: Data transfer and clock generation are general challenges that must be tackled in implantable devices. In this work, we propose an integrated demodulator for Manchester-coded AM signals. The demodulator, developed for the data down-link of an ultrasound (US)-based implantable device, fits into an area of $378.5 \times 136.9\ \mu\text{m}^2$ and uses a robust design to cope with variations in the input signal due to misalignment. It is designed to receive an envelope that carries the Manchester-coded data with an amplitude margin of 30 mV on a fluctuating DC level, and to output a recovered clock together with the demodulated data. The circuit functionality is verified by simulation in 0.18 $\mu$m CMOS technology, applying the upper sideband of an AM-modulated signal with data rates of 9 to 15 kHz.
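
The abstract does not state which Manchester convention the link uses. As a minimal behavioral sketch, assuming the IEEE 802.3 convention (a mid-bit low-to-high transition encodes 1 and high-to-low encodes 0) and an envelope already sliced into two binary samples per bit, encoding and decoding could look like this; the circuit additionally recovers the clock from those mid-bit transitions, which this sketch does not model:

```python
def manchester_encode(bits):
    """IEEE 802.3 convention: 1 -> (0, 1) low-to-high, 0 -> (1, 0) high-to-low."""
    out = []
    for b in bits:
        out.extend((0, 1) if b else (1, 0))
    return out


def manchester_decode(halves):
    """Decode half-bit samples; every valid bit contains a mid-bit transition.

    Raises ValueError on a pair without a transition, which in hardware would
    indicate lost bit alignment or a corrupted symbol.
    """
    if len(halves) % 2:
        raise ValueError("need an even number of half-bit samples")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits
```

The guaranteed transition in every bit period is what makes clock recovery from the data stream possible, since the receiver can lock onto the mid-bit edges regardless of the data pattern.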


Faraz Adin<sup>1</sup>, Hanyu Wang<sup>1</sup>, Lukang Shi<sup>2</sup>, Rajiv Singh<sup>2</sup>, Erhan Hancioglu<sup>2</sup>, Gabor C. Temes<sup>1</sup>

<sup>1</sup>Oregon State University, United States; <sup>2</sup>Infineon Technologies, United States

Abstract: This paper introduces a novel multi-residue two-step incremental analog-to-digital converter (MR-IADC) designed to enhance accuracy. Unlike in a conventional two-step IADC, the conversion residue stored on the last integrator of the modulator is extracted multiple times. This provides an additional degree of freedom, thereby improving the signal-to-quantization-noise ratio (SQNR). In the first step, the circuit is configured as a first-order incremental ADC followed by a novel decimation filter. The fine quantization is realized using a SAR ADC with a two-capacitor DAC (2C-SAR), which samples the conversion residue of the first stage multiple times. The 2C-SAR ADC is followed by a zero-order-hold (ZOH) filter and functions as a noise-shaping SAR within one conversion cycle. Compared with the conventional two-step IADC, the MR-IADC achieves an 8 dB SQNR improvement by sampling the residue 4 times, and a 13 dB improvement by sampling it 8 times. Simulation results and theoretical analysis verify the effectiveness of this structure.


<sup>1</sup>Samsung Semiconductor India Research, India; <sup>2</sup>Indian Institute of Technology Delhi, India

**Abstract:** A sub-sampling front end with low capacitive loading, followed by a second-order $\Sigma\Delta$ ADC, is presented. The front end is designed in the UMC 65nm low-power CMOS process and consumes 110 $\mu$W from a 1.2 V supply. The ADC has a simulated SQNR of around 74 dB and an ENOB of around 12 bits. The time constant of the sampling front end is around 0.76 ps. The entire front end occupies an area of 200 $\mu$m × 550 $\mu$m.

Iowa State University, United States

**Abstract:** This paper introduces a voltage reference design using subthreshold MOSFETs and a piecewise-linear compensation technique. The design addresses the high area and power consumption of traditional bandgap references (BGRs). By leveraging the negative temperature coefficient of subthreshold MOSFETs, the proposed architecture achieves a stable voltage reference with a significantly reduced temperature coefficient of 2.5 ppm/°C over a temperature range of $-40^{\circ}$C to $125^{\circ}$C, and a power supply ripple rejection of -77dB. The design operates accurately over a supply range of 0.8V to 1.8V and achieves a line regulation of 0.29%.

## Session C1L-B: Deep Learning & Applications

**Chair:** Cheol-Hong Min, *University of St. Thomas* **Co-Chair:** Tuy Nguyen, *Northern Arizona University* **Time:** Wednesday, August 14, 2024, 8:00 - 9:30 **Location:** Room 2

## MedSegNet: A Lightweight Convolutional Network Combining Dual Self-Attention and Multi-Scale Attention for Medical Image Segmentation ...... 965

Subrato Bharati, M. Omair Ahmad, M.N.S. Swamy

Concordia University, Canada

Abstract: In this work, we propose MedSegNet, a novel lightweight convolutional neural network that incorporates residual modules with a fusion of dual self-attention and multi-scale attention mechanisms, designed to segment three types of medical images: CT, non-mydriatic 3CCD, and colonoscopy images. The network segments images of a given modality proficiently, provided it is adequately trained on a representative dataset of that same modality. In our experiments, MedSegNet is trained on three different datasets and tested on the respective test datasets. In comparative analyses with state-of-the-art models, MedSegNet outperforms them in terms of Dice similarity coefficient (DSC), intersection over union (IoU), computational efficiency, and robustness to variations in medical imaging modalities. These results highlight the potential of MedSegNet to set a new benchmark for medical image segmentation tasks.


Concordia University, Canada

**Abstract:** This work introduces a novel framework for breast ultrasound image segmentation that leverages a new super-pixel grid mixing-based augmentation, an advanced loss function named contextual differential loss (CDL), and a feature fusion network (FFNet). The proposed method aims to address the limitations of current segmentation techniques by enhancing data variability, improving boundary delineation, and ensuring comprehensive feature integration. All the novel components are used during training on two publicly available breast ultrasound datasets (BUS and BUSI), and the trained model is tested on the respective datasets used in training. The results and ablation study show that the proposed model outperforms state-of-the-art models and sets a benchmark for future work.

## PathoWAve: A Deep Learning-Based Weight Averaging Method for Improving Domain Generalization in Histopathology Images ...... 975

Parastoo Sotoudeh Sharifi, M. Omair Ahmad, M.N.S. Swamy

Concordia University, Canada

Abstract: Recent progress in deep learning (DL) has significantly advanced medical image analysis. In histopathology image analysis, varying staining protocols cause significant domain shifts, necessitating effective domain generalization (DG) strategies to improve the reliability of automated cancer detection tools. In this paper, we introduce Pathology Weight Averaging (PathoWAve), a multi-source DG strategy to address domain shift in histopathology images. By integrating weight averaging with parallel training trajectories and histopathology-specific data augmentation, PathoWAve enables comprehensive exploration of, and precise convergence within, the loss landscape, capturing flatter minima that significantly boost the generalization capabilities of DL models. To the best of our knowledge, PathoWAve is the first weight averaging method proposed for DG in histopathology images. Our quantitative results on the Camelyon17 WILDS dataset, composed of whole slide images from five distinct medical centers, demonstrate PathoWAve's superiority over previously proposed methods. Our code is available at https://github.com/ParastooSotoudeh/PathoWAve.
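
The abstract does not specify how PathoWAve samples trajectories or which checkpoints it averages. As a minimal sketch of only the core weight-averaging operation, assuming models with identical architectures whose parameters are stored as dictionaries of flat lists, a uniform average of several checkpoints could be computed like this (`average_weights` is a hypothetical helper):

```python
def average_weights(checkpoints):
    """Uniformly average parameter dictionaries with identical keys and shapes.

    checkpoints: list of dicts mapping parameter name -> list of floats.
    Returns a new dict holding the element-wise mean of each parameter.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        values = [ckpt[name] for ckpt in checkpoints]
        if any(len(v) != len(values[0]) for v in values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*values) groups the i-th entry of every checkpoint together.
        averaged[name] = [sum(col) / n for col in zip(*values)]
    return averaged
```

Averaging in weight space only makes sense when the checkpoints lie in a shared low-loss basin, which is why weight-averaging methods typically average models that share an initialization or a common training run.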

Abstract: Global healthcare systems face growing pressure as populations rise, which can lead to longer wait times and an increased risk of treatment delays or misdiagnosis. Artificial intelligence (AI) diagnostic systems are being developed to address these challenges, but concerns exist about their accuracy and data security. This study introduces a robust AI telehealth system with a two-pronged approach: it uses a vision transformer, a state-of-the-art image analysis method, to enhance diagnostic accuracy, and it incorporates the post-quantum cryptography algorithm Kyber to ensure patient privacy. Furthermore, an interactive visualization tool aids in interpreting the diagnostic results, providing valuable insight into the model's decision-making process. This translates to faster diagnoses and potentially shorter wait times for patients. Extensive testing with various datasets has demonstrated the system's effectiveness: the optimized model achieves 95.79% accuracy in diagnosing COVID-19 from chest X-rays, with the entire process completed in under five seconds.

## Session C1L-C: Machine Learning Methods for Hardware Security

**Chair:** Aatmesh Shrivastava, *Northeastern University* **Co-Chair:** Georgios Sirakoulis, *Democritus University* **Time:** Wednesday, August 14, 2024, 8:00 - 9:30 **Location:** Room 3


Alfred Moussa, Nader Rafla

Boise State University, United States

Abstract: The growth of Internet of Things (IoT) and System on Chip (SoC) applications has increased the prevalence of active medical implants. Because of the globalized supply chain for Integrated Circuits (ICs), design processes are often outsourced to multiple untrusted entities, creating opportunities for malicious modifications known as Hardware Trojans (HTs). These HTs can compromise integrity, performance, or functionality, and may even introduce backdoors for unauthorized access. This paper presents an enhanced approach for detecting hardware Trojans that uses machine learning models and reduces the collinearity between features to avoid overfitting. The supervised model showed a 99.2% true positive and true negative rate, as well as an F-measure of 99.3%, while the unsupervised model achieved a 99.5% true positive rate using random projection, offering a more resilient machine-learning-based method for detecting HTs.


Niraj Prasad Bhatta, Usha Giri, Fathi Amsaad

Wright State University, United States

**Abstract:** Ensuring the integrity and security of integrated circuits (ICs) is critical in the digital age, as hardware Trojans (HTs) can enable unauthorized access and cause data leaks or malfunctions. Traditional HT detection methods, which often rely on comparison with a 'golden chip,' struggle with HTs' covert nature and complex designs. This paper introduces a machine learning-based approach that analyzes power side-channel signals to overcome these limitations. By analyzing a comprehensive dataset covering various Trojan states and conditions, we have developed a model that accurately detects HTs without needing a golden reference. Our use of deep learning has significantly improved detection accuracy, offering a new direction for real-time HT monitoring and enhancing IC security throughout the IC lifecycle.


Andey Robins, Mike Borowczak

University of Central Florida, United States

Abstract: The von Neumann architecture has a bottleneck that limits the speed at which data can be made available for computation. To combat this problem, novel computing paradigms are being developed. One such paradigm, known as in-memory computing, interleaves computation with data storage within the same circuits. MAGIC, or Memristor Aided Logic, is an approach that uses memory circuits which physically perform computation through write operations to memory. Sequencing these operations is a computationally difficult problem that is directly correlated with the cost of MAGIC-based in-memory computation. SAGA models the execution sequences as a topological sorting problem, which makes the optimization well suited to genetic algorithms. We then detail the formulation and implementation of these genetic algorithms and evaluate them over a number of open circuit implementations. Over the 10 benchmark circuits evaluated, the memory footprint needed for each circuit is reduced by up to 52% relative to existing greedy-algorithm-based optimization solutions.
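
The abstract does not give SAGA's cost model or genetic operators. A minimal sketch of the fitness evaluation such a genetic algorithm would need, assuming each value occupies one memristor cell from the step that computes it until its last consumer has executed (after which the cell can be reused), is the peak number of live cells for a candidate topological order; the DAG representation and function names are hypothetical:

```python
def is_topological(order, deps):
    """Check that every node in order appears after all of its inputs."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[src] < position[node]
               for node in order for src in deps.get(node, ()))


def peak_cells(order, deps, outputs):
    """Peak number of simultaneously live values for a given execution order.

    order: every node (primary inputs included) in execution order.
    deps: dict node -> list of input nodes (primary inputs have no entry).
    outputs: nodes that must stay live until the end.
    """
    if not is_topological(order, deps):
        raise ValueError("order violates a dependency")
    position = {node: i for i, node in enumerate(order)}
    # Last step at which each value is read; outputs stay live past the end.
    last_use = {node: position[node] for node in order}
    for node in order:
        for src in deps.get(node, ()):
            last_use[src] = max(last_use[src], position[node])
    for out in outputs:
        last_use[out] = len(order)
    live, peak = set(), 0
    for step, node in enumerate(order):
        live.add(node)  # the new value is written while its inputs are still held
        peak = max(peak, len(live))
        # Free every value whose final consumer has now executed.
        live = {v for v in live if last_use[v] > step}
    return peak
```

A genetic algorithm could then evolve execution orders, for example with order-preserving crossover and mutations repaired back into valid topological orders, and minimize `peak_cells` as the fitness.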

### NoC-Armor: Leveraging Quantitative Analysis for Enhanced Security ...... 1001

Padmaja Bhamidipati, Ranga Vemuri

University of Cincinnati, United States

**Abstract:** In multi-processing systems, security challenges persist with the increasing interconnectivity of devices, especially within Network-on-Chip (NoC) architectures. Hardware resources shared between secure and malicious IPs create vulnerabilities that attackers can exploit to mount side-channel attacks. Such attacks are difficult to counter because they exploit implicit information leakage through power consumption, electromagnetic radiation, or timing variations. In response, we introduce NoC-Armor: Leveraging Quantitative Analysis for Enhanced Security, which dynamically identifies and counters side-channel attack patterns within NoCs. NoC-Armor takes a quantitative approach, using conditional probability as a metric to examine secure and side-channel attack events. This metric guides routing decisions, ultimately enhancing the security posture of NoC architectures. Our methodology demonstrated a 50% enhancement in security resilience with minimal power and area overhead.

## Session C1L-D: Cryogenic CMOS Circuits for Quantum Computing Applications

**Chair:** Gerardo Molina Salgado, *Microelectronic Circuits Centre Ireland (MCCI)* **Time:** Wednesday, August 14, 2024, 8:00 - 9:30 **Location:** Room 4

## Cryogenic Alternative: CMOS versus Dynamic-Based Logic ...... 1007

Ali H. Hassan, Puneet Gupta, Sudhakar Pamarti, Chih-Kong Ken Yang

University of California, Los Angeles, United States

**Abstract:** Next-generation data-center computing requires high-performance, energy-efficient servers. One counterintuitive approach to reducing energy is to lower the temperature of the processing elements, even down to the cryogenic temperature of liquid nitrogen. If the processing load is reduced dramatically, the net energy cost is lower even when the corresponding cooling cost is considered. Operating at such temperatures requires optimized devices and new circuit techniques. One technique is to scale down the threshold voltage of transistors, which in turn allows the supply voltage to scale. Another is to leverage the reduced leakage and use dynamic-based logic for processing. This paper uses a low-temperature low-threshold-voltage (LTLVT) model of a 14-nm FinFET device, calibrated with measurements, for cryogenic computing. A 16-bit NP-CMOS adder is demonstrated, achieving an energy-delay improvement of up to 20X at 77K compared with 330K.

### IceMOS: Cryo-CMOS Python-Based Calibration Tool ...... 1011

Mauricio Montanares<sup>1</sup>, Minda Wen<sup>1</sup>, V.H. Arzate Palma<sup>3</sup>, Kevin G. McCarthy<sup>2</sup>, Gerardo Molina Salgado<sup>1</sup>

<sup>1</sup>Microelectronics Circuit Centre Ireland, Ireland; <sup>2</sup>University College Cork, Ireland; <sup>3</sup>CINVESTAV Guadalajara, Mexico

Abstract: This study presents the development of IceMOS, a custom Python-based tool designed to automate the characterization and modeling of a 65nm Process Design Kit (PDK) for cryogenic CMOS applications at 4K. IceMOS streamlines parameter adjustments and simulations, interfacing with Cadence Spectre to improve efficiency and accuracy. Validation at room temperature (298K) showed close alignment with industry-standard tools and lab measurements, confirming IceMOS's reliability. The tool reduces the time and effort required for calibration by automating file generation, a process traditionally prone to errors. We present I-V transfer characteristic curves for NMOS and PMOS devices at 4K, using relative error and Mean Square Error (MSE) metrics to demonstrate precision. While IceMOS does not fully automate the calibration process, it facilitates parameter adjustments without intermediate file generation. This study highlights the limitations of commercial tools in cryogenic contexts and emphasizes the advantages of IceMOS for rapid and accurate cryo-PDK development, enhancing device characterization and parameter adjustment.

<sup>1</sup>Microelectronics Circuit Centre Ireland, Ireland; <sup>2</sup>University College Cork, Ireland; <sup>3</sup>Analog Devices, Inc., Ireland

**Abstract:** This paper presents the design of low-voltage, low-power cryogenic CMOS voltage reference circuits. This cryo-optimized circuit uses low-threshold models to compensate for the increase in transistor threshold voltage at ultra-low temperatures. Similarly, the PTAT factor is increased to three times its optimal value to compensate for the decrease in current at cryogenic temperatures. A family of reference circuits was implemented in standard 65-nm CMOS. The silicon results show temperature coefficients of 419, 350, and 229 ppm/K over the ultra-wide temperature range from 4 to 295 K, with power consumption at 4 K of only 5.3 µW, 22.7 µW, and 410 nW, respectively.

### PCB Design Considerations and Temperature Sensing for Cryo-CMOS ...... 1021

V.H. Arzate Palma<sup>1</sup>, Minda Wen<sup>2</sup>, Mauricio Montanares<sup>2</sup>, F. Sandoval-Ibarra<sup>1</sup>, Gerardo Molina Salgado<sup>2</sup>

<sup>1</sup>CINVESTAV Guadalajara, Mexico; <sup>2</sup>Microelectronics Circuit Centre Ireland, Ireland

**Abstract:** This work presents the design considerations for a printed circuit board (PCB) and temperature sensing for cryogenic CMOS (Cryo-CMOS) applications. The PCB operates as expected at cryogenic temperatures, showing that its structural design and selected materials provide a useful support for use in quantum computing (QC). The PCB structure includes a thermal bridge made of oxygen-free high-conductivity (OFHC) copper, which has demonstrated high thermal-transfer efficiency. An on-chip polysilicon resistor is electrically tested to assess its possible use as a temperature sensor. A PT100 temperature sensor was used to measure the temperature of the surface of interest. The experimental results show that both the PT100 and the polysilicon resistor exhibit a residual resistance, which was subtracted from the temperature-dependent resistance model to determine the resistance value at 4 K. For modeling purposes, the PT100 sensor was modeled as a series connection of three resistors following a piecewise linear approach, while the polysilicon resistor requires more research before it can serve as a true temperature sensor.

## A Sub-THz CMOS Transceiver IC and System for Medium-Reach Guided Wave and

Swaminathan Sankaran, Gerd Schuppener, Juan Herbsommer, Brad Kramer, Hassan Ali, Robert Payne, Carole Rush, Baher Haroun, Nirmal Warke

Texas Instruments Inc., United States

Abstract: A sub-THz guided-wave and wireless communication system based on a fully integrated 140-GHz transceiver IC implemented in 65-nm CMOS technology is presented. The transceiver targets medium-reach wireline communication over a plastic dielectric waveguide (DWG) and short-reach wireless communication. The antenna configurations for the two use cases have been integrated into the IC's package substrate. The designed and realized mono-mode and multi-mode DWGs are measured to have lower bending losses than optical cables and support <3dB/m loss and <40ps group-delay variation in the 140-220GHz WR5 band, indicating their potential as robust, low-loss, low-cost sub-THz conduits with superior pJ/bit for the fast-growing demand for broadband data. Using BPSK modulation of the 140-GHz carrier, 16-Gb/s transfer with a BER of $9.3\times10^{-12}$ over a 1m DWG and 10.8-Gb/s wireless data transfer with a BER of $1.6\times10^{-11}$ over a 2.5cm distance are demonstrated. For a unidirectional link, the transmitter and receiver components consume 520mA from a 1.4-V supply.

## Session C1L-E: Hardware Friendly Biomedical Signal Acquisition & Processing Systems

Chair: Hanjun Jiang, Tsinghua University, Beijing Co-Chair: Wei Tang, MSU Time: Wednesday, August 14, 2024, 8:00 - 9:30 Location: Room 5


Jonathan Yun, Wei Tang

New Mexico State University, United States

Abstract: This paper presents an automatic threshold adjustment circuit for selecting the reference voltage levels in predictive level-crossing sampling data converters. The automatic adjustment is based on counting the rate of triggered events within a timing window. The threshold voltage is then adaptively adjusted based on the event rate using closed-loop feedback: it is increased when the event rate is too high and decreased when the rate is too low. This adjustment does not affect the reconstruction process, and the system does not need to store the threshold voltage or its changes, since predictive level-crossing sampling records precise digital values at a fixed sampling clock rate. The target event rate is adjustable depending on the application. Simulations of ECG sensing demonstrate the performance of the proposed method. The proposed circuits provide a relatively stable output data throughput, which suits the downstream data storage and processing devices.
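
The abstract describes the control loop only qualitatively. As a minimal behavioral sketch, assuming a fixed counting window, a target event count with a tolerance band, and a fixed threshold step (all hypothetical parameters), the closed-loop adjustment could be modeled as follows; here a level-crossing event is counted whenever the input moves by at least the threshold from the last recorded level:

```python
def adaptive_level_crossing(samples, window, target, tol, step, th_init, th_min=1e-6):
    """Level-crossing event detection with event-rate-driven threshold control.

    After every `window` samples the event count is compared with `target`:
    more than target + tol raises the threshold, fewer than target - tol
    lowers it (never below th_min). Returns (event_indices, thresholds),
    where thresholds[k] is the threshold used during window k.
    """
    th = th_init
    last_level = samples[0]
    events, history = [], []
    count = 0
    for i, x in enumerate(samples):
        if abs(x - last_level) >= th:
            events.append(i)
            last_level = x  # the crossing becomes the new reference level
            count += 1
        if (i + 1) % window == 0:
            history.append(th)
            if count > target + tol:
                th += step  # too many events: coarser threshold
            elif count < target - tol:
                th = max(th_min, th - step)  # too few events: finer threshold
            count = 0
    return events, history
```

For a steady ramp, the threshold climbs window by window until the event count settles inside the tolerance band, which is the stable-throughput behavior the abstract reports.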

### A Resource-Efficient Lip Detector based on Hybrid Feature Criteria and Predictive

Kaiji Liu, Siqi Zhang, Syed Muhammad Abubakar, Nan Wu, Yanshu Guo, Zhihua Wang, Hanjun Jiang

Tsinghua University, China

Abstract: The paper presents the behavioural model and hardware design of a resource-efficient lip detection (RELD) unit. Hybrid feature criteria are constructed from the hump-like curve formed by the row summation of the lip region of interest (ROI). A first-order predictive bounding-box tracking method is also proposed to further reduce computation on continuous image streams. On the GRID dataset, the RELD model achieves > 95% lip detection accuracy, and the hardware implementation requires only 352 bytes of SRAM and 14 bytes of configuration parameters. On 200 x 200 video inputs, it requires fewer than 5000 MAC operations per frame to complete the inference and tracking tasks. Thanks to its high accuracy and tiny resource overhead, RELD may be better suited than models based on Viola-Jones/ViT/CNN/RNN to energy-efficient, lip-detection-based HCI tasks on mobile terminals.
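
The abstract names a first-order predictive tracker and a hump-shaped row-sum profile but gives no equations. A minimal sketch, assuming that first-order prediction means linear extrapolation of the box from the two previous frames, and that the lip rows are the contiguous span around the row-sum peak that stays above a fixed fraction of it (both assumptions; the paper's hybrid criteria are not reproduced, and the function names are hypothetical):

```python
def predict_box(prev, curr):
    """First-order (constant-velocity) prediction of the next bounding box.

    Boxes are (x, y, w, h) tuples; each coordinate is extrapolated linearly:
    next = curr + (curr - prev).
    """
    return tuple(c + (c - p) for p, c in zip(prev, curr))


def hump_rows(roi, frac=0.5):
    """Row span of the hump in the row-sum profile around its peak.

    roi: 2-D list of per-pixel feature values (assumed to be larger on lip pixels).
    Returns (first_row, last_row) of the contiguous rows whose sum is at least
    frac times the peak row sum.
    """
    sums = [sum(row) for row in roi]
    peak_row = max(range(len(sums)), key=sums.__getitem__)
    level = frac * sums[peak_row]
    top = bottom = peak_row
    while top > 0 and sums[top - 1] >= level:
        top -= 1
    while bottom < len(sums) - 1 and sums[bottom + 1] >= level:
        bottom += 1
    return top, bottom
```

Restricting the next frame's search to the predicted box is what would let a tracker of this kind skip full-frame detection on most frames.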

### Ventricular Arrhythmia Prediction 3-Hours Ahead of Onset for Long-Term ECG Monitoring ...... 1041

Syed Muhammad Abubakar, Kaiji Liu, Zhihua Wang, Hanjun Jiang

Tsinghua University, China

Abstract: This work proposes a single-lead ECG-based approach for predicting ventricular arrhythmias (VAs) up to 3 hours before onset. It uses interval features and PQRST-wave morphology information from the ECG to predict VAs with a shallow artificial neural network. This work also provides a measure of confidence in the prediction. The goal is an algorithm with good accuracy and low computational complexity that supports hardware-friendly wearable ECG signal processing systems.

## **Reverse Electrowetting-on-Dielectric (REWOD) Energy Harvesting Circuit Advancement for**

<sup>1</sup>The University of Texas at Dallas, United States; <sup>2</sup>Utah Tech University, United States

Abstract: Over the past decade, several technologies have been proposed to harvest electrical energy from low-frequency mechanical motion. However, the generated energy must be converted into a suitable signal to be harvested efficiently and used effectively in an application. This paper presents advancements in the active circuitry for the reverse electrowetting-on-dielectric (REWOD) energy harvesting method. The harvested energy is transformed into a DC supply by a purpose-designed rectifier and held at the required constant voltage by a DC-DC converter. The REWOD-generated charge, which is typically very small, is read out by a charge amplifier that improves the signal-to-noise ratio with a gain of 2 V/V, and the signal is transmitted remotely to a digital receiver. Further, to avoid power flicker or disruption, a super-capacitor is integrated into the rectifier's filter circuit, maintaining a constant power supply for 5 minutes with a power conversion efficiency of more than 80% at 1 Hz. The proposed circuitry is suitable for powering small devices such as wearable body sensors for biomedical applications.

### A Direct Current-to-Digital Converter (DCDC) for Advanced Current Measurement in

Saeid Karimpour, Isaac Bruce, Michael Sekyere, Ruohan Yang, Emmanuel Nti Darko, Degang Chen

Iowa State University, United States

Abstract: The article discusses a direct current-to-digital converter (DCDC) designed to improve current measurement in SoC designs. The converter, built in TSMC's 180 nm technology, offers high energy efficiency, with a conversion energy of 1.52 pJ, a power consumption of 117 $\mu$W, and a small area of 0.016 mm<sup>2</sup>. The converter's reliability and precision have been confirmed through extensive simulations under a variety of conditions. The device not only addresses the shortcomings of conventional current measurement techniques but also lays a scalable groundwork for future progress in semiconductor technologies. Its incorporation into different SoC platforms could extend its advantages across the industry, enhancing the lifespan and dependability of integrated circuits.

## Session C1L-F: Artificial Intelligence, Internet of Things & Systems 6

Chair: Shiuh-hua Chiang, Brigham Young University Co-Chair: Keith Corzine, UC Santa Cruz Time: Wednesday, August 14, 2024, 8:00 - 9:30 Location: Room 6

### **Exploration of Low-Energy Floating-Point Flash Attention Mechanism for 18nm FD-SOI CMOS Integration at the Edge** ...... 1056

Joaquin Cornejo<sup>1,2,3</sup>, Filipe Pouget<sup>1</sup>, Sylvain Clerc<sup>1</sup>, Tifenn Hirtzlin<sup>2</sup>, Benoit Larras<sup>3</sup>, Andreia Cathelin<sup>1</sup>, Antoine Frappe<sup>3</sup>

<sup>1</sup>STMicroelectronics Crolles, France; <sup>2</sup>Université Grenoble Alpes, CEA-Leti, France; <sup>3</sup>University of Lille, CNRS, Polytechnic University of Hauts-de-France, Centrale Lille and Junia, France

Abstract: In this work, we study the sizing and feasibility of a processing element that computes the attention mechanism used in Transformer models using floating-point representation. We show that the Flash Attention implementation is needed to enable circuit integration at the edge. Different floating-point formats, both standard and non-standard, were explored and synthesized in 18nm FD-SOI CMOS to compare their area and leakage power. The quality of the results is assessed against a Python reference. The best representation found is a non-standard 12-bit floating-point architecture that computes a single block of Flash Attention. Compared with a RISC-V CPU of equivalent silicon area synthesized in the same technology, it shows 10<sup>4</sup> times lower energy consumption and 10<sup>3</sup> times lower latency, while providing results with a low mean error.
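
Flash Attention avoids materializing a full row of attention scores by processing keys and values block by block while carrying a running maximum and a running softmax denominator (the online-softmax technique). A minimal sketch for a single query vector, in plain double-precision Python rather than the reduced floating-point formats the paper explores, compared against a naive reference (function names are illustrative):

```python
import math


def attention_reference(q, keys, values):
    """Naive softmax(q . k / sqrt(d)) weighted sum of the value vectors."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim_v)]


def attention_blockwise(q, keys, values, block=2):
    """Flash-Attention-style computation over key/value blocks (online softmax)."""
    d = len(q)
    dim_v = len(values[0])
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * dim_v  # unnormalized weighted sum of values
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the old maximum (exp(-inf) = 0 on the first block).
        scale = math.exp(running_max - new_max)
        denom *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]
```

Only one block of scores is held at a time, so on-chip storage scales with the block size rather than the sequence length, which is the property that makes edge integration plausible.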

## Penta-Transistor Integrate and Fire (PTIF) Spiking Neuron with an Ultra-Low Energy Consumption of 0.045 fJ per Spike ...... 1060

Shelby Williams<sup>1</sup>, Kasem Khalil<sup>2,3</sup>, Magdy Bayoumi<sup>1</sup>

<sup>1</sup>University of Louisiana at Lafayette, United States; <sup>2</sup>University of Mississippi, United States; <sup>3</sup>Assiut University, Egypt

Abstract: This design is an artificial, silicon-based neuronal implementation of the widely recognized Integrate & Fire (I&F) spiking model. Its topology is notably simple, comprising only five equally sized MOSFETs. This Penta-Transistor Integrate & Fire (PTIF) model uses a positive-feedback design. Energy efficiency is the key distinguishing hallmark of the PTIF circuit, which consumes only 0.045 femtojoules per spike. To the best of our knowledge, the proposed PTIF implementation consumes one order of magnitude (10x) less energy per spike than any previously reported I&F neuron design. The linear and quadratic energy savings stem from the exclusion of external capacitors and the use of a sub-threshold supply voltage, respectively.

## Intelligent Filtering Tuning In Edge Computing ...... 1065

Nieves G. Hernandez-Gonzalez, Juan Montiel-Caminos, Javier Sosa, Juan A. Montiel-Nelson

Universidad de Las Palmas de Gran Canaria, Spain

**Abstract:** An artificial intelligence approach to determining the best narrow input filter for a fundamental frequency extractor is presented. The algorithms are implemented on board a water current velocity sensor node based on an ARM Cortex-M0+ without DSP or FPU hardware support. The implementation is studied in detail using both real and integer variables. The results demonstrate that the proposed ANN-based solution is at least 5.5 and 15 times better than the published FFT solutions when using real and integer variables, respectively, for an input bandwidth reduction factor of 7.

## Impact Localization in Inkjet-Printed Tactile Grid Sensor with Echo State Network ...... 1070

Shahrin Akter, Mohammad Rafiqul Haider

University of Missouri-Columbia, United States

Abstract: Tactile sensors with impact localization are becoming essential in automotive, aerospace, and civil engineering for damage assessment, safety assurance, and structural monitoring. Inkjet printing is on the rise for its eco-friendliness, cost-efficiency, low power consumption, and quick design iteration. However, its minimal fabrication process results in operational challenges, which can be mitigated by integrating artificial intelligence with inkjet-printed sensors to enhance their performance. Among artificial intelligence techniques, echo state networks are gaining recognition for their low computational demands and hardware compatibility. This study developed an inkjet-printed tactile grid sensor with an echo state network for impact localization. The sensitivity of the sensor was assessed through a pencil drop experiment, with the data transformed across the time and magnitude domains to improve network adaptability. The model's hyperparameters were fine-tuned through sequential search. The developed echo state network with the grid tactile sensor pinpointed the impact location of pencil drops with a precision of 94.23%.

## Efficient Hardware Design of Convolutional Neural Networks for Accelerated Deep Learning ...... 1075

Kasem Khalil<sup>1</sup>, Md Rahat Khan<sup>2</sup>, Magdy Bayoumi<sup>2</sup>, Ahmed Sherif<sup>3</sup>

<sup>1</sup>University of Mississippi, United States; <sup>2</sup>University of Louisiana at Lafayette, United States; <sup>3</sup>University of Southern Mississippi, United States

**Abstract:** This paper proposes a novel hardware design for CNNs with enhanced efficiency and performance. The design shares a single convolutional unit among multiple kernel units and carefully allocates hardware resources. Implemented in VHDL on the Altera 10 GX FPGA, the CNN accelerator uses between 2% and 26.35% of the available resources. Operating at a clock frequency of 120 MHz with 8-bit fixed-point precision, the proposed method achieves an energy efficiency of 28.41 GOPS/W, outperforming existing approaches. Experimental testing with the CIFAR-10 dataset shows an accuracy of 98.79%. The proposed method also occupies less area than existing methods. This hardware design promises significant advancements in CNN acceleration and offers a robust solution for applications demanding both resource efficiency and high accuracy.

## Session C2L-A: Analog & Mixed-Signal Circuits & Systems 2

Chair: Mahmoud Ibrahim, *MediaTek* Co-Chair: Yuanming Zhu, *Intel* Time: Wednesday, August 14, 2024, 10:30 - 12:00 Location: Room 1

### A 1-GHz Analog Multiphase PWM using a Single Synchronization Source ...... 1080

Luis Garcia-Magallon, Ivan R. Padilla-Cantoya

Universidad de Guadalajara, Mexico

Abstract: This paper combines several techniques to design a generator with all the proposed qualities. The generator produces a pulse-width modulation (PWM) signal from a triangular signal generated by an RC circuit designed to achieve high linearity, and its theory is supported by mathematical analysis. A single source is used to generate multiple phases that are robust to source variations. The complete system is based on a ring oscillator that is easy to calculate and configure for specific frequencies, taking advantage of the fact that manufacturing variations do not alter the quadrature property of the signal. The entire circuit is designed as a mixed-signal circuit in a 65 nm technology to exploit the associated benefits, and it operates at high frequencies.

### Efficient Analog Layout Generation for In-RRAM Computing Circuits via Area and Wire Optimization ....... 1085

Bo-Han Li, Kuan-Chih Lin, Hao Zuo, Po-Cheng Pan, Hung-Ming Chen, Shyh-Jye Jou, Chien-Nan Jimmy Liu, Bo-Cheng Lai

National Yang Ming Chiao Tung University, Taiwan

**Abstract:** This study introduces a new layout generation method for analog circuits, specifically designed for RRAM computing circuits in the TSMC 40nm process. By focusing on area and wire optimization, we reduce the layout area by up to 28.6% and the wirelength by 45.3%, while keeping power consumption and accuracy comparable to conventional approaches. The method relies on strategic guard ring placement and precise transistor spacing to optimize the layout efficiently. Our findings highlight the method's capacity to address the challenges of analog layout generation, offering a pathway to enhanced in-memory computing systems. This work contributes to the broader field of computing circuit design, providing insights that could influence future approaches to RRAM-compatible physical design.

### A 4ps Resolution Capacitive-Tuned Delay Pulse Shrinking Time to Digital Converter ...... 1091

Patricia Tutuani, Emmanuel Amankrah, Randall Geiger Iowa State University, United States

**Abstract:** This paper presents a 12-bit Time-to-Digital Converter (TDC) that combines a pulse shrinking ring comprising an even number of inverting delay elements with a pulse-arbiter delay line. This innovative approach leverages the linearity of the pulse shrinking ring and the high-resolution capabilities of the pulse-arbiter delay line, resulting in a TDC with exceptional linearity and accuracy. With a resolution of 4ps, this TDC functions efficiently in a standard 0.18 µm CMOS technology, offering a Full-Scale-Range (FSR) of 36600 ps.

### A Comparative Analysis of Neuromorphic Neuron Circuits for Enhanced Power Efficiency and

<sup>1</sup>Olin College of Engineering, United States; <sup>2</sup>University of Oklahoma, United States

**Abstract:** Neuromorphic computing aims to replicate the brain's function and relies on the design of circuits inspired by biological processes. The neuron, a fundamental component of these architectures, must be optimized for operation speed and energy efficiency in order to approach the scale and complexity of the brain. Numerous designs have optimized performance metrics across a range of CMOS technologies, highlighting the need for a consistent framework for comparing neuron circuit designs. Addressing this gap, our study implements neuron models inspired by the Leaky-Integrate-and-Fire, Axon-Hillock, and Morris-Lecar mechanisms, all within the 22nm technology node. Through parameter sweeping, we provide a more nuanced comparison of power efficiency and spiking frequency among these neurons and insight into further design optimizations. This methodology yielded significant improvements, achieving a peak spiking frequency of 80 MHz and reducing energy consumption to 241 aJ per spike in the Axon-Hillock-inspired design, the highest efficiency among the models tested.

### A 0.6V, 13nW, 0.0012%/V Line Sensitivity PVT-Invariant Voltage Reference

<sup>1</sup>International Institute of Information Technology Hyderabad, India; <sup>2</sup>Analog Intelligent Design Inc., United States

Abstract: This paper presents a low-voltage, low-power PVT-invariant voltage reference with excellent line sensitivity for IoT and biomedical applications. A bias current (Ibias) is applied to an NMOS-based composite pair and a bias voltage (Vbias) to the body of the NMOS to obtain a temperature-compensated voltage reference over a wide temperature range. The proposed design, implemented in a 180nm CMOS process, gives an output of 141 mV that is independent of process, voltage, and temperature. Without trimming, the process variation of the proposed design is $1.51\%\ (\sigma/\mu)$ and the temperature coefficient of the proposed voltage reference is 23 ppm/°C over a wide temperature range of -40 °C to 100 °C. For a supply voltage ranging from 0.6 V to 2.1 V, the line sensitivity of the reference is 0.0012%/V. The simulated results show that the proposed voltage reference can operate on a minimum supply of 0.6 V and that the power supply rejection ratio at 1 Hz is -85 dB. The total circuit occupies 0.0085 mm², and the total power consumption of the design is 14.2 nW at the typical corner of 27 °C and 0.6 V supply.

## Session C2L-B: Edge Devices

**Chair:** Robert Brennan, *AMI Semiconductors* **Time:** Wednesday, August 14, 2024, 10:30 - 12:00 **Location:** Room 2

### **Comparisons of Metastability Impact in Time-Domain and Asynchronous**

<sup>1</sup>Texas A&M University, United States; <sup>2</sup>Gwangju Institute of Science and Technology, Korea

**Abstract:** This paper develops metastability error models for Time-Domain Analog-to-Digital Converters (TD ADCs) by analyzing the input-to-output delay of the time comparator (TCMP) and deriving the error probability. Additionally, this paper presents an analysis of the impact of metastability error propagation on the ADC-based 112 Gb/s PAM-4 link, with a comparative study of the TD ADC and asynchronous SAR (ASAR) ADC. Simulation results demonstrate that the TD ADC is particularly suited for high-speed and low supply voltage applications.

### What is Peace Engineering? ...... 1111

Ramiro Jordán<sup>1</sup>, Manel Martínez-Ramón<sup>1</sup>, Donna Koechner<sup>1</sup>, Kamil Agi<sup>2</sup>

<sup>1</sup>University of New Mexico, United States; <sup>2</sup>SensorComm Technologies, Inc., United States

Abstract: The Ibero-American Science & Technology Education Consortium (ISTEC), created in 1990, is the outcome of an Engineering Education research effort led by University of New Mexico (UNM) Electrical & Computer Engineering faculty and funded by Motorola, Inc., to catalyze technological, social, cultural, policy, and socio-economic development. We envisioned STEM as the common language for global Peace Engineering (PENG) and stress sharing resources via alliances among academia, industry, government, and organizations. ISTEC co-organized the International Federation of Engineering Education Societies (IFEES, 2006), the Global Engineering Deans Council (GEDC, 2008), and the first Global Peace Engineering Conference, WEEF-GEDC 2018, hosted by UNM Engineering, during which a new definition of Peace Engineering emerged and the Peace Engineering Consortium (PEC) was formed. In addition to other PENG activities, every annual ISTEC and WEEF-GEDC event includes PENG tracks. With the PEC, UNM-ECE created a PENG Minor in 2020 covering topics including ethics, sustainability, society, JEDAI, design thinking, analytics, and AI. We encourage cross-disciplinary collaboration and the application of this knowledge to real-world situations.

<sup>1</sup>Octavo Systems, United States; <sup>2</sup>Western New England University, United States

**Abstract:** This presentation discusses how advances in integration and packaging technologies can facilitate the creation of powerful edge node devices for Internet of Things (IoT) applications. These devices integrate analog, digital, and silicon photonics into a single unit, resulting in compact yet powerful computing devices with high energy efficiency. System-in-Package (SiP) or Heterogeneous Integration (HI) enables the development of compact products around these optimized devices. The presentation also considers non-digital technologies such as sensors, power management, silicon photonics, and radio frequency components, envisioning a future where such devices become more prevalent, accessible, and innovative. These technology and design innovations aim to make such advances accessible to all, fostering the continued development of smaller, cost-effective, and innovative products.

### GIM (Ghost in the Machine): A DSP-Inspired Accelerator Platform for

Abstract: Machine-learning (ML) algorithms are finding wide adoption across a rich spectrum of application domains with diverse requirements in terms of performance, power, and cost. Complete ML systems often have a DSP front end that extracts features for the inference engine. Early work in ML drew inspiration from now-common DSP algorithms such as adaptive filters. Bringing a signal processing focus to an ML accelerator offers benefits ranging from exploring and reasoning about architectural techniques such as pipelining to utilizing new coarse-grained FPGAs containing hundreds or thousands of DSP slices with dedicated local storage. These new coarse-grained architectures allow us to achieve ASIC-like clock rates and reduced power while rapidly exploring novel and common ML architectures.

Kaouther Selmi<sup>1</sup>, Kais Bouallegue<sup>2</sup>

<sup>1</sup>University of Monastir, Tunisia; <sup>2</sup>Université de Sousse, Tunisia

**Abstract:** This study investigates the remarkable convergence between differential equations and electrical circuits in modeling chaotic systems. By exploring the dynamics of chaotic phenomena through both mathematical equations and electronic circuits, we demonstrate a significant correspondence between these two approaches. This observation highlights the interdisciplinary nature of science, where concepts from applied mathematics seamlessly integrate with practical electronic engineering. We discuss the implications of this finding for both theoretical understanding and practical applications, emphasizing potential advancements in system design, signal processing, and cryptography. This study sheds light on the potential synthesis of complex systems and underscores the importance of interdisciplinary collaboration in scientific research.

## Session C2L-C: Circuit Techniques for Encryption Methods

Chair: Aatmesh Shrivastava, *Northeastern University* Co-Chair: Georgios Sirakoulis, *Democritus University* Time: Wednesday, August 14, 2024, 10:30 - 12:00 Location: Room 3

### T-Scope: Side-Channel Leakage Assessment with a Hardware-Accelerated Online TVLA Test ...... 1130

Hao Wang, Andrew Malnicof, Patrick Schaumont Worcester Polytechnic Institute, United States

**Abstract:** Test vector leakage assessment (TVLA) is a generic and commonly used approach to assessing a device's side-channel vulnerability based on measuring a large number of power traces. However, the current use model of TVLA relies on batch processing, which separates trace acquisition from statistical analysis and makes the method harder to use in online, continuous-monitoring scenarios. We propose T-Scope, a TVLA optimized for online testing of real-time targets. We use a pipelined solution to efficiently store power traces in compact histogram structures, and then use FPGA-based hardware acceleration of Welch's t-test to compute the TVLA. By continuously updating the histograms with newly acquired power traces, T-Scope visualizes real-time changes in the side-channel leakage characteristics of the target. Our FPGA-based hardware accelerator, created using high-level synthesis, offers a 99.93x performance gain over a software-based solution. We present a hardware demonstrator with a real-time display of the TVLA assessment of a leaky implementation of the Advanced Encryption Standard.
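The statistic at the heart of TVLA is Welch's t-test between a fixed-input and a random-input class of traces. The Python sketch below shows the histogram-based form, assuming each trace sample point keeps one histogram of quantized ADC values per class. The function names and the |t| > 4.5 threshold (the value commonly used in TVLA practice) are illustrative assumptions, and T-Scope's pipelined FPGA datapath is not modeled.

```python
import math
from collections import Counter


def add_trace_sample(hist, value):
    """Online update: one new quantized sample increments one histogram bin."""
    hist[value] += 1


def hist_stats(hist):
    """Return (count, mean, unbiased variance) of samples stored as {value: count}."""
    n = sum(hist.values())
    mean = sum(v * c for v, c in hist.items()) / n
    var = sum(c * (v - mean) ** 2 for v, c in hist.items()) / (n - 1)
    return n, mean, var


def welch_t(hist_fixed, hist_random):
    """Welch's t-statistic computed directly from the two class histograms."""
    n0, m0, v0 = hist_stats(hist_fixed)
    n1, m1, v1 = hist_stats(hist_random)
    return (m0 - m1) / math.sqrt(v0 / n0 + v1 / n1)


TVLA_THRESHOLD = 4.5  # commonly used |t| threshold (assumption, not from the abstract)


def is_leaky(hist_fixed, hist_random):
    """Flag a sample point as leaking when |t| exceeds the threshold."""
    return abs(welch_t(hist_fixed, hist_random)) > TVLA_THRESHOLD
```

Because the histograms hold only bin counts (for example `Counter()` objects), newly acquired traces can be folded in with `add_trace_sample` and the t-value recomputed at any time. Memory then scales with the number of ADC codes rather than the number of traces.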

### A Hybrid Random Number Generator based on MetaStability-Ring Oscillator

<sup>1</sup>University of Louisiana at Lafayette, United States; <sup>2</sup>University of Mississippi, United States; <sup>3</sup>Assiut University, Egypt

**Abstract:** This work aims to fundamentally enhance the entropy of circuits containing LFSRs. The framework is synthesized on the Altera Cyclone V FPGA and passes all tests of the National Institute of Standards and Technology (NIST) SP 800-22 test suite for randomness. By leveraging metastability, jitter, and phase noise, the proposed design achieves true randomness. Additionally, the design is synthesized at the 14nm technology node using Synopsys Design Compiler (SDC) and achieves high speed, minimal silicon area, and low energy consumption.

### CLAPPER: Clonable LFSR-Based Asymmetric PUF-Group with Peer-to-Peer Equivalent Response ...... 1140

Zhenzhe Chen, Kunyang Liu, Hirofumi Shinohara, Takashi Sato

Kyoto University, Japan

Abstract: Traditional physically unclonable function (PUF)-based authentication models that use challenge-response pairs (CRPs) suffer from high storage overhead and an inability to perform n-party authentication. We propose a new clonable PUF (CPUF) for decentralized authentication, wherein both master and slave PUFs share the same dynamic linear feedback shift register (LFSR) value. In this work, only the master PUF possesses an entropy source determined through the variability induced during fabrication, which serves as the seed of the LFSR. Cloning can be achieved at any time and the value of the master PUF is synchronized with the slave PUF over multiple periods via a unidirectional port. Our CPUF model demonstrates that the response exhibits excellent statistical performance, confirming its suitability for secure and scalable authentication involving multiple parties.
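The cloning idea can be illustrated with a toy Python model in which the response stream is the output of a Fibonacci LFSR and cloning copies the master's current LFSR state into a slave. This is a minimal sketch under stated assumptions: the 4-bit width, the taps (3, 2) (which give the maximal period of 15 for a 4-bit register), the class name `ToyLfsrPuf`, and the integer seed standing in for fabrication-induced entropy are all hypothetical, and a register this short offers no security.

```python
def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR one step: shift left, feed back the XOR of the tap bits."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


class ToyLfsrPuf:
    """Hypothetical model: the seed stands in for the master's fabrication entropy."""

    def __init__(self, seed, taps=(3, 2), width=4):
        self.state, self.taps, self.width = seed, taps, width

    def next_bit(self):
        """Advance the LFSR and emit its least significant bit as the response."""
        self.state = lfsr_step(self.state, self.taps, self.width)
        return self.state & 1

    def clone_to(self, other):
        # One-way synchronization: the slave receives the master's current state.
        other.state = self.state
```

After `clone_to`, master and slave emit identical bit streams, so any party holding a synchronized clone can reproduce the same responses without a stored challenge-response table.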

### Accelerating CKKS Homomorphic Encryption with Data Compression on GPUs ...... 1145

Quoc Bao Phan, Linh Nguyen, Tuy Tan Nguyen Northern Arizona University, United States

**Abstract:** Homomorphic encryption (HE) algorithms, particularly the Cheon-Kim-Kim-Song (CKKS) scheme, offer significant potential for secure computation on encrypted data, making them valuable for privacy-preserving machine learning. However, high latency in large integer operations in the CKKS algorithm hinders the processing of large datasets and complex computations. This paper proposes a novel strategy that combines lossless data compression techniques with the parallel processing power of graphics processing units (GPUs) to address these challenges. Our approach demonstrably reduces data size by 90% and achieves significant speedups of up to 100 times compared to conventional approaches. This method ensures data confidentiality while mitigating performance bottlenecks in CKKS-based computations, paving the way for more efficient and scalable HE applications.

## Session C2L-D: Trends in Quantum Computing & Photonics

**Chair:** Steve Adamshick, *Western New England University* **Co-Chair:** John Burke, *Western New England University* **Time:** Wednesday, August 14, 2024, 10:30 - 12:00 **Location:** Room 4

### Designing a Quantum-Dot Cellular Automata-Based Half-Adder Circuit using Partially Reversible Majority Gates ...... 1150

Mohammed Alharbi<sup>1</sup>, Gerard Edwards<sup>1</sup>, Richard Stocker<sup>2</sup>

<sup>1</sup>Liverpool John Moores University, United Kingdom; <sup>2</sup>University of Chester, United Kingdom

**Abstract:** In this study, an innovative, partially reversible design method is presented to address the latency and circuit cost limitations of reversible design methods. The proposed partially reversible design method serves as a middle ground between fully reversible and conventional irreversible design methodologies. Compared with irreversible design methods, the partially reversible design method still optimises energy efficiency. Moreover, the partially reversible design method improves the speed and decreases the circuit cost in comparison with fully reversible design techniques. The key ingredient of the proposed partially reversible design methodology is the introduction of a partially reversible majority gate element building block. To validate the effectiveness of the proposed partially reversible design approach, a novel partially reversible half-adder circuit is designed and simulated using the QCADesigner-E 2.2 simulation tool.
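For reference, a conventional (irreversible) majority-gate half adder can be checked in a few lines of Python. AND and OR are majority gates with one input tied to 0 or 1, so carry = M(a, b, 0) and sum = M(M(a, ¬b, 0), M(¬a, b, 0), 1). This textbook construction is shown only as an illustration; it does not model the paper's partially reversible majority gate or QCA clocking.

```python
def maj(a, b, c):
    """Three-input majority gate, the basic logic primitive of QCA."""
    return (a & b) | (b & c) | (a & c)


def half_adder(a, b):
    """Half adder built only from majority gates and inverters (1 - x)."""
    carry = maj(a, b, 0)          # M(a, b, 0) = a AND b
    p = maj(a, 1 - b, 0)          # a AND NOT b
    q = maj(1 - a, b, 0)          # NOT a AND b
    total = maj(p, q, 1)          # M(p, q, 1) = p OR q = a XOR b
    return total, carry


for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

This irreversible version uses four majority gates and two inverters, which gives a baseline gate count for comparing design styles.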


<sup>1</sup>Waseda University, Japan; <sup>2</sup>Green Computing Systems Research Organization, Tokyo, Japan

**Abstract:** Most quantum annealers, or Ising machines, have hardware limitations on their qubit count, which severely limits the size of problems they can solve. This paper proposes a novel classical-Ising hybrid annealing method, based on a theoretical analysis, that virtually extends the computable size of Ising machines. Given a quadratic unconstrained binary optimization (QUBO) model, the proposed method selects the cutting binary variable that is most likely to have the same value as in the ground-state solution. It then recursively cuts the QUBO model on the cutting binary variable until sufficiently small subQUBO models are obtained. Experimental evaluations demonstrate that the proposed hybrid annealing method gives much better quasi-ground-state solutions than state-of-the-art methods for large QUBO models.
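Cutting a QUBO on one binary variable amounts to substituting its value and folding the affected terms into the remaining diagonal entries and a constant. The Python sketch below assumes the QUBO is stored as an upper-triangular dictionary `{(i, j): Q_ij}` with E(x) = Σ Q_ij x_i x_j. The function names are hypothetical, the brute-force solver stands in for the Ising machine, and the paper's criterion for choosing which variable to cut is not reproduced: the caller picks `k` and `v`.

```python
from itertools import product


def qubo_energy(q, x):
    """E(x) = sum of Q_ij * x_i * x_j over the stored (i, j) entries."""
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def fix_variable(q, k, v):
    """Substitute x_k = v (0 or 1); return the reduced QUBO and a constant offset."""
    reduced, offset = {}, 0
    for (i, j), w in q.items():
        if i == k and j == k:
            offset += w * v                       # x_k^2 = x_k for binary variables
        elif k in (i, j):
            m = j if i == k else i
            if v:                                 # w * x_k * x_m becomes a linear term in x_m
                reduced[(m, m)] = reduced.get((m, m), 0) + w
        else:
            reduced[(i, j)] = reduced.get((i, j), 0) + w
    return reduced, offset


def brute_force_min(q, variables):
    """Exhaustive ground state of a small (sub)QUBO; stands in for an Ising machine."""
    best = None
    for bits in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, bits))
        e = qubo_energy(q, x)
        if best is None or e < best[0]:
            best = (e, x)
    return best


q = {(0, 0): -1, (1, 1): -1, (2, 2): 2, (0, 1): 2, (1, 2): -3, (0, 2): 1}
sub_q, offset = fix_variable(q, 2, 1)            # cut on x_2 = 1
energy, assignment = brute_force_min(sub_q, [0, 1])
print(energy + offset, assignment)
```

Applying `fix_variable` repeatedly shrinks the model one variable at a time, which is the recursive cutting step described above. The quality of the result depends entirely on how well the cut values match the true ground state.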

### DSP-Free Carrier Phase Recovery System for Laser-Forwarded Offset-QAM Coherent Optical Receivers ...... 1158

Marziyeh Rezaei, Daniel Sturm, Pengyu Zeng, Sajjad Moazeni University of Washington, United States

**Abstract:** Co-packaged optics (CPO) is emerging as a solution to future off-package I/O bandwidth scalability challenges. Proposed CPO solutions have mainly focused on pulse amplitude modulation (PAM), while coherent modulation can further increase data rates. However, optical coherent transceivers are power-hungry and area-inefficient. In this work, we propose a new DSP-free carrier phase recovery system for offset-QAM modulation based on a laser-forwarding technique. Offset-QAM can be realized with lower-area, lower-power optical modulators. The proposed approach is simulated and validated using GlobalFoundries monolithic 45nm silicon photonics PDK models, with a circuit/system-level implementation at 25 GBaud offset-QAM-4. This technique can also be extended to higher-order modulations such as QAM-16.

### Ultra Compact Nanoplasmonic Dual-Band Filters with Tunable Silica Stubs for Nanoscale Networks ...... 1162

Kola Thirupathaiah<sup>1</sup>, Montasir Qasymeh<sup>2</sup>

<sup>1</sup>Koneru Lakshmaiah Education Foundation Hyderabad, India; <sup>2</sup>Abu Dhabi University, U.A.E.

**Abstract:** This article presents the design and analysis of a dual-band bandpass filter (BPF) with a single plasmonic circular ring resonator (CRR) using a metal-insulator-metal (MIM) guiding structure. The most distinctive feature of the designed resonator is that the first- and second-order degenerate modes operate concurrently, producing two pass bands at wavelengths of 1300 nm and 1600 nm, respectively. First, the basic characteristics of the MIM waveguide are studied using full-wave analysis. Second, to design the filter structure, four and eight open-end silica (SiO2) stubs are added symmetrically to the single CRR with angular separations of 135° and 45°. This provides synchronous excitation of two transmission peaks in the second pass band, a feature desirable in many current applications. The configuration with 45° separation performs much better at frequencies below the first pass band and between the two pass bands. Hence, this filter will be very useful in silicon photonic integrated circuits (SPICs) where a dual pass band is required.

### Developing AI-Driven Systems for Automated Visual Inspection with MATLAB

Ramnarayan Krishnamurthy

MathWorks, United States

**Abstract:** Defect detection and quality control in vision-based inspection systems are crucial in domains such as the automotive, aerospace, semiconductor, and electrical power industries. However, the majority of these workflows are currently manual and can be challenging due to the scale and speed required. AI techniques such as deep learning help automate and scale these processes effectively. MATLAB provides robust AI pipelines to prepare data, develop neural networks, and deploy them to various hardware platforms.

## Session C2L-E: Circuits & Design Methods for Biomedical Systems & Applications

Chair: Soner Sonmezoglu, Northeastern University Co-Chair: Junning Jiang, Nvidia Time: Wednesday, August 14, 2024, 10:30 - 12:00 Location: Room 5

### Invited: Investigation of Motion Artifacts for Luminescence-Based Transcutaneous Oxygen Wearable Device ...... 1167

Olivia Kendzulak, Naisargi Mehta, Ryan McSweeney, Parisa Saadatmand Hashemi, Ulkuhan Guler

Worcester Polytechnic Institute, United States

**Abstract:** Respiratory diseases are a leading global cause of mortality, emphasizing the importance of accurate oxygen level monitoring. The COVID-19 pandemic has highlighted the need for miniaturized respiratory monitoring technologies, especially in remote care settings. This study investigates motion artifacts in a novel wearable sensor designed for transcutaneous oxygen monitoring, a marker of pulmonary disease progression. Utilizing a luminescent film, the sensor's performance was evaluated under various motion conditions. Test data showed minimal accuracy impact, with discrepancies not exceeding 0.8%. These findings demonstrate the potential of wearable oxygen sensors for reliable continuous monitoring in dynamic environments, with opportunities for further minimizing motion artifact influence.

### A 0.6-V 108-nW 100-kHz Sub-Threshold Delay-Locked Loop with Digital Linearization for

<sup>1</sup>Toronto Metropolitan University, Canada; <sup>2</sup>Lakehead University, Canada

Abstract: This paper investigates the instability of low-power, low-frequency charge-pump DLLs operating in sub-threshold for low-power SAR ADCs. We show that charge leakage from the loop-filter capacitor during the idle state of the PD causes the control voltage of the DLL to drift. Although the voltage drift is small, the exponential relation between the delay and the control voltage of a delay stage operating in sub-threshold results in a significant change in the DLL delay, causing the DLL to oscillate. To combat this, we propose a counter-based sub-threshold DLL with digital linearization. The delay stages of the DLL are linearized using a DAC whose bit weights are set according to the characteristics of the delay stage, so that the relation between the delay and the control voltage of the compensated delay stage is linear. Designed in a TSMC 130 nm 1.2 V CMOS technology with a reduced supply voltage of 0.6 V, the DLL locks to a 100 kHz 50% duty-cycle external reference at all process/temperature corners and consumes 108.6 nW. The proposed DLL offers intrinsic stability, ultra-low power consumption, and excellent technology compatibility.
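A first-order worked estimate shows why a small drift matters. Assume, as an illustration rather than the paper's exact model, that each delay stage is current-starved by a transistor whose gate is driven by the control voltage $V_c$. The sub-threshold current then grows exponentially with $V_c$, so the delay shrinks exponentially:

$$
t_d \approx \frac{C_L V_{DD}}{I_D}, \qquad I_D \approx I_0\, e^{V_c/(n U_T)} \quad\Longrightarrow\quad t_d \propto e^{-V_c/(n U_T)}
$$

A leakage-induced drop $\Delta V$ in the control voltage therefore scales the delay by

$$
\frac{t_d(V_c - \Delta V)}{t_d(V_c)} = e^{\Delta V/(n U_T)}
$$

With assumed values $n = 1.5$, $U_T \approx 26$ mV and a drift of only $\Delta V = 5$ mV, the exponent is $5/39 \approx 0.128$, so the delay grows by about 14%. This exponential amplification of small drifts is what the proposed digital linearization is intended to remove.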

<sup>1</sup>Technical University of Munich, Germany; <sup>2</sup>Si-Vision LLC, Egypt; <sup>3</sup>Ain Shams University, Egypt

Abstract: Electrochemical impedance spectroscopy (EIS) applies a stimulus voltage to an electrochemical cell. The sensed signal is then processed by an analog front-end (AFE) chain followed by an analog-to-digital converter (ADC). Conventionally, a bulky analog filter is required after the excitation source to achieve a high-resolution AC stimulus voltage. This paper proposes an interrelated analog/digital EIS system that eliminates this bulky analog filter by reusing the ADC's existing digital filter. A direct digital synthesizer (DDS) is implemented to generate the AC stimulus voltage over a wide frequency band, from 10 Hz to 100 kHz. In addition, an ultra-low-noise AFE with two modes of operation is introduced to relax the ADC design. These modes ensure an optimized integrated system with the DDS, which has a total harmonic distortion of less than 0.02%. Moreover, the AFE resolution reaches 0.48 pA over a bandwidth of 2.5 kHz, which meets the sensitivity requirements of EIS applications. Finally, the entire EIS system is validated using a combination of MATLAB modeling and SPICE simulations in 180nm CMOS technology.
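The DDS principle itself is compact: an N-bit phase accumulator advances by a frequency tuning word (FTW) each clock, and its top bits address a sine look-up table, so f_out = FTW · f_clk / 2^N with a resolution of f_clk / 2^N. The Python sketch below illustrates this under assumed parameters (1 MHz clock, 24-bit accumulator, 1024-entry table) that are not taken from the paper, and it returns ideal sine values rather than DAC-quantized codes.

```python
import math


def dds_samples(f_out, f_clk, n_samples, acc_bits=24, lut_bits=10):
    """Phase-accumulator DDS: returns the tuning word and the generated sine samples."""
    ftw = round(f_out * (1 << acc_bits) / f_clk)          # frequency tuning word
    lut_size = 1 << lut_bits
    lut = [math.sin(2 * math.pi * k / lut_size) for k in range(lut_size)]
    mask = (1 << acc_bits) - 1
    phase, samples = 0, []
    for _ in range(n_samples):
        samples.append(lut[phase >> (acc_bits - lut_bits)])  # phase truncation to LUT address
        phase = (phase + ftw) & mask                          # accumulator wraps modulo 2**N
    return ftw, samples


def actual_frequency(ftw, f_clk, acc_bits=24):
    """Output frequency actually synthesized by a given tuning word."""
    return ftw * f_clk / (1 << acc_bits)


# Endpoints of the EIS band with an assumed 1 MHz DDS clock
for f in (10, 100_000):
    ftw, _ = dds_samples(f, 1_000_000, 16)
    print(f, ftw, actual_frequency(ftw, 1_000_000))
```

Sweeping the band only requires changing the tuning word; the clock, accumulator, and table stay fixed, which is what makes a DDS attractive for wide-band EIS excitation.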

### A High-Efficiency Capacitor-Less LDO with Adaptive Dynamic Range Extension for Biosensing Applications ...... 1181

Will Wright, Tejasvi Das

Rochester Institute of Technology, United States

Abstract: This paper introduces an on-chip capacitor-less Low Dropout Regulator (LDO) with dynamic range extension that targets low-power biosensing applications. A load-current Dynamic Range (DR) of 1850x is achieved over a 4 µA to 7.4 mA range while maintaining high current efficiency under all load conditions. The adaptive extender presented in this paper improves the DR by 6.17x compared with the LDO without the extension circuit. This is made possible by very low-overhead circuitry that adaptively senses the load current and dynamically modifies the current in the LDO buffer stage to extend the operating headroom of the LDO. The LDO also achieves high Power Supply Rejection Ratio (PSRR) across the load-current range while remaining stable across the 1850x load variation. It exhibits superior current efficiency and a very low total quiescent current of 438 nA at the low-load condition. The additional current drawn by the dynamic sensing circuitry appears only at higher load currents and has minimal impact on overall efficiency (>99% at full load).

Michigan State University, United States

**Abstract:** Electrochemical measurements play a crucial role across various domains including air quality assessment, biological analysis, and the food industry. Miniaturized and power-efficient electrochemical potentiostats, facilitated by integrated circuits, have been instrumental in enabling wearable devices. However, the utilization of modern CMOS technologies with low supply voltage limits the applicability of electrochemical reactions requiring higher potential windows. This paper introduces an innovative circuit architecture that extends the cell voltage range by 46% for positive voltages and 88% for negative voltages compared to conventional designs. Consequently, this advancement widens the spectrum of supported bias voltages within an electrochemical cell, thereby expanding the scope of integrated potentiostat to encompass a broader array of electrochemical reactions. Implemented in CMOS 180 nm technology, the circuit consumes 2.047 mW of power. It supports a bias potential range from 1.1 V to -2.12 V and a cell potential range from 2.41 V to -3.11 V.

## Session C2L-F: Artificial Intelligence, Internet of Things & Systems 7

**Chair:** Ruolin Zhou, *University of Massachusetts Dartmouth* **Time:** Wednesday, August 14, 2024, 10:30 - 12:00 **Location:** Room 6

### Compact Convolutional SNN Architecture for the Neuromorphic Speech Denoising ...... 1191

Anuar Dorzhigulov, Vishal Saxena University of Delaware, United States

Abstract: This paper introduces a novel approach to speech denoising within the framework of spiking neural networks (SNNs), specifically focusing on achieving a high signal-to-noise ratio (SNR) while minimizing power consumption and network size. The proposed SNN architecture leverages spatio-temporal dynamics by incorporating 3D Convolutional layers, a departure from the conventional 1D layers used in Temporal Convolutional Neural Networks (TCNNs). Additionally, the architecture integrates a Short-Time Fourier Transform (STFT) encoder and an Inverse STFT (ISTFT) decoder in conjunction with Sigma-Delta Spiking Neurons. This fusion of elements results in a compact SNN model that achieves commendable SNR performance while maintaining efficiency in terms of energy consumption and network size.

### A High Accuracy CNN for Breast Cancer Detection using Mammography Images ...... 1196

Gunjan Jha, Anshul Jha, Eugene John

The University of Texas at San Antonio, United States

**Abstract:** Breast Cancer (BC) is one of the most prevalent cancers and the second leading cause of mortality among women. Several radiographic imaging techniques, such as mammography, computed tomography (CT), magnetic resonance imaging (MRI), and histopathological imaging (HI), have made it viable to diagnose BC at an early stage. Deep learning (DL) has emerged as an aid to radiologists and pathologists in the detection and prognosis of BC, handling large amounts of radiographic and histopathological images efficiently and accurately. The primary goal of this research is to develop a high-accuracy convolutional neural network (CNN) to detect and classify BC using mammography images from publicly available datasets. The CNN classifier distinguishes malignant from benign cells in breast images. The performance of the CNN model is measured in terms of accuracy, F1 score, precision, recall, and the confusion matrix. The proposed CNN model achieved an inference accuracy of 99.18%.

### Advanced SEU and MBU Vulnerability Assessment of Deep Neural Networks in

Abstract: Deep neural networks (DNNs) are widely used to solve real-world challenges. Their algorithms can be affected by environmental and cyber threats, and the difficulty of giving formal guarantees about DNN behavior under these attacks is a fundamental obstacle to employing them in safety-critical systems. In this work, we analyze the impact of single-event upsets (SEUs) and multiple-bit upsets (MBUs) on DNNs. The verification procedure is intended to ensure that a network of interest complies with safety requirements and is dependable in critical conditions. We therefore investigate the application of a SAT-based analytic methodology to gain insight into the behavior and vulnerabilities of DNNs in safety-critical applications. The vertical collision avoidance system (VCAS) is used as a benchmark to illustrate our methodology. Our experimental findings show that the resilience of neural networks can vary with the number of weight changes that occur.
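The fault model itself is easy to reproduce: an SEU flips one bit of a stored weight and an MBU flips several. The Python sketch below assumes weights are stored as IEEE 754 single-precision floats (the abstract does not state the number format) and uses the standard `struct` module. The function name `flip_bits` is hypothetical, and the SAT-based verification the paper actually uses is not modeled.

```python
import struct


def flip_bits(weight, bit_positions):
    """Flip bits (0 = LSB) of a float32 weight: one position models an SEU, several an MBU."""
    (word,) = struct.unpack("<I", struct.pack("<f", weight))
    for b in bit_positions:
        word ^= 1 << b
    (faulty,) = struct.unpack("<f", struct.pack("<I", word))
    return faulty


w = 0.5
print(flip_bits(w, [31]))      # sign bit
print(flip_bits(w, [22]))      # most significant mantissa bit
print(flip_bits(w, [30]))      # most significant exponent bit
print(flip_bits(w, [30, 22]))  # a two-bit MBU
```

Flipping the most significant exponent bit turns 0.5 into 2^127, while a mantissa flip changes it only to 0.75. The position of an upset therefore matters as much as the number of upsets.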

### LogicNets vs. ULEEN : Comparing Two Novel High Throughput Edge ML Inference

<sup>1</sup>The University of Texas at Austin, United States; <sup>2</sup>Federal University of Rio de Janeiro, Brazil; <sup>3</sup>Federal University of Recôncavo da Bahia, Brazil; <sup>4</sup>ISCTE - Instituto Universitário de Lisboa, Portugal; <sup>5</sup>Instituto de Telecomunicações, Portugal

Abstract: With the advent of IoT and edge computing devices, demand has grown for low-power, high-throughput ML inference at the edge. However, growing model sizes and the number of computations involved make it increasingly difficult to deploy state-of-the-art models on these devices. Lately, there has been renewed interest in lookup table (LUT)-based ML models, which replace the weighted-addition operations of typical artificial neurons with lookup operations. These models are well suited to edge FPGAs, both because of the FPGAs' underlying architecture and because of their potential for low energy consumption. LogicNets and ULEEN are two such LUT-based model architectures that claim to offer high-throughput, low-energy inference. Because they extend the contrasting ideas of DNNs and Weightless Neural Networks, respectively, choosing between them is not straightforward. We compare the two and evaluate them on several high-throughput inference use cases. Our results suggest that ULEEN outperforms LogicNets in hardware and energy requirements, making it well suited for edge deployment, albeit with a slight drop in accuracy on some datasets.
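
The core idea of replacing a weighted sum with a lookup can be sketched for a single LogicNets-style neuron: with few inputs and few bits per input, every input pattern can be enumerated ahead of time. This is an illustration under assumed parameters (the weights, bias, and helper names are hypothetical); ULEEN's weightless, RAM-based nodes work differently and are not shown.

```python
from itertools import product


def build_lut_neuron(weights, bias, bits):
    """Enumerate every quantized input pattern of a small neuron and store its
    binarized output, so inference becomes a table lookup instead of a
    weighted sum. Table size grows as 2 ** (bits * fan_in)."""
    levels = range(2 ** bits)  # unsigned integer input codes
    table = {}
    for codes in product(levels, repeat=len(weights)):
        pre_activation = sum(w * c for w, c in zip(weights, codes)) + bias
        table[codes] = 1 if pre_activation > 0 else 0
    return table


def lut_infer(table, codes):
    """Inference is a single dictionary lookup; no multiplications needed."""
    return table[tuple(codes)]


# Hypothetical 3-input neuron with 2-bit inputs -> 2 ** 6 = 64 table entries.
weights, bias, bits = [0.5, -1.0, 0.25], 0.1, 2
lut = build_lut_neuron(weights, bias, bits)
```

The exponential growth of the table with fan-in and bit width is why such architectures keep neurons sparse and heavily quantized, so each one fits into a few FPGA LUTs.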

## Session C3L-A: Data Converters & Other Mixed-Signal Circuits & Systems

Chair: Shiuh-hua Chiang, *Brigham Young University* Co-Chair: Yuanming Zhu, *Intel* Time: Wednesday, August 14, 2024, 14:00 - 15:30 Location: Room 1

<sup>1</sup>University at Buffalo, United States; <sup>2</sup>Brigham Young University, United States; <sup>3</sup>University of Tokyo, Japan

**Abstract:** Spiking neural networks (SNNs) are a class of neural networks that mimic biological neurons and are more energy-efficient than conventional neural networks. The neuron model is the building block of an SNN, and many efficient hardware implementations of neuron models have been proposed. This work aims to bridge the gap between designing a neuron model and training an SNN built from that model. The work focuses on a ring oscillator-based neuron model that realizes the leaky integrate-and-fire (LIF) neuron. The design of the ring oscillator-based neuron is discussed, and the neuron model is digitized using the bilinear transform to enable training. The trained network classifies the MNIST dataset with an accuracy of 97.35%. The circuit parameters used to train the network are discussed; they can be used to build the ring oscillator-based neuron circuit.
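
As an illustration of digitizing a neuron with the bilinear transform, the sketch below discretizes a generic LIF membrane equation, $\tau\,dV/dt = -V + RI$, and adds a threshold-and-reset rule. It assumes the textbook LIF form rather than the paper's ring-oscillator circuit model, and all parameter values and names are hypothetical.

```python
import math


def simulate_lif_bilinear(current, dt, tau, r, v_th, v_reset=0.0):
    """Simulate tau * dV/dt = -V + R * I, discretized with the bilinear
    (Tustin) transform, plus a threshold-and-reset spike rule.

    Substituting s = (2/dt) * (1 - z^-1) / (1 + z^-1) into H(s) = R / (tau*s + 1)
    gives V[n] = a*V[n-1] + b*(I[n] + I[n-1]).
    """
    a = (2 * tau - dt) / (2 * tau + dt)  # discrete-time leak factor
    b = r * dt / (2 * tau + dt)          # gain on the summed input current samples
    v, i_prev = v_reset, 0.0
    trace, spikes = [], []
    for n, i_now in enumerate(current):
        v = a * v + b * (i_now + i_prev)
        i_prev = i_now
        if v >= v_th:  # fire and reset
            spikes.append(n)
            v = v_reset
        trace.append(v)
    return trace, spikes


# Constant drive: without a threshold, V settles at the DC gain R * I = 3.0.
trace, _ = simulate_lif_bilinear([2.0] * 500, dt=1e-3, tau=10e-3, r=1.5, v_th=math.inf)
```

The bilinear transform maps the stable continuous-time pole to a stable discrete pole ($|a| < 1$ for any positive step) and preserves the DC gain $R$, which makes it a convenient choice for a training-friendly recurrence.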


<sup>1</sup>Brigham Young University, United States; <sup>2</sup>Apple Inc., United States; <sup>3</sup>National Yang Ming Chiao Tung University, Taiwan

**Abstract:** Ultra-low-supply-voltage (ULV) analog-to-digital converters (ADCs) operating at around 0.2 V or lower are attractive for internet-of-things (IoT) and embedded applications because of their extremely low power consumption. This paper reviews state-of-the-art ULV ADCs to provide insight into current trends and design strategies at such low supply voltages. The paper discusses the architectures, topologies, and calibration techniques used, along with their trade-offs. Based on these observations, the paper offers recommendations to help circuit designers make informed design choices that achieve the desired performance in ULV ADCs.

### A Compact, Highly Scalable, Low-Power, and Multi-Channel Charge Digitizer for Beam Profile Monitoring ...... 1222

Trevor Reay<sup>1</sup>, Michelle Pyle<sup>2</sup>, John Carmen<sup>1</sup>, Taylor Barton<sup>1</sup>, Nina R. Weisse-Bernstein<sup>2</sup>, John Smedley<sup>3</sup>, Erik M. Muller<sup>4</sup>, Ryan Tappero<sup>4</sup>, Richard L. Sandberg<sup>1</sup>, Shiuh-Hua Wood Chiang<sup>1</sup>

<sup>1</sup>Brigham Young University, United States; <sup>2</sup>Los Alamos National Laboratory, United States; <sup>3</sup>SLAC National Accelerator Laboratory, United States; <sup>4</sup>Brookhaven National Laboratory, United States

**Abstract:** This paper describes a compact, highly scalable, low-power, multi-channel charge digitizer (MCCD) designed for synchrotron beam profile monitoring. The MCCD uses charge amplifiers, voltage amplifiers, programmable-gain amplifiers (PGAs), and ADCs to amplify and digitize input signals. The MCCD also demonstrates a novel stackable PCB design that makes the channel count easy to reconfigure. Laboratory measurements show a sample rate of 200 Hz per channel, a gain of $6.64 \times 10^{11}$ V/C, noise of $1.36 \times 10^{5}\,e^{-}$ rms, and power of 97 mW per channel.
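
Using the elementary charge $q \approx 1.602 \times 10^{-19}$ C, the reported gain and noise can be restated in more familiar units. This is a back-of-the-envelope conversion of the abstract's figures, not an additional result from the paper:

$$
G\,q = 6.64 \times 10^{11}\ \mathrm{V/C} \times 1.602 \times 10^{-19}\ \mathrm{C} \approx 1.06 \times 10^{-7}\ \mathrm{V} \approx 106\ \mathrm{nV\ per\ electron}
$$

$$
Q_n = 1.36 \times 10^{5} \times 1.602 \times 10^{-19}\ \mathrm{C} \approx 2.18 \times 10^{-14}\ \mathrm{C} = 21.8\ \mathrm{fC\ rms}, \qquad G\,Q_n \approx 14.5\ \mathrm{mV\ rms}
$$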

### Wideband Low-Distortion Noise-Coupled Delta-Sigma ADC ...... 1226

Hanyu Wang, Faraz Adin, Un-Ku Moon, Gabor C. Temes

Oregon State University, United States

**Abstract:** This paper introduces a novel noise-coupled delta-sigma structure. The approach is based on a modified version of the popular SMASH configuration and promises better performance with reduced complexity. Feeding the input signal directly to the quantizer provides wideband and low-distortion capabilities. The matching requirements of the Gm-R buffer introduced in the structure are investigated in detail, and a simple buffer can be efficiently shared by existing active NSSAR-based quantizers. Finally, a prototype second-order wideband low-distortion noise-coupled (NC) delta-sigma ADC was designed and simulated to validate the proposed approach.

### A Rail-to-Rail Low-Power Dynamic CMOS Amplifier for Switched-Capacitor

Antoine Verreault<sup>1</sup>, Paul-Vahé Cicek<sup>2</sup>, Alexandre Robichaud<sup>1</sup>

<sup>1</sup>Université du Québec à Chicoutimi, Canada; <sup>2</sup>Université du Québec à Montréal, Canada

Abstract: This paper presents a novel rail-to-rail, low-power dynamic CMOS amplifier optimized for discrete-time filtering in analog-to-digital converters (ADCs). The proposed architecture incorporates a switched resistor-capacitor (RC) parallel compensation technique that is activated during the sampling phase to minimize the amplifier's current consumption. Dynamic biasing further improves energy efficiency by cutting the high slewing current once linear settling takes over. Simulation results in 65 nm CMOS show a 3.2× reduction in power consumption relative to a comparable static amplifier. Accurate integration waveforms are achieved while consuming only 239 $\mu$W for 6$\tau$ settling within 4.9 ns.

## Session C3L-B: Late Breaking News - Analog

Chair: Kourosh Rahnamai, *WNE* Time: Wednesday, August 14, 2024, 14:00 - 15:30 Location: Room 2

### Design and Evaluation of Active High Pass Filter Topologies for Analog Baseband ...... 1235

Deeksha, Abhishek Srivastava

International Institute of Information Technology Hyderabad, India

**Abstract:** This paper introduces a low-power, low-noise active high-pass filter (HPF) with a cross-coupled, diode-connected load. The filter is designed for a cutoff frequency of 300 kHz and aims to eliminate low-frequency interference, such as spillover, in wireless communication systems. Implemented in TSMC 65 nm CMOS technology, the proposed filter performs well in simulation: it achieves a passband gain of 27 dB, exhibits good linearity with an IIP3 of 4.475 dBm, and keeps the integrated noise low at $3.39 \times 10^{-7}\ \mathrm{V}^2$ over a 10 MHz bandwidth. The filter occupies a compact $83\ \mu\mathrm{m} \times 82\ \mu\mathrm{m}$, making it suitable for integration into small-scale circuit designs. It is also energy-efficient, consuming only 12.97 $\mu$W from a 1 V supply.
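
For a sense of scale, the integrated noise power quoted above corresponds to the following rms noise voltage, referred to the same point as the reported figure. This is a direct conversion of the abstract's number, not a separate result:

$$
v_{n,\mathrm{rms}} = \sqrt{3.39 \times 10^{-7}\ \mathrm{V}^2} \approx 5.82 \times 10^{-4}\ \mathrm{V} \approx 582\ \mu\mathrm{V}
$$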

### Analog Re-Use in Mixed Signal Circuits via Analog Time Encoding in Shared Die Tapeouts ...... 1240

Steven Bibyk, Andrew Noonan, Soumobrata Ghosh, Chakka Manning, Sudipto Sarkar

The Ohio State University, United States

Abstract: Recent developments in microelectronics workforce development have enabled many more designers to take mixed-signal integrated circuits from concept to fabricated, validated, and tested silicon at dramatically reduced cost and effort, within a shared, community-driven open-source ecosystem. Because this ecosystem supports many shared tapeout spins, it reduces the innovation risk of new mixed-signal design approaches such as analog time encoding, which are difficult to develop with simulation-based verification alone. Analog time encoding built predominantly from digital cell libraries increases analog re-use, synthesis, and programmability through digitally intensive analog circuits. An approach using circuit blocks from phase-locked loops and quantization noise shaping is described, and an overview is given of how analog layout matching techniques can be revised to improve the performance of these blocks.

### A 10 MHz to 3.2 GHz Differential Current Starved Inverter-Based Self-Biased

<sup>1</sup>Sejong University, Korea; <sup>2</sup>University of California, Santa Barbara, United States

**Abstract:** Adaptive-bandwidth PLLs offer optimized noise and jitter performance but require active feedback compensation. We present a differential inverter-based self-biased PLL with active feedforward zero placement. Our approach is simple to implement in scaled CMOS processes and achieves a FoM of -222 dB with an integrated jitter of 4.1 ps in a 100 MHz bandwidth while consuming only 3.72 mW at an output frequency of 1.92 GHz in 65 nm LP CMOS. The PLL covers a wide frequency range of 10 MHz to 3.2 GHz. Furthermore, unlike classical analog PLLs, its physical design was done using place-and-route tools.
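
The reported FoM is consistent with the widely used jitter-power figure of merit, $\mathrm{FoM} = 10\log_{10}\big[(\sigma_t/1\,\mathrm{s})^2 \cdot (P/1\,\mathrm{mW})\big]$. The sketch below assumes the paper uses this standard definition; with the abstract's jitter and power values it reproduces -222 dB.

```python
import math


def jitter_fom_db(rms_jitter_s: float, power_w: float) -> float:
    """Jitter-power figure of merit commonly used for PLLs:
    FoM = 10 * log10((sigma_t / 1 s)^2 * (P / 1 mW)). Lower is better."""
    return 10 * math.log10(rms_jitter_s ** 2 * (power_w / 1e-3))


fom = jitter_fom_db(rms_jitter_s=4.1e-12, power_w=3.72e-3)  # values from the abstract
print(f"{fom:.1f} dB")  # prints -222.0 dB
```

Because jitter enters squared and power linearly, halving the jitter improves the FoM by about 6 dB, whereas halving the power improves it by only about 3 dB.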


**Abstract:** Receiver front-ends operating across wide frequency ranges demand high-Q bandpass filters to protect them from near-band interference. Higher-order filters offer flatter in-band gain and steeper roll-off, which is desirable for modern wireless radios. N-path filters offer high linearity, on-chip integration, and inherent tunability. This work presents a low-loss, quasi-elliptic, higher-order N-path filter operating from 3 to 13 GHz. The filter uses the frequency translation of an N-path filter to upconvert the response of a passive LC-based elliptic lowpass filter, realizing a higher-order bandpass filter. Two 4-path filters are interleaved using isolating inductors to obtain better harmonic conversion loss without requiring faster clocks. The prototype, fabricated in 22 nm FDSOI CMOS technology, achieves a 3 dB bandwidth of 1.06 GHz at a 10 GHz center frequency, with a stopband rejection of -20 dB at a 0.89 GHz offset from the center frequency. At a 10 GHz center frequency, the filter has an IP1dB of -1.67 dBm and an IIP3 of 9.51 dBm, consumes 25.2 mW from a 0.9 V supply, and occupies an area of 8.6 mm².

The University of Texas at Dallas, United States

Abstract: Multi-finger architectures have been widely explored in submicron CMOS circuits because they offer better device performance than their mono-finger counterparts. However, choosing finger variables, including finger width and number of fingers, to boost circuit performance is a cumbersome task. The objective of this work is to develop a finger-regulated simultaneous noise and input matching (FRSNIM) methodology for microwave front ends. It is driven by impedance engineering of the source-referred noise impedance (Zopt) and realizes a theoretical optimization using Keysight's Advanced Design System (ADS) platform and a 180 nm standard CMOS process. Using the proposed FRSNIM scheme, a low-noise amplifier (LNA) is designed that provides 16.7 dB of forward gain and a 1.4 dB noise figure at 2.4 GHz while consuming only 6.6 mW from a 1.8 V supply. The input-referred third-order intercept point (IIP3), which characterizes the linearity of the LNA, is -7 dBm. The LNA occupies an area of $470 \times 337\ \mu\mathrm{m}^2$.

## Session C3L-C: Enhancing Hardware Security with Advanced Technologies

Chair: Aatmesh Shrivastava, *Northeastern University* Co-Chair: Georgios Sirakoulis, *Democritus University* Time: Wednesday, August 14, 2024, 14:00 - 15:30 Location: Room 3

### Design of an Energy/Area-Aware MTJ-Based Nonvolatile Register with a Reference-Load Sharing Scheme ... 1257

Tomoo Yoshida, Masanori Natsui, Takahiro Hanyu

Tohoku University, Japan

**Abstract:** This paper describes an MTJ-based nonvolatile register configuration for persistent arithmetic operations in intermittent computing. In the proposed nonvolatile register, each bit is retained as the resistance value of one MTJ device, and a common reference MTJ device is shared by all bits. In addition to reducing the number of MTJ devices and the write energy required to save data, the simpler write-current control circuit also reduces circuit area. A performance evaluation using a 55 nm CMOS/MTJ hybrid process shows that, compared with the conventional configuration, the proposed nonvolatile register reduces the energy required for data saving by up to 49%, and the circuit area and the number of MTJ devices by 39% and 25%, respectively.

### A Continuous Model for Ferroelectric-FETs for Computing in Memory ...... N/A

Jinwei Lin<sup>1</sup>, Wenjia Xu<sup>1,2</sup>, Yue Ma<sup>1,3</sup>, Sheng Zhang<sup>1</sup>

<sup>1</sup>Tsinghua University, China; <sup>2</sup>Greater Bay Area National Center of Technology Innovation, China; <sup>3</sup>Peng Cheng Laboratory, China

Abstract: In this work, we propose a simulation model for ferroelectric field-effect transistors (FeFETs) tailored to the design of in-memory computing circuits. For designing and simulating such circuits, it is crucial to capture the macroscopic behavior of FeFETs. Our model not only conforms to the actual behavior of FeFETs but also uses an asymptotic model to bound the upper and lower limits of their polarization. Furthermore, we fit a nonlinear function to represent the dynamic rate of polarization change during different polarization transition processes. This enables the model to simulate FeFET circuits capable of continuous writing, erasing, and reading, and to assess the stability of FeFETs in in-memory computing arrays. We also validated a 2×2 FeFET in-memory computing array. Using the proposed model, we performed circuit analysis and derived the correlation between write voltage, write success rate, and crosstalk.

### Dynamic Security Management of Systems on Chip via Embedded FPGA in 22nm CMOS Technology ...... 1266

Elsayed Elgendy<sup>1</sup>, Ahmed Zaky Ghonem<sup>1</sup>, Sherif Abouzeid<sup>1</sup>, Islam Elsadek<sup>1</sup>, Liao Kevin<sup>2</sup>, Yen Vincent<sup>2</sup>, K.C Yap<sup>2</sup>, Brian Faith<sup>2</sup>, Eslam Yahya Tawfik<sup>1</sup>

<sup>1</sup>The Ohio State University, United States; <sup>2</sup>QuickLogic Corporation, United States

**Abstract:** Demand has been rising for more efficient and secure processing at the edge to cope with the growing number of Internet of Things (IoT) applications. These applications range from low-duty-cycle, high-performance tasks to high-duty-cycle, ultra-low-power operations. Embedded FPGAs (eFPGAs) can address many of these demands, given their flexibility compared with ASICs and their low latency and lower energy consumption compared with CPUs. eFPGAs allow post-silicon patchability, adaptive performance, and on-chip control of large SoCs. This paper presents an eFPGA implementation tailored for secure, efficient, and flexible System-on-Chip (SoC) management. The eFPGA is the heart of a Security Management Unit (SMU), part of a larger secure SoC platform that combines a True Random Number Generator (TRNG), an AES engine, and a NIST-compliant Lightweight Cryptography (LWC) engine. Fabricated in 22 nm CMOS technology, the eFPGA was used to dynamically manage the security policies of different IPs in the SoC under a proposed threat model. We report the performance of these applications, showing the benefits of the embedded fabric.

### Advanced Experimental Results Verifying that Inherent Adaptive Fault Tolerance Exists in the Bio-Inspired Levy Flight Firefly Algorithm ...... 1271

W.K. Jenkins<sup>1</sup>, C. Radhakrishnan<sup>2</sup>

<sup>1</sup>*The Pennsylvania State University, United States;* <sup>2</sup>*University of Illinois Urbana-Champaign, United States* 

**Abstract:** The bio-inspired Lévy Flight Firefly Algorithm (LFFA) can be effectively applied to IIR adaptive filters designed with conventional second-order structures, coupled-form structures, and lattice-ladder structures. Adaptive Fault Tolerant (AFT) digital filters take advantage of non-canonical architectures that use the inherent adaptive process as an automatic fault-recovery mechanism. Since the LFFA is based on particle swarm optimization, the swarm provides many additional adaptive parameters. This paper presents further experimental results confirming that the LFFA provides inherent adaptive fault tolerance, a capability that is very important when the LFFA is applied to adaptive digital filtering.
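
For readers unfamiliar with the algorithm, the sketch below shows the two ingredients usually associated with a Lévy flight firefly update: attraction toward a brighter firefly that decays with distance, and a heavy-tailed random step drawn with Mantegna's algorithm. It follows the common textbook formulation, not necessarily the exact variant used in the paper; the parameter names (`beta0`, `gamma`, `alpha`, `beta`) and helpers are assumptions.

```python
import math
import random


def mantegna_sigma(beta: float) -> float:
    """Scale of the Gaussian numerator in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta: float, rng: random.Random) -> float:
    """Draw one heavy-tailed Levy-flight step length via Mantegna's algorithm."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def move_firefly(x_i, x_j, beta0, gamma, alpha, beta, rng):
    """Move firefly i toward brighter firefly j, plus a Levy-flight perturbation."""
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    attraction = beta0 * math.exp(-gamma * r2)  # decays with squared distance
    return [
        a + attraction * (b - a)
        + alpha * math.copysign(1.0, rng.random() - 0.5) * levy_step(beta, rng)
        for a, b in zip(x_i, x_j)
    ]


rng = random.Random(0)
new_position = move_firefly([0.0, 0.0], [1.0, 1.0], beta0=1.0, gamma=1.0, alpha=0.1, beta=1.5, rng=rng)
```

In an adaptive-filter setting, each position vector would hold a candidate set of filter coefficients. The occasional long Lévy jumps help the swarm re-explore the search space, which is one intuition for why it can recover after a fault perturbs the filter.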

University of Cincinnati, United States

Abstract: The use of on-chip antennas for continuous monitoring of integrated circuit aging is proposed. Detection of interconnect electromigration through near-field electromagnetic (EM) emissions, by measuring the change in differential-mode coupling between the interconnects and on-chip antennas, is analyzed in an electromagnetic simulator. The on-chip antennas are fabricated and annealed to determine their compatibility with the back-end-of-line (BEOL) process.

## Session C3L-D: Education & Workforce Development in Photonics

Chair: Steve Adamshick, *Western New England University* Co-Chair: John Burke, *Western New England University* Time: Wednesday, August 14, 2024, 14:00 - 15:30 Location: Room 4

### Middle School Matters: Pioneering Photonics Education using SparkAlpha Explore ...... 1281

Vanessa Mahoney, Kristen Outler, Robert Vigneau, Kevin McComber

Spark Photonics Foundation, United States

Abstract: As a nation, it is imperative that we address the science, technology, engineering, and mathematics (STEM) skills gap and ensure that our future workforce is equipped to meet the needs of the global economy. By continuing to expand middle school career exploration, we invest in the future of our students and lay the foundation for a stronger, more resilient workforce. In 2022, President Biden signed the CHIPS and Science Act (CHIPS Act), allocating \$52.7 billion to bolster American manufacturing, supply chains, and national security. The Act broadens STEM opportunities for Americans, facilitating participation in cutting-edge industries such as quantum computing and AI. It provides new opportunities to communities overlooked in STEM and secures U.S. manufacturing [1]. Spark Photonics Foundation launched SparkAlpha Explore, its train-the-teacher model, introducing today's youth to crucial STEM topics such as semiconductors, integrated photonics, and advanced manufacturing. This program sets the stage for future success for middle and high school students by engaging them with advanced STEM topics at a critical time in their studies.

<sup>1</sup>Western New England University, United States; <sup>2</sup>Springfield Technical Community College, United States

**Abstract:** This paper introduces a practical approach to understanding silicon photonic waveguide modes. Traditionally, these modes involve complex mathematical solutions derived from Maxwell's equations, which many engineering students find challenging. However, for those pursuing associate degrees, such mathematical rigor is often unnecessary. Consequently, there is a need for alternative teaching methods tailored to students aspiring to technician roles in advanced fields like integrated photonics. These new methods aim to impart technical knowledge and skills aligned with industry requirements. Effective teaching strategies often incorporate hands-on activities to reinforce both theoretical concepts and practical skills. The described hands-on activity aims to deepen understanding of silicon photonic waveguide modes by exploring the impact of single-mode versus multimode optical fibers on the laser beam quality parameter.

### The Impact of Noise on Quantum Adder Circuits: An IBM Quantum Case Study ...... 1290

Jefferson Rice, David H.K. Hoe

Loyola University Maryland, United States

Abstract: We are entering an exciting era in which it is becoming possible to execute 'useful' quantum algorithms. However, quantum computing platforms will remain noisy for the foreseeable future. In this paper, we evaluate three simple error mitigation methods applied to a practical circuit, the quantum adder: (1) error correction using replicated circuits and a voter circuit; (2) error detection by encoding, which expands the qubit space and partitions it into orthogonal subspaces of correct and incorrect code words; and (3) error mitigation by reducing circuit depth at the expense of more qubits. We call these three methods replication, encoding, and simplification, respectively. We find simplification to be the most effective for implementing simple quantum adders on the IBM Quantum Experience, while replication increases both the gate count and the error count on current quantum devices. The encoding methods used in this study proved ineffective even for the simple adder circuits implemented, suggesting the need to study and evaluate more sophisticated error-correcting schemes.
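
The voting idea behind replication can be illustrated classically: run an odd number of copies of the adder, measure each, and take a bitwise majority. This is a post-measurement analog of the voter, not the quantum voter circuit evaluated in the paper, and the helper name `bitwise_majority` is hypothetical.

```python
def bitwise_majority(outcomes: list) -> str:
    """Bitwise majority vote over an odd number of measured bitstrings,
    e.g. the outputs of three replicated adder circuits."""
    if len(outcomes) % 2 == 0:
        raise ValueError("use an odd number of replicas to avoid ties")
    width = len(outcomes[0])
    if any(len(o) != width for o in outcomes):
        raise ValueError("all bitstrings must have the same width")
    voted = []
    for position in range(width):
        ones = sum(o[position] == "1" for o in outcomes)
        voted.append("1" if ones > len(outcomes) // 2 else "0")
    return "".join(voted)


# Three hypothetical replica readouts, each with at most one corrupted bit.
result = bitwise_majority(["101", "100", "001"])
```

Voting only helps when replica errors are independent and the added gates do not introduce more errors than they remove. The abstract's finding that replication increases the error count on current devices reflects this trade-off.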

### Quantum Computing with MATLAB

Hossein Jooya

MathWorks, United States

Abstract: We introduce the MATLAB Support Package for Quantum Computing as an integrated platform for developing, simulating, and executing quantum algorithms. We show how this package facilitates prototyping algorithms that advance problem-solving in areas such as optimization, scenario simulation, artificial intelligence, and machine learning, as well as complex challenges in fields like chemistry and materials science. We demonstrate how the package allows users to construct quantum circuits with a diverse array of gates, simulate algorithms locally or via cloud-based services, and execute these algorithms on quantum hardware.

### Impacts of Quantum Mechanics: A Survey of Applied Quantum Computing ...... 1295

Dasheng Zhang, Eric Savage, Jin Feng Lin, Ruolin Zhou

University of Massachusetts Dartmouth, United States

Abstract: In this survey paper, we provide an overview of the significance of quantum computing. We review fundamental principles, technological advancements, and their interconnections, as well as the current state of research, discussing both essential achievements and persistent challenges. The paper also examines how these technologies complement each other to enhance secure communication and computational efficiency. We propose future research directions that emphasize further advancements to address existing challenges and optimize the use of quantum technologies across multiple sectors.

## Session C3L-E: Biomedical Circuits & Systems

**Chair:** Pamela Abshire, *University of Maryland* **Time:** Wednesday, August 14, 2024, 14:00 - 15:30 **Location:** Room 5

Abstract: Event-based vision sensors offer high-speed, low-latency vision solutions. Their sensitivity to temporal changes makes them ideal for applications involving rapidly moving subjects or patterns. This work demonstrates the use of an event-based vision sensor for real-time detection of the wing-beat frequency of flying honey bees. Wing-beat frequency measurement is useful for species identification or colony health monitoring. Several potential methods for detecting the wing-beat pattern are explored and assessed for this problem. The results show that event-based vision can reliably measure the wing-beat frequency of bees in real time.
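
One simple candidate method for this kind of measurement bins event timestamps into a count signal and takes the peak of its spectrum. The sketch below illustrates that idea on a synthetic event stream; it is not necessarily one of the methods assessed in the paper. The 230 Hz modulation, event rates, search band, and helper names are all hypothetical.

```python
import numpy as np


def dominant_frequency(timestamps, duration, bin_width=1e-4, fmin=50.0, fmax=1000.0):
    """Estimate the dominant periodicity of an event stream: bin the event
    timestamps into a count signal and locate the FFT magnitude peak."""
    n_bins = int(round(duration / bin_width))
    counts, _ = np.histogram(timestamps, bins=n_bins, range=(0.0, duration))
    counts = counts - counts.mean()                 # remove the DC component
    spectrum = np.abs(np.fft.rfft(counts))
    freqs = np.fft.rfftfreq(n_bins, d=bin_width)
    band = (freqs >= fmin) & (freqs <= fmax)        # assumed plausible band
    return float(freqs[band][np.argmax(spectrum[band])])


def synthetic_events(f_beat, duration, mean_rate=5000.0, depth=0.8, seed=0):
    """Hypothetical test stream: Poisson events whose rate is modulated at
    f_beat, generated by thinning a homogeneous process at the peak rate."""
    rng = np.random.default_rng(seed)
    r_max = mean_rate * (1 + depth)
    t = np.sort(rng.uniform(0.0, duration, rng.poisson(r_max * duration)))
    rate = mean_rate * (1 + depth * np.sin(2 * np.pi * f_beat * t))
    keep = rng.uniform(0.0, r_max, t.size) < rate
    return t[keep]


events = synthetic_events(f_beat=230.0, duration=1.0)
estimate = dominant_frequency(events, duration=1.0)
```

The frequency resolution is the reciprocal of the observation window (1 Hz for 1 s here). A real-time system would trade window length against latency, for example by using a sliding window.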

### Real-Time Cell Viability Measurements using Lab-on-CMOS-Capacitance Sensor Array ...... 1305

Prithwish Dasgupta, Kyle Nielsen, Pamela Abshire, Sheung Lu

University of Maryland, College Park, United States

Abstract: This project develops improvements to existing complementary metal-oxide-semiconductor (CMOS) chips, particularly to their longevity and reusability, to boost their functionality in monitoring cell viability and to facilitate visual inspection. Through printed circuit board (PCB) design, we created board configurations that mitigate packaging failures while preserving the microcontroller-based data collection process, yielding a reliable and replaceable cell viability measurement device. Using computer-aided design (CAD) software and programming field-programmable gate arrays (FPGAs), we developed a hot-swappable system and established a data readout system. These advancements notably improve research efficiency and data quality by minimizing downtime and improving the correlation of capacitance measurements with direct visual observations of cell behavior.

### Towards a Broadly Configurable Wearable Device for Continuous Hemodynamic Monitoring ...... 1309

Jeremy Yun, Steeve Nzama, Sahil Shah

University of Maryland, College Park, United States

**Abstract:** Hemodynamic signals, such as electrocardiography (ECG), photoplethysmography (PPG), and electrical bioimpedance (EBI), provide a myriad of critical health indicators. For instance, cardiologists use ECG to diagnose a range of cardiovascular diseases, PPG estimates blood oxygen levels, and EBI gauges vascular flow. A multimodal device that continuously monitors these signals can give healthcare professionals deeper insight into patient well-being. This paper introduces a hardware design for a programmable, wearable device that measures multiple physiological signals, including ECG, PPG, and EBI. An onboard microprocessor enables real-time data processing. The compact, efficient system measures just 60.5 mm × 38.75 mm, has low power dissipation, and is powered by a coin cell battery. Built largely from discrete commercial components and a custom electrode array, the open-source hardware allows rapid reconfiguration and broad use within the health monitoring domain.

Ahmedul Khan, Shiva Maleki Varnosfaderani, Mohammad Alhawari, Gozde Tutuncuoglu

Wayne State University, United States

**Abstract:** This paper explores an energy-efficient resistive random access memory (RRAM) crossbar array framework for predicting epileptic seizures using the CHB-MIT electroencephalogram (EEG) dataset. RRAMs have significant potential for in-memory computing, offering a promising solution to overcome the limitations of the traditional Von Neumann architecture. By integrating a domain-specific feature extraction approach and evaluating the optimal RRAM hardware parameters using the NeuroSim+ benchmarking platform, we assess the performance of RRAM crossbars for predicting epileptic seizures. Our proposed workflow achieves accuracy levels above 80% despite the EEG data being quantized to 1 bit, highlighting the robustness and efficiency of our approach for epileptic seizure prediction.

<sup>1</sup>Mayo Clinic, United States; <sup>2</sup>University of Minnesota, United States

Abstract: This study presents a new data acquisition framework for synchronous recording with dual Brain Interchange (BIC) systems. The setup expands recording capacity by providing access to up to 64 channels. The environment uses our Simulink model, which incorporates synchronization via a master clock and email-based status updates. We evaluated the framework in laboratory simulations and observed a 38 ms post-synchronization delay between the systems. We also demonstrated that this error can be reduced to as low as 5 ms by adjusting the master clock resolution and data buffer size. We estimated each unit's sampling frequency with high accuracy to avoid desynchronization. We evaluated the setup on intracranial EEG (iEEG) recorded simultaneously with the clinical system and performed spike detection on the post-synchronized iEEG, observing a similarity rate above 95% between the dual BIC and the clinical system. Additionally, we explored the optimal configuration of ground and reference connections between systems to achieve the highest signal quality, and investigated the implications of frequency interference in dual-system operation.

## Session C3L-F: Artificial Intelligence, Internet of Things & Systems 8

**Chair:** Jennifer Blain Christen, *ASU* **Time:** Wednesday, August 14, 2024, 14:00 - 15:30 **Location:** Room 6

Abstract: In this work, we propose an alternative training approach for memristive circuits, Manhattan rule training, which uses only sign information for weight updates. We present an in-depth analysis in both in-situ and ex-situ settings and show that our method not only simplifies circuit design but also improves neural network robustness against device non-idealities. Using MemTorch and our custom in-situ training framework, we implemented the Manhattan rule for MNIST classification and ECG signal detection and achieved close to state-of-the-art performance under noise. Our work also provides a thorough comparison of Manhattan and conventional training methods under various device non-idealities, offering a useful benchmark for the design of biomedical neural circuits.
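
The Manhattan rule keeps only the sign of each gradient component and moves every weight by the same fixed step. The NumPy sketch below shows that update on a toy linear least-squares problem. It does not use MemTorch or the paper's in-situ framework; the step size, data, and helper names are hypothetical.

```python
import numpy as np


def manhattan_step(w, grad, delta):
    """Manhattan-rule update: each weight moves by a fixed step `delta`
    against the sign of its gradient; gradient magnitude is ignored."""
    return w - delta * np.sign(grad)


def train_linear_manhattan(x, y, delta=0.01, epochs=200):
    """Fit y ~ x @ w using mean-squared-error gradients but sign-only updates."""
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        grad = x.T @ (x @ w - y) / len(y)
        w = manhattan_step(w, grad, delta)
    return w


# Toy problem with orthogonal input columns, so each weight converges independently.
x = np.array([[1.0, 1.0], [1.0, -1.0]])
w_true = np.array([0.5, -0.3])
w = train_linear_manhattan(x, x @ w_true)
```

With sign-only updates the weights settle within one step size of the optimum and then dither around it. This fixed-step behavior differs from gradient descent, whose steps shrink as the gradient vanishes.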

Abstract: On the path toward more complex systems that deliver more value at lower cost in semiconductor products, the industry had to develop frameworks supporting the rapid design of very complex systems, which led to the widespread adoption of VLSI. A very similar situation exists for the advanced manufacturing processes that support the semiconductor industry. To keep pace with innovation in manufacturing technology, the Digital Twin concept has been the main focus of multiple studies and proposals. In this work, drawing on the lessons of VLSI, we aim to construct a digital twin of the manufacturing process that enables monitoring and optimization of the most important aspects of semiconductor manufacturing, and we show that such an agnostic approach can also be applied to other manufacturing models. We present the model characteristics necessary to meet these requirements under data conditions that are atypical for traditional machine learning training methods.

### Scalable 1-D FR-CNN for Signal Identification and Classification over a Wideband Channel ...... 1335

Erika Caushi, Jin Feng Lin, Ruolin Zhou

University of Massachusetts Dartmouth, United States

Abstract: In Next Generation (NextG) wireless communication systems, spectrum sharing addresses spectrum scarcity while ensuring high data rates and quality of service. A key enabler of spectrum sharing is sensing the electromagnetic spectrum (EMS) and characterizing the surrounding wireless signals. Machine learning (ML) is essential for this task, using techniques such as convolutional neural networks (CNNs), region-based CNNs (R-CNNs), Fast R-CNN, and Faster R-CNN (FR-CNN) to identify and extract multiple signals from a channel. This paper optimizes FR-CNN for one-dimensional (1-D) signal processing for wideband EMS sensing and tests model scalability as the bandwidth varies. Models are developed to run on both CPUs and GPUs. A Universal Software Radio Peripheral (USRP), GNU Radio, and Radio-Frequency Network-on-Chip (RFNoC) are used to build a wideband receiver for over-the-air tests. Evaluations show that the optimized 1-D FR-CNN effectively locates and characterizes active signals within the band. While R-CNN reduces classification time, it compromises accuracy. This study highlights the balance between speed and precision that is vital for efficient spectrum management in wireless communications.

### Pedestrian and Cyclist Object Detection using Thermal and Dash Cameras in Different Weather Conditions ...... 1340

Austin Miller<sup>1</sup>, Yoosuf Marikar<sup>1</sup>, Abdulla Yousif<sup>1</sup>, Hamidreza Sadreazami<sup>2</sup>, Marzieh Amini<sup>1</sup>

<sup>1</sup>Carleton University, Canada; <sup>2</sup>McGill University, Canada

Abstract: Ensuring the safety of cyclists and pedestrians has become imperative in our ever-expanding urban centers. Despite advancements in vehicle safety technology, traditional cameras often fail in adverse weather and low-light conditions. This paper investigates the effectiveness of integrating thermal cameras with dash cameras to improve the detection accuracy of vulnerable road users. We first collected and annotated datasets comprising thermal and dash camera footage under various weather conditions. We then developed a deep learning object detection model using YOLOv8 and Roboflow. Separate models were trained for each camera and then fused to compensate for their individual limitations. We observed that the dash camera is prone to occlusions and varied lighting, whereas the thermal camera excels in low-light settings. The performance metrics for the thermal camera showed a total mAP50 of 0.92 and mAP50-95 of 0.52 for detecting both cyclists and pedestrians, reflecting a highly effective system with significant potential to improve road safety.

North Carolina A&T State University, United States

**Abstract:** The SD Biosensor STANDARD G6PD test is crucial for identifying glucose-6-phosphate dehydrogenase (G6PD) deficiency, which can lead to hemolytic anemia under specific stressors. For the first time, this study evaluates ChatGPT-4o's technical competency in using the STANDARD G6PD test without prior demonstration, practice with observation, or independent demonstration, based solely on a provided training and competency assessment document. The model's ability to respond accurately to the assessment underscores its potential to master other sensor and microelectronic technologies.

## Session C4L-A: Regulators, References & Reliability Methods

Chair: Tian Xia, University of Vermont Co-Chair: Jian Shao, Infineon Technologies Time: Wednesday, August 14, 2024, 16:00 - 17:30 Location: Room 1

### Invited Paper: Low Power, Fully-Integrated Flipped Voltage Follower LDO using

*Iowa State University, United States* 

Abstract: We present an output-capacitorless (OCL) flipped voltage follower (FVF) low-dropout regulator (LDO) with enhanced transient performance, capable of providing a clean output voltage for battery-powered system-on-chip (SoC) applications. We introduce undershoot and overshoot detector circuits that sense the LDO output voltage and generate a large dynamic current to immediately change the gate-source voltage of the LDO pass device (PD) during load transients. We implement a novel folded cascode error amplifier with an additional turn-around stage to boost large-signal slew for fast tracking of the reference voltage. The proposed LDO is designed in TSMC 180 nm and its performance is validated with simulation results. The LDO can deliver up to 20 mA of load current with a 1.5 V regulated output voltage. Simulation results show that the proposed design improves the overshoot/undershoot response by $3.8\times$ compared with a conventional FVF LDO and settles to within 1% accuracy in 20 ns for a load transient from $100\,\mu$A to 20 mA. The proposed structure consumes $24\,\mu$A of quiescent current and has a figure of merit (FoM) of 2.4 ps.

### Small-Signal Model of a Boost Converter Exploiting ZVS at the High-Side MOSFET ...... 1352

Francesco Gabriele<sup>1</sup>, Fabio Pareschi<sup>1,3</sup>, Gianluca Setti<sup>2,3</sup>, Riccardo Rovatti<sup>3</sup>, Giuseppe Calderoni<sup>4</sup>, Davide Lena<sup>4</sup>, Maria Rosa Borghi<sup>4</sup>

<sup>1</sup>Politecnico di Torino, Italy; <sup>2</sup>King Abdullah University of Science and Technology, Saudi Arabia; <sup>3</sup>University of Bologna, Italy; <sup>4</sup>STMicroelectronics s.r.l, Italy

**Abstract:** In this paper, a small-signal model is proposed for a synchronous Pulse-Width Modulated (PWM) Boost converter operating in Continuous Conduction Mode (CCM) and embedding a Zero Voltage Switching (ZVS) network for the high-side power MOSFET. The presented analysis aims at extending the field of equivalent circuit models for DC/DC power converters, providing a full characterization of how the traditional small-signal model of the Boost converter is dynamically altered when the ZVS network is introduced. The enhanced small-signal model is a convenient tool that can be exploited from the very beginning of the compensation network design phase, as it captures the main alterations of the open-loop converter transfer functions, i.e., the control-to-output, line-to-output, and output impedance transfer functions. The validity of the proposed small-signal model is demonstrated through a set of SIMPLIS circuit simulations.
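For reference, the conventional model that this work extends is the ideal (lossless) CCM boost converter, whose control-to-output transfer function has a DC gain of $V_{in}/(1-D)^2$, a right-half-plane (RHP) zero at $\omega_z = (1-D)^2 R/L$, and an LC double pole at $\omega_0 = (1-D)/\sqrt{LC}$ with $Q = (1-D)R\sqrt{C/L}$. The Python sketch below evaluates only that textbook model; it does not include the ZVS network, parasitic resistances, or any result from the paper, and the component values used with it are hypothetical.

```python
import math


def boost_ccm_params(v_in: float, d: float, l: float, c: float, r: float) -> dict:
    """Key parameters of the ideal CCM boost control-to-output transfer function."""
    if not 0.0 <= d < 1.0:
        raise ValueError("duty cycle must satisfy 0 <= D < 1")
    dp = 1.0 - d  # complementary duty cycle D'
    return {
        "v_out": v_in / dp,              # steady-state output voltage
        "g0": v_in / dp**2,              # DC control-to-output gain (V per unit duty)
        "wz_rhp": dp**2 * r / l,         # right-half-plane zero (rad/s)
        "w0": dp / math.sqrt(l * c),     # double-pole frequency (rad/s)
        "q": dp * r * math.sqrt(c / l),  # quality factor of the double pole
    }


def g_vd(s: complex, v_in: float, d: float, l: float, c: float, r: float) -> complex:
    """G_vd(s) = g0 * (1 - s/wz) / (1 + s/(Q*w0) + (s/w0)^2) for the ideal boost."""
    p = boost_ccm_params(v_in, d, l, c, r)
    num = 1 - s / p["wz_rhp"]
    den = 1 + s / (p["q"] * p["w0"]) + (s / p["w0"]) ** 2
    return p["g0"] * num / den
```

With hypothetical values $V_{in} = 12$ V, $D = 0.5$, $L = 10\,\mu$H, $C = 100\,\mu$F, and $R = 10\,\Omega$, the DC gain is 48 V per unit duty cycle and the RHP zero sits at 250 krad/s; the ZVS-enhanced model described above would modify these transfer functions.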

### Switched-Inductor Multiple-I/O Power Supplies: MOSFET Selection and Cross Conduction ...... 1357

Linyuan Cui, Gabriel A. Rincón-Mora

Georgia Institute of Technology, United States

Abstract: Switched-inductor power supplies are valued for their high efficiency despite the bulkiness of off-chip inductors. To enhance power density, single-inductor topologies are favored. However, single-inductor multiple-input/multiple-output power supplies (SL-MI/O) present unique design challenges that have not been sufficiently explored. Selecting the more efficient MOSFET type, NMOS or PMOS, is non-trivial in SL-MI/Os. Therefore, an intuitive metric called the Favorability Index (FNP) is introduced in this paper to aid designers in MOSFET selection. Unwanted turn-on of switches that short inputs/outputs (cross conduction) can easily occur in single-inductor designs due to complex inductor node voltages. This paper investigates common methods used to mitigate cross conduction and recommends a two-transistor selector topology that blocks cross conduction with low impact on efficiency.

Qianxi Cheng<sup>1</sup>, Linzhi Tao<sup>2</sup>, Jiaojiao He<sup>1</sup>, Jie Yang<sup>1</sup>, Chen Zhang<sup>1</sup>, Xin'an Wang<sup>1</sup> <sup>1</sup>Peking University, China; <sup>2</sup>University of Electronic Science and Technology of China, China

**Abstract:** This paper presents a low-power high-voltage (HV) floating-output level shifter (LS) with less than 0.7 ns asymmetric delay. By using a short-pulse method in designing both the LS core and the HV ground circuit (HVSS), the whole circuit consumes only 23 µA with a 4 MHz input square-wave signal. Cascaded with a simple digital circuit using an asymmetric delay reduction technique, the LS normally has a delay of less than 2 ns for both rising and falling edges. The whole circuit is simulated in a 180 nm Bipolar-CMOS-DMOS (BCD) technology with an output range from 36.5 V to 40 V. Results show excellent performance in both power consumption and asymmetric delay margin, indicating that the proposed LS is suitable for low-power high-voltage-output DC-DC converters.

### A 113-nW, Sub-1 V Single BJT-Based Voltage and Current Reference in One Circuit ...... 1367

Raghav Bansal, Shouri Chatterjee

Indian Institute of Technology Delhi, India

Abstract: This article presents a single bipolar junction transistor (BJT)-based voltage and current reference combined in one circuit. The $\beta$-multiplier is used to generate the proportional-to-absolute-temperature (PTAT) voltage, which is added to a fraction of the emitter-base voltage (VEB) of the BJT to achieve a temperature-independent output reference voltage (VREF). The reference current (IREF) is obtained as the ratio of VREF to a temperature-compensated resistor (TCR), where the TCR is implemented as a series connection of a complementary-to-absolute-temperature (CTAT) unsilicided P+ poly resistor and a PTAT N-well resistor. The proposed circuit is designed in a 65-nm low-power CMOS process and occupies an area of 0.1 mm². Post-layout Monte-Carlo simulations show that the average temperature coefficients of VREF and IREF are 13.9 ppm/°C and 83.85 ppm/°C, respectively, across a temperature range from -40 °C to 120 °C. The mean values of VREF and IREF are 243.34 mV and 9.97 nA, with standard deviations of 2.49 mV and 454 pA, respectively. Moreover, it operates from a minimum supply of 0.85 V, with a total power dissipation of 113 nW at 120 °C.
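The abstract quotes temperature coefficients in ppm/°C over -40 °C to 120 °C and builds the TCR from a CTAT poly resistor in series with a PTAT N-well resistor. One common way to quote such a TC is the box method, and a series pair cancels the first-order resistance TC when $R_{ctat}\alpha_{ctat} + R_{ptat}\alpha_{ptat} = 0$. The Python sketch below computes both; normalizing by the mean value (rather than a room-temperature value) is an assumption, and the $\alpha$ values used with it are illustrative, not process data from the paper.

```python
from typing import Sequence


def box_tc_ppm(values: Sequence[float], temps_c: Sequence[float]) -> float:
    """Box-method temperature coefficient in ppm/°C, normalized to the mean value."""
    mean = sum(values) / len(values)
    span_t = max(temps_c) - min(temps_c)
    return (max(values) - min(values)) / (mean * span_t) * 1e6


def ptat_to_ctat_ratio(alpha_ctat: float, alpha_ptat: float) -> float:
    """R_ptat / R_ctat that nulls the first-order TC of a series resistor pair.

    R(T) = R_ctat*(1 + a_ctat*dT) + R_ptat*(1 + a_ptat*dT), so the linear term
    vanishes when R_ctat*a_ctat + R_ptat*a_ptat = 0.
    """
    if alpha_ctat >= 0 or alpha_ptat <= 0:
        raise ValueError("expect a negative CTAT and a positive PTAT coefficient")
    return -alpha_ctat / alpha_ptat


def series_resistance(r_ctat: float, alpha_ctat: float,
                      r_ptat: float, alpha_ptat: float, d_t: float) -> float:
    """First-order series resistance at a temperature offset d_t from nominal."""
    return r_ctat * (1 + alpha_ctat * d_t) + r_ptat * (1 + alpha_ptat * d_t)
```

Second-order terms, which typically limit the residual TC of such a pair, are deliberately ignored in this first-order sketch.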

## Session C4L-B: High-Performance Power Management Circuit & System

Chair: Easwaran Navaneeth, *Texas Instruments* Time: Wednesday, August 14, 2024, 16:00 - 17:30 Location: Room 2

### Distributed Energy Harvesting and Power Management Units for Self-Powered

University of Virginia, United States

Abstract: With the advancements in power and size reduction of integrated circuits, millimeter-scale self-powered systems can be seamlessly integrated into textiles or a single fiber strand, revolutionizing body sensor network applications. Energy harvesting and power management units (EHPMUs) face new design requirements for in-fabric systems, including full autonomy, ultra-low quiescent power, high efficiency, and a mm-scale footprint. Additionally, they must efficiently coordinate energy across distributed subsystems for enhanced system viability and scalability in a fiber or fabric network. This article examines current trends and innovative techniques for sub-$\mu$A EHPMUs. It presents two case studies of sub-$\mu$A distributed EHPMU designs: the first explores a switched-capacitor EHPMU featuring one-rail power sharing and cooperative dynamic voltage and frequency scaling; the second discusses a triple-input hybrid inductor-capacitor multi-output EHPMU that offers improved efficiency, reduced size, and multi-rail power sharing for heterogeneous fiber networks. We summarize the tradeoffs and key design considerations necessary for mm-scale EHPMUs designed for self-powered distributed fiber networks.

### Orthogonal Decomposition based Digital Controller Design for Hybrid Converter ...... 1378

Yi Tan<sup>1</sup>, Jianqiang Jiang<sup>2</sup>, Cheng Huang<sup>2</sup>, Hiroki Ishikuro<sup>1</sup> <sup>1</sup>Keio University, Japan; <sup>2</sup>Iowa State University, United States

**Abstract:** This paper proposes an orthogonal decomposition based control enhancement technique for digitally controlled hybrid converters. In conventional controller design, an imbalanced flying capacitor voltage can cause interference between output voltage regulation and flying capacitor voltage balancing. This can degrade related performance metrics, such as line transients. Therefore, this research proposes techniques based on orthogonal decomposition to dynamically minimize the impact of flying capacitor voltage regulation on the output voltages. Simulation results comparing the proposed method with the conventional approach show a noticeable improvement in line transient response, indicating the potential of integrating it with many digital control schemes for additional performance gains.

## Session C4L-C: Hardware & Software Design & Security

**Chair:** Shrivastava Aatmesh, *Northeastern University* **Co-Chair:** Sirakoulis Georgios, *Democritus University* **Time:** Wednesday, August 14, 2024, 16:00 - 17:30 **Location:** Room 3

### Ising Model Processors on a Spatial Computing Architecture ...... 1383

Yanze Wu, Md Tanvir Arafin

George Mason University, United States

Abstract: Data-flow-driven spatial computing architectures are promising for the efficient acceleration of machine learning models on edge devices. Interestingly, their application in other computing domains remains under-investigated. Hence, this paper investigates the application of spatial computing architectures to the design of reconfigurable Ising model processors on edge devices. We target AMD's Versal Adaptive SoC platform, which supports spatial computing, and design Ising model processors for multiple problem sizes. On a VCK-190 board, our experiments demonstrate that a spatial computing implementation of the standard Metropolis algorithm achieves a $15\times$ to $28\times$ speedup compared with a generic ARM Cortex-A72 processor for varying matrix sizes. Anonymized code and artifacts for the work are available at https://anonymous.4open.science/r/IMPA/ for reproducing the experiments and results presented in this work.
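The standard Metropolis algorithm referred to above proposes single-spin flips and accepts each with probability $\min(1, e^{-\beta \Delta E})$. For an Ising energy $E = -\sum_{i<j} J_{ij} s_i s_j - \sum_i h_i s_i$ with symmetric $J$, flipping spin $i$ changes the energy by $\Delta E = 2 s_i \left(h_i + \sum_{j \ne i} J_{ij} s_j\right)$. The plain-Python sketch below illustrates the algorithm only; it is not the paper's Versal implementation, and the fixed inverse temperature (no annealing schedule) and function names are assumptions.

```python
import math
import random
from typing import List


def ising_energy(spins: List[int], J: List[List[float]], h: List[float]) -> float:
    """E = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i (J assumed symmetric)."""
    n = len(spins)
    e = 0.0
    for i in range(n):
        e -= h[i] * spins[i]
        for j in range(i + 1, n):
            e -= J[i][j] * spins[i] * spins[j]
    return e


def metropolis(J: List[List[float]], h: List[float], beta: float,
               sweeps: int, seed: int = 0) -> List[int]:
    """Single-spin-flip Metropolis sampling at a fixed inverse temperature beta."""
    rng = random.Random(seed)
    n = len(h)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            # Local field seen by spin i from the bias and all other spins.
            local = h[i] + sum(J[i][j] * spins[j] for j in range(n) if j != i)
            delta_e = 2.0 * spins[i] * local
            # Always accept downhill moves; accept uphill moves with e^{-beta*dE}.
            if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                spins[i] = -spins[i]
    return spins
```

For a fully connected ferromagnet ($J_{ij} = 1$, $h = 0$) at large $\beta$, the sampler settles into one of the two aligned ground states. The inner local-field sum is the matrix-vector work that a spatial architecture can parallelize.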

### RISC-Vcito: A Multicycle Tiny Processor Implemented with SKY130 PDK ...... 1388

Martin Gonzalez-Perez, S. Ortega-Cisneros, Francisco J. Rodriguez-Navarrete, German Pinedo-Diaz, Emilio Isaac Baungarten-Leon, Miguel Rivera-Acosta *CINVESTAV Guadalajara, Mexico* 

Abstract: This work provides a detailed account of the step-by-step process of creating RISC-Vcito. The journey begins by designing the processor's inner workings at the Register Transfer Level (RTL), defining how it should function and perform. To ensure the processor's reliability, a thorough validation phase using the Universal Verification Methodology (UVM) is conducted. Open-source tools, such as Icarus Verilog, Yosys and GTKWave, have been used to inspect all aspects of RISC-Vcito performance. Additionally, Cadence's Xcelium was utilized for verification. The article concludes by explaining how RISC-Vcito was implemented with the SKY130 Process Design Kit (PDK) using OpenLane. This integration includes various physical design steps like synthesis, placement, and routing, resulting in the creation of a Graphic Design System (GDSII) file. This contribution provides valuable insights into using open-source tools for processor development, covering RTL design, UVM validation, and the transformation of the design into a GDSII file, ready for the manufacturing process. Open-source tools have the power to shape modern computer architecture, leading to tangible products for production.

### Automated IC Design Flow using Open-Source Tools and 180 nm PDK ...... 1393

Uriel Jaramillo-Toral, Juan Carlos Garcia-Lopez, S. Ortega-Cisneros, Emilio Isaac Baungarten-Leon, Cristian Torres-González, F. Sandoval-Ibarra

*CINVESTAV Guadalajara, Mexico*

Abstract: This paper presents an academic project that leverages open-source tools, particularly OpenLane, in conjunction with the 180 nm Process Design Kit (PDK) provided by GlobalFoundries, to automate Integrated Circuit (IC) design, specifically digital design. This methodology significantly reduces the time and resources required to achieve high-quality functional ICs. The successful integration of GlobalFoundries' PDK ensures manufacturability. This work aims to inspire the digital design community to explore the potential of open-source tools and to promote more accessible solutions in IC manufacturing. Two digital projects were designed, manufactured, and tested: a clock divider and a 7-segment display decoder. Experimental results show that the circuits operate at frequencies below 50 MHz with a 3.3 V bias.
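Before writing RTL for the two test chips, designers often keep a small behavioral reference model to generate expected outputs for a testbench. The Python sketch below is such a model under stated assumptions: an even divide ratio implemented by toggling the output every `divisor // 2` input cycles, and segments packed as bits `gfedcba` (bit 0 = segment a) for a common-cathode, active-high display, with an option to invert for common-anode parts. The paper does not specify these details, so the function names and encodings are hypothetical.

```python
from typing import List

# Segment patterns for digits 0-9, packed as gfedcba (bit 0 = segment a), active-high.
SEGMENTS = {0: 0x3F, 1: 0x06, 2: 0x5B, 3: 0x4F, 4: 0x66,
            5: 0x6D, 6: 0x7D, 7: 0x07, 8: 0x7F, 9: 0x6F}


def decode_7seg(digit: int, active_low: bool = False) -> int:
    """Return the 7-bit segment pattern for a decimal digit."""
    if digit not in SEGMENTS:
        raise ValueError("digit must be in 0..9")
    code = SEGMENTS[digit]
    return code ^ 0x7F if active_low else code  # invert for common-anode displays


def divide_clock(n_cycles: int, divisor: int) -> List[int]:
    """Output level after each input rising edge for a toggle-based even divider."""
    if divisor < 2 or divisor % 2:
        raise ValueError("this sketch supports even divisors only")
    out, count, levels = 0, 0, []
    for _ in range(n_cycles):
        count += 1
        if count == divisor // 2:  # toggle every half output period
            out ^= 1
            count = 0
        levels.append(out)
    return levels
```

For example, `divide_clock(8, 4)` yields `[0, 1, 1, 0, 0, 1, 1, 0]`, one output period per four input cycles, which a testbench could compare against the simulated or measured divider output.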

### Security Risks Due to Data Persistence in Cloud FPGA Platforms ...... 1398

Zhehang Zhang, Bharadwaj Madabhushi, Sandip Kundu, Russell Tessier

University of Massachusetts Amherst, United States

Abstract: The integration of Field Programmable Gate Arrays (FPGAs) into cloud computing systems has become commonplace. As the operating systems used to manage these systems evolve, special consideration must be given to DRAM devices accessible by FPGAs. These devices may hold sensitive data that can be inadvertently exposed to adversaries following user logout. Although addressed in some cloud FPGA environments, automatic DRAM clearing after process termination is not included in popular FPGA runtime environments or in most proposed cloud FPGA hypervisors. In this paper, we examine DRAM data persistence in AMD/Xilinx Alveo U280 nodes that are part of the Open Cloud Testbed (OCT). Our results indicate that DDR4 DRAM is not automatically cleared following user logout from an allocated node, and subsequent node users can easily obtain recognizable data from the DRAM after node reallocation, even more than 17 minutes later. This issue is particularly relevant for systems that support FPGA multi-tenancy.
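A simple way to look for the kind of residual data described above is to scan a raw memory image for non-zero pages and printable strings. The Python sketch below assumes the DRAM contents have already been read back into a `bytes` object (how that read-back is performed on the Alveo U280 is not covered here); the 4 KiB scan window, the 8-character minimum string length, and the function names are arbitrary, hypothetical choices.

```python
import re
from typing import List

PAGE_SIZE = 4096  # bytes per scan window (assumed, not tied to the DDR4 geometry)


def nonzero_pages(dump: bytes, page_size: int = PAGE_SIZE) -> List[int]:
    """Byte offsets of scan windows that contain at least one non-zero byte."""
    return [off for off in range(0, len(dump), page_size)
            if any(dump[off:off + page_size])]


def printable_strings(dump: bytes, min_len: int = 8) -> List[str]:
    """Runs of printable ASCII characters at least `min_len` long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(dump)]
```

Note that a freshly cleared device would report no non-zero windows, so a non-empty result from `nonzero_pages` on a newly allocated node is a quick indicator that memory was not scrubbed.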

University of Cincinnati, United States

Abstract: Split Manufacturing (SM) and Logic Obfuscation (LO) are potent design-for-trust solutions to mitigate hardware security threats when fabricating Integrated Circuits (ICs) at untrusted foundries. However, many such defense methods are subject to attacks, such as the Network Flow Attack and the Boolean Satisfiability Attack, that attempt to reconstruct the missing BEOL (Back End Of Line) signals and decrypt the design. Hence, it is crucial to select BEOL signals carefully. Although several defense strategies have been proposed to protect designs against these attacks, many are implemented at lower abstraction levels and incur expensive re-synthesis cycles. In this paper, we combine SM and LO for enhanced security of 3D ICs by proposing a high-level BEOL signal selection method that assigns edge weights to prioritize Data Flow Graph edges and also identifies locations for Polymorphic Switch Boxes (PSBs) for LO. A PSB has more key-bit combinations than a CMOS SB, making it harder for an attacker to unlock correctly. Our method decreases attack correctness as the percentage of lifted edges increases and has minimal impact on design performance in terms of area and wirelength.

## Session C4L-D: Academic Education

**Chair:** Mohsin Jamali, *University of Toledo - Ohio* **Time:** Wednesday, August 14, 2024, 16:00 - 17:30 **Location:** Room 4

### Reasonable Sense of Direction: Making Course Recommendations Understandable with LLMs ...... 1408

Hong Wei Chun, Rongqing Kenneth Ong, Andy W.H. Khong

Nanyang Technological University, Singapore

**Abstract:** Course recommendation systems play an essential role in academic institutions for students to find courses that align with their interests and graduation requirements. However, due to their "black-box" nature, recommendation systems often lack transparency and interpretability, leading to challenges in trust and usability. Our proposed framework leverages Large Language Models (LLMs) to generate clear, human-readable explanations based on course content by drawing connections between the existing courses taken by the student and recommended courses.

### Early Promotion of Academic Education through Practical Courses in the Context of Smart IoT Systems ...... 1413

Lech Kolonko, Gerrit Maus, Jörg Velten, Anton Kummert University of Wuppertal, Germany

Abstract: In this paper, we present an approach to integrating Internet of Things (IoT) technologies into academic curricula, focusing on practical implementation and early exposure to multidisciplinary IoT concepts. The course structure is designed to provide hands-on experience with IoT applications in the first two semesters of an engineering degree, aiming to bridge the gap between theoretical knowledge and practical skills at an early stage. Through a series of nine units, students engage in experiments and mini-projects that cover essential IoT topics such as sensor and actuator integration, WiFi communication, low-energy concepts, bus systems, and audio signal processing with a minimum of theoretical introduction. The use of readily available microcontroller boards, like the ESP32-DevKitC, along with intuitive programming environments and community-developed libraries, facilitates the demystification of complex IoT technologies. The proposed approach not only enhances students' technical competencies but also fosters a sense of achievement and prepares them for advanced studies.


Diego Cachay, Valerio Laura, Ayrton Poma, Sebastián Solórzano, Mario Chauca *Universidad Nacional Tecnológica de Lima Sur, Peru* 

**Abstract:** This research describes how augmented reality (AR) can play an important role in the user's interaction with the problem. The project involves the creation of an AR app designed in Unity. When a cell phone camera scans an image, the app projects the free body diagram of the control system in 3D so that the user gets a general view of the problem, together with a video explaining the mathematical modeling and a graph of the response to a unit step input. The validity of the proposal was evaluated through a survey of 28 students, analyzed along four aspects with positive results: Design obtained 0.73, Pedagogical 0.78, Content 0.76, and Technical 0.75. In addition, a high level of reliability was obtained, with a Cronbach's alpha of 0.942. We conclude that the implementation has a positive impact on the students, owing to the presentation of the free body diagram and its interactivity.
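The reliability figure quoted above is Cronbach's alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$, where $k$ is the number of survey items, $\sigma_i^2$ the variance of item $i$, and $\sigma_T^2$ the variance of the respondents' total scores. A short Python sketch follows; the survey responses are not given in the abstract, so it should only be exercised on toy data, and the function name is illustrative.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    Population variance is used for both the item and total variances; the
    n/(n-1) correction cancels, so sample variance gives the same alpha.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_var = sum(pvariance(col) for col in zip(*scores))
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)
```

Perfectly consistent items (every respondent gives the same score to all items) yield $\alpha = 1$, which is why values such as 0.942 are read as high internal consistency.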

### Impact of Summer Undergraduate Engineering Research on Student Performance Highlighted through GPA Tracking ...... 1423

Lokesh Saharan<sup>1</sup>, Mohsin M. Jamali<sup>2</sup>, Sepehr Arbabi<sup>3</sup>, Hossein Hosseini<sup>3</sup>

<sup>1</sup>Gannon University, United States; <sup>2</sup>The University of Toledo, United States; <sup>3</sup>*The University of Texas Permian Basin, United States*

Abstract: The engineering workforce is of critical importance to the United States, especially given the changing global landscape post-COVID-19. In particular, minority-serving institutions (MSIs) such as the University of Texas Permian Basin (UTPB) are key to providing that workforce in the critical areas of manufacturing and energy, among other engineering disciplines. However, due to low entry requirements, students often struggle with gateway engineering courses such as Statics, Dynamics, and Fluid Mechanics-1, leading to dropouts or longer graduation periods. Therefore, the authors proposed a five-pronged intervention approach through the EM-STEP program. This study highlights the impact of the undergraduate summer research program on the overall academic performance of the students. Students chosen through general applications from a targeted group had the opportunity to participate in a paid engineering research project with a faculty mentor for 8 weeks during the summer. The study shows that 13 of the 17 participating students improved their GPA, and more than 50% of these students increased their GPA by more than 10%.

## Session C4L-F: AI Circuits, Systems & Algorithms

Chair: Shiva Maleki Varnosfaderani, *Wayne State University* Time: Wednesday, August 14, 2024, 16:00 - 17:30 Location: Room 6

<sup>1</sup>Lahore University of Management Sciences, Pakistan; <sup>2</sup>Western Washington University, United States

Abstract: Seizure occurrences can be very difficult to detect at a young age and become more serious with age. Currently, there is no patient-friendly seizure prediction solution that can precisely detect seizure occurrences to help prevent injury to the patient. This paper presents a Neural Network (NN)-based classifier design to detect and predict seizure events from EEG signals. The proposed system is divided into two phases: 1) training the classifier to detect/predict seizure events, and 2) realizing the trained NN model for real-time implementation. The exponent-mantissa sigmoid activation function implemented in the hidden layer of the Artificial NN (ANN) exploits properties of the IEEE-754 floating-point format to attain an energy consumption of $0.39\,\mu$J/operation and reduces the area by 28.2% compared to conventional activation function realizations. The proposed system predicts seizures with an average accuracy of 92.47%, 92.31%, and 88.77% for 10 s, 20 s, and 30 s, respectively, using the CHB-MIT EEG database with a latency of 1 s.
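A well-known way to exploit the IEEE-754 layout for activation functions is Schraudolph's exponential approximation: writing $x/\ln 2$ directly into the exponent field of a float makes the integer part set the exponent while the fractional part spills linearly into the mantissa, approximating $2^f$ by $1 + f$. The Python sketch below builds a sigmoid on that trick. It is a swapped-in illustration of the general idea, not the authors' exponent-mantissa circuit, and the function names and clamping range are assumptions.

```python
import math
import struct

_LN2 = math.log(2.0)


def fast_exp(x: float) -> float:
    """Approximate e**x by writing x/ln2 into the exponent field of a float32.

    The integer part of x/ln2 becomes the (biased) exponent and the fractional
    part f lands in the mantissa, so 2**f is approximated linearly by 1 + f.
    """
    x = max(-80.0, min(80.0, x))  # keep the biased exponent inside 1..254
    bits = int((1 << 23) * (x / _LN2 + 127.0))
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fast_sigmoid(x: float) -> float:
    """Logistic sigmoid built on the bit-level exponential approximation."""
    return 1.0 / (1.0 + fast_exp(-x))
```

The linear mantissa approximation overestimates $e^x$ by at most about 6%, which translates into a sigmoid error below about 0.016; hardware versions replace the multiply and add with shifts and fixed offsets on the bit fields.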

Abstract: Recent advancements in Deep Neural Networks (DNNs) have revolutionized various industries, but they come with challenges, notably in memory storage and computational demands. Logarithmic Number Systems (LNS) offer a promising solution by efficiently representing data and reducing precision requirements. This review explores integrating LNS into DNN frameworks, focusing on architectures with LNS-based multipliers. The taxonomy covers end-to-end implementations and architectures using LNS multipliers. Various designs, including Mitchell's, iterative, bilateral-error, and explicit logarithm-antilogarithm-based multipliers, are examined. Comparative analysis shows that these architectures yield power-efficient hardware with minimal accuracy loss, making them well suited to DNN inference tasks. Tailored designs such as bilateral-error multipliers offer additional power savings and accuracy improvements, highlighting the potential of LNS for efficient DNN architectures.
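Mitchell's multiplier, one of the designs examined in the review, approximates each operand's base-2 logarithm by its leading-one position plus the remaining bits read as a linear fraction, adds the two approximate logarithms, and applies the same linear approximation on the way back. For $A = 2^{k_1}(1+x_1)$ and $B = 2^{k_2}(1+x_2)$ this gives $2^{k_1+k_2}(1+x_1+x_2)$ when $x_1+x_2<1$ and $2^{k_1+k_2+1}(x_1+x_2)$ otherwise. The integer Python sketch below implements this uncorrected textbook form for unsigned operands; it omits the error-correction stages that iterative or bilateral-error designs add.

```python
def mitchell_mul(a: int, b: int) -> int:
    """Approximate a*b for unsigned integers using Mitchell's logarithmic method."""
    if a < 0 or b < 0:
        raise ValueError("unsigned operands only")
    if a == 0 or b == 0:
        return 0
    # Characteristic: position of the leading one in each operand.
    k1 = a.bit_length() - 1
    k2 = b.bit_length() - 1
    # 2^(k1+k2) * (x1 + x2), where x = (operand - 2^k) / 2^k is the mantissa fraction.
    frac_sum = ((a - (1 << k1)) << k2) + ((b - (1 << k2)) << k1)
    base = 1 << (k1 + k2)
    if frac_sum < base:           # x1 + x2 < 1: result is 2^(k1+k2) * (1 + x1 + x2)
        return base + frac_sum
    return frac_sum << 1          # x1 + x2 >= 1: result is 2^(k1+k2+1) * (x1 + x2)
```

The approximation never overestimates and is exact when either operand is a power of two; its worst case is about 11.1% low, for example `mitchell_mul(3, 3)` returns 8 instead of 9.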

### Visual Analysis of Leaky Integrate-and-Fire Spiking Neuron Models and Circuits ...... 1437

Sara Sedighi, Farhana Afrin, Elonna Onyejegbu, Kurtis D. Cantley

Boise State University, United States

**Abstract:** Emulating biologically plausible online learning in spiking neural networks (SNNs) will enable the next generation of energy-efficient neuromorphic architectures. While software leads the way in exploring various Machine Learning (ML) algorithms and applications, bridging the gap between hardware (devices and circuits) and software is crucial to accurately predict network properties, especially at large scale. This work compares the behavior of a spiking neuron circuit simulated with Cadence Spectre with that of a Python model implemented using a custom spiking neuron model. The results demonstrate that the two exhibit the same spiking characteristics over a range of parameter values, confirming that the more versatile Python model indeed has a hardware equivalent.
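A leaky integrate-and-fire neuron integrates $\tau\, dV/dt = -(V - V_{rest}) + R\,I(t)$, emits a spike when $V$ reaches $V_{th}$, and resets to $V_{reset}$. The forward-Euler Python sketch below is a generic LIF model of the kind such circuit-versus-software comparisons use; it is not the authors' custom Python model, and all parameter values are hypothetical and normalized (threshold of 1.0).

```python
from typing import List, Sequence


def simulate_lif(i_in: Sequence[float], dt: float = 1e-4, tau: float = 20e-3,
                 r: float = 1e7, v_rest: float = 0.0, v_th: float = 1.0,
                 v_reset: float = 0.0) -> List[int]:
    """Return the time-step indices at which a LIF neuron spikes."""
    v = v_rest
    spikes: List[int] = []
    for step, i in enumerate(i_in):
        # Forward-Euler update of tau * dV/dt = -(V - V_rest) + R * I
        v += dt / tau * (-(v - v_rest) + r * i)
        if v >= v_th:
            spikes.append(step)
            v = v_reset
    return spikes
```

With a constant input of $R I = 2 V_{th}$, the interspike interval approaches the analytic value $\tau \ln 2 \approx 13.9$ ms, while an input with $R I < V_{th}$ never fires. Sweeping `tau`, `r`, and the threshold in this way is one means of checking spiking characteristics against a circuit simulation.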

### DECO: Dynamic Energy-Aware Compression and Optimization for In-Memory Neural Networks ...... 1441

Rebati Gaire<sup>1</sup>, Sepehr Tabrizchi<sup>1</sup>, Deniz Najafi<sup>2</sup>, Shaahin Angizi<sup>2</sup>, Arman Roohi<sup>1</sup> <sup>1</sup>University of Nebraska-Lincoln, United States; <sup>2</sup>New Jersey Institute of Technology, United States

**Abstract:** This paper introduces DECO, a framework that combines model compression and processing-in-memory (PIM) to improve the efficiency of neural networks on IoT devices. By integrating these technologies, DECO significantly reduces energy consumption and operational latency through optimized data movement and computation, demonstrating notable performance gains on the CIFAR-10/100 datasets. The DECO learning framework significantly improved the performance of compressed network modules derived from MobileNetV1 and VGG16, with accuracy gains of 1.66% and 0.41%, respectively, on the intricate CIFAR-100 dataset. DECO outperforms the GPU implementation by a significant margin, achieving up to a two-order-of-magnitude speedup in our experiments.

### Performance Evaluation of Epilepsy Prediction Model based on Seizure' Type and Patients' Characteristics ...... 1446

Shiva Maleki Varnosfaderani, Nabil J. Sarhan, Mohammad Alhawari

Wayne State University, United States

**Abstract:** In this paper, for the first time, we interpret the results of an epileptic seizure prediction model based on seizure type, patients' gender and age, and seizure onset zone. The analysis indicates that seizure type, gender, age, and seizure onset zone can affect model performance. Our findings indicate that the prediction model performed worse for patients aged 60 to 80 years, male patients, and patients with a temporal seizure onset zone, compared to younger patients, female patients, and patients with frontal and parietal seizure onset zones, respectively. Many problems may be resolved by taking these factors into account when developing an epileptic seizure prediction model.