3DMPE: 3D Multi-Perspective Embedding
    We describe a 3-Dimensional Multi-Perspective Embedding (3DMPE) approach for 3D point cloud reconstruction. The algorithm takes as input two or more 2D snapshots, i.e., 2D subspace projections of an unknown 3D point cloud, along with a correspondence between the points, although not all points need to be present in all projections. Unlike current state-of-the-art algorithms, which require training on many examples and perform well mostly on the types of objects seen during training, ours is an optimization-based (unsupervised learning) algorithm that solves a simultaneous multi-perspective optimization problem and works well on any type of object. We demonstrate the algorithm's performance on multiple datasets using three quality measures: Earth Mover's distance, Chamfer distance, and ROA. We quantitatively evaluate the scalability and robustness of 3DMPE while varying the number of input projections and the size of the input data. Finally, we demonstrate the robustness of 3DMPE under various noise regimes, including incorrect correspondences between points and incorrect distance measurements.
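    One of the quality measures mentioned above, the Chamfer distance, can be sketched in a few lines (a minimal NumPy version for illustration; the evaluation code used in the paper may differ):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point clouds.

    For each point in one cloud, take the squared distance to its
    nearest neighbor in the other cloud; average both directions.
    """
    # Pairwise squared distances |a_i - b_j|^2, shape (len(a), len(b)).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# A cloud compared with itself has zero Chamfer distance.
cloud = np.random.rand(100, 3)
assert chamfer_distance(cloud, cloud) == 0.0
```

    Identical clouds score zero, and the measure grows as the reconstruction drifts from the ground truth, which is what makes it a common reconstruction-quality metric.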
    Prediction of Apple Leaf Diseases Using Multiclass Support Vector Machine
    Every year, apple yield is affected by black rot and cedar apple rust, with significant consequences for both the apple industry and the country's economy. Here, we propose a system that detects diseases in infected apple leaves by combining machine learning and image processing principles. This approach can classify both infected and non-infected apple leaves efficiently. Identification starts by preprocessing the image with several image processing techniques, including the Otsu thresholding algorithm and histogram equalization. Image segmentation then separates the region of the infected part, and a multiclass SVM recognizes the disease type from the original leaf image, achieving 96% accuracy on 500 images. The system also reports the percentage of the total infected area of the diseased apple leaf image.
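    The Otsu thresholding step used in preprocessing can be sketched as follows (a minimal NumPy re-implementation for illustration, not the paper's code; libraries such as OpenCV and scikit-image provide this built in):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold for an 8-bit grayscale image.

    Otsu's method picks the threshold t that maximizes the
    between-class variance of the two resulting pixel classes.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()               # gray-level probabilities
    omega = np.cumsum(p)                # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))  # cumulative mean up to t
    mu_t = mu[-1]                       # global mean
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)    # empty classes contribute 0
    return int(np.argmax(sigma_b))
```

    Applying `gray > otsu_threshold(gray)` then yields the binary mask used downstream for segmentation.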
    Visualizing Interaction Networks and Evidence in Biomedical Corpora
    The abundance of scientific articles published and indexed in publicly accessible repositories has spurred the research and development of automated information extraction systems. The output of such systems can be used to assemble large networks capturing the understanding of mechanistic pathways and their interactions as represented in the underlying body of research. We describe a system designed to help researchers search, visualize, and interact with biological networks derived via information extraction tools. As input, the system takes a dataset of biological and biochemical interactions automatically generated by an information extraction system, and it provides an interface for searching, visualizing, and interacting with the data. The usage paradigm consists of identifying a starting point for a search, then exploiting the data's network structure by incrementally exploring the immediate neighborhood of the elements displayed by the system. Our system differs from prior work in that it leverages both the network structure in the data and the natural language text backing those connections: every connection displayed is traceable back to the documents and phrases in the corpus that support that specific piece of information. We also present two case studies in which immunobiology researchers used the system to find previously unknown relationships between biological entities. While the evidence suggesting these relationships already existed, it was scattered across the literature, and existing specialized web databases and domain search engines could not find it. The system is open-source, with the code publicly available on GitHub.
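    The incremental-neighborhood usage paradigm can be illustrated with a toy evidence-backed graph (the entity names and supporting sentences below are invented placeholders, not extracted from any real corpus, and this is not the system's actual data model):

```python
from collections import defaultdict

# Each edge carries the sentence that supports it, so every displayed
# connection is traceable back to text (placeholder examples only).
edges = [
    ("IL-6", "STAT3", "IL-6 stimulation led to phosphorylation of STAT3."),
    ("STAT3", "VEGF", "STAT3 binding up-regulated VEGF expression."),
]

adjacency = defaultdict(list)
for src, dst, sentence in edges:
    # Store the interaction in both directions for undirected browsing.
    adjacency[src].append((dst, sentence))
    adjacency[dst].append((src, sentence))

def explore(entity):
    """Immediate neighborhood of an entity: each neighbor paired with
    the supporting sentence, mirroring the incremental-exploration UI."""
    return adjacency[entity]
```

    Starting from one entity and repeatedly calling `explore` on the neighbors it returns reproduces the search paradigm described above: grow the visible subnetwork one hop at a time, with textual evidence attached to every edge.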
    A Multi-Modal Human Machine Interface for Controlling a Smart Wheelchair
    As the number of disabled people all over the world increases rapidly, the electric wheelchair is becoming crucial to improving their mobility. Independent mobility is a vital aspect of self-respect and plays an important role in the life of a disabled person. The smart wheelchair is an endeavor to provide self-supporting mobility to people who are not able to move freely. Typical electric-powered wheelchairs are usually controlled by traditional joysticks, which cannot fulfill the needs of people with motor disabilities, or with specific disabilities such as paralysis, who can only move their eyes. This paper aims to develop a multi-modal human machine interface that allows a larger population of disabled persons to control the wheelchair efficiently. The interface comprises a joystick, a smart hand-glove, a head movement tracker, and an eye tracker. The system presented in this paper can support a wide variety of users with different types of disabilities.
    Agglomerative Clustering of Handwritten Numerals to Determine Similarity of Different Languages
    Handwritten numerals of different languages have various characteristics. Similarities and dissimilarities among the languages can be measured by analyzing the extracted features of the numerals. Handwritten numeral datasets are available and accessible for many renowned languages of different regions. In this paper, several handwritten numeral datasets of different languages are collected. They are then used to find the similarity among those written languages by determining and comparing the similarity of their handwritten numerals. This helps identify which languages share the same or a closely related parent language. First, a similarity measure between two numeral images is constructed with a Siamese network. Second, the similarity between the numeral datasets is determined with the help of the Siamese network and a new sampling-with-replacement similarity-averaging technique. Finally, agglomerative clustering is performed based on the pairwise similarities of the datasets. This clustering reveals some very interesting properties of the datasets; the property this paper focuses on is their regional resemblance. By analyzing the clusters, it becomes easy to identify which languages originated in similar regions.
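    The final agglomerative step can be sketched as single-linkage clustering over a pairwise similarity matrix (a simplified pure-NumPy illustration; the paper's linkage rule and Siamese-derived similarities may differ):

```python
import numpy as np

def agglomerative(sim, n_clusters):
    """Single-linkage agglomerative clustering on a similarity matrix.

    sim[i, j] is the similarity between datasets i and j; the most
    similar pair of clusters is merged until n_clusters remain.
    """
    clusters = [{i} for i in range(len(sim))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the highest cross-similarity.
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = max(sim[i, j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        clusters[a] |= clusters.pop(b)  # merge b into a
    return clusters
```

    Fed a matrix where, say, two language datasets from the same region score high mutual similarity, the procedure groups them before merging across regions, which is the regional-resemblance effect the abstract describes.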
    Audio Future Block Prediction with Conditional Generative Adversarial Network
    Signal processing is a vast subfield of electrical engineering and computer science in which audio signal processing has secured a remarkable position for restoring corrupted or missing audio blocks. However, generating a plausible future audio block from the preceding audio is still a new idea, one that can help reduce both audio noise and partially missing audio segments. In this paper, a generative adversarial network (GAN) and an accompanying pipeline are proposed for predicting the audio that follows an input audio sequence. The proposed model applies the short-time Fourier transform to the audio to turn it into an image. The image is then fed to a conditional GAN, which predicts the output image. The inverse short-time Fourier transform is then applied to the predicted image, generating the predicted audio sequence. For short audio sequences, the proposed methodology is fast and robust and achieved a loss of 0.43, so it may work well if deployed in voice-call and broadcasting applications.
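    The STFT and inverse-STFT ends of the pipeline can be illustrated with a simplified, non-overlapping transform (real pipelines typically use overlapping windowed frames; this sketch only shows the audio-to-image-and-back round trip, with the GAN standing in between):

```python
import numpy as np

def stft(signal, frame=256):
    """Non-overlapping short-time Fourier transform: each row of the
    returned 2D array is the spectrum of one frame, i.e., the 'image'
    that would be fed to the conditional GAN."""
    n = len(signal) // frame * frame       # drop the ragged tail
    frames = signal[:n].reshape(-1, frame)
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame=256):
    """Inverse transform: turns a (possibly GAN-predicted) spectrogram
    back into a time-domain audio signal."""
    return np.fft.irfft(spec, n=frame, axis=1).ravel()

# Round trip: a 440 Hz tone survives stft -> istft exactly.
t = np.linspace(0, 1, 1024, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t)
assert np.allclose(istft(stft(audio)), audio)
```

    With non-overlapping rectangular frames, the inverse transform reconstructs the signal exactly; in practice, overlapping Hann-windowed frames with overlap-add are preferred because they are more robust when the spectrogram has been modified by a model.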
    Extraction of Sequence from Bangla Handwritten Numerals and Recognition Using LSTM
    In the promising era of Handwritten Numeral Recognition (HNR), despite Bangla being one of the major languages of the Indian subcontinent, fewer explorations have been done on Bangla numerals compared to other languages. Among the existing methods, several convolutional neural network (CNN) based methods have outperformed the others. However, CNNs are often confused by specific Bangla numerals whose shapes and sizes are similar. The main purpose of this study is to expand Bangla HNR by considering a novel methodology based on a Long Short-Term Memory (LSTM) network. In the proposed method, images are thinned and a sequence is extracted from each; these extracted sequences are then classified using an LSTM network. Both single-layer LSTM and deep LSTM models are trained, and their performance is tested on a benchmark dataset with a large number of samples. A traditional CNN is also trained for comparison. Experimental outcomes reveal that the proposed LSTM-based method outperformed the CNN with remarkable accuracy on similarly shaped numerals. Finally, the proposed method achieved a test set recognition rate of 98.03%, which is better than or competitive with other prominent existing methods.
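    The sequence-extraction step can be sketched as a greedy walk over the foreground pixels of a thinned numeral (an illustrative simplification; the paper's exact tracing rules are not specified here):

```python
import numpy as np

def extract_sequence(binary):
    """Order the foreground pixels of a thinned numeral into a path.

    Starts at the top-most, left-most foreground pixel, then repeatedly
    steps to the nearest remaining pixel, yielding the (row, col)
    sequence that an LSTM could consume.
    """
    pixels = {tuple(p) for p in np.argwhere(binary)}
    current = min(pixels)            # top-most, left-most start point
    seq = [current]
    pixels.remove(current)
    while pixels:
        # Greedy step to the nearest remaining pixel (squared Euclidean).
        current = min(pixels,
                      key=lambda p: (p[0] - current[0]) ** 2
                                    + (p[1] - current[1]) ** 2)
        seq.append(current)
        pixels.remove(current)
    return seq
```

    The resulting coordinate sequence preserves the stroke's spatial order, which is exactly the kind of temporal structure an LSTM can exploit but a plain CNN ignores.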
    Handwritten Numeral Recognition integrating Start-End Points Measure with Convolutional Neural Network
    Convolutional neural network (CNN) based methods have been very successful in handwritten numeral recognition (HNR) applications. However, CNNs tend to misclassify similarly shaped numerals (i.e., numerals whose silhouettes look alike). This paper presents an enhanced HNR system that improves classification accuracy for similarly shaped handwritten numerals by incorporating terminal points into the CNN's recognition. In handwritten numerals, the terminal points (i.e., the start and end positions of writing) are considered an additional property for discriminating between similarly shaped numerals. The Start-End Writing Measure (SEWM) and its integration with a CNN are the main contributions of this research. The proposed SEWM-CNN has three major functional steps: classification of a numeral image using a standard CNN; identification of the start and end writing points from the silhouette of the numeral; and finally, producing the system output by integrating SEWM with the CNN decision. The proposed method is tested on rich benchmark datasets of Bengali and Devanagari numerals. SEWM-CNN proves to be a suitable HNR method for Bengali and Devanagari numerals when compared with other existing methods.
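    One way to find start/end candidates is to locate skeleton endpoints, i.e., foreground pixels with exactly one 8-connected neighbor (an illustrative assumption about how terminal points could be detected, not necessarily the paper's exact SEWM procedure):

```python
import numpy as np

def terminal_points(skel):
    """Endpoint candidates of a thinned numeral: foreground pixels
    with exactly one foreground neighbor in the 8-neighborhood."""
    padded = np.pad(skel.astype(int), 1)  # avoid border special-casing
    points = []
    for r, c in np.argwhere(skel):
        # 3x3 window around the pixel; subtract the pixel itself.
        window = padded[r:r + 3, c:c + 3]
        if window.sum() - 1 == 1:
            points.append((r, c))
    return points
```

    For a simple stroke, this yields its two extremities; comparing their positions (e.g., top vs. bottom) is the kind of cue that can separate numerals whose overall silhouettes are otherwise nearly identical.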