J Manoj Balaji

Data Scientist & Passionate Researcher
manojbalaji1@gmail.com

About Me


I have been deeply curious since childhood, and that curiosity has made me a constant learner. My passion for learning new things is insatiable: it carried me through my Bachelor's degree and a couple of internships along the way, pushed me to publish research papers, and led me to enroll in a Master's degree to continue learning. I am especially curious about mathematics and its applications in fields such as Physics, Finance, and Data Science.

I am a problem-solver by nature. I will sit with you, understand the problem, ask questions to clear my doubts, and actively brainstorm with you. We will go through the problem-solving cycle
Understand --> Formulate --> Hypothesize --> Brainstorm --> PoC --> Test --> Deploy --> Repeat
and solve the problem iteratively (one simply cannot reach Mars in a single step :p).

I am also an open-source contributor. My latest contributions have been to Google's datacommonsorg/api-python and uber/causalml, in addition to other past contributions.

From a research perspective, my interest lies in solving problems related to Causal Inference, Optimization, and Computer Vision, and I am constantly working on research projects in these areas. Please feel free to ping me if you would like to discuss any of these topics!

Web Presence

Github

Skills

Work Experience


Data Scientist - 2

Deloitte

September, 2021 - Present

Data Scientist

DataDirect Networks

April, 2021 - August, 2021

  • Set up an Airflow cluster on AWS EKS
  • Migrated cron jobs to Airflow jobs
  • Built a Predictive Maintenance and Anomaly Detection PoC

Data Scientist

Rakuten

June, 2018 - April, 2021

  • Optimize ROI using uplift-based targeting
  • Bottom-up, data-driven analysis of customer-item interaction to identify the potential for introducing new product lines and/or creating new product categories/genres to increase revenue.
  • Understand and acquire customers across services (cross-service) using propensity-based modeling. Improved CVR by ~18%, contributing to GMS of ~900MM JPY
  • Effectively predict customers who might churn and help the business prevent churn through incentives. Improved CVR by ~15%, contributing to GMS of ~250MM JPY
  • Reactivate churned customers to buy again using machine-learning-based targeting. Improved CVR by ~25%, contributing to GMS of ~600MM JPY
  • Design and implement a scalable framework and pipeline architecture for automating the major tasks of the machine learning project life cycle (like Uber's Michelangelo).
  • Analyse customer behaviour and demographic data to understand patterns in customer behaviour and visualize them accordingly
  • Leverage AutoML for understanding customer behaviour
  • Build machine learning/deep learning models to forecast CLV (Customer Lifetime Value)
  • Understand how CLV (Customer Lifetime Value) can be leveraged to increase the ROI of the business
  • Build tools to annotate, audit and review data
  • Leverage Google Cloud Platform for annotating data with voice-to-text to increase efficiency and throughput
  • Build tools to understand the efficiency of content curators and provide visualisations that help management understand the issues curators face
  • Build a weakly supervised model to decrease the time spent on training data annotation

Data Science Intern

Rakuten

January, 2018 - June, 2018

  • Understand the problem statement and form hypotheses about it
  • Create a labelled dataset for solving the problem
  • Build a proof of concept for catalog data validation, verification and backfill by product matching using deep learning

Software Development Intern

JotArthur.com

October, 2016 - September, 2017

  • Development and Testing of Python-Django Application
  • Rapid Prototyping of proof of concept
  • Machine Learning and Predictive Analytics

Education


Master of Technology, Data Science and Engineering

Birla Institute of Technology and Science, Pilani

October, 2019 - October, 2021

  • Machine Learning
  • Big Data Systems
  • Mathematics for Machine Learning
  • Deep Learning

Bachelor of Engineering, Information Science and Engineering

University Visvesvaraya College of Engineering, Bengaluru

August, 2014 - June, 2018

  • Data Mining
  • Soft Computing
  • Neural Networks
  • Mathematics

High School

Kendriya Vidyalaya IISc, Bengaluru

  • Computer Science
  • Physics
  • Chemistry
  • Mathematics

Publications


Dragonfly-Net: Dragonfly classification using Convolution Neural Network

DOI: 10.13140/RG.2.2.22681.85608

Abstract

Scientific and engineering interest in dragonflies has been a consistent source of ideas and solutions, owing to the evolutionary success of the species. The importance of these "toothed ones", as the Greek translation of the family name "Odonates" maps to, in terms of ecological diversity is invaluable, all the more pressingly given that only two of the six suborders of the order Odonata are non-extinct. With a widespread existential timeline, identifying them is in itself a critical task for taxonomists. This work aims to provide a standard identification tool that aids researchers, amateur naturalists, and beginners in quick and easy identification of odonates, and thereby to encourage deeper exploration of the order. We propose a novel approach, Dragonfly-Net, with wide potential application in ecology and biology, starting with classifying a given image of a dragonfly or damselfly into one of the species of the order on which the model was trained, without any pre-processing. The proposed model achieved an accuracy of 76.99% on the training set, 67.59% on the validation set, and 61.35% on the hold-out set. The model predicts 94 different species of dragonflies/damselflies. The effort is also extended to investigate the performance of the model with state-of-the-art evaluation techniques, scoped to explore the regions of activation contributing to its performance.


A Comprehensive Survey On Vision-Based Insect Species Identification and Classification

DOI: 10.13140/RG.2.2.10083.50720/1

Abstract

Research on automating insect identification and classification, supported by studies of insect ecology and related domains, was marked by the introduction of DAISY, the digital automated identification system. The framework took multiple approaches to the problem, varying from PCA to NNC (nearest neighbour clustering) algorithms. Efforts involving statistical methods are also evident in the domain. The evolution of artificial neural networks and associative memory networks also marked major changes in the direction of effort towards the problem statement. Other machine-learning-based classification techniques such as SVM and PCALC have also been used as solution methods. Clustering techniques combined with image features were revived with Correspondence filters. Features derived from various transforms, such as Fourier, SIFT and wavelet, have also formed the basis of some studies. With the dawn of deep learning, more advanced techniques such as pose estimation have also become a basis for framing solutions. Image-based techniques involving pattern extraction have prevailed with exceptional results. Approaches to automating the process have continued to be refined ever since.


Recognition of Offline Kannada Handwritten Characters by Deep Learning using Capsule Network

DOI: 10.35940/ijeat.F8726.088619

Abstract

Handwritten character recognition is an important subfield of Computer Vision which has the potential to bridge the gap between humans and machines. Machine learning and deep learning approaches to the problem have yielded acceptable results throughout, yet there is still room for improvement. Off-line Kannada handwritten character recognition is another problem statement in which many authors have shown interest, with the obtained results being acceptable. The initial efforts used Gabor wavelets and moment functions for the characters. With the introduction of Machine Learning, SVMs and feature vectors have been tried to obtain acceptable accuracies. Deep Belief Networks and ANNs have also been used, claiming a considerable increase in results. More advanced techniques such as CNNs have been reported to be used to recognize Kannada numerals only. In this work, we move towards solving the problem statement with Capsule Networks, which are now the state-of-the-art technology in the field of Computer Vision. We also carefully consider the drawbacks of CNNs and their impact on the problem statement, which are addressed by the use of Capsule Networks. Excellent results have been obtained in terms of accuracy. We take a step further to evaluate the technique in terms of specificity, precision and f1-score. The approach has performed extremely well on these measures as well.
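
For readers unfamiliar with the measures named in the abstract, here is a minimal sketch of how specificity, precision and f1-score can be computed per class from a multi-class confusion matrix. It is not the paper's evaluation code: the function name `per_class_metrics` and the 3-class toy matrix are hypothetical, and the numbers are made up purely for illustration.

```python
def per_class_metrics(confusion):
    """Compute one-vs-rest metrics for each class.

    confusion[i][j] = number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]                               # class k predicted as k
        fn = sum(confusion[k]) - tp                        # class k predicted as something else
        fp = sum(confusion[i][k] for i in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp                          # everything not involving class k

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)

        results.append({
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        })
    return results


# Hypothetical 3-class confusion matrix (illustrative values only).
toy_confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]

metrics = per_class_metrics(toy_confusion)
for label, m in enumerate(metrics):
    print(label, {name: round(value, 3) for name, value in m.items()})
```

Specificity is reported alongside precision because, with many character classes, the true negatives dominate each one-vs-rest split, so the two measures answer different questions about false positives.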


Offline Kannada Handwritten Characters using Convolutional Neural Networks

DOI: 10.1109/WIECON-ECE48653.2019.9019914

Abstract

Handwritten characters are still far from being replaced by their digital form, and handwritten text remains abundant. Given its wide scope, the problem of handwritten letter recognition using computer vision and machine learning techniques has been a well-studied topic. The field has undergone phenomenal development since the emergence of machine learning techniques. This work largely aims to bridge the gap between state-of-the-art deep learning technology and the automation of handwritten character recognition, using convolutional neural networks. Convolutional neural networks are known to have performed extremely well on the classic classification problem in the field of computer vision. Using the advantages of the architecture and leveraging preprocessing-free deep learning techniques, we present a robust, dynamic and swift method to solve the problem of handwritten character recognition for the Kannada language. We discuss the performance of the network under two different approaches to the dataset. The obtained accuracies measured up to 93.2 per cent and 78.73 per cent for the two different types of datasets used in the work.

Thank you!