• Hi!
    I'm Arpit

    Currently seeking full-time opportunities in the fields of Software Engineering, Machine Learning, and Data Science.

    Download CV

  • I am
    a Software Engineer & Data Scientist

    Focused on providing insights to help shape the product and meet business goals.

Who Am I?

Hi, I'm Arpit Shah, a passionate Data Scientist and a Software Developer at heart. I recently graduated with a Master of Science in Data Science from Indiana University Bloomington.

I worked as a Software Engineer at GEP Worldwide, which gave me the opportunity to sharpen my development skills, collaborate with various teams, and work in an agile environment. I also worked as a Machine Learning - Software Engineer at AVA, a start-up focused on the health, wellness, and fitness domain. There I was responsible for building a recommendation model for complementary recipes and machine learning models for ingredient, meal-type, and cuisine classification of recipes, and I also contributed to the company's software engineering efforts.

I have a passion for sports such as football, cricket, tennis, and table tennis. I love going on trekking trips and facing all the challenges they pose. I was also an active member of Leo Club Of Bombay Mulund Youth, a non-profit organization, for a year. As part of this organization, I provided meals in flood-affected areas and clothes to the needy, and took part in many cleanliness drives.

What is my expertise?

Machine Learning and Deep Learning

Proficient at building classification and predictive models and applying optimization techniques.

Software Development

Experienced Software Engineer with proficiency in backend development and a good understanding of the SDLC.

Data Analysis

Evaluating data using analytical and statistical tools to discover useful information and aid in business decision making.

Data Modeling and Preprocessing

Preprocessing data from different sources into the required format and connecting them to create a common logical model.

Data Engineering

Building pipelines to store and transform data to prepare it for analysis.

Data Visualization

Creating innovative dashboards and visualizations to better understand data and make informed decisions.

My Skills

Programming Languages

Python
C#
Java
R
C++
C

Data Engineering

SQL Server (SSMS)
Apache Spark
Hadoop
PostgreSQL
MySQL
MongoDB

Data Science Tools

TensorFlow
Keras
NLTK
Excel
Scikit-learn
NumPy
Pandas

Data Visualization

Tableau
Matplotlib
Seaborn
ggplot
Excel

Web Technologies

AngularJS
JavaScript
HTML5
CSS
jQuery

Certifications

Oracle Certified Professional Java SE-6 Programmer (OCPJP)
Data Analysis with Python and R
Big Data Analytics

Education

Indiana University Bloomington August 2018 - May 2020

Relevant courses :

Deep Learning, Machine Learning for Signal Processing, Advanced Natural Language Processing, Search, Advanced Database Concepts, High Performance Big Data Systems, Applied Algorithms, Statistics

K. J. Somaiya College of Engineering June 2013 - May 2017

Relevant courses :

Data Warehouse and Mining, Machine Learning, Artificial Intelligence, Distributed Databases, Database Management Systems, Analysis of Algorithms, Data Structures, Object Oriented Programming

Work Experience

Machine Learning - Software Engineer Intern June 2019 - Dec 2019

AVA

  • Designed a recommendation system using text embeddings to suggest complementary recipes and discover new food pairings.
  • Improved F1-score from 0.53 to 0.77 by developing a classifier using sklearn and Keras to predict the meal type of a recipe.
  • Increased model accuracy to 89% in predicting 10 tags for 5,000 recipes, supporting users' customized diet recommendations.
  • Collaborated with nutritionists and built a REST API to fetch various attributes of recipes using Java and PostgreSQL.

Technologies Used : Python, Sklearn, Keras, Java, Git, Bash, PostgreSQL

Associate Software Engineer June 2017 - June 2018

GEP Worldwide

  • Developed end-to-end functionalities to improve user accessibility and data retrieval using C#, SQL Server, and AngularJS.
  • Reduced man-hours by 10% by developing a bulk utility system in SQL to upload 1 million rows of data per day via Excel.
  • Eliminated build dependency between modules to reduce downtime and increase development efficiency by 15%.
  • Integrated an automatic email service to notify users of the completion of certain actions.
  • Enhanced understandability and readability of the document by developing an ‘Export to PDF’ feature using Handlebars.js.
  • Refactored legacy code to remove inefficiencies from the product to improve scalability and maintainability.
  • Performed unit and regression testing of the features using NUnit for backend and Selenium for frontend.
  • Collaborated with other teams to integrate the required functionalities into the product.
  • Actively participated in Agile/Scrum ceremonies for sprint planning, brainstorming, knowledge sharing, and retrospection.

Technologies Used : C#, .NET, SQL Server, T-SQL, TFS, MVC, REST, WCF, AngularJS, JavaScript, jQuery

Assistant Engineer Dec 2015 – Mar 2016

Alchemus Infotech LLP

  • Stored website content such as quotes, images, videos, and audio files using MongoDB.
  • Collaborated with the Search Engine Optimization team and also filtered the content to be uploaded on the website.

Technologies Used : MongoDB, PHP

Recent Work

Yelp Recommendation System

Built a restaurant recommendation system on the Yelp dataset using content-based, collaborative, and hybrid models. Achieved 95.5% precision on TextRank-based key phrase generation from Yelp restaurant reviews, implemented Whoosh- and BM25-based review indexing and retrieval, and developed a prototype using Flask.

Audio Sentiment Analysis

Developed a Deep Neural Network to analyse the sentiment of a customer in a conversation with a call center agent using various acoustic features and speech processing techniques.

Latent Dirichlet Allocation (LDA) on Yelp Data

Latent Dirichlet Allocation (LDA) on Large Scale Data

Ran extensive experiments comparing the performance of EM and Online LDA in Spark on the Yelp dataset after tuning both LDA and Spark parameters.

Soldier Workload Classification

This project classifies the mental workload of subjects from their electroencephalography (EEG) recordings, applying various non-linear algorithms to determine whether a subject is at Baseline (BL), Low Workload (LWL), or High Workload (HWL).

Diabetes Prediction

Designed and developed an Android application that takes parameters as input to predict whether a person is diabetic. Trained a model using an Adaptive Neuro-Fuzzy Inference System (ANFIS) and used the weights in the backend of the application.

American Sign Language Video Generation from Text

Indy Big Data Visualization Challenge

Identified factors to improve the alignment of Indiana’s talent pipeline with the needs of the employers.

Get in Touch