Portfolio

Project 1 - Customer Churn Prediction

This project aims to predict customer churn for ABC Multistate Bank using supervised machine learning techniques. The dataset includes various customer attributes such as credit score, age, balance, and whether they have churned (exited).

Key Features:

Data Preprocessing: Handling missing and duplicate values, encoding categorical variables, normalization, and feature engineering.
Modeling: Logistic Regression, Decision Trees, Random Forest, and SVM were compared to find the best-performing model.
Evaluation: Models were compared on training and testing accuracy. The SVM model offered the best balance between the two, with no significant overfitting.
Results: SVM is recommended due to its robustness and reliable performance in predicting customer churn.
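
The selection criterion described above (good test accuracy with a small gap between training and testing accuracy) can be sketched in plain Python. This is only an illustration, not the project's code: the names `accuracy` and `pick_model` and the `max_gap` threshold of 0.05 are assumptions introduced here.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def pick_model(scores: Dict[str, Tuple[float, float]], max_gap: float = 0.05) -> str:
    """Pick the model with the best test accuracy among those whose
    train/test gap is at most max_gap (an assumed overfitting threshold).

    scores maps model name -> (train_accuracy, test_accuracy).
    If no model satisfies the gap constraint, all models are considered.
    """
    eligible = {name: s for name, s in scores.items() if s[0] - s[1] <= max_gap}
    candidates = eligible or scores
    return max(candidates, key=lambda name: candidates[name][1])
```

With made-up scores such as `{"tree": (1.00, 0.80), "svm": (0.86, 0.85)}`, `pick_model` would skip the tree because of its 0.20 gap and return `"svm"`. These numbers are purely illustrative and are not the project's results.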

For more details, please visit the project repository.

Project 2 - Sales Analytics Dashboard

Key Features:

Data Preparation: Imported and cleaned sales data using Power Query to ensure data accuracy and consistency.
Data Transformation: Applied transformations for item categorization, store type classification, and sales aggregation.
Data Modeling: Built data models to analyze relationships between store types, item categories, and sales performance.
Visualization Building: Created interactive charts and graphs to display sales trends, outlet establishment year, item breakdowns, and location-based sales distribution.
Dashboard Design: Developed an intuitive layout with slicers for filtering by store type, item type, and outlet size to enable dynamic user interaction.

For more details, please visit the project repository.

Project 3 - Sentiment Analysis

This project aims to classify customer reviews of Alexa products as positive or negative using supervised machine learning techniques. The dataset includes attributes such as feedback, product variation, review text, rating, and date.

Key Features:

Data Preprocessing: Handling missing and duplicate values, text preprocessing (lowercasing, removing punctuation, tokenization, stemming), and building the corpus.
Modeling: Logistic Regression, Random Forest, and Support Vector Machine (SVM) were compared to find the best-performing model.
Evaluation: Models were evaluated based on train and test accuracies.
Results: The Support Vector Machine (SVM) model is recommended because it achieved the highest test accuracy, indicating robust performance in predicting sentiment.
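
The text preprocessing steps listed above (lowercasing, removing punctuation, tokenization, stemming, building the corpus) might look roughly like the sketch below. It uses only the standard library, and the stemmer is a deliberately naive suffix stripper written for this illustration; the project itself may well use an established stemmer such as Porter's. The names `naive_stem`, `preprocess`, and `build_corpus` are hypothetical.

```python
import string
from typing import List

# Naive suffix list for illustration only; not a real stemming algorithm.
_SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in _SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review: str) -> List[str]:
    """Lowercase, remove ASCII punctuation, tokenize on whitespace, stem."""
    lowered = review.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [naive_stem(tok) for tok in tokens]


def build_corpus(reviews: List[str]) -> List[str]:
    """Join each review's processed tokens back into one string per review."""
    return [" ".join(preprocess(r)) for r in reviews]
```

The resulting corpus (one cleaned string per review) is the kind of input a bag-of-words or TF-IDF vectorizer would typically consume before model training.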

For more details, please visit the project repository.

Project 4 - Crop Yield Prediction

This project aims to predict crop yield in hectograms per hectare (hg/ha) using supervised machine learning techniques. The dataset includes attributes such as Area, Crop name, Year, hg/ha_yield, average rainfall in mm per year, pesticides in tonnes, and average temperature.

Key Features:

Data Preprocessing: Handling missing and duplicate values, encoding categorical variables, and feature engineering.
Modeling: Random Forest and XGBoost were compared to find the best-performing model.
Evaluation: Models were evaluated based on RMSE and R^2 value. Hyperparameter tuning and cross-validation were performed to determine the most suitable model.
Results: The Random Forest model is recommended due to its lower RMSE, indicating robust performance in predicting crop yield.
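
For reference, the two regression metrics named above, R^2 and RMSE, can be computed as in the following sketch. It is a plain-Python illustration of the standard definitions, not the project's code; the function names `rmse` and `r_squared` are chosen here for clarity.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot
```

A lower RMSE means predictions are closer to the observed yields in the original units (hg/ha), while an R^2 closer to 1 means the model explains more of the variance in yield.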

For more details, please visit the project repository.

Project 5 - Customer Segmentation

This project aims to segment customers based on annual income and spending score using K-Means clustering.

Key Features:

Data Preprocessing: Handling missing and duplicate values.
Exploratory Data Analysis: Data visualization using histograms, box plots, and scatter plots to identify patterns.
Clustering Algorithm: K-Means clustering was applied to group customers.
Results: Customers were segmented into 5 groups based on annual income and spending score:
1. Low income and low spending score
2. High income and low spending score
3. Medium income and medium spending score
4. Low income and high spending score
5. High income and high spending score
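
To illustrate how K-Means groups customers by annual income and spending score, here is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the project's implementation (which likely relies on a library); the function name `kmeans` and the `init`, `n_iter`, and `seed` parameters are assumptions made for this example.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (annual income, spending score)


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(
    points: Sequence[Point],
    k: int,
    init: Optional[Sequence[Point]] = None,
    n_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Return (centroids, labels) after running Lloyd's algorithm."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    if init is not None and len(init) != k:
        raise ValueError("init must contain exactly k centroids")
    # Start from the given centroids, or from k randomly sampled points.
    centroids = list(init) if init is not None else random.Random(seed).sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this iteration
        labels, centroids = new_labels, new_centroids
    return centroids, labels
```

Calling `kmeans(points, k=5)` on (income, score) pairs would produce five centroids that can then be read as segments like those listed above. Because the result depends on the starting centroids, library implementations usually repeat the run with several initializations.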

For more details, please visit the project repository.

Project 6 - Exploratory Data Analysis with Netflix Dataset

This project involves conducting an exploratory data analysis (EDA) on a Netflix dataset to gain insights into the content available on the platform.

Key Features:

Dataset Overview: Understanding the structure and types of data.
Analysis: Includes distribution of content types, top content-producing countries, prolific directors and actors, genres, and evolution of content length.

Results:

Detailed visualizations and insights for each analysis objective were created.
Trends and patterns in content types, production by countries, popular directors and actors, genre distribution, and movie lengths were identified and presented.
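
Two of the analyses mentioned above, the distribution of content types and the top content-producing countries, can be sketched with the standard library alone. The field names `type` and `country`, the comma-separated country format, and the function names are assumptions about the dataset made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def type_distribution(titles: Sequence[Dict[str, str]]) -> Dict[str, float]:
    """Share of each content type as a fraction of all titles."""
    counts = Counter(t["type"] for t in titles)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def top_countries(titles: Sequence[Dict[str, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent producing countries.

    Assumes multi-country entries are comma-separated; blank entries are skipped.
    """
    counts: Counter = Counter()
    for t in titles:
        for country in t.get("country", "").split(","):
            country = country.strip()
            if country:
                counts[country] += 1
    return counts.most_common(n)
```

The same counting pattern extends to directors, actors, and genres by swapping the field name.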

For more details, please visit the project repository.