Scala Machine Learning Projects

Md. Rezaul Karim

Last updated: 2021-06-30 19:06:29

Cover page

Title Page

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Analyzing Insurance Severity Claims

Machine learning and learning workflow

Typical machine learning workflow

Hyperparameter tuning and cross-validation

Analyzing and predicting insurance severity claims

Motivation

Description of the dataset

Exploratory analysis of the dataset

Data preprocessing

LR for predicting insurance severity claims

Developing a predictive model for insurance severity claims using LR

GBT regressor for predicting insurance severity claims

Boosting performance using a random forest regressor

Random Forest for classification and regression

Comparative analysis and model deployment

Spark-based model deployment for large-scale dataset

Summary

Analyzing and Predicting Telecommunication Churn

Why do we perform churn analysis and how do we do it?

Developing a churn analytics pipeline

Description of the dataset

Exploratory analysis and feature engineering

LR for churn prediction

SVM for churn prediction

DTs for churn prediction

Random Forest for churn prediction

Selecting the best model for deployment

Summary

High Frequency Bitcoin Price Prediction from Historical and Live Data

Bitcoin cryptocurrency and online trading

State-of-the-art automated trading of Bitcoin

Training

Prediction

High-level data pipeline of the prototype

Historical and live-price data collection

Historical data collection

Transformation of historical data into a time series

Assumptions and design choices

Data preprocessing

Real-time data through the Cryptocompare API

Model training for prediction

Scala Play web service

Concurrency through Akka actors

Web service workflow

JobModule

Scheduler

SchedulerActor

PredictionActor and the prediction step

TraderActor

Predicting prices and evaluating the model

Demo prediction using Scala Play framework

Why RESTful architecture?

Project structure

Running the Scala Play web app

Summary

Population-Scale Clustering and Ethnicity Prediction

Population-scale clustering and geographic ethnicity

Machine learning for genetic variants

1000 Genomes Project dataset description

Algorithms, tools, and techniques

H2O and Sparkling Water

ADAM for large-scale genomics data processing

Unsupervised machine learning

Population genomics and clustering

How does K-means work?

DNNs for geographic ethnicity prediction

Configuring the programming environment

Data preprocessing and feature engineering

Model training and hyperparameter tuning

Spark-based K-means for population-scale clustering

Determining the optimal number of clusters

Using H2O for ethnicity prediction

Using random forest for ethnicity prediction

Summary

Topic Modeling - A Better Insight into Large-Scale Texts

Topic modeling and text clustering

How does the LDA algorithm work?

Topic modeling with Spark MLlib and Stanford NLP

Implementation

Step 1 - Creating a Spark session

Step 2 - Creating vocabulary and token counts to train the LDA after text preprocessing

Step 3 - Instantiating the LDA model before training