Statistical Modeling of Wine Reviews


Building a predictive statistical model to identify wines through blind tasting, and transforming the data into a story that non-technical people can use: wine sellers who want a qualitative view, or people with a particular taste in wines.

Here we focus on providing statistical solutions to two questions (Question 1 and Question 2). This is part of an academic project.


The wine reviews dataset from Kaggle (https://www.kaggle.com/zynicide/wine-reviews) is analyzed. The dataset contains wine reviews, the rating of each wine (measured in points), and other relevant information collected from wine enthusiasts at winemag.com. The data is available in two formats: JSON and CSV. The objective is to analyze this data and turn it into useful information for two audiences: non-technical people, such as wine sellers who want to use the analysis qualitatively, and technical managers or supervisors who check the correctness of the analysis.
Statistical methods such as Gibbs sampling and Bayesian modeling are used to compare the mean ratings of wines from different countries, in order to find the best-rated wines and their regions. A linear regression model is used to estimate a wine's rating (points) from other factors.
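The post does not show the model itself. As a rough illustration of how Gibbs sampling can compare mean ratings across countries, here is a minimal sketch of a semi-conjugate hierarchical normal model, an assumption on our part; the report's actual model, priors, and code (written in R) may differ. The function name `gibbs_group_means`, all prior values, and the synthetic data are hypothetical, not taken from the report or the Kaggle dataset.

```python
import random
import statistics


def gibbs_group_means(groups, n_iter=3000, burn_in=500, seed=1,
                      mu0=88.0, g0_sq=25.0, eta0=1.0, t0_sq=4.0,
                      nu0=1.0, s0_sq=9.0):
    """Hypothetical Gibbs sampler for group (e.g. country) mean ratings.

    Assumed model, for illustration only:
        y_ij | theta_j, sigma^2 ~ N(theta_j, sigma^2)
        theta_j | mu, tau^2     ~ N(mu, tau^2)
        mu ~ N(mu0, g0_sq)
        1/tau^2   ~ Gamma(eta0/2, rate=eta0*t0_sq/2)
        1/sigma^2 ~ Gamma(nu0/2,  rate=nu0*s0_sq/2)
    Returns the posterior mean of each theta_j and P(group j has the highest mean).
    """
    rng = random.Random(seed)
    names = list(groups)
    m = len(names)
    n = [len(groups[g]) for g in names]
    ybar = [statistics.fmean(groups[g]) for g in names]
    n_total = sum(n)

    # Start from the sample means and the prior guesses for the variances.
    theta = ybar[:]
    mu = statistics.fmean(theta)
    tau_sq, sigma_sq = t0_sq, s0_sq

    sums = [0.0] * m
    best_counts = [0] * m
    kept = 0
    for it in range(n_iter):
        # 1) theta_j | rest: precision-weighted blend of group mean and mu.
        for j in range(m):
            prec = n[j] / sigma_sq + 1.0 / tau_sq
            mean = (n[j] * ybar[j] / sigma_sq + mu / tau_sq) / prec
            theta[j] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # 2) sigma^2 | rest: draw its precision from a Gamma (gammavariate takes a scale).
        ss = sum((y - theta[j]) ** 2 for j, g in enumerate(names) for y in groups[g])
        rate = (nu0 * s0_sq + ss) / 2.0
        sigma_sq = 1.0 / rng.gammavariate((nu0 + n_total) / 2.0, 1.0 / rate)
        # 3) mu | rest: normal, combining the thetas with the prior on mu.
        prec = m / tau_sq + 1.0 / g0_sq
        mean = (sum(theta) / tau_sq + mu0 / g0_sq) / prec
        mu = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # 4) tau^2 | rest: between-group variance, again via its precision.
        rate = (eta0 * t0_sq + sum((t - mu) ** 2 for t in theta)) / 2.0
        tau_sq = 1.0 / rng.gammavariate((eta0 + m) / 2.0, 1.0 / rate)
        # Keep post-burn-in draws for posterior summaries.
        if it >= burn_in:
            kept += 1
            for j in range(m):
                sums[j] += theta[j]
            best_counts[max(range(m), key=theta.__getitem__)] += 1

    post_mean = {g: sums[j] / kept for j, g in enumerate(names)}
    p_best = {g: best_counts[j] / kept for j, g in enumerate(names)}
    return post_mean, p_best


if __name__ == "__main__":
    # Hypothetical synthetic "points" for three countries; NOT the Kaggle data.
    data_rng = random.Random(0)
    true_means = {"Country A": 86.0, "Country B": 88.0, "Country C": 90.0}
    groups = {c: [data_rng.gauss(mc, 3.0) for _ in range(50)]
              for c, mc in true_means.items()}
    post_mean, p_best = gibbs_group_means(groups)
    for c in groups:
        print(f"{c}: posterior mean {post_mean[c]:.2f}, P(best) {p_best[c]:.2f}")
```

The hierarchical prior pulls each country's mean slightly toward the overall mean, which helps when some countries have few reviews. The share of draws in which a country has the highest mean gives a direct probability for "which region has the best-rated wines", rather than only a ranking of sample means.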
The report is divided into two parts, Question 1 and Question 2, each with sections on Data Handling, Analysis (Analysis of Q1, Analysis of Q2), and Conclusions (summary of results, overall evaluation, and further recommendations).

Links to the detailed report explaining the solution to each question:
1. Report
2. Notebook Solution 1 - R
3. Notebook Solution 2 - R
