Detailed information and calculation of Pearson’s Correlation using Excel, Python, R and SPSS


What is Pearson Correlation?

Pearson correlation, also called the Pearson product-moment correlation (PPMC) or bivariate correlation, is the standard measure of correlation in statistics. It shows the linear relationship between two sets of data. In simple terms, it answers the question: can I draw a line graph to represent the data?

The Pearson correlation is denoted by two symbols: the Greek letter rho (ρ) for a population, and the letter “r” for a sample.

To find the relationship between variables in the data, correlation coefficient formulas are used. The formulas return a value ranging from -1 to 1, where:

1 implies a good relationship…
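As a rough illustration of the coefficient described above, here is a minimal pure-Python sketch of the sample correlation r. The function name `pearson_r` and the toy numbers are assumptions made for this example only; they are not taken from the article.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: y rises in a perfectly straight line with x.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 6, 8, 10]
r = pearson_r(x_values, y_values)
```

A value of r close to 1 or -1 means the points lie close to a straight line, and a value near 0 means there is little linear relationship.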


Suppose we have a big dataset and are excited to start analyzing it and building a machine learning model, but our machine gives an “out of memory” error while trying to load the dataset.

This happens to us often when we work with big datasets. Handling them is one of the biggest hurdles we face in data science: dealing with massive amounts of data on computationally limited machines (of course, we can resolve it with additional computing resources).

So how can we overcome this problem? Is there a way to pick a subset of the data and…
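The excerpt is cut off before it names a method, so the following is only one possible way to pick a subset without loading everything into memory, not necessarily the approach the article goes on to describe. It is a minimal sketch of reservoir sampling; the function name `reservoir_sample` and its parameters are assumptions for illustration.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable of unknown length.

    Only k rows are held in memory at any time, so `rows` can be a file
    object or a generator that streams a large dataset.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                sample[j] = row
    return sample


# Hypothetical usage: stream one million integers, keep 5 of them.
subset = reservoir_sample(range(1_000_000), k=5, seed=42)
```

With a real file, `rows` could be the open file object itself, so each line is read once and discarded unless it lands in the sample.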



Clustering is a statistical grouping approach used in unsupervised learning. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters).

It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields including machine learning, pattern recognition, image analysis and data compression.

Clustering can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how…
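The excerpt does not say which algorithms it covers, so as one example of what "grouping similar objects" can look like in practice, here is a minimal sketch of k-means, a widely used clustering algorithm. The function name `kmeans`, its parameters, and the toy points are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Very small k-means: returns (centroids, clusters) for a list of tuples."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters


# Hypothetical toy data: two well-separated groups of 2-D points.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(toy_points, k=2)
```

Note that k-means assumes clusters are roughly compact around a center; other algorithms define a cluster differently, which is the point the excerpt is making.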


Concepts and applications related to artificial intelligence and machine learning touch every part of our day-to-day lives.

AI applications in day-to-day life. Reference: link

Your smartphone uses artificial intelligence to comprehend human language and answer questions or act in response to your commands, but the potential of artificial intelligence goes above and beyond that. When Apple introduced AI-powered software called Siri, few people around the world understood its mainstream significance, so it did not gain popularity immediately. Being AI-based, it also needed time to learn and evolve. …



Nowadays, the manufacturing industry faces significant transformations. Due to the rapid growth of the digital world and the broad application of data science, many fields of human activity are pursuing improvement. Modern manufacturing is also referred to as Industry 4.0: manufacturing under the conditions of the fourth industrial revolution, which brought robotization, automation, and the widespread use of data. Every day, the amount of data to be stored and processed is increasing. Today’s manufacturing companies, therefore, need to find new solutions and use cases for this data. Data, of course, brings its advantages to manufacturing businesses as it helps them automate large-scale…


In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (label or dependent variable) and one or more explanatory variables (features or independent variables). The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression.

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models.

Most commonly, the conditional mean of the response given the values of the explanatory variables…
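To make the simple (one explanatory variable) case concrete, here is a minimal sketch of an ordinary least squares fit of a straight line. The function name `fit_line` and the toy data are assumptions for illustration; least squares is one common way to estimate the parameters, and the excerpt itself does not say which estimator it uses.

```python
def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x for one explanatory variable."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data generated from y = 1 + 2x with no noise.
feature = [1, 2, 3, 4]
label = [3, 5, 7, 9]
intercept, slope = fit_line(feature, label)
predicted = intercept + slope * 5  # estimated mean response at x = 5
```

The predicted value is the model's estimate of the average response at that x, which matches the "conditional mean" reading in the paragraph above.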


Logistic regression is a statistical approach used for classification problems. In statistics, the logistic model (or logit model) is used to model the probability of a certain class or event existing, such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc. Each object detected in the image would be assigned a probability between 0 and 1, with the probabilities summing to one.

Difference between linear regression and logistic regression

Types of logistic regression:

  1. Binary (e.g. tumor malignant or benign)
  2. Multi-class (e.g. cats, dogs or sheep)
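The two cases above map onto two standard functions: the logistic (sigmoid) function for a single yes/no probability, and the softmax function for several classes whose probabilities sum to one. Here is a minimal sketch of both. The function names and the example scores are assumptions for illustration, and the scores stand in for the output of a fitted model that is not shown here.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Rewritten form avoids overflow in exp() for very negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores):
    """Turn a list of class scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Binary case: a hypothetical score of 0 means both outcomes are equally likely.
p_malignant = sigmoid(0.0)

# Multi-class case: hypothetical scores for cat, dog, sheep.
class_probs = softmax([2.0, 1.0, 0.1])
```

In the binary case the predicted class is usually the one whose probability exceeds a chosen threshold such as 0.5; in the multi-class case it is the class with the largest probability.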


Interaction plots are used to understand how the behavior of one variable depends on the value of another variable. Interaction effects are analyzed in regression analysis, DOE (Design of Experiments) and ANOVA (Analysis of Variance).

This blog will help you understand interaction plots and their effects, how to interpret them in statistical designs, and the problems you will face if you don’t include them in your statistical models.

In any statistical study, whether it concerns product development, a manufacturing process, simulation, health, testing and so on, many variables can affect the expected outcome (response). …
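One simple way to put a number on "the effect of one variable depends on another" is a difference of differences in a two-factor, two-level design. The sketch below assumes that setup; the function name `interaction_contrast`, the dictionary layout, and the cell means are all made up for illustration and are not results from the article. Some DOE texts define the interaction effect as half of this quantity, so treat the scaling as a convention.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2x2 design.

    cell_means maps (level_of_A, level_of_B) -> mean response, with levels 0 and 1.
    A result of zero means the two lines on an interaction plot are parallel.
    """
    effect_of_a_when_b_low = cell_means[(1, 0)] - cell_means[(0, 0)]
    effect_of_a_when_b_high = cell_means[(1, 1)] - cell_means[(0, 1)]
    return effect_of_a_when_b_high - effect_of_a_when_b_low


# Hypothetical cell means: A raises the response by 10 at both levels of B.
no_interaction = interaction_contrast({(0, 0): 10, (1, 0): 20, (0, 1): 15, (1, 1): 25})

# Hypothetical cell means: A raises the response by 10 at low B but by 25 at high B.
with_interaction = interaction_contrast({(0, 0): 10, (1, 0): 20, (0, 1): 15, (1, 1): 40})
```

On an interaction plot, the first case shows parallel lines and the second shows lines that diverge.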

Suresha HP

Machine Learning & Artificial Intelligence developer, researcher and educator with over 16 years of experience in the Automotive and Manufacturing industry
