In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be as diverse as “all people living in a country” or “every atom composing a crystal”. Ideally, statisticians compile data about the entire population, an operation called a census, which may be organized by governmental statistical institutes. Descriptive statistics can be used to summarize the population data. Numerical descriptors include the mean and standard deviation for continuous data (like income), while frequency and percentage are more useful for describing categorical data (like education).
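The numerical descriptors mentioned above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the income and education values are hypothetical, and the function names are chosen here for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical census-style data: one continuous and one categorical variable.
incomes = [32000, 45000, 51000, 38000, 64000]
education = ["secondary", "tertiary", "secondary", "primary", "tertiary"]


def summarize_continuous(values):
    """Return the mean and population standard deviation of numeric data.

    pstdev treats the values as the whole population (as in a census);
    statistics.stdev would be the usual choice for a sample.
    """
    return {"mean": mean(values), "std_dev": pstdev(values)}


def summarize_categorical(values):
    """Return the frequency and percentage of each category."""
    counts = Counter(values)
    total = len(values)
    return {
        category: {"frequency": n, "percentage": 100 * n / total}
        for category, n in counts.items()
    }


income_summary = summarize_continuous(incomes)
education_summary = summarize_categorical(education)
```

Here `income_summary` holds the mean and spread of the continuous variable, while `education_summary` maps each education level to its count and share of the total.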

A standard statistical procedure involves the collection of data leading to a test of the relationship between two statistical data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared, as an alternative, to an idealized null hypothesis of no relationship between the two data sets. The aim of more advanced statistics is to make decisions and predictions based on modeled data.

It is vital to understand how the data were collected before examining any statistical relationship that might exist between the data sets. Statistical analysis uses mathematical notation to describe the relationships between parameters. For many people, statistics can seem intimidating because it involves a lot of jargon that is confusing at first glance. The effort is worthwhile, however, because grasping these concepts is necessary to fully appreciate statistical analysis.

A hypothesis is a tentative statement about the relationship between two or more variables. It should be testable by an experiment or observation. A hypothesis is often derived from background research, in which case it may provide a rationale for the experiment performed, but it may also be proposed independently of that research.

The null hypothesis states that there is no relationship between the two data sets, as in the idealized model; the alternative hypothesis states that a relationship exists. A test statistic is calculated from the data using standard methods and compared with the values it would be expected to take if the null hypothesis were true. If the observed statistic would be improbable under the null hypothesis, we reject the null hypothesis in favor of the alternative, since the data suggest a genuine statistical relationship between the two data sets. Otherwise, we fail to reject the null hypothesis. The significance of the statistic depends on its probability distribution under the null hypothesis, which follows from random sampling theory.
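One way to make this testing procedure concrete is a permutation test, sketched below in Python. This is only one of many possible tests and is chosen here because it needs nothing beyond the standard library; the function name, the default number of permutations, and the choice of the difference in means as the test statistic are all assumptions made for illustration. The null hypothesis is that the two data sets come from the same distribution, so their values can be shuffled between groups freely.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in means.

    Null hypothesis: both samples come from the same distribution,
    so the group labels are exchangeable.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Re-split the shuffled values into two groups of the original sizes.
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Adding one to both counts keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

A small returned value means that a difference as large as the observed one rarely arises by random relabeling, which is evidence against the null hypothesis. A common convention is to reject the null hypothesis when the value falls below a threshold chosen in advance, such as 0.05.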

What is correlation? Correlation measures the strength and direction of the association between two variables, and it can be positive, negative, or zero. Correlations play a significant part in the way we analyze data and gain insight into a problem. They are a powerful tool for understanding not just our data but also the theories involved in the problem. The relationship examined by a correlation test is usually displayed on a scatter plot, with one variable on each axis. When more than two variables are involved, a correlation matrix summarizes the correlation between every pair of variables.
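As a sketch of how a correlation might be computed, the Python code below calculates the Pearson correlation coefficient, one common measure of linear correlation, and assembles a correlation matrix from it. The text does not name a specific coefficient, so the choice of Pearson is an assumption, as are the function names and the dictionary-based layout of the matrix.

```python
from math import sqrt


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and the spread of each variable.
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariation / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return the pairwise correlations for a dict mapping names to sequences."""
    names = list(columns)
    return {
        a: {b: pearson(columns[a], columns[b]) for b in names}
        for a in names
    }
```

A result near 1 indicates a positive relationship, a result near -1 an inverse relationship, and a result near 0 little or no linear relationship. The matrix is symmetric, and each variable has a correlation of 1 with itself.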

Applications of statistics fall into two categories: descriptive statistics and analytics. Descriptive statistics provide a summary or picture of the data and help in assessing the presence and types of variation in the data; they are the form most often used by management to make decisions. Analytics, on the other hand, draws insights from data sets to make predictions about how factors might change over time.

In use since the 17th century, statistics has emerged as a formal science that is applied in most fields of social science. It can be defined as the study of data and the information drawn from them for use in problem solving. Data can be represented numerically, symbolically, graphically, by count, and in other ways. Statistics belongs to the mathematical sciences, although it draws on and is applied in other disciplines, such as engineering (computer applications), biochemistry (biostatistical applications), and sociology (social statistical applications).