
This data visualization is used in many cases and meets different needs.

Posted: Tue Jan 07, 2025 9:48 am
by tanjimajuha20
A box plot, or box-and-whisker plot, shows the distribution of a continuous variable. It is also called a box diagram; other names include box with legs and Tukey box. It is a graphical representation of a statistical series whose purpose is to show the center and the spread of the data. It is an easy way to present the basic profile of a quantitative statistical series.

What is a box plot?
The box plot was invented in 1977 by John Tukey. It summarizes several position indicators of the variable being studied. The graph shows the following elements:

Median;

Quartiles;

Minimum;

Maximum (or the first and ninth deciles, in place of the minimum and maximum).
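The position indicators listed above can be computed with Python's standard library. This is only a minimal sketch: the function name `position_summary`, the sample data, and the choice of the "inclusive" quantile method are assumptions made for illustration, and other quantile conventions give slightly different quartiles.

```python
import statistics

def position_summary(values):
    """Return the position indicators that a box plot displays."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # With n=10 it returns nine decile cut points; keep the first and the last.
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    return {
        "min": data[0],
        "d1": deciles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "d9": deciles[-1],
        "max": data[-1],
    }

sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]  # hypothetical data
summary = position_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The function needs at least two values, because `statistics.quantiles` raises an error on shorter input.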

This diagram is mainly used to compare the same variable across two or more groups (for example, populations of different sizes).

The basis of the box plot is a rectangle that runs from the first quartile to the third quartile and is cut by a line at the median. This rectangle alone is enough to represent a box plot. However, segments are usually added at its ends. These segments lead to the extreme values (or, in some variants, only to the first and ninth deciles).
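To make this construction concrete, here is a rough text-only sketch that draws the rectangle, the median, and the two segments on one line. The function `ascii_box`, its parameters, and the two sets of numbers are hypothetical, chosen only for illustration; a real charting tool would do this far better.

```python
def ascii_box(low, q1, median, q3, high, axis_min, axis_max, width=41):
    """Draw one horizontal box plot as a single line of text."""
    def col(x):
        # Map a data value to a column index on the shared axis.
        return round((x - axis_min) / (axis_max - axis_min) * (width - 1))

    row = [" "] * width
    for c in range(col(low), col(high) + 1):
        row[c] = "-"                       # segments (whiskers)
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                       # the box, from Q1 to Q3
    row[col(low)] = row[col(high)] = "|"   # ends of the segments
    row[col(q1)], row[col(q3)] = "[", "]"  # edges of the box
    row[col(median)] = "M"                 # median line
    return "".join(row)

# Hypothetical summaries for two groups, drawn on the same 0..40 axis.
group_a = ascii_box(0, 10, 20, 30, 40, axis_min=0, axis_max=40)
group_b = ascii_box(5, 12, 15, 22, 36, axis_min=0, axis_max=40)
print("A", group_a)
print("B", group_b)
```

Drawing both groups against the same axis is what makes the comparison possible: the boxes can be read against each other regardless of how many observations each group contains.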

In the first box plots, the length of each whisker is at most 1.5 times the interquartile range. From the beginning, these diagrams were used in fields where the data can be modeled by a normal distribution. Theory then shows that the ends of the whiskers lie slightly beyond the 1st and 99th percentiles, so fewer than 1% of the values are expected to fall outside them. The diagram is then used to detect exceptional data.
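The statement about the normal distribution can be checked numerically. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) and uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Quartiles of the standard normal distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# Furthest points the whiskers can reach: 1.5 * IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Share of the distribution lying beyond each whisker limit.
share_below = std_normal.cdf(lower_fence)
share_above = 1.0 - std_normal.cdf(upper_fence)

print(f"whisker limits: {lower_fence:.3f} and {upper_fence:.3f}")
print(f"share outside: {share_below + share_above:.4%}")
```

Under this assumption the limits sit at about 2.7 standard deviations from the mean, with roughly 0.35% of the values expected beyond each one. For data that are not close to normal, the share of points beyond the whiskers can be quite different.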

With the box plot, it is then possible to get an overview of the data at a glance.

The box plot represents the distribution of data for a continuous variable. It helps to see the center, but also the spread of the data. It can thus be used as a visual tool to get a first impression of normality (through the symmetry of the box and whiskers) and to identify outliers. The underlying statistical distribution is obtained by associating classes of values with how often they were observed.

A box plot consists of different parts, the main ones being:

The center line. This indicates the median of the data. Half of the data is higher, half is lower. If the data is symmetrical, the median will be in the center of the box. When the data is not symmetrical, the median will be closer to the top or bottom of the box.

Percentiles. The bottom and top of the box show the 25th and 75th percentiles (quantiles), also called the first and third quartiles. A quarter of the data lies between two consecutive quartiles, so the box contains the middle half of the data. The length of the box is the difference between these two percentiles, which is called the interquartile range (IQR).

Whiskers. These are the lines that extend from the box. They represent the variation that is expected in the data. They extend at most 1.5 times the IQR beyond the top and bottom of the box. When the data do not reach that far, the whiskers stop at the minimum and maximum data values. Values that fall beyond the ends of the whiskers are drawn as individual points, called outliers. An outlier is more extreme than expected. It is important to verify that it is a genuine outlier and not a data error. The whiskers do not include these outliers.
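The whisker rule described above can be sketched in a few lines of Python. The function name `whiskers_and_outliers`, the parameter `k`, and the sample data are assumptions made for illustration; the quartiles use the "inclusive" method of `statistics.quantiles`, and other conventions may shift the limits slightly.

```python
import statistics

def whiskers_and_outliers(values, k=1.5):
    """Apply the 1.5 * IQR rule to find whisker ends and outliers."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Limits beyond which a value is drawn as a separate point.
    low_limit = q1 - k * iqr
    high_limit = q3 + k * iqr
    inside = [x for x in data if low_limit <= x <= high_limit]
    outliers = [x for x in data if x < low_limit or x > high_limit]
    return {
        "iqr": iqr,
        # Whiskers stop at the most extreme values still inside the limits.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

sample = [2, 4, 4, 5, 7, 8, 9, 11, 30]  # hypothetical data with one extreme value
result = whiskers_and_outliers(sample)
print(result)
```

In this sample the value 30 lies beyond the upper limit, so it is reported as an outlier and the upper whisker stops at 11. Whether 30 is a genuine observation or a recording error still has to be checked by hand.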

It can then be very useful to compare the outliers to the quantiles. On the plot we can read off the median and the 25th and 75th percentiles; for example, 25% of the data lie below the 25th percentile.

Some versions of the box plot also mark the 2.5th, 10th, 90th, and 97.5th quantiles. These additional quantiles describe the tails of the distribution, where the outliers lie.

The data visualization enabled by the box plot is relevant to all sectors of activity. It can be used in scientific and technical fields, in administration, finance, marketing, services and even sport. Making use of information through a visualization such as the box plot is essential today. It conveys messages visually and helps get more out of the information collected. Several professions use the box plot to process information more quickly and easily. This is the case for the data scientist, who can thus respond more simply to market needs and make decisions.

Knowing how to create a box plot is a skill that must be mastered, and it is learned during studies. If you are preparing an MBA in finance and data, you will cover it.

To simplify your work and obtain a box plot more easily, you can use an online tool. Open a dedicated application and paste in your data table. Select the appropriate tab and click on the variables for which you want to create the box plot.