What is a dimension?

You may have heard the word “dimension,” or heard people talk about “high-dimensional data.” Let us discuss what the term “dimension” means.

Here is the tabular data from the previous lesson.

Name	Salary ($)	Age (Years)
Jane	90000	52
John	85000	48
Delilah	75000	32
Dave	90000	53
Ellen	82000	44

We said that the actual data part in the table above is:

90000	52
85000	48
75000	32
90000	53
82000	44

In this running example, we have two features or two columns, as explained in the previous lesson. We have five objects or five rows.

We call the data of our running example a two-dimensional dataset. That is, the number of features is equal to the number of dimensions of the dataset. The table above is a two-dimensional dataset because it has two features or columns.

That is:

Number of features = number of dimensions
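As a minimal sketch of this rule, the running example can be written as a list of rows in Python, and the number of dimensions read off as the number of values in each row. The helper name `count_dimensions` is made up for this illustration and is not part of the lesson.

```python
# Rows are objects; columns are features (salary in $, age in years).
data = [
    [90000, 52],
    [85000, 48],
    [75000, 32],
    [90000, 53],
    [82000, 44],
]

def count_dimensions(rows):
    """Return the number of features (columns), which is the dataset's number of dimensions."""
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError("every row must have the same number of features")
    return widths.pop()

num_objects = len(data)                  # five rows
num_dimensions = count_dimensions(data)  # two columns
print(num_objects, "objects,", num_dimensions, "dimensions")
```

Adding a third value to every row would make the same function return 3, matching the three-dimensional case.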

If we had three features or three columns, we would call this a three-dimensional dataset. The table below is an example, with three features and five objects.

90000	52	10
85000	48	20
75000	32	30
90000	53	40
82000	44	20

If we had four features or four columns, we would call this a four-dimensional dataset. An example is below.

90000	52	10	50
85000	48	20	60
75000	32	30	30
90000	53	40	35
82000	44	20	40

The idea should be clear by now. If the dataset has 1 feature, it is called 1-dimensional; with 2 features it is called 2-dimensional, and so on. With n features or n columns, the data is called n-dimensional.

Feature 1	Feature 2	Feature 3	Feature 4	...	Feature n
90000	52	10	50	...	43
85000	48	20	60	...	2
75000	32	30	30	...	73
90000	53	40	35	...	36
82000	44	20	40	...	90

Notice one thing here: regardless of the number of features, columns, or dimensions, the data table can be stored in a two-dimensional array. That is, even a one-hundred-dimensional dataset can be kept in a 2D array or a 2D matrix.
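A small sketch can make this concrete. It builds a table with 100 features using nested Python lists and checks that the container is still only two levels deep. The zeros are placeholder values, not real data, and `nesting_depth` is a helper invented for this illustration.

```python
def nesting_depth(value):
    """Count how many levels of lists wrap the numbers (the array's dimensions)."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:  # stop at an empty list instead of indexing into it
            break
        value = value[0]
    return depth

num_objects, num_features = 5, 100

# Placeholder table: 5 rows, 100 columns, filled with zeros.
wide_table = [[0] * num_features for _ in range(num_objects)]

array_dimensions = nesting_depth(wide_table)  # programming sense: rows and columns
dataset_dimensions = len(wide_table[0])       # data science sense: number of features
print(array_dimensions, dataset_dimensions)   # prints: 2 100
```

The storage needs only a row index and a column index, while the dataset itself has one hundred dimensions.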

In programming, the “dimension” of an array is the number of indices needed to locate a cell: a 2D array needs two, a row index and a column index. In data science, the word “dimension” has a different meaning. It refers to the dimension of a mathematical space, such as Euclidean space.

As an example, the following data table has three columns or three features. There are five objects or five rows.

90000	52	10
85000	48	20
75000	32	30
90000	53	40
82000	44	20

In programming, we would say that this table can be stored in a 2-dimensional array of size 5 × 3. That is, it has five rows and three columns.

In data science, this table is called a three-dimensional dataset because each row is a point in a three-dimensional mathematical space, with one coordinate per feature.
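One way to picture the mathematical space is to treat each row as a point with three coordinates. The sketch below does that and, purely as an illustration that is not part of the lesson, measures the straight-line (Euclidean) distance between two of the points with Python's standard `math.dist`.

```python
import math

# The three-feature table from the lesson: five objects, three features.
table = [
    [90000, 52, 10],
    [85000, 48, 20],
    [75000, 32, 30],
    [90000, 53, 40],
    [82000, 44, 20],
]

# Data science sense: each row is one point with three coordinates.
points = [tuple(row) for row in table]
space_dimensions = len(points[0])

# Illustration only: Euclidean distance between the first and fourth objects.
distance = math.dist(points[0], points[3])
print(space_dimensions, distance)
```

The table itself is still a 5 × 3 two-dimensional array; only the space its rows live in is three-dimensional.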

Similarly, a data table with four columns, such as the following one, is referred to as a four-dimensional dataset even though we store it in a two-dimensional array.

90000	52	10	50
85000	48	20	60
75000	32	30	30
90000	53	40	35
82000	44	20	40

That is, more features mean a mathematical space with more dimensions. The physical memory space is the memory occupied by the corresponding two-dimensional array. Storage is a programming concept, and the table is stored in a 2-dimensional array no matter how many dimensions the dataset has.
