An Introduction to Data
Data Dimensionality and Space
Proximity in Data Science Context
The simplest form of data
A table is probably the simplest form of data. Perhaps surprisingly, most implementations of data science algorithms still take tabular data as input. Data scientists often convert more complex data (such as text, images, or time series) into tables so that existing tools can be used for analysis.
As an example, suppose a company keeps information about its employees in an Excel spreadsheet, laid out as the following table.
| Name | Salary ($) | Age (Years) |
One reason for the popularity of the tabular representation is that it is easy to store directly in a computer's main memory. Regardless of the number of rows or columns, a tabular dataset can be stored in a 2-dimensional array, with one row per record and one column per attribute. We will discuss this further in upcoming lessons.
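To make the idea of a table as a 2-dimensional array concrete, here is a minimal Python sketch. It assumes the three columns from the employee table above. The employee names and values are made up purely for illustration, and the helper functions `cell` and `column` are hypothetical names, not part of any library.

```python
# Column names taken from the employee table in the text.
columns = ["Name", "Salary ($)", "Age (Years)"]

# Hypothetical records: these names and numbers are invented for illustration.
# Each inner list is one row; the outer list holds all rows (a 2-D array).
table = [
    ["Alice", 52000, 34],
    ["Bob", 61000, 45],
    ["Carol", 58000, 29],
]

n_rows = len(table)    # number of records
n_cols = len(columns)  # number of attributes


def cell(table, row, column_name):
    """Return the value stored at a row index and a named column."""
    return table[row][columns.index(column_name)]


def column(table, column_name):
    """Return all values of one named column as a list."""
    j = columns.index(column_name)
    return [row[j] for row in table]


ages = column(table, "Age (Years)")
mean_age = sum(ages) / len(ages)

print(f"{n_rows} rows x {n_cols} columns")
print("Salary in row 1:", cell(table, 1, "Salary ($)"))
print("Mean age:", mean_age)
```

Every value is addressed by a pair of indices (row, column), which is why the same layout works however many rows or columns the table has. In practice, a library such as NumPy or pandas would typically replace the nested lists, but the underlying two-index structure is the same.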