Objects and features of a data table
The table below is from the previous lesson. Real-life data is usually much larger and more complex than this example. Many datasets have hundreds or even several thousand columns, and the number of rows can exceed millions. For example, a data table covering the citizens of a country can have millions of rows.
| Name | Salary ($) | Age (Years) |
Data is generally described using two things: objects and features. Let us discuss them in terms of the data table above, which we use as our running example.
In the table above, the two columns, salary and age, are features. There are five people, whose names are placed in five rows, and each row is an object. Generally, in tabular form, objects are placed in rows and features in columns.
An object is characterized by its features. For example, in the data table above, Jane is described as a person with a salary of 90000 and an age of 52. That is, two features, salary and age, describe Jane. Similarly, John is described by the salary and age combination of 85000 and 48.
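As a minimal sketch in Python (the lesson itself contains no code), each object can be stored under its identifier and described only by its feature values. Only Jane and John appear because theirs are the only rows whose values the text states; the dictionary layout, the `FEATURES` tuple, and the `feature_vector` helper are illustrative assumptions, not part of the lesson.

```python
# Each object (a person) is keyed by its identifier and described by its features.
# Only the two rows stated in the text are included; the layout is illustrative.
people = {
    "Jane": {"salary": 90000, "age": 52},
    "John": {"salary": 85000, "age": 48},
}

FEATURES = ("salary", "age")  # the two features of the running example


def feature_vector(name: str) -> tuple:
    """Return the feature values that describe an object, in a fixed column order."""
    record = people[name]
    return tuple(record[f] for f in FEATURES)


print(feature_vector("Jane"))  # (90000, 52)
print(feature_vector("John"))  # (85000, 48)
```

Fixing the order of `FEATURES` matters: it is what makes every object's values line up as the same columns of a table.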
Notice that Jane could just as well be named Person1, John Person2, Delilah Person3, and so forth. The name column is therefore not a feature of this dataset but an object identifier: it tells us which row belongs to whom.
Just as the name column is an identifier, the header row containing the texts “Name”, “Salary ($)”, and “Age (Years)” is only a descriptor that tells us what to call the columns. Therefore, the actual data in the table above consists of numbers only, as shown below.
In practice, we treat this numeric content as a matrix. The matrix has five rows and two columns, that is, five objects and two features.
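The separation of header row, identifier column, and numeric content can be sketched in Python as follows. The raw table is modeled as a list of rows with the header first. Only Jane's and John's rows are filled in because the text does not give the other three, so this sketch yields a 2 × 2 matrix rather than the lesson's 5 × 2. The helper name `to_matrix` is hypothetical.

```python
# Raw table: header row first, identifier (name) in the first column.
# Only the two rows stated in the text are included.
raw_table = [
    ["Name", "Salary ($)", "Age (Years)"],
    ["Jane", 90000, 52],
    ["John", 85000, 48],
]


def to_matrix(table):
    """Split a raw table into column descriptors, object identifiers, and the numeric matrix."""
    header, *rows = table
    feature_names = header[1:]               # descriptors of the feature columns, not data
    identifiers = [row[0] for row in rows]   # object identifiers, not features
    matrix = [row[1:] for row in rows]       # one row per object, one column per feature
    return feature_names, identifiers, matrix


names, ids, X = to_matrix(raw_table)
n_objects, n_features = len(X), len(X[0])
print(X)                      # [[90000, 52], [85000, 48]]
print(n_objects, n_features)  # 2 2
```

With all five rows of the original table, the same function would give `n_objects == 5` and `n_features == 2`, matching the five-objects-by-two-features matrix described above.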