Data items processed by computers form a data hierarchy that becomes larger and more complex in structure as we progress from the simplest data items (called “bits”) to richer data items, such as characters, fields, and so on. Figure 1.1 illustrates a portion of the data hierarchy.
The smallest data item in a computer can assume the value 0 or the value 1. Such a data item is called a bit (short for “binary digit”—a digit that can assume either of two values).
It’s remarkable that the impressive functions performed by computers involve only the simplest manipulations of 0s and 1s—examining a bit’s value, setting a bit’s value and reversing a bit’s value (from 1 to 0 or from 0 to 1). We discuss binary numbers (and closely related octal and hexadecimal numbers) in more detail in Appendix D, Number Systems.
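The three manipulations named above (examining, setting and reversing a bit) can be sketched with bitwise operators. This is a minimal illustration in Python; the function names are hypothetical and chosen only for this example.

```python
# Hypothetical helper names for illustration; bit positions count from 0 at the right.

def examine_bit(value: int, position: int) -> int:
    """Return the bit (0 or 1) stored at the given position."""
    return (value >> position) & 1

def set_bit(value: int, position: int) -> int:
    """Return value with the bit at the given position forced to 1."""
    return value | (1 << position)

def reverse_bit(value: int, position: int) -> int:
    """Return value with the bit at the given position flipped (1 to 0 or 0 to 1)."""
    return value ^ (1 << position)

pattern = 0b0101
print(examine_bit(pattern, 0))                 # 1
print(format(set_bit(pattern, 1), "04b"))      # 0111
print(format(reverse_bit(pattern, 2), "04b"))  # 0001
```

Reversing the same bit twice returns the original value, which is why the exclusive-or operator is a natural fit for flipping.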
It’s tedious for people to work with data in the low-level form of bits. Instead, we prefer to work with decimal digits (0–9), letters (A–Z and a–z), and special symbols (e.g., $, @, %, &, *, (, ), –, +, ", :, ? and / ). Digits, letters and special symbols are known as characters.
The computer’s character set is the set of all the characters used to write programs and represent data items on that device. Computers process only 1s and 0s, so every character is represented as a pattern of 1s and 0s. The Unicode character set contains characters for many of the world’s languages. C# supports several character sets, including 16-bit Unicode® characters, each of which is composed of two bytes of eight bits apiece. See Appendix B for more information on the ASCII (American Standard Code for Information Interchange) character set, the popular subset of Unicode that represents uppercase and lowercase letters in the English alphabet, digits and some common special characters.
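To make the idea of a character as a pattern of 1s and 0s concrete, the following sketch prints the numeric code and 16-bit pattern for a few characters. It is written in Python rather than C# for brevity, and the helper name `bit_pattern` is an assumption made for this example.

```python
def bit_pattern(ch: str) -> str:
    """Return the character's Unicode code point as a 16-bit pattern of 1s and 0s."""
    return format(ord(ch), "016b")

for ch in ("A", "a", "$"):
    print(ch, ord(ch), bit_pattern(ch))

# Encoding "A" as 16-bit Unicode (UTF-16, big-endian) yields two bytes.
utf16 = "A".encode("utf-16-be")
print(len(utf16), utf16.hex())   # 2 0041

# The ASCII encoding of the same character is a single byte with the same code.
ascii_bytes = "A".encode("ascii")
print(len(ascii_bytes), ascii_bytes.hex())   # 1 41
```

The matching code (41 in hexadecimal) in both encodings reflects ASCII being a subset of Unicode.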
Just as characters are composed of bits, fields are composed of characters or bytes. A field is a group of characters or bytes that conveys meaning. For example, a field consisting of uppercase and lowercase letters could be used to represent a person’s name, and a field consisting of decimal digits could represent a person’s age.
Several related fields can be used to compose a record. In a payroll system, for example, the record for an employee might consist of the following fields (possible types for these fields are shown in parentheses):
• Employee identification number (a whole number)
• Name (a string of characters)
• Address (a string of characters)
• Hourly pay rate (a number with a decimal point)
• Year-to-date earnings (a number with a decimal point)
• Amount of taxes withheld (a number with a decimal point)
Thus, a record is a group of related fields. In the preceding example, all the fields belong to the same employee. A company might have many employees and a payroll record for each.
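One way to picture the payroll record is as a small data type with one attribute per field. The sketch below uses a Python dataclass; the class name, attribute names and sample values are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    """A record: a group of related fields that all describe one employee."""
    employee_id: int              # a whole number
    name: str                     # a string of characters
    address: str                  # a string of characters
    hourly_pay_rate: float        # a number with a decimal point
    year_to_date_earnings: float  # a number with a decimal point
    taxes_withheld: float         # a number with a decimal point

# Sample values are invented for this example.
record = EmployeeRecord(1001, "Jane Green", "12 Main Street", 25.50, 10200.00, 2040.00)
print(record.name)             # Jane Green
print(record.hourly_pay_rate)  # 25.5
```

Production payroll code would likely store monetary amounts in an exact decimal type rather than a binary floating-point type; `float` is used here only to keep the sketch short.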
A file is a group of related records. [Note: More generally, a file contains arbitrary data in arbitrary formats. In some operating systems, a file is viewed simply as a sequence of bytes; any organization of the bytes in a file, such as organizing the data into records, is a view created by the programmer.] It’s not unusual for an organization to have thousands or even millions of files, some containing billions or even trillions of characters of information. You’ll work with files in Chapter 17.
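Both views of a file described above, a group of records and a plain sequence of bytes, can be seen in a short sketch. It assumes a comma-separated text layout, which is only one of many possible record formats; the file name and sample data are invented for this example.

```python
import csv
import os
import tempfile

# Each dictionary is one record; each key names one field. Values are made up.
records = [
    {"employee_id": "1001", "name": "Jane Green", "hourly_pay_rate": "25.50"},
    {"employee_id": "1002", "name": "John Blue", "hourly_pay_rate": "31.00"},
]
field_names = ["employee_id", "name", "hourly_pay_rate"]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "payroll.csv")

    # Write the records, one per line, after a header line naming the fields.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=field_names)
        writer.writeheader()
        writer.writerows(records)

    # Programmer's view: read the file back as a group of records.
    with open(path, newline="", encoding="utf-8") as f:
        loaded = list(csv.DictReader(f))

    # Operating system's view: the same file is just a sequence of bytes.
    with open(path, "rb") as f:
        raw = f.read()

print(loaded[1]["name"])   # John Blue
print(type(raw).__name__)  # bytes
```

The record structure exists only because the reading code interprets commas and line breaks in a particular way; the bytes themselves carry no such organization.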
A database is a collection of data that’s organized for easy access and manipulation. The most popular database model is the relational database, in which data is stored in simple tables. A table includes records and fields. For example, a table of students might include first name, last name, major, year, student ID number and grade point average fields. The data for each student is a record, and the individual pieces of information in each record are the fields. You can search, sort and otherwise manipulate the data based on its relationship to multiple tables or databases. For example, a university might use data from the student database in combination with data from databases of courses, on-campus housing, meal plans, etc. We discuss databases in Chapter 22.
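The student table and its combination with housing data can be sketched with an in-memory SQLite database. For simplicity the example keeps both tables in one database rather than in separate databases; the table names, column names and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory

# Each table has fields (columns) and records (rows).
conn.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, major TEXT, year INTEGER, gpa REAL)"
)
conn.execute("CREATE TABLE housing (student_id INTEGER, residence_hall TEXT)")

# Sample rows are invented for this example.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ana", "Lopez", "Physics", 2, 3.9),
        (2, "Ben", "Chen", "History", 3, 3.4),
        (3, "Cara", "Diaz", "Biology", 1, 3.7),
    ],
)
conn.executemany(
    "INSERT INTO housing VALUES (?, ?)",
    [(1, "North Hall"), (2, "East Hall")],
)

# Combine the two tables through the shared student_id field, sorted by GPA.
rows = conn.execute(
    "SELECT s.last_name, s.gpa, h.residence_hall "
    "FROM students AS s JOIN housing AS h ON s.student_id = h.student_id "
    "ORDER BY s.gpa DESC"
).fetchall()
conn.close()

print(rows)  # [('Lopez', 3.9, 'North Hall'), ('Chen', 3.4, 'East Hall')]
```

The third student does not appear in the result because she has no matching housing record, which shows how the relationship between tables determines what a query returns.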
The amount of data being produced worldwide is enormous and growing quickly. According to IBM, approximately 2.5 quintillion bytes (2.5 exabytes) of data are created daily and 90% of the world’s data was created in just the past two years! According to an IDC study, approximately 1.8 zettabytes (equal to 1.8 trillion gigabytes) of data was used worldwide in 2011. Figure 1.2 shows relationships between byte measurements.
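The unit conversions quoted above can be checked with a few lines of arithmetic. The sketch assumes decimal (power-of-ten) units; Figure 1.2 is not reproduced here, and a table based on multiples of 1,024 would give slightly different numbers.

```python
# Decimal byte units (an assumption; binary units use multiples of 1,024 instead).
UNITS = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}

# 2.5 quintillion bytes, written with integers to avoid rounding.
daily_bytes = 25 * 10**17
print(daily_bytes / UNITS["exabyte"])     # 2.5

# 1.8 zettabytes expressed in gigabytes.
total_2011_bytes = 18 * 10**20
print(total_2011_bytes // UNITS["gigabyte"])   # 1800000000000
```

The second result is 1.8 trillion, which matches the gigabyte figure given in the text.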