You'll probably see a lot of CSV files in the workplace, or generate them from the vast ocean of spreadsheets that are floating around the average office. But that won't always be the case, and sometimes you're going to need to tap directly into an existing database or build your own. So how do you practice working with a database?
An Open Source project whose parent company is Oracle, MySQL also happens to be nearly ubiquitous. It's easily one of the most popular database solutions, in use everywhere from mom and pop shops to government entities to major enterprises. It's free and not too difficult to install or use, so it's ideal for our purposes in the lab.
We'll need to take the following steps to get the sales data into MySQL:
- Download MySQL and install
- Start the MySQL database server
- Start the MySQL command line
- Create a database and table with a schema that matches the field in our CSV
- Import the CSV, ignoring the headers
As you'll see, none of these steps are particularly difficult.
Download and Install
You can download MySQL from SourceForge by clicking here. Don't go to MySQL.com unless you want to sign up for an account with Oracle and give them your name, address, telephone number, etc.
As you can see from the screenshot, SourceForge figured out I need the installer for OSX. There is also download for Windows, as well as nearly every flavor of Linux. If you want a different version from what the downloader chooses for you, just click on the "Browse all Files" link below the green button and choose whatever you need.
Since I'm using the Mac OSX DMG version, I'll just need to run the installer by double-clicking it and accepting the defaults:
Installation on Linux is a little more involved. You can find the instructions by clicking here. Note that on some distributions of Linux, MySQL is pre-installed. For example, on CentOS and Red Hat, it's as simple as:
sudo yum install mysql-server
I don't recommend using Windows for your lab, but if you insist go watch this video to learn how to install MySQL.
If you run into any problems go here for assistance with troubleshooting on Linux, or go here for assistance with troubleshooting on Mac.
Starting the Database
Once MySQL is installed you can start the database by issuing the following command on OSX:
sudo /usr/local/mysql/support-files/mysql.server start
For Linux use one of these instead (depending on your distribution):
service mysqld start
If neither of those work, use "mysql" instead of "mysqld"
Securing Your Installation
Securing MySQL isn't particularly difficult. Type the following:
and select the defaults for all the options. Easy.
Starting the CLI
Now that we have the database installed and running, we'll need to start MySQL's command line. Typing MySQL commands into the server's shell won't get us anywhere.
mysql -u root -p
...and then enter the password.
mysql -u root -p
Note: I know this is just a lab, but your should assign a password to the root account and probably create a different account for yourself. Otherwise MySQL becomes a security risk on the server.
Creating the Database and Table
From the MySQL CLI we can create a table with a schema to match our data. To do so, issue the following commands (the same in OSX and Linux):
CREATE DATABASE IF NOT EXISTS sales;
Once the database is created, we have to specify that's the one we want to work on.
Now we can create the table for our data. While doing so we'll have to specify the name and data type of each column.
CREATE TABLE sales1 (Transaction_date DATETIME,Product TEXT,Price DECIMAL (15,2),Payment_Type TEXT,Name TEXT,City TEXT,State TEXT,Country TEXT,Account_Created DATETIME,Last_login DATETIME,Latitude TEXT, Longitude TEXT);
There are several data types available in MySQL, and choosing the right one for the data in a column is critical for avoiding future problems. You can find a list of the types and their meanings here.
To verify that the table was created correctly, ask MySQL to describe it:
mysql> DESCRIBE sales1;
| Field | Type | Null | Key | Default | Extra |
| Transaction_date | datetime | YES | | NULL | |
| Product | text | YES | | NULL | |
| Price | decimal(15,2) | YES | | NULL | |
| Payment_Type | text | YES | | NULL | |
| Name | text | YES | | NULL | |
| City | text | YES | | NULL | |
| State | text | YES | | NULL | |
| Country | text | YES | | NULL | |
| Account_Created | datetime | YES | | NULL | |
| Last_login | datetime | YES | | NULL | |
| Latitude | text | YES | | NULL | |
| Longitude | text | YES | | NULL | |
12 rows in set (0.29 sec)
Importing the CSV
Finally, we can import the CSV into our database. We won't need the headers in the CSV anymore, so to import while excluding those we'll issue the following command:
LOAD DATA LOCAL INFILE '/path/to/file/Sales.csv' INTO TABLE sales1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r' IGNORE 1 LINES;
Note that since I'm using a Mac I specified that lines are terminated by '\r'. If that doesn't work for you, use the following command to delete the data from your table:
DELETE from sales1
Then try the LOAD command again, specifying '\n' for field termination.
Verifying Your Work
To verify that the data imported to the table correctly, we can look at the first 10 rows:
SELECT * FROM sales1 LIMIT 10
You should now have a working database with a table that is populated with our data. Soon we'll look at how to access that data from external applications, and even how to import it into Hadoop. Stay tuned.