Postgres and MySQL are great Open Source databases in use by organizations of all sizes. Both of them are flexible and somewhat scalable. Both of them have nice GUI front-ends available to make them even easier to use.
Sequel Pro for MySQL is a handy GUI frontend that runs on Macs, and one of the things I like most is that it will infer the schema of a CSV on import. This saves a lot of time since it’s not necessary to create the table and define the columns ahead of time. Unfortunately Sequel Pro can also be a bit buggy.
pgAdmin is a similar tool designed for Postgres, and is more stable than Sequel Pro in daily use. But like a great many other database tools, pgAdmin won’t infer schema (and if I’m wrong about that, please comment below). So if you need to import a CSV with a lot of columns and you’d rather not define them all yourself, what can you do?
Back to the Command Line
One of the most useful means to deal with the problem is to use Pandas. Pandas dataframes will infer schema when you read in a CSV, and the “to_sql” function will load dataframes into a few different databases. Here’s an example of Pandas reading a CSV located at /Users/me/Desktop/mydata.csv and then loading it into a Postgres table named yourdata in a database called hisdb.
First we have to import Pandas, and “create_engine” from sqlalchemy:
from sqlalchemy import create_engine
import pandas as pd
Now we use create_engine to create a connection to the Postgres database:
engine = create_engine(‘postgresql://user:password@localhost/hisdb’)
Then we can load our CSV into a Pandas datframe, and use the “to_sql” to load that dataframe into a table called “yourdata” over the engine connection:
df = pd.read_csv(‘/Users/me/Desktop/mydata.csv’)
Done and done. On occasion it will be necessary to change the data type of a column because Pandas guessed wrong, but that’s a minor inconvenience when compared to manually defining dozens of columns ahead of time.