In the data science world, some of the best stuff is free. I’ve already posted about free books and some of the better videos on YouTube, so now let’s put together a list of software tools. Some of these are limited versions of commercial software. Others, like R, are Open Source packages that have become the go-to standards in their area. Enjoy.
Tableau Public – Very limited version of a very popular visualization package. Great for learning to assemble dashboards but doesn’t allow local storage of data, which limits its usefulness.
RapidMiner Studio – If you want to move past visualization and get into data mining, it’s hard to do better than RapidMiner. The free download won’t import anything other than text files or CSVs and it’s limited to 1 GB of memory, but when you think about it, there’s a lot you can do with that. You can also get a two-week trial of the full version. The best part? Very little coding knowledge is necessary.
WEKA – Similar in purpose to RapidMiner, but completely different in use. Written in Java and runs on almost any operating system.
KNIME – Yet another free data mining app. Can use all of the modules in WEKA, and also incorporates plugins that allow integration with R. Powerful stuff, and a go-to tool for even the biggest companies.
Cloudera Quickstart – All of the Big Three (Cloudera, MapR, and Hortonworks) have free versions of their distributions because Hadoop is Open Source. I’m partial to the Cloudera distribution, so I’ve included it in this list. Includes such gems as Hive, HBase, and Spark.
MySQL – Sooner or later you’re going to need to work with a database, and there are few free choices in such widespread use as MySQL.
MySQL Workbench – A GUI for MySQL. Optional, but extremely useful.
R – This free software package remains the most popular programming language for data science, despite the recent surge in popularity of Python. If you’re going to be a data scientist, sooner or later you’re going to need R skills.
R Commander – A popular GUI for R. Comes as a collection of plugins for different uses.
RStudio – If the idea of developing in R from the command line makes you want to poke yourself in the eye with a sharp stick, save yourself the pain and get an IDE (RStudio) instead.
Continuum Anaconda – A powerful collection of Python software in one download, including pandas, NumPy, scikit-learn, and Jupyter.
Feel free to add any I’ve missed in the comments below.