When you think about becoming a Data Scientist, one of the first questions that will come to mind is: What do I need to start, and what tools do I need?
Well, today I’ll share my secret tools, and over a series of posts, I’ll try to make you proficient in them.
Basically, I don’t use super fancy tools like the ones on CSI.
I use Excel, Notepad, R, and Python; the last two are really popular nowadays. I have also used the full Microsoft BI stack (SSRS, SSIS, SSAS), JENA, ENCOG, RapidMiner, and more.
When I started at my latest company, I was introduced to a little gem called KNIME.
Surprisingly enough, KNIME was already rated quite highly by that time, but I hadn’t had the time to explore it before.
But being wholly open source, unlike its peers, KNIME really outshines the others.
What Gartner Says about KNIME:
KNIME (the name stands for “Konstanz Information Miner”) is based in Zurich, Switzerland. It offers a free, open-source, desktop-based advanced analytics platform. It also provides a commercial, server-based solution providing additional enterprise functionality that can be deployed on-premises or in a private cloud. KNIME competes across a broad range of industries but has a large client base in the life sciences, government and services sectors.
- Almost every KNIME customer mentions the platform’s flexibility, openness, and ease of integration with other tools. Similar to last year, KNIME continues to receive among the highest customer satisfaction ratings in this Magic Quadrant.
- KNIME stands out in this market with its open-source-oriented go-to-market strategy, large user base and active community — given the small size of the company.
- Many customers choose KNIME for its cost-benefit ratio, and its customer reference ratings are among the highest for good value.
- The most common customer complaints are about the outdated UI (which was recently updated in version 3.0 in October 2015, so few customer references have seen it) and a desire for better-performing algorithms for a distributed big data environment.
- Customers also expect a high level of interactive visualizations from their tools. KNIME lacks in this area, requiring its customers to obtain this from data visualization vendors such as Tableau, Qlik or TIBCO Spotfire.
- Some customers are looking for better insight into and communication of the product roadmap, but they do give KNIME high scores on including customer requests into subsequent product releases.
Read more here.
I strongly encourage you to download and get familiar with KNIME.
Do I use only KNIME as a Data Science Platform?
No. The beauty of KNIME is that it can easily integrate with external solutions such as Weka, R, and Python. KNIME is very solid at building predictive models, but sometimes I build models in R because I find libraries that I personally think are better than KNIME’s native ones.
After that, I integrate the R models into KNIME using its R nodes. Works like a charm.
Use Database systems too. Please.
One thing that I have to be careful about while using these platforms is their memory consumption. R drains the computer’s memory because it loads everything into a memory buffer. KNIME is similar. Therefore, I use a database system to filter the data before I load it into R or KNIME.
Database systems are a must. If you want to build effective and fast models, you need to let the database system handle the vast amount of data first and then load the result into the analytical platform. Which platform you should use really depends on your company policy. It is terrific if you have a distributed system like Hadoop, where you can run SQL operations on big data and then send limited datasets to KNIME and R.
It will be fast, and it will save you from painful experiences like filling up the memory buffer.
However, if you don’t have a distributed database system, a conventional database system will do as well. A major task of a data scientist is making the standard database system work with your data too 🙂
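To make the "filter in the database first" idea concrete, here is a minimal sketch in Python using the built-in sqlite3 module as a stand-in for a conventional database. The table name `sales`, its columns, and the helper functions are all hypothetical and invented for illustration; the point is only that the WHERE and GROUP BY run inside the database, so the analytical tool receives a small summary instead of the full table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, made-up 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
    rows = [
        ("north", 2015, 120.0),
        ("north", 2016, 80.0),
        ("south", 2016, 200.0),
        ("south", 2016, 50.0),
        ("west", 2014, 999.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def load_filtered(conn, year):
    """Let the database filter and aggregate; return only the small result."""
    query = """
        SELECT region, SUM(amount) AS total
        FROM sales
        WHERE year = ?
        GROUP BY region
        ORDER BY region
    """
    # The year is passed as a parameter rather than pasted into the SQL text.
    return conn.execute(query, (year,)).fetchall()


conn = build_demo_db()
summary = load_filtered(conn, 2016)
print(summary)  # [('north', 80.0), ('south', 250.0)]
```

With a real database you would swap the connection for your own driver and keep the same pattern: push the filtering and aggregation into SQL, then hand only the reduced result to R or KNIME.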
What is next?