Data scientists: How can I use my data?

I have been asked numerous times: "I got access to the database, can you please tell me how to use the data?"

Since I am asked this more and more often, I took some time to answer it here and help people with this question.

What do you want to do with your data?

First and foremost, what are you trying to do with the data?
Ask yourself, your manager, your friend or whoever is making you do something with the data: what do you want the data to show you?
Most of the time, data is only as powerful as your understanding of it. Here is how you can build that understanding:

Understand how your data is linked

Databases, whether relational or non-relational, have some kind of schema. The schema shows where specific attributes are stored (in which tables or objects) and how those tables or objects are linked to each other.

What is an attribute? An attribute is basically anything descriptive: name, surname, address, profit, etc.
What is a table or object? A table or object is the structure that holds or groups the attributes.
What are links or keys? Links, keys and foreign keys are the pieces of information that let you link one table to another. For example, if you want to link a profit to a salesperson, or an address to a person, you join the tables. Mostly this is done using a foreign key, or by joining on several attributes together, which form a composite key.
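
To make the idea of links concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (sales_person, sale, address), their columns and the sample rows are all made up for illustration; your own database will look different. The first query links profit to a salesperson through a foreign key, the second links an address to a person through a composite key made of name and surname.

```python
import sqlite3

# Hypothetical in-memory database; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        surname TEXT
    );
    CREATE TABLE sale (
        id INTEGER PRIMARY KEY,
        sales_person_id INTEGER REFERENCES sales_person(id),  -- foreign key
        profit REAL
    );
    CREATE TABLE address (
        name TEXT,
        surname TEXT,
        city TEXT,
        PRIMARY KEY (name, surname)  -- composite key
    );
    INSERT INTO sales_person VALUES (1, 'Ana', 'Smith'), (2, 'Ben', 'Jones');
    INSERT INTO sale VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0);
    INSERT INTO address VALUES ('Ana', 'Smith', 'Oslo'), ('Ben', 'Jones', 'Lyon');
""")

# Link profit to a salesperson through the foreign key.
profit_rows = conn.execute("""
    SELECT p.name, p.surname, SUM(s.profit) AS total_profit
    FROM sale AS s
    JOIN sales_person AS p ON p.id = s.sales_person_id
    GROUP BY p.id, p.name, p.surname
    ORDER BY p.id
""").fetchall()

# Link an address to a person by joining on two attributes at once.
address_rows = conn.execute("""
    SELECT p.name, p.surname, a.city
    FROM sales_person AS p
    JOIN address AS a ON a.name = p.name AND a.surname = p.surname
    ORDER BY p.id
""").fetchall()

print(profit_rows)
print(address_rows)
```

The foreign key join is usually the safer of the two, since names can repeat while an id normally cannot.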

How to learn the links between the data

Some people try to learn the schema all at once, looking at the tables, their attributes and how they link to each other.
My suggestion is to learn by doing. Lately, in the world of big data, database systems have become too complex to be learnt all at once, and most of the time you won't need to know it all. Learning through practical use cases helps you understand not just the table structure but also the underlying data.
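
One way to learn by doing is to ask the database itself what it contains, one table at a time. The sketch below assumes a SQLite database and uses Python's built-in sqlite3 module; the helper name describe and the tables person and sale are hypothetical. Other database systems expose the same kind of information differently, for example through information_schema views.

```python
import sqlite3

def describe(conn):
    """Return columns and foreign-key links for every table in a SQLite database."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        links = [
            (row[3], row[2], row[4])
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "links": links}
    return schema

# Hypothetical example database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id),
        profit REAL
    );
""")

schema = describe(conn)
for table, info in schema.items():
    print(table, info["columns"], info["links"])
```

Each link is reported as (local column, referenced table, referenced column), which is enough to follow one use case from table to table without memorising the whole schema.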

How do I get the data?

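Usually we use SQL to query databases. SQL is usually the most direct and often the best-performing way to do it, because the filtering and aggregation happen inside the database and only the result comes back to you.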
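
As a small illustration of letting the database do the work, the sketch below runs a filtered, aggregated SQL query from Python's built-in sqlite3 module. The sale table, its columns and the function name profit_by_region are invented for the example. The year is passed as a parameter (the ? placeholder) rather than pasted into the query text.

```python
import sqlite3

# Hypothetical sample data; replace with a connection to your own database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (region TEXT, year INTEGER, profit REAL);
    INSERT INTO sale VALUES
        ('North', 2023, 120.0), ('North', 2024, 80.0),
        ('South', 2024, 200.0), ('South', 2024, 50.0);
""")

def profit_by_region(conn, year):
    """Total profit per region for one year, computed inside the database."""
    query = """
        SELECT region, SUM(profit) AS total_profit
        FROM sale
        WHERE year = ?
        GROUP BY region
        ORDER BY total_profit DESC
    """
    return conn.execute(query, (year,)).fetchall()

result = profit_by_region(conn, 2024)
print(result)
```

Only one row per region travels back to your program, no matter how many sales the table holds.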

Other ways include using code: Java, .NET, R, Python and many others.
Excel: you can easily query data with Excel while creating a pivot table.
Lately, data scientists are using tools like KNIME and Alteryx to fetch the data. This approach does not require knowing any query language, but if the table you are querying is big, you risk downloading gigabytes of data into your memory or onto your disk.
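
Whichever tool or language you use, one way to reduce the risk of pulling a huge table into memory is to read the result in batches. This is a minimal sketch with Python's built-in sqlite3 module; the helper iter_batches, the sale table and the batch size are assumptions for illustration, and other database drivers offer similar but not identical cursor methods.

```python
import sqlite3

def iter_batches(conn, query, batch_size=1000):
    """Yield query results in lists of at most batch_size rows."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch

# Hypothetical table with 2500 rows standing in for a large one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, profit REAL)")
conn.executemany(
    "INSERT INTO sale (profit) VALUES (?)",
    [(float(i),) for i in range(2500)],
)

total = 0.0
batch_sizes = []
for batch in iter_batches(conn, "SELECT profit FROM sale"):
    batch_sizes.append(len(batch))  # never more than batch_size rows at once
    total += sum(profit for (profit,) in batch)

print(batch_sizes, total)
```

Adding a LIMIT clause to a first exploratory query is an even simpler way to peek at a big table before fetching more.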

Use your imagination.

Once you have your data, start using your imagination and think of use cases that will help your business.

Visualize your data

Plain data is boring, which is why we visualize it.

An easy way to do that is with Excel. Excel is really powerful by itself and can create pretty charts.
Some other popular tools are Tableau, QlikView, Jaspersoft and Highcharts.
Of course, there are endless other solutions that can create pretty charts.
One word of caution about visualization: don't over-visualize, because things can become really confusing. Also try to use a few colors instead of the whole color palette, so other people can follow you.


Create a story with your data

Now that you have your use case and a cool visualization, try to create a story.
People will understand what you want to say, and may even get new ideas, if you present your analysis as a nice story.
Happy data mining, now that you know what to do with your data!
