Here is what you should NOT do when you start studying machine learning in Python.
- Get really good at Python programming and Python syntax.
- Deeply study the underlying theory and parameters for machine learning algorithms.
- Avoid or lightly touch on all of the other tasks needed to complete a real project.
I think that this approach can work for some people, but it is a really slow and roundabout way of getting to your goal. It teaches you that you need to spend all your time learning how to use individual machine learning algorithms. It does not teach you the process of building predictive machine learning models in Python that you can actually use to make predictions.
Sadly, this is the approach used to teach machine learning that I see in almost all books and online courses on the topic.
Lessons: Learn how the sub-tasks of a machine learning project map onto Python and the best practice way of working through each task.
Projects: Tie together all of the knowledge from the lessons by working through case study predictive modeling problems.
Recipes: Apply machine learning with a catalog of standalone recipes in Python that you can copy-and-paste as a starting point for new projects.
You need to know how to complete the specific subtasks of a machine learning project using the Python ecosystem. Once you know how to complete a discrete task using the platform and get a result reliably, you can do it again and again on project after project. Let’s start with an overview of the common tasks in a machine learning project. A predictive modeling machine learning project can be broken down into six top-level tasks:
- Define Problem: Investigate and characterize the problem in order to better understand the goals of the project.
- Analyze Data: Use descriptive statistics and visualization to better understand the data you have available.
- Prepare Data: Use data transforms in order to better expose the structure of the prediction problem to modeling algorithms.
- Evaluate Algorithms: Design a test harness to evaluate a number of standard algorithms on the data and select the top few to investigate further.
- Improve Results: Use algorithm tuning and ensemble methods to get the most out of well-performing algorithms on your data.
- Present Results: Finalize the model, make predictions and present results.
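To make the six tasks concrete, here is a minimal sketch of how they might map onto Python code. It is an illustration only, not a recipe from the lessons: it assumes scikit-learn and pandas are installed, and the choice of the bundled iris dataset, the three candidate algorithms, the parameter grids, and the helper name make_pipeline_for are all assumptions made for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# 1. Define Problem: predict the iris species from four flower measurements
#    (dataset chosen only for illustration).
iris = load_iris()
X, y = iris.data, iris.target

# 2. Analyze Data: descriptive statistics and the class distribution.
frame = pd.DataFrame(X, columns=iris.feature_names)
summary = frame.describe()
class_counts = pd.Series(y).value_counts()

# Hold back a test set that is only used in the final step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7)


# 3. Prepare Data: standardize the inputs inside a pipeline so the scaler
#    is fit on the training folds only (hypothetical helper).
def make_pipeline_for(model):
    return Pipeline([("scale", StandardScaler()), ("model", model)])


# 4. Evaluate Algorithms: one test harness (5-fold cross-validation)
#    applied to a few standard algorithms.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "decision_tree": DecisionTreeClassifier(random_state=7),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
scores = {
    name: cross_val_score(make_pipeline_for(model), X_train, y_train,
                          cv=cv, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

# 5. Improve Results: tune the best-scoring algorithm with a small grid
#    (the grids below are illustrative assumptions).
param_grids = {
    "logistic_regression": {"model__C": [0.1, 1.0, 10.0]},
    "knn": {"model__n_neighbors": [3, 5, 7, 9]},
    "decision_tree": {"model__max_depth": [2, 3, 4, None]},
}
grid = GridSearchCV(make_pipeline_for(candidates[best_name]),
                    param_grids[best_name], cv=cv, scoring="accuracy")
grid.fit(X_train, y_train)

# 6. Present Results: finalize the model and predict on the held-back data.
final_model = grid.best_estimator_
predictions = final_model.predict(X_test)
test_accuracy = accuracy_score(y_test, predictions)

print(summary)
print(scores)
print(best_name, grid.best_params_, test_accuracy)
```

Each numbered comment corresponds to one task in the list above. On a real project each step would be far more involved, but the overall shape of the code tends to stay the same, which is why it can be reused from one project to the next.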