Data Scientists Need to Become Adaptive Thinkers


Google introduced AutoML, an automated machine learning system that can produce an artificial intelligence solution without the help of a human engineer. IBM Cloud and Amazon Web Services (AWS) offer machine learning options that do not need AI developers. GitHub and other cloud platforms already provide thousands of machine learning programs, reducing the need to have an AI professional at hand. These cloud platforms will slowly but surely decrease the need for artificial intelligence developers. Google Cloud’s AI provides automated machine learning services. Microsoft Azure offers easy-to-use machine learning interfaces. At the same time, Massive Open Online Courses (MOOCs) are thriving everywhere. Anybody anywhere can pick up a machine learning solution on GitHub, follow a MOOC without even going to college, and beat any engineer to the job. Today, artificial intelligence is mainly mathematics translated into source code, which makes it difficult to learn for traditional developers. That is the main reason why Google, IBM, Amazon, Microsoft, and others have ready-made cloud solutions that will need fewer engineers in the future. As you will see, you can occupy a central role in this brand-new world as an adaptive thinker. There is no time to waste. In this article, we are going to dive quickly and directly into reinforcement learning, one of the pillars of Google Alphabet’s DeepMind assets (the other being neural networks).
Reinforcement learning often uses the Markov Decision Process (MDP). An MDP is a memoryless, unlabeled action-reward system with a learning criterion. Its central equation, the Bellman equation (frequently written as the Q-function), was used to beat first-rate Atari players. The objective here is not to take the easy path. We’re aiming to break complexity into understandable parts and confront them with reality.
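To make the Bellman idea concrete, here is a minimal, hypothetical sketch of tabular Q-learning on a four-state chain (a toy environment, not DeepMind's Atari setup): the agent nudges Q(s, a) toward r + γ·max Q(s′, ·) through trial and error.

```python
import random

# Hypothetical toy MDP: states 0..3 form a chain; action 0 moves left,
# action 1 moves right. Reaching state 3 yields reward 1 and ends the episode.
N_STATES, ACTIONS = 4, (0, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action):
    """Deterministic environment transition: (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=2000, seed=0):
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while True:
            # epsilon-greedy action selection: mostly exploit, sometimes explore
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Bellman / Q-learning update: move Q(s,a) toward r + gamma * max Q(s',.)
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
            if done:
                break
            s = s2
    return Q

Q = train()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # the learned greedy policy should always move right, toward the reward
```

The three-dimensional approach discussed below maps onto exactly this exercise: describe the chain in words, write the Bellman update mathematically, then implement it.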
You are going to find out right from the start how to apply an adaptive thinker’s process that will lead you from an idea to a solution in reinforcement learning, and right into the center of gravity of Google’s DeepMind projects.

I wrote before about what the most important soft skills for data scientists are. Adaptive thinking is one more.

How to be an adaptive thinker?

Reinforcement learning, one of the foundations of machine learning, expects to learn through trial and error by interacting with an environment. Does this sound familiar? That is what we humans do all our lives, sometimes painfully! Try things, evaluate, and then continue; or try something else. In real life, you are the agent of your thought process. In a machine learning model, the agent is the function computing through this trial-and-error process. This thought process in machine learning is the MDP. This form of action-value learning is often called Q. To master the outcomes of MDP in theory and practice, a three-dimensional approach is a requirement. The three-dimensional technique that will make you an AI expert, in simple terms, means:

  • Starting by describing a problem to solve with real-life cases
  • Then, developing a mathematical model
  • Then, writing source code and/or using a cloud platform solution

It is a way for you to go into any project with an adaptive mindset from the beginning.

Addressing real-life issues before coding a solution

You can find tons of source code and examples on the web. However, most of them are toy experiments that have nothing to do with real life. For example, reinforcement learning can be applied to an e-commerce delivery service, a self-driving vehicle, or a drone. You will find a program that calculates a drone delivery. However, it has many limits that need to be overcome. As an adaptive thinker, you are going to ask some questions:
What if there are 5,000 drones over a major city at the same time? Is a drone-jam legal?
What about the noise over the city?
What about tourism?
What about the weather?
Weather forecasts are difficult to make, so how is this scheduled?
In just a few minutes, you will be at the center of attention, between theoreticians who know more than you on one side and angry managers who want solutions they cannot get on the other. Your real-life approach will solve these problems.

A foolproof method is the practical three-dimensional approach:

  • Be a subject matter expert: First, you have to be a subject matter expert. If a theoretician geek comes up with a hundred Google DeepMind TensorFlow functions to solve a drone trajectory problem, you now know it is going to be a hard ride if real-life parameters are considered. An SME knows the subject and hence can rapidly identify the crucial factors of a given field. Artificial intelligence often requires finding a solution to a hard problem that even an expert in a given field cannot express mathematically. Machine learning sometimes means finding a solution to a problem that humans do not know how to explain. Deep learning, involving complex networks, solves even more difficult problems.
  • Have enough mathematical knowledge to understand AI concepts: Once you have the proper natural-language analysis, you need to build your abstract representation quickly. The best way is to look around in your everyday life and make a mathematical model of it. Mathematics is not an option in AI, but a prerequisite. The effort is worthwhile. Then, you can start writing solid source code or begin implementing a cloud platform ML solution.
  • Know what source code is about, as well as its potential and limits: MDP is an excellent way to start working in the three dimensions that will make you adaptive: describing what is around you in detail in words, translating that into mathematical representations, and then implementing the result in your source code.

Change and uncertainty are the only constants. The ability to change behavior when faced with unpredicted circumstances is crucial in the technological future unfolding around us. The Internet and social media have changed the way we connect and communicate. Machines are taking over jobs in the service industry, and global outsourcing is the new normal. As a result, both high- and low-skilled jobs are now flooding the market. One essential thing both have in common is the need for workers to develop novel and adaptive thinking in order to survive in the fast-paced, fast-changing global world we now live in.

Daily, we are confronted with new possibilities and unpredictability. The ability to think through problems and act swiftly while negotiating fear of the unknown is the foundation of novel and adaptive thinking.

The more you practice adaptive thinking, the easier it will become. Follow these steps and you will surely be on your way to perfecting a powerful skill for the workplace.


Reasons Why Your Data Science Project is Likely to Fail

Businesses are forging ahead with digital transformation at an unmatched rate. A recent survey by Gartner Research found that 49 percent of CIOs report that their company has already changed its business model to scale its digital undertakings or is in the process of doing so.

As companies forge ahead with these changes, they are infusing data science and machine learning into various business functions. This is not a simple job. A typical enterprise data science project is extremely complicated and requires assembling an interdisciplinary team of data engineers, developers, data scientists, subject matter experts, and people with other special skills and knowledge.

Additionally, this talent is limited and costly. In fact, only a small number of companies have actually succeeded in building a skilled data science practice. And, while building this team takes time and resources, there is an even more significant problem faced by many of these companies: more than 85 percent of big data projects fail.

A variety of factors contribute to these failures, including human factors and challenges with time, skill, and impact.

Lack of Resources to Execute Data Science Projects

Data science is an interdisciplinary practice that involves mathematicians, statisticians, data engineers, software engineers, and, notably, subject matter experts. Depending on the size and scope of the project, companies may deploy several data engineers, a solution architect, a domain expert, a data scientist (or several), business analysts, and perhaps additional resources. Many businesses do not have, or cannot afford to deploy, sufficient resources, because hiring such talent is becoming increasingly challenging and because a business often has many data science projects to carry out, all of which take months to complete.

Heavy Dependence on the Skills and Experience of Particular People

Traditional data science relies heavily on the skills, experience, and intuition of experienced people. In particular, the data and feature engineering process today is mostly based on the manual effort and instincts of domain experts and data scientists. Although such gifted individuals are valuable, practices that rely on them are not sustainable for enterprise businesses, given the challenge of hiring such skilled talent. As such, companies need to seek solutions that help democratize data science, allowing more individuals with different skill levels to execute projects effectively.

Misalignment of Technical and Company Expectations

Most data science projects are carried out to provide crucial insights to the business group. Nevertheless, a project often begins without precise alignment between the business and data science teams on the expectations and goals of the project. As a result, the data science team focuses primarily on model accuracy, while the business team is more interested in metrics such as financial benefits, business insights, or model interpretability. In the end, the business team does not accept the results of the data science team.

Data science projects require a long turnaround time and large upfront effort without visibility into the potential value

Among the most significant obstacles of data science projects is the large upfront effort required, despite a lack of visibility into the eventual outcome and its business value. The traditional data science process takes months to finish before the result can be evaluated. In particular, the data and feature engineering process needed to transform business data into a machine-learning-ready format takes a huge amount of iterative effort. The long turnaround time and significant upfront effort associated with this approach often lead to project failure after months of investment. As a result, business executives are reluctant to commit more resources.

Absence of Architectural Consideration for Production and Operationalization of Data Science Projects

Numerous data science projects begin without consideration for how the developed pipelines will be deployed in production. This happens because the production pipeline is often handled by the IT group, which lacks insight into the data science process, while the data science team is concentrated on verifying its hypotheses and lacks an architectural view of production and solution integration. As a result, instead of getting integrated into the pipeline, many data science projects wind up as one-time, proof-of-concept exercises that fail to deliver real business impact or cause substantial cost increases to productionalize.

End-to-end Data Science Automation is a Solution

The pressure to achieve higher ROI from artificial intelligence (AI) and machine learning (ML) initiatives has pushed more business leaders to look for innovative solutions for their data science pipeline, such as machine learning automation. Picking the right solution that delivers end-to-end automation of the data science process, including automated data and feature engineering, is the key to success for a data-driven business. Data science automation makes it possible to perform data science processes faster, often in days instead of months, with more transparency, and to deliver minimum viable pipelines that can be improved continuously. As a result, companies can quickly scale their AI/ML initiatives to drive transformative business changes.
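To make "automated data and feature engineering" concrete, here is a hedged, minimal sketch (a toy, not any vendor's actual product) that auto-generates aggregate features from a hypothetical transactions table:

```python
from collections import defaultdict

# Hypothetical raw business data: one row per transaction.
transactions = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 5.0},
]

def auto_features(rows, key, value):
    """Automatically derive count/sum/mean/max aggregates per entity,
    a tiny stand-in for the iterative manual feature engineering step."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {
            f"{value}_count": len(v),
            f"{value}_sum": sum(v),
            f"{value}_mean": sum(v) / len(v),
            f"{value}_max": max(v),
        }
        for k, v in groups.items()
    }

features = auto_features(transactions, "customer", "amount")
print(features["a"]["amount_mean"])  # → 20.0
```

Real automation tools generate and select far richer transformations, but the principle is the same: machine-learning-ready features derived from business data without manual, per-feature effort.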
However, data science and machine learning automation can bring new types of problems, which is why I wrote before that guided analytics are the future of data science and AI.

Practical Predictive Analytics in everyday enterprises

Predictive analytics is now part of the analytics fabric of companies. Even as companies continue to adopt predictive analytics, many are struggling to make it stick. Many organizations have not thought about how to practically put predictive analytics to work, given the organizational, technology, process, and deployment concerns they face.

These can be some of the biggest challenges organizations face today:

Skills development. Organizations are concerned about skills for predictive modeling. These skills include understanding how to train a model, interpret output, and determine which algorithm to use in which situation. Skills are the most significant barrier to adoption of predictive analytics; much of the time, this is the top difficulty.

Model deployment. Companies are using predictive analytics and machine learning across a range of use cases. Those exploring the technology are likewise preparing for a diverse set of use cases. Many, however, are not considering what it takes to build a valid predictive model and put it into production. Only a small number of data science teams have a DevOps group, or another group that puts machine learning models into production, maintains versioning, or monitors the models. From experience working in this team structure, it can take months to put models into production.
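To illustrate what that missing versioning-and-production function involves, here is a minimal, hypothetical model registry; a real team would reach for a dedicated tool rather than hand-rolling this:

```python
class ModelRegistry:
    """Toy registry: stores model versions and tracks which one is in production."""

    def __init__(self):
        self._versions = {}    # version number -> model (here, any callable)
        self._production = None

    def register(self, model):
        """Store a new model and return its version number."""
        version = len(self._versions) + 1
        self._versions[version] = model
        return version

    def promote(self, version):
        """Mark an existing version as the production model."""
        if version not in self._versions:
            raise KeyError(f"unknown model version {version}")
        self._production = version

    def predict(self, x):
        """Serve a prediction from whichever version is in production."""
        if self._production is None:
            raise RuntimeError("no model in production")
        return self._versions[self._production](x)

registry = ModelRegistry()
v1 = registry.register(lambda x: 2 * x)        # first trained model
v2 = registry.register(lambda x: 2 * x + 1)    # retrained version
registry.promote(v2)
print(registry.predict(10))  # → 21
```

The point of the sketch: versioning and promotion are explicit operations, so "which model is live, and since when" is always answerable, which is exactly what ad-hoc deployment loses.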

Infrastructure. On the infrastructure side, the vast majority of companies use the data warehouse, along with a variety of other technologies such as Hadoop, data lakes, or the cloud, for developing predictive models. The good news is that businesses appear to be looking to broaden their data platforms to support predictive analytics and machine learning. The move to a modern data architecture that supports diverse types of data makes sense and is required to succeed in predictive analytics.

New Practices for Predictive Analytics and Machine Learning

Since predictive analytics and machine learning skills are in such high demand, vendors are offering tooling to help make predictive modeling easier, particularly for new users. Essential to ease of use are these features:

  • Collaboration features. Anyone from a business analyst to a data scientist building a model often wants to collaborate with others. A business analyst may want to get input from a data scientist to validate a model or help build a more sophisticated one. Vendors provide collaboration features in their software that enable users to share or comment on models. Collaboration among analysts is an important best practice to help democratize predictive analytics.
  • Workflows and versioning. Many products provide workflows that can be saved and reused, including data pipeline workflows for preparing the data as well as analytics workflows. If a data scientist or another model builder develops a model, others can reuse it. This often includes a point-and-click interface for model versioning, crucial for keeping track of the latest models and model history, and for analytics governance.
  • GUIs. Many users do not like to program or even write scripts; this spurred the movement toward GUIs (graphical user interfaces) decades ago in analytics products. Today’s GUIs typically offer a drag-and-drop, point-and-click interface that makes it easy to construct analytics workflows. Nodes can be picked, defined, dragged onto a canvas, and linked to form a predictive analytics workflow. Some vendor GUIs enable users to plug in open source code as a node in the workflow. This supports models built in R or Python, for example.
  • Persona-driven features. Different users want different interfaces. A data scientist may want a notebook-based interface, such as Jupyter notebooks (e.g., “live” web coding and collaboration interfaces), or just a programming interface. A business analyst may prefer a GUI, or a natural-language-based interface to ask questions quickly and discover insights (even predictive ones) in the data. New analytics platforms have tailored environments to satisfy the requirements of various personas while maintaining reliable data integrity beneath the platform. This makes building models more efficient.
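The node-and-workflow idea behind these GUIs can be sketched in plain code: each node is a function, and the workflow composes them into a reusable pipeline (a simplified, hypothetical analogue of the drag-and-drop canvas, not any vendor's engine):

```python
def workflow(*nodes):
    """Compose nodes left-to-right into a reusable analytics workflow."""
    def run(data):
        for node in nodes:
            data = node(data)
        return data
    return run

# Hypothetical nodes: clean the rows, derive a feature, then score.
drop_missing = lambda rows: [r for r in rows if r.get("value") is not None]
add_double = lambda rows: [{**r, "double": r["value"] * 2} for r in rows]
score = lambda rows: sum(r["double"] for r in rows)

pipeline = workflow(drop_missing, add_double, score)
print(pipeline([{"value": 1}, {"value": None}, {"value": 3}]))  # → 8
```

Because the composed pipeline is itself a value, it can be saved, versioned, and reused by other analysts, which is precisely what the workflow-and-versioning features above provide through a GUI.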

Next to read is:

What should you consider when choosing the right machine learning and AI platforms?

Important things to consider before building your machine learning and AI project

Current State of the market

In order to go in-depth on what exactly data science and machine learning (ML) tools or platforms are, why companies small and large are moving toward them, and why they matter in the Enterprise AI journey, it’s essential to take a step back and understand where we are in the larger story of AI, ML, and data science in the context of businesses:

 1. Enterprise AI is at peak hype.

Of course, the media has been talking about consumer AI for years. However, since 2018, the spotlight has turned to the enterprise. The number and type of devices sending data are skyrocketing while the cost of storing data continues to decrease, which means most businesses are collecting more data in more types and formats than ever before. Moreover, to compete and stay relevant among digital startups and other competition, these companies need to be able to use this data not only to drive business decisions but drive the business itself. Now, everyone is talking about how to make it a reality.

2. AI has yet to change businesses.

Despite the hype, the reality is that most businesses are struggling to leverage data at all, much less build machine learning models or take it one step further into AI systems. For some, it’s because they find building just one model is far more expensive and time-consuming than they planned for. However, the great majority struggle with fundamental challenges, like organizing controlled access to data or efficient data cleaning and wrangling.

3. Successful enterprises have democratized.

Those companies that have managed to make progress toward Enterprise AI have realized that it’s not one ML model that will make the difference; it’s hundreds or thousands. That means scaling up data efforts in a big way, which will require everyone at the company to be involved. Enter democratization. In August 2018, Gartner identified Democratized AI as one of the five emerging trends in its Hype Cycle for Emerging Technologies. Since then, we have seen the word “democratization” creep into the lexicon of AI-hopefuls everywhere, from the media to the board room. And, to be sure, it’s an essential piece of the puzzle when it comes to understanding data science and machine learning (ML) platforms.

Is hiring Data Scientists enough to fulfil your AI and Machine Learning goals?

Hiring for data roles is at an all-time high. As of 2019, according to career listing data, data scientist is the hottest career out there. Moreover, though statistics on Chief Data Officers (CDOs) vary, some put the figure as high as 100-fold growth in the role over the past 10 years.

Hiring data experts is a crucial element of a robust Enterprise AI strategy; however, hiring alone does not guarantee the expected outcomes, and it isn’t a reason not to invest in data science and ML platforms. For one thing, hiring data scientists is costly, often excessively so, and they’re only getting more expensive as demand for them grows.

The truth is that when the intention is to go from producing one ML model a year to tens, hundreds, or even thousands, a data team isn’t enough, because it still leaves a big swath of employees doing day-to-day work without the capability to take advantage of data. Without democratization, the output of a data team, even the very best one comprised of leading data scientists, would be limited.

As a response to this, some companies have decided to leverage their data team as sort of an internal contractor, working for lines of business or internal groups to complete projects as needed. Even with this model, the data team will need tools that allow them to scale up, working faster, reusing parts of projects where they can, and (of course) ensuring that all work is properly documented and traceable. A central data team that is contracted out can be a good short-term solution, but it tends to be a first step or stage; the longer-term model of reference is to train masses of non-data people to be data people.

Choosing the right tools for Machine Learning and AI

Open source: critical, but not always giving what you need

In order to stay on the bleeding edge of technological developments, using open source makes it easier to onboard a team and to hire. Not only are data scientists interested in growing their skills with the technologies that will be the most used in the future, but there is also less of a learning curve if they can continue to work with tools they know and love instead of being forced to learn an entirely different system. It’s important to remember that keeping up with that rapid pace of change is difficult for large corporations.
The latest innovations are usually highly technical, so without some packaging or abstraction layers that make the innovations more accessible, it’s challenging to keep everybody in the organization on board and working together.
A business might technically adopt the open source tool, but only a small number of people will be able to work with it. Not to mention that governance can be a considerable challenge if everyone is working with open source tools on their local machines without a way to have work centrally accessible and auditable.
Data science and ML platforms have the advantage of being usable right out of the box, so teams can start analyzing data from the first day. Sometimes, with open source tools (mostly R and Python), you need to assemble a lot of the parts by hand, and as anyone who’s ever done a DIY project can attest, it’s often much easier in theory than in practice. Choosing a data science and ML platform wisely (meaning one that is flexible and allows for the incorporation and continued use of open source) can allow the best of both worlds in the enterprise: cutting-edge open source technology and accessible, governable control over data projects.

What should Machine Learning and AI platforms provide?

Data science and ML platforms allow for the scalability, flexibility,
and control required to thrive in the era of Machine Learning and AI because they provide a framework for:

  • Data governance: Clear workflows and a method for group leaders to monitor those workflows and data projects.
  • Efficiency: Finding small ways to save time throughout the data-to-insights process gets the business to value much faster.
  • Automation: A specific type of efficiency gain is the growing field of AutoML, which is broadening to automation across the data pipeline to remove inefficiencies and free up people’s time.
  • Operationalization: Effective ways to deploy data projects into production quickly and safely.
  • Collaboration: A way for additional personnel working with data, many of whom will be non-coders, to contribute to data projects alongside data scientists (or IT and data engineers).
  • Self-Service Analytics: A system by which non-data experts from various business lines can access and work with data in a governed environment.

Some things to consider before choosing an AI and Machine Learning platform

Governance is becoming more challenging

With the quantity of data being accumulated today, data safety and security (particularly in certain sectors like finance) are crucial. Without a central place to access and collaborate on data with proper user controls, data may end up stored across different individuals’ laptops. And if an employee or contractor leaves the company, the risks rise, not only because they could still have access to sensitive data, but because they might take their work with them and leave the team to start from scratch, unsure of what that individual was working on. On top of these concerns, today’s enterprise is afflicted by shadow IT: for years, different divisions have invested in all kinds of different technologies and are accessing and using data in their own ways, so much so that even IT teams today do not have a central view of who is using what, and how. It’s a problem that becomes dangerously amplified as AI efforts scale, and it points to the need for governance on a larger and more fundamental scale across all business lines.

AI Needs to Be Responsible

We learn from a young age that subjects like science and mathematics are objective, which implies that, naturally, people believe data science is as well: that it’s black and white, an exact discipline with just one way to reach a “correct” solution, independent of who builds it. We’ve known for a long time that this is not the case, and that it is possible to use data science techniques (and, hence, produce AI systems) that do things, well… wrong. Even as recently as last year, we witnessed the problems that giants like Google, Tesla, and Facebook face with their AI systems. These problems can cascade very fast: private data leakage, photo mislabeling, or video recognition failing to recognize a pedestrian crossing the road and hitting them.
This is where AI needs to be responsible. For that, you need to be able to discover, at an early stage, where your AI might fail, before deploying it in the real world.
The fact that these companies may not have fixed all of these problems shows just how challenging it is to get AI right.

Reproducibility and scaling of Machine Learning projects

Nothing is more inefficient than needlessly repeating the same processes over and over. This applies both to repeating procedures within a project (like data preparation) and to repeating the same process across projects, or, even worse, unintentionally duplicating entire projects if the team gets large but lacks insight into each other’s roles. No business is immune to this risk; in fact, the issue can become exponentially worse in large enterprises with bigger teams and more separation between them. To scale efficiently, data teams require a tool that helps reduce duplicated work and ensures that work hasn’t already been done by another member of the team.
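One simple mechanism for reducing duplicated work is content-addressed caching: hash a step's name and parameters, and skip the step if an identical run already exists. A minimal sketch under that assumption (real platforms would use a shared, persistent store rather than an in-memory dict):

```python
import hashlib
import json

_cache = {}           # digest -> result; a shared store would make this team-wide
calls = {"count": 0}  # counts how many times work was actually executed

def cached_step(name, params, fn):
    """Run a pipeline step only if this exact (name, params) pair is new."""
    digest = hashlib.sha256(
        json.dumps([name, params], sort_keys=True).encode()
    ).hexdigest()
    if digest not in _cache:
        calls["count"] += 1
        _cache[digest] = fn(**params)
    return _cache[digest]

# Hypothetical data-preparation step.
prep = lambda rows, factor: [r * factor for r in rows]

a = cached_step("prep", {"rows": [1, 2], "factor": 3}, prep)
b = cached_step("prep", {"rows": [1, 2], "factor": 3}, prep)  # duplicate: served from cache
print(a, calls["count"])  # → [3, 6] 1
```

The duplicate invocation never executes, which is the same guarantee, at toy scale, that a central platform gives a large team.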

Use Data Analysts to Augment Data Scientists’ Work

Today, data scientist is one of the most in-demand positions. This means that data scientists can be both (1) difficult to find and attract and (2) expensive to hire and retain. This combination means that to scale data initiatives in pursuit of Enterprise AI, teams will inevitably need to be supplemented with business or data analysts. For the two types of team members to collaborate properly, they require a central environment from which to work. Analysts also tend to work differently than data scientists: skilled in spreadsheets and possibly SQL, but generally not coding. Having a tool that lets each profile use the tools with which they are most comfortable makes it possible to scale data efforts to any size.

Ad-Hoc Methodology is Unsustainable for Large Teams

Small teams can sustain themselves up to a certain point by dealing with data, ML, or larger AI tasks in an ad-hoc fashion, meaning team members save their work locally rather than centrally and don’t have any reproducible procedures or workflows, figuring things out along the way.
However, with more than just a couple of team members and more than one project, this becomes unruly quickly. Any business with any hope of doing Enterprise AI requires a central location where everybody involved with data can do all of their work, from accessing data to deploying a model into a production environment. Permitting workers, whether directly on the data team or not, to work ad hoc without a central tool is like a construction crew attempting to build a high-rise without a primary set of blueprints.

Machine Learning models Need to be Monitored and Managed

The most significant difference between developing traditional software and developing machine learning models is maintenance. For the most part, software is written once and does not need to be continually maintained; it will typically continue to work over time. Machine learning models are developed, put into production, and then must be monitored and fine-tuned until performance is optimal. Even then, model performance can still drift gradually as the data (and the people producing it) changes. This is quite a different approach, especially for companies that are used to putting software into production.
Moreover, it’s easy to see how issues with sustainability might eventually trigger, or intensify, problems with ML model bias. In reality, the two are deeply linked, and disregarding both can be devastating to a business’s data science efforts, particularly when magnified by the scaling up of those efforts. All of these factors point to having a platform that can help manage model monitoring and management.
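A bare-bones illustration of the monitoring idea: compare the mean of a live feature against its training baseline and raise an alert once the shift passes a threshold. Production systems use richer statistics (e.g., population stability index or KS tests); this is only a sketch with made-up numbers:

```python
def drift_alert(train_values, live_values, threshold=0.25):
    """Flag drift when the live mean shifts by more than `threshold`
    relative to the training mean."""
    base = sum(train_values) / len(train_values)
    live = sum(live_values) / len(live_values)
    shift = abs(live - base) / (abs(base) or 1.0)  # relative mean shift
    return shift > threshold, shift

# Hypothetical feature values at training time vs. in production.
alert, shift = drift_alert([10, 12, 11, 9], [15, 16, 14, 17])
print(alert)  # → True
```

Running a check like this on a schedule, and retraining when it fires, is the maintenance loop that distinguishes ML models from write-once software.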

The Need to Create Models that Work in Production

Investing in predictive analytics and data science means ensuring that data teams are productive and see projects through to completion (i.e., production), otherwise known as operationalization. Without an API-based tool that allows for single-step deployment, data teams will likely need to hand off models to an IT team who will then have to re-code them. This step can take a lot of time and resources and be a substantial barrier to executing data projects that genuinely affect the business in essential ways. With a tool that makes deployment smooth, data teams can easily have an impact, monitor, fine-tune, and continue to make improvements that positively affect the bottom line.
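The single-deployment idea amounts to wrapping the trained model behind one stable JSON contract, so IT never has to re-code the model itself. A stdlib-only, hypothetical stand-in for a real REST endpoint:

```python
import json

def make_handler(model):
    """Wrap a model as a JSON-in/JSON-out endpoint handler: the single
    artifact a data team hands to production."""
    def handle(request_body: str) -> str:
        features = json.loads(request_body)["features"]
        return json.dumps({"prediction": model(features)})
    return handle

# Hypothetical trained model: a simple linear scorer with fixed weights.
model = lambda xs: sum(w * x for w, x in zip([0.5, 2.0], xs))

handle = make_handler(model)
print(handle('{"features": [4, 1]}'))  # → {"prediction": 4.0}
```

Because the contract (JSON in, JSON out) never changes, the data team can retrain and swap the model behind `make_handler` without touching anything downstream.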

All that said, choosing the right platform is not always straightforward. You need to carefully assess what you really need now and what you will need in the future.
You need to do so taking into account your budget, your employees’ skills, and their willingness to learn new methodologies and technologies.

Please bear in mind that developing a challenging AI project takes time, sometimes a couple of years. That means your team can start building your prototype on an easy-to-use open source machine learning platform. Once you have proven your hypothesis, you can migrate to a more complex and more expensive platform.

Good luck on your new machine learning and AI project!

Machine Learning in the Cloud

As artificial intelligence (ML) and also artificial intelligence come to be extra prevalent, data logistics will be vital to your success.
While building machine learning projects, most of the effort required for success is not the algorithm, model, framework, or the learning itself. It’s the data logistics. Perhaps less exciting than these other facets of ML, it’s the data logistics that drive performance, continuous learning, and success. Without data logistics, your ability to continue to refine and scale is significantly limited.

Data logistics is key for success in your Machine Learning and AI Projects

Great data logistics does more than drive effectiveness. It is essential to reducing costs now and boosting agility in the future. As ML and AI continue to develop and expand into more business processes, businesses must not allow early successes to become limitations or problems long term. In a paper by Google researchers (Machine Learning: The High-Interest Credit Card of Technical Debt), the authors point out that although it is easy to spin up ML-based applications, the effort can result in expensive data dependencies. Good data logistics can mitigate the difficulty of managing these complex data dependencies to avoid hindering agility in the future. Using an appropriate framework can also ease deployment and administration and allow these applications to evolve in ways that are difficult to predict precisely today.

When Building Machine Learning and AI Projects, Keep It Simple to Start

In the coming years, we’ll see a shift from complex, data-science-heavy implementations to an expansion of efforts that can be best described as KISS (Keep It Simple to Start). Domain experience and data will be the drivers of AI processes that will evolve and improve as experience grows. This strategy offers an additional benefit: it also improves the productivity of existing personnel alongside costly, hard-to-find, hard-to-hire, and hard-to-retain data scientists.

This approach also removes the worry over choosing “just the right tools.” It is a fact of life that we need several tools for AI. Building around AI the proper way allows continual adjustment to capitalize on new AI tools and algorithms as they appear. Don’t stress over performance, either (including that of applications that need to stream data in real time), because there are constant advances on that front. For instance, NVIDIA recently announced RAPIDS, an open-source data science initiative that leverages GPU-based processing to make the development and training of models both easier and faster.

Multi-Cloud Deployments Will Become Standard Practice

To be fully agile for whatever the future may hold, data platforms will need to support the full range of diverse data types, including files, objects, tables, and events. The system must make input and output data available to any kind of application anywhere. Such agility will make it possible to fully utilize the global resources available in a multi-cloud environment, thereby empowering organizations to achieve the cloud’s full potential to optimize performance, cost, and compliance requirements.

Organizations will move to deploy a common data platform to synchronize, drive convergence of, and preserve all data across all deployments, and, through a global namespace, provide a view into all data, wherever it is. A common data platform across multiple clouds will also make it easier to explore different services for a range of ML and AI demands.

As companies broaden their use of ML and AI across multiple industries, they will need to access the full range of data sources, types, and structures on any cloud while avoiding the creation of data silos. Achieving this outcome will lead to deployments that go beyond a data lake, and this will mark the increased proliferation of global data platforms that can span data types and locations.

Analytics at the Edge Will Become Strategically Crucial

As the Internet of Things (IoT) continues to expand and evolve, the ability to unite edge, on-premises, and cloud processing atop a common, global data platform will become a strategic imperative.

A distributed ML/AI architecture capable of coordinating data collection and processing at the IoT edge removes the need to send large volumes of data over the WAN. This ability to filter, aggregate, and analyze data at the edge also promotes faster, more reliable processing and can lead to better local decision making.

Organizations will aim to have a common data platform, from the cloud core to the enterprise edge, with consistent data governance to ensure the integrity and security of all data. The data platform chosen for the cloud core will, therefore, need to be sufficiently extensible and scalable to handle the complexities associated with distributed processing at a scattered and dynamic edge. Enterprises will place a premium on a “lightweight” yet capable and compatible version appropriate for the compute power available at the edge, especially for applications that must deliver results in real time.

A Final Word

In the following years we will see an increased focus on AI and ML development in the cloud. Enterprises will keep it simple to start, avoid dependencies with a multi-cloud global data platform, and empower the IoT edge so ML/AI initiatives deliver more value to the business in the coming years and well into the future.

More reads:

Where does a Data Scientist sit among all that Big Data

Predictive analytics, the process of building it

Advanced Data Science

How to turn your boring work into an amazing experience?

Have you ever found yourself stuck in boring work? Days pass slowly, and you just have no motivation to get out of bed and go to the office. You find yourself daydreaming, searching the web for your next vacation, or even playing an online game. The phenomenon of being bored at work, not enjoying going to the office, and finding millions of excuses to stay home is not new. I read somewhere that around 80% of working people don’t like their job, and if they didn’t have to do it, they never would. 80% of people hating what they do every day is too high a number. But there is some good news.

Like everyone else, I have been stuck in dreadfully boring office work multiple times in my career. I was looking for ways to entertain myself in every possible creative way, from taking super long breaks to playing every possible online game. I even got bored with the games and the news portals I read every day. Time wasn’t passing fast enough, and I was stuck.

Where to start when things are not going your way anymore?

Make the decision to change things in your favor yourself. Get out of your comfort zone.

One day, after waiting a long time, I decided that I needed to take things into my own hands. I realized that my manager would never give me the exciting nice-to-have tasks; he needed the operational tasks done. I realized that the time had come to stop wasting my life doing boring and, to a significant extent, unproductive tasks. Just typing what I was asked to type and living from paycheck to paycheck wasn’t my game anymore. I needed a change. And I knew I was the one who was going to bring that change to my work.

I was very aware that management was really focused on finishing the operational tasks, fast. The business was waiting. So I was okay with the fact that I would have to put in some extra hours creating my own project.

Do ground research before you come up with a plan
The culture in every company is always fantastic if you are open to it.

People love to talk about their work; most of the time they will complain about it, but if you listen carefully enough, you will find out what bothers them.

That is why I think data scientists need to have some soft skills, as I described in my previous article.

I had already been working for 6 months at this particular company. During those 6 months, I tried to meet many of the colleagues whose work I thought was exciting and from whom I could learn.

In the beginning, it was a bit strange. It seemed that people weren’t accustomed to someone from IT just walking around the floors of the company, approaching them, and inviting them to lunch. But after a dozen invitations people learned about my friendly and curious nature, so making invitations got easier. Still, on many occasions I had to be patient and wait for a free spot on their agenda, or be understanding when they canceled on me six times.

In the end, persistence paid off.

In my research time, I managed to talk with colleagues at different seniority levels from different departments and got really familiar with the business plan, the business model, and, more importantly, the problems the business was facing.

Now that I knew the core of the business, I knew that some exciting opportunities could be tackled that could take the company to the next level.

The company I worked at at that time was in the travel sector. It sold airplane tickets and travel arrangements.

My professional interest was creating predictive algorithms and working with machine learning. Predictive algorithms can really benefit a company in the travel sector. However, to my surprise, predictive algorithms and machine learning weren’t used much there.

Soon I would discover why.

After work hours, I started working on a few project descriptions and use cases that I had identified as valuable for the company.

I used our data to create working demos and presentations on how the project would help in the long run.

First, I presented this to my manager and his manager; they were both impressed by the ideas. They were pleased to give me permission to continue working on the project, as long as I worked on it after hours. I was okay with that.

Pitching the projects to stakeholders
Naturally, I started talking with the managers I was close to and slowly started planting my ideas for bringing new projects into the company. I would ask them: what do you think, could we do this project to help you with that? Most of the time I would get a positive response, which, I realized later, was more out of courtesy.

I realized that most of the time people want to stay in their comfort zone, protect their position, and don’t gladly accept new ideas that might endanger their status or, even worse, make them learn something new.
A month after planting my ideas with different stakeholders, I arranged a meeting where I wanted to present the working demos to them.

To my big surprise, the response I got from them was unexpected. Mostly it went along the lines of: “I love the idea, but I have more important things on my timeline,” or “I would not change things while they are bringing profit,” or “Excel gives me the forecasts I need; I don’t need anything more than that.”

Stay persistent – Don’t get discouraged easily
These responses were a shock to me. For months I had listened to people whining about their work, the processes, and faulty systems, but the moment I came to them with working solutions, they rejected them immediately, saying that they didn’t need improvements or changes.

However, I was determined to upgrade myself and my work, and there was no one stopping me.

My next destination was the 15th floor, which was reserved for C-level people and upper management.

I had had the opportunity to meet a few of our C-level people during company celebrations, and I had left room for future unofficial conversations with them.

In my experience, in big companies you can’t just schedule a meeting with a C-level person, especially if you come from low in the hierarchy, like me coming from IT at that time. So you need to make the meeting happen outside their office.

I already knew that the kitchen is the one place everyone must pass by at least once a day. My strategy was to watch for the C-level people I had spoken to before and, in an informal conversation, mention that I was working on an exciting project that could help the business if implemented.

After patiently waiting for a few weeks, I managed to meet the CEO of the company in the hallway on his way out. And in the space of 6 minutes, the time it takes to go from the 15th floor to the parking lot and his car, I managed to tell him such interesting facts about my projects that he invited me to an official meeting. I was thrilled; my plan was finally working.

The big meeting

Two weeks after meeting my CEO, the big meeting happened.

In those two weeks, I gave maximum effort to make my demo as exciting and eye-catching as it could be.

The meeting was planned for only 30 minutes, and it was to happen not only with the CEO but also with other stakeholders he counted as critical people for the business.

After an hour and a half, a full hour over the initially planned time, my big rally was over, and I had the attention I needed.

From the following Monday, I officially started working on my project half of the week. I got the promise that if the initial test phase was successful, I would be allowed to work on it full time and even form my own team. I was over the moon.

Not everyone is happy when you progress

My big news wasn’t accepted so gladly by some other managers. They now started seeing me as their competition. They immediately scheduled meetings in which they tried very hard to prove that my project would fail.

These people had been at the company longer; they knew the business better than I did, and their names were more recognized than mine. All this made me sweat profusely before and during the meetings, but it also made me make sure my demo was bulletproof and covered from every possible angle.

Nevertheless, I showed that I wasn’t bothered by their attitude, and I kept the friendliest and most professional demeanor.

In the end, the only thing that matters is your goal

After a painful but exciting test phase, the final results were out. The predictive model turned out to be even more successful in the real-world scenario, and the business owners were really pleased with the extra profit the project brought.

Now I had the opportunity to do my dream work and never get bored with it again.

All the difficulties during the process, the after-work hours, the unpleasant moments, and the failures now seemed like nothing but good experience, because all that matters is this: I realized my plan.


What are the most important soft skills for data scientists?

Data scientists are the people thought to be statistics wizards and tech gurus.

I have written about what tech skills a data scientist needs. In most cases, these beliefs are true. Data scientists are expected, most of the time, to perform wonders using fancy algorithm names and tools. Everyone focuses on their technical knowledge and expertise, their past tech experience, and the projects they have been part of. This is all great; it is needed, and it is a big part of the everyday work a data scientist should do.

Unfortunately, not many people focus on the soft skills data scientists should have. That is why I took the time to think it over and state the top three soft skills a data scientist should have, in my opinion.


How not to learn a programming language like Python or R for machine learning

Here is what you should NOT do when you start studying machine learning in Python.

  1. Get really good at Python programming and Python syntax.
  2. Deeply study the underlying theory and parameters for machine learning algorithms
  3. Avoid or lightly touch on all of the other tasks needed to complete a real project.

I think this approach can work for some people, but it is a really slow and roundabout way of getting to your goal. It teaches you that you need to spend all your time learning how to use individual machine learning algorithms. It does not teach you the process of building predictive machine learning models in Python that you can actually use to make predictions.

Sadly, this is the approach used to teach machine learning that I see in almost all books and online courses on the topic.

Lessons: Learn how the sub-tasks of a machine learning project map onto Python and the best practice way of working through each task.

Projects: Tie together all of the knowledge from the lessons by working through case-study predictive modeling problems.

Recipes: Apply machine learning with a catalog of standalone recipes in Python that you can copy and paste as a starting point for new projects.

Lessons

You need to know how to complete the specific subtasks of a machine learning project using the Python ecosystem. Once you know how to complete a discrete task using the platform and get a result reliably, you can do it again and again, project after project. Let’s start with an overview of the common tasks in a machine learning project. A predictive modeling machine learning project can be broken down into 6 top-level tasks:

  1. Define Problem: Investigate and characterize the problem in order to better understand the goals of the project.
  2. Analyze Data: Use descriptive statistics and visualization to better understand the data you have available.
  3. Prepare Data: Use data transforms in order to better expose the structure of the prediction problem to modeling algorithms.
  4. Evaluate Algorithms: Design a test harness to evaluate a number of standard algorithms on the data and select the top few to investigate further.
  5. Improve Results: Use algorithm tuning and ensemble methods to get the most out of well-performing algorithms on your data.
  6. Present Results: Finalize the model, make predictions and present results.
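As a hedged sketch of how these six tasks map onto Python, here is a minimal end-to-end example using scikit-learn’s built-in iris data (the dataset and model choices are mine, purely for illustration, not a prescribed recipe):

```python
# A minimal walk-through of the 6 tasks on scikit-learn's built-in iris data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# 1. Define Problem: classify iris species from four flower measurements.
X, y = load_iris(return_X_y=True)

# 2. Analyze Data: basic descriptive statistics.
print("shape:", X.shape, "classes:", set(y))

# 3. Prepare Data: standardize features so they share a common scale.
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 4. Evaluate Algorithms: compare two standard algorithms with cross-validation.
candidates = {"logreg": LogisticRegression(max_iter=1000),
              "tree": DecisionTreeClassifier(random_state=42)}
scores = {name: cross_val_score(m, X_train, y_train, cv=5).mean()
          for name, m in candidates.items()}

# 5. Improve Results: keep the best performer (tuning would go here).
best_name = max(scores, key=scores.get)
best = candidates[best_name].fit(X_train, y_train)

# 6. Present Results: report held-out accuracy of the finalized model.
print(best_name, "test accuracy:", round(best.score(X_test, y_test), 3))
```

Each numbered comment corresponds to one of the tasks above; in a real project every step would grow into its own piece of work.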

Modeling marketing multi-channel attribution in practice

(Image: multi-channel attribution)

What is the next step I need to take to close this deal? What will this customer ask for next, and how can I steer them toward it? What is the shortest path to closing a deal?
How much do all my marketing and sales activities really cost? How much does a single action or marketing channel cost?

All of these are common questions marketing and sales teams face on a daily basis.

Luckily, there is an answer.

Here I present the advantages of using machine learning models to produce multi-channel attribution models. I strongly recommend you read it.
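To make the idea concrete, here is a tiny, self-contained sketch of one popular attribution technique, the removal effect, in plain Python. The journeys and channel names are invented for illustration, and this is not necessarily the exact model the article builds:

```python
# Toy "removal effect" attribution over observed customer journeys.
# Each journey is the ordered list of marketing channels a customer touched;
# the flag marks whether the journey ended in a purchase (a conversion).
journeys = [
    (["email", "search", "display"], 1),
    (["search", "display"], 1),
    (["display"], 0),
    (["email"], 0),
    (["search"], 1),
    (["email", "display"], 0),
]

total_conv = sum(conv for _, conv in journeys)
channels = sorted({ch for path, _ in journeys for ch in path})

# Removal effect: what share of conversions is lost if a channel disappears?
# A converting journey is "lost" when the removed channel was part of it.
removal = {}
for ch in channels:
    remaining = sum(conv for path, conv in journeys if ch not in path)
    removal[ch] = (total_conv - remaining) / total_conv

# Normalize removal effects into fractional conversion credit per channel.
weight = sum(removal.values())
credit = {ch: round(removal[ch] / weight * total_conv, 2) for ch in channels}
print(credit)  # fractional conversions attributed to each channel
```

With this toy data, "search" earns the most credit because no journey converted without it, which is exactly the kind of insight last-touch attribution would miss.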

Data Science Platforms

What is a good platform for Data science

When you think about becoming a Data Scientist, one of the first questions that will come to your mind is: what do I need to start, and what tools do I need?

Well, today I’ll share my secret tools, and over a series of posts, I’ll try to make you proficient in them.

Basically, I don’t use super fancy tools like the ones on CSI.

I use Excel, Notepad, R, and Python (the last two are really popular nowadays). I have also used the Microsoft BI full stack (SSRS, SSIS, SSAS), JENA, ENCOG, RapidMiner, and whatnot.
Starting at my latest company, I was introduced to a little gem called KNIME.

Surprisingly enough, KNIME was already rated quite well by that time, but I hadn’t had the time to explore it before.

In the latest Gartner report, KNIME sits on the far right among well-established industry leaders like IBM, SAS, and RapidMiner.

But being fully open source, unlike its peers, KNIME really outshines the others.

(Image: Gartner Magic Quadrant)

What Gartner Says about KNIME:

KNIME (the name stands for “Konstanz Information Miner”) is based in Zurich, Switzerland. It offers a free, open-source, desktop-based advanced analytics platform. It also provides a commercial, server-based solution providing additional enterprise functionality that can be deployed on-premises or in a private cloud. KNIME competes across a broad range of industries but has a large client base in the life sciences, government and services sectors.

  • Almost every KNIME customer mentions the platform’s flexibility, openness, and ease of integration with other tools. Similar to last year, KNIME continues to receive among the highest customer satisfaction ratings in this Magic Quadrant.
  • KNIME stands out in this market with its open-source-oriented go-to-market strategy, large user base and active community — given the small size of the company.
  • Many customers choose KNIME for its cost-benefit ratio, and its customer reference ratings are among the highest for good value.
  • The most common customer complaints are about the outdated UI (which was recently updated in version 3.0 in October 2015, so few customer references have seen it) and a desire for better-performing algorithms for a distributed big data environment.
  • Customers also expect a high level of interactive visualizations from their tools. KNIME lacks in this area, requiring its customers to obtain this from data visualization vendors such as Tableau, Qlik or TIBCO Spotfire.
  • Some customers are looking for better insight into and communication of the product roadmap, but they do give KNIME high scores on including customer requests into subsequent product releases.

Read more here.

I strongly encourage you to download and get familiar with KNIME.

Do I use only KNIME as a Data Science Platform?

No. The beauty of KNIME is that it can easily integrate with external solutions: Weka, R, Python. KNIME is very solid for building predictive models, but sometimes I build models in R because I find libraries that I personally think are better than KNIME’s native ones.
After that, I integrate the R models into KNIME using its R nodes. Works like a charm.

Use Database systems too. Please.

One thing I have to be careful about while using these platforms is their memory consumption. R drains the computer’s memory because it loads everything into memory; KNIME is similar. Therefore, I use a database system to filter the data before I load it into R or KNIME.

Database systems are a must. If you want to build effective and fast models, you need to let the database system handle the vast amount of data first and then load the result into the analytical platform. The choice of platform really depends on your company policy. It is terrific if you have a distributed database system like Hadoop, where you can run SQL operations on big data and then send limited datasets to KNIME and R.

It will be fast, and it will save you from painful experiences like filling up the memory buffer.
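As a minimal illustration of this pattern, the following sketch uses Python’s built-in sqlite3 as a stand-in for the database (the table and column names are invented): the filtering and aggregation run in SQL, and only the small result ever reaches the analytics layer.

```python
import sqlite3

# In-memory stand-in for a large operational database (names invented).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL, year INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("EU", 120.0, 2023), ("EU", 80.0, 2024),
    ("US", 200.0, 2024), ("US", 50.0, 2022),
])

# Let the database filter and aggregate; only the small, analysis-ready
# result set leaves the database and enters R, KNIME, or pandas.
rows = con.execute(
    "SELECT region, SUM(amount) FROM sales WHERE year >= 2023 GROUP BY region"
).fetchall()
print(rows)
```

The same idea scales up: push the `WHERE` and `GROUP BY` work to the database (or to Hadoop/SQL-on-big-data), and hand the analytical tool a dataset that fits comfortably in memory.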

However, if you don’t have a distributed database system, a conventional database system will do as well. A major task for you as a data scientist is making the standard database system work with your data too 🙂

What is next?

How do you become a data scientist?

What skills does a Data Scientist need?

How can you make great reports?

What skills does a data scientist need and how to get them?

Upgrading your skills constantly is the way to stay on top.
What skills do you need to become a Data Scientist?
I have written about this before, but I’ll try to add some more information to help the people who really want to go down that path.

Free Tools can help a lot to start!

There are many tools that can help you get started easily, at least to some extent. KNIME is one great tool I use literally every day. It is really easy to learn, and it covers 90% of the tasks you will be asked to do daily as a Data Scientist. Best of all, it’s free.
Check it out here:
Other similar tools: RapidMiner
The important thing is that you should know what to do with it.
I have given numerous courses on how to use the tool and how to start with super basic DS tasks.
Understanding Basic terms can help you along the way:
What is regression, and what is classification?
It is good to know how to approach a specific problem in order to solve it. Almost every problem we try to solve falls into one of these two categories.

What algorithms can be used and should be used for each problem?

This is important, but it is not a showstopper at the beginning. Decision trees will do just fine for a start.
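As a small, hedged sketch (synthetic data, names invented), a decision tree handles both problem types with scikit-learn:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)

# Classification: predict a category (here, whether two features sum above 1).
X_cls = rng.random((200, 2))
y_cls = (X_cls.sum(axis=1) > 1).astype(int)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_cls, y_cls)

# Regression: predict a continuous value (here, a noisy linear signal).
X_reg = rng.random((200, 1))
y_reg = 3 * X_reg.ravel() + rng.normal(0, 0.1, 200)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_reg, y_reg)

print("classification accuracy:", round(clf.score(X_cls, y_cls), 2))
print("regression R^2:", round(reg.score(X_reg, y_reg), 2))
```

Limiting `max_depth` keeps the trees simple and readable, which is exactly why they make a good first model before you reach for anything fancier.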
How to do:

Data Cleaning or Transformation

This is one of the most important things you’ll come across working in Data Science. 90% of the time, you are not going to get well-formatted data. If you are skilled in one of the programming languages, Python or R, you should be a pro at packages like pandas or dplyr/reshape2.
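A minimal pandas sketch of what such cleaning looks like in practice (the messy data is invented for illustration):

```python
import pandas as pd

# Messy toy data: wrong types, missing values, duplicates, stray whitespace.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", None],
    "age": ["34", "29", "29", "41"],
    "city": [" Paris", "Rome ", "Rome ", "Oslo"],
})

df = df.drop_duplicates()            # remove repeated rows
df = df.dropna(subset=["name"])      # drop rows missing a key field
df["age"] = df["age"].astype(int)    # fix the column's type
df["city"] = df["city"].str.strip()  # normalize whitespace
print(df.to_dict("records"))
```

The same four moves (deduplicate, handle missing values, fix types, normalize strings) cover a surprising share of real-world cleaning work; dplyr and reshape2 offer the equivalent verbs in R.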
Exploratory Data Analysis
I have written before about how you can start using the data. Check this link to get an idea.
Once again, this is the most important part; whether you are working to extract insights or you want to do predictive modeling, this step comes in. You must train your mind analytically to build an image of the variables in your head. You can build such a mind through practice. After that, you must be very good hands-on with packages like matplotlib or ggplot2, depending on the language you work with.
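Before reaching for matplotlib or ggplot2, a quick numeric pass already starts building that mental image of the variables. A minimal pandas sketch (the ticket-price data is invented for illustration):

```python
import pandas as pd

# Invented travel-sector dataset: ticket price vs days booked in advance.
df = pd.DataFrame({
    "days_ahead": [1, 5, 10, 20, 30, 60],
    "price":      [420, 380, 300, 260, 220, 180],
})

print(df.describe())  # ranges, means, and spread for each variable
print(df.corr())      # linear relationships between variables
# A strong negative correlation here suggests booking earlier is cheaper;
# a scatter plot (df.plot.scatter) would be the natural next step.
```

`describe()` and `corr()` take seconds to run and tell you which variables deserve a plot at all, which is exactly the habit EDA is meant to build.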

Machine Learning / Predictive Modelling

One of the most important aspects of today’s data science is predictive modeling. It depends on your EDA and your knowledge of mathematics. I must advise you to invest your time in theory. The more theoretical knowledge you have, the better you are going to do. There is no easy way around it. There’s this great course by Andrew Ng that goes deep into theory. Take it.

Programming Languages

If you want to go more advanced, it is important to have a grip on at least one programming language widely used in Data Science. But you should also know a little of another language: either know R very well and some Python, or Python very well and some R.
Take my case: I know R very well (at least I think so), but I can also get by with Python (not at expert level), Java, C#, and JavaScript. Anything works if you know how to use it when you need it.
An example of a complete data analysis that a Data Scientist does can be found here.
I use KNIME, R, and Python every day; I think if you are a total beginner, it’s a good idea to start with KNIME.

Useful courses for aspiring Data Scientists

I really recommend spending some time on the following courses:
I have taken them myself, and I learned a lot from each of them.
Happy learning!
Image credit: House of bots