Data Science: a tremendous opportunity

A great opportunity for companies to predict key events in their business.

Stemming from the rapid rise of machine learning and the intensive use of open source tools (such as R and Python), Data Science is in a way the extension of Data Mining to the new Big Data platforms.

If we take a closer look, we realize that the foundations of most of the algorithms now attributed to Data Science were defined a long time ago, whether in image processing, text processing or machine learning.

What has changed, however, is the coupling of near-infinite computing power with democratized access to the latest generation of algorithms, which makes it possible to process any type of information and to deliver more predictions and recommendations in real time, sometimes with surgical precision. And while the scope of what is possible today is vastly expanded, many recently launched projects could already have been handled ten years ago on a desktop PC without any problem! So much the better if all the buzz around Big Data and Data Science has served as a wake-up call!

The other advantage of the new Big Data platforms is that they make it possible to gather all of the company's data sources (structured or not: Data Warehouse, Web, sensors, external data...) in a single environment. This significantly increases the productivity of Data Scientists and enables the 360° customer view that, for many companies, had until now remained theoretical.

While reconciling all this data in a single environment is simplified, it should not be forgotten that each Data Science project requires a very specific phase of scoping and data preparation. Reconstructing individual histories and customer trajectories (growth, decline, instability of behavior...) in an omnichannel context, with a view to predicting an event (churn, subscription, household expansion, real estate project...) is not something you can improvise if you've never done it before!

Indeed, most algorithms need to work on data tables that look nothing like the raw data dumped into the datalakes. In most cases, they expect tables where each row represents a distinct individual and each column a specific piece of information about that individual, whereas most of the data dumped into datalakes is in transactional format. For a customer knowledge project, for example, this raw data must be transformed to summarize each customer's situation before the event we are trying to model. These indicators will draw on the customer's profile as well as on past behaviors (cumulative and recent purchases, online or offline visits, purchasing path, responsiveness to marketing solicitations, consumer opinions, travel, affinity preferences, product use measured by sensors, etc.).
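The transformation described above can be sketched with pandas. This is a minimal, illustrative example: the toy purchase table, the column names and the chosen indicators are all hypothetical, not a prescribed schema.

```python
import pandas as pd

# Hypothetical raw transactional data: one row per purchase event.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "amount":      [20.0, 35.0, 5.0, 12.0, 8.0, 100.0],
    "channel":     ["web", "store", "web", "web", "store", "web"],
    "date": pd.to_datetime([
        "2024-01-05", "2024-03-10", "2024-02-01",
        "2024-02-15", "2024-03-01", "2024-01-20",
    ]),
})

# Snapshot date just before the event we want to model.
reference_date = pd.Timestamp("2024-04-01")

# Pivot to one row per customer, one column per indicator.
features = transactions.groupby("customer_id").agg(
    total_spent=("amount", "sum"),
    n_purchases=("amount", "count"),
    last_purchase_days=("date", lambda d: (reference_date - d.max()).days),
    web_share=("channel", lambda c: (c == "web").mean()),
)
print(features)
```

Each indicator summarizes a customer's history up to the snapshot date; in a real project, dozens of such columns (recency, frequency, channel mix, trend...) would feed the predictive model.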

You may be the "king of programming", but you won't get very far if you've never had to transform raw data into potentially relevant indicators for explaining or predicting the targeted event. Until now, most of the effort in data mining projects has gone into data preparation. As you can see, nothing changes in this respect with the arrival of Data Science.

Finally, this technological shift is a tremendous opportunity for companies wishing to anticipate and predict key events in their business. It is just as important for data miners themselves, who will be able to discover new approaches (machine learning) and new tools (R, Python, H2O...), which are ultimately very easy to learn.

And even if some data miners have certainly felt a little lost in the face of such effervescence and the improbable accumulation of new environments, languages, packages and solutions they were being asked to master by companies eager to recruit, rest assured! These job descriptions correspond to the profiles of the pioneers of data science: the famous "12-legged sheep". They will gradually give way to two complementary types of profile:

  • Big Data architects, with a profile that is more IT than business: responsible for configuring and administering the Big Data platform, managing data flows, and preparing data and automating its transformation to facilitate the work of the Data Scientist and the operational exploitation of predictions or recommendations.
  • Data Scientists, with a more statistical and business profile: in charge of making the link between business needs and data, transforming the data to analyze, synthesize, explain and predict certain events or behaviors. In a way, an extension of the data miner profile, with the addition of mastery of the R and Python languages and a real agility in choosing the right language for the specific needs of each study.

More generally, Big Data architectures lead to a change in how the various players collaborate. Whereas the Data Miner was confined to the end of the chain and very rarely called upon upstream of projects, the Data Scientist will work with the Big Data architect from the start of the project on the best way to retrieve the data for each use case (API, JSON files, real-time processing of a data stream, etc.). The Data Scientist will thus contribute input based on the packages, libraries and algorithms they intend to use, the very choice of algorithm being conditioned by the volume of data.

There is therefore a governance dimension to the work of the Data Scientist, due to their unique ability within the datalake to cross-reference all of the company's transversal data. This raises questions about security, respect for and protection of private data, handling of sensitive data, etc. The Data Scientist will therefore have to work tomorrow with profiles such as the CISO (Chief Information Security Officer), but also the CDO (Chief Data Officer), who steers the strategy and ambition of data within the organization.

Given Big Data, the computing power of the new platforms and the need to deliver ever more predictions, prescriptions and relevant recommendations, some of them in real time, the intensified use of Data Science in machine-learning mode within operational processes is inevitable. But machine learning often means a black box, and predictive analysis means being limited to the spectrum of past events when trying to influence and guide the future. Yet companies will always need to understand, create and experiment with new offerings, strategies and devices.

Companies will therefore have to be proactive and make massive use of the "test and learn" approach. This is how the classic statistical approach and data science will enable them to measure and identify their new growth levers.
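The measurement step of such a "test and learn" approach can be sketched with a classic two-proportion z-test: did a new offer shown to a test group convert significantly better than the control? All figures below are illustrative, not taken from the article.

```python
from math import sqrt, erf

# Hypothetical experiment: a new offer shown only to group B.
conversions_a, visitors_a = 120, 2400   # control group
conversions_b, visitors_b = 156, 2400   # test group

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)

# Two-proportion z-test: is the uplift larger than random noise?
se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided

print(f"uplift: {p_b - p_a:+.2%}, z = {z:.2f}, p-value = {p_value:.3f}")
```

If the p-value falls below the chosen threshold (typically 5%), the company can attribute the uplift to the new offer and roll it out; otherwise it iterates with a new test.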
