
Data Science: a great opportunity


A great opportunity for companies to predict key events in their business.

Stemming from the strong rise of machine learning and the intensive use of open source tools (such as R and Python), Data Science is in a way the extension of Data Mining to the new Big Data platforms.

If we take a closer look, we realise that the foundations of most of the algorithms cited as belonging to Data Science were laid a long time ago, whether in image processing, text processing or machine learning.

What has changed, however, is the combination of almost unlimited computing power with democratised access to the latest generation of algorithms. This now makes it possible to process any type of information and to deliver more predictions and recommendations in real time, sometimes with surgical precision. Yet while the range of possibilities has been greatly extended, many recently launched projects could already have been handled without any problem on a desktop PC 10 years ago! So it is a good thing that all the buzz around Big Data and Data Science has woken people up!

The other advantage of the new Big Data platforms is that they make it possible to bring all of a company's data sources (structured or not: Data Warehouse, Web, sensors, external data...) together in a single environment. This significantly increases Data Scientists' productivity and makes possible the 360° view that, until now, has remained largely theoretical for many companies.

While bringing all this data together in a single environment is now simpler, it should not be forgotten that each Data Science project requires its own data preparation and scoping phase. Reconstructing individual histories and customer trajectories (growth, decline, unstable behaviour, etc.) in an omnichannel context in order to predict an event (churn, subscription, household expansion, real estate project, etc.) cannot be improvised if you have never done it before!

Indeed, most algorithms need to work on data tables that bear no resemblance to the raw data poured into data lakes. In most cases, they require tables in which each row represents a distinct individual and each column a specific piece of information about that individual, whereas the data dumped into data lakes is mostly in transactional format. For a customer knowledge project, for example, this raw data must be transformed so as to summarise as accurately as possible each customer's situation before the event being modelled. These indicators will cover both the customer's profile and past behaviour (cumulative and recent purchases, online and offline visits, purchasing paths, responsiveness to marketing campaigns, consumer opinions, travel, affinities and preferences, product use captured by sensors, etc.).
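As a minimal sketch of this transformation, the Python code below pivots transactional rows into one row per customer, keeping only events that occurred before a cutoff date so that nothing from the period being predicted leaks into the indicators. The `Transaction` fields, the 90-day window and the indicator names (`recent_spend`, `spend_trend`, `online_share`) are hypothetical choices made for illustration; a real project would define its indicators with the business and would usually rely on a dataframe library rather than plain dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Transaction:
    # Hypothetical raw record, as it might sit in a data lake
    customer_id: str
    day: date
    channel: str  # e.g. "online" or "store"
    amount: float


def build_customer_table(transactions, cutoff, window_days=90):
    """Return {customer_id: indicators}, one entry per customer,
    computed only from transactions strictly before `cutoff`."""
    recent_start = cutoff - timedelta(days=window_days)
    earlier_start = recent_start - timedelta(days=window_days)
    acc = defaultdict(lambda: {"total": 0.0, "n": 0, "online": 0,
                               "recent": 0.0, "earlier": 0.0})
    for t in transactions:
        if t.day >= cutoff:
            continue  # ignore anything from the period we want to predict
        a = acc[t.customer_id]
        a["total"] += t.amount
        a["n"] += 1
        if t.channel == "online":
            a["online"] += 1
        if t.day >= recent_start:
            a["recent"] += t.amount
        elif t.day >= earlier_start:
            a["earlier"] += t.amount

    table = {}
    for cid, a in acc.items():
        table[cid] = {
            "total_spend": a["total"],      # cumulative purchases
            "n_purchases": a["n"],
            "recent_spend": a["recent"],    # last window_days before cutoff
            # trajectory indicator: > 0 growing, < 0 declining
            "spend_trend": a["recent"] - a["earlier"],
            "online_share": a["online"] / a["n"],
        }
    return table


# Hypothetical sample: one row per purchase, as in a transactional log
sample = [
    Transaction("c1", date(2023, 1, 10), "store", 50.0),
    Transaction("c1", date(2023, 3, 20), "online", 30.0),
    Transaction("c1", date(2023, 4, 15), "online", 999.0),  # after cutoff
    Transaction("c2", date(2022, 11, 5), "online", 80.0),
    Transaction("c3", date(2023, 5, 1), "store", 10.0),     # after cutoff
]
customers = build_customer_table(sample, cutoff=date(2023, 4, 1))
for cid, row in sorted(customers.items()):
    print(cid, row)
```

In this sample, c1's purchase after the cutoff is ignored, c1's spending falls in the recent window (positive trend), c2's falls in the earlier window (negative trend, a possible sign of decline), and c3 does not appear at all because it has no history before the cutoff.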

So even if you are the "king of programming", you will not get very far if you have never had to turn raw data into indicators that could explain or predict the targeted event. Until now, most of the effort in data mining projects has gone into data preparation, and the arrival of Data Science changes nothing in this respect.

Finally, this technological shift is a great opportunity for companies wishing to anticipate and predict key events in their business. It is just as much an opportunity for Data Miners themselves, who will be able to discover new approaches (machine learning) and new tools (R, Python, H2O...) that have become very accessible.

And even if some Data Miners have certainly felt a little lost amid such a frenzy of activity and the incredible accumulation of new environments, languages, packages and solutions that recruiting companies expected them to master, they can rest assured! These job descriptions correspond to the profiles of the pioneers of data science, the famous "12-legged sheep" expected to master everything. They will gradually give way to two types of complementary profiles:
  • Big Data architects, with a profile that is more IT than business: responsible for configuring and administering the Big Data platform, managing data flows, preparing data and automating its transformation to facilitate the work of the Data Scientist and the operational use of predictions or recommendations.
  • Data Scientists, with a more statistical and business profile: in charge of linking business needs to the data and transforming that data in order to analyse, synthesise, explain and predict certain events or behaviours. In a way, this is an extension of the Data Miner profile, with the addition of mastery of the R and Python languages and real agility in choosing the right language for the specific needs of each study.

More generally, Big Data architectures change the way the various players collaborate. Whereas the Data Miner was confined to the end of the chain and very rarely consulted upstream of projects, the Data Scientist will work with the Big Data architect from the start of the project to determine, depending on the use case, the best way to retrieve the data (API, JSON files, real-time processing of a data stream, etc.). The Data Scientist will thus provide input on the packages, libraries and algorithms he or she intends to use, since the choice of these algorithms is itself constrained by the volume of data.

There is therefore a governance dimension to the work of the Data Scientist, due to his or her unique ability to cross-reference all of the company's data, across every department, within the data lake. This raises questions about security, respect for and protection of personal data, handling of sensitive data, etc. In the future, the Data Scientist will therefore have to work with profiles such as the CISO (Chief Information Security Officer), but also the CDO (Chief Data Officer), who steers the organisation's data strategy and ambitions.

With Big Data, the computing power of new platforms and the need to deliver ever more predictions, prescriptions and relevant recommendations, some of them in real time, the growing use of Data Science in machine learning mode within operational processes is inevitable. But machine learning means a black box, and predictive analysis means being limited to the range of past events when trying to influence and guide the future. Yet companies will always need to understand, create and experiment with new offerings, strategies and mechanisms.

Companies will therefore have to be proactive and make extensive use of the "test and learn" approach. In this way, classical statistics and data science together will enable them to measure and identify their new growth levers.
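A "test and learn" loop ultimately relies on classical statistics to decide whether a new offer really moved a metric. As one possible illustration (the article does not prescribe a specific test), the sketch below compares the conversion rates of a control group and a test group with a standard two-proportion z-test, using only the Python standard library. The function name, group sizes and conversion counts are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control group A and test group B.

    Returns (uplift, z, p_value), where uplift = rate_B - rate_A and the
    two-sided p-value uses the pooled normal approximation.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis "both groups convert equally"
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation: all or none of the customers converted")
    z = (rate_b - rate_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical campaign: 5,000 customers in each group
uplift, z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"uplift={uplift:.4f} z={z:.2f} p={p:.4f}")
```

With these made-up numbers the test group converts at 5.2% against 4.0% for the control group, giving a z-statistic of roughly 2.9 and a p-value below 0.01, so the uplift is unlikely to be due to chance alone. In practice the sample size should be planned before the test is launched, and the normal approximation assumes reasonably large groups.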
