Feature engineering: definition and basic techniques


Feature engineering is an early stage of machine learning that focuses on pre-processing raw data before using it as training data.

What is feature engineering?

In the field of artificial intelligence (AI), feature engineering is the preprocessing of raw data so that it can be used as machine learning data. The data must be given well-defined features and a consistent format (for example customer number, item, document, file, or time scale, along with the associated values or volumes).

The process distinguishes the features from one another and identifies any anomalies in the data. The result is a more reliable and efficient predictive model, which helps reduce the risk of bias and model drift.

What is feature engineering in data science?

The goal of data science is to extract knowledge from the study and analysis of data sets. Historically used in decision support, business intelligence, and big data, it has more recently expanded to artificial intelligence and the building of learning models. In this context, it encompasses feature engineering, which pre-processes the training data and its features upstream of the training stages.

Some examples of feature engineering

Because of its importance in artificial intelligence, data science, and machine learning, feature engineering covers a range of tasks:

  • Identifying features,
  • Handling of the associated values, especially missing values,
  • Conversion into numbers of data that cannot be compared directly (for example, categories),
  • Enrichment of databases…
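
Two of the tasks above, handling missing values and turning non-numeric data into numbers, can be sketched in a few lines of plain Python. This is only a minimal illustration: the records, the column names (`age`, `segment`), and the helper functions `impute_median` and `one_hot` are hypothetical, and median imputation and one-hot encoding are just one common choice among several.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
rows = [
    {"customer_id": 1, "age": 34, "segment": "retail"},
    {"customer_id": 2, "age": None, "segment": "business"},
    {"customer_id": 3, "age": 52, "segment": "retail"},
    {"customer_id": 4, "age": 41, "segment": None},
]


def impute_median(rows, column):
    """Fill missing numeric values with the median of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column, missing_label="unknown"):
    """Replace a categorical column with one 0/1 column per category."""
    values = [r[column] if r[column] is not None else missing_label for r in rows]
    categories = sorted(set(values))
    encoded = []
    for row, value in zip(rows, values):
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if value == category else 0
        encoded.append(new_row)
    return encoded


features = one_hot(impute_median(rows, "age"), "segment")
for row in features:
    print(row)
```

Treating a missing category as its own `unknown` value is an assumption made here for simplicity; depending on the data, dropping the row or imputing the most frequent category may be more appropriate.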

What are feature engineering techniques in machine learning?

Feature engineering is the first step in processing training data before building a machine learning model. There are several feature engineering techniques:

  • Importance scoring: each feature is given a score that measures its importance.
  • Feature extraction: new features are created from the raw data.
  • Feature selection: the most suitable sets and subsets of features are selected.
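
The three techniques above can be combined in a small sketch. The example below is an assumption-laden illustration rather than a reference method: it uses the absolute Pearson correlation with the target as the importance score (many other scores exist), and the data, the derived `revenue` feature, and the helper names `pearson`, `score_features`, and `select_top_k` are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def score_features(columns, target):
    """Importance scoring: absolute correlation of each feature with the target."""
    return {name: abs(pearson(values, target)) for name, values in columns.items()}


def select_top_k(scores, k):
    """Feature selection: keep the k highest-scoring feature names."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical raw columns and target.
columns = {
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "quantity": [5.0, 3.0, 4.0, 1.0, 2.0],
    "constant": [1.0] * 5,
}
target = [12.0, 19.0, 33.0, 41.0, 48.0]

# Feature extraction: derive a new feature from two raw columns.
columns["revenue"] = [p * q for p, q in zip(columns["price"], columns["quantity"])]

scores = score_features(columns, target)
selected = select_top_k(scores, 2)
print(scores)
print(selected)
```

A correlation-based score only captures linear relationships with the target, so in practice it is often complemented by model-based importance measures.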

What about feature engineering techniques for time series?

Feature engineering for time series uses techniques specific to chronological data:

  • History functions: used to make predictions over a changing time scale.
  • Time stamp: the principle is the same as in the previous method, but with finer precision. In particular, it distinguishes working hours from non-working hours.
  • Offset (lag): based on selecting the value of a variable at a more or less distant past time interval, taking into account the value and fluctuations of past data.
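
As a rough sketch of how such features might be computed, the example below derives time-stamp features (including a working-hours flag), a lag feature, and a rolling mean over past values. The working-hours window (9:00 to 17:00, Monday to Friday), the rolling mean as a stand-in for a "history" feature, the sample readings, and all function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def timestamp_features(ts, work_start=9, work_end=17):
    """Time-stamp features; the working-hours window is an assumed convention."""
    is_weekday = ts.weekday() < 5  # Monday is 0, Sunday is 6
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),
        "is_working_hours": int(is_weekday and work_start <= ts.hour < work_end),
    }


def lag(values, k):
    """Offset feature: the value observed k steps earlier (None when unavailable)."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def rolling_mean(values, window):
    """Mean of the previous `window` values, excluding the current one."""
    result = []
    for i in range(len(values)):
        if i < window:
            result.append(None)
        else:
            result.append(sum(values[i - window:i]) / window)
    return result


# Hypothetical hourly readings starting on a Friday afternoon.
start = datetime(2024, 1, 5, 15)
timestamps = [start + timedelta(hours=i) for i in range(4)]
values = [100, 120, 90, 110]

rows = []
for ts, value, lag_1, mean_2 in zip(
    timestamps, values, lag(values, 1), rolling_mean(values, 2)
):
    row = {"value": value, "lag_1": lag_1, "rolling_mean_2": mean_2}
    row.update(timestamp_features(ts))
    rows.append(row)

for row in rows:
    print(row)
```

Both the lag and the rolling mean use only past observations, which avoids leaking the value being predicted into its own features.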

What is feature engineering in NLP?

Natural Language Processing (NLP) uses feature engineering for many use cases. Applications range from machine translation to syntactic analysis, and from optical character recognition to speech synthesis. It brings artificial intelligence and linguistics together in the digital environment.

An NLP system has to understand how text, sentences, and words are structured, as well as their style of expression. Deciphering this context is a prerequisite to understanding meaning, and feature engineering serves to capture these contextual elements. Learning then consists of extracting contextual information from the raw data to generate key characteristics of the dataset: number of words in the texts to be processed, number of capitalized words, number of punctuation marks, number of unique words, average sentence length, etc.
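
The characteristics listed above can be computed with the Python standard library alone. The sketch below is a simplified illustration: the function name `text_features`, the sample sentence, and the tokenization rules (words as runs of ASCII letters, digits, and apostrophes; sentences split on `.`, `!`, and `?`) are assumptions, and real pipelines usually rely on a proper tokenizer.

```python
import re
import string


def text_features(text):
    """Compute simple count-based features for one piece of text."""
    # Naive tokenization: assumed rules, adequate only for plain English text.
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "capitalized_word_count": sum(1 for w in words if w[0].isupper()),
        "punctuation_count": sum(1 for ch in text if ch in string.punctuation),
        "unique_word_count": len({w.lower() for w in words}),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


sample = "Feature engineering helps models. It turns raw text into numbers!"
features = text_features(sample)
print(features)
```

Each text in a dataset would be passed through the same function, giving one row of numeric features per document.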
