
AI Core Concepts (Part 10): Feature Engineering

Feature engineering is the process of transforming raw data into inputs that a model can learn from more easily, with the goal of improving model performance. Even with powerful models like deep neural networks, quality features remain crucial, especially in classical ML.


1. What Is a Feature?

A feature is an individual measurable property or characteristic of the phenomenon being observed. In datasets, features are columns (excluding the target in supervised learning).
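As a minimal sketch of this definition, the snippet below separates feature columns from the target in a small made-up table. The column names (including the `churned` target) and the values are hypothetical and only serve as illustration.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
data = pd.DataFrame({
    "age": [25, 41, 33],
    "plan": ["basic", "pro", "basic"],
    "monthly_spend": [20.0, 75.5, 22.0],
    "churned": [0, 1, 0],  # assumed target column
})

target_column = "churned"
X = data.drop(columns=[target_column])  # features: every column except the target
y = data[target_column]                 # target: the value the model should predict

feature_names = list(X.columns)
```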


2. Goals of Feature Engineering


3. Common Techniques

🔹 Numerical Transformations

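from sklearn.preprocessing import StandardScaler

# numeric_data: a 2-D array or DataFrame containing only numeric columns
scaler = StandardScaler()  # rescales each column to zero mean and unit variance
scaled_features = scaler.fit_transform(numeric_data)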
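To make the effect of standardization concrete, here is a small sketch on made-up numbers. It assumes a train/test split (the arrays `X_train` and `X_test` are hypothetical) and fits the scaler on the training rows only, so the test rows are rescaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric data: two columns on very different scales.
X_train = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
X_test = np.array([[2.0, 4000.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training mean and std

# After scaling, each training column has mean 0 and standard deviation 1.
train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)
```

Fitting on the training data alone is a common way to keep information from the test set out of the preprocessing step.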

🔹 Categorical Encoding

from sklearn.preprocessing import OneHotEncoder

# Creates one binary column per distinct category; the result is a sparse matrix by default
encoder = OneHotEncoder()
encoded = encoder.fit_transform(data[['category']])
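The sketch below runs one-hot encoding on a tiny hypothetical `category` column to show what the output looks like. Passing `handle_unknown="ignore"` is a choice made for this example; it makes categories unseen during fitting encode as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column.
data = pd.DataFrame({"category": ["red", "green", "blue", "green"]})

encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(data[["category"]])  # sparse matrix by default

encoded_dense = encoded.toarray()
categories = list(encoder.categories_[0])  # learned categories, in sorted order

# A category that was not present during fit becomes a row of zeros.
unseen = encoder.transform(pd.DataFrame({"category": ["purple"]})).toarray()
```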

🔹 Binning and Bucketing

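Group continuous variables into discrete bins (e.g., age → teen, adult, middle-aged, senior).

import pandas as pd

# Bins are right-inclusive by default: (0, 18], (18, 40], (40, 65], (65, 100]
data["age_group"] = pd.cut(data["age"], bins=[0, 18, 40, 65, 100],
                           labels=["Teen", "Adult", "MiddleAge", "Senior"])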
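Bin edges are a frequent source of surprises, so here is a short sketch on hypothetical ages that sit exactly on the boundaries. It uses the same bins and labels as above and also shows `include_lowest=True`, which is one way to keep the value 0 from falling outside the first bin.

```python
import pandas as pd

# Hypothetical ages chosen to sit on or next to the bin edges.
data = pd.DataFrame({"age": [0, 18, 19, 40, 65, 66, 100]})
bins = [0, 18, 40, 65, 100]
labels = ["Teen", "Adult", "MiddleAge", "Senior"]

# Default behaviour: intervals are open on the left, closed on the right.
data["age_group"] = pd.cut(data["age"], bins=bins, labels=labels)

# include_lowest=True makes the first interval include its left edge (0).
data["age_group_incl"] = pd.cut(
    data["age"], bins=bins, labels=labels, include_lowest=True
)
```

With the default settings, an age of 0 gets no group (a missing value), while 18 still counts as "Teen" because the right edge is inclusive.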

🔹 Feature Crosses and Interactions

from sklearn.preprocessing import PolynomialFeatures

# interaction_only=True adds products of distinct features (x1*x2) but no squares (x1^2)
poly = PolynomialFeatures(degree=2, interaction_only=True)
X_poly = poly.fit_transform(X)
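To see which columns an interaction-only expansion produces, the following sketch applies it to a single made-up row with three features. Setting `include_bias=False` is a choice for this example that drops the constant column of ones.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One hypothetical row with three features: a=2, b=3, c=5.
X = np.array([[2.0, 3.0, 5.0]])

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_poly = poly.fit_transform(X)

# Columns are the original features followed by the pairwise products:
# a, b, c, a*b, a*c, b*c
```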

🔹 Date & Time Features

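Extract calendar components, such as the day of the week and the hour, from a datetime column:

# Assumes df["timestamp"] already has a datetime dtype (e.g., via pd.to_datetime)
df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday=0, Sunday=6
df["hour"] = df["timestamp"].dt.hour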
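A self-contained sketch of the same idea, starting from timestamps stored as strings. The sample timestamps and the extra `is_weekend` column are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical timestamps stored as text.
df = pd.DataFrame({"timestamp": ["2024-01-01 09:30:00", "2024-01-06 23:15:00"]})

# The .dt accessor only works on datetime columns, so convert first.
df["timestamp"] = pd.to_datetime(df["timestamp"])

df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday=0, Sunday=6
df["hour"] = df["timestamp"].dt.hour
df["is_weekend"] = df["day_of_week"] >= 5         # Saturday or Sunday
```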

🔹 Text Features

from sklearn.feature_extraction.text import TfidfVectorizer

# max_features=1000 keeps only the 1000 most frequent terms in the corpus
vectorizer = TfidfVectorizer(max_features=1000)
X_text = vectorizer.fit_transform(df["text"])
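The sketch below applies TF-IDF to three short made-up sentences to show the shape of the result. The sentences are hypothetical; the vectorizer settings match the example above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus.
texts = ["the cat sat", "the dog sat", "the cat chased the dog"]

vectorizer = TfidfVectorizer(max_features=1000)
X_text = vectorizer.fit_transform(texts)  # sparse matrix: one row per document

vocabulary = sorted(vectorizer.vocabulary_)  # learned terms, alphabetically

# With default settings each document vector is scaled to unit L2 norm.
row_norms = np.sqrt(np.asarray(X_text.multiply(X_text).sum(axis=1)).ravel())
```

Each row is one document and each column is one vocabulary term, so the result has three rows and five columns here.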

4. Automated Feature Engineering

Libraries like Featuretools, tsfresh, and AutoFeat can automatically generate new features based on data relationships.

import featuretools as ft

# Featuretools 1.0+ API; older releases used entity_from_dataframe and target_entity
es = ft.EntitySet(id="data")
es = es.add_dataframe(dataframe_name="df", dataframe=df, index="id")
feature_matrix, features = ft.dfs(entityset=es, target_dataframe_name="df")

5. Feature Selection

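After engineering, not all features help. One selection method is univariate scoring, which keeps the k features with the strongest statistical relationship to the target:

from sklearn.feature_selection import SelectKBest, f_classif

# f_classif scores each feature with an ANOVA F-test against the class labels
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)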
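As a rough illustration of how this scoring separates useful features from noise, the sketch below builds synthetic data with one informative column and three random ones. The data, the seed, and `k=1` are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# Synthetic data: column 0 tracks the label closely, columns 1-3 are pure noise.
y = np.array([0] * 50 + [1] * 50)
informative = y + rng.normal(scale=0.1, size=100)
noise = rng.normal(size=(100, 3))
X = np.column_stack([informative, noise])

selector = SelectKBest(score_func=f_classif, k=1)
X_selected = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)  # indices of the columns that were kept
```

Here `get_support(indices=True)` reports which original columns survived, which is useful for mapping the reduced matrix back to feature names.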

6. Best Practices



