
Classification (Machine Learning)

1. Definition

Classification is a supervised learning task where the goal is to assign input data to predefined categories (classes or labels).

  • Example: Spam detection → Email classified as Spam or Not Spam.

👉 It answers the question: “Which class does this data point belong to?”


2. Key Characteristics

  • Input: Features (X)
  • Output: Discrete labels (Y)
  • Training: Uses a labeled dataset (each input paired with its correct output).
  • Goal: Learn a decision boundary that separates different classes.
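To make the Input, Output, Training, and Goal points concrete, here is a minimal sketch of logistic regression trained by gradient descent on a tiny, made-up two-feature dataset. The data, learning rate, and epoch count are illustrative assumptions. The learned weights w and bias b define the decision boundary w·x + b = 0, and a point is assigned to whichever side it falls on.

```python
import math

# Hypothetical toy data: X holds two features per example, Y the correct label (0 or 1).
X = [[-2.0, -1.0], [-1.5, -2.0], [-1.0, -1.5], [1.0, 1.5], [2.0, 1.0], [1.5, 2.0]]
Y = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, Y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(X, Y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)  # predicted P(y = 1)
            error = p - y  # derivative of the log loss w.r.t. the linear score
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Class 1 if x lies on the positive side of the boundary w·x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

w, b = train_logistic_regression(X, Y)
print([predict(w, b, x) for x in X])                            # → [0, 0, 0, 1, 1, 1]
print(predict(w, b, [-1.2, -1.4]), predict(w, b, [1.8, 1.6]))  # → 0 1
```

Logistic regression can only draw a straight (linear) boundary in feature space. More flexible models such as decision trees or neural networks can learn non-linear boundaries.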

3. Types of Classification

  1. Binary Classification → Two classes
    • Example: Yes/No, Spam/Not Spam, Fraud/Not Fraud.
  2. Multiclass Classification → More than two classes
    • Example: Handwritten digit recognition (0–9).
  3. Multilabel Classification → Each input can belong to multiple classes simultaneously
    • Example: A movie tagged as Action + Adventure + Sci-Fi.
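The three types differ mainly in how each example's target is represented. The following sketch shows common encodings: a single 0/1 value for binary, a one-hot vector for multiclass, and a multi-hot vector for multilabel. The label lists and the genre vocabulary are hypothetical examples echoing the ones above.

```python
# Hypothetical examples of each label type.
binary_labels = ["Spam", "Not Spam", "Spam"]  # one of two classes per input
multiclass_labels = [3, 7, 0]                 # one of ten digit classes (0-9) per input
genres = ["Action", "Adventure", "Sci-Fi", "Drama"]
multilabel_labels = [{"Action", "Adventure", "Sci-Fi"}, {"Drama"}]  # any subset per input

def encode_binary(label, positive="Spam"):
    """Binary target: 1 for the positive class, 0 otherwise."""
    return 1 if label == positive else 0

def one_hot(label, n_classes):
    """Multiclass target: all zeros except a single 1 at the class index."""
    vec = [0] * n_classes
    vec[label] = 1
    return vec

def multi_hot(tags, vocabulary):
    """Multilabel target: a 1 for every class the input belongs to."""
    return [1 if name in tags else 0 for name in vocabulary]

print([encode_binary(y) for y in binary_labels])  # → [1, 0, 1]
print(one_hot(multiclass_labels[0], 10))          # → [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(multi_hot(multilabel_labels[0], genres))    # → [1, 1, 1, 0]
```

A one-hot vector always contains exactly one 1, while a multi-hot vector can contain several. A common way to handle multilabel problems is to train one binary classifier per column of the multi-hot vector.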

4. Common Algorithms

  • Logistic Regression → Simple but effective for binary classification.
  • Decision Trees & Random Forests → Tree-based methods that capture non-linear patterns; a random forest combines the votes of many trees to reduce overfitting.
  • Naive Bayes → Probabilistic model, works well with text data.
  • k-Nearest Neighbors (kNN) → Assigns the majority class among the k closest training points.
  • Support Vector Machines (SVM) → Finds the separating boundary with the largest margin between classes.
  • Neural Networks / Deep Learning → Handles complex, high-dimensional data.
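Of the algorithms listed, kNN is the easiest to write out in full. This minimal sketch uses Euclidean distance and a majority vote. The toy "cat"/"dog" points and the default k = 3 are assumptions made for illustration, not part of any particular library.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Sort the labeled training points by Euclidean distance to the query, keep the k nearest.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters labeled "cat" and "dog".
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (5.0, 5.2), (5.5, 4.8), (4.8, 5.1)]
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # → cat
print(knn_predict(train_X, train_y, (5.1, 5.0)))  # → dog
```

kNN has essentially no training step: it stores the labeled examples and does all the work at prediction time, so prediction cost grows with the size of the training set. An odd k avoids tied votes in two-class problems.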

5. Performance Metrics

  • Accuracy → % of correct predictions.
  • Precision → Correct positive predictions / Total predicted positives.
  • Recall (Sensitivity) → Correct positive predictions / Actual positives.
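  • F1-Score → Harmonic mean of precision and recall: 2 × Precision × Recall / (Precision + Recall).
  • Confusion Matrix → Table of counts comparing actual classes with predicted classes (true/false positives and negatives).
  • ROC Curve & AUC → The ROC curve plots the true positive rate against the false positive rate across decision thresholds; AUC (area under the curve) summarizes it in one number, where 1.0 is perfect and 0.5 is no better than random guessing.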
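All of the threshold-based metrics above follow from the four confusion-matrix counts. This sketch assumes a binary problem with 1 as the positive class and uses hypothetical labels and model scores, thresholded at 0.5. AUC is computed through its pairwise interpretation (the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, counting ties as half), which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical true labels and model scores; predictions use a 0.5 threshold.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.45]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_counts(y_true, y_pred))  # → (2, 1, 2, 3)
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))           # → 0.875
```

Accuracy, precision, recall, and F1 change when the 0.5 threshold moves, while AUC depends only on how the scores rank positives against negatives. The pairwise loop is quadratic in the number of examples, which is fine for illustration but slow on large datasets.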

6. Applications

  • 📧 Spam Detection (Email classification).
  • 👩‍⚕️ Medical Diagnosis (Healthy vs. Diseased).
  • 📱 Image Recognition (Cat vs. Dog).
  • 🛒 Customer Segmentation (High-value vs. Low-value customers).
  • 🎵 Music Genre Classification.
  • 🔐 Fraud Detection (Fraudulent vs. Genuine transactions).

7. Challenges

⚠️ Imbalanced datasets (e.g., 95% normal, 5% fraud).
⚠️ Overfitting with complex models.
⚠️ Choosing the right features.
⚠️ High computational cost for large datasets.
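The imbalanced-data warning is easy to demonstrate. In this sketch, with hypothetical labels matching the 95% normal / 5% fraud example, a model that always predicts "normal" reaches 95% accuracy while catching no fraud at all, which is why precision, recall, and F1 matter on skewed data. The last lines compute inverse-frequency class weights, one common mitigation; the formula mirrors scikit-learn's `class_weight="balanced"` option (n_samples / (n_classes × class_count)).

```python
# Hypothetical imbalanced dataset: 95 normal transactions (0) and 5 fraudulent ones (1).
y_true = [0] * 95 + [1] * 5

# A useless "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy={accuracy:.2f}, fraud recall={recall:.2f}")  # → accuracy=0.95, fraud recall=0.00

# Inverse-frequency class weights: rare classes get proportionally larger weights,
# so misclassifying a fraud case costs more during training.
counts = {c: y_true.count(c) for c in set(y_true)}
weights = {c: len(y_true) / (len(counts) * n) for c, n in counts.items()}
print(weights)
```

Here each fraud example would carry a weight of 10.0 versus roughly 0.53 for a normal one, so both classes contribute equally to the total training loss.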


✅ In short: Classification = Teaching a machine to assign labels to data based on patterns learned from past labeled examples.
