Model Performance & Architecture

Our BiLSTM-based deepfake detection model is trained on 13,185 audio samples from the SceneFake dataset and reaches 93.8% accuracy on its held-out test set.

Note: The model can make mistakes. These metrics were measured on our test dataset, and real-world performance may vary with audio quality and recording conditions. We recommend using this tool alongside other verification methods.

Performance Metrics

Accuracy: 93.8% (overall correctness of predictions)
Precision: 93.7% (accuracy of positive predictions)
Recall: 99.0% (ability to find all positives)
F1 Score: 96.3% (harmonic mean of precision and recall)
AUC-ROC: 88.6% (area under the ROC curve)
Inference Time: <5s (average prediction speed)
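
As a quick consistency check, the F1 score can be recomputed from the reported precision and recall. The sketch below also shows how accuracy, precision, and recall follow from confusion-matrix counts. The function names are illustrative, and the counts in the usage example are hypothetical, not taken from the test set.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, tn, fn):
    """Accuracy, precision, and recall from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so neither denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Reported precision (93.7%) and recall (99.0%) give the reported F1.
print(round(f1_score(0.937, 0.990), 3))  # 0.963

# Hypothetical counts, for illustration only.
print(metrics_from_counts(tp=90, fp=10, tn=80, fn=20))
```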

ROC Curve Analysis

What is ROC?

The Receiver Operating Characteristic (ROC) curve illustrates the diagnostic ability of our binary classifier. It plots the True Positive Rate against the False Positive Rate at various threshold settings.

AUC Score: 88.6%
Interpretation: Good
True Positive Rate: 99.0%
Precision: 93.7%

AUC Score Interpretation:

  • 0.90 - 1.00: Excellent
  • 0.80 - 0.90: Good
  • 0.70 - 0.80: Fair
  • 0.60 - 0.70: Poor
  • 0.50 - 0.60: Fail
[Figure: ROC curve. x-axis: False Positive Rate (0.0 to 1.0); y-axis: True Positive Rate (0.0 to 1.0); AUC = 0.89]
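
The ROC construction and the AUC bands above can be sketched in plain Python. This is a minimal illustration, not the evaluation code used for the reported numbers. It assumes binary labels (1 = positive) with both classes present, trapezoidal integration, and that each band includes its lower bound. The labels and scores in the usage example are made up.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def interpret_auc(value):
    """Map an AUC value to the bands listed above (lower bound inclusive)."""
    if value >= 0.90:
        return "Excellent"
    if value >= 0.80:
        return "Good"
    if value >= 0.70:
        return "Fair"
    if value >= 0.60:
        return "Poor"
    return "Fail"


# Made-up labels and scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
print(round(area, 3), interpret_auc(area))
```

Applied to the reported AUC of 0.886, `interpret_auc` returns "Good", matching the interpretation shown above.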

Training Data

Training Samples: 13,185
Validation Samples: 12,842
Test Samples: 32,746

Dataset: SceneFake

Our model is trained on the SceneFake dataset, which contains authentic and synthetic audio samples covering various scenarios, speakers, and generation techniques.

  • Real Audio: Genuine human speech recordings
  • Fake Audio: AI-generated speech from multiple synthesis methods
  • Diversity: Multiple speakers, accents, and recording conditions

Model Architecture

Total Parameters: 102,306
Input Shape: (300, 40)
Model Type: BiLSTM

Network Layers

[Figure: network layer diagram, data flowing left to right. Input Layer (MFCC features, 40 × 174) → BiLSTM-1 (128 bidirectional LSTM units) → Dropout (30%) → BiLSTM-2 (64 bidirectional LSTM units) → Dropout (30%) → Dense (32 units, ReLU) → Output (2 classes, softmax, Real/Fake)]
1. Input Layer (300, 40)
2. Bidirectional LSTM (64 units, return_sequences=True, recurrent_dropout=0.2)
3. Batch Normalization
4. Dropout (0.4)
5. Bidirectional LSTM (32 units, recurrent_dropout=0.2)
6. Batch Normalization
7. Dropout (0.4)
8. Dense (64 units, ReLU activation)
9. Batch Normalization
10. Dropout (0.4)
11. Dense (32 units, ReLU activation)
12. Dropout (0.4)
13. Dense (2 units, Softmax activation)
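
The parameter total can be cross-checked against the 13-layer list by hand. The sketch below assumes the standard LSTM formulation (four gates, one bias vector per gate), concatenated forward and backward outputs in each bidirectional layer, and batch normalization counted with four values per feature (scale, shift, moving mean, moving variance). The helper names are illustrative.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim, units):
    # One forward and one backward LSTM of the same size.
    return 2 * lstm_params(input_dim, units)


def batchnorm_params(features):
    # Scale, shift, moving mean, and moving variance per feature.
    return 4 * features


def dense_params(input_dim, units):
    return input_dim * units + units


def total_params(n_features=40):
    total = 0
    total += bilstm_params(n_features, 64)  # layer 2, outputs 128 features
    total += batchnorm_params(128)          # layer 3
    total += bilstm_params(128, 32)         # layer 5, outputs 64 features
    total += batchnorm_params(64)           # layer 6
    total += dense_params(64, 64)           # layer 8
    total += batchnorm_params(64)           # layer 9
    total += dense_params(64, 32)           # layer 11
    total += dense_params(32, 2)            # layer 13
    return total                            # dropout layers add no parameters


print(total_params())  # 102306
```

Under these assumptions the count comes to 102,306, which agrees with the total reported above for a (300, 40) input.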

Feature Extraction

Type: MFCC
Coefficients: 40
Sample Rate: 16,000 Hz
Max Length: 300 frames

What are MFCCs?

Mel-Frequency Cepstral Coefficients (MFCCs) are compact features that represent the short-term power spectrum of audio on the mel scale, which approximates how humans perceive pitch. They capture characteristics of human speech and are widely used to distinguish real from synthetic audio.
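
Because the model expects a fixed input of 300 frames by 40 coefficients, clips of different lengths have to be brought to that shape before inference. The sketch below shows one common way to do it. It assumes the MFCC matrix is already computed as a list of frames, and that short clips are zero-padded at the end. The actual padding strategy is not stated here, and `pad_or_truncate` is a hypothetical helper.

```python
def pad_or_truncate(frames, max_len=300, n_coeffs=40):
    """Return exactly max_len frames of n_coeffs values each.

    frames: list of per-frame MFCC vectors (time-major).
    Longer inputs are cut at max_len; shorter ones are zero-padded
    at the end (an assumption, not a confirmed detail of the model).
    """
    for frame in frames:
        if len(frame) != n_coeffs:
            raise ValueError(f"expected {n_coeffs} coefficients, got {len(frame)}")

    clipped = [list(frame) for frame in frames[:max_len]]
    padding = [[0.0] * n_coeffs for _ in range(max_len - len(clipped))]
    return clipped + padding


# A short clip of 120 frames is padded up to 300 frames.
short_clip = [[0.1] * 40 for _ in range(120)]
fixed = pad_or_truncate(short_clip)
print(len(fixed), len(fixed[0]))  # 300 40
```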

Training Configuration

Epochs: 50
Batch Size: 32
Learning Rate: 0.0001
Optimizer: Adam
Loss Function: SCC (sparse categorical cross-entropy)
Gradient Clipping: 1.0
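
The configuration lists a gradient clipping value of 1.0 but does not say whether gradients are clipped by norm or by value. As an illustration only, the sketch below assumes clipping by L2 norm: if the gradient vector is longer than the limit, it is rescaled to that length while keeping its direction. `clip_by_norm` is a hypothetical helper.

```python
import math


def clip_by_norm(grads, max_norm=1.0):
    """Rescale a gradient vector so its L2 norm does not exceed max_norm.

    Assumes norm-based clipping; value-based clipping would instead
    clamp each component to [-max_norm, max_norm].
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


# A gradient with norm 5.0 is scaled down to norm 1.0.
print(clip_by_norm([3.0, 4.0]))
```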

Training Callbacks

  • EarlyStopping (patience=10)
  • ModelCheckpoint (save_best_only=True)
  • ReduceLROnPlateau (factor=0.5, patience=5)
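
To see how the learning-rate and early-stopping callbacks interact, the sketch below simulates them over a sequence of validation losses. It is a simplified model, not the framework's implementation: it treats any strict decrease in loss as an improvement and ignores options such as minimum delta, cooldown, and a learning-rate floor. The loss values in the usage example are made up.

```python
def simulate_schedule(val_losses, lr=1e-4, factor=0.5,
                      lr_patience=5, stop_patience=10):
    """Return (history, stopped_epoch).

    history holds (epoch, learning rate in effect after that epoch).
    stopped_epoch is None if early stopping never triggers.
    """
    best = float("inf")
    since_best = 0    # epochs without improvement, for early stopping
    since_lr_cut = 0  # epochs without improvement since the last LR cut
    history = []

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_lr_cut = 0
        else:
            since_best += 1
            since_lr_cut += 1

        if since_lr_cut >= lr_patience:
            lr *= factor      # halve the learning rate on a plateau
            since_lr_cut = 0

        history.append((epoch, lr))

        if since_best >= stop_patience:
            return history, epoch
    return history, None


# Made-up losses: two improving epochs, then a long plateau.
losses = [1.0, 0.9] + [0.95] * 12
history, stopped = simulate_schedule(losses)
print(stopped, history[-1][1])
```

With these losses the learning rate is halved twice (after 5 and 10 stagnant epochs), and training stops at the second cut because the early-stopping patience of 10 is reached at the same epoch.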

Try It Yourself

Experience our high-performance model in action

Detect Deepfakes Now