Model Performance & Architecture

Our BiLSTM-based deepfake detection model is trained on 13,185 audio samples from the SceneFake dataset and reaches 93.8% accuracy on our test dataset.

Note: The model can make mistakes. The metrics below are measured on our test dataset, and real-world performance may vary with audio quality and recording conditions. We recommend using this tool alongside other verification methods.

Performance Metrics

Accuracy: 93.8% (overall correctness of predictions)
Precision: 93.7% (accuracy of positive predictions)
Recall: 99.0% (ability to find all positives)
F1 Score: 96.3% (harmonic mean of precision and recall)
AUC-ROC: 88.6% (area under the ROC curve)
Inference Time: <5s (average prediction speed)
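
The F1 score follows directly from precision and recall, so the reported figures can be cross-checked with a short calculation. The Python sketch below does that; the function name `f1_score` is illustrative and is not taken from the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall from the metrics above.
reported_f1 = f1_score(precision=0.937, recall=0.990)
print(f"F1 = {reported_f1:.1%}")  # F1 = 96.3%
```

The result agrees with the reported F1 score of 96.3%.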

ROC Curve Analysis

What is ROC?

The Receiver Operating Characteristic (ROC) curve illustrates the diagnostic ability of our binary classifier. It plots the True Positive Rate against the False Positive Rate at various threshold settings.
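
As a minimal sketch of how such a curve is built, the Python code below sweeps a decision threshold over a handful of made-up scores, records the (False Positive Rate, True Positive Rate) pair at each step, and integrates the result with the trapezoidal rule. The labels, scores, and the names `roc_points` and `auc` are illustrative assumptions, not data or code from our model.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, highest threshold first.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability of the positive class.
    Assumes both classes are present in labels.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example with made-up values.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(f"AUC = {auc(curve):.3f}")
```

Lowering the threshold moves the curve toward the top-right corner: more positives are caught, at the cost of more false alarms.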

AUC Score: 88.6%

Interpretation: Good

True Positive Rate: 99.0%

Precision: 93.7%

AUC Score Interpretation:

0.90 - 1.00: Excellent
0.80 - 0.90: Good
0.70 - 0.80: Fair
0.60 - 0.70: Poor
0.50 - 0.60: Fail
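
The bands above can be written as a small lookup. This Python sketch assumes each band includes its lower bound (the table's ranges share their endpoints, so the choice is ours) and treats anything below 0.50 as "Fail"; `interpret_auc` is a hypothetical helper name.

```python
def interpret_auc(auc: float) -> str:
    """Map an AUC value to the labels in the table above.

    Assumption: each band includes its lower bound, so 0.90 counts as "Excellent".
    """
    bands = [
        (0.90, "Excellent"),
        (0.80, "Good"),
        (0.70, "Fair"),
        (0.60, "Poor"),
        (0.50, "Fail"),
    ]
    for lower_bound, label in bands:
        if auc >= lower_bound:
            return label
    return "Fail"  # assumption: values below 0.50 are not covered by the table


print(interpret_auc(0.886))  # Good
```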

Training Data

Training Samples: 13,185
Validation Samples: 12,842
Test Samples: 32,746

Dataset: SceneFake

Our model is trained on the SceneFake dataset, which contains authentic and synthetic audio samples covering various scenarios, speakers, and generation techniques.

Real Audio: Genuine human speech recordings
Fake Audio: AI-generated speech from multiple synthesis methods
Diversity: Multiple speakers, accents, and recording conditions

Model Architecture

Total Parameters: 102,306

Input Shape: (300, 40)

Model Type: BiLSTM

Network Layers

Input Layer (300, 40)

Bidirectional LSTM (64 units, return_sequences=True, recurrent_dropout=0.2)

Batch Normalization

Dropout (0.4)

Bidirectional LSTM (32 units, recurrent_dropout=0.2)

Batch Normalization

Dropout (0.4)

Dense (64 units, ReLU activation)

Batch Normalization

Dropout (0.4)

Dense (32 units, ReLU activation)

Dropout (0.4)

Dense (2 units, Softmax activation)
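
The parameter total can be reproduced from the layer list. The Python sketch below assumes the usual LSTM parameterization (four gates, each with input weights, recurrent weights, and one bias vector), a bidirectional wrapper that concatenates the two directions, and batch normalization with four values per feature; dropout layers add no parameters. The helper names are illustrative.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """One LSTM direction: 4 gates, each with input weights,
    recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim: int, units: int) -> int:
    """A bidirectional wrapper holds two independent LSTMs."""
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim: int, units: int) -> int:
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


def batch_norm_params(features: int) -> int:
    """gamma, beta, moving mean, moving variance (the last two are not trained)."""
    return 4 * features


layers = [
    ("BiLSTM 64", bilstm_params(40, 64)),    # output width 2 * 64 = 128
    ("BatchNorm", batch_norm_params(128)),
    ("BiLSTM 32", bilstm_params(128, 32)),   # output width 2 * 32 = 64
    ("BatchNorm", batch_norm_params(64)),
    ("Dense 64", dense_params(64, 64)),
    ("BatchNorm", batch_norm_params(64)),
    ("Dense 32", dense_params(64, 32)),
    ("Dense 2", dense_params(32, 2)),
]
total = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:<10} {count:>7,}")
print(f"{'Total':<10} {total:>7,}")  # the total is 102,306
```

Under these assumptions the sum matches the reported total, with the two recurrent layers accounting for most of the parameters.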

Feature Extraction

Type: MFCC

Coefficients: 40

Sample Rate: 16,000 Hz

Max Length: 300 frames

What are MFCCs?

Mel-Frequency Cepstral Coefficients (MFCCs) are compact features that summarize the short-term power spectrum of audio on the mel scale, which approximates how humans perceive pitch. They capture the spectral envelope of speech, which makes them useful for distinguishing real from synthetic audio.
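
Because the model expects a fixed input of 300 frames by 40 coefficients, clips of other lengths have to be brought to that shape. The Python sketch below shows one common way to do this, truncating long clips and zero-padding short ones at the end. The padding scheme is an assumption (it is not stated here), and `pad_or_truncate` is a hypothetical helper operating on already-extracted MFCC frames.

```python
def pad_or_truncate(frames, max_len=300, n_coeffs=40):
    """Return exactly max_len frames of n_coeffs values each.

    frames: list of MFCC frames, each a list of n_coeffs floats.
    Longer inputs are cut at max_len; shorter ones are zero-padded at the end
    (assumed scheme, see the note above).
    """
    for frame in frames:
        if len(frame) != n_coeffs:
            raise ValueError(f"expected {n_coeffs} coefficients, got {len(frame)}")
    clipped = [list(frame) for frame in frames[:max_len]]
    padding = [[0.0] * n_coeffs for _ in range(max_len - len(clipped))]
    return clipped + padding


short_clip = [[1.0] * 40 for _ in range(120)]   # made-up 120-frame clip
long_clip = [[1.0] * 40 for _ in range(450)]    # made-up 450-frame clip
for clip in (short_clip, long_clip):
    fixed = pad_or_truncate(clip)
    print(len(fixed), len(fixed[0]))  # 300 40
```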

Training Configuration

Epochs

Batch Size

Learning Rate: 0.0001
Optimizer: Adam
Loss Function: SCC
Gradient Clipping: 1.0
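
Assuming SCC stands for sparse categorical cross-entropy (the abbreviation is not expanded here), the loss for one sample is the negative log of the softmax probability assigned to the true class. A minimal Python sketch with made-up logits for the two output units follows; the function names are illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(true_class: int, probs) -> float:
    """Loss for one sample; true_class is an integer index, not a one-hot vector."""
    return -math.log(probs[true_class])


probs = softmax([2.0, 0.0])  # made-up logits
loss_if_class_0 = sparse_categorical_crossentropy(0, probs)
loss_if_class_1 = sparse_categorical_crossentropy(1, probs)
print(probs, loss_if_class_0, loss_if_class_1)
```

The loss is small when the model puts most of its probability on the true class and grows quickly when it does not.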

Training Callbacks

EarlyStopping (patience=10)
ModelCheckpoint (save_best_only=True)
ReduceLROnPlateau (factor=0.5, patience=5)
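
To illustrate how these settings interact, the Python sketch below simulates a simplified version of the early-stopping and learning-rate schedule on a made-up validation-loss curve. It assumes both callbacks monitor validation loss and ignores options such as cooldown and minimum delta; `simulate_training` is a hypothetical helper, not the training code.

```python
def simulate_training(val_losses, lr=1e-4, stop_patience=10, lr_patience=5, lr_factor=0.5):
    """Return (epochs_run, best_epoch, final_lr) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    stop_wait = 0   # epochs since the last improvement (early stopping)
    lr_wait = 0     # epochs since the last improvement or learning-rate reduction
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # a checkpoint would be saved here
            stop_wait = lr_wait = 0
        else:
            stop_wait += 1
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr *= lr_factor                   # halve the learning rate
                lr_wait = 0
            if stop_wait >= stop_patience:
                return epoch, best_epoch, lr      # stop training early
    return len(val_losses), best_epoch, lr


# Made-up curve: improves for 3 epochs, then plateaus.
losses = [0.60, 0.50, 0.45] + [0.46] * 20
epochs_run, best_epoch, final_lr = simulate_training(losses)
print(epochs_run, best_epoch, final_lr)
```

On this toy curve the best epoch is 3, the learning rate is halved after 5 and again after 10 epochs without improvement, and training stops at epoch 13.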
