Model Performance Details

Model Details

Performance metrics, feature importance, and configuration details for all 9 machine learning models used in the phishing detection pipeline. Models span URL features (121), HTML features (100), combined features (221), and character-level CNN approaches.

Detection Pipeline
1. URL Input
2. Feature Extraction
3. 3 URL Models
4. HTML Download
5. 2 HTML + 2 Combined
6. 2 CNN Models
7. 9-Model Consensus
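The final consensus stage can be sketched as a simple majority vote over the per-model predictions. This is a minimal illustration, not the pipeline's actual code: the function name, the model keys, and the rule that a tie counts as legitimate (relevant only if fewer than 9 models vote, for example when the HTML download fails) are all assumptions.

```python
from typing import Mapping


def consensus_verdict(votes: Mapping[str, bool]) -> dict:
    """Majority vote over per-model predictions (True = phishing).

    Hypothetical helper: a strict majority is required, so a tie is
    treated as legitimate (an assumption, not stated in the source).
    """
    if not votes:
        raise ValueError("need at least one model vote")
    phishing_votes = sum(votes.values())
    total = len(votes)
    return {
        "is_phishing": phishing_votes * 2 > total,
        "phishing_votes": phishing_votes,
        "total_votes": total,
        # Share of models that agree with the winning side.
        "agreement": max(phishing_votes, total - phishing_votes) / total,
    }


# Illustrative votes for the 9 models (keys are made-up labels).
votes = {
    "lr_url": True, "rf_url": True, "xgb_url": True,
    "rf_html": False, "xgb_html": False,
    "rf_combined": True, "xgb_combined": True,
    "cnn_url": True, "cnn_html": False,
}
result = consensus_verdict(votes)
print(result["is_phishing"], f'{result["phishing_votes"]}/{result["total_votes"]}')
```

With all 9 models voting, a tie is impossible, so the tie rule only matters for partial runs.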
Training Datasets
Overview of datasets used to train each model category. All datasets are balanced (50/50 phishing/legitimate).
Dataset | Samples | Features | Used By
URL Features | 108,034 | 121 statistical | LR, RF URL, XGBoost URL
HTML Features | 162,826 | 77 raw + 23 eng. = 100 | RF HTML, XGBoost HTML
Combined Features | 107,690 | 121 URL + 100 HTML = 221 | RF Combined, XGBoost Combined
Clean URLs (CNN) | 80,825 | char-level encoding | CNN URL
Raw HTML Files (CNN) | 80,652 | char-level encoding | CNN HTML
Feature-based models (logistic regression and the tree ensembles) use an 80/20 stratified train/test split. The tree-ensemble models are additionally tuned with Optuna (100 trials, 5-fold CV). CNN models use an 85/10/5 train/val/test split with EarlyStopping.
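The Train / Test sizes listed in the model cards follow from the 80/20 split by plain arithmetic. The sketch below assumes the test share is rounded up to a whole sample (the function name is hypothetical); under that assumption it reproduces the reported sizes for all three feature datasets.

```python
def split_sizes(n_samples: int, test_percent: int = 20) -> tuple[int, int]:
    """Return (n_train, n_test), rounding the test share up.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    n_test = (n_samples * test_percent + 99) // 100
    return n_samples - n_test, n_test


for name, n in [("URL", 108_034), ("HTML", 162_826), ("Combined", 107_690)]:
    n_train, n_test = split_sizes(n)
    print(f"{name}: train={n_train:,} test={n_test:,}")
```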
URL Features (121 features)
All features are extracted from the URL string.
Length & Structure
url_length
domain_length
path_length
query_length
url_length_category
domain_length_category
Character Counts
num_dots
num_hyphens
num_underscores
num_slashes
num_question_marks
num_ampersands
num_equals
num_at
num_percent
num_digits_url
num_letters_url
domain_dots
domain_hyphens
domain_digits
path_slashes
path_dots
path_digits
Character Ratios
digit_ratio_url
letter_ratio_url
special_char_ratio
digit_ratio_domain
symbol_ratio_domain
Domain Structure
num_subdomains
num_domain_parts
tld_length
sld_length
longest_domain_part
avg_domain_part_len
longest_part_gt_20
longest_part_gt_30
longest_part_gt_40
has_suspicious_tld
has_trusted_tld
has_port
has_non_std_port
domain_randomness_score
sld_consonant_cluster_score
sld_keyboard_pattern
sld_has_dictionary_word
sld_pronounceability_score
domain_digit_position_suspicious
Path Analysis
path_depth
max_path_segment_len
avg_path_segment_len
has_extension
extension_category
has_suspicious_extension
has_exe
has_double_slash
path_has_brand_not_domain
path_has_ip_pattern
suspicious_path_extension_combo
Query String
num_params
has_query
query_value_length
max_param_len
query_has_url
Statistical & Entropy
url_entropy
domain_entropy
path_entropy
max_consecutive_digits
max_consecutive_chars
max_consecutive_consonants
char_repeat_rate
unique_bigram_ratio
unique_trigram_ratio
sld_letter_diversity
domain_has_numbers_letters
url_complexity_score
Security Indicators
has_ip_address
has_at_symbol
has_redirect
is_shortened
is_free_hosting
is_free_platform
platform_subdomain_length
has_uuid_subdomain
is_http
Keywords & Brand Detection
num_phishing_keywords
phishing_in_domain
phishing_in_path
num_brands
brand_in_domain
brand_in_path
brand_impersonation
has_login
has_account
has_verify
has_secure
has_update
has_bank
has_password
has_suspend
has_webscr
has_cmd
has_cgi
brand_in_subdomain_not_domain
multiple_brands_in_url
brand_with_hyphen
suspicious_brand_tld
brand_keyword_combo
Encoding & Obfuscation
has_url_encoding
encoding_count
encoding_diff
has_punycode
has_unicode
has_hex_string
has_base64
has_lookalike_chars
mixed_script_score
homograph_brand_risk
suspected_idn_homograph
double_encoding
encoding_in_domain
suspicious_unicode_category
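To make a few of these features concrete, here is a minimal sketch of how some of the simpler ones might be computed with the Python standard library. The exact definitions used by the pipeline are not given here, so every formula below (including Shannon entropy over characters for url_entropy and domain_entropy) is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character; 0.0 for an empty string."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def basic_url_features(url: str) -> dict:
    """Assumed definitions for a handful of the 121 URL features."""
    parts = urlsplit(url)
    domain = parts.hostname or ""
    return {
        "url_length": len(url),
        "domain_length": len(domain),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "domain_dots": domain.count("."),
        "digit_ratio_url": sum(ch.isdigit() for ch in url) / len(url) if url else 0.0,
        "url_entropy": shannon_entropy(url),
        "domain_entropy": shannon_entropy(domain),
        "has_at_symbol": int("@" in url),
        "is_http": int(parts.scheme == "http"),
    }


features = basic_url_features("http://secure-login.example.com/verify/account?id=123")
for name, value in features.items():
    print(f"{name}: {value}")
```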
HTML Features (100 features: 77 raw + 23 engineered)
All features are extracted from the HTML source and DOM structure.
Document Size & Text
dom_depth
html_length
text_length
num_words
text_to_html_ratio
inline_css_length
num_tags
Metadata & Page Identity
has_title
has_description
has_keywords
has_author
has_copyright
has_viewport
has_favicon
num_meta_tags
DOM Elements & Layout
num_divs
num_spans
num_paragraphs
num_headings
num_lists
num_tables
num_images
num_iframes
num_hidden_iframes
num_data_uri_images
num_css_files
num_scripts
num_inline_scripts
num_inline_styles
num_input_fields
Link & Resource Analysis
num_links
num_internal_links
num_external_links
ratio_external_links
num_unique_external_domains
num_mailto_links
num_javascript_links
num_ip_based_links
num_suspicious_tld_links
num_empty_links
num_anchor_text_mismatch
num_external_css
num_external_images
num_external_scripts
Forms & Inputs
has_form
has_login_form
num_forms
num_email_fields
num_password_fields
num_text_fields
num_submit_buttons
num_hidden_fields
num_forms_without_labels
num_empty_form_actions
num_external_form_actions
password_with_external_action
Scripts & Dynamic Behavior
has_eval
has_escape
has_unescape
has_atob
has_base64
has_fromcharcode
has_document_write
has_window_open
has_location_replace
has_meta_refresh
num_onclick_events
num_onload_events
num_onerror_events
Visibility & Interaction Tricks
has_display_none
has_visibility_hidden
has_right_click_disabled
has_status_bar_customization
Contact & Social Engineering Signals
has_email_address
has_phone_number
num_brand_mentions
num_urgency_keywords
Engineered Features (23 computed)
Ratios, interactions, density metrics and risk scores computed from raw features via the engineer_features() pipeline.
Ratios
empty_to_total_links
external_to_total_links
forms_to_inputs_ratio
hidden_to_visible_inputs
iframes_to_tags_ratio
images_to_tags_ratio
password_to_inputs_ratio
scripts_to_tags_ratio
Interaction Features
brand_with_forms
external_scripts_links
forms_with_passwords
hidden_with_external
iframes_with_scripts
urgency_with_forms
Density Metrics
content_density
form_density
links_per_word
scripts_per_form
Risk Scores
form_risk_score
legitimacy_score
obfuscation_score
phishing_risk_score
has_suspicious_elements
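The sketch below illustrates the kind of computation the engineered features imply, using a few of the ratio, density, and interaction names listed above. The real engineer_features() formulas are not shown on this page, so each formula here is a guess based on the feature name, and the helper names are hypothetical.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def engineer_features_sketch(raw: dict) -> dict:
    """Add a few engineered features to a dict of raw HTML features.

    All formulas are assumptions inferred from the feature names.
    """
    out = dict(raw)
    out["empty_to_total_links"] = safe_ratio(raw["num_empty_links"], raw["num_links"])
    out["external_to_total_links"] = safe_ratio(raw["num_external_links"], raw["num_links"])
    out["password_to_inputs_ratio"] = safe_ratio(raw["num_password_fields"], raw["num_input_fields"])
    out["scripts_to_tags_ratio"] = safe_ratio(raw["num_scripts"], raw["num_tags"])
    out["links_per_word"] = safe_ratio(raw["num_links"], raw["num_words"])
    # Interaction term: non-zero only when forms and password fields co-occur.
    out["forms_with_passwords"] = raw["num_forms"] * raw["num_password_fields"]
    return out


page = {
    "num_links": 10, "num_empty_links": 4, "num_external_links": 5,
    "num_password_fields": 1, "num_input_fields": 4,
    "num_scripts": 3, "num_tags": 60, "num_words": 0, "num_forms": 1,
}
engineered = engineer_features_sketch(page)
print(engineered["empty_to_total_links"], engineered["links_per_word"])
```

Guarding the division matters in practice: many phishing pages have no visible text or no links at all, so zero denominators are common.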
3 models trained on 121 URL-based features extracted from the URL string structure, domain properties, encoding analysis, and brand impersonation detection.
Logistic Regression
Baseline
Baseline model using logistic regression on URL features with StandardScaler (z-score normalization). Serves as a benchmark for comparing more complex models.
Accuracy: 93.71%
Precision: 95.40%
Recall: 91.84%
F1-Score: 93.59%
ROC-AUC: 0.9789
Confusion Matrix
Actual \ Predicted | Pred Legit | Pred Phish
Actual Legit | 10,326 | 478
Actual Phish | 881 | 9,922
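The headline metrics can be derived directly from the four confusion-matrix cells. A minimal sketch using the standard definitions, with phishing as the positive class (the function name is hypothetical); applied to the logistic regression matrix above it gives the same accuracy, precision, recall, and F1 as reported.

```python
def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Logistic regression: rows are actual, columns are predicted.
lr = classification_metrics(tn=10_326, fp=478, fn=881, tp=9_922)
for name, value in lr.items():
    print(f"{name}: {value:.2%}")
```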
Training Data
Dataset: url_features_108k.csv
Samples: 108,034
Train / Test: 86,427 / 21,607
Features: 121 URL statistical
Preprocessing: StandardScaler
Class Balance: 50/50
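Z-score normalization, as performed by StandardScaler, rescales each feature to zero mean and unit variance using statistics fitted on the training split only. A minimal pure-Python sketch for a single feature column follows; the function names and sample values are illustrative, not taken from the pipeline.

```python
import math


def fit_standard_scaler(column: list[float]) -> tuple[float, float]:
    """Return (mean, std) of a training column, using the population std."""
    n = len(column)
    mean = sum(column) / n
    variance = sum((x - mean) ** 2 for x in column) / n
    # A constant column has std 0; scale by 1.0 instead of dividing by zero.
    std = math.sqrt(variance) or 1.0
    return mean, std


def transform(column: list[float], mean: float, std: float) -> list[float]:
    """Apply z = (x - mean) / std with previously fitted statistics."""
    return [(x - mean) / std for x in column]


train_url_length = [20.0, 40.0, 60.0, 80.0]  # made-up url_length values
mean, std = fit_standard_scaler(train_url_length)
scaled_train = transform(train_url_length, mean, std)
scaled_test = transform([50.0], mean, std)  # reuse the training statistics
print(mean, round(std, 4), scaled_test)
```

Scaling matters for logistic regression because its coefficients and regularization are sensitive to feature magnitude, whereas tree-based models split on thresholds and are unaffected by it.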
Random Forest
Ensemble
Random Forest classifier trained on 121 URL-based statistical features. Hyperparameters optimized with Optuna (100 trials, 5-fold CV).
Accuracy: 97.71%
Precision: 99.06%
Recall: 96.33%
F1-Score: 97.68%
ROC-AUC: 0.9958
CV F1 (5-fold): 97.36%
Confusion Matrix
Actual \ Predicted | Pred Legit | Pred Phish
Actual Legit | 10,700 | 104
Actual Phish | 408 | 10,395
Training Data
Dataset: url_features_108k.csv
Samples: 108,034
Train / Test: 86,427 / 21,607
Features: 121 URL statistical
Class Balance: 50/50
Top 20 Features by Importance
1. domain_dots: 0.0522
2. domain_length: 0.0488
3. num_domain_parts: 0.0426
4. num_subdomains: 0.0408
5. digit_ratio_url: 0.0333
6. num_dots: 0.0325
7. domain_length_category: 0.0321
8. domain_entropy: 0.0301
9. avg_domain_part_len: 0.0293
10. symbol_ratio_domain: 0.0293
11. path_length: 0.0289
12. url_entropy: 0.0273
13. is_shortened: 0.0253
14. num_digits_url: 0.0239
15. max_consecutive_digits: 0.0236
16. path_entropy: 0.0222
17. max_path_segment_len: 0.0218
18. special_char_ratio: 0.0212
19. num_letters_url: 0.0201
20. url_complexity_score: 0.0198
Hyperparameters
n_estimators: 610
max_depth: 43
min_samples_split: 2
min_samples_leaf: 1
max_features: sqrt
class_weight: balanced
XGBoost
Gradient Boosting
Gradient boosted decision tree model trained on 121 URL-based statistical features. Hyperparameters optimized with Optuna (100 trials, 5-fold CV).
Accuracy: 98.07%
Precision: 99.12%
Recall: 97.00%
F1-Score: 98.05%
ROC-AUC: 0.9963
CV F1 (5-fold): 97.90%
Confusion Matrix
Actual \ Predicted | Pred Legit | Pred Phish
Actual Legit | 10,698 | 106
Actual Phish | 359 | 10,444
Training Data
Dataset: url_features_108k.csv
Samples: 108,034
Train / Test: 86,427 / 21,607
Features: 121 URL statistical
Class Balance: 50/50
Top 20 Features by Importance
1. domain_dots: 0.3048
2. is_shortened: 0.1855
3. num_subdomains: 0.0850
4. is_free_platform: 0.0363
5. multiple_brands_in_url: 0.0341
6. num_domain_parts: 0.0302
7. encoding_diff: 0.0190
8. platform_subdomain_length: 0.0139
9. is_http: 0.0138
10. domain_hyphens: 0.0131
11. path_digits: 0.0127
12. path_slashes: 0.0116
13. avg_domain_part_len: 0.0113
14. domain_length: 0.0111
15. tld_length: 0.0093
16. path_depth: 0.0086
17. symbol_ratio_domain: 0.0074
18. encoding_count: 0.0056
19. brand_in_path: 0.0052
20. num_hyphens: 0.0050
Hyperparameters
n_estimators: 626
max_depth: 10
learning_rate: 0.074
subsample: 0.963
colsample_bytree: 0.670
min_child_weight: 1
gamma: 0.043
reg_alpha: 0.056
reg_lambda: 0.171
2 models trained on 100 HTML-based features extracted from the page structure, forms, scripts, links, and content analysis of downloaded web pages.
Random Forest HTML
Ensemble
Random Forest classifier trained on 100 HTML content features (77 raw + 23 engineered). Hyperparameters optimized with Optuna (100 trials, 5-fold CV).
Accuracy: 89.77%
Precision: 91.96%
Recall: 87.16%
F1-Score: 89.49%
ROC-AUC: 0.9632
CV F1 (5-fold): 89.18%
Confusion Matrix
Actual \ Predicted | Pred Legit | Pred Phish
Actual Legit | 15,012 | 1,271
Actual Phish | 2,099 | 14,184
Training Data
Dataset: html_features_162k.csv
Samples: 162,826
Train / Test: 130,260 / 32,566
Features: 77 raw + 23 engineered = 100
Class Balance: 50/50
Hyperparameters
n_estimators: 512
max_depth: 43
min_samples_split: 3
min_samples_leaf: 2
max_features: sqrt
class_weight: none
Top 15 Features by Importance
1. num_links: 0.0515
2. num_tags: 0.0473
3. num_words: 0.0398
4. text_length: 0.0375
5. html_length: 0.0345
6. external_scripts_links: 0.0341
7. num_divs: 0.0309
8. num_unique_external_domains: 0.0301
9. num_internal_links: 0.0274
10. text_to_html_ratio: 0.0267
11. links_per_word: 0.0251
12. content_density: 0.0230
13. num_external_links: 0.0224
14. num_meta_tags: 0.0224
15. num_scripts: 0.0195
XGBoost HTML
Gradient Boosting
Gradient boosted decision tree model trained on 100 HTML content features (77 raw + 23 engineered). Hyperparameters optimized with Optuna (100 trials, 5-fold CV).
Accuracy: 89.75%
Precision: 90.98%
Recall: 88.25%
F1-Score: 89.60%
ROC-AUC: 0.9631
CV F1 (5-fold): 89.38%
Confusion Matrix
Actual \ Predicted | Pred Legit | Pred Phish
Actual Legit | 14,747 | 1,536
Actual Phish | 2,025 | 14,258
Training Data
Dataset: html_features_162k.csv
Samples: 162,826
Train / Test: 130,260 / 32,566
Features: 77 raw + 23 engineered = 100
Class Balance: 50/50
Hyperparameters
n_estimators: 520
max_depth: 10
learning_rate: 0.052
subsample: 0.731
colsample_bytree: 0.977
min_child_weight: 1
gamma: 0.851
reg_alpha: 0.641
reg_lambda: 0.038
Top 15 Features by Importance
1. num_links: 0.0659
2. has_suspicious_elements: 0.0450
3. has_email_address: 0.0428
4. has_atob: 0.0294
5. phishing_risk_score: 0.0220
6. has_description: 0.0193
7. hidden_to_visible_inputs: 0.0171
8. num_scripts: 0.0171
9. num_divs: 0.0169
10. has_viewport: 0.0163
11. num_mailto_links: 0.0150
12. num_internal_links: 0.0147
13. num_hidden_iframes: 0.0147
14. external_scripts_links: 0.0143
15. num_onload_events: 0.0143
2 models trained on 221 combined features (121 URL + 100 HTML) for maximum detection accuracy.
Random Forest Combined
Ensemble
Random Forest classifier on combined URL (121) + HTML (100) features = 221 total features. Hyperparameters optimized with Optuna (100 trials, 5-fold CV).
Accuracy: 98.60%
Precision: 99.16%
Recall: 98.02%
F1-Score: 98.59%
ROC-AUC: 0.9990
CV F1 (5-fold): 98.59%
Confusion Matrix
Actual \ Predicted | Pred Legit | Pred Phish
Actual Legit | 10,680 | 89
Actual Phish | 213 | 10,556
Training Data
Dataset: combined_features.csv
Samples: 107,690
Train / Test: 86,152 / 21,538
Features: 121 URL + 100 HTML = 221
Class Balance: 50/50
Feature Importance Split
URL: 29.1%
HTML: 70.9%
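A URL/HTML importance split like this can be obtained by summing per-feature importances by name prefix, since the combined features carry a url_ or html_ prefix. The sketch below uses a made-up four-feature importance table for illustration; the real split is computed over all 221 features, and the function name is hypothetical.

```python
def importance_split(importances: dict[str, float]) -> dict[str, float]:
    """Percentage of total importance carried by url_ and html_ features."""
    totals = {"url": 0.0, "html": 0.0}
    for name, value in importances.items():
        prefix = name.split("_", 1)[0]
        if prefix in totals:
            totals[prefix] += value
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no url_/html_ importance to split")
    return {group: 100.0 * value / grand_total for group, value in totals.items()}


# Illustrative values only, not the model's real importances.
toy_importances = {
    "html_num_links": 0.5,
    "url_is_shortened": 0.25,
    "url_domain_dots": 0.125,
    "html_text_length": 0.125,
}
split = importance_split(toy_importances)
print(split)
```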
Top 15 Features by Importance
1. html_num_links: 0.0640
2. html_text_length: 0.0577
3. html_num_tags: 0.0479
4. html_num_internal_links: 0.0463
5. html_num_words: 0.0422
6. html_external_scripts_links: 0.0361
7. html_num_divs: 0.0297
8. html_num_lists: 0.0291
9. html_num_external_links: 0.0276
10. html_has_description: 0.0258
11. html_num_unique_external_domains: 0.0236
12. html_num_images: 0.0231
13. html_num_spans: 0.0226
14. html_num_headings: 0.0220
15. html_dom_depth: 0.0210
Hyperparameters
n_estimators: 533
max_depth: 43
min_samples_split: 2
min_samples_leaf: 1
max_features: sqrt
class_weight: balanced
XGBoost Combined
Gradient Boosting
Best-performing model. Gradient boosted trees on combined URL (121) + HTML (100) = 221 features. Hyperparameters optimized with Optuna (100 trials, 5-fold CV).
Accuracy: 99.01%
Precision: 99.35%
Recall: 98.66%
F1-Score: 99.01%
ROC-AUC: 0.9991
CV F1 (5-fold): 98.90%
Confusion Matrix
Actual \ Predicted | Pred Legit | Pred Phish
Actual Legit | 10,700 | 69
Actual Phish | 144 | 10,625
Training Data
Dataset: combined_features.csv
Samples: 107,690
Train / Test: 86,152 / 21,538
Features: 121 URL + 100 HTML = 221
Class Balance: 50/50
Feature Importance Split
URL: 37.1%
HTML: 62.9%
Top 15 Features by Importance
1. html_num_links: 0.4420
2. url_is_shortened: 0.0427
3. url_platform_subdomain_length: 0.0397
4. url_domain_dots: 0.0315
5. html_has_fromcharcode: 0.0296
6. url_num_domain_parts: 0.0269
7. html_has_meta_refresh: 0.0148
8. url_is_http: 0.0126
9. url_encoding_diff: 0.0124
10. url_path_digits: 0.0116
11. html_text_length: 0.0107
12. url_path_slashes: 0.0105
13. url_multiple_brands_in_url: 0.0103
14. url_brand_in_path: 0.0102
15. url_domain_hyphens: 0.0095
Hyperparameters
n_estimators: 726
max_depth: 6
learning_rate: 0.137
subsample: 0.698
colsample_bytree: 0.967
min_child_weight: 1
gamma: 0.048
reg_alpha: 0.413
reg_lambda: 0.495
2 character-level CNN models that process raw text directly — no hand-crafted features needed. Parallel Conv1D branches capture character n-gram patterns at different scales.
CNN URL (Char-level)
Deep Learning
Character-level CNN that processes raw URL strings without manual feature engineering. Uses parallel convolutional filters to capture character n-gram patterns indicative of phishing URLs.
Accuracy: 98.38%
Precision: 98.88%
Recall: 97.86%
F1-Score: 98.37%
ROC-AUC: 0.9976
Confusion Matrix
Actual \ Predicted | Pred Legit | Pred Phish
Actual Legit | 8,013 | 90
Actual Phish | 173 | 7,930
Training Data
Dataset: clean_dataset.csv
Samples: 80,825
Split: 85/10/5 (train/val/test)
Input: Raw URL chars → integer encoding
Class Balance: 50/50
Architecture
Input: Raw URL characters
max_len: 800
vocab_size: 89 (87 ASCII + PAD + UNK)
Embedding: 89 × 64
Conv1D branches: 3 × 128 filters (k=2, 3, 5)
Pooling: GlobalMaxPool1D per branch
Dense: 384 → 128 (ReLU) → 1 (sigmoid)
Dropout: 0.5
Optimizer: Adam
Loss: binary_crossentropy
Epochs: 20 (EarlyStopping)
Batch size: 256
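The "raw chars → integer encoding" input step can be sketched as follows: each character maps to an integer ID, with reserved IDs for padding (PAD) and out-of-vocabulary characters (UNK), and every sequence is truncated or padded to max_len. The alphabet below is a small hypothetical example rather than the model's 87-character set, and padding at the end of the sequence is an assumption.

```python
import string

PAD, UNK = 0, 1  # reserved IDs; real characters start at 2


def build_vocab(alphabet: str) -> dict[str, int]:
    """Map each distinct character to an integer ID starting at 2."""
    return {ch: idx for idx, ch in enumerate(dict.fromkeys(alphabet), start=2)}


def encode(text: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Truncate to max_len, map unknown characters to UNK, right-pad with PAD."""
    ids = [vocab.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical alphabet: lowercase letters, digits, common URL punctuation.
alphabet = string.ascii_lowercase + string.digits + ":/.-_?=&%@"
vocab = build_vocab(alphabet)
encoded = encode("http://example.com/login", vocab, max_len=800)
print(len(vocab) + 2, len(encoded), encoded[:8])
```

The vocabulary size passed to the embedding layer is the number of characters plus 2, which is how 87 characters yield the 89 listed above.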
CNN HTML (Char-level)
Deep Learning
Character-level CNN that processes raw HTML source code. Uses larger convolutional kernels (3, 5, 7) to capture longer HTML patterns like tag structures, form elements, and script obfuscation patterns.
Accuracy: 96.33%
Precision: 98.18%
Recall: 94.41%
F1-Score: 96.26%
ROC-AUC: 0.9908
Confusion Matrix
Actual \ Predicted | Pred Legit | Pred Phish
Actual Legit | 5,943 | 106
Actual Phish | 338 | 5,711
Training Data
Dataset: html/phishing + html/legitimate
Samples: 80,652
Split: 85/10/5 (train/val/test)
Input: Raw HTML chars → integer encoding
Class Balance: 50/50 (auto-downsampled)
Architecture
Input: Raw HTML source
max_len: 5,000
vocab_size: 95 (93 ASCII + PAD + UNK)
Embedding: 95 × 64
Conv1D branches: 3 × 128 filters (k=3, 5, 7)
Pooling: GlobalMaxPool1D per branch
Dense: 384 → 128 (ReLU) → 1 (sigmoid)
Dropout: 0.5
Optimizer: Adam
Loss: binary_crossentropy
Epochs: 20 (EarlyStopping)
Batch size: 128
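The layer sizes in the two architecture listings fit together as follows: each Conv1D branch yields 128 values after global max pooling, and concatenating 3 branches gives the 384-wide input to the dense layer. The sketch below counts trainable parameters under the assumption that the networks contain exactly the listed layers, with bias terms and nothing else (no batch normalization, for instance). The function name is hypothetical and the totals are derived estimates, not reported figures.

```python
def char_cnn_param_count(vocab_size: int, embed_dim: int,
                         kernel_sizes: tuple[int, ...],
                         filters: int, dense_units: int) -> dict[str, int]:
    """Parameter count for embedding -> parallel Conv1D -> dense -> sigmoid."""
    embedding = vocab_size * embed_dim
    # Each Conv1D branch: kernel weights (k * in_channels * filters) plus biases.
    convs = sum(k * embed_dim * filters + filters for k in kernel_sizes)
    concat_width = filters * len(kernel_sizes)  # after GlobalMaxPool1D + concat
    dense = concat_width * dense_units + dense_units
    output = dense_units + 1
    return {
        "concat_width": concat_width,
        "embedding": embedding,
        "convs": convs,
        "dense": dense,
        "output": output,
        "total": embedding + convs + dense + output,
    }


cnn_url = char_cnn_param_count(89, 64, (2, 3, 5), 128, 128)
cnn_html = char_cnn_param_count(95, 64, (3, 5, 7), 128, 128)
print(cnn_url["concat_width"], cnn_url["total"], cnn_html["total"])
```

Because the pooling is global, the parameter count does not depend on max_len; the longer HTML input (5,000 vs. 800 characters) increases compute and memory per sample, not model size.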
Side-by-side comparison of all 9 models across URL, HTML, Combined, and CNN categories.
All Models
Model | Category | Accuracy | Precision | Recall | F1-Score | ROC-AUC | Features
Logistic Regression | URL | 93.71% | 95.40% | 91.84% | 93.59% | 0.9789 | 121
Random Forest | URL | 97.71% | 99.06% | 96.33% | 97.68% | 0.9958 | 121
XGBoost | URL | 98.07% | 99.12% | 97.00% | 98.05% | 0.9963 | 121
Random Forest HTML | HTML | 89.77% | 91.96% | 87.16% | 89.49% | 0.9632 | 100
XGBoost HTML | HTML | 89.75% | 90.98% | 88.25% | 89.60% | 0.9631 | 100
RF Combined | Combined | 98.60% | 99.16% | 98.02% | 98.59% | 0.9990 | 221
XGBoost Combined | Combined | 99.01% | 99.35% | 98.66% | 99.01% | 0.9991 | 221
CNN URL | CNN | 98.38% | 98.88% | 97.86% | 98.37% | 0.9976 | chars
CNN HTML | CNN | 96.33% | 98.18% | 94.41% | 96.26% | 0.9908 | chars
Key Insights
Best Overall
XGBoost Combined
99.01% accuracy, 99.35% precision — best performance by combining 121 URL + 100 HTML features.
Ensemble Strength
9-Model Consensus
Combining 3 URL + 2 HTML + 2 Combined + 2 CNN models via majority vote maximizes reliability.
Top Signal
html_num_links
Number of links in HTML dominates XGBoost Combined at 44.2% importance — the single strongest feature.
Overfit Warning
HTML Models
RF HTML and XGBoost HTML show a 7-8% gap between training and test performance, indicating moderate overfitting. The Combined models mitigate this by adding URL features.