Model Performance Details

Model Details

Performance metrics, feature importance, and configuration details for all 9 machine learning models used in the phishing detection pipeline. Models span URL features (121), HTML features (100), combined features (221), and character-level CNN approaches.

Detection Pipeline

1. URL Input
2. Feature Extraction
3. URL Models (3)
4. HTML Download
5. HTML Models (2) + Combined Models (2)
6. CNN Models (2)
7. 9-Model Consensus

Training Datasets

Overview of datasets used to train each model category. All datasets are balanced (50/50 phishing/legitimate).

Dataset	Samples	Features	Used By
URL Features	108,034	121 statistical	LR, RF URL, XGBoost URL
HTML Features	162,826	77 raw + 23 eng. = 100	RF HTML, XGBoost HTML
Combined Features	107,690	121 URL + 100 HTML = 221	RF Combined, XGBoost Combined
Clean URLs (CNN)	80,825	char-level encoding	CNN URL
Raw HTML Files (CNN)	80,652	char-level encoding	CNN HTML

Tree-ensemble models use 80/20 stratified train/test split with Optuna hyperparameter tuning (100 trials, 5-fold CV). CNN models use 85/10/5 train/val/test split with EarlyStopping.
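The train/test sizes reported on the model cards are consistent with an 80/20 split in which the test side is rounded up. A minimal sketch under that assumption (the helper name `split_sizes` is illustrative and not part of the pipeline):

```python
def split_sizes(n_samples: int, test_denominator: int = 5) -> tuple[int, int]:
    """Return (train, test) sizes for a 1/test_denominator test share.

    Assumption: the test side is rounded up and the train side gets the rest.
    """
    n_test = -(-n_samples // test_denominator)  # integer ceiling division
    return n_samples - n_test, n_test


for name, n in [("URL", 108_034), ("HTML", 162_826), ("Combined", 107_690)]:
    train, test = split_sizes(n)
    print(f"{name}: {train:,} / {test:,}")
```

Integer division is used instead of `0.2 * n` so that floating-point rounding cannot shift a size by one sample.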

URL Features (121 features)

All features are extracted from the URL string.

Length & Structure

url_length

domain_length

path_length

query_length

url_length_category

domain_length_category

Character Counts

num_dots

num_hyphens

num_underscores

num_slashes

num_question_marks

num_ampersands

num_equals

num_at

num_percent

num_digits_url

num_letters_url

domain_dots

domain_hyphens

domain_digits

path_slashes

path_dots

path_digits

Character Ratios

digit_ratio_url

letter_ratio_url

special_char_ratio

digit_ratio_domain

symbol_ratio_domain

Domain Structure

num_subdomains

num_domain_parts

tld_length

sld_length

longest_domain_part

avg_domain_part_len

longest_part_gt_20

longest_part_gt_30

longest_part_gt_40

has_suspicious_tld

has_trusted_tld

has_port

has_non_std_port

domain_randomness_score

sld_consonant_cluster_score

sld_keyboard_pattern

sld_has_dictionary_word

sld_pronounceability_score

domain_digit_position_suspicious

Path Analysis

path_depth

max_path_segment_len

avg_path_segment_len

has_extension

extension_category

has_suspicious_extension

has_exe

has_double_slash

path_has_brand_not_domain

path_has_ip_pattern

suspicious_path_extension_combo

Query String

num_params

has_query

query_value_length

max_param_len

query_has_url

Statistical & Entropy

url_entropy

domain_entropy

path_entropy

max_consecutive_digits

max_consecutive_chars

max_consecutive_consonants

char_repeat_rate

unique_bigram_ratio

unique_trigram_ratio

sld_letter_diversity

domain_has_numbers_letters

url_complexity_score

Security Indicators

has_ip_address

has_at_symbol

has_redirect

is_shortened

is_free_hosting

is_free_platform

platform_subdomain_length

has_uuid_subdomain

is_http

Keywords & Brand Detection

num_phishing_keywords

phishing_in_domain

phishing_in_path

num_brands

brand_in_domain

brand_in_path

brand_impersonation

has_login

has_account

has_verify

has_secure

has_update

has_bank

has_password

has_suspend

has_webscr

has_cmd

has_cgi

brand_in_subdomain_not_domain

multiple_brands_in_url

brand_with_hyphen

suspicious_brand_tld

brand_keyword_combo

Encoding & Obfuscation

has_url_encoding

encoding_count

encoding_diff

has_punycode

has_unicode

has_hex_string

has_base64

has_lookalike_chars

mixed_script_score

homograph_brand_risk

suspected_idn_homograph

double_encoding

encoding_in_domain

suspicious_unicode_category
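This page lists the feature names but not their definitions. The sketch below computes a handful of them under plausible, assumed definitions (character counts, a digit ratio, and Shannon entropy in bits per character); the function names are illustrative, and the pipeline's actual formulas may differ.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (0.0 for an empty string)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_basic_url_features(url: str) -> dict:
    """Compute a few URL features; the definitions are assumptions."""
    parts = urlsplit(url)
    domain = parts.hostname or ""
    digits = sum(ch.isdigit() for ch in url)
    return {
        "url_length": len(url),
        "domain_length": len(domain),
        "num_dots": url.count("."),
        "num_digits_url": digits,
        "digit_ratio_url": digits / len(url) if url else 0.0,
        "has_at_symbol": int("@" in url),
        "is_http": int(parts.scheme == "http"),
        "url_entropy": shannon_entropy(url),
    }


features = extract_basic_url_features("http://secure-login.example.com/verify?id=123")
print(features["url_length"], features["num_dots"], features["is_http"])
```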

HTML Features (100 features: 77 raw + 23 engineered)

All features are extracted from the HTML source and DOM structure.

Document Size & Text

dom_depth

html_length

text_length

num_words

text_to_html_ratio

inline_css_length

num_tags

Metadata & Page Identity

has_title

has_description

has_keywords

has_author

has_copyright

has_viewport

has_favicon

num_meta_tags

DOM Elements & Layout

num_divs

num_spans

num_paragraphs

num_headings

num_lists

num_tables

num_images

num_iframes

num_hidden_iframes

num_data_uri_images

num_css_files

num_scripts

num_inline_scripts

num_inline_styles

num_input_fields

Link & Resource Analysis

num_links

num_internal_links

num_external_links

ratio_external_links

num_unique_external_domains

num_mailto_links

num_javascript_links

num_ip_based_links

num_suspicious_tld_links

num_empty_links

num_anchor_text_mismatch

num_external_css

num_external_images

num_external_scripts

Forms & Inputs

has_form

has_login_form

num_forms

num_email_fields

num_password_fields

num_text_fields

num_submit_buttons

num_hidden_fields

num_forms_without_labels

num_empty_form_actions

num_external_form_actions

password_with_external_action

Scripts & Dynamic Behavior

has_eval

has_escape

has_unescape

has_atob

has_base64

has_fromcharcode

has_document_write

has_window_open

has_location_replace

has_meta_refresh

num_onclick_events

num_onload_events

num_onerror_events

Visibility & Interaction Tricks

has_display_none

has_visibility_hidden

has_right_click_disabled

has_status_bar_customization

Contact & Social Engineering Signals

has_email_address

has_phone_number

num_brand_mentions

num_urgency_keywords

Engineered Features (23 computed)

Ratios, interactions, density metrics and risk scores computed from raw features via the engineer_features() pipeline.

Ratios

empty_to_total_links

external_to_total_links

forms_to_inputs_ratio

hidden_to_visible_inputs

iframes_to_tags_ratio

images_to_tags_ratio

password_to_inputs_ratio

scripts_to_tags_ratio

Interaction Features

brand_with_forms

external_scripts_links

forms_with_passwords

hidden_with_external

iframes_with_scripts

urgency_with_forms

Density Metrics

content_density

form_density

links_per_word

scripts_per_form

Risk Scores

form_risk_score

legitimacy_score

obfuscation_score

phishing_risk_score

has_suspicious_elements
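The internals of `engineer_features()` are not shown on this page. As a rough illustration of how ratio, density and interaction features of this kind are usually derived from raw counts, here is a sketch with assumed formulas; `safe_ratio` and `engineer_ratio_features` are hypothetical helpers, and only the feature names come from the lists above.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def engineer_ratio_features(raw: dict) -> dict:
    """Derive a few engineered features; every formula here is an assumption."""
    return {
        "external_to_total_links": safe_ratio(raw["num_external_links"], raw["num_links"]),
        "empty_to_total_links": safe_ratio(raw["num_empty_links"], raw["num_links"]),
        "scripts_to_tags_ratio": safe_ratio(raw["num_scripts"], raw["num_tags"]),
        "links_per_word": safe_ratio(raw["num_links"], raw["num_words"]),
        # Interaction: 1 when the page has both a form and a password field.
        "forms_with_passwords": int(raw["num_forms"] > 0 and raw["num_password_fields"] > 0),
    }


# Hypothetical raw counts for one page.
page = {
    "num_links": 40, "num_external_links": 30, "num_empty_links": 4,
    "num_scripts": 10, "num_tags": 200, "num_words": 0,
    "num_forms": 1, "num_password_fields": 1,
}
engineered = engineer_ratio_features(page)
print(engineered)
```

The zero-denominator guard matters for sparse phishing pages, which often have no visible words or no links at all.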

3 models trained on 121 URL-based features extracted from the URL string structure, domain properties, encoding analysis, and brand impersonation detection.

Logistic Regression

Baseline

Baseline model using logistic regression on URL features with StandardScaler (z-score normalization). Serves as a benchmark for comparing more complex models.
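Z-score normalization rescales each feature column to zero mean and unit variance, which keeps large-valued features such as `url_length` from dominating the logistic regression coefficients. A minimal standard-library sketch of the per-column transform (the sample values are hypothetical; in practice the mean and standard deviation would be fitted on the training split only and reused on the test split):

```python
from statistics import fmean, pstdev


def standardize(column: list[float]) -> list[float]:
    """Z-score a column: subtract the mean, divide by the population std."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:  # constant column: every value maps to 0.0
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


url_lengths = [20.0, 40.0, 60.0]  # hypothetical url_length values
scaled = standardize(url_lengths)
print(scaled)
```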

Accuracy: 93.71%
Precision: 95.40%
Recall: 91.84%
F1-Score: 93.59%
ROC-AUC: 0.9789

Confusion Matrix

	Pred Legit	Pred Phish
Actual Legit	10,326	478
Actual Phish	881	9,922

Training Data

Dataset: url_features_108k.csv
Samples: 108,034
Train / Test: 86,427 / 21,607
Features: 121 URL statistical
Preprocessing: StandardScaler
Class Balance: 50/50
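The headline metrics on each card can be recomputed from the four confusion-matrix cells. The sketch below does this for the Logistic Regression matrix, treating phishing as the positive class; the helper name `binary_metrics` is illustrative.

```python
def binary_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy, precision, recall and F1 with phishing as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Logistic Regression: rows are actual classes, columns are predictions.
lr = binary_metrics(tn=10_326, fp=478, fn=881, tp=9_922)
for name, value in lr.items():
    print(f"{name}: {value:.2%}")
```

Running it reproduces the reported 93.71% accuracy, 95.40% precision, 91.84% recall and 93.59% F1.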

Random Forest

Ensemble

Random Forest classifier trained on 121 URL-based statistical features. Hyperparameters optimized with Optuna (100 trials, 5-fold CV).

Accuracy: 97.71%
Precision: 99.06%
Recall: 96.33%
F1-Score: 97.68%
ROC-AUC: 0.9958
CV F1 (5-fold): 97.36%

Confusion Matrix

	Pred Legit	Pred Phish
Actual Legit	10,700	104
Actual Phish	408	10,395

Training Data

Dataset: url_features_108k.csv
Samples: 108,034
Train / Test: 86,427 / 21,607
Features: 121 URL statistical
Class Balance: 50/50

Top 20 Features by Importance

Rank	Feature	Importance
1	domain_dots	0.0522
2	domain_length	0.0488
3	num_domain_parts	0.0426
4	num_subdomains	0.0408
5	digit_ratio_url	0.0333
6	num_dots	0.0325
7	domain_length_category	0.0321
8	domain_entropy	0.0301
9	avg_domain_part_len	0.0293
10	symbol_ratio_domain	0.0293
11	path_length	0.0289
12	url_entropy	0.0273
13	is_shortened	0.0253
14	num_digits_url	0.0239
15	max_consecutive_digits	0.0236
16	path_entropy	0.0222
17	max_path_segment_len	0.0218
18	special_char_ratio	0.0212
19	num_letters_url	0.0201
20	url_complexity_score	0.0198

Hyperparameters

n_estimators: 610
max_depth: 43
min_samples_split: 2
min_samples_leaf: 1
max_features: sqrt
class_weight: balanced

XGBoost

Gradient Boosting

Gradient boosted decision tree model trained on 121 URL-based statistical features. Hyperparameters optimized with Optuna (100 trials, 5-fold CV).

Accuracy: 98.07%
Precision: 99.12%
Recall: 97.00%
F1-Score: 98.05%
ROC-AUC: 0.9963
CV F1 (5-fold): 97.90%

Confusion Matrix

	Pred Legit	Pred Phish
Actual Legit	10,698	106
Actual Phish	359	10,444

Training Data

Dataset: url_features_108k.csv
Samples: 108,034
Train / Test: 86,427 / 21,607
Features: 121 URL statistical
Class Balance: 50/50

Top 20 Features by Importance

Rank	Feature	Importance
1	domain_dots	0.3048
2	is_shortened	0.1855
3	num_subdomains	0.0850
4	is_free_platform	0.0363
5	multiple_brands_in_url	0.0341
6	num_domain_parts	0.0302
7	encoding_diff	0.0190
8	platform_subdomain_length	0.0139
9	is_http	0.0138
10	domain_hyphens	0.0131
11	path_digits	0.0127
12	path_slashes	0.0116
13	avg_domain_part_len	0.0113
14	domain_length	0.0111
15	tld_length	0.0093
16	path_depth	0.0086
17	symbol_ratio_domain	0.0074
18	encoding_count	0.0056
19	brand_in_path	0.0052
20	num_hyphens	0.0050

Hyperparameters

n_estimators: 626
max_depth: 10
learning_rate: 0.074
subsample: 0.963
colsample_bytree: 0.670
min_child_weight: 1
gamma: 0.043
reg_alpha: 0.056
reg_lambda: 0.171

2 models trained on 100 HTML-based features extracted from the page structure, forms, scripts, links, and content analysis of downloaded web pages.

Random Forest HTML

Ensemble

Random Forest classifier trained on 100 HTML content features (77 raw + 23 engineered). Hyperparameters optimized with Optuna (100 trials, 5-fold CV).

Accuracy: 89.77%
Precision: 91.96%
Recall: 87.16%
F1-Score: 89.49%
ROC-AUC: 0.9632
CV F1 (5-fold): 89.18%

Confusion Matrix

	Pred Legit	Pred Phish
Actual Legit	15,012	1,271
Actual Phish	2,099	14,184

Training Data

Dataset: html_features_162k.csv
Samples: 162,826
Train / Test: 130,260 / 32,566
Features: 77 raw + 23 engineered = 100
Class Balance: 50/50

Hyperparameters

n_estimators: 512
max_depth: 43
min_samples_split: 3
min_samples_leaf: 2
max_features: sqrt
class_weight: none

Top 15 Features by Importance

Rank	Feature	Importance
1	num_links	0.0515
2	num_tags	0.0473
3	num_words	0.0398
4	text_length	0.0375
5	html_length	0.0345
6	external_scripts_links	0.0341
7	num_divs	0.0309
8	num_unique_external_domains	0.0301
9	num_internal_links	0.0274
10	text_to_html_ratio	0.0267
11	links_per_word	0.0251
12	content_density	0.0230
13	num_external_links	0.0224
14	num_meta_tags	0.0224
15	num_scripts	0.0195

XGBoost HTML

Gradient Boosting

Gradient boosted decision tree model trained on 100 HTML content features (77 raw + 23 engineered). Hyperparameters optimized with Optuna (100 trials, 5-fold CV).

Accuracy: 89.75%
Precision: 90.98%
Recall: 88.25%
F1-Score: 89.60%
ROC-AUC: 0.9631
CV F1 (5-fold): 89.38%

Confusion Matrix

	Pred Legit	Pred Phish
Actual Legit	14,747	1,536
Actual Phish	2,025	14,258

Training Data

Dataset: html_features_162k.csv
Samples: 162,826
Train / Test: 130,260 / 32,566
Features: 77 raw + 23 engineered = 100
Class Balance: 50/50

Hyperparameters

n_estimators: 520
max_depth: 10
learning_rate: 0.052
subsample: 0.731
colsample_bytree: 0.977
min_child_weight: 1
gamma: 0.851
reg_alpha: 0.641
reg_lambda: 0.038

Top 15 Features by Importance

Rank	Feature	Importance
1	num_links	0.0659
2	has_suspicious_elements	0.0450
3	has_email_address	0.0428
4	has_atob	0.0294
5	phishing_risk_score	0.0220
6	has_description	0.0193
7	hidden_to_visible_inputs	0.0171
8	num_scripts	0.0171
9	num_divs	0.0169
10	has_viewport	0.0163
11	num_mailto_links	0.0150
12	num_internal_links	0.0147
13	num_hidden_iframes	0.0147
14	external_scripts_links	0.0143
15	num_onload_events	0.0143

2 models trained on 221 combined features (121 URL + 100 HTML) for maximum detection accuracy.

Random Forest Combined

Ensemble

Random Forest classifier on combined URL (121) + HTML (100) features = 221 total features. Hyperparameters optimized with Optuna (100 trials, 5-fold CV).

Accuracy: 98.60%
Precision: 99.16%
Recall: 98.02%
F1-Score: 98.59%
ROC-AUC: 0.9990
CV F1 (5-fold): 98.59%

Confusion Matrix

	Pred Legit	Pred Phish
Actual Legit	10,680	89
Actual Phish	213	10,556

Training Data

Dataset: combined_features.csv
Samples: 107,690
Train / Test: 86,152 / 21,538
Features: 121 URL + 100 HTML = 221
Class Balance: 50/50

Feature Importance Split

URL: 29.1%
HTML: 70.9%

Top 15 Features by Importance

Rank	Feature	Importance
1	html_num_links	0.0640
2	html_text_length	0.0577
3	html_num_tags	0.0479
4	html_num_internal_links	0.0463
5	html_num_words	0.0422
6	html_external_scripts_links	0.0361
7	html_num_divs	0.0297
8	html_num_lists	0.0291
9	html_num_external_links	0.0276
10	html_has_description	0.0258
11	html_num_unique_external_domains	0.0236
12	html_num_images	0.0231
13	html_num_spans	0.0226
14	html_num_headings	0.0220
15	html_dom_depth	0.0210

Hyperparameters

n_estimators: 533
max_depth: 43
min_samples_split: 2
min_samples_leaf: 1
max_features: sqrt
class_weight: balanced

XGBoost Combined

Gradient Boosting

Best-performing model. Gradient boosted trees on combined URL (121) + HTML (100) = 221 features. Hyperparameters optimized with Optuna (100 trials, 5-fold CV).

Accuracy: 99.01%
Precision: 99.35%
Recall: 98.66%
F1-Score: 99.01%
ROC-AUC: 0.9991
CV F1 (5-fold): 98.90%

Confusion Matrix

	Pred Legit	Pred Phish
Actual Legit	10,700	69
Actual Phish	144	10,625

Training Data

Dataset: combined_features.csv
Samples: 107,690
Train / Test: 86,152 / 21,538
Features: 121 URL + 100 HTML = 221
Class Balance: 50/50

Feature Importance Split

URL: 37.1%
HTML: 62.9%

Top 15 Features by Importance

Rank	Feature	Importance
1	html_num_links	0.4420
2	url_is_shortened	0.0427
3	url_platform_subdomain_length	0.0397
4	url_domain_dots	0.0315
5	html_has_fromcharcode	0.0296
6	url_num_domain_parts	0.0269
7	html_has_meta_refresh	0.0148
8	url_is_http	0.0126
9	url_encoding_diff	0.0124
10	url_path_digits	0.0116
11	html_text_length	0.0107
12	url_path_slashes	0.0105
13	url_multiple_brands_in_url	0.0103
14	url_brand_in_path	0.0102
15	url_domain_hyphens	0.0095
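A URL/HTML importance split like the one on this card can be obtained by summing importances per feature-name prefix. The sketch below applies that idea to the 15 listed features only; the function name `importance_by_source` is illustrative, and grouping by the `url_` / `html_` prefix is an assumption about how the split was computed.

```python
from collections import defaultdict

# Top 15 XGBoost Combined importances as listed on this card.
top15 = {
    "html_num_links": 0.4420, "url_is_shortened": 0.0427,
    "url_platform_subdomain_length": 0.0397, "url_domain_dots": 0.0315,
    "html_has_fromcharcode": 0.0296, "url_num_domain_parts": 0.0269,
    "html_has_meta_refresh": 0.0148, "url_is_http": 0.0126,
    "url_encoding_diff": 0.0124, "url_path_digits": 0.0116,
    "html_text_length": 0.0107, "url_path_slashes": 0.0105,
    "url_multiple_brands_in_url": 0.0103, "url_brand_in_path": 0.0102,
    "url_domain_hyphens": 0.0095,
}


def importance_by_source(importances: dict) -> dict:
    """Sum importances per source, taken from the prefix before the first '_'."""
    totals = defaultdict(float)
    for feature, weight in importances.items():
        source = feature.split("_", 1)[0]  # "url" or "html"
        totals[source] += weight
    return dict(totals)


totals = importance_by_source(top15)
print({source: round(value, 4) for source, value in totals.items()})
```

Over the top 15 alone this gives roughly 0.497 for HTML and 0.218 for URL. The 62.9% / 37.1% split on the card covers all 221 features, so the two sets of numbers are not expected to match.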

Hyperparameters

n_estimators: 726
max_depth: 6
learning_rate: 0.137
subsample: 0.698
colsample_bytree: 0.967
min_child_weight: 1
gamma: 0.048
reg_alpha: 0.413
reg_lambda: 0.495

2 character-level CNN models that process raw text directly — no hand-crafted features needed. Parallel Conv1D branches capture character n-gram patterns at different scales.

CNN URL (Char-level)

Deep Learning

Character-level CNN that processes raw URL strings without manual feature engineering. Uses parallel convolutional filters to capture character n-gram patterns indicative of phishing URLs.

Accuracy: 98.38%
Precision: 98.88%
Recall: 97.86%
F1-Score: 98.37%
ROC-AUC: 0.9976

Confusion Matrix

	Pred Legit	Pred Phish
Actual Legit	8,013	90
Actual Phish	173	7,930

Training Data

Dataset: clean_dataset.csv
Samples: 80,825
Split: 85/10/5 (train/val/test)
Input: Raw URL chars → integer encoding
Class Balance: 50/50

Architecture

Input: Raw URL characters
max_len: 800
vocab_size: 89 (87 ASCII + PAD + UNK)
Embedding: 89 × 64
Conv1D branches: 3 × 128 filters (k=2, 3, 5)
Pooling: GlobalMaxPool1D per branch
Dense: 384 → 128 (ReLU) → 1 (sigmoid)
Dropout: 0.5
Optimizer: Adam
Loss: binary_crossentropy
Epochs: 20 (EarlyStopping)
Batch size: 256
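The "raw chars → integer encoding" step can be pictured as a lookup table with two reserved ids, followed by truncation and padding to `max_len`. This is a sketch only: the page does not list its 87-character alphabet, so the alphabet below is a hypothetical stand-in with a different vocabulary size, and padding on the right is also an assumption.

```python
import string

PAD, UNK = 0, 1
# Hypothetical alphabet; the model's actual 87 characters are not listed.
ALPHABET = string.ascii_letters + string.digits + string.punctuation
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode(text: str, max_len: int = 800) -> list[int]:
    """Map characters to integer ids, truncate to max_len, right-pad with PAD."""
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


encoded = encode("http://example.com/login")
print(len(encoded), encoded[:8])
```

Every URL becomes a fixed-length sequence of 800 ids, which is the shape the embedding layer expects; characters outside the alphabet all share the UNK id.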

CNN HTML (Char-level)

Deep Learning

Character-level CNN that processes raw HTML source code. Uses larger convolutional kernels (3, 5, 7) to capture longer HTML patterns like tag structures, form elements, and script obfuscation patterns.

Accuracy: 96.33%
Precision: 98.18%
Recall: 94.41%
F1-Score: 96.26%
ROC-AUC: 0.9908

Confusion Matrix

	Pred Legit	Pred Phish
Actual Legit	5,943	106
Actual Phish	338	5,711

Training Data

Dataset: html/phishing + html/legitimate
Samples: 80,652
Split: 85/10/5 (train/val/test)
Input: Raw HTML chars → integer encoding
Class Balance: 50/50 (auto-downsampled)

Architecture

Input: Raw HTML source
max_len: 5,000
vocab_size: 95 (93 ASCII + PAD + UNK)
Embedding: 95 × 64
Conv1D branches: 3 × 128 filters (k=3, 5, 7)
Pooling: GlobalMaxPool1D per branch
Dense: 384 → 128 (ReLU) → 1 (sigmoid)
Dropout: 0.5
Optimizer: Adam
Loss: binary_crossentropy
Epochs: 20 (EarlyStopping)
Batch size: 128
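The two architecture tables are enough to estimate model size. The arithmetic below assumes standard layers with bias terms and no other trainable layers (for example, no batch normalization), which the page does not confirm; `char_cnn_param_count` is an illustrative helper. It also shows where the 384 in the Dense row comes from: 3 branches × 128 filters, concatenated after global max pooling.

```python
def char_cnn_param_count(vocab_size, embed_dim, kernel_sizes, filters, dense_units):
    """Trainable parameters of an embedding + parallel Conv1D + dense head."""
    embedding = vocab_size * embed_dim
    # Each Conv1D branch: (kernel_size * input_channels + 1 bias) per filter.
    conv = sum((k * embed_dim + 1) * filters for k in kernel_sizes)
    concat_width = filters * len(kernel_sizes)  # 3 * 128 = 384
    dense = (concat_width + 1) * dense_units
    output = dense_units + 1  # single sigmoid unit
    return embedding + conv + dense + output


url_params = char_cnn_param_count(89, 64, (2, 3, 5), 128, 128)
html_params = char_cnn_param_count(95, 64, (3, 5, 7), 128, 128)
print(f"CNN URL: {url_params:,}  CNN HTML: {html_params:,}")
```

Under these assumptions the URL model has 137,409 parameters and the HTML model 178,753. The difference comes from the slightly larger vocabulary and the wider kernels; `max_len` does not affect the count because pooling is global.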

Side-by-side comparison of all 9 models across URL, HTML, Combined, and CNN categories.

All Models

Model	Category	Accuracy	Precision	Recall	F1-Score	ROC-AUC	Features
Logistic Regression	URL	93.71%	95.40%	91.84%	93.59%	0.9789	121
Random Forest	URL	97.71%	99.06%	96.33%	97.68%	0.9958	121
XGBoost	URL	98.07%	99.12%	97.00%	98.05%	0.9963	121
Random Forest HTML	HTML	89.77%	91.96%	87.16%	89.49%	0.9632	100
XGBoost HTML	HTML	89.75%	90.98%	88.25%	89.60%	0.9631	100
RF Combined	Combined	98.60%	99.16%	98.02%	98.59%	0.9990	221
XGBoost Combined	Combined	99.01%	99.35%	98.66%	99.01%	0.9991	221
CNN URL	CNN	98.38%	98.88%	97.86%	98.37%	0.9976	chars
CNN HTML	CNN	96.33%	98.18%	94.41%	96.26%	0.9908	chars
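To read the table by category rather than row by row, the F1 column can be sorted and the best model per category picked out. A small sketch using the values from the table above:

```python
# (model, category, F1-score in percent), copied from the comparison table.
rows = [
    ("Logistic Regression", "URL", 93.59),
    ("Random Forest", "URL", 97.68),
    ("XGBoost", "URL", 98.05),
    ("Random Forest HTML", "HTML", 89.49),
    ("XGBoost HTML", "HTML", 89.60),
    ("RF Combined", "Combined", 98.59),
    ("XGBoost Combined", "Combined", 99.01),
    ("CNN URL", "CNN", 98.37),
    ("CNN HTML", "CNN", 96.26),
]

ranked = sorted(rows, key=lambda row: row[2], reverse=True)

# First model seen per category in the ranked list is that category's best.
best_per_category = {}
for model, category, f1 in ranked:
    best_per_category.setdefault(category, model)

for category, model in best_per_category.items():
    print(f"{category}: {model}")
```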

Key Insights

Best Overall

XGBoost Combined

99.01% accuracy, 99.35% precision — best performance by combining 121 URL + 100 HTML features.

Ensemble Strength

9-Model Consensus

Combining 3 URL + 2 HTML + 2 Combined + 2 CNN models via majority vote means the final verdict does not depend on any single model.
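A majority vote over nine binary predictions can be sketched as follows. The page does not state the decision threshold or any weighting, so this assumes each model's output has already been thresholded to 0 (legitimate) or 1 (phishing) and that all votes count equally; the predictions shown are hypothetical.

```python
def majority_vote(predictions: dict[str, int]) -> tuple[int, float]:
    """Return (label, share of models voting phishing); 1 = phishing."""
    phishing_votes = sum(predictions.values())
    share = phishing_votes / len(predictions)
    return int(phishing_votes * 2 > len(predictions)), share


# Hypothetical per-model predictions for one URL.
votes = {
    "Logistic Regression": 0, "Random Forest": 1, "XGBoost": 1,
    "Random Forest HTML": 0, "XGBoost HTML": 1,
    "RF Combined": 1, "XGBoost Combined": 1,
    "CNN URL": 0, "CNN HTML": 0,
}
label, share = majority_vote(votes)
print(label, f"{share:.0%}")
```

With an odd number of models a tie cannot occur, and the vote share doubles as a rough confidence indicator.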

Top Signal

html_num_links

Number of links in HTML dominates XGBoost Combined at 44.2% importance — the single strongest feature.

Overfit Warning

HTML Models

RF HTML and XGBoost HTML show a 7-8% train-test gap, indicating moderate overfitting. The Combined models mitigate this by adding URL features.