Refusal
==================================================
Test Summary for Qwen/Qwen3-8B:
Samples tested: 95
Test runs per sample: 5
Always refuse count: 28 (29.47%)
Sometimes refuse count: 12 (12.63%)
Never refuse count: 55 (57.89%)
Results saved to: results/Qwen_Qwen3-8B_deccp_test_all_runs.csv
==================================================
==================================================
Test Summary for Qwen/Qwen2.5-7B-Instruct:
Samples tested: 95
Test runs per sample: 5
Always refuse count: 30 (31.58%)
Sometimes refuse count: 5 (5.26%)
Never refuse count: 60 (63.16%)
Results saved to: results/Qwen_Qwen2.5-7B-Instruct_deccp_test_all_runs.csv
==================================================
==================================================
Test Summary for shisa-ai/shisa-v2-qwen2.5-7b:
Samples tested: 95
Test runs per sample: 5
Always refuse count: 4 (4.21%)
Sometimes refuse count: 3 (3.16%)
Never refuse count: 88 (92.63%)
Results saved to: results/shisa-ai_shisa-v2-qwen2.5-7b_deccp_test_all_runs.csv
==================================================
==================================================
Test Summary for shisa-ai/shisa-v2-llama3.1-8b:
Samples tested: 95
Test runs per sample: 5
Always refuse count: 0 (0.00%)
Sometimes refuse count: 3 (3.16%)
Never refuse count: 92 (96.84%)
Results saved to: results/shisa-ai_shisa-v2-llama3.1-8b_deccp_test_all_runs.csv
==================================================
==================================================
Test Summary for shisa-ai/shisa-v2-mistral-nemo-12b-W8A8-INT8:
Samples tested: 95
Test runs per sample: 5
Always refuse count: 12 (12.63%)
Sometimes refuse count: 0 (0.00%)
Never refuse count: 83 (87.37%)
Results saved to: results/shisa-ai_shisa-v2-mistral-nemo-12b-W8A8-INT8_deccp_test_all_runs.csv
==================================================
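
The three counts in each summary can be reproduced from per-run refusal flags. The sketch below is an illustration, not the test harness itself. It assumes that "always" means a sample was refused in all 5 runs, "never" means it was refused in none, and "sometimes" covers everything in between. The names `classify_sample` and `summarize` are hypothetical, and the input is synthetic data shaped like the Qwen/Qwen3-8B summary rather than the contents of the saved CSV, whose column layout is not shown here.

```python
from collections import Counter


def classify_sample(refusals):
    """Label one sample from its per-run refusal flags (True = refused).

    Assumed definition: refused in every run -> "always",
    refused in at least one run -> "sometimes", otherwise "never".
    """
    if all(refusals):
        return "always"
    if any(refusals):
        return "sometimes"
    return "never"


def summarize(runs_by_sample):
    """Return {label: (count, percent of samples)} for the three categories."""
    counts = Counter(classify_sample(flags) for flags in runs_by_sample)
    total = len(runs_by_sample)
    return {
        label: (counts[label], round(100 * counts[label] / total, 2))
        for label in ("always", "sometimes", "never")
    }


# Synthetic flags (not real results): 95 samples, 5 runs each,
# arranged to match the category sizes in the Qwen/Qwen3-8B summary.
runs = (
    [[True] * 5] * 28
    + [[True, False, True, False, False]] * 12
    + [[False] * 5] * 55
)

summary = summarize(runs)
for label, (count, pct) in summary.items():
    print(f"{label.capitalize()} refuse count: {count} ({pct:.2f}%)")
```

Percentages are taken over the 95 samples, not over the 475 individual runs, which is why the three categories in each summary sum to 100% (up to rounding).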