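Technologies used: Java + Selenium
Preamble:
Where there are crawlers, there are naturally anti-crawler measures. Like viruses and antivirus software, every attack invites a defense, and the two push each other forward. The most popular anti-crawler technique today is the CAPTCHA: to stop crawlers from registering automatically and mass-producing junk accounts, the registration page of almost every website uses one. CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It is a test that distinguishes whether the user is a computer or a person, and any user who passes it is treated as human. It follows that the key to cracking a slider CAPTCHA is making the computer imitate human behavior more closely.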
Cracking a slider without a gap
A slider without a gap looks like this:
[Figure: a slider without a gap]
Slider code:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="0">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,minimum-scale=1,user-scalable=no">
<meta content="yes" name="apple-mobile-web-app-capable">
<meta content="black" name="apple-mobile-web-app-status-bar-style">
<meta content="telephone=no" name="format-detection">
<meta content="email=no" name="format-detection">
<title>Drag the slider to verify</title>
<meta name="description" content="">
<meta name="keywords" content="">
<link rel="stylesheet" type="text/css" href="">
<style>
* {
margin: 0;
padding: 0;
}
body {
font: 12px/1.125 Microsoft YaHei;
background: #fff;
}
ul, li {
list-style: none;
}
a {
text-decoration: none;
}
.ani {
transition: all .3s;
}
.wrap {
width: 300px;
height: 350px;
text-align: center;
margin: 150px auto;
}
.inner {
padding: 15px;
}
.clearfix {
overflow: hidden;
_zoom: 1;
}
.none {
display: none;
}
#slider {
position: relative;
background-color: #e8e8e8;
width: 300px;
height: 34px;
line-height: 34px;
text-align: center;
}
#slider .handler {
position: absolute;
top: 0px;
left: 0px;
width: 40px;
height: 32px;
border: 1px solid #ccc;
cursor: move;
}
.handler_bg {
background: #fff url("") no-repeat center;
}
.handler_ok_bg {
background: #fff url("") no-repeat center;
}
#slider .drag_bg {
background-color: #7ac23c;
height: 34px;
width: 0px;
}
#slider .drag_text {
position: absolute;
top: 0px;
width: 300px;
-moz-user-select: none;
-webkit-user-select: none;
user-select: none;
-o-user-select: none;
-ms-user-select: none;
}
.unselect {
-moz-user-select: none;
-webkit-user-select: none;
-ms-user-select: none;
}
.slide_ok {
color: #fff;
}
</style>
</head>
<body>
<div class="wrap">
<div id="slider">
<div class="drag_bg"></div>
<div class="drag_text" onselectstart="return false;" unselectable="on">Drag the slider to verify</div>
<div class="handler handler_bg"></div>
</div>
</div>
<script>
(function (window, document, undefined) {
var dog = {// declare a namespace (a plain object holding helpers)
$: function (id) {
return document.querySelector(id);
},
on: function (el, type, handler) {
el.addEventListener(type, handler, false);
},
off: function (el, type, handler) {
el.removeEventListener(type, handler, false);
}
};
// A Slider class
function Slider() {
var args = arguments[0];
for (var i in args) {
this[i] = args[i]; // quick way to copy the config onto the instance
}
// Call init right away, so creating an instance also initializes it
this.init();
}
Slider.prototype = {
constructor: Slider,
init: function () {
this.getDom();
this.dragBar(this.handler);
},
getDom: function () {
this.slider = dog.$('#' + this.id);
this.handler = dog.$('.handler');
this.bg = dog.$('.drag_bg');
},
dragBar: function (handler) {
var that = this,
startX = 0,
lastX = 0,
doc = document,
width = this.slider.offsetWidth,
max = width - handler.offsetWidth,
drag = {
down: function (e) {
var e = e || window.event;
that.slider.classList.add('unselect');
startX = e.clientX - handler.offsetLeft;
console.log('startX: ' + startX + ' px');
dog.on(doc, 'mousemove', drag.move);
dog.on(doc, 'mouseup', drag.up);
return false;
},
move: function (e) {
var e = e || window.event;
lastX = e.clientX - startX;
lastX = Math.max(0, Math.min(max, lastX)); // clamp lastX to the range [0, max]
console.log('lastX: ' + lastX + ' px');
if (lastX >= max) {
handler.classList.add('handler_ok_bg');
that.slider.classList.add('slide_ok');
dog.off(handler, 'mousedown', drag.down);
drag.up();
}
that.bg.style.width = lastX + 'px';
handler.style.left = lastX + 'px';
},
up: function (e) {
var e = e || window.event;
that.slider.classList.remove('unselect');
if (lastX < max) { // snap back only when the handle did not reach the end
that.bg.classList.add('ani');
handler.classList.add('ani');
that.bg.style.width = 0;
handler.style.left = 0;
setTimeout(function () {
that.bg.classList.remove('ani');
handler.classList.remove('ani');
}, 300);
}
dog.off(doc, 'mousemove', drag.move);
dog.off(doc, 'mouseup', drag.up);
}
};
dog.on(handler, 'mousedown', drag.down);
}
};
window.S = window.Slider = Slider;
})(window, document);
var defaults = {
id: 'slider'
};
new S(defaults);
</script>
</body>
</html>
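Analysis
1. Check the size of the slider button: it is 40 px wide.
2. Check the size of the slider track: it is 300 px wide.
From these two measurements, the drag distance is (300 - 40) px = 260 px. Strictly speaking, the handle also has a 1 px border on each side, so the page's own limit (max in the script) is 300 - 42 = 258 px. Dragging 260 px still succeeds because the script clamps the position to max.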
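The arithmetic above can be checked with a few lines of Java. This is only a sketch: the class and method names (SliderMath, maxDrag, clampDrag) are made up for illustration, and the numbers are the ones from the page's CSS (300 px track, 40 px handle, 1 px border).

```java
class SliderMath {
    // Largest left offset the handle can reach: track width minus the
    // handle's full width (content width plus a border on each side).
    public static int maxDrag(int trackWidth, int handleWidth, int borderWidth) {
        return trackWidth - (handleWidth + 2 * borderWidth);
    }

    // Mirrors the page's Math.max(0, Math.min(max, lastX)) clamp.
    public static int clampDrag(int requested, int max) {
        return Math.max(0, Math.min(max, requested));
    }

    public static void main(String[] args) {
        int max = maxDrag(300, 40, 1);
        System.out.println("max = " + max);                     // prints "max = 258"
        System.out.println("clamped = " + clampDrag(260, max)); // prints "clamped = 258"
    }
}
```

So a requested drag of 260 px ends at 258 px, which is exactly the point where the page marks the slider as done.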
Crawler code
// Needs org.openqa.selenium.By, WebDriver, WebElement, chrome.ChromeDriver and interactions.Actions
public static void main(String[] args) throws Exception {
    System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
    WebDriver driver = new ChromeDriver();
    try {
        driver.get("file:///C:/Users/Administrator/Desktop/index.html");
        // Locate the slider button
        WebElement slider = driver.findElement(By.cssSelector(".handler.handler_bg"));
        Thread.sleep(2000L);
        // Create the Actions object used for mouse operations
        Actions action = new Actions(driver);
        // Drag the button 260 px to the right
        action.dragAndDropBy(slider, 260, 0).perform();
        Thread.sleep(5000L);
    } catch (InterruptedException e) {
        e.printStackTrace();
    } finally {
        // driver.close(); // would close only the current window
        driver.quit(); // end the session and release resources
    }
}
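Note: on some sites the verification succeeds after the drag, and on others it fails. If it fails, don't panic: the site has detected that the drag came from an automated browser. There is a way around this, so read on.
Let's analyze it first.
1. Open the browser through the driver: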
public static void openChrome() {
    System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
    // 1. Open the Chrome browser
    chromeDriver = new ChromeDriver();
    chromeDriver.get("url...");
}
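2. Then press F12, open the console, and enter: window.navigator.webdriver
The value is true, whereas in a browser opened by hand it is false or undefined.
[Figure: console output of window.navigator.webdriver]
The conclusion is that the site reads this property in its own script: undefined or false means a normal browser, and true means a browser driven by Selenium. The fix therefore has to be made where the browser is launched, by hiding the flag before ChromeDriver starts: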
public static void openChrome() {
    // Hide window.navigator.webdriver
    ChromeOptions option = new ChromeOptions();
    option.setExperimentalOption("useAutomationExtension", false);
    option.setExperimentalOption("excludeSwitches", Lists.newArrayList("enable-automation")); // Lists is Guava's com.google.common.collect.Lists
    option.addArguments("--disable-blink-features=AutomationControlled"); // this line is the key one
    System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
    // 1. Open the Chrome browser
    chromeDriver = new ChromeDriver(option);
    chromeDriver.get("URL...");
}
Start the browser again and check: the value is now false.
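The same check can be scripted instead of typed into the console. The sketch below covers only the interpretation step, in plain Java. It assumes the value was obtained with Selenium's `((JavascriptExecutor) driver).executeScript("return window.navigator.webdriver")`, which hands back a Boolean, or null when the property is undefined. The class and method names are hypothetical.

```java
class WebdriverFlag {
    // Interprets the value returned by
    //   ((JavascriptExecutor) driver).executeScript("return window.navigator.webdriver")
    // true means the page can see that the browser is automated;
    // false or null (JavaScript undefined) looks like a normal browser.
    public static boolean looksAutomated(Object navigatorWebdriver) {
        return Boolean.TRUE.equals(navigatorWebdriver);
    }

    public static void main(String[] args) {
        System.out.println(looksAutomated(Boolean.TRUE));  // prints "true"
        System.out.println(looksAutomated(Boolean.FALSE)); // prints "false"
        System.out.println(looksAutomated(null));          // prints "false"
    }
}
```

Running such a check right after openChrome() would show whether the options actually took effect before the slider is touched.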
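Cracking a slider with a gap
A slider with a gap looks like this:
[Figure: a slider with a gap]
Analysis
I use the slider markup of one website for the analysis. As the page source shows, the gap slider image is drawn with canvas elements.
[Figure: page source showing the canvas elements]
1. What we need is the X coordinate of the gap, so we need both the complete image and the image with the gap in order to compare them. Only one image is visible, and the puzzle piece covers part of it. Adding style="display:none" to the puzzle-piece canvas (geetest_canvas_slice) hides the piece and leaves the gap image unobstructed.
[Figure: the gap image without the puzzle piece]
2. Then set style="display:block" on the canvas below it (geetest_canvas_fullbg) to reveal the complete image.
[Figure: the complete image]
3. Then use Selenium's screenshot method to save the complete image and the gap image, and compare them pixel by pixel to work out the X coordinate of the gap relative to the button.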
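Step 3 relies on a pixel comparison that the listing below calls as getMoveDistance() but does not show. A minimal sketch of one way to do it follows. It assumes both screenshots have the same size and that the gap differs from the complete image by more than a per-channel threshold. The class name, the method signature, and the threshold of 60 are assumptions, not the author's code.

```java
import java.awt.image.BufferedImage;

class GapFinder {
    // True when two RGB pixels differ by more than `threshold` in any channel.
    static boolean differs(int rgbA, int rgbB, int threshold) {
        int dr = Math.abs(((rgbA >> 16) & 0xFF) - ((rgbB >> 16) & 0xFF));
        int dg = Math.abs(((rgbA >> 8) & 0xFF) - ((rgbB >> 8) & 0xFF));
        int db = Math.abs((rgbA & 0xFF) - (rgbB & 0xFF));
        return dr > threshold || dg > threshold || db > threshold;
    }

    // Scans column by column from the left and returns the x coordinate of the
    // first column where the two images differ, or -1 if they never do.
    public static int findGapX(BufferedImage full, BufferedImage gapped, int threshold) {
        int width = Math.min(full.getWidth(), gapped.getWidth());
        int height = Math.min(full.getHeight(), gapped.getHeight());
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                if (differs(full.getRGB(x, y), gapped.getRGB(x, y), threshold)) {
                    return x;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Synthetic data: a white "complete" image and a copy with a dark 40x40 gap at x = 120.
        BufferedImage full = new BufferedImage(260, 160, BufferedImage.TYPE_INT_RGB);
        BufferedImage gapped = new BufferedImage(260, 160, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < 260; x++) {
            for (int y = 0; y < 160; y++) {
                full.setRGB(x, y, 0xFFFFFF);
                boolean inGap = x >= 120 && x < 160 && y >= 50 && y < 90;
                gapped.setRGB(x, y, inGap ? 0x333333 : 0xFFFFFF);
            }
        }
        System.out.println(findGapX(full, gapped, 60)); // prints "120"
    }
}
```

With the files saved by the listing, the two arguments would be the images read via ImageIO.read from D:\test1.png (complete) and D:\test.png (with the gap). The threshold likely needs tuning against real screenshots.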
Crawler code
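// Lists is assumed to be Guava's and FileUtils Apache Commons IO's.
import com.google.common.collect.Lists;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Random;

// Written against the Selenium 3 API (findElementByXPath, WebDriverWait(driver, seconds)).
public class ElementLocate {
    private static ChromeDriver chromeDriver;

    public static void main(String[] args) throws InterruptedException, IOException {
        openChrome(); // launch the browser and open the login page
        try {
            chromeDriver.manage().window().maximize(); // maximize the window
            // Wait for the verification button to load
            // (the aria-label text means "click the button to verify")
            new WebDriverWait(chromeDriver, 5)
                    .until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//div[@aria-label='点击按钮进行验证']")));
            // Click it to open the slider dialog
            chromeDriver.findElementByXPath("//div[@aria-label='点击按钮进行验证']").click();
            operateSlider(); // operate the slider
        } finally {
            chromeDriver.quit(); // always quit when done, or leftover browser processes bog the machine down
        }
    }
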
    private static void openChrome() {
        // Configure the browser
        ChromeOptions option = new ChromeOptions();
        option.setExperimentalOption("useAutomationExtension", false);
        option.setExperimentalOption("excludeSwitches", Lists.newArrayList("enable-automation"));
        // This line is the key one: it keeps the site's JavaScript from detecting the crawler
        option.addArguments("--disable-blink-features=AutomationControlled");
        // Set the browser driver path
        System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
        // Open the Chrome browser
        chromeDriver = new ChromeDriver(option);
        // Open the login page
        chromeDriver.get("https://account.zbj.com/login?lgtype=1&waytype=603&fromurl=https%3A%2F%2Fxiamen.zbj.com%2F");
    }

    // Set an attribute on an element
    private static void setAttribute(WebDriver driver, WebElement element, String attributeName, String value) {
        JavascriptExecutor js = (JavascriptExecutor) driver;
        js.executeScript("arguments[0].setAttribute(arguments[1], arguments[2])", element, attributeName, value);
    }

    // Remove an attribute from an element (not used by the listing, kept as a helper)
    private static void removeAttribute(WebDriver driver, WebElement element, String attributeName) {
        JavascriptExecutor js = (JavascriptExecutor) driver;
        js.executeScript("arguments[0].removeAttribute(arguments[1])", element, attributeName);
    }

    // Crop a full-page screenshot down to one element
    private static File captureElement(File screenshot, WebElement element) {
        try {
            BufferedImage img = ImageIO.read(screenshot);
            int width = element.getSize().getWidth();
            int height = element.getSize().getHeight();
            // Top-left coordinates of the element
            Point point = element.getLocation();
            // Starting at the element's top-left corner, crop img to the element's width and height
            BufferedImage dest = img.getSubimage(point.getX(), point.getY(), width, height);
            ImageIO.write(dest, "png", screenshot);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return screenshot;
    }

    // Operate the slider
    private static void operateSlider() throws InterruptedException, IOException {
        Thread.sleep(1000); // pause before looking the elements up again, otherwise the lookup fails
        // Hide the puzzle piece so the gap image is unobstructed. The images must have
        // loaded by now; on a slow network the lookup below throws.
        WebElement que1 = chromeDriver.findElementByXPath("//div[@class='geetest_slicebg geetest_absolute']/canvas[@class='geetest_canvas_slice geetest_absolute']");
        setAttribute(chromeDriver, que1, "style", "display:none");
        // Screenshot the image with the gap
        WebElement quekou = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_bg geetest_absolute']");
        File src = chromeDriver.getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(src, new File("D:\\result.png"));
        FileUtils.copyFile(captureElement(src, quekou), new File("D:\\test.png"));
        // Show the complete image
        WebElement que2 = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_fullbg geetest_fade geetest_absolute']");
        setAttribute(chromeDriver, que2, "style", "display:block");
        // Screenshot the complete image
        WebElement wanzheng = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_bg geetest_absolute']");
        File src2 = chromeDriver.getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(src2, new File("D:\\result1.png"));
        FileUtils.copyFile(captureElement(src2, wanzheng), new File("D:\\test1.png"));
        // Restore the slider to its original state
        WebElement huanyuan1 = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_fullbg geetest_fade geetest_absolute']");
        setAttribute(chromeDriver, huanyuan1, "style", "display:none");
        WebElement huanyuan2 = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_slice geetest_absolute']");
        setAttribute(chromeDriver, huanyuan2, "style", "display:block");
        // Distance to the gap, computed by comparing the gap image with the complete image.
        // The 5 is the 5 px between the slider button and the left edge of the image.
        // getMoveDistance() is not included in this listing.
        int moveDistance = getMoveDistance() - 5;
        // Locate the slider button
        WebElement btn = chromeDriver.findElementByXPath("//div[@class='geetest_slider_button']");
        // Create the Actions object used for mouse operations
        Actions actions = new Actions(chromeDriver);
        // Split the distance from the button to the gap into several parts
        int[] nums = split(moveDistance);
        // Move the slider button part by part
        Random random = new Random();
        for (int i = 0; i < nums.length; i++) {
            actions.clickAndHold(btn).moveByOffset(nums[i], 0)
                    .build().perform();
            // Pause 350 to 359 ms between moves
            Thread.sleep(350 + random.nextInt(10));
        }
        // Imitate a human: nudge back 1 px, then release
        actions.clickAndHold(btn).moveByOffset(-1, 0).release()
                .build().perform();
        Thread.sleep(3000); // wait 3 seconds for the verification result
        // Check whether the slide succeeded
        String attribute = chromeDriver.findElementByXPath("//div[@class='geetest_radar_tip']").getAttribute("aria-label");
        System.out.println("attribute = " + attribute);
        // "网络不给力" ("the network is struggling") is the label the listing treats as a failed attempt
        if ("网络不给力".equals(attribute)) {
            chromeDriver.findElementByXPath("//div[@class='geetest_radar_tip']").click();
            // Slide again
            operateSlider();
        }
    }

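    // Split an integer into 5 random parts that sum to num.
    // num must be at least 1, because Random.nextInt requires a positive bound.
    private static int[] split(int num) {
        int[] nums = new int[5];
        Random rand = new Random();
        for (int i = 0; i < nums.length - 1; i++) {
            nums[i] = rand.nextInt(num);
            num -= nums[i];
        }
        nums[nums.length - 1] = num;
        return nums;
    }
}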
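captureElement crops the screenshot using the element's position in CSS pixels. On a display with scaling enabled, the screenshot may be larger than the CSS viewport, and getSubimage throws a RasterFormatException when the rectangle leaves the image. The sketch below shows one hedged way to handle both cases. The class name, the scale parameter, and the idea of passing the device pixel ratio in by hand are assumptions rather than part of the original code.

```java
import java.awt.image.BufferedImage;

class SafeCrop {
    // Crops the element rectangle (given in CSS pixels) out of a screenshot,
    // multiplying by `scale` (for example window.devicePixelRatio) and clamping
    // the rectangle to the image bounds so getSubimage cannot throw.
    public static BufferedImage crop(BufferedImage shot, int x, int y, int w, int h, double scale) {
        int left = clamp((int) Math.round(x * scale), 0, shot.getWidth() - 1);
        int top = clamp((int) Math.round(y * scale), 0, shot.getHeight() - 1);
        int width = clamp((int) Math.round(w * scale), 1, shot.getWidth() - left);
        int height = clamp((int) Math.round(h * scale), 1, shot.getHeight() - top);
        return shot.getSubimage(left, top, width, height);
    }

    static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }

    public static void main(String[] args) {
        BufferedImage shot = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage part = crop(shot, 10, 10, 20, 20, 2.0);
        System.out.println(part.getWidth() + "x" + part.getHeight()); // prints "40x40"
    }
}
```

With a scale of 1.0 this behaves like the original crop whenever the element lies fully inside the screenshot.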
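Note: when the slider button reaches the target area, the puzzle piece may get "eaten" (swallowed and rejected). This happens because the slide was judged to be a machine operation, so imitate a human as closely as possible: slide a certain distance, then stop for n milliseconds. After repeated tuning, I found this reduces the chance of being flagged. The success rate is around 80%.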
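One hedged way to make the movement look less mechanical than five random chunks is to start slowly, speed up, and slow down again near the gap. The sketch below generates such a list of step sizes in plain Java. The class name, the smoothstep curve, and the step count are illustrative choices, not the author's tuned values, and the sketch makes no claim about the resulting success rate.

```java
import java.util.ArrayList;
import java.util.List;

class HumanTrack {
    // Splits `distance` pixels into `steps` moves that start slowly, speed up,
    // and slow down again, using the smoothstep curve p(t) = 3t^2 - 2t^3.
    // The returned offsets always sum to exactly `distance`.
    public static List<Integer> build(int distance, int steps) {
        if (distance < 0 || steps < 1) {
            throw new IllegalArgumentException("distance must be >= 0 and steps >= 1");
        }
        List<Integer> track = new ArrayList<>();
        int covered = 0;
        for (int i = 1; i <= steps; i++) {
            double t = (double) i / steps;
            double eased = t * t * (3 - 2 * t);
            int target = (int) Math.round(distance * eased);
            track.add(target - covered);
            covered = target;
        }
        return track;
    }

    public static void main(String[] args) {
        List<Integer> track = build(120, 6);
        int sum = 0;
        for (int step : track) {
            sum += step;
        }
        System.out.println(track + " sum=" + sum); // prints "[9, 22, 29, 29, 22, 9] sum=120"
    }
}
```

Each value could then be passed to moveByOffset(step, 0) with a short pause between moves, in place of the array returned by split.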
This is a small demo I put together while learning and working. It surely has shortcomings, and I hope for your understanding and suggestions. If anything here infringes your rights, please contact me.