
Parsing JavaScript-rendered pages with Scrapy in Python: an example

Date: 2014-05-15 18:23  Source: compiled from the web  Author: web


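This article shows how to crawl pages whose content is produced by JavaScript: Scrapy discovers the links, and Selenium loads each page in a real browser so the scripts run before the HTML is parsed.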

The code is as follows:

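import time

from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider, Rule
# Selenium WebDriver; the old Selenium RC interface ("from selenium import
# selenium") is no longer shipped with current selenium releases.
from selenium import webdriver


class MySpider(CrawlSpider):
    name = 'cnbeta'
    allowed_domains = ['cnbeta.com']
    # This host is not in allowed_domains, so Scrapy's offsite filtering
    # will not let the spider follow links found on it.
    start_urls = ['http://www.jb51.net']

    rules = (
        # Extract links containing '/articles/' followed later by '.htm',
        # parse each page with parse_page, and keep following their links.
        Rule(LinkExtractor(allow=(r'/articles/.*\.htm',)),
             callback='parse_page', follow=True),
    )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Start a real browser so that the pages' JavaScript gets executed.
        self.driver = webdriver.Firefox()
        # Give up on a page load after 30 seconds (30000 ms).
        self.driver.set_page_load_timeout(30)

    def closed(self, reason):
        # Scrapy calls closed() when the spider finishes; shut down the browser.
        self.driver.quit()

    def parse_page(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        # Re-open the URL in the browser; get() returns once the page has loaded.
        self.driver.get(response.url)
        # Give scripts that run after the load event a moment to finish.
        time.sleep(2.5)
        # Parse the JavaScript-rendered DOM instead of the raw HTTP response.
        sel = Selector(text=self.driver.page_source)
        # Extract data from sel here (e.g. into the project's WebproxyItem).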
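Which links does this spider actually follow? A link has to pass two checks: Scrapy's offsite filter, which keeps only hosts equal to or under a domain in `allowed_domains`, and the rule's `allow` regex, which LinkExtractor applies with a search anywhere in the absolute URL. The stdlib-only sketch below approximates both checks. The host regex is modeled on how Scrapy's offsite filtering behaves, not taken from its API, and `would_follow` and the sample URLs are hypothetical.

```python
import re
from urllib.parse import urlparse

# Values copied from the spider's allowed_domains and Rule(allow=...).
ALLOWED_DOMAINS = ['cnbeta.com']
ARTICLE_PATTERN = re.compile(r'/articles/.*\.htm')

# Approximation of the offsite check: the host must equal an allowed
# domain or be a subdomain of it (e.g. www.cnbeta.com).
HOST_PATTERN = re.compile(
    r'^(.*\.)?(%s)$' % '|'.join(re.escape(d) for d in ALLOWED_DOMAINS))


def would_follow(url):
    """Hypothetical helper: True if url passes both the domain and the allow check."""
    host = urlparse(url).hostname or ''
    return bool(HOST_PATTERN.match(host)) and bool(ARTICLE_PATTERN.search(url))


if __name__ == '__main__':
    # Hypothetical sample links.
    for url in [
        'http://www.cnbeta.com/articles/123456.htm',      # True
        'http://www.cnbeta.com/articles/tech/7890.html',  # True: '.htm' also matches '.html'
        'http://www.cnbeta.com/topics/9.htm',             # False: path does not match
        'http://www.jb51.net/articles/1.htm',             # False: host not allowed
    ]:
        print(url, would_follow(url))
```

Two consequences are worth noting: because the pattern is searched rather than anchored, `.html` pages are accepted as well, and every link on the `start_urls` host (www.jb51.net) fails the domain check.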
