Crawling App Market Information for Apps

Author: Furi. Original link: /en-heng/p/5614814.html
Given a large batch of apps, how do we crawl the app-market category and description for each one? Read on.
1. Page Analysis
When we type 微信 (WeChat) into the search box on the Wandoujia home page, we land on a search results page whose URL is /search?key=%微信. Search results are generally ranked by relevance, so we take the first result as the one to crawl. Clicking it leads to the page /apps/com.tencent.mm, which reveals that a Wandoujia app detail page URL is simply /apps/ plus the app's package name.
Let us go back to the search results page and analyze its elements, as shown in the figure:
All search results live inside a <ul> unordered list, and each individual result sits in a <li> tag. The corresponding CSS selector is therefore
```
'#j-search-list>li::attr(data-pn)'
```
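Outside Scrapy, the same extraction can be sketched with only the standard library. The snippet below runs the equivalent of that selector against a made-up fragment of the results page (the sample HTML is an assumption for illustration, not copied from Wandoujia):

```python
from html.parser import HTMLParser

# Hypothetical, trimmed-down version of the search results list; only
# the structure (#j-search-list > li[data-pn]) follows the page analysis.
SAMPLE = u'''
<ul id="j-search-list">
  <li data-pn="com.tencent.mm">WeChat</li>
  <li data-pn="com.tencent.mobileqq">QQ</li>
</ul>
'''


class PkgParser(HTMLParser):
    """Collect the data-pn attribute of every <li>, mirroring the
    CSS selector '#j-search-list>li::attr(data-pn)'."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.pkgs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            attrs = dict(attrs)
            if 'data-pn' in attrs:
                self.pkgs.append(attrs['data-pn'])


parser = PkgParser()
parser.feed(SAMPLE)
print(parser.pkgs)  # -> ['com.tencent.mm', 'com.tencent.mobileqq']
```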
Next, we analyze the app detail page. The HTML element holding the app name looks like this:
The app category:
The app description:
The CSS selectors for these three elements are easy to derive:
```
.app-name>span::text
.crumb>.second>a>span::text
.desc-info>.con::text
```
From this analysis, the crawl strategy is:

1. Read the APP file line by line and build the search page URL;
2. Parse the search results page and follow the first result to its detail page;
3. Crawl the relevant fields from the detail page and write them to the output file.
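The first step of this strategy can be sketched in a few lines (a standalone illustration; the real crawler below defers URL-encoding to urllib):

```python
def build_search_urls(lines):
    # Read app names line by line and build a search URL for each,
    # skipping blank lines; this mirrors step 1 of the strategy above.
    for line in lines:
        name = line.strip()
        if name:
            yield name, u'/search?key=%s' % name


apps = [u'微信\n', u'\n', u'QQ\n']
print(list(build_search_urls(apps)))
# -> [('微信', '/search?key=微信'), ('QQ', '/search?key=QQ')]
```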
2. Crawler Implementation
With the page analysis done, it is time to write the crawler. Implemented from scratch in Python, the download interval, request handling, page parsing, and result serialization would all be tedious to get right. Scrapy, a fast web-crawling framework written in Python, solves these problems well; its Chinese documentation introduces the framework in fair detail.
Some names in the APP file may be messy and need cleaning first:
```python
# -*- coding: utf-8 -*-
import re


def clean_app_name(app_name):
    space = u'\u00a0'
    app_name = app_name.replace(space, '')
    brackets = u'\\(.*\\)|\\[.*\\]|【.*】|（.*）'
    return re.sub(brackets, '', app_name)
```
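A quick sanity check of the cleaning logic (a standalone sketch reusing the same pattern; the sample names are made up):

```python
import re

# Same idea as clean_app_name above: strip no-break spaces and
# half-/full-width bracketed suffixes from an app name.
BRACKETS = u'\\(.*\\)|\\[.*\\]|【.*】|（.*）'


def clean(name):
    return re.sub(BRACKETS, u'', name.replace(u'\u00a0', u''))


print(clean(u'微信（腾讯官方版）'))      # -> 微信
print(clean(u'支付宝\u00a0(Alipay)'))  # -> 支付宝
```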
Use the cleaned app name to build the search results URL. URLs cannot carry Chinese characters directly, so encode them with urllib.quote:
```python
# -*- coding: utf-8 -*-
from appMarket import clean

import urllib


def get_kw_url(kw):
    """concatenate the url for searching"""
    base_url = u"/search?key=%s"
    kw = clean.clean_app_name(kw)
    return base_url % (urllib.quote(kw.encode("utf8")))


def get_pkg_url(pkg):
    """get the detail url according to pkg"""
    return '/apps/%s' % pkg
```
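The article targets Python 2, where urllib.quote exists; under Python 3 the same helper would use urllib.parse.quote. A sketch, keeping the article's domain-less URL pattern:

```python
from urllib.parse import quote

BASE_URL = u'/search?key=%s'  # domain omitted, as in the article


def get_kw_url(kw):
    # Percent-encode the UTF-8 bytes of the keyword, like urllib.quote
    return BASE_URL % quote(kw.encode('utf8'))


print(get_kw_url(u'微信'))  # -> /search?key=%E5%BE%AE%E4%BF%A1
```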
Every Scrapy spider inherits from the scrapy.Spider class. Its main attributes and methods:

name: the spider's name; running scrapy crawl followed by this name starts the spider
allowed_domains: the list of domains the spider is allowed to crawl
start_requests(): kicks off the crawl; returns an iterable, usually of scrapy.Request objects
parse(response): processes a response and returns extracted data, and/or follow-up URLs for further processing
An Item is the container holding the scraped data, similar to a Python dict:
```python
import scrapy


class AppMarketItem(scrapy.Item):
    # define the fields for your item here like:
    kw = scrapy.Field()    # key word
    name = scrapy.Field()  # app name
    tag = scrapy.Field()   # app tag
    desc = scrapy.Field()  # app description
```
The Wandoujia spider code:
```python
# -*- coding: utf-8 -*-
# @Time:
# @Author: rain
import scrapy
import codecs
from appMarket import util
from appMarket.util import wandoujia
from appMarket.items import AppMarketItem


class WandoujiaSpider(scrapy.Spider):
    name = "WandoujiaSpider"
    allowed_domains = [""]

    def __init__(self):
        self.apps_path = './input/apps.txt'

    def start_requests(self):
        with codecs.open(self.apps_path, 'r', 'utf-8') as f:
            for app_name in f:
                yield scrapy.Request(url=wandoujia.get_kw_url(app_name),
                                     callback=self.parse_search_result,
                                     meta={'kw': app_name.rstrip()})

    def parse(self, response):
        item = AppMarketItem()
        item['kw'] = response.meta['kw']
        item['name'] = response.css('.app-name>span::text').extract_first()
        item['tag'] = response.css('.crumb>.second>a>span::text').extract_first()
        desc = response.css('.desc-info>.con::text').extract()
        item['desc'] = util.parse_desc(desc)
        item['desc'] = u"" if not item['desc'] else item['desc'].strip()
        self.log(u'crawling the app %s' % item['name'])
        yield item

    def parse_search_result(self, response):
        pkg = response.css("#j-search-list>li::attr(data-pn)").extract_first()
        yield scrapy.Request(url=wandoujia.get_pkg_url(pkg), meta=response.meta)
```
The app name read from the APP file serves as the search keyword and should also appear in the output file. But the crawl involves URL jumps; how do we pass a variable between Requests at different levels? The meta (dict) parameter of Request provides exactly this hand-off.
For the app description, .desc-info>.con::text, extract() returns a list, which is joined into a single string as follows:
```python
def parse_desc(desc):
    return reduce(lambda a, b: a.strip() + b.strip(), desc, '')
```
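Note that reduce is a builtin only in Python 2; in Python 3 it lives in functools. The joining behaves like this (sample fragments are made up):

```python
from functools import reduce  # builtin in Python 2, functools in Python 3


def parse_desc(desc):
    # Strip each extracted fragment and concatenate the pieces
    return reduce(lambda a, b: a.strip() + b.strip(), desc, u'')


fragments = [u'  A handy app.\n', u'\tChat with friends.  ']
print(parse_desc(fragments))  # -> A handy app.Chat with friends.
```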
Scrapy's recommended serialization format is JSON, and its benefits are obvious: the schema is explicit, so compared with '\t'-delimited plain text it is far less error-prone to read back.
The crawled results may contain duplicates or empty entries (apps with no search result); moreover, when Python 2 serializes JSON, Chinese characters get escaped as unicode sequences. Custom Pipelines can deal with both issues:
```python
# -*- coding: utf-8 -*-
import codecs
import json

from scrapy.exceptions import DropItem


class CheckPipeline(object):
    """check item, and drop the duplicate one"""

    def __init__(self):
        self.names_seen = set()

    def process_item(self, item, spider):
        if item['name']:
            if item['name'] in self.names_seen:
                raise DropItem("Duplicate item found: %s" % item)
            else:
                self.names_seen.add(item['name'])
                return item
        else:
            raise DropItem("Missing name in %s" % item)


class JsonWriterPipeline(object):
    def __init__(self):
        self.file = codecs.open('./output/output.json', 'wb', 'utf-8')

    def process_item(self, item, spider):
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(line)
        return item
```
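The ensure_ascii=False flag is what keeps Chinese readable in the output; by default json.dumps escapes every non-ASCII character:

```python
import json

# Default: Chinese is escaped to \uXXXX sequences
print(json.dumps({'name': u'微信'}))
# With ensure_ascii=False the text stays readable
print(json.dumps({'name': u'微信'}, ensure_ascii=False))
```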
The pipelines also need to be enabled in settings.py:
```python
ITEM_PIPELINES = {
    'appMarket.pipelines.CheckPipeline': 300,
    'appMarket.pipelines.JsonWriterPipeline': 800,
}
```
The integer assigned to each class determines the order in which items flow through the pipelines, from lower numbers to higher; by convention these values are kept in the 0-1000 range.
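The ordering rule can be checked directly: Scrapy sorts the ITEM_PIPELINES mapping by value, so CheckPipeline (300) runs before JsonWriterPipeline (800):

```python
ITEM_PIPELINES = {
    'appMarket.pipelines.CheckPipeline': 300,
    'appMarket.pipelines.JsonWriterPipeline': 800,
}

# Ascending order of the assigned value == execution order
order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(order)
# -> ['appMarket.pipelines.CheckPipeline', 'appMarket.pipelines.JsonWriterPipeline']
```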