
The fastest way to get the width and height of every image on a web page

Posted 2025/8/21 5:37:06 · 27 views
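Has anyone tried http://pinterest.com ? After you register, it offers "Add a Pin": you submit a site's URL, press Find Images, and it finds every image on the submitted page (filtering them by width and height). The whole process usually takes about 10 seconds.
I have been trying to imitate it as a small component. I already dropped the dreaded getimagesize() (48.64 seconds) in favour of imagecreatefromstring() (still 26.13 seconds), which is a world away from its roughly 10 seconds.
I need to keep the number of TCP connections in mind, use as few server resources as possible, and keep execution time as short as possible. Can anyone help me optimize the code further so it runs faster?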
function ranger($url) {
    // Request only the first 32 KB of each image.
    $headers = array('Range: bytes=0-32768');
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    curl_close($curl); // release the handle before returning
    return $data;
} // cURL setup
require dirname(__FILE__) . '/simple_html_dom.php';
// Use simple_html_dom.php to parse the HTML nodes
$url = 'http://www.huffingtonpost.com/';
$html = file_get_html($url);
if ($html->find('img')) {
    foreach ($html->find('img') as $element) {
        $raw = ranger($element->src);
        if ($raw === false || $raw === '') {
            continue; // download failed
        }
        $im = @imagecreatefromstring($raw);
        if ($im === false) {
            continue; // not a decodable image
        }
        $width = imagesx($im);
        $height = imagesy($im);
        if ($width >= 200 || $height >= 200) {
            echo $element; // images whose width or height is at least 200
        }
    }
}

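------Solution approach----------------------
Perhaps take a detour that eases the network load on the server.
The server only parses the HTML, collects the image tag information, and sends the collected text data back to the client.
The client loads the images. Reading the width and height properties of each loaded image object is enough to get its original size (naturalWidth and naturalHeight report the intrinsic size even when the tag sets its own dimensions).
There are plenty of benefits; the likely snag is hotlink protection.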
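The server-side half of this suggestion can be sketched as follows. This is a minimal sketch, not code from the thread: it assumes PHP's bundled DOM extension instead of simple_html_dom, and the helper names resolve_src and collect_image_urls, as well as the example.com URLs, are made up for illustration. It extracts every img src, resolves absolute, protocol-relative, root-relative and simple relative forms against the page URL, and removes duplicates before the list is sent to the client as JSON.

```php
<?php
// Hypothetical helper: turn an <img> src into an absolute URL, or null if unusable.
function resolve_src($src, $pageUrl) {
    $src = trim($src);
    if ($src === '' || stripos($src, 'data:') === 0) {
        return null; // nothing to download
    }
    if (preg_match('#^https?://#i', $src)) {
        return $src; // already absolute
    }
    $base = parse_url($pageUrl);
    if (empty($base['scheme']) || empty($base['host'])) {
        return null;
    }
    if (substr($src, 0, 2) === '//') {
        return $base['scheme'] . ':' . $src; // protocol-relative: keep its own host
    }
    $origin = $base['scheme'] . '://' . $base['host'];
    if ($src[0] === '/') {
        return $origin . $src; // root-relative
    }
    // Plain relative path: resolve against the directory of the page path.
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

// Hypothetical helper: collect the unique image URLs found in an HTML string.
function collect_image_urls($html, $pageUrl) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // real-world markup is rarely valid; suppress parser warnings
    $urls = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $url = resolve_src($img->getAttribute('src'), $pageUrl);
        if ($url !== null) {
            $urls[$url] = true; // array keys de-duplicate repeated spacer images
        }
    }
    return array_keys($urls);
}

$html = '<p><img src="/images/trans.gif"><img src="/images/trans.gif">'
      . '<img src="//i.example.com/a.jpg"></p>';
echo json_encode(collect_image_urls($html, 'http://www.example.com/news/'));
```

De-duplicating on the server keeps the client from requesting the same spacer image dozens of times, and handling the `//host/path` form separately avoids gluing a third-party host onto the page's own domain.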
------Solution approach----------------------
Seconding the reply above:
PHP fetches the resource list,
JavaScript reads the image width and height.
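------Solution approach----------------------
Fetch and parse: 2.8 seconds
Read images (138): 27 seconds
Found: 7
Optimizing the code alone will not gain much.
Consider fetching over multiple concurrent connections.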
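One way to fetch over multiple concurrent connections from a single PHP process is the curl_multi API. The function below is a minimal sketch under that assumption, not the approach anyone in the thread posted; the name fetch_heads and its default values are illustrative. It requests the first 32 KB of each URL in parallel and returns the bodies keyed by URL.

```php
<?php
// Fetch the first $bytes of every URL concurrently with curl_multi.
// Returns array(url => body); a failed transfer yields an empty string.
function fetch_heads(array $urls, $bytes = 32768, $timeout = 10) {
    $mh = curl_multi_init();
    $handles = array();
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_RANGE, '0-' . ($bytes - 1)); // servers may ignore Range
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = array();
    foreach ($handles as $url => $ch) {
        $results[$url] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

To cap the number of simultaneous connections, the URL list can be split with array_chunk($urls, 20) and each batch passed to fetch_heads in turn; the batch size of 20 is an arbitrary example value.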
------Solution approach----------------------
Fetch and parse: 3.6 seconds
Launch the image-reading processes (138): 1.3 seconds
Records in the result file: 7
http://s.huffpost.com/images/v/logos/v4/tagline.gif
http://s.huffpost.com/images/v/logos/v4/homepage.gif?v9
http://i.huffpost.com/gen/559399/thumbs/r-olbermann-huge.jpg
http://s.huffpost.com/images/facebook_promo_connect.png?3
http://images.huffingtonpost.com/2012-04-04-michaeljfoxmarlo2second.jpg
http://images.huffingtonpost.com/2012-04-05-screenshot20120405at9.40.24am.jpg
http://i.huffpost.com/gen/557914/thumbs/s-scorsese-large300.jpg

The original loop becomes:
foreach ($html->find('img') as $element) {
    tenor('tenorcall.php?v=' . urlencode($element->src));
}
} // closes the enclosing if ($html->find('img'))

tenorcall.php:
function ranger($url) {
    // Request only the first 32 KB of the image.
    $headers = array('Range: bytes=0-32768');
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    curl_close($curl); // release the handle before returning
    return $data;
} // cURL setup
$raw = ranger($_GET['v']);
$im = ($raw === false || $raw === '') ? false : @imagecreatefromstring($raw);
if ($im !== false) {
    $width = imagesx($im);
    $height = imagesy($im);
    if ($width >= 200 || $height >= 200) {
        // Record images whose width or height is at least 200; lock because many requests append at once.
        file_put_contents('tenorcall.txt', $_GET['v'] . PHP_EOL, FILE_APPEND | LOCK_EX);
    }
}

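/**
 * Function: tenor
 * Purpose:  request a URL without waiting for the response
 * Param:    $page, the page script to run
 * Returns:  nothing
 **/
if (!function_exists('tenor')):
function tenor($page) {
    $host = $_SERVER['HTTP_HOST'];
    $fp = fsockopen($host, 80, $errno, $errstr);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
    } else {
        // Send the request and close immediately without reading the reply.
        fputs($fp, "GET /$page HTTP/1.0\r\nHost: $host\r\n\r\n");
        fclose($fp);
    }
}
endif;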

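The code is the same as before; there is more of it, not less.
But because the requests run concurrently, it is clearly faster.
Note: the tenor function does not run reliably on some web servers (IIS6, for example); the reason is unknown.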
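------Solution approach----------------------
I think letting the client load the images is workable:
the client then submits the information for the images that qualify, and the server verifies it once more before saving.
Also, where does 32768 come from? Wouldn't 1-200 be enough?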
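How many bytes are enough depends on the image format, which the sketch below tries to make concrete. It is an illustration, not code from the thread, and the function name size_from_header is made up. A PNG stores its dimensions in the IHDR chunk within the first 24 bytes and a GIF within the first 10, so 200 bytes would cover both. A JPEG keeps its dimensions in a start-of-frame segment that can sit behind large metadata blocks, so a few hundred bytes is often too little; that may be why a generous value such as 32768 was chosen, though the thread does not say. Note that byte ranges are zero-based: a range starting at 1 would skip the first byte of the file signature.

```php
<?php
// Read width and height from the leading bytes of a PNG or GIF.
// Returns array(width, height), or null if the format is not recognised
// or the data is too short.
function size_from_header($data) {
    // PNG: 8-byte signature, 4-byte chunk length, "IHDR", then width and height.
    if (strlen($data) >= 24
        && substr($data, 0, 8) === "\x89PNG\r\n\x1a\n"
        && substr($data, 12, 4) === 'IHDR') {
        $d = unpack('Nwidth/Nheight', substr($data, 16, 8)); // big-endian uint32
        return array($d['width'], $d['height']);
    }
    // GIF: "GIF87a" or "GIF89a", then width and height.
    if (strlen($data) >= 10 && preg_match('/^GIF8[79]a/', $data)) {
        $d = unpack('vwidth/vheight', substr($data, 6, 4)); // little-endian uint16
        return array($d['width'], $d['height']);
    }
    return null; // JPEG and others: fall back to getimagesizefromstring() or GD
}

// Hand-built 24-byte PNG prefix describing a 300 x 250 image.
$png = "\x89PNG\r\n\x1a\n" . pack('N', 13) . 'IHDR' . pack('NN', 300, 250);
list($w, $h) = size_from_header($png);
echo "$w x $h\n";
```

Parsing the header this way avoids decoding pixel data at all, which is the expensive part of imagecreatefromstring().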
------Solution approach----------------------
Learning from this! So PHP gets the image URLs and then reads the image header information directly?
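------Solution approach----------------------
Pinterest's pin feature is a clever idea and technically very simple: a bookmark holds a snippet of JS, and clicking the bookmark amounts to appending a JS file to the current page's document. Writing that JS file is straightforward; it mainly iterates over document.getElementsByTagName('img').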
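------Solution approach----------------------
Last edited by xuzuning on 2012-04-06 15:25:06
With 138 images fetched concurrently, does that use up 138 connections?

Do I need to edit php.ini to raise the connection limit?
No. The connections are outbound; if anything needs changing, it is on the remote side.
What about CPU and memory overhead?
That is hard to measure.
Also, about using JS to check the sizes: since they did not post any code, I could not test it.
I wrote two versions myself; neither worked well, so I gave up on it.
JS concurrency versus PHP concurrency: which uses fewer resources?
In resource terms they are the same; both have to load the images in full.
The former uses client resources, though, and the latter uses server resources.
I also do not know browser internals well, so whether the loads are truly concurrent is unclear.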
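------Solution approach----------------------
This code takes about 1.8 seconds here, not counting the file_get_html($url) time.
$res[] = $url; // $temp;
That way you get the network address.
He saves each image to a local file and then uses getimagesize to get the dimensions.
He is probably doing it concurrently through curl; I am not familiar with that mechanism.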
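------Solution approach----------------------
But the line if(in_array($absurl, $visited))continue; raises an error. Warning: in_array() expects parameter 2 to be array, null.
His code does not contain the line you say is failing.
It is probably file_get_html that is reporting the error.
file_get_html reads the URL with file_get_contents, which has a fairly low success rate;
it often takes two or three refreshes before any data comes back.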
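If the page download really is the flaky step, one option is to fetch the HTML yourself with a few retries and hand the string to the parser, rather than refreshing by hand. The wrapper below is a minimal sketch of that idea; fetch_with_retry and its parameters are hypothetical names, and the downloader is passed in as a callable so any fetch function can be used.

```php
<?php
// Call $fetch($url) up to $attempts times and return the first non-empty
// string result; return false if every attempt fails.
// $fetch is any callable supplied by the caller, e.g. a cURL-based downloader.
function fetch_with_retry($fetch, $url, $attempts = 3, $delayMs = 200) {
    for ($i = 0; $i < $attempts; $i++) {
        $body = call_user_func($fetch, $url);
        if (is_string($body) && $body !== '') {
            return $body;
        }
        if ($i < $attempts - 1) {
            usleep($delayMs * 1000); // brief pause before the next attempt
        }
    }
    return false;
}

// Usage sketch (assumes the ranger() function from the thread and
// simple_html_dom's str_get_html(); both must be loaded separately):
// $page = fetch_with_retry('ranger', 'http://www.huffingtonpost.com/');
// $html = ($page === false) ? false : str_get_html($page);
```

Checking the result for false before calling find() also avoids a fatal error when the download fails on every attempt.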
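------Solution approach----------------------
JS can read an image's header information and so get the image's height directly.
That is at least 10 times faster than waiting for the image to finish loading and then reading its height.
I remember seeing a post about this on a blog some time ago,
but I didn't bookmark it and can't find it right now, which is frustrating.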
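------Solution approach----------------------
I just registered at http://pinterest.com. It does this by having the client load the images.
Click Add, choose Pin, and paste the URL http://www.huffingtonpost.com/
In Chrome's Network panel you can see one request:
GET /pin/create/find_images/?url=http%253a%2f%2fwww.huffingtonpost.com HTTP/1.1
The response is a JSON object: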
images: [http://s.huffpost.com/images/v/logos/v4/homepage.gif?v9,…]
0: http://s.huffpost.com/images/v/logos/v4/homepage.gif?v9
1: http://s.huffpost.com/images/v/logos/v4/tagline.gif
2: http://s.huffpost.com/images/splash/t_mini-a.png
3: http://s.huffpost.com/images/splash/t_mini-a.png
4: http://s.huffpost.com/images/splash/t_mini-a.png
5: http://s.huffpost.com/images/splash/t_mini-a.png
6: http://s.huffpost.com/images/splash/t_mini-a.png
7: http://s.huffpost.com/images/splash/t_mini-a.png
8: http://s.huffpost.com/images/splash/t_mini-a.png
9: http://s.huffpost.com/images/splash/t_mini-a.png
10: http://s.huffpost.com/images/splash/t_mini-a.png
11: http://s.huffpost.com/images/splash/t_mini-a.png
12: http://s.huffpost.com/images/splash/t_mini-a.png
13: http://s.huffpost.com/images/splash/t_mini-a.png
14: http://s.huffpost.com/images/splash/t_mini-a.png
15: http://s.huffpost.com/images/splash/t_mini-a.png
16: http://s.huffpost.com/images/splash/t_mini-a.png
17: http://i.huffpost.com/gen/560770/thumbs/r-gsa-las-vegas-video-huge.jpg
18: http://s.huffpost.com/images/webslice12x12.png
19: http://s.huffpost.com/images/v/blog_column.png
20: http://s.huffpost.com/contributors/gary-hart/headshot.jpg
21: http://www.huffingtonpost.com/images/trans.gif
22: http://www.huffingtonpost.com/images/trans.gif
23: http://www.huffingtonpost.com/images/trans.gif
24: http://images.huffingtonpost.com/2012-04-06-campbellguitar.jpg
25: http://www.huffingtonpost.com/images/trans.gif
26: http://www.huffingtonpost.com/images/trans.gif
27: http://www.huffingtonpost.com/images/trans.gif
28: http://www.huffingtonpost.com/images/trans.gif
29: http://www.huffingtonpost.com/images/trans.gif
30: http://www.huffingtonpost.com/images/trans.gif
31: http://images.huffingtonpost.com/2012-04-06-screenshot20120406at7.09.17pm.jpg
32: http://www.huffingtonpost.com/images/trans.gif
33: http://www.huffingtonpost.com/images/trans.gif
34: http://www.huffingtonpost.com/images/trans.gif
35: http://www.huffingtonpost.com/images/trans.gif
36: http://www.huffingtonpost.com/images/trans.gif
37: http://www.huffingtonpost.com/images/trans.gif
38: http://www.huffingtonpost.com/images/trans.gif
39: http://www.huffingtonpost.com/images/trans.gif
40: http://www.huffingtonpost.com/images/trans.gif
41: http://www.huffingtonpost.com/images/trans.gif
42: http://www.huffingtonpost.com/images/trans.gif
43: http://www.huffingtonpost.com/images/trans.gif
44: http://www.huffingtonpost.com/images/trans.gif
45: http://www.huffingtonpost.com/images/trans.gif
46: http://www.huffingtonpost.com/images/trans.gif
47: http://www.huffingtonpost.com/images/trans.gif
48: http://www.huffingtonpost.com/images/trans.gif
49: http://www.huffingtonpost.com/images/trans.gif
50: http://www.huffingtonpost.com/images/trans.gif
51: http://www.huffingtonpost.com/images/trans.gif
52: http://www.huffingtonpost.com/images/trans.gif
53: http://www.huffingtonpost.com/images/trans.gif
54: http://www.huffingtonpost.com/images/trans.gif
55: http://www.huffingtonpost.com/images/trans.gif
56: http://www.huffingtonpost.com/images/trans.gif
57: http://www.huffingtonpost.com/images/trans.gif
58: http://www.huffingtonpost.com/images/trans.gif
59: http://www.huffingtonpost.com/images/trans.gif
60: http://www.huffingtonpost.com/images/trans.gif
61: http://www.huffingtonpost.com/images/trans.gif
62: http://www.huffingtonpost.com/images/trans.gif
63: http://www.huffingtonpost.com/images/trans.gif
64: http://www.huffingtonpost.com/images/trans.gif
65: http://www.huffingtonpost.com/images/trans.gif
66: http://www.huffingtonpost.com/images/trans.gif
67: http://www.huffingtonpost.com/images/trans.gif
68: http://www.huffingtonpost.com/images/trans.gif
69: http://www.huffingtonpost.com/images/trans.gif
70: http://www.huffingtonpost.com/images/trans.gif
71: http://www.huffingtonpost.com/images/trans.gif
72: http://www.huffingtonpost.com/images/trans.gif
73: http://www.huffingtonpost.com/images/trans.gif
74: http://www.huffingtonpost.com/images/trans.gif
75: http://s.huffpost.com/images/blank.gif
76: http://s.huffpost.com/images/blank.gif
77: http://s.huffpost.com/images/blank.gif
78: http://s.huffpost.com/images/blank.gif
79: http://s.huffpost.com/images/blank.gif
80: http://s.huffpost.com/images/blank.gif
81: http://s.huffpost.com/images/blank.gif
82: http://s.huffpost.com/images/facebook_promo_connect.png?3
83: http://s.huffpost.com/images/loader.gif
84: http://www.huffingtonpost.com/images/trans.gif
85: http://www.huffingtonpost.com/images/trans.gif
86: http://www.huffingtonpost.com/images/trans.gif
87: http://www.huffingtonpost.com/images/trans.gif
88: http://www.huffingtonpost.com/images/trans.gif
89: http://www.huffingtonpost.com/images/trans.gif
90: http://s.huffpost.com/contributors/gary-hart/headshot.jpg
91: http://s.huffpost.com/contributors/mike-campbell/headshot.jpg
92: http://s.huffpost.com/contributors/roma-downey/headshot.jpg
93: http://s.huffpost.com/contributors/gavin-newsom/headshot.jpg
94: http://s.huffpost.com/contributors/sarah-shourd/headshot.jpg
95: http://s.huffpost.com/contributors/jacqueline-novogratz/headshot.jpg
96: http://s.huffpost.com/contributors/peggy-drexler/headshot.jpg
97: http://s.huffpost.com/contributors/mohamed-a-elerian/headshot.jpg
98: http://s.huffpost.com/contributors/bill-mckibben/headshot.jpg
99: http://s.huffpost.com/contributors/marlo-thomas/headshot.jpg
100: http://www.huffingtonpost.com/images/v/something_to_say_button.png
101: http://www.huffingtonpost.com/images/trans.gif
102: http://www.huffingtonpost.com/images/trans.gif
103: http://www.huffingtonpost.com/images/trans.gif
104: http://www.huffingtonpost.com/images/trans.gif
105: http://www.huffingtonpost.com/images/trans.gif
106: http://www.huffingtonpost.com/images/trans.gif
107: http://www.huffingtonpost.com/images/trans.gif
108: http://www.huffingtonpost.com/images/trans.gif
109: http://www.huffingtonpost.com/images/trans.gif
110: http://www.huffingtonpost.com/images/trans.gif
111: http://www.huffingtonpost.com/images/trans.gif
112: http://www.huffingtonpost.com/images/trans.gif
113: http://www.huffingtonpost.com/images/trans.gif
114: http://www.huffingtonpost.com/images/trans.gif
115: http://www.huffingtonpost.com/images/trans.gif
116: http://www.huffingtonpost.com/images/trans.gif
117: http://www.huffingtonpost.com/images/trans.gif
118: http://www.huffingtonpost.com/images/trans.gif
119: http://www.huffingtonpost.com/images/trans.gif
120: http://www.huffingtonpost.com/images/trans.gif
121: http://www.huffingtonpost.com/images/trans.gif
122: http://www.huffingtonpost.com/images/trans.gif
123: http://www.huffingtonpost.com/images/trans.gif
124: http://www.huffingtonpost.com/images/trans.gif
125: http://www.huffingtonpost.com/images/trans.gif
126: http://www.huffingtonpost.com/images/trans.gif
127: http://www.huffingtonpost.com/images/trans.gif
128: http://www.huffingtonpost.com/images/trans.gif
129: http://www.huffingtonpost.com/images/trans.gif
130: http://www.huffingtonpost.com/images/trans.gif
131: http://www.huffingtonpost.com/images/trans.gif
132: http://www.huffingtonpost.com/images/trans.gif
133: http://www.huffingtonpost.com/images/trans.gif
134: http://b.scorecardresearch.com/p?c1=2&c2=6723616&c3=&c4=&c5=front&c6=&c15=&cj=1
135: http://www.huffingtonpost.com//secure-us.imrworldwide.com/cgi-bin/m?ci=us-703240h&cg=0&cc=1&ts=noscript
136: http://vertical-stats.huffpost.com/?-1&&
137: http://www.huffingtonpost.com//pixel.quantserve.com/pixel/p-6ftutip1smlm2.gif?labels=home
images_count: 138
redirected: false
status: success
title: breaking news and opinion on the huffington post
type: text/html; charset=utf-8
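The browser starts loading the images almost as soon as the server responds. In the Chrome network monitor, the yellow bar is the request that submits the URL and fetches the image list; everything after it is image loading, and the loading speed depends on my own connection.
Because the JS on http://pinterest.com/ is minified and uses jQuery, tracing it is tedious. What it does is simple, though, and anyone could come up with it: iterate over the JSON data, create an img element object for each entry, set its src property, and keep the object. The browser does the rest.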
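------Solution approach----------------------
Quote:
I just registered at http://pinterest.com. It does this by having the client load the images.
Click Add, choose Pin, and paste the URL http://www.huffingtonpost.com/
In Chrome's Network panel you can see one request:
GET /pin/create/find_images/?url=http%253a%2f%2fwww.huffingtonpo……
Which object?
Do you mean the image-link data the server returns? That does not need to be saved. Once the AJAX request completes, you just parse the returned data.
Also, browsers load external resources asynchronously. In other words, whether or not jQuery is used, the loads are asynchronous and do not block one another. It works much like the PHP version the veteran poster wrote above.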