Parsing the Apache access_log with PHP

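Using PHP to parse and read Apache log files is uncommon; log statistics are usually produced by server-side scripts. In some special cases, though, PHP has to do the job, so here I'm sharing my script. The Apache access log is usually stored at /var/log/httpd/access_log (on Red Hat/CentOS-style systems), and its lines need to have this format: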
IP address - - [server date/time] "GET /path/to/page HTTP/version" status-code bytes-sent-to-client "referer" "client browser"
Here are two lines taken from the server's access log:
123.125.71.83 - - [30/May/2013:00:26:58 +0800] "GET / HTTP/1.1" 301 593 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
65.55.215.72 - - [26/May/2013:11:35:12 +0800] "GET /robots.txt HTTP/1.1" 200 335
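The script further down extracts these fields one at a time with several preg_match() calls. For comparison, here is a minimal sketch that matches a whole line of this format with a single regular expression using named groups. It assumes the quoted fields never contain escaped quotes, and the function name parse_log_line() is a hypothetical choice of mine, not part of the original script:

```php
<?php
// Sketch: parse one access_log line (Apache "combined"-style layout) with one regex.
// Assumption: quoted fields (request, referer, user agent) contain no escaped quotes.
function parse_log_line(string $line): ?array
{
    $pattern = '/^(?<ip>\S+) (?<ident>\S+) (?<user>\S+) \[(?<time>[^\]]+)\] '
             . '"(?<method>[A-Z]+) (?<path>\S+) (?<protocol>[^"]+)" '
             . '(?<status>\d{3}) (?<bytes>\d+|-)'
             . '(?: "(?<referer>[^"]*)" "(?<agent>[^"]*)")?/'; // referer + agent are optional
    if (!preg_match($pattern, $line, $m)) {
        return null; // the line does not have the expected layout
    }
    return [
        'ip'       => $m['ip'],
        'time'     => $m['time'],
        'method'   => $m['method'],
        'path'     => $m['path'],
        'protocol' => $m['protocol'],
        'status'   => (int) $m['status'],
        'bytes'    => $m['bytes'] === '-' ? 0 : (int) $m['bytes'], // "-" means no body was sent
        'referer'  => $m['referer'] ?? '', // unmatched trailing groups are absent from $m
        'agent'    => $m['agent'] ?? '',
    ];
}

$line = '123.125.71.83 - - [30/May/2013:00:26:58 +0800] "GET / HTTP/1.1" 301 593 "-" '
      . '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"';
print_r(parse_log_line($line));
```

Because the referer/user-agent group is optional, a line that stops after the byte count (like the second sample) still parses, with those two fields left empty.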
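The main goal is to track whether downloads of the files we offer to users succeeded, how many bytes were transferred, when each download happened, and so on.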
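One possible way to turn a record's status code and byte count into a download outcome is sketched below. This is my own assumption, not what the script does: it needs the full size of the served file, which the log does not contain (you might get it with filesize() on the file itself), and it treats 206 Partial Content (byte-range requests, e.g. from download managers) as partial. Apache's %b field counts response body bytes only, so comparing it with the file size is meaningful. download_status() is a hypothetical helper:

```php
<?php
// Hypothetical helper: classify one download record.
// $expectedSize (full file size in bytes) is an assumed input; the log does not record it.
function download_status(int $status, int $bytes, int $expectedSize): string
{
    if ($status === 200 && $bytes >= $expectedSize) {
        return 'complete';  // the whole file was sent in one response
    }
    if ($status === 200 || $status === 206) {
        return 'partial';   // truncated transfer or a byte-range request
    }
    return 'other';         // redirects, 304, 404, 5xx errors, ...
}

echo download_status(200, 1048576, 1048576), "\n"; // complete
echo download_status(206, 524288, 1048576), "\n";  // partial
echo download_status(404, 209, 1048576), "\n";     // other
```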
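Here is the code that does the work. For each matching record it prints the IP address, access time, requested page, protocol, status code, bytes transferred, referer and browser: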
set_time_limit(0);
error_reporting(E_ALL);
ini_set('display_errors', 'On');
// DRUPAL_PATH is defined elsewhere in the site code (the site root directory)
$ac_arr = file(DRUPAL_PATH . '/cron/logs/access_log');
// The cron job only keeps yesterday's access log entries
$yesterday_date = date('Y-m-d', strtotime('yesterday'));
foreach ($ac_arr as $record) {
    // Split off the leading IPv4 address; limit 2 keeps the rest of the line in one piece
    $records = preg_split('/([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)/', $record, 2, PREG_SPLIT_DELIM_CAPTURE);
    if (count($records) < 3) {
        continue; // no IPv4 address on this line
    }
    $ip = $records[1];
    $left_str = $records[2];
    // Parse the other fields, starting with [date/time]
    if (!preg_match('/\[([^\]]+)\]/', $left_str, $match)) {
        continue;
    }
    $access_time = $match[1];
    $access_unixtime = strtotime($access_time);
    $access_date = date('Y-m-d', $access_unixtime);
    if ($yesterday_date != $access_date) {
        continue;
    }
    // Drop the ident/user fields and the [date/time] field
    $left_str = trim(preg_replace('/^[^\[]*\[[^\]]+\]/', '', $left_str));
    // Quoted request line, e.g. "GET /path/to/page HTTP/1.1"
    if (!preg_match('/^"[A-Z]{3,7} ([^"]+)"/i', $left_str, $match)) {
        continue;
    }
    $full_path = $match[0];
    $http = $match[1];
    $link = explode(' ', $http);   // [0] => page, [1] => protocol
    $type = $link[1] ?? '';
    // Only count downloads under one specific path
    if (!preg_match('/^\/course\/automation\/mp/', $link[0])) {
        continue;
    }
    $uaid = '';
    if (preg_match('/uaid=([0-9]+)/', $link[0], $match)) {
        $uaid = $match[1];
    }
    $course = '';
    if (preg_match('/^\/course\/automation\/(mp[0-9]+\.zip)/', $link[0], $match)) {
        $course = $match[1];
    }
    // What remains is: status bytes "referer" "user agent"
    $left_str = trim(substr($left_str, strlen($full_path)));
    $left_arr = explode(' ', $left_str, 3);
    $success_code = (int) $left_arr[0];
    $bytes = (isset($left_arr[1]) && $left_arr[1] !== '-') ? (int) $left_arr[1] : 0;
    $left_str = trim($left_arr[2] ?? '');
    $ref = '';
    $browser = '';
    if (preg_match('/^"([^"]*)"/', $left_str, $match)) {
        $ref = $match[1];
        $left_str = trim(substr($left_str, strlen($match[0])));
    }
    if (preg_match('/^"([^"]*)"/', $left_str, $match)) {
        $browser = $match[1];
    }
    print("
ip: $ip
access time: $access_time
page: $link[0]
type: $type
success code: $success_code
bytes transferred: $bytes
referer: $ref
browser: $browser
");
    //insert into database
    //db_query("INSERT INTO {automation_file_download} (uaid, course, download_date, ip, access_time, page, type, success_code, bytes, referer, browser) VALUES ('%s', '%s', '%s', '%s', %d, '%s', '%s', %d, %d, '%s', '%s')", $uaid, $course, $access_date, $ip, $access_unixtime, $link[0], $type, $success_code, $bytes, $ref, $browser);
}

Here is a brief explanation of my code:
(...)
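The script's yesterday-only filter converts each timestamp with strtotime() and then formats it with date(), which uses PHP's default time zone. If that zone differs from the offset Apache writes into the log (+0800 here), entries near midnight can be counted on the wrong day. A minimal sketch that keeps the log's own offset by parsing with DateTime::createFromFormat(); log_date() and is_day_before() are hypothetical names, not part of the original script:

```php
<?php
// Sketch: take the calendar date of an Apache log timestamp in the offset
// recorded in the log line itself, e.g. "30/May/2013:00:26:58 +0800".
function log_date(string $clfTime): ?string
{
    $dt = DateTime::createFromFormat('d/M/Y:H:i:s O', $clfTime);
    return $dt === false ? null : $dt->format('Y-m-d'); // keeps the +0800 offset
}

// True when the log entry falls on the day before $today (a 'Y-m-d' string).
function is_day_before(string $clfTime, string $today): bool
{
    $yesterday = (new DateTimeImmutable($today))->modify('-1 day')->format('Y-m-d');
    return log_date($clfTime) === $yesterday;
}

date_default_timezone_set('UTC'); // simulate a PHP default zone that differs from the log
$t = '30/May/2013:00:26:58 +0800';
echo date('Y-m-d', strtotime($t)), "\n";   // 2013-05-29 (converted to UTC)
echo log_date($t), "\n";                   // 2013-05-30 (offset kept from the log)
var_dump(is_day_before($t, '2013-05-31')); // bool(true)
```

The date_default_timezone_set('UTC') call is only there to reproduce the mismatch; on a server whose default zone already matches the log's offset, both approaches give the same date.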
© lixiphp for lixiphp, 2013.
post tags: access_log, apache, crontab, linux