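For a faithful conversion of Office documents to PDF or HTML, it is still best to use Microsoft Office on Windows. LibreOffice does not convert perfectly, and WPS has no API.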
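First check whether the COM module is enabled. If phpinfo() lists a com_dotnet module, it is enabled. If it does not, edit php.ini, remove the comment marker (the leading semicolon) in front of com.allow_dcom = true, and restart. The PHP site says the COM module is built in before PHP 5.4.5, but that is not always the case: the PHP 5.3.39 build I downloaded from the official site did not have it built in. (The manual gives the cutoff for the 5.3 branch as 5.3.15, which may explain this.)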
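The same check can be done from a script instead of reading through the phpinfo() page. This is only a minimal sketch; the helper name com_is_available is made up for illustration.

```php
<?php
// Hypothetical helper: reports whether the COM extension is usable.
function com_is_available()
{
    // extension_loaded() looks at the list of loaded extensions;
    // class_exists() confirms that the COM class is actually defined.
    return extension_loaded('com_dotnet') && class_exists('COM');
}

echo com_is_available()
    ? "com_dotnet is loaded\n"
    : "com_dotnet is missing, check php.ini\n";
```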
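If the module is not built in, add the following line to php.ini, provided the extension is present in your ext folder:
extension=php_com_dotnet.dll
Then restart and it should work. The code is as follows: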
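function word2html($wordname, $htmlname)
{
    // Start Word through COM (this launches winword.exe)
    $word = new COM("word.application") or die("Unable to instantiate Word");
    $word->Visible = 1;
    // Open the source document
    $word->Documents->Open($wordname);
    // Save it as HTML; 8 is wdFormatHTML
    $word->Documents[1]->SaveAs($htmlname, 8);
    // Shut Word down and release the COM object
    $word->Quit();
    $word = null;
    unset($word);
}

word2html('d:/www/test/6.docx', 'd:/www/test/6.html');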
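Notes:
1. The source of the converted HTML is rather messy.
2. The conversion invokes winword.exe.
3. If the page keeps loading forever, rename the document and run the conversion again.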
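Some of these points can be softened in code. The sketch below is a variant written under stated assumptions, not the method used above. It picks the save format from Word's WdSaveFormat values (8 = wdFormatHTML, 10 = wdFormatFilteredHTML, which generally produces leaner markup, 17 = wdFormatPDF). It also keeps Word hidden, turns off alert dialogs (one possible reason for a page that never finishes loading), and quits winword.exe even when a call fails. The names word_save_format and word_convert are hypothetical.

```php
<?php
// Hypothetical helper: map a target name to a WdSaveFormat value.
function word_save_format($target)
{
    $formats = array(
        'html'     => 8,   // wdFormatHTML
        'filtered' => 10,  // wdFormatFilteredHTML
        'pdf'      => 17,  // wdFormatPDF
    );
    $key = strtolower($target);
    if (!isset($formats[$key])) {
        throw new InvalidArgumentException("Unsupported target: $target");
    }
    return $formats[$key];
}

// Hypothetical wrapper around the same COM calls as word2html().
function word_convert($source, $destination, $target)
{
    $format = word_save_format($target);
    if (!class_exists('COM')) {
        throw new RuntimeException('COM extension is not available');
    }
    $word = new COM('word.application'); // throws com_exception on failure
    try {
        $word->Visible = 0;        // do not show a window
        $word->DisplayAlerts = 0;  // wdAlertsNone: no blocking dialogs
        $doc = $word->Documents->Open($source);
        $doc->SaveAs($destination, $format);
        $doc->Close(false);        // close without saving changes
    } catch (Exception $e) {
        $word->Quit();             // make sure winword.exe exits
        throw $e;
    }
    $word->Quit();
}
```

A call such as word_convert('d:/www/test/6.docx', 'd:/www/test/6.pdf', 'pdf') would then cover the PDF case as well, assuming the installed Word version supports PDF export.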
One more example, a function for cleaning up the HTML that Word produces:
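function lego_clean($text)
{
    // NOTE: the HTML tags inside the string literals were lost from this
    // listing; they are restored from the surrounding comments and are a best guess.
    // $text is expected to be an array of lines, e.g. the result of file()
    $text = implode("\r", $text);
    // normalize white space
    $text = eregi_replace("[[:space:]]+", " ", $text);
    $text = str_replace("> <", ">\r\r<", $text);
    $text = str_replace("<br>", "<br>\r", $text);
    // remove everything before <body>
    $text = strstr($text, "<body");
    // keep tags, strip attributes
    $text = eregi_replace("<p [^>]*BodyTextIndent[^>]*>([^\n|\n\015|\015\n]*)</p>", "<p>\\1</p>", $text);
    $text = eregi_replace("<p [^>]*margin-left[^>]*>([^\n|\n\015|\015\n]*)</p>", "<blockquote>\\1</blockquote>", $text);
    $text = str_replace("&nbsp;", "", $text);
    // clean up whatever is left inside <p> and <li>
    $text = eregi_replace("<p [^>]*>", "<p>", $text);
    $text = eregi_replace("<li [^>]*>", "<li>", $text);
    // kill unwanted tags
    $text = eregi_replace("</?span[^>]*>", "", $text);
    $text = eregi_replace("</?body[^>]*>", "", $text);
    $text = eregi_replace("</?div[^>]*>", "", $text);
    $text = eregi_replace("<\![^>]*>", "", $text);
    $text = eregi_replace("</?[a-z]\:[^>]*>", "", $text);
    // kill style and on mouse* tags
    $text = eregi_replace("([ \f\r\t\n\'\"])style=[^>]+", "\\1", $text);
    $text = eregi_replace("([ \f\r\t\n\'\"])on[a-z]+=[^>]+", "\\1", $text);
    // remove empty paragraphs
    $text = str_replace("<p></p>", "", $text);
    // remove closing </html>
    $text = str_replace("</html>", "", $text);
    // clean up white space again
    $text = eregi_replace("[[:space:]]+", " ", $text);
    $text = str_replace("> <", ">\r\r<", $text);
    $text = str_replace("<br>", "<br>\r", $text);
    // return the cleaned markup (the listing had no return statement)
    return $text;
}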
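Note that eregi_replace() was deprecated in PHP 5.3 and removed in PHP 7.0. On newer versions the same kind of cleanup can be written with preg_replace() and the i modifier. The following is a reduced sketch that covers only a few of the rules. The name clean_word_html and the exact tag list are assumptions for illustration, not a port of the full function.

```php
<?php
// Hypothetical, reduced version of the cleanup idea using preg_replace().
function clean_word_html($html)
{
    // drop <span> and <div> tags, keeping their content
    $html = preg_replace('/<\/?(span|div)[^>]*>/i', '', $html);
    // drop namespaced Office tags such as <o:p>
    $html = preg_replace('/<\/?[a-z]+:[^>]*>/i', '', $html);
    // strip quoted style="..." and on*="..." attributes
    $html = preg_replace('/\s+(style|on[a-z]+)=("[^"]*"|\'[^\']*\')/i', '', $html);
    // remove non-breaking space entities, then empty paragraphs
    $html = str_replace('&nbsp;', '', $html);
    $html = preg_replace('/<p>\s*<\/p>/i', '', $html);
    // collapse runs of white space
    return trim(preg_replace('/\s+/', ' ', $html));
}

$sample = '<p style="margin:0"><span lang="EN">Hello</span> <o:p></o:p>world</p><p>&nbsp;</p>';
echo clean_word_html($sample), "\n"; // prints <p>Hello world</p>
```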
