[python-chinese] Re: Python Chinese word segmentation module

郝钰 haoyu at csdn.net
Fri Jun 1 15:29:28 HKT 2007


The segmentation results are quite good. Is it possible to add my own entries to the dictionary?

 

From: python-chinese-bounces at lists.python.cn
[mailto:python-chinese-bounces at lists.python.cn] On Behalf Of junyi sun
Sent: May 31, 2007 21:56
To: python-chinese at lists.python.cn
Subject: Re: [python-chinese] Python Chinese word segmentation module

 

I don't know why the attachment on the mailing list can't be downloaded, so I uploaded it to CSDN.

Download link:

http://download.csdn.net/source/187315



 

On 5/30/07, cun heise <cunheise at hotmail.com> wrote: 

Please send me a copy, thanks.
cunheise at hotmail.com


>From: "eking" <eking_he at mezimedia.com>
>Reply-To: python-chinese at lists.python.cn
>To: python-chinese at lists.python.cn
>Subject: Re: [python-chinese] Python Chinese word segmentation module
>Date: Wed, 30 May 2007 18:43:16 +0800
>
>python中文分词.rar (2324K)
><http://mail.google.com/mail/?realattid=f_f2anayul&attid=0.1&disp=attd&view=att&th=112d8e60a108c5fa>
>
>This is a link into your own Gmail account, isn't it? Nobody else can download from it.
>
>
>
>   _____
> 
>From: python-chinese-bounces at lists.python.cn
>[mailto:python-chinese-bounces at lists.python.cn] On Behalf Of junyi sun
>Sent: May 30, 2007 18:29
>To: python-chinese at lists.python.cn
>Subject: Re: [python-chinese] Python Chinese word segmentation module
>
>
>
>I've sent it again. Can you see it now?
>
>On 5/30/07, junyi sun <ccnusjy at gmail.com> wrote:
>
>This module is part of my PySozone project (a search engine written in Python), and I am releasing it as open source.
>
>
> 
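>1. The algorithm is reverse maximum matching.
>
>2. The dictionary is stored with bsddb, opened in btopen (B-tree) mode.
>
>3. The lexicon is on the scale of 150,000 words.
>
>4. Two segmentation modes are available: redundant and conservative.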
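For readers unfamiliar with the technique named in point 1, here is a minimal sketch of reverse maximum matching, written for current Python 3. It is not the module's actual code: the function name `reverse_max_match`, the `max_len` limit, and the in-memory `set` standing in for the bsddb dictionary are all assumptions made for illustration, and the redundant and conservative modes are not modeled.

```python
def reverse_max_match(text, lexicon, max_len=4):
    """Segment text from right to left, preferring the longest lexicon word."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shorter ones.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as the fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                end -= size
                break
    # Words were collected right to left, so restore reading order.
    words.reverse()
    return words


# Toy lexicon for illustration only (hypothetical contents).
lexicon = {"北京", "天安门", "研究", "研究生", "生命", "起源"}
result = reverse_max_match("我爱北京天安门", lexicon)
print(" / ".join(result))
```

Scanning from the end of the sentence means that in an ambiguous string such as "研究生命起源" the match is anchored on the right, so with this toy lexicon "生命" is picked before "研究生" is ever considered.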
>
>
>
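>Usage:
>
># CDict is provided by the attached module.
>d=CDict()
>
># Assumes a GBK-encoded source file; segWords() is given UTF-8 bytes.
>s="我爱北京天安门".decode('gbk').encode('utf-8')
>words=d.segWords(s)
>for w in words:
>         print w.decode('utf-8')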
>
>
>
>PS:
>
>The lexicon is stored with bsddb in btopen mode, so you can add your own new words as needed.
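As a rough illustration of what adding a word to a disk-backed lexicon could look like: bsddb no longer ships in the standard library of current Python, so this sketch swaps in `dbm.dumb` as a stand-in key-value store rather than a real B-tree file. The helper `add_word`, the file name `lexicon`, and the placeholder value are hypothetical. The UTF-8 key encoding follows the usage example above, but the module's real record format is not stated in this thread.

```python
import dbm.dumb
import os
import tempfile


def add_word(db, word, value=b"1"):
    """Store a word under its UTF-8 encoded form; return True if it was new."""
    key = word.encode("utf-8")
    is_new = key not in db
    db[key] = value  # placeholder value; the real record format is unknown
    return is_new


# Hypothetical lexicon file, created in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lexicon")
    with dbm.dumb.open(path, "c") as db:
        first = add_word(db, "天安门")
        second = add_word(db, "天安门")
    # Reopen read-only to confirm the entry was persisted.
    with dbm.dumb.open(path, "r") as db:
        stored = "天安门".encode("utf-8") in db

print(first, second, stored)
```

With Python 2 and the real lexicon, the same kind of assignment would presumably go through the dictionary-like object returned by `bsddb.btopen`, but check the module's source for the expected key and value layout first.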
>
>On 5/30/07, 风向标 <vaneoooo at gmail.com> wrote:
>
>Why didn't I receive the original message? I only got the Re: reply.
>_______________________________________________
>python-chinese
>Post: send python-chinese at lists.python.cn 
>Subscribe: send subscribe to python-chinese-request at lists.python.cn
>Unsubscribe: send unsubscribe to python-chinese-request at lists.python.cn
>Detail Info: http://python.cn/mailman/listinfo/python-chinese

 
