Python Forum - Discussion Board

Subject: [python-chinese] How to check if the 8th bit of a byte is set?

Saturday, March 13, 2004, 08:24

Anthony Liu antonyliu2002 at yahoo.com
Sat Mar 13 08:24:58 HKT 2004

First, how do we read one byte at a time in Python?

Second, the text file I am processing has both Chinese
characters and ASCII characters, so I want to know
when I hit a Chinese character.  The way to detect one is
to check whether the eighth bit of a byte is set, since a
Chinese character takes 2 bytes.  So here is the 2nd
question: how do I check whether the 8th bit is set in
Python?  Does Python have such bit-operation
functions?

Thanks.

__________________________________
Do you Yahoo!?
Yahoo! Search - Find what you’re looking for faster
http://search.yahoo.com


[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

Saturday, March 13, 2004, 08:59

Who Bruce whoonline at msn.com
Sat Mar 13 08:59:49 HKT 2004

Tell us what processing you want to perform on the document, and then we can
help you.


>From: Anthony Liu <antonyliu2002 at yahoo.com>
>Reply-To: python-chinese at lists.python.cn
>To: pycn <python-chinese at lists.python.cn>
>Subject: [python-chinese] incomplete multibyte sequence
>Date: Fri, 12 Mar 2004 02:35:51 -0800 (PST)
>
>I am trying to process a huge Chinese document.  The
>single document is in pure text format and it's nearly
>4M.
>
>I always get "incomplete multibyte sequence" error
>when I try to unicode the sentences.
>
>I think the reason is because the Chinese document
>uses both ascii punctuations and 2-byte Chinese
>punctuations.
>
>For example, the single document can both , and
>, and both < > and 《》.
>
>Is there anyway, I can go around this?  Don't ask me
>to fix the Chinese document!
>
>_______________________________________________
>python-chinese list
>python-chinese at lists.python.cn
>http://python.cn/mailman/listinfo/python-chinese

_________________________________________________________________
Enjoy the world's largest e-mail system: MSN Hotmail.  http://www.hotmail.com




Saturday, March 13, 2004, 09:07

Who Bruce whoonline at msn.com
Sat Mar 13 09:07:48 HKT 2004

>From: Anthony Liu <antonyliu2002 at yahoo.com>
>Reply-To: python-chinese at lists.python.cn
>To: pycn <python-chinese at lists.python.cn>
>Subject: [python-chinese] How to check if the 8th bit of a byte is set?
>Date: Fri, 12 Mar 2004 16:24:58 -0800 (PST)
>
>First, how do we read one byte each time in Python?
s = 'abc'
s[0]  # gets the first byte
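
For reading straight from a file rather than from a string already in memory, a minimal sketch in present-day Python 3 might look like the following. It assumes the file is opened in binary mode; iter_bytes is a hypothetical helper name, and io.BytesIO stands in for a real file so the example is self-contained. In Python 3, indexing a bytes object gives an integer, whereas in the Python 2 of this thread s[0] gives a one-character string that still needs ord().

```python
import io


def iter_bytes(stream):
    """Yield the contents of a binary stream one byte at a time (ints 0..255)."""
    while True:
        chunk = stream.read(1)  # a bytes object of length 1, or b"" at end of file
        if not chunk:
            break
        yield chunk[0]  # indexing bytes gives an int in Python 3


# io.BytesIO stands in for open("some_file.txt", "rb") (hypothetical file name).
sample = io.BytesIO("a中".encode("gb2312"))
values = list(iter_bytes(sample))
print(values)
```

Reading one byte per call is simple but slow on a multi-megabyte file; reading larger chunks or whole lines and then iterating over them is usually the more practical choice.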


>Second, the text file I am processing has both Chinese
>characters and ascii characters.  So, I wanna know
>when I hit a Chinese character.  The way to detect is
>to check if the eighth bit of a byte is set since a
>Chinese character has 2 bytes.  So, here is the 2nd
>question, how to check if the 8th bit is set in
>Python?  Does python have such bit operation
>functions?

a='a'
ord(a) & 0x80

>Thanks.

_________________________________________________________________
To chat with your online friends, use MSN Messenger:  http://messenger.msn.com/cn




Saturday, March 13, 2004, 16:17

Anthony Liu antonyliu2002 at yahoo.com
Sat Mar 13 16:17:42 HKT 2004

--- Who Bruce <whoonline at msn.com> wrote:
> 
> >From: Anthony Liu <antonyliu2002 at yahoo.com>
> >Reply-To: python-chinese at lists.python.cn
> >To: pycn <python-chinese at lists.python.cn>
> >Subject: [python-chinese] How to check if the 8th bit of a byte is set?
> >Date: Fri, 12 Mar 2004 16:24:58 -0800 (PST)
> >
> >First, how do we read one byte each time in Python?
> s='abc'
> s[0] get the first byte

Thank you, Bruce; yes, this helps.  I can probably just
read line by line with readline() and then check each
character in the line.  A good hint.

> >Second, the text file I am processing has both Chinese
> >characters and ascii characters.  So, I wanna know
> >when I hit a Chinese character.  The way to detect is
> >to check if the eighth bit of a byte is set since a
> >Chinese character has 2 bytes.  So, here is the 2nd
> >question, how to check if the 8th bit is set in
> >Python?  Does python have such bit operation
> >functions?
> 
> a='a'
> ord(a) & 0x80

This helps a lot.  I believe you are right, but I
don't yet know why you AND 0x80.  Can you explain
please?  Don't laugh at me, I don't know much about
the internal representation of a character.
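
A small sketch of why the mask is 0x80: in binary, 0x80 is 10000000, so AND-ing a byte with it zeroes every bit except the eighth (most significant) one. The result is 0 when that bit is clear and 128 when it is set. The names MASK and high_bit_set are hypothetical, and 0xD6 is used only as an example of a byte value above 127.

```python
MASK = 0x80  # 128 in decimal, 10000000 in binary: only bit 8 is 1


def high_bit_set(byte_value):
    """Return True if bit 8 (the most significant bit) of a 0..255 value is 1."""
    return (byte_value & MASK) != 0


# Show the AND operation bit by bit for an ASCII byte and a non-ASCII byte.
for value in (ord("a"), 0xD6):
    print(format(value, "08b"), "&", format(MASK, "08b"), "=",
          format(value & MASK, "08b"))
```

ASCII only uses the values 0 to 127, which fit in seven bits, so any byte with the eighth bit set cannot be plain ASCII.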



__________________________________
Do you Yahoo!?
Yahoo! Mail - More reliable, more storage, less spam
http://mail.yahoo.com



Saturday, March 13, 2004, 16:25

Anthony Liu antonyliu2002 at yahoo.com
Sat Mar 13 16:25:26 HKT 2004

Bruce,

Thank you.  Actually, I just want to break apart every
Chinese sentence using the punctuation marks as
delimiters, and then get the initial and final
characters of each clause.  For example, if we have a
Chinese sentence like so (assuming c1 ... cn represent
the 1st through the nth characters in the sentence):

c1c2c3c4,c5c6c7"c8c9c10c11.

I want to break it apart so that I get 3 clauses in
this case, as follows:

c1c2c3c4
c5c6c7
c8c9c10c11

And then get the initial and final characters of each
clause, i.e., in this case,
c1, c4, c5, c7, c8, c11

I guess I will probably just read line by line and
check each character individually.  When I hit the
characters on both sides of a punctuation, then I
store them.  What do you think?
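
One possible way to sketch this in present-day Python 3 is to decode the bytes into text first and then split on punctuation, so that each Chinese character is a single element and no byte-level bookkeeping is needed. The sketch below assumes the document is GB2312-encoded. The delimiter list is illustrative and incomplete, and clause_ends is a hypothetical helper name. Decoding whole lines or the whole file (for example with open(path, encoding="gb2312")) rather than arbitrary byte slices also avoids cutting a two-byte character in half, which is one common cause of "incomplete multibyte sequence" errors.

```python
import re

# Illustrative, incomplete set of ASCII and full-width delimiters (an assumption).
DELIMITERS = ",.;:!?\"'<>" + ",。;:!?、“”‘’《》"
SPLITTER = re.compile("[" + re.escape(DELIMITERS) + r"\s]+")


def clause_ends(sentence):
    """Return (first_char, last_char) for every non-empty clause in sentence."""
    clauses = [c for c in SPLITTER.split(sentence) if c]
    return [(c[0], c[-1]) for c in clauses]


# Same shape as c1c2c3c4,c5c6c7"c8c9c10c11. with mixed punctuation.
raw = "我爱北京,天安门\"上太阳升。".encode("gb2312")
text = raw.decode("gb2312")  # bytes -> text; each character is now one element
print(clause_ends(text))
```

A clause of a single character yields the same character twice, which may or may not be what is wanted for the later processing.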
  
--- Who Bruce <whoonline at msn.com> wrote:
> tell us what process you want to perform to the
> document, then we can help you.
>
> >From: Anthony Liu <antonyliu2002 at yahoo.com>
> >Reply-To: python-chinese at lists.python.cn
> >To: pycn <python-chinese at lists.python.cn>
> >Subject: [python-chinese] incomplete multibyte sequence
> >Date: Fri, 12 Mar 2004 02:35:51 -0800 (PST)
> >
> >I am trying to process a huge Chinese document.  The
> >single document is in pure text format and it's nearly
> >4M.
> >
> >I always get "incomplete multibyte sequence" error
> >when I try to unicode the sentences.
> >
> >I think the reason is because the Chinese document
> >uses both ascii punctuations and 2-byte Chinese
> >punctuations.
> >
> >For example, the single document can both , and
> >, and both < > and 《》.
> >
> >Is there anyway, I can go around this?  Don't ask me
> >to fix the Chinese document!

Saturday, March 13, 2004, 17:15

Anthony Liu antonyliu2002 at yahoo.com
Sat Mar 13 17:15:37 HKT 2004

> a='a'
> ord(a) & 0x80

ord(a) & 10000000_binary = 0 if 'a' is an ascii
character.

else if 'a' is the 1st byte of a Chinese character,

ord(a) & 10000000_binary = 128

Right?
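
That reading matches how the mask behaves: the result is 0 for an ASCII byte and 128 for the first byte of a Chinese character. As a hedged sketch of using it to walk a mixed byte string, the hypothetical helper below assumes GB2312-style data in which every non-ASCII character occupies exactly two bytes. After a lead byte it consumes two bytes at once instead of testing each byte separately, because in the related GBK encoding the second byte can fall below 0x80.

```python
def split_gb2312_bytes(data):
    """Split GB2312 bytes into per-character byte strings (1 or 2 bytes each)."""
    pieces = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:      # high bit set: lead byte of a 2-byte character
            pieces.append(data[i:i + 2])
            i += 2
        else:                   # high bit clear: a single ASCII byte
            pieces.append(data[i:i + 1])
            i += 1
    return pieces


data = "a中b".encode("gb2312")
chars = [p.decode("gb2312") for p in split_gb2312_bytes(data)]
print(chars)
```

If the data is cut off right after a lead byte, the last piece is only one byte long and decoding it fails, so truncated input needs separate handling.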


Sunday, March 14, 2004, 12:34

靳云 zhuwangsheng at hotmail.com
Sun Mar 14 12:34:48 HKT 2004

I would very much like to learn the Python language. Can anyone tell me where I can buy or download it?

Monday, March 15, 2004, 10:09

Zoom.Quiet zoomq at infopro.cn
Mon Mar 15 10:09:15 HKT 2004

Hello 靳云,

Faint!!!!

Come on, my friend! Do a quick search first!

Python is an open-source project!

www.python.org

It is free to download, learn, and use!

=== [ 12:34 ; 04-03-14 ] you wrote:

? I would very much like to learn the Python language. Can anyone tell me where I can buy or download it?

=== === === === === === === === === === 

-- 
Best regards,
 Zoom.Quiet                            

 /=======================================\
]Time is unimportant, only life important![
 \=======================================/


