Source: Internet   Published: 2011-09-07 14:03:59


Study Notes on Chinese Text Processing in Java

2003-01-05    chedong








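Hello Unicode

    Study Notes on Chinese Text Processing in Java

Author: Che Dong (车东) chedong@bigfoot.com



Last updated: 2002-12-30 13:20:57


Copyright notice: This article may be freely redistributed, provided that the original source and the author information are clearly credited.

Keywords: linux java multibyte encoding locale i18n l10n chinese


Abstract: Two test programs show how the system default encoding and an application's encoding strategy affect character processing. The results help in choosing a suitable encoding strategy and in building general-purpose applications that conform better to internationalization conventions.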


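Test Program 1

==========


To understand how a Java application handles encodings, we first need to understand how the operating system influences the JVM's default encoding. I therefore wrote Env.java, which prints the JVM properties and the locales supported on different systems. The program is simple: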


/*
 * Copyright (c) 2002 chedong@bigfoot.com
 * $Id: Env.java,v 1.1 2002/07/30 09:48:12 chedong Exp $
 */

import java.util.*;
import java.text.*;

/**
 * Purpose:
 *   Display the environment settings and the JVM's default properties.
 * Input: none
 * Output:
 *   1. the supported locales
 *   2. the JVM's default properties
 */

public class Env {
    /**
     * Main entry point.
     */
    public static void main(String[] args) {

        System.out.println("Hello, it's: " + new Date());

        // Print the available locales.
        Locale[] list = DateFormat.getAvailableLocales();
        System.out.println("======System available locales:======== ");
        for (int i = 0; i < list.length; i++) {
            System.out.println(list[i].toString() + "\t" + list[i].getDisplayName());
        }

        // Print the JVM default properties.
        System.out.println("======System property======== ");
        System.getProperties().list(System.out);
    }
}

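The property that deserves the most attention is the JVM's file.encoding. It determines the JVM's default encoding and decoding scheme, and therefore affects how every byte stream ==> character stream conversion is decoded and how every character stream ==> byte stream conversion is encoded, wherever the application does not name an encoding explicitly.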
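
To make the role of the default encoding concrete, here is a minimal sketch. The class DefaultEncodingDemo and its helper methods are hypothetical names chosen for illustration; they are not part of the original article. It contrasts String.getBytes() with no argument, which uses the JVM default charset, with an explicitly named charset, which does not depend on the locale.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultEncodingDemo {
    // The article's test string; the Chinese part is written with Unicode
    // escapes so that this source file contains only ASCII bytes.
    static final String TEXT = "Hello world \u4e16\u754c\u4f60\u597d";

    /** Encodes with whatever charset the JVM derived from the OS locale. */
    static byte[] encodeWithDefault(String s) {
        return s.getBytes(); // result depends on the JVM default charset
    }

    /** Encodes with an explicitly named charset; independent of the locale. */
    static byte[] encodeExplicit(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("default bytes   = " + encodeWithDefault(TEXT).length);
        System.out.println("UTF-8 bytes     = "
                + encodeExplicit(TEXT, StandardCharsets.UTF_8).length);
    }
}
```

The first three printed values vary with the environment, while the UTF-8 byte count does not. Note that java.nio.charset and Charset.defaultCharset() postdate the J2SE 1.3 runtimes tested here, and that recent Java releases (18 and later) default to UTF-8 regardless of the locale unless configured otherwise.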


    On Linux, the locale can be set with LANG=zh_CN; LC_ALL=zh_CN.GBK; export LANG LC_ALL. The locale command displays the system's current settings.

    On Windows, the locale is set through Control Panel ==> Regional Settings.




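The four output listings below were produced, in order, in these environments:

1. Linux (J2SE 1.3.1), LANG=en_US LC_ALL=en_US
2. Linux (J2SE 1.3.1), LANG=zh_CN LC_ALL=zh_CN.GBK
3. Windows (J2SE 1.3.0), regional setting: China, Chinese
4. Windows (J2SE 1.3.0), regional setting: United Kingdom, English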



Hello, it's: Tue Jul 30 11:05:44 CST 2002

======System available locales:======== 

en English

en_US English (United States)

ar Arabic

ar_AE Arabic (United Arab Emirates)

ar_BH Arabic (Bahrain)

ar_DZ Arabic (Algeria)

ar_EG Arabic (Egypt)

ar_IQ Arabic (Iraq)

ar_JO Arabic (Jordan)

ar_KW Arabic (Kuwait)

ar_LB Arabic (Lebanon)

ar_LY Arabic (Libya)

ar_MA Arabic (Morocco)

ar_OM Arabic (Oman)

ar_QA Arabic (Qatar)

ar_SA Arabic (Saudi Arabia)

ar_SD Arabic (Sudan)

ar_SY Arabic (Syria)

ar_TN Arabic (Tunisia)

ar_YE Arabic (Yemen)

be Byelorussian

be_BY Byelorussian (Belarus)

bg Bulgarian

bg_BG Bulgarian (Bulgaria)

ca Catalan

ca_ES Catalan (Spain)

ca_ES_EURO Catalan (Spain,Euro)

cs Czech

cs_CZ Czech (Czech Republic)

da Danish

da_DK Danish (Denmark)

de German

de_AT German (Austria)

de_AT_EURO German (Austria,Euro)

de_CH German (Switzerland)

de_DE German (Germany)

de_DE_EURO German (Germany,Euro)

de_LU German (Luxembourg)

de_LU_EURO German (Luxembourg,Euro)

el Greek

el_GR Greek (Greece)

en_AU English (Australia)

en_CA English (Canada)

en_GB English (United Kingdom)

en_IE English (Ireland)

en_IE_EURO English (Ireland,Euro)

en_NZ English (New Zealand)

en_ZA English (South Africa)

es Spanish

es_BO Spanish (Bolivia)

es_AR Spanish (Argentina)

es_CL Spanish (Chile)

es_CO Spanish (Colombia)

es_CR Spanish (Costa Rica)

es_DO Spanish (Dominican Republic)

es_EC Spanish (Ecuador)

es_ES Spanish (Spain)

es_ES_EURO Spanish (Spain,Euro)

es_GT Spanish (Guatemala)

es_HN Spanish (Honduras)

es_MX Spanish (Mexico)

es_NI Spanish (Nicaragua)

et Estonian

es_PA Spanish (Panama)

es_PE Spanish (Peru)

es_PR Spanish (Puerto Rico)

es_PY Spanish (Paraguay)

es_SV Spanish (El Salvador)

es_UY Spanish (Uruguay)

es_VE Spanish (Venezuela)

et_EE Estonian (Estonia)

fi Finnish

fi_FI Finnish (Finland)

fi_FI_EURO Finnish (Finland,Euro)

fr French

fr_BE French (Belgium)

fr_BE_EURO French (Belgium,Euro)

fr_CA French (Canada)

fr_CH French (Switzerland)

fr_FR French (France)

fr_FR_EURO French (France,Euro)

fr_LU French (Luxembourg)

fr_LU_EURO French (Luxembourg,Euro)

hr Croatian

hr_HR Croatian (Croatia)

hu Hungarian

hu_HU Hungarian (Hungary)

is Icelandic

is_IS Icelandic (Iceland)

it Italian

it_CH Italian (Switzerland)

it_IT Italian (Italy)

it_IT_EURO Italian (Italy,Euro)

iw Hebrew

iw_IL Hebrew (Israel)

ja Japanese

ja_JP Japanese (Japan)

ko Korean

ko_KR Korean (South Korea)

lt Lithuanian

lt_LT Lithuanian (Lithuania)

lv Latvian (Lettish)

lv_LV Latvian (Lettish) (Latvia)

mk Macedonian

mk_MK Macedonian (Macedonia)

nl Dutch

nl_BE Dutch (Belgium)

nl_BE_EURO Dutch (Belgium,Euro)

nl_NL Dutch (Netherlands)

nl_NL_EURO Dutch (Netherlands,Euro)

no Norwegian

no_NO Norwegian (Norway)

no_NO_NY Norwegian (Norway,Nynorsk)

pl Polish

pl_PL Polish (Poland)

pt Portuguese

pt_BR Portuguese (Brazil)

pt_PT Portuguese (Portugal)

pt_PT_EURO Portuguese (Portugal,Euro)

ro Romanian

ro_RO Romanian (Romania)

ru Russian

ru_RU Russian (Russia)

sh Serbo-Croatian

sh_YU Serbo-Croatian (Yugoslavia)

sk Slovak

sk_SK Slovak (Slovakia)

sl Slovenian

sl_SI Slovenian (Slovenia)

sq Albanian

sq_AL Albanian (Albania)

sr Serbian

sr_YU Serbian (Yugoslavia)

sv Swedish

sv_SE Swedish (Sweden)

th Thai

th_TH Thai (Thailand)

tr Turkish

tr_TR Turkish (Turkey)

uk Ukrainian

uk_UA Ukrainian (Ukraine)

zh Chinese

zh_CN Chinese (China)

zh_HK Chinese (Hong Kong)

zh_TW Chinese (Taiwan)

======System property======== 

-- listing properties --

java.runtime.name=Java(TM) 2 Runtime Environment, Stand...

sun.boot.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386

java.vm.version=1.3.1_04-b02

java.vm.vendor=Sun Microsystems Inc.

java.vendor.url=

path.separator=:

java.vm.name=Java HotSpot(TM) Client VM

file.encoding.pkg=sun.io

java.vm.specification.name=Java Virtual Machine Specification

user.dir=/home/chedong/src/char_test

java.runtime.version=1.3.1_04-b02

java.awt.graphicsenv=sun.awt.X11GraphicsEnvironment

os.arch=i386

java.io.tmpdir=/tmp

line.separator=



java.vm.specification.vendor=Sun Microsystems Inc.

java.awt.fonts=

os.name=Linux

java.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386:/u...

java.specification.name=Java Platform API Specification

java.class.version=47.0

os.version=2.4.7-10

user.home=/home/chedong

user.timezone=Asia/Shanghai

java.awt.printerjob=sun.awt.motif.PSPrinterJob

file.encoding=ISO-8859-1

java.specification.version=1.3

user.name=chedong

java.class.path=/home/chedong/classes

java.vm.specification.version=1.0

java.home=/usr/java/jdk1.3.1_04/jre

user.language=en

java.specification.vendor=Sun Microsystems Inc.

java.vm.info=mixed mode

java.version=1.3.1_04

java.ext.dirs=/usr/java/jdk1.3.1_04/jre/lib/ext

sun.boot.class.path=/usr/java/jdk1.3.1_04/jre/lib/rt.jar:...

java.vendor=Sun Microsystems Inc.

file.separator=/

java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...

sun.cpu.endian=little

sun.io.unicode.encoding=UnicodeLittle

user.region=US

sun.cpu.isalist=



Hello, it's: Tue Jul 30 11:07:34 CST 2002

======System available locales:======== 

en 英文

en_US 英文 (美国)

ar 阿拉伯文

ar_AE 阿拉伯文 (阿拉伯联合酋长国)

ar_BH 阿拉伯文 (巴林)

ar_DZ 阿拉伯文 (阿尔及利亚)

ar_EG 阿拉伯文 (埃及)

ar_IQ 阿拉伯文 (伊拉克)

ar_JO 阿拉伯文 (约旦)

ar_KW 阿拉伯文 (科威特)

ar_LB 阿拉伯文 (黎巴嫩)

ar_LY 阿拉伯文 (利比亚)

ar_MA 阿拉伯文 (摩洛哥)

ar_OM 阿拉伯文 (阿曼)

ar_QA 阿拉伯文 (卡塔尔)

ar_SA 阿拉伯文 (沙特阿拉伯)

ar_SD 阿拉伯文 (苏丹)

ar_SY 阿拉伯文 (叙利亚)

ar_TN 阿拉伯文 (突尼斯)

ar_YE 阿拉伯文 (也门)

be 白俄罗斯文

be_BY 白俄罗斯文 (白俄罗斯)

bg 保加利亚文

bg_BG 保加利亚文 (保加利亚)

ca 加泰罗尼亚文

ca_ES 加泰罗尼亚文 (西班牙)

ca_ES_EURO 加泰罗尼亚文 (西班牙,Euro)

cs 捷克文

cs_CZ 捷克文 (捷克共和国)

da 丹麦文

da_DK 丹麦文 (丹麦)

de 德文

de_AT 德文 (奥地利)

de_AT_EURO 德文 (奥地利,Euro)

de_CH 德文 (瑞士)

de_DE 德文 (德国)

de_DE_EURO 德文 (德国,Euro)

de_LU 德文 (卢森堡)

de_LU_EURO 德文 (卢森堡,Euro)

el 希腊文

el_GR 希腊文 (希腊)

en_AU 英文 (澳大利亚)

en_CA 英文 (加拿大)

en_GB 英文 (英国)

en_IE 英文 (爱尔兰)

en_IE_EURO 英文 (爱尔兰,Euro)

en_NZ 英文 (新西兰)

en_ZA 英文 (南非)

es 西班牙文

es_BO 西班牙文 (玻利维亚)

es_AR 西班牙文 (阿根廷)

es_CL 西班牙文 (智利)

es_CO 西班牙文 (哥伦比亚)

es_CR 西班牙文 (哥斯达黎加)

es_DO 西班牙文 (多米尼加共和国)

es_EC 西班牙文 (厄瓜多尔)

es_ES 西班牙文 (西班牙)

es_ES_EURO 西班牙文 (西班牙,Euro)

es_GT 西班牙文 (危地马拉)

es_HN 西班牙文 (洪都拉斯)

es_MX 西班牙文 (墨西哥)

es_NI 西班牙文 (尼加拉瓜)

et 爱沙尼亚文

es_PA 西班牙文 (巴拿马)

es_PE 西班牙文 (秘鲁)

es_PR 西班牙文 (波多黎哥)

es_PY 西班牙文 (巴拉圭)

es_SV 西班牙文 (萨尔瓦多)

es_UY 西班牙文 (乌拉圭)

es_VE 西班牙文 (委内瑞拉)

et_EE 爱沙尼亚文 (爱沙尼亚)

fi 芬兰文

fi_FI 芬兰文 (芬兰)

fi_FI_EURO 芬兰文 (芬兰,Euro)

fr 法文

fr_BE 法文 (比利时)

fr_BE_EURO 法文 (比利时,Euro)

fr_CA 法文 (加拿大)

fr_CH 法文 (瑞士)

fr_FR 法文 (法国)

fr_FR_EURO 法文 (法国,Euro)

fr_LU 法文 (卢森堡)

fr_LU_EURO 法文 (卢森堡,Euro)

hr 克罗地亚文

hr_HR 克罗地亚文 (克罗地亚)

hu 匈牙利文

hu_HU 匈牙利文 (匈牙利)

is 冰岛文

is_IS 冰岛文 (冰岛)

it 意大利文

it_CH 意大利文 (瑞士)

it_IT 意大利文 (意大利)

it_IT_EURO 意大利文 (意大利,Euro)

iw 希伯来文

iw_IL 希伯来文 (以色列)

ja 日文

ja_JP 日文 (日本)

ko 朝鲜文

ko_KR 朝鲜文 (南朝鲜)

lt 立陶宛文

lt_LT 立陶宛文 (立陶宛)

lv 拉托维亚文(列托)

lv_LV 拉托维亚文(列托) (拉脱维亚)

mk 马其顿文

mk_MK 马其顿文 (马其顿王国)

nl 荷兰文

nl_BE 荷兰文 (比利时)

nl_BE_EURO 荷兰文 (比利时,Euro)

nl_NL 荷兰文 (荷兰)

nl_NL_EURO 荷兰文 (荷兰,Euro)

no 挪威文

no_NO 挪威文 (挪威)

no_NO_NY 挪威文 (挪威,Nynorsk)

pl 波兰文

pl_PL 波兰文 (波兰)

pt 葡萄牙文

pt_BR 葡萄牙文 (巴西)

pt_PT 葡萄牙文 (葡萄牙)

pt_PT_EURO 葡萄牙文 (葡萄牙,Euro)

ro 罗马尼亚文

ro_RO 罗马尼亚文 (罗马尼亚)

ru 俄文

ru_RU 俄文 (俄罗斯)

sh 塞波尼斯-克罗地亚文

sh_YU 塞波尼斯-克罗地亚文 (南斯拉夫)

sk 斯洛伐克文

sk_SK 斯洛伐克文 (斯洛伐克)

sl 斯洛文尼亚文

sl_SI 斯洛文尼亚文 (斯洛文尼亚)

sq 阿尔巴尼亚文

sq_AL 阿尔巴尼亚文 (阿尔巴尼亚)

sr 塞尔维亚文

sr_YU 塞尔维亚文 (南斯拉夫)

sv 瑞典文

sv_SE 瑞典文 (瑞典)

th 泰文

th_TH 泰文 (泰国)

tr 土耳其文

tr_TR 土耳其文 (土耳其)

uk 乌克兰文

uk_UA 乌克兰文 (乌克兰)

zh 中文

zh_CN 中文 (中国)

zh_HK 中文 (香港)

zh_TW 中文 (台湾)

======System property======== 

-- listing properties --

java.runtime.name=Java(TM) 2 Runtime Environment, Stand...

sun.boot.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386

java.vm.version=1.3.1_04-b02

java.vm.vendor=Sun Microsystems Inc.

java.vendor.url=

path.separator=:

java.vm.name=Java HotSpot(TM) Client VM

file.encoding.pkg=sun.io

java.vm.specification.name=Java Virtual Machine Specification

user.dir=/home/chedong/src/char_test

java.runtime.version=1.3.1_04-b02

java.awt.graphicsenv=sun.awt.X11GraphicsEnvironment

os.arch=i386

java.io.tmpdir=/tmp

line.separator=



java.vm.specification.vendor=Sun Microsystems Inc.

java.awt.fonts=

os.name=Linux

java.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386:/u...

java.specification.name=Java Platform API Specification

java.class.version=47.0

os.version=2.4.7-10

user.home=/home/chedong

user.timezone=Asia/Shanghai

java.awt.printerjob=sun.awt.motif.PSPrinterJob

file.encoding=GBK

java.specification.version=1.3

user.name=chedong

java.class.path=/home/chedong/classes

java.vm.specification.version=1.0

java.home=/usr/java/jdk1.3.1_04/jre

user.language=zh

java.specification.vendor=Sun Microsystems Inc.

java.vm.info=mixed mode

java.version=1.3.1_04

java.ext.dirs=/usr/java/jdk1.3.1_04/jre/lib/ext

sun.boot.class.path=/usr/java/jdk1.3.1_04/jre/lib/rt.jar:...

java.vendor=Sun Microsystems Inc.

file.separator=/

java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...

sun.cpu.endian=little

sun.io.unicode.encoding=UnicodeLittle

user.region=CN

sun.cpu.isalist=



Hello, it's: Tue Jul 30 11:49:36 CST 2002

======System available locales:======== 

en English

en_US English (United States)

ar Arabic

ar_AE Arabic (United Arab Emirates)

ar_BH Arabic (Bahrain)

ar_DZ Arabic (Algeria)

ar_EG Arabic (Egypt)

ar_IQ Arabic (Iraq)

ar_JO Arabic (Jordan)

ar_KW Arabic (Kuwait)

ar_LB Arabic (Lebanon)

ar_LY Arabic (Libya)

ar_MA Arabic (Morocco)

ar_OM Arabic (Oman)

ar_QA Arabic (Qatar)

ar_SA Arabic (Saudi Arabia)

ar_SD Arabic (Sudan)

ar_SY Arabic (Syria)

ar_TN Arabic (Tunisia)

ar_YE Arabic (Yemen)

be Byelorussian

be_BY Byelorussian (Belarus)

bg Bulgarian

bg_BG Bulgarian (Bulgaria)

ca Catalan

ca_ES Catalan (Spain)

ca_ES_EURO Catalan (Spain,Euro)

cs Czech

cs_CZ Czech (Czech Republic)

da Danish

da_DK Danish (Denmark)

de German

de_AT German (Austria)

de_AT_EURO German (Austria,Euro)

de_CH German (Switzerland)

de_DE German (Germany)

de_DE_EURO German (Germany,Euro)

de_LU German (Luxembourg)

de_LU_EURO German (Luxembourg,Euro)

el Greek

el_GR Greek (Greece)

en_AU English (Australia)

en_CA English (Canada)

en_GB English (United Kingdom)

en_IE English (Ireland)

en_IE_EURO English (Ireland,Euro)

en_NZ English (New Zealand)

en_ZA English (South Africa)

es Spanish

es_AR Spanish (Argentina)

es_BO Spanish (Bolivia)

es_CL Spanish (Chile)

es_CO Spanish (Colombia)

es_CR Spanish (Costa Rica)

es_DO Spanish (Dominican Republic)

es_EC Spanish (Ecuador)

es_ES Spanish (Spain)

es_ES_EURO Spanish (Spain,Euro)

es_GT Spanish (Guatemala)

es_HN Spanish (Honduras)

es_MX Spanish (Mexico)

es_NI Spanish (Nicaragua)

es_PA Spanish (Panama)

es_PE Spanish (Peru)

es_PR Spanish (Puerto Rico)

es_PY Spanish (Paraguay)

es_SV Spanish (El Salvador)

es_UY Spanish (Uruguay)

es_VE Spanish (Venezuela)

et Estonian

et_EE Estonian (Estonia)

fi Finnish

fi_FI Finnish (Finland)

fi_FI_EURO Finnish (Finland,Euro)

fr French

fr_BE French (Belgium)

fr_BE_EURO French (Belgium,Euro)

fr_CA French (Canada)

fr_CH French (Switzerland)

fr_FR French (France)

fr_FR_EURO French (France,Euro)

fr_LU French (Luxembourg)

fr_LU_EURO French (Luxembourg,Euro)

hr Croatian

hr_HR Croatian (Croatia)

hu Hungarian

hu_HU Hungarian (Hungary)

is Icelandic

is_IS Icelandic (Iceland)

it Italian

it_CH Italian (Switzerland)

it_IT Italian (Italy)

it_IT_EURO Italian (Italy,Euro)

iw Hebrew

iw_IL Hebrew (Israel)

ja Japanese

ja_JP Japanese (Japan)

ko 韩文

ko_KR 韩文 (大韩民国)

lt Lithuanian

lt_LT Lithuanian (Lithuania)

lv Latvian (Lettish)

lv_LV Latvian (Lettish) (Latvia)

mk Macedonian

mk_MK Macedonian (Macedonia)

nl Dutch

nl_BE Dutch (Belgium)

nl_BE_EURO Dutch (Belgium,Euro)

nl_NL Dutch (Netherlands)

nl_NL_EURO Dutch (Netherlands,Euro)

no Norwegian

no_NO Norwegian (Norway)

no_NO_NY Norwegian (Norway,Nynorsk)

pl Polish

pl_PL Polish (Poland)

pt Portuguese

pt_BR Portuguese (Brazil)

pt_PT Portuguese (Portugal)

pt_PT_EURO Portuguese (Portugal,Euro)

ro Romanian

ro_RO Romanian (Romania)

ru Russian

ru_RU Russian (Russia)

sh Serbo-Croatian

sh_YU Serbo-Croatian (Yugoslavia)

sk Slovak

sk_SK Slovak (Slovakia)

sl Slovenian

sl_SI Slovenian (Slovenia)

sq Albanian

sq_AL Albanian (Albania)

sr Serbian

sr_YU Serbian (Yugoslavia)

sv Swedish

sv_SE Swedish (Sweden)

th Thai

th_TH Thai (Thailand)

tr Turkish

tr_TR Turkish (Turkey)

uk Ukrainian

uk_UA Ukrainian (Ukraine)

zh 中文

zh_CN 中文 (中华人民共和国)

zh_HK 中文 (香港)

zh_TW 中文 (台湾)

======System property======== 

-- listing properties --

java.runtime.name=Java(TM) 2 Runtime Environment, Stand...

sun.boot.library.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

java.vm.version=1.3.0_02

java.vm.vendor=Sun Microsystems Inc.

java.vendor.url=

path.separator=;

java.vm.name=Java HotSpot(TM) Client VM

file.encoding.pkg=sun.io

java.vm.specification.name=Java Virtual Machine Specification

user.dir=D:\java\src\char_test

java.runtime.version=1.3.0_02

java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment

os.arch=x86

java.io.tmpdir=D:\TEMP\

line.separator=



java.vm.specification.vendor=Sun Microsystems Inc.

java.awt.fonts=

os.name=Windows 98

java.library.path=C:\WINDOWS;.;C:\WINDOWS\SYSTEM;C:\WIN...

java.specification.name=Java Platform API Specification

java.class.version=47.0

os.version=4.90

user.home=C:\WINDOWS

user.timezone=Asia/Shanghai

java.awt.printerjob=sun.awt.windows.WPrinterJob

file.encoding=GBK

java.specification.version=1.3

user.name=Sicci

java.class.path=d:\java\classes

java.vm.specification.version=1.0

java.home=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_02

user.language=zh

java.specification.vendor=Sun Microsystems Inc.

awt.toolkit=sun.awt.windows.WToolkit

java.vm.info=mixed mode

java.version=1.3.0_02

java.ext.dirs=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

sun.boot.class.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

java.vendor=Sun Microsystems Inc.

file.separator=\

java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...

sun.cpu.endian=little

sun.io.unicode.encoding=UnicodeLittle

user.region=CN

sun.cpu.isalist=pentium i486 i386



Hello, it's: Tue Jul 30 11:53:27 CST 2002

======System available locales:======== 

en English

en_US English (United States)

ar Arabic

ar_AE Arabic (United Arab Emirates)

ar_BH Arabic (Bahrain)

ar_DZ Arabic (Algeria)

ar_EG Arabic (Egypt)

ar_IQ Arabic (Iraq)

ar_JO Arabic (Jordan)

ar_KW Arabic (Kuwait)

ar_LB Arabic (Lebanon)

ar_LY Arabic (Libya)

ar_MA Arabic (Morocco)

ar_OM Arabic (Oman)

ar_QA Arabic (Qatar)

ar_SA Arabic (Saudi Arabia)

ar_SD Arabic (Sudan)

ar_SY Arabic (Syria)

ar_TN Arabic (Tunisia)

ar_YE Arabic (Yemen)

be Byelorussian

be_BY Byelorussian (Belarus)

bg Bulgarian

bg_BG Bulgarian (Bulgaria)

ca Catalan

ca_ES Catalan (Spain)

ca_ES_EURO Catalan (Spain,Euro)

cs Czech

cs_CZ Czech (Czech Republic)

da Danish

da_DK Danish (Denmark)

de German

de_AT German (Austria)

de_AT_EURO German (Austria,Euro)

de_CH German (Switzerland)

de_DE German (Germany)

de_DE_EURO German (Germany,Euro)

de_LU German (Luxembourg)

de_LU_EURO German (Luxembourg,Euro)

el Greek

el_GR Greek (Greece)

en_AU English (Australia)

en_CA English (Canada)

en_GB English (United Kingdom)

en_IE English (Ireland)

en_IE_EURO English (Ireland,Euro)

en_NZ English (New Zealand)

en_ZA English (South Africa)

es Spanish

es_AR Spanish (Argentina)

es_BO Spanish (Bolivia)

es_CL Spanish (Chile)

es_CO Spanish (Colombia)

es_CR Spanish (Costa Rica)

es_DO Spanish (Dominican Republic)

es_EC Spanish (Ecuador)

es_ES Spanish (Spain)

es_ES_EURO Spanish (Spain,Euro)

es_GT Spanish (Guatemala)

es_HN Spanish (Honduras)

es_MX Spanish (Mexico)

es_NI Spanish (Nicaragua)

es_PA Spanish (Panama)

es_PE Spanish (Peru)

es_PR Spanish (Puerto Rico)

es_PY Spanish (Paraguay)

es_SV Spanish (El Salvador)

es_UY Spanish (Uruguay)

es_VE Spanish (Venezuela)

et Estonian

et_EE Estonian (Estonia)

fi Finnish

fi_FI Finnish (Finland)

fi_FI_EURO Finnish (Finland,Euro)

fr French

fr_BE French (Belgium)

fr_BE_EURO French (Belgium,Euro)

fr_CA French (Canada)

fr_CH French (Switzerland)

fr_FR French (France)

fr_FR_EURO French (France,Euro)

fr_LU French (Luxembourg)

fr_LU_EURO French (Luxembourg,Euro)

hr Croatian

hr_HR Croatian (Croatia)

hu Hungarian

hu_HU Hungarian (Hungary)

is Icelandic

is_IS Icelandic (Iceland)

it Italian

it_CH Italian (Switzerland)

it_IT Italian (Italy)

it_IT_EURO Italian (Italy,Euro)

iw Hebrew

iw_IL Hebrew (Israel)

ja Japanese

ja_JP Japanese (Japan)

ko Korean

ko_KR Korean (South Korea)

lt Lithuanian

lt_LT Lithuanian (Lithuania)

lv Latvian (Lettish)

lv_LV Latvian (Lettish) (Latvia)

mk Macedonian

mk_MK Macedonian (Macedonia)

nl Dutch

nl_BE Dutch (Belgium)

nl_BE_EURO Dutch (Belgium,Euro)

nl_NL Dutch (Netherlands)

nl_NL_EURO Dutch (Netherlands,Euro)

no Norwegian

no_NO Norwegian (Norway)

no_NO_NY Norwegian (Norway,Nynorsk)

pl Polish

pl_PL Polish (Poland)

pt Portuguese

pt_BR Portuguese (Brazil)

pt_PT Portuguese (Portugal)

pt_PT_EURO Portuguese (Portugal,Euro)

ro Romanian

ro_RO Romanian (Romania)

ru Russian

ru_RU Russian (Russia)

sh Serbo-Croatian

sh_YU Serbo-Croatian (Yugoslavia)

sk Slovak

sk_SK Slovak (Slovakia)

sl Slovenian

sl_SI Slovenian (Slovenia)

sq Albanian

sq_AL Albanian (Albania)

sr Serbian

sr_YU Serbian (Yugoslavia)

sv Swedish

sv_SE Swedish (Sweden)

th Thai

th_TH Thai (Thailand)

tr Turkish

tr_TR Turkish (Turkey)

uk Ukrainian

uk_UA Ukrainian (Ukraine)

zh Chinese

zh_CN Chinese (China)

zh_HK Chinese (Hong Kong)

zh_TW Chinese (Taiwan)

======System property======== 

-- listing properties --

java.runtime.name=Java(TM) 2 Runtime Environment, Stand...

sun.boot.library.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

java.vm.version=1.3.0_02

java.vm.vendor=Sun Microsystems Inc.

java.vendor.url=

path.separator=;

java.vm.name=Java HotSpot(TM) Client VM

file.encoding.pkg=sun.io

java.vm.specification.name=Java Virtual Machine Specification

user.dir=D:\java\src\char_test

java.runtime.version=1.3.0_02

java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment

os.arch=x86

java.io.tmpdir=D:\TEMP\

line.separator=



java.vm.specification.vendor=Sun Microsystems Inc.

java.awt.fonts=

os.name=Windows 98

java.library.path=C:\WINDOWS;.;C:\WINDOWS\SYSTEM;C:\WIN...

java.specification.name=Java Platform API Specification

java.class.version=47.0

os.version=4.90

user.home=C:\WINDOWS

user.timezone=Asia/Shanghai

java.awt.printerjob=sun.awt.windows.WPrinterJob

file.encoding=Cp1252

java.specification.version=1.3

user.name=Sicci

java.class.path=d:\java\classes

java.vm.specification.version=1.0

java.home=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_02

user.language=en

java.specification.vendor=Sun Microsystems Inc.

awt.toolkit=sun.awt.windows.WToolkit

java.vm.info=mixed mode

java.version=1.3.0_02

java.ext.dirs=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

sun.boot.class.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

java.vendor=Sun Microsystems Inc.

file.separator=\

java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...

sun.cpu.endian=little

sun.io.unicode.encoding=UnicodeLittle

user.region=GB

sun.cpu.isalist=pentium i486 i386





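Conclusion:


The JVM's default encoding is determined by the system's locale setting, so with the same locale there is no difference between the default encodings on Linux and on Windows. (For the purposes of this test, Cp1252 and ISO-8859-1 can be treated as the same single-byte Western encoding covering Latin characters below 256; strictly speaking, the two differ only in the 0x80-0x9F range.) For test 2, I therefore list only the output on Linux with the locale set to en_US and to zh_CN; the output on Windows under the corresponding regional settings is the same.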
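
The relationship between Cp1252 and ISO-8859-1 can be checked directly. The following is a small sketch, not taken from the article: WesternCharsetCompare is a hypothetical class name, and it assumes the runtime provides the charset under the name windows-1252. It decodes every single byte value with both charsets and reports where the results agree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WesternCharsetCompare {
    // Assumption: the runtime knows Cp1252 under the name "windows-1252".
    static final Charset CP1252 = Charset.forName("windows-1252");
    static final Charset LATIN1 = StandardCharsets.ISO_8859_1;

    /** Decodes a single byte value (0..255) with the given charset. */
    static char decode(int byteValue, Charset cs) {
        return new String(new byte[] { (byte) byteValue }, cs).charAt(0);
    }

    /** True if both charsets decode every byte value in [from, to] to the same char. */
    static boolean sameInRange(int from, int to) {
        for (int b = from; b <= to; b++) {
            if (decode(b, CP1252) != decode(b, LATIN1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("0x00-0x7F identical: " + sameInRange(0x00, 0x7F));
        System.out.println("0x80-0x9F identical: " + sameInRange(0x80, 0x9F));
        System.out.println("0xA0-0xFF identical: " + sameInRange(0xA0, 0xFF));
        System.out.printf("0x80 -> U+%04X (Cp1252) vs U+%04X (ISO-8859-1)%n",
                (int) decode(0x80, CP1252), (int) decode(0x80, LATIN1));
    }
}
```

The bytes of a GBK-encoded Chinese character can fall inside 0x80-0x9F, so the two charsets are not strictly interchangeable for the byte-preserving tricks discussed in test 2; the article's own samples happen to use only bytes at 0xA0 and above.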


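Test Program 2

==========


The HelloUnicode.java program demonstrates how the string "Hello world 世界你好" (16 characters) is handled under different system default encodings. After each encoding or decoding step, it prints the byte value, the short value, and the Unicode block of every character in the resulting string.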
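
The source of HelloUnicode.java is not reproduced in this article. As a hedged reconstruction of how one line of its output could be computed, the sketch below uses a hypothetical class CharDump: the byte and short values are plain narrowing casts of the char, and the block name comes from Character.UnicodeBlock.of.

```java
final class CharDump {
    /** Formats one character in the same shape as the listings that follow. */
    static String describe(int index, char c) {
        return "char[" + index + "]='" + c + "'"
                + " byte=" + (byte) c      // low 8 bits, interpreted as signed
                + " short=" + (short) c    // all 16 bits, interpreted as signed
                + " " + Character.UnicodeBlock.of(c);
    }

    /** Prints the string, its length, and one line per character. */
    static void dump(String s) {
        System.out.println("string=" + s + " length=" + s.length());
        for (int i = 0; i < s.length(); i++) {
            System.out.println(describe(i, s.charAt(i)));
        }
    }

    public static void main(String[] args) {
        // The Chinese part is written with Unicode escapes to keep the source ASCII-only.
        dump("Hello world \u4e16\u754c\u4f60\u597d");
    }
}
```

Because the byte column is only the low 8 bits of the char, a correctly decoded Chinese character and a "half character" can be told apart by the short value and the Unicode block, not by the byte value alone.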




The two runs below were made, in order, in these environments:

1. Linux (J2SE 1.3.1), LANG=en_US LC_ALL=en_US
2. Linux (J2SE 1.3.1), LANG=zh_CN LC_ALL=zh_CN.GBK



====write hello world to files======

[test 1-1]: with system default encoding=ISO-8859-1

string=Hello world 世界你好 length=20

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='? byte=-54 short=202 LATIN_1_SUPPLEMENT

char[13]='? byte=-64 short=192 LATIN_1_SUPPLEMENT

char[14]='? byte=-67 short=189 LATIN_1_SUPPLEMENT

char[15]='? byte=-25 short=231 LATIN_1_SUPPLEMENT

char[16]='? byte=-60 short=196 LATIN_1_SUPPLEMENT

char[17]='? byte=-29 short=227 LATIN_1_SUPPLEMENT

char[18]='? byte=-70 short=186 LATIN_1_SUPPLEMENT

char[19]='? byte=-61 short=195 LATIN_1_SUPPLEMENT



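Step 1: In the English-encoding environment the Chinese text appears correctly on screen, but what is actually printed is "half" Chinese characters: each char holds a single byte of a two-byte character, which is why the length is 20 rather than 16. The result is written to the first file, hello.orig.html.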
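
The "half character" effect can be reproduced without changing the locale. In this sketch, HalfCharDemo is a hypothetical class, and it assumes the runtime ships the GBK charset. The GBK bytes of the test string are decoded as ISO-8859-1, which is what a JVM with that default encoding does when it reads GBK-encoded input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class HalfCharDemo {
    static final String TEXT = "Hello world \u4e16\u754c\u4f60\u597d";

    /** Simulates an ISO-8859-1 JVM reading text that was stored as GBK bytes. */
    static String decodeGbkBytesAsLatin1(String s) {
        byte[] gbk = s.getBytes(Charset.forName("GBK")); // 12 ASCII bytes + 4 * 2 bytes
        return new String(gbk, StandardCharsets.ISO_8859_1); // one char per byte
    }

    public static void main(String[] args) {
        String halves = decodeGbkBytesAsLatin1(TEXT);
        System.out.println("length=" + halves.length());
        for (int i = 12; i < halves.length(); i++) {
            System.out.println("char[" + i + "] short=" + (int) halves.charAt(i));
        }
    }
}
```

The values printed for positions 12 to 19 correspond to the short column of test 1-1 above: every Chinese character has been split into two Latin-1 characters.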

[test 1-2]: getBytes with platform default encoding and decoding as gb2312:

string=Hello world ???? length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='?' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='?' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='?' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='?' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS



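Here the string is turned back into a byte stream using the system default encoding and then decoded as GB2312. Question marks are printed (in this environment the system displays every character above 255 as ?), but the Unicode block and the short values show that the characters are the correct Chinese ones.

However, the next step writes the second file, hello.gb2312.html, without specifying an encoding, so the system default ISO-8859-1 is used. As a result, what test 2-2 later reads back really is '?'.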
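
Both halves of this step, the successful recovery and the later loss, can be sketched as follows. RecoverAndLoseDemo and its method names are hypothetical, and the sketch assumes the runtime ships the GB2312 charset. It relies on the documented behavior of String.getBytes(Charset), which replaces unmappable characters with the charset's replacement byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RecoverAndLoseDemo {
    static final String TEXT = "Hello world \u4e16\u754c\u4f60\u597d";
    static final Charset GB2312 = Charset.forName("GB2312");

    /** The "half character" string an ISO-8859-1 JVM sees for GB2312 bytes. */
    static String halves(String s) {
        return new String(s.getBytes(GB2312), StandardCharsets.ISO_8859_1);
    }

    /** Test 1-2: go back to the original bytes, then decode them as GB2312. */
    static String recover(String halves) {
        return new String(halves.getBytes(StandardCharsets.ISO_8859_1), GB2312);
    }

    /** What happens when real Chinese chars are written through ISO-8859-1. */
    static byte[] writeAsLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String recovered = recover(halves(TEXT));
        System.out.println("recovered correctly: " + recovered.equals(TEXT));
        byte[] lost = writeAsLatin1(recovered);
        System.out.println("byte written for char[12]: " + lost[12]);
    }
}
```

Once the string holds real Chinese characters, encoding it with ISO-8859-1 has nothing to map them to, so each one is replaced by a single '?' byte and the original data cannot be recovered from the file.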


[test 1-3]: convert string to UTF8

string=Hello world 涓栫晫浣犲ソ length=24

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='? byte=-28 short=228 LATIN_1_SUPPLEMENT

char[13]='? byte=-72 short=184 LATIN_1_SUPPLEMENT

char[14]='? byte=-106 short=150 LATIN_1_SUPPLEMENT

char[15]='? byte=-25 short=231 LATIN_1_SUPPLEMENT

char[16]='? byte=-107 short=149 LATIN_1_SUPPLEMENT

char[17]='? byte=-116 short=140 LATIN_1_SUPPLEMENT

char[18]='? byte=-28 short=228 LATIN_1_SUPPLEMENT

char[19]='? byte=-67 short=189 LATIN_1_SUPPLEMENT

char[20]='? byte=-96 short=160 LATIN_1_SUPPLEMENT

char[21]='? byte=-27 short=229 LATIN_1_SUPPLEMENT

char[22]='? byte=-91 short=165 LATIN_1_SUPPLEMENT

char[23]='? byte=-67 short=189 LATIN_1_SUPPLEMENT



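The third experiment encodes the character stream as UTF-8 and writes it to the third test file, hello.utf8.html. UTF-8 leaves English text unchanged but uses 3 bytes for each Chinese character, so the Chinese portion takes 50% more storage than in GB2312, which uses 2 bytes per character.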
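
The size difference is easy to verify. The following sketch uses a hypothetical class Utf8SizeDemo and assumes the runtime ships the GB2312 charset. It counts the encoded bytes of the English and Chinese parts of the test string separately.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8SizeDemo {
    static final String ENGLISH = "Hello world ";
    static final String CHINESE = "\u4e16\u754c\u4f60\u597d";

    /** Number of bytes the string occupies in the given charset. */
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        System.out.println("English: UTF-8=" + byteCount(ENGLISH, StandardCharsets.UTF_8)
                + " GB2312=" + byteCount(ENGLISH, gb2312));
        System.out.println("Chinese: UTF-8=" + byteCount(CHINESE, StandardCharsets.UTF_8)
                + " GB2312=" + byteCount(CHINESE, gb2312));
    }
}
```

The total of 24 UTF-8 bytes matches the length=24 reported by test 1-3 above, where each UTF-8 byte was shown as one Latin-1 character.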


====reading and decoding from files======

[test 2-1]: read hello.orig.html: decoding with system default encoding

string=Hello world 世界你好 length=20

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='? byte=-54 short=202 LATIN_1_SUPPLEMENT

char[13]='? byte=-64 short=192 LATIN_1_SUPPLEMENT

char[14]='? byte=-67 short=189 LATIN_1_SUPPLEMENT

char[15]='? byte=-25 short=231 LATIN_1_SUPPLEMENT

char[16]='? byte=-60 short=196 LATIN_1_SUPPLEMENT

char[17]='? byte=-29 short=227 LATIN_1_SUPPLEMENT

char[18]='? byte=-70 short=186 LATIN_1_SUPPLEMENT

char[19]='? byte=-61 short=195 LATIN_1_SUPPLEMENT



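The content is read back from the intermediate file hello.orig.html using the system default encoding. Although it is read half a character at a time, the bytes are restored exactly, so the output displays without error.

This is in fact why applications such as PHP rarely run into character set problems: everything is handled as a byte stream from end to end, which reproduces the input faithfully but also gives up any control over individual characters.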
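
The reason this pass-through works is that ISO-8859-1 maps every byte value to exactly one character and back. The sketch below, with the hypothetical class BytePassThroughDemo, checks this for all 256 byte values and contrasts it with UTF-8, where arbitrary bytes do not survive a decode and re-encode cycle.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BytePassThroughDemo {
    /** All 256 possible byte values, in order. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    /** True if decoding and re-encoding with cs reproduces the input bytes exactly. */
    static boolean roundTrips(byte[] raw, Charset cs) {
        byte[] back = new String(raw, cs).getBytes(cs);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("ISO-8859-1 round trip: "
                + roundTrips(all, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 round trip:      "
                + roundTrips(all, StandardCharsets.UTF_8));
    }
}
```

A lossless round trip only means the bytes are preserved. The program still sees 20 "characters" instead of 16, so operations such as length checks, truncation, or searching act on half characters.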


[test 2-2]: read hello.gb2312.html: decoding as GB2312

string=Hello world ???? length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='?' byte=63 short=63 BASIC_LATIN

char[13]='?' byte=63 short=63 BASIC_LATIN

char[14]='?' byte=63 short=63 BASIC_LATIN

char[15]='?' byte=63 short=63 BASIC_LATIN


This time the '?' really is a question mark, char(63). A lot of data becomes unrecoverable in exactly this way.



[test 2-3]: read hello.utf8.html: decoding as UTF8

string=Hello world ???? length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='?' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='?' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='?' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='?' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS



Great! Although the characters are displayed as '?', they have in fact been decoded correctly, as the Unicode block of each character shows.



====write hello world to files======

[test 1-1]: with system default encoding=GBK

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS

Note: the source program must be recompiled under the new locale, because the first byte stream to character stream decoding already happens in javac.
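
One way to make a source file independent of the encoding javac assumes is to write non-ASCII literals as Unicode escapes, so that the file contains only ASCII bytes. This is a suggestion added here, not something the original test program is known to do. SourceEscapeDemo is a hypothetical class name.

```java
final class SourceEscapeDemo {
    // The four Chinese characters of the test string, written as Unicode
    // escapes so that this source file decodes identically under any
    // ASCII-compatible compiler encoding.
    static final String CHINESE = "\u4e16\u754c\u4f60\u597d";

    /** Returns the UTF-16 code units of a string as decimal numbers. */
    static int[] codeUnits(String s) {
        int[] units = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            units[i] = s.charAt(i);
        }
        return units;
    }

    public static void main(String[] args) {
        for (int unit : codeUnits(CHINESE)) {
            System.out.println(unit);
        }
    }
}
```

The printed numbers match the short column for the four Chinese characters in the listings. Another option is to tell the compiler the source encoding explicitly with javac's -encoding option instead of relying on the locale.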



[test 1-2]: getBytes with platform default encoding and decoding as gb2312:

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS



In the Chinese environment, the result is the same as the default encoding and decoding above.


[test 1-3]: convert string to UTF8

string=Hello world 涓栫晫浣犲ソ length=18

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='涓' byte=-109 short=28051 CJK_UNIFIED_IDEOGRAPHS

char[13]='栫' byte=43 short=26667 CJK_UNIFIED_IDEOGRAPHS

char[14]='晫' byte=107 short=26219 CJK_UNIFIED_IDEOGRAPHS

char[15]='浣' byte=99 short=28003 CJK_UNIFIED_IDEOGRAPHS

char[16]='犲' byte=-78 short=29362 CJK_UNIFIED_IDEOGRAPHS

char[17]='ソ' byte=-67 short=12477 KATAKANA





====reading and decoding from files======

[test 2-1]: read hello.orig.html: decoding with system default encoding

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS





[test 2-2]: read hello.gb2312.html: decoding as GB2312

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS





[test 2-3]: read hello.utf8.html: decoding as UTF8

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS



Storage in a Unicode encoding is almost unaffected by the environment's character set settings.
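
A minimal sketch of locale-independent storage follows, assuming the application names UTF-8 explicitly on both the writing and the reading side. ExplicitUtf8FileDemo is a hypothetical class, and it uses a temporary file rather than the article's hello.utf8.html.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExplicitUtf8FileDemo {
    static final String TEXT = "Hello world \u4e16\u754c\u4f60\u597d";

    /** Writes text to a temporary file as UTF-8 and reads it back as UTF-8. */
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("hello.utf8", ".html");
            try {
                // Character stream => byte stream, with the encoding named explicitly.
                try (Writer out = new OutputStreamWriter(
                        new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
                    out.write(text);
                }
                // Byte stream => character stream, with the same explicit encoding.
                StringBuilder sb = new StringBuilder();
                try (Reader in = new InputStreamReader(
                        new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
                    int c;
                    while ((c = in.read()) != -1) {
                        sb.append((char) c);
                    }
                }
                return sb.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrip(TEXT).equals(TEXT));
    }
}
```

Because neither side falls back to the platform default, the result should be the same under en_US and zh_CN locales. Only the console display of the characters still depends on the environment.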



 

Some conclusions from experiment 2:



  • Every application processes text as byte stream => character stream => byte stream:

    byte_stream
