Page not found: because of Cyrillic symbol

Question

asked 2018-05-28 18:27:14 +0800

updated 2018-05-28 21:18:01 +0800

Hello everybody! I inherited a legacy project (from 2016) built with zul pages. Some pages fail with "Page not found" and com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence (...stack trace follows).

I found out that the problem arises only on pages that contain the "И" symbol (Cyrillic Capital Letter I, Unicode code point U+0418). If I replace "И" with & # x0418; (without spaces) in the .zul page, it works. This looks strange, because all other Cyrillic symbols work without any problems.

ZK version is 7.0.3, Tomcat version is 8.0.27, and the OS is Windows 10 1803 build 17134.48 with Russian locale. Every zul page has <?xml version="1.0" encoding="UTF-8"?> as its first line.

Is it a ZK framework bug?

Accepted Answer

answered 2018-07-02 15:41:09 +0800

everdelightedone
111 ● 4

updated 2018-07-02 18:24:55 +0800

Re "your recent stack looks unrelated to index.zul: WARNING: Unable to load ...\central-admin-2.4.8-SNAPSHOT\zk\report\eventLog.zul":

Yes, my bad, I just replaced the problem zul's contents with index.zul's contents.

I finally found the solution to the problem. It arose only on Windows machines, because of the system encoding cp1251; building the same project under Linux was successful. The solution is to add the -Dfile.encoding=UTF-8 property to MAVEN_OPTS. The exact steps for NetBeans users: right-click the project -> Properties -> Actions -> Clean and Build Project (or Build Project, or both) -> add Env.MAVEN_OPTS=-Dfile.encoding=UTF-8 to Set Properties.
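The following is a minimal sketch of how a cp1251 build could lead to exactly the parser error from the question. It assumes that some build step copies the .zul file as text using the platform default charset; the thread does not say which Maven step did this, so `textCopy` and the class name `BuildCopyRepro` are hypothetical stand-ins. The sample markup is taken from the `<label value="Имя"/>` row, written with Unicode escapes so the example does not depend on the encoding of its own source file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

final class BuildCopyRepro {

    /** Hypothetical text-mode copy: decode and re-encode with one assumed charset. */
    static byte[] textCopy(byte[] source, Charset assumed) {
        return new String(source, assumed).getBytes(assumed);
    }

    /** Returns null if the bytes parse as XML, otherwise a description of the failure. */
    static String parseError(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return null;
        } catch (Exception e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "\u0418\u043C\u044F" is the label text from the failing row, UTF-8 encoded.
        byte[] source = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<label value=\"\u0418\u043C\u044F\"/>").getBytes(StandardCharsets.UTF_8);

        // Copy as if the platform default were windows-1251 (the failing Windows build).
        byte[] cp1251Copy = textCopy(source, Charset.forName("windows-1251"));
        // Copy as if -Dfile.encoding=UTF-8 were in effect.
        byte[] utf8Copy = textCopy(source, StandardCharsets.UTF_8);

        System.out.println("cp1251 copy: " + parseError(cp1251Copy));
        System.out.println("UTF-8 copy:  " + parseError(utf8Copy));
    }
}
```

Under these assumptions the cp1251 copy should be rejected by the XML parser, while the UTF-8 copy parses and reports null. This is consistent with the build succeeding on Linux and failing on a Windows machine with a cp1251 default.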

Thank you, cor3000, for your invaluable help!

that's great news, thanks for the update... I agree file encoding issues can be a nightmare
cor3000 ( 2018-07-02 16:19:19 +0800 )

Answer 2

answered 2018-06-07 10:50:47 +0800

cor3000
6280 ● 2 ● 7

To me it simply sounds like your file might not be encoded/saved correctly as UTF-8.

I tried it on zkfiddle and it works, but I may not have tested your exact case. Can you please update the fiddle with whatever detail I missed and post the link here?

Answer 3

answered 2018-06-07 20:16:10 +0800

cor3000
6280 ● 2 ● 7

updated 2018-06-07 20:19:21 +0800

I found this post in another forum which also has a problem around the "И" character:

https://www.java-forums.org/new-java/51981-problem-encoding-russian-text-between-utf-8-unicode.html

Could it be that your default OS encoding is other than UTF-8? You might have to run the Java process specifically with UTF-8?

According to this you can run your application server with the JVM parameter ...

-Dfile.encoding=UTF-8

... to set it explicitly.
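To confirm that the parameter actually reached the JVM, a small check like the one below can help. It is only a sketch: the class name `ShowDefaultCharset` is made up for illustration, and it has to run inside the process in question (the application server or the build) to say anything about that process.

```java
import java.nio.charset.Charset;

final class ShowDefaultCharset {

    /** Describes the charset settings of the JVM this code runs in. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 on the same machine shows whether the flag changes the default charset there.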

Answer 4

answered 2018-06-29 10:53:03 +0800

cor3000
6280 ● 2 ● 7

updated 2018-06-29 10:53:24 +0800

took me a while to revisit this one... in your first post you mentioned:

... "Page not found" with com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence (...stack trace follows) ...

Can you provide the full stack trace? (Somehow I forgot to ask right away ... I thought it would "follow" later.) I'd like to see which method(s) inside ZK lead to this behavior. There are multiple starting points for ZK to load/read a zul file (initial page, include, macro, Executions.createComponentsDirectly, template ...), so the stack trace might help with a deeper analysis. If you can still reproduce the error, please provide the stack trace information.

The Java version might also be related (just to be sure).

Answer 5

answered 2018-06-29 11:44:06 +0800

cor3000
6280 ● 2 ● 7

updated 2018-06-29 11:45:36 +0800

The specific problem with the "И" character happens when deliberately using cp1251 to convert the bytes, or when setting it as the default via -Dfile.encoding=cp1251:

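import java.nio.charset.Charset;

public class Cp1251Test {
    public static void main(String[] args) throws Exception {
        // Build the Cyrillic alphabet from U+0410 to U+044F.
        String s = "";
        for (char ch = 0x0410; ch <= 0x044F; ch++)
            s += ch;

        String s2 = s;

        System.out.println(Charset.defaultCharset());

        // Case 1: one side of each conversion uses the default charset (cp1251 here).
        System.out.println(s);
        s = new String(s.getBytes("utf-8"));
        System.out.println(s);
        s = new String(s.getBytes(), "utf-8");
        System.out.println(s);

        // Case 2: the same conversions with cp1251 named explicitly.
        System.out.println(s2);
        s2 = new String(s2.getBytes("utf-8"), "cp1251");
        System.out.println(s2);
        s2 = new String(s2.getBytes("cp1251"), "utf-8");
        System.out.println(s2);
    }
}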


windows-1251
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
РђР‘Р’Р“Р”Р•Р–Р—Р?Р™РљР›РњРќРћРџР РЎРўРЈР¤РҐР¦Р§РЁР©РЄР«Р¬РР®РЇР°Р±РІРіРґРµР¶Р·РёР№РєР»РјРЅРѕРїСЂСЃС‚СѓС„С…С†С‡С€С‰СЉС‹СЊСЌСЋСЏ
АБВГДЕЖЗ??ЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
РђР‘Р’Р“Р”Р•Р–Р—Р?Р™РљР›РњРќРћРџР РЎРўРЈР¤РҐР¦Р§РЁР©РЄР«Р¬РР®РЇР°Р±РІРіРґРµР¶Р·РёР№РєР»РјРЅРѕРїСЂСЃС‚СѓС„С…С†С‡С€С‰СЉС‹СЊСЌСЋСЏ
АБВГДЕЖЗ??ЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя

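This kind of lossy byte/string conversion between cp1251 and utf-8 might explain it: "И" is encoded in UTF-8 as the bytes D0 98, and the byte 0x98 has no character assigned in cp1251, so it gets replaced by "?" and the resulting byte pair is no longer valid UTF-8. At least, that is what happens in this little test case.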
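To check which letters are affected, the round trip can be run per character. The sketch below (the class name `Cp1251RoundTrip` is made up for illustration) takes the UTF-8 bytes of every letter in the same range as the test above, misreads them as windows-1251, writes them back as windows-1251, and lists the letters whose bytes changed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Cp1251RoundTrip {

    private static final Charset CP1251 = Charset.forName("windows-1251");

    /** True if the UTF-8 bytes of ch are unchanged after decode/encode as cp1251. */
    static boolean survives(char ch) {
        byte[] utf8 = String.valueOf(ch).getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, CP1251);
        return Arrays.equals(utf8, misread.getBytes(CP1251));
    }

    /** Letters in U+0410..U+044F whose UTF-8 bytes do not survive the round trip. */
    static List<Character> brokenLetters() {
        List<Character> broken = new ArrayList<>();
        for (char ch = 0x0410; ch <= 0x044F; ch++) {
            if (!survives(ch)) {
                broken.add(ch);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Print code points and bytes in hex so the console encoding does not matter.
        for (char ch : brokenLetters()) {
            byte[] utf8 = String.valueOf(ch).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X UTF-8 bytes: %02X %02X%n",
                    (int) ch, utf8[0] & 0xFF, utf8[1] & 0xFF);
        }
    }
}
```

Only U+0418 should be listed, with the bytes D0 98. That matches the single damaged position in the output above and explains why the lowercase "и" (U+0438, bytes D0 B8) is not affected.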

How/where this can occur inside your or potentially any other ZK application I hope to find out with your stack trace.

I quite like these kinds of mysteries, so hang in there.

Answer 6

answered 2018-06-29 17:34:07 +0800

cor3000
6280 ● 2 ● 7

According to your stack trace, setSrc is called on an <include> component:

org.zkoss.zul.Include.setSrc(Include.java:280)

So I created a simple case that does exactly that when a button is clicked:

http://zkfiddle.org/sample/3jvph99/2-cyrillic-chars-and-default-charset

It will include a zul file containing Cyrillic characters. Please also run this in your Tomcat. Maybe this simple case (reducing the problem space) also breaks in your application, and then we'll see which character encoding is set at runtime.

Looking forward to your results.

Answer 7

answered 2018-07-02 10:07:10 +0800

cor3000
6280 ● 2 ● 7

your recent stack looks unrelated to index.zul: WARNING: Unable to load ...\central-admin-2.4.8-SNAPSHOT\zk\report\eventLog.zul

If you think it's a NetBeans problem, I'd suggest building and running the application from the command line. Then you can identify or rule out NetBeans as the source.

A common command to build the war file is to execute ...

> mvn clean package

... and the war file should be in the target folder.

You also made a few vague statements:

"So page has windows-1251 encoding for some reason ... I opened problem .zul with Notepad++ and it looked OK."

What does Notepad++ say about the character encoding? Can you try converting it to UTF-8?

The file encoding Notepad++ detected (it doesn't care about the XML prolog) is shown at the bottom right of the window.

If that's not UTF-8, you can convert your file to UTF-8 with the menu option "convert to utf-8".

Then save the file again.

If you think it's a maven settings issue then please compare the source and target files. In any case you need to first be certain about your file encodings. No tool can 100% help you if you don't know which file encoding your text files have or should have.
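One way to compare the source and target files without relying on an editor's guess is a strict UTF-8 check. The sketch below is an illustration only: `Utf8Check` is a made-up name, and the file paths are whatever you pass on the command line. It decodes each file with a decoder that reports malformed input instead of silently replacing it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class Utf8Check {

    /** True if the bytes are well-formed UTF-8; malformed input is reported, not replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the source .zul and the copy inside the built war/target folder.
        for (String arg : args) {
            Path file = Paths.get(arg);
            boolean ok = isValidUtf8(Files.readAllBytes(file));
            System.out.println(file + ": " + (ok ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

If the source file is reported as valid and the copy in the target folder is not, the damage happens during the build rather than in the editor. Note that a file containing only ASCII characters passes this check regardless of the encoding it was saved in.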

Answer 8

answered 2018-07-02 16:16:30 +0800

cor3000
6280 ● 2 ● 7

instead of double quotes you can use backticks to format source code:

"Env.MAVEN_OPTS=-Dfile.encoding=UTF-8"

vs

`Env.MAVEN_OPTS=-Dfile.encoding=UTF-8`

Answer 9

answered 2018-06-07 19:38:41 +0800

everdelightedone
111 ● 4

updated 2018-06-07 19:42:53 +0800

Thank you for the answer! I am new to ZK and possibly don't understand something simple. The project I got came from a git repository, and I run it with NetBeans IDE 8.2 under Apache Tomcat 8.0.27. I have tried running the project in different browsers (Chrome 66.0.3359.181 and Firefox 59.0.2). Maven snippet from pom.xml:

    <properties>
        <zk.spring.version>3.0</zk.spring.version>
        <zk.version>7.0.3</zk.version>
    </properties>
    <dependency>
        <groupId>org.zkoss.theme</groupId>
        <artifactId>sapphire</artifactId>
        <version>${zk.version}</version>
    </dependency>

    <dependency>
        <groupId>org.zkoss.zk</groupId>
        <artifactId>zkplus</artifactId>
        <version>${zk.version}</version>
    </dependency>

    <dependency>
        <groupId>org.zkoss.zk</groupId>
        <artifactId>zkbind</artifactId>
        <version>${zk.version}</version>
    </dependency>

    <dependency>
        <groupId>org.zkoss.zk</groupId>
        <artifactId>zhtml</artifactId>
        <version>${zk.version}</version>
    </dependency>

    <dependency>
        <groupId>org.zkoss.zk</groupId>
        <artifactId>zkspring-core</artifactId>
        <version>${zk.spring.version}</version>
    </dependency>

    <dependency>
        <groupId>org.zkoss.zk</groupId>
        <artifactId>zkspring-security</artifactId>
        <version>${zk.spring.version}</version>
    </dependency>

The first line in .zul file is

<?xml version="1.0" encoding="UTF-8"?>

I can open the .zul file in Notepad++ as UTF-8 without any encoding problem. I can also see the problem symbol in the NetBeans editor. I don't understand how every other Cyrillic symbol can work without any problem while just one of them fails, and only in upper case. My .zul snippet:

        <south height="250px" splittable="true" minsize="250" collapsible="true" style="border-bottom: 0; border-left: 0; border-right: 0;"
               autoscroll="true"
               open="@load(vm.addingNewItemMode or vm.editorOpen) @bind(vm.editorOpen)">
            <groupbox id="addItemWindow" form="@id('newItem') @load(vm.selectedItem) @save(vm.selectedItem, before='save')"
                      style="border: 0" visible="@load(not empty vm.selectedItem)" width="100%" height="100%" contentStyle="border: 0; padding: 0;">
                <grid width="100%" height="100%" style="border: 0">
                    <columns>
                        <column hflex="min"/> <column/>
                    </columns>
                    <rows>
                        <row>
                            <label value="Код"/>
                            <vlayout>
                                <textbox width="" value="@bind(newItem.code) @validator(vm.codeValidator,length='30')" disabled="@load(!vm.canEdit)"  hflex="1"/>
                                <label sclass="error" value="@bind(vmsgs['code'])"/>
                            </vlayout>
                        </row>
                        <row>
                            <label value="Имя"/>
                            <vlayout>
                                <textbox value="@bind(newItem.name) @validator(vm.uniqueValidator,length='255')" disabled="@load(!vm.canEdit)" hflex="1"/>
                                <label sclass="error" value="@bind(vmsgs['name'])"/>
                            </vlayout>
                        </row>
                        <row>
                            <label value="Описание"/>
                            <vlayout>
                                <textbox value="@bind(newItem.description) @validator(vm.lengthValidator,length='255')" disabled="@load(!vm.canEdit)" hflex="1"/>
                                <label sclass="error" value="@bind(vmsgs['description'])"/>
                            </vlayout>
                        </row>
                        <row>
                            <label value="Цвет"/>
                            <vlayout>
                            <combobox id="comboColors" model="@load(c:split(configuration.getString('ui.priority.work.mode.colors'),','))" readonly="true"
                                      value="@bind(newItem.color) @converter(vm.comboBoxSelectedItemConverter) @validator(vm.uniqueValidator,length='255')"
                                      selectedItem="@load(newItem.color)"
                                      disabled="@load(!vm.canEdit)" itemRenderer="@load(vm.colorRenderer)"
                                      style="@bind((newItem.color eq null or c:trim(newItem.color) eq '') ? '' : c:cat('color: #',c:replace(c:substring(newItem.color, 1, 7), 'FFFFFF', '000000')))"/>
                            <label sclass="error" value="@bind(vmsgs['color'])"/>
                            </vlayout>
                        </row>
                    </rows>
                </grid>
            </groupbox>
        </south>

(I can't insert the full .zul text in my post because it contains links, and I get the error "Your karma is insufficient to publish links, please remove the link and post again".)

The problem symbol is in the row <label value="Имя"/>. I run it under Windows 10. I tried running it on two different computers (both with the same hardware and OS). Opening the page results in the error message "Page not found: /zk/priority/workMode.zul".

BTW I increased your Karma so you can post more information from now on.
cor3000 ( 2018-06-07 20:19:50 +0800 )

Answer 10

answered 2018-06-08 00:31:57 +0800

everdelightedone
111 ● 4

Thank you very much for your help! I tried setting the VM option for Tomcat to -Dfile.encoding=UTF-8 and to -Dfile.encoding=UTF8. I also tried setting the environment variable JAVA_OPTS to -Dfile.encoding=UTF8. Unfortunately, none of this works in my case. The code from your first link

class Basic {
    public static void main(String[] args) throws Exception {
        String s = "";
        for (char ch = 0x0410; ch <= 0x044F; ch++)
            s += ch;
        System.out.println(s);
        s = new String(s.getBytes("UTF-8"));
        System.out.println(s);
        s = new String(s.getBytes(), "UTF-8");
        System.out.println(s);
    }
}

works fine in my case and yields

АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
