Importing XML file with invalid character 22 (U+0016)

Von Tobias Arnhold 1.15.2016
I have to import a set of XML files from time to time. Most of those XML files can be imported with out any problems. But at least one file includes a special character U+0016 which occurs randomly some where inside the file.

When I try to import that file I get this ORA- error message:
ORA-31011: XML-Parsing nicht erfolgreich
ORA-19202: Fehler bei XML-Verarbeitung
LPX-00217: Ungültiges Zeichen 22 (U+0016)
Error at line 39409 aufgetreten

What I need to do is to remove this character and re-import the XML file again.

Luckily notepad++ brings a "regular expression" search engine which makes it easy to find:


Search for: \x16
With "Search Mode" > "Extended"
Error will be displayed as: SYN

The error happened in a XML file with more then 25 MB. Without notepad++ it would be hard to find. :)

BTW:
The same solution works for those issues as well:
LPX-00217: Invalid character 3 (U+0003)
Just search for: \x03
Character will be displayed as ETX


LPX-00217: Invalid character 6 (U+0006)
Just search for: \x06
Character will be displayed as ACK

Update 08.07.2016:
I found a way to fix this character issue with some pretty SQL code:
URL: Finding and removing non ascii characters from an Oracle Varchar2

select
XMLTYPE(
REGEXP_REPLACE('... Your XML code....',
'[^[:print:]]',
'')
)
from dual

2 Comments " Importing XML file with invalid character 22 (U+0016) "

Anonymous

I have a load that regularly has invalid characters. To resolve this I have a pre-processing script that perls the file:

e.g.
for i in `ls *.xml`
do
sed -e 's/ href="file:\/\/"//g' $i > tmp_file
mv tmp_file $i
perl -pe 's/\x7E//g' $i > tmp_file
mv tmp_file $i
perl -pe 's/\x86//g' $i > tmp_file
mv tmp_file $i
perl -pe 's/\x92//g' $i > tmp_file
mv tmp_file $i
done

occasionally sqlldr will point to the invalid char:
-LPX-00217: invalid character 139 (U+008B) in which case I'll just update my script.

I was hoping this post would offer a better solution but for me it'd certainly be a step back.

Tobias Arnhold 08 July, 2016 11:19

I found a way to avoid this issue with a nice SQL trick:
select
XMLTYPE(
REGEXP_REPLACE('... Your XML code....',
'[^[:print:]]',
'')
)
from dual