Of RDF feeds and high-bit characters
By Mysidia, Section Code. Posted on Fri Mar 22, 2002 at 12:00:00 PM PST
So I tried to do an rdf_fetch on k5's backend; after all, it's one of the defaults. It used to work, but a few days ago the fetch started failing with a not-well-formed error message.
When I hit fetch I was greeted with:
Edit RDF Feeds
Error doing refetch: error parsing: not well-formed (invalid token) at line 95, column 454, byte 6707 at /usr/lib/perl5/site_perl/5.6.0/i686-linux/XML/Parser.pm
In short, the XML parse failed, so I went and grabbed the RDF file by hand, and to my amazement it contained text with the high bit set.
So it seems that Scoop sometimes passes through byte patterns that are invalid (or at least that expat treats as invalid). The entry containing the offending characters seems to be the one at http://www.kuro5hin.org/?op=displaystory;sid=2002/3/20/2379/74464
and the error is seemingly caused by the 'é' in Vélez and the 'í' in Medellín-bound.
The web browser seems perfectly content to display those bytes (presumably treating them as ISO-8859-1 or something similar, since they fall outside ASCII), but the XML parser returns an error. Are these characters not being properly translated to character entities? Or do they need to be filtered out entirely?
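To make the two options concrete, here is a minimal sketch in Python (not Scoop's Perl, and not its actual code) using the standard library's expat binding. It assumes the feed carries the title as raw ISO-8859-1 bytes with no encoding declaration, which is my guess at what is happening; the helper names to_char_refs and is_well_formed are made up for illustration.

```python
import xml.parsers.expat


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def is_well_formed(data: bytes) -> bool:
    """Return True if expat parses the document without an error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


# Hypothetical title containing the two characters from the story.
title = "V\u00e9lez, Medell\u00edn-bound"

# Assumed failure case: raw Latin-1 bytes and no encoding declaration,
# so expat falls back to UTF-8 and rejects the lone high-bit bytes.
raw = ("<title>" + title + "</title>").encode("latin-1")

# Option 1: emit numeric character references, leaving pure ASCII.
escaped = ("<title>" + to_char_refs(title) + "</title>").encode("ascii")

# Option 2: keep the bytes but declare the encoding up front.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + raw

print(is_well_formed(raw))       # False
print(is_well_formed(escaped))   # True
print(is_well_formed(declared))  # True
```

Note that to_char_refs only handles non-ASCII characters; markup characters such as & and < would still need the usual XML escaping. Either option avoids filtering the characters out altogether.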