Convert character entities to their Unicode equivalents
Question
I have HTML-encoded strings in a database, but many of the character entities go beyond the standard &amp; and &lt;, for example &ldquo; and &mdash;. Unfortunately, we need to feed this data into a Flash-based RSS reader, and Flash doesn't read these entities, although it does read the numeric Unicode equivalent (e.g. &#8220;).
Using .NET 4.0, is there any utility method that will convert the HTML-encoded string to use Unicode-encoded (numeric) character entities?
Here is a better example of what I need. The db has HTML strings like: <p>John &amp; Sarah went to see &ldquo;Scream 4&rdquo;.</p>
and what I need to output in the RSS/XML document within the <description> tag is: <p>John &#38; Sarah went to see &#8220;Scream 4&#8221;.</p>
I'm using an XmlTextWriter to create the XML document from the database records, similar to this example code: http://www.dotnettutorials.com/tutorials/advanced/rss-feed-asp-net-csharp.aspx
So I need to replace all of the character entities within the HTML string from the db with their Unicode equivalent, because the Flash-based RSS reader doesn't recognize any entities beyond the most common ones like &amp;.
Recommended answer
My first thought is: can your RSS reader accept the actual characters? If so, you can use HtmlDecode and feed the result directly in.
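A minimal sketch of that first option, assuming the reader accepts literal Unicode characters. The class and method names are hypothetical, and `WebUtility.HtmlDecode` and `XmlWriter.Create` are swapped in here for the `HttpUtility.HtmlDecode` and `XmlTextWriter` mentioned in this thread so the example needs no reference to System.Web. The writer escapes only what XML itself requires, so the decoded quotes pass through as real characters:

```csharp
using System;
using System.Net;
using System.Text;
using System.Xml;

public static class FeedDescriptionSketch
{
    // Hypothetical helper: decode the HTML entities first, then let the
    // XML writer re-escape only the characters XML requires (such as & and <).
    public static string WriteDescription(string htmlFromDb)
    {
        // "&ldquo;" becomes the actual character U+201C, "&amp;" becomes "&", etc.
        string decoded = WebUtility.HtmlDecode(htmlFromDb);

        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            // WriteElementString escapes markup characters in the text for us.
            writer.WriteElementString("description", decoded);
        }
        return sb.ToString();
    }
}
```

Calling `FeedDescriptionSketch.WriteDescription(...)` on the example string from the question should give a `<description>` element whose text contains the literal curly quotes and an `&amp;` for the ampersand. Whether the Flash reader handles those literal characters still has to be checked against the reader itself.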
If you do need to convert it to the numeric representations, you could parse out each entity, HtmlDecode it, and then cast the resulting character to an int to get the base-10 Unicode value. Then re-insert it into the string as a numeric reference.
Here's some code to demonstrate what I mean (it is untested, but gets the idea across):
// Requires: using System.Text; using System.Web; (reference System.Web.dll)
string input = "Something with &mdash; or other character entities.";
StringBuilder output = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
    // IndexOf returns -1 when no ';' follows, e.g. a bare '&' near the end of the text
    int endOfEntity = input[i] == '&' ? input.IndexOf(';', i) : -1;
    // +1 so the substring includes the terminating ';'
    string decoded = endOfEntity > i
        ? HttpUtility.HtmlDecode(input.Substring(i, endOfEntity - i + 1))
        : null;
    // Only treat the span as an entity if it decoded to a single character
    if (decoded != null && decoded.Length == 1)
    {
        int unicodeNumber = (int)decoded[0];
        output.Append("&#" + unicodeNumber + ";");
        i = endOfEntity; // continue parsing after the end of the entity
    }
    else
        output.Append(input[i]);
}
I may have an off-by-one error somewhere in there, but it should be close.
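The same parse, decode, and re-insert idea can also be sketched with a regular expression instead of a manual index loop. This is only one possible variant, not something tested against the Flash reader: the `EntityConverter` class and `ToNumericEntities` method are hypothetical names, and `WebUtility.HtmlDecode` is used in place of `HttpUtility.HtmlDecode` to avoid the System.Web reference. Names the decoder does not recognize are left untouched, as are existing numeric references and bare ampersands:

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class EntityConverter
{
    // Matches named entities such as &ldquo; or &amp;.
    // Numeric references like &#8220; do not match and are left alone.
    private static readonly Regex NamedEntity =
        new Regex(@"&[A-Za-z][A-Za-z0-9]*;");

    // Hypothetical helper: rewrite each named entity as a decimal numeric reference.
    public static string ToNumericEntities(string html)
    {
        return NamedEntity.Replace(html, match =>
        {
            string decoded = WebUtility.HtmlDecode(match.Value);

            // Unknown names come back unchanged from HtmlDecode; keep them as-is.
            if (decoded == match.Value)
                return match.Value;

            var sb = new StringBuilder();
            for (int i = 0; i < decoded.Length; i++)
            {
                // ConvertToUtf32 handles characters outside the BMP (surrogate pairs).
                int codePoint = char.ConvertToUtf32(decoded, i);
                if (char.IsSurrogatePair(decoded, i))
                    i++; // skip the low surrogate that was just consumed
                sb.Append("&#")
                  .Append(codePoint.ToString(CultureInfo.InvariantCulture))
                  .Append(';');
            }
            return sb.ToString();
        });
    }
}
```

Run on the example string from the question, `EntityConverter.ToNumericEntities` should turn `&amp;` into `&#38;` and `&ldquo;`/`&rdquo;` into `&#8220;`/`&#8221;`, while the `<p>` tags stay exactly as they were.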