How to filter a list of strings?

I have a list of strings that contain both non-English and English words. I want to keep only the English words.

Example:


phrases = [
    "S/O ???? ?????, ????? ?.-4??, S/O Ashok Kumar, Block no.-4D.",
    "???????-15, ????? 5. ????? ????? Street-15, sector -5, Civic Centre",
    "?????, ?????, ?????, ?????????, Bhilai, Durg. Bhilai, Chhattisgarh,",
]

My code so far:

import re
regex = re.compile("[^a-zA-Z0-9!@#$&()\\-`. ,/\"]+")
for i in phrases:
    print(regex.sub(' ', i))

My output:

["S/O , .-4 , S/O Ashok Kumar, Block no.-4D.",
  "-15, 5. Street-15, sector -5, Civic Centre",
  ", , , , Bhilai, Durg. Bhilai, Chhattisgarh",]

My desired output:

["S/O Ashok Kumar, Block no.-4D.",
 "Street-15, sector -5, Civic Centre",
 "Bhilai, Durg. Bhilai, Chhattisgarh,"]

Answer:

Looking at your data, it seems you could use the following:

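import regex as re  # third-party "regex" module (pip install regex); the built-in re does not support \p{...}
lst=["S/O ???? ?????, ????? ?.-4??, S/O Ashok Kumar, Block no.-4D.",
      "???????-15, ????? 5. ????? ????? Street-15, sector -5, Civic Centre",
      "?????, ?????, ?????, ?????????, Bhilai, Durg. Bhilai, Chhattisgarh,",]
for i in lst:
    # remove everything up to the last Devanagari letter, then up to the next word boundary
    print(re.sub(r'^.*\p{Devanagari}.+?\b', '', i))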

Prints:

S/O Ashok Kumar, Block no.-4D.
Street-15, sector -5, Civic Centre
Bhilai, Durg. Bhilai, Chhattisgarh,

See the online regex demo.

^ - start-of-string anchor;
.*\p{Devanagari} - 0+ (greedy) characters up to the last Devanagari letter;
.+?\b - 1+ (lazy) characters up to the first word boundary.
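If installing the third-party `regex` module is not an option, the same pattern can be approximated with the standard `re` module by spelling out the Devanagari Unicode block (U+0900 to U+097F) instead of `\p{Devanagari}`. This is a minimal sketch: the function name `strip_devanagari_prefix` and the sample strings are hypothetical, and the block range does not cover the Devanagari Extended blocks that the `\p{Devanagari}` script property also matches.

```python
import re

# Devanagari Unicode block, U+0900 to U+097F. The standard-library `re`
# module has no \p{...} support, so the range is written out explicitly.
DEVANAGARI_PREFIX = re.compile(r"^.*[\u0900-\u097F].+?\b")


def strip_devanagari_prefix(text: str) -> str:
    """Remove everything up to the last Devanagari character, then lazily
    up to the next word boundary (same idea as the regex-module pattern)."""
    return DEVANAGARI_PREFIX.sub("", text)


# Hypothetical sample: Hindi words followed by their English counterpart.
print(strip_devanagari_prefix("राम कुमार, S/O Ashok Kumar"))  # S/O Ashok Kumar
# A string without any Devanagari is left unchanged.
print(strip_devanagari_prefix("Civic Centre"))  # Civic Centre
```

Like the original pattern, this relies on the English part starting after the last Devanagari character; if a number or word that belongs to the Hindi part follows it, that leftover stays in the result.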

Answer:

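If you mean that your characters can only be standard English letters, that your regex already works for that, and that you just want to filter out the problematic ", , , ," values, you can do the following: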

def format_output(current_output):
    results = []
    for row in current_output:
        # split on the ","
        sub_elements = row.split(",")
        # whitespace-only pieces become "" after strip() and are filtered out
        filtered = [x.strip() for x in sub_elements if x.strip()]
        # then join the remaining elements together and append them to the results list
        results.append(", ".join(filtered))
    return results
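A minimal end-to-end sketch combining the question's character class with this comma filter. It assumes the stray trailing space in the question's pattern was meant to be `+` (one or more characters); the name `NON_ENGLISH` and the Hindi sample text are hypothetical stand-ins for the question's data.

```python
import re

# The question's character class, assuming "+" was intended after it.
NON_ENGLISH = re.compile(r'[^a-zA-Z0-9!@#$&()\-`. ,/"]+')


def format_output(current_output):
    results = []
    for row in current_output:
        # strip each comma-separated piece; empty/whitespace-only ones are dropped
        pieces = [p.strip() for p in row.split(",")]
        results.append(", ".join(p for p in pieces if p))
    return results


# Replace each run of non-English characters with a space, then clean up commas.
cleaned = [NON_ENGLISH.sub(" ", s) for s in ["राम, कुमार, Bhilai, Durg"]]
print(format_output(cleaned))  # ['Bhilai, Durg']
```

Note that this only removes empty pieces: a fragment such as ".-4" from the first row of the question's output survives, so the result is not exactly the desired output for every row.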

Answer:

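It looks to me as if the first part of each element in the list is the Hindi translation of the second part, with a one-to-one correspondence in the number of words.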

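So for the example you provided, and for any input that follows exactly the same pattern (it will break otherwise), all you need to do is take the second half of each element of the list.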

phrases = ["S/O ???? ?????, ????? ?.-4??, S/O Ashok Kumar, Block no.-4D.",
  "???????-15, ????? 5. ????? ????? Street-15, sector -5, Civic Centre",
  "?????, ?????, ?????, ?????????, Bhilai, Durg. Bhilai, Chhattisgarh,",]


mod_list = []
for s in phrases:
    # split into whitespace-separated words
    strg = s.split()
    n = len(strg)
    # keep only the second half (assumed to be the English part)
    tmp_list = strg[n // 2:]
    mod_list.append(' '.join(tmp_list))

print(mod_list)

Output:

['S/O Ashok Kumar, Block no.-4D.', 
'Street-15, sector -5, Civic Centre', 
'Bhilai, Durg. Bhilai, Chhattisgarh,']
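The same idea can be written as a single slice. This is a minimal sketch under the answer's assumption that the Hindi part has exactly as many words as the English part; the function name `second_half` and the sample strings are hypothetical. The second call shows how the cut lands in the wrong place when that assumption does not hold.

```python
def second_half(phrase: str) -> str:
    """Keep the second half of the whitespace-separated words.

    Assumes the first half is a word-for-word Hindi translation of the
    second half; otherwise the split point is wrong.
    """
    words = phrase.split()
    return " ".join(words[len(words) // 2:])


# Two Hindi words + two English words: the split point is correct.
print(second_half("राम कुमार Ram Kumar"))  # Ram Kumar
# One Hindi word + three English words: "Ashok" is lost.
print(second_half("अशोक Ashok Kumar Street"))  # Kumar Street
```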
