Regex to extract a 6-character code

nrerum, posted 2019-03-09, tagged python, last updated 2019-03-09 14:38

I have a data file in the following format:

   1 AA/BB                  0C89JG
   2 ABANO/ANA VICTORIA     F12LFJ
   3 ABBOUDLASTNAME/ABBOUDF DWPTHC
   4 ABDALLAH/SIJAM         H0ZDM9
   5 ABDEL MESSIH/DINA      T0SF8N
   6 ABHISHEK/PRAMANIK      7SLKXV
   7 ABHYANKAR/DHANANJAY    7SM0BV
   8 ABOUSALAMA/FEMKE       LTTRQC
   9 ABRAMOVA/NATALIA       77LCPZ
  10 ABRANTES/JOAO          KXZC7Q
  11 ABRATH/LUC             D5J99J
  12 ABREO/HECTOR           CXDH4G
  13 ABREU/ANDREA           242GRC
  14 ABREU/MARCELO          2436R7
  15 ABREU/VANDA            3HDNQQ
  16 ABTS/NATHALIE          DSK9TN
  17 ABTS/NATHALIE          FZ0LN4 
I am trying to extract the last 6 characters of each line, e.g. FZ0LN4 from line 17. The regex I came up with is:
([0-9]{1,5})([A-Z /]) ([0-9A-Z]{6})
But it does not work. Can anyone point out what the problem is?

est_ut

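There are a couple of problems:

  • The pattern does not account for all of the whitespace (for example, the spaces between the row number and the name).
  • [A-Z /] has no repetition operator, so it matches exactly one character.
I would rewrite the regex like this: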
In [8]: re.match(r'\s*(\d+)\s*([A-Z /]+?)\s*(\w+)$', '  15 ABREU/VANDA            3HDNQQ').groups()
Out[8]: ('15', 'ABREU/VANDA', '3HDNQQ')
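To see why the original pattern fails, it can help to run it next to a repaired version. The sketch below is only an illustration: `fixed` is one possible repair, not necessarily the one you need, and it assumes a single space after the row number and codes made up only of digits and uppercase letters.

```python
import re

line = "  17 ABTS/NATHALIE          FZ0LN4"

# The pattern from the question: [A-Z /] matches exactly one character,
# and only one literal space is allowed before the code.
original = r"([0-9]{1,5})([A-Z /]) ([0-9A-Z]{6})"

# One possible repair (an assumption, not the only option):
# - "+?" lets the name span several characters, as few as possible
# - " +" absorbs the run of padding spaces
# - "\s*$" anchors the code at the end of the line
fixed = r"([0-9]{1,5}) ([A-Z /]+?) +([0-9A-Z]{6})\s*$"

original_match = re.search(original, line)
fixed_match = re.search(fixed, line)

print(original_match)        # None
print(fixed_match.groups())  # ('17', 'ABTS/NATHALIE', 'FZ0LN4')
```

The end-of-line anchor matters for names that contain a space: without it, a six-letter word such as MESSIH in "ABDEL MESSIH/DINA" could be mistaken for the code.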
If you only need the last six characters, you don't need a regex at all:
In [15]: s = '  15 ABREU/VANDA            3HDNQQ'
In [16]: s[-6:]
Out[16]: '3HDNQQ'
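One caveat with plain slicing: it counts trailing whitespace too, and a line read from a file usually still ends with a newline. A minimal sketch, assuming the code is always the last non-whitespace text on the line:

```python
# A line with a trailing space, as can happen in real data files.
line = "  17 ABTS/NATHALIE          FZ0LN4 "

naive = line[-6:]            # the slice includes the trailing space
code = line.rstrip()[-6:]    # strip trailing whitespace first

print(repr(naive))  # 'Z0LN4 '
print(repr(code))   # 'FZ0LN4'
```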

eomnis

This is easy to do without a regex:

st='''\
   1 AA/BB                  0C89JG
   2 ABANO/ANA VICTORIA     F12LFJ
   3 ABBOUDLASTNAME/ABBOUDF DWPTHC
   4 ABDALLAH/SIJAM         H0ZDM9
   5 ABDEL MESSIH/DINA      T0SF8N
   6 ABHISHEK/PRAMANIK      7SLKXV
   7 ABHYANKAR/DHANANJAY    7SM0BV
   8 ABOUSALAMA/FEMKE       LTTRQC
   9 ABRAMOVA/NATALIA       77LCPZ
  10 ABRANTES/JOAO          KXZC7Q
  11 ABRATH/LUC             D5J99J
  12 ABREO/HECTOR           CXDH4G
  13 ABREU/ANDREA           242GRC
  14 ABREU/MARCELO          2436R7
  15 ABREU/VANDA            3HDNQQ
  16 ABTS/NATHALIE          DSK9TN
  17 ABTS/NATHALIE          FZ0LN4'''
for line in st.splitlines():
    print(line.split()[-1])
Prints:
0C89JG
F12LFJ
DWPTHC
H0ZDM9
T0SF8N
7SLKXV
7SM0BV
LTTRQC
77LCPZ
KXZC7Q
D5J99J
CXDH4G
242GRC
2436R7
3HDNQQ
DSK9TN
FZ0LN4
Or, if you only want the nth one, like this:
>>> li=[line.split()[-1] for line in st.splitlines()]
>>> li[-1]
'FZ0LN4'
>>> li[-2]
'DSK9TN'    # etc etc
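Note that the list is indexed by position, not by the number printed in the first column. If you would rather look a code up by its row number, a dict is one option. This is a rough sketch; the name `codes_by_row` is made up for illustration, and it assumes every useful line has at least three whitespace-separated fields with an integer first.

```python
# A shortened copy of the sample data, for illustration only.
st = """\
   1 AA/BB                  0C89JG
  16 ABTS/NATHALIE          DSK9TN
  17 ABTS/NATHALIE          FZ0LN4"""

codes_by_row = {}
for line in st.splitlines():
    fields = line.split()
    if len(fields) < 3:
        # Skip blank or malformed lines instead of raising.
        continue
    # First field is the row number, last field is the code.
    codes_by_row[int(fields[0])] = fields[-1]

print(codes_by_row[17])  # FZ0LN4
```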
Or, if you really want a regex:
>>> re.findall(r'\s(\S{6})$',st,re.MULTILINE)
['0C89JG', 'F12LFJ', 'DWPTHC', 'H0ZDM9', 'T0SF8N', '7SLKXV', '7SM0BV', 'LTTRQC', '77LCPZ', 'KXZC7Q', 'D5J99J', 'CXDH4G', '242GRC', '2436R7', '3HDNQQ', 'DSK9TN', 'FZ0LN4']
>>> re.findall(r'\s(\S{6})$',st,re.MULTILINE)[-1]
'FZ0LN4'
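The re.MULTILINE flag is what makes this work on a multi-line string. Without it, $ only matches at the very end of the string (or just before a final newline), so you would get only the last code. A small sketch on a shortened copy of the data:

```python
import re

text = (
    "   1 AA/BB                  0C89JG\n"
    "   2 ABANO/ANA VICTORIA     F12LFJ\n"
    "  17 ABTS/NATHALIE          FZ0LN4"
)
pattern = r"\s(\S{6})$"

# Without MULTILINE, $ anchors only at the end of the whole string.
without_flag = re.findall(pattern, text)
# With MULTILINE, $ also matches just before each newline.
with_flag = re.findall(pattern, text, re.MULTILINE)

print(without_flag)  # ['FZ0LN4']
print(with_flag)     # ['0C89JG', 'F12LFJ', 'FZ0LN4']
```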

eet

Use $ to anchor the match at the end of the line and \S to match non-whitespace characters:

>>> import re
>>> s = '''   1 AA/BB                  0C89JG
   2 ABANO/ANA VICTORIA     F12LFJ
   3 ABBOUDLASTNAME/ABBOUDF DWPTHC
   4 ABDALLAH/SIJAM         H0ZDM9
   5 ABDEL MESSIH/DINA      T0SF8N
   6 ABHISHEK/PRAMANIK      7SLKXV
   7 ABHYANKAR/DHANANJAY    7SM0BV
   8 ABOUSALAMA/FEMKE       LTTRQC
   9 ABRAMOVA/NATALIA       77LCPZ
  10 ABRANTES/JOAO          KXZC7Q
  11 ABRATH/LUC             D5J99J
  12 ABREO/HECTOR           CXDH4G
  13 ABREU/ANDREA           242GRC
  14 ABREU/MARCELO          2436R7
  15 ABREU/VANDA            3HDNQQ
  16 ABTS/NATHALIE          DSK9TN
  17 ABTS/NATHALIE          FZ0LN4'''
>>> re.findall(r'\S{6}$', s, re.MULTILINE)
['0C89JG', 'F12LFJ', 'DWPTHC', 'H0ZDM9', 'T0SF8N', '7SLKXV', '7SM0BV', 'LTTRQC', '77LCPZ', 'KXZC7Q', 'D5J99J', 'CXDH4G', '242GRC', '2436R7', '3HDNQQ', 'DSK9TN', 'FZ0LN4']
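Be aware that a bare \S{6}$ accepts the tail of any longer token and misses codes followed by trailing spaces. If that matters for your data, a stricter variant could look like the sketch below. `CODE_AT_END` is a made-up name, the "18" row is invented test data, and the pattern assumes codes consist only of digits and uppercase letters.

```python
import re

# Hypothetical stricter pattern:
# (?<!\S)  the code must start a token (no non-space character before it)
# [ \t]*$  trailing spaces or tabs are tolerated before the end of the line
CODE_AT_END = re.compile(r"(?<!\S)([0-9A-Z]{6})[ \t]*$", re.MULTILINE)

text = (
    "   1 AA/BB                  0C89JG\n"
    "  17 ABTS/NATHALIE          FZ0LN4 \n"    # trailing space
    "  18 TOO/LONG               ABCDEFGH"     # invented row, 8-char token
)

loose = re.findall(r"\S{6}$", text, re.MULTILINE)
strict = CODE_AT_END.findall(text)

print(loose)   # ['0C89JG', 'CDEFGH']
print(strict)  # ['0C89JG', 'FZ0LN4']
```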

et_et

If you only need the string at the end of the line, you can use a simpler regex, for example: \b\w{6}\b$
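A quick sketch of how that pattern behaves on a single line. The word boundaries reject the tail of a longer word, but keep in mind that \w also matches lowercase letters and underscores. The second line is invented test data.

```python
import re

pattern = re.compile(r"\b\w{6}\b$")

good = pattern.search("  15 ABREU/VANDA            3HDNQQ")
# Invented line whose last token is 8 characters long: there is no word
# boundary inside ABCDEFGH, so the pattern does not match its tail.
too_long = pattern.search("  18 TOO/LONG               ABCDEFGH")

print(good.group(0))  # 3HDNQQ
print(too_long)       # None
```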

vanimi

Are you looking for the last line (17)? If so, just run re.search over the whole string:

import re
myString="""
   1 AA/BB                  0C89JG
   2 ABANO/ANA VICTORIA     F12LFJ
   3 ABBOUDLASTNAME/ABBOUDF DWPTHC
   4 ABDALLAH/SIJAM         H0ZDM9
   5 ABDEL MESSIH/DINA      T0SF8N
   6 ABHISHEK/PRAMANIK      7SLKXV
   7 ABHYANKAR/DHANANJAY    7SM0BV
   8 ABOUSALAMA/FEMKE       LTTRQC
   9 ABRAMOVA/NATALIA       77LCPZ
  10 ABRANTES/JOAO          KXZC7Q
  11 ABRATH/LUC             D5J99J
  12 ABREO/HECTOR           CXDH4G
  13 ABREU/ANDREA           242GRC
  14 ABREU/MARCELO          2436R7
  15 ABREU/VANDA            3HDNQQ
  16 ABTS/NATHALIE          DSK9TN
  17 ABTS/NATHALIE          FZ0LN4
"""
m = re.search(r"(\S{6})$", myString)
if m:
    print(m.group(1))
If you need to find a specific line, iterate over the lines individually:
for line in myString.split("\n"):
    m = re.search(r"^\s*17\s*.*(\S{6})$", line)
    if m:
        print(m.group(1))
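If the file can grow beyond 99 rows, note that a row number followed by \s* and then .* would also match rows such as 170 or 171 when searching for 17. One way around that is to require at least one whitespace character after the number. The helper below is a hypothetical sketch (`code_for_row` is not from any library), and the "170" row is invented test data.

```python
import re

def code_for_row(text, row):
    """Return the 6-character code on the line numbered `row`, or None."""
    # \s+ after the number keeps row 17 from also matching 170, 171, ...
    # The trailing \s*$ tolerates whitespace at the end of the line.
    pattern = re.compile(r"^\s*%d\s+.*\s(\S{6})\s*$" % row)
    for line in text.splitlines():
        m = pattern.search(line)
        if m:
            return m.group(1)
    return None

sample = (
    " 170 SOME/ONE               ABC123\n"    # invented row
    "  17 ABTS/NATHALIE          FZ0LN4\n"
)

print(code_for_row(sample, 17))   # FZ0LN4
print(code_for_row(sample, 170))  # ABC123
print(code_for_row(sample, 99))   # None
```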