
Regex To Match Key In Yaml

I have a YAML file that looks like this. A user can define any number of keys such as xyz_flovor_id, where the _flovor_id suffix is common to all of them. The aim is to find every *_flavor_id key and extract its value.

Solution 1:

You get that error, because the value for the key server is not a string, but a dict (or a subclass of dict). That is what the YAML mapping in your input, which includes the key abc_flavor_id, is loaded as.
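To make the cause concrete, here is a minimal sketch. It assumes the error in question is the TypeError that the re module raises for non-string input. The dict literal is a hand-written stand-in for what a YAML loader would return (so no YAML library is needed), and the pattern is a hypothetical one, not the code from the question.

```python
import re

# Hand-written stand-in for the loaded YAML document (assumed shape).
data = {"server": {"tenant": "admin", "xyz_flovor_id": 1, "abc_flavor_id": 2}}

pattern = re.compile(r".*_flavor_id")  # hypothetical pattern for illustration

try:
    # The value under 'server' is a dict, not a str, so re rejects it.
    pattern.match(data["server"])
    error = None
except TypeError as exc:
    error = exc

print(type(data["server"]).__name__)
print(repr(error))

# Matching against each key (which is a str) works fine.
matching = [key for key in data["server"] if pattern.match(key)]
print(matching)
```

The point is only that the regular expression has to be applied to the individual keys, which are strings, rather than to the mapping itself.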

Apart from that, it is always a bad idea to use regular expressions to parse YAML (or any other structured text format such as HTML, XML, or CSV), as it is difficult, if not impossible, to capture every nuance of the grammar. If it weren't, you would not need a parser.

For example, a minor change to the file, such as adding a comment that tells whoever edits the file which value needs updating, breaks the simplistic regular expression approaches:

server:
  tenant: "admin"
  availability_zone: "nova"
  cpu_overcommit_ratio: 1:1
  memory_overcommit_ratio: 1:1
  xyz_flovor_id: 1
  abc_flavor_id:  # extract the value for this key
    2

The YAML document above is semantically identical to yours, but the other answers posted so far no longer work on it.

If some YAML load/save operation transforms your input into (again semantically equivalent):

server: {abc_flavor_id: 2, availability_zone: nova,
  cpu_overcommit_ratio: 61, memory_overcommit_ratio: 61,
  tenant: admin, xyz_flovor_id: 1}

then tweaking a dumb regular expression will not begin to suffice (this is not a contrived example; it is the default way to dump your data structure in PyYAML and in ruamel.yaml using 'safe'-mode).
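As a rough illustration of this fragility, the sketch below runs the pattern posted in Solution 2 over three spellings of the same data: a trimmed block-style version, the variant with the comment, and the flow-style dump. The helper name find_pairs and the trimmed plain sample are made up for this example.

```python
import re

# Pattern copied from Solution 2.
regex = re.compile(r"^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)", re.MULTILINE)

plain = """\
server:
  xyz_flovor_id: 1
  abc_flavor_id: 2
"""

commented = """\
server:
  xyz_flovor_id: 1
  abc_flavor_id:  # extract the value for this key
    2
"""

flow = """\
server: {abc_flavor_id: 2, availability_zone: nova,
  cpu_overcommit_ratio: 61, memory_overcommit_ratio: 61,
  tenant: admin, xyz_flovor_id: 1}
"""


def find_pairs(text):
    """Return (key, value) string pairs found by the line-based regex."""
    return [(m.group("key"), m.group("value")) for m in regex.finditer(text)]


print(find_pairs(plain))      # finds the abc_flavor_id pair
print(find_pairs(commented))  # finds nothing: the value sits on the next line
print(find_pairs(flow))       # finds nothing: the key is not at the start of a line
```

A YAML parser treats all three inputs as the same mapping, which is why matching on the loaded keys is the more robust route.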

What you need to do is match the regular expression against the keys of the value associated with server, not against the whole document:

import re
import sys
from ruamel.yaml import YAML

yaml_str = """\
server:
  tenant: "admin"
  availability_zone: "nova"
  cpu_overcommit_ratio: 1:1
  memory_overcommit_ratio: 1:1
  xyz_flovor_id: 1
  abc_flavor_id:  # extract the value for this key
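    2
"""


def get_flavor_keys(params):
    # Match keys of the form <prefix>_flavor_id and capture the prefix.
    pattern = re.compile(r'(?P<key>.*)_flavor_id')
    ret_val = {}
    # Only inspect the keys of the mapping stored under 'server'.
    for key in params['server']:
        m = pattern.match(key)
        if m is not None:
            ret_val[m.group('key')] = params['server'][key]
            print('test', m.group('key'))  # debug output
    return ret_val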

yaml = YAML(typ='safe')
data = yaml.load(yaml_str)
keys = get_flavor_keys(data)
print(keys)

This prints the debug line test abc, followed by:

{'abc': 2}

(The xyz_flovor_id key of course doesn't match, but maybe that is a typo in your post.)
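One detail worth noting about the pattern above: pattern.match only anchors at the start of the key, so a key such as abc_flavor_id_old (a made-up name) would be accepted too. If only keys that end in _flavor_id should count, re.fullmatch is one option. The following is a minimal sketch on a hand-written dict standing in for the loaded data; the function name get_flavor_values is hypothetical.

```python
import re

# Require at least one character before the suffix, and nothing after it.
pattern = re.compile(r"(?P<key>.+)_flavor_id")


def get_flavor_values(server):
    """Return {prefix: value} for keys that end in _flavor_id."""
    result = {}
    for key, value in server.items():
        m = pattern.fullmatch(key)  # the whole key must match
        if m is not None:
            result[m.group("key")] = value
    return result


# Stand-in for data['server'] after loading the YAML (assumed shape).
server = {
    "tenant": "admin",
    "xyz_flovor_id": 1,
    "abc_flavor_id": 2,
    "abc_flavor_id_old": 9,  # made-up key that should be ignored
}
flavors = get_flavor_values(server)
print(flavors)
```

Whether the stricter match is wanted depends on which key names can actually occur in the file.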

Solution 2:

You need this regex, which captures the key and the value in named groups:

^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)

Python demo: https://repl.it/Lk5W/0

import re

regex = r"^\s*(?P<key>\w+_flavor_id):\s*(?P<value>\d+)"

test_str = (
    "  server:\n"
    "    tenant: \"admin\"\n"
    "    availability_zone: \"nova\"\n"
    "    cpu_overcommit_ratio: 1:1\n"
    "    memory_overcommit_ratio: 1:1\n"
    "    xyz_flavor_id: 1\n"
    "    abc_flavor_id: 2\n"
)

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches):
    print("{key}:{value}".format(key=match.group('key'), value=match.group('value')))

Solution 3:

You can use this regex:

\b[^_\n]+_flavor_id:\s*(\d+)


Regex Explanation:

  • \b - word boundary
  • [^_\n]+ - 1+ occurrences of any character that is neither an _ nor a newline character
  • _flavor_id: - matches _flavor_id: literally
  • \s* - matches 0+ occurrences of a white space character
  • (\d+) - matches and captures 1+ digits. This is the value that you needed.
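The pieces listed above can be tried out on a short sample. This is only a sketch: the two-line sample string is cut down from the original input, and my_big_flavor_id is a made-up key used to show one consequence of the [^_\n]+ part.

```python
import re

regex = re.compile(r"\b[^_\n]+_flavor_id:\s*(\d+)")

sample = "    xyz_flavor_id: 1\n    abc_flavor_id: 2"

# With a single capture group, findall returns the captured digits as strings.
values = regex.findall(sample)
print(values)

first = regex.search(sample)
print(first.group(0))  # the whole match, starting at the word boundary
print(first.group(1))  # only the captured value

# Because [^_\n]+ cannot contain an underscore, a key whose prefix itself
# contains one (a made-up example here) is not matched at all.
no_match = regex.search("my_big_flavor_id: 3")
print(no_match)
```

If prefixes with underscores are possible, the character class would have to be loosened, for instance to \w+ as in Solution 2.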

I am not well versed in Python, but regex101 can generate the code, so I am pasting it here for you to use.

import re

regex = r"\b[^_\n]+_flavor_id:\s*(\d+)"

test_str = (
    "server:\n"
    "    tenant: \"admin\"\n"
    "    availability_zone: \"nova\"\n"
    "    cpu_overcommit_ratio: 1:1\n"
    "    memory_overcommit_ratio: 1:1\n"
    "    xyz_flavor_id: 1\n"
    "    abc_flavor_id: 2"
)

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

