Regular Expression Text Function for substring extraction
A regular express text function would be very useful to extract content from a Text field into a column.
For example, if a Description field has:
I can write a regex formula to extract out the value for Name: into a Column.
Will add another function to extract a matching substring. A longer workaround is already available: please see the comment.
Thanks for the kind words :) Yes, please do post ideas!
Mikal H Henriksen commented
Thanks for the quick reply, Igor! The REPLACE trick can do basically all I asked for, and even handles multiple capture groups :) And thanks for the tip about the flags. I'm familiar with Java's Matcher, so knowing that that's what's behind the curtain is a huge help.
10/10 service A++++ would post idea again!
David Yu commented
Thanks Igor, the REPLACE with regex backreference is a neat-trick. Would be useful to have it doc'ed because I reference the material all the time.
David, Mikal, thank you! That's a great idea.
I think it's currently possible, although it's not very convenient and, frankly, poorly documented. What you can do is to use the formula column and REPLACE function, using regular expressions. See https://wiki.almworks.com/display/structure/Expr+Pattern+Matching for starters.
What we failed to mention is that you can use $1, $2, etc. in the replacement string to refer to the groups in the regex – since this is all done with the Java's Matcher.replaceAll() method.
So, for example, if you have field value "Name: Value", you can extract the value with a formula
REPLACE(Field; '/Name: (.*)/'; '$1')
Note that it will only replace the matching parts, which means that if you want to _extract_ the value, you will need to match the whole string, probably using ".*" parts. If you're dealing with Description or other multi-line fields, you will need to add a flag to have "." match the end-of-line characters ("\n"). So if you have a Description that somewhere contains this name: value text, you would use
REPLACE(Description; '''/(?sm).*Name:([^\n]+).*/'; '$1')
Also, if there's no match, no replacement takes place and you get the original value of the field. To have an empty result in such cases, you will need to use IF() and MATCH() with a similar regex.
We could have a function like EXTRACT(), which would make this all much simpler. We'll add it to our plans! Also, we'll need to update the documentation with the information I wrote here.
Mikal H Henriksen commented
I think it would need to be more flexible than just a fixed pattern like that. For example it would be fantastic to be able to capture release note text from description with a pattern like "h4\. Release Notes (.*-) h4\.", or pick out specific labels like "theme/(.*)" and "feedback (.*)"
Dunno if you want to go for something simpler, than regular expression syntax, but it would be a really powerful tool to have, especially for fields that can have multiple values like links, labels, fix version etc.
AdminJulia (Product Owner, ALM Works) commented
Thank you for your request! Small question: do you know the exact pattern, is it always "Name: Value" format?
ALM Works Team