I'm taking a quick hiatus from our Trigger Optimization 101 series to discuss how not to write triggers. I had the good fortune, recently, of crushing an ExtraHop appliance through a subtle mistake in some trigger code. Take a quick peek at the incoming data breakdown.
If I'm doing my math right, 170Mb/s is less than the advertised 20Gb/s. How did I achieve such poor performance?
Regular Expression + Infinite Loops => Profit
Both exec and match let us use regular expressions to pull fields from a string using capture groups. However, with exec, we can loop to extract multiple matches. Take the following string:
username:kenp username:dillonf
I want to pull out the usernames, so I'll use /username:(\w+)/g for my regex. Breaking it down:
/ -- start the regex definition
username: -- look for the exact string "username:"
( -- start capture group
\w -- match word character (all letters, numbers, and the underscore)
+ -- keep matching preceding character (word characters) until we hit a non-match (a non-word character)
) -- end capture group
/g -- end the regex definition and apply the global flag, indicating that we want to find as many matches in the string as possible
If we run the following code:
var str = 'username:kenp username:dillonf';
console.log(/username:(\w+)/g.exec(str));
console.log(str.match(/username:(\w+)/g));
Output:
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
["username:kenp", "username:dillonf"]
Comparing the two methods, both returned an array of values. However, exec pulled out the username "kenp" but never got to "dillonf", while match returned both full matches but didn't extract the actual usernames.
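A quick side note that makes the trade-off clearer: the difference comes from the g flag rather than from match itself. Here is a minimal sketch reusing the string from above (the variable names firstOnly and allMatches are only for illustration):

```javascript
var str = 'username:kenp username:dillonf';

// Without the g flag, match returns the first match plus its capture groups,
// the same shape of result that exec gives.
var firstOnly = str.match(/username:(\w+)/);
console.log(firstOnly[0]); // "username:kenp"
console.log(firstOnly[1]); // "kenp"

// With the g flag, match returns every full match but drops the capture groups.
var allMatches = str.match(/username:(\w+)/g);
console.log(allMatches.length); // 2
```

Either way, a single call gives us one username at most, so we still need something else to collect them all.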
To work around these limitations, exec can be called multiple times in a loop: each call finds the next substring that fits the regex and extracts its capture groups. When the string is exhausted and there are no more matches, exec returns null. With this information, I naively put together code like the following:
var str = 'username:kenp username:dillonf';
var match;
while ( (match = /username:(\w+)/g.exec(str)) !== null) {
    console.log(match);
}
Did you try it? Did the results look something like this forever?
...
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
...
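If you'd rather not lock up your console (or an appliance) to see this for yourself, here is a sketch of the same loop with a safety cap bolted on. The MAX_ITERATIONS cap and the indexes array are my additions for illustration and are not part of the original trigger code:

```javascript
var str = 'username:kenp username:dillonf';
var match;
var iterations = 0;
var MAX_ITERATIONS = 5; // safety cap added for this sketch only
var indexes = [];

while ((match = /username:(\w+)/g.exec(str)) !== null) {
    indexes.push(match.index); // remember where each match was found
    iterations += 1;
    if (iterations >= MAX_ITERATIONS) {
        break; // without this, the loop never ends
    }
}

console.log(iterations); // 5, we only stopped because of the cap
console.log(indexes);    // [0, 0, 0, 0, 0], the same match every time
```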
What Happened?
To understand what's happening, let's break down the while loop:
/username:(\w+)/g -- create a new regex
exec(str) -- execute regex against our string
match = -- assign value to match
!== null -- check that the match is not null (null means the regex didn't match anything)
while (...) { -- while the match is not null, keep looping
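With the g flag, exec remembers where the previous match ended by storing that position in the regex object's lastIndex property. A regex literal in the loop condition creates a brand-new regex object on every iteration, so lastIndex starts back at 0 each time and we lose the index of the previous match. The regex then matches the same substring again, causing an infinite loop. The fix is easy enough. Create the regex once, outside of the loop, and reuse it: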
var str = 'username:kenp username:dillonf';
var str_re = /username:(\w+)/g;
var match;
while ( (match = str_re.exec(str)) !== null) {
    console.log(match);
}
Output:
["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"]
["username:dillonf", "dillonf", index: 14, input: "username:kenp username:dillonf"]
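To watch the bookkeeping in action, here is a sketch that records the regex's lastIndex property (the position where the next exec call starts searching) after each match, and collects just the captured usernames along the way. The usernames and positions arrays are hypothetical names for this example:

```javascript
var str = 'username:kenp username:dillonf';
var str_re = /username:(\w+)/g; // created once, so it keeps its lastIndex
var usernames = [];
var positions = [];
var match;

while ((match = str_re.exec(str)) !== null) {
    usernames.push(match[1]);         // the capture group, e.g. "kenp"
    positions.push(str_re.lastIndex); // where the next exec call will start
}

console.log(usernames);        // ["kenp", "dillonf"]
console.log(positions);        // [13, 30]
console.log(str_re.lastIndex); // 0, exec resets it after the final failed match
```

Because exec resets lastIndex to 0 once it runs out of matches, the same regex object can be reused on the next string without any cleanup.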
Moral of the story: be careful when using regular expressions and loops. That's it for now, until I make another catastrophic mistake. Have a good week everybody!