Regular expressions are a developer’s best friend. Seasoned programmers can wield regular expressions to extract structured information from often near-random input. And Ruby’s explicit syntax for regular expressions makes adding a little order to your data chaos a cinch. Turns out, the syntax works well for the opposite too: creating random data from simple expressions.
Randexp allows you to use a regular expression to generate a random string that matches it. Say you have a model with a serial_number property that validates against a regular expression.
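The model itself isn't shown here, so what follows is only a hypothetical stand-in: a plain Ruby class with no ORM behind it. The names Widget, SERIAL_FORMAT, and valid? are made up for illustration; only the pattern comes from the example that follows.

```ruby
# Hypothetical model: plain Ruby, no ORM. All names here are illustrative.
class Widget
  # Anchored so the whole value must match, not just a substring of it.
  SERIAL_FORMAT = /\AXX\d{4}-\w-\d{5}\z/

  attr_accessor :serial_number

  def initialize(serial_number)
    @serial_number = serial_number
  end

  # True when the serial number fits the expected format.
  def valid?
    SERIAL_FORMAT.match?(serial_number.to_s)
  end
end

Widget.new("XX3770-M-33114").valid? # => true
Widget.new("3770-M").valid?         # => false
```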
With our regular expression, we can generate random strings that always match it using the generate (or gen, for short) method.
/XX\d{4}-\w-\d{5}/.generate #=> "XX3770-M-33114"
The generate and gen methods are added to the Regexp class when the randexp gem is required. They construct a Randexp object from the regex’s source, which is then ‘reduced’ into a string. The Randgen class generates the actual random values, and it can be extended to allow for more complex expressions, which is covered later.
Right now, there is support for the single character matchers word (\w), whitespace (\s), and decimal (\d), along with literals and the multiplicity operators (*, +, ?, {}). One caveat, though: most expressions raise errors when combined with the * or + operator.
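To make the idea concrete, here is a toy version of that flow. It is not randexp's code: TinyGen and tiny_gen are invented names, and the sketch only understands literals, \d, \w (as one uppercase letter), and a fixed {n} count.

```ruby
# A toy generator, NOT randexp's implementation. It handles only literals,
# \d, \w (as a single uppercase letter), and a fixed {n} repeat count.
class TinyGen
  LETTERS = ("A".."Z").to_a
  DIGITS  = ("0".."9").to_a

  def initialize(source)
    @source = source
  end

  # Scan the source left to right and emit random text for each token.
  def reduce
    @source.scan(/(\\[dw]|[^\\{}])(?:\{(\d+)\})?/).map do |token, count|
      Array.new((count || 1).to_i) { pick(token) }.join
    end.join
  end

  private

  def pick(token)
    case token
    when "\\d" then DIGITS.sample
    when "\\w" then LETTERS.sample
    else token # a literal character stands for itself
    end
  end
end

class Regexp
  # Hypothetical method name, chosen so it cannot clash with randexp's gen.
  def tiny_gen
    TinyGen.new(source).reduce
  end
end

/XX\d{4}-\w-\d{5}/.tiny_gen # a string shaped like "XX3770-M-33114"
```

The point of the sketch is the hand-off: the Regexp only contributes its source string, and a separate object decides what random text each piece of that source turns into.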
/Aa{3}h*!/.gen
# => RuntimeError: Sorry, "h*" is too vague, try setting a range: "h{0,3}"
/Aa{3}h{3,15}!/.gen
# => "Aaaahhhhh!"
/(never gonna (give you up|let you down), )*/.gen
# => RuntimeError: Sorry, "(...)*" is too vague, try setting a range: "(...){0, 3}"
/(never gonna (give you up|let you down), ){3,5}/.gen
# => "never gonna give you up, never gonna let you down, never gonna give you up, never gonna give you up, "
The exception is the word matcher, which is treated as a random word rather than raising an error. If a specific length or range is given for a word matcher, a word of suitable length is generated.
/\w{10}/.gen # a word with 10 letters
# => "Chaucerism"
/\w{5,15}/.gen
# => "cabalistic"
This is still a bit cryptic, but the [:method_name:] syntax can be used to clean it up a bit; it calls the class-level method of that name on the Randgen class.
/[:word:]/.gen
# => "deutomala"
/[:sentence:]/.gen
# => "Antiphonically electrotellurograph chromatype proczarist plumet"
/[:paragraph:]/.gen
# => "Sesquioxide conationalistic paragoge dingus unsteadfast tenophyte
#    goetic phytonomy hebephrenia rix uninjured biventral. Householdry clunk
#    amateur ramekin baronet chirotonsory mythical hobbist semblative
#    cubonavicular outbrother templeward thaumatology velutina dharmasmriti
#    kassak. Persecutor wudu bertie deputative carburant."
Extending Randgen
You can add class-level methods to the Randgen class, which can then be used within your regular expressions via the [:xxx:] syntax.
class Randgen
  def self.serial_number(options = {})
    /XX\d{4}-\w-\d{5}/.gen
  end
end
/[:serial_number:]/.gen
#=> "XX3770-M-33114"
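One way such a lookup could work is a plain substitution over the pattern's source. This is a guess at the mechanism, not randexp's code: ToyGen stands in for Randgen, and expand is an invented helper.

```ruby
# Stand-in for Randgen; the real class and its lookup logic may differ.
class ToyGen
  def self.greeting
    %w[hello howdy hi].sample
  end

  def self.serial_number
    "XX" + Array.new(4) { rand(10) }.join
  end
end

# Replace each [:name:] token with the result of calling ToyGen.name.
# Only methods defined directly on ToyGen are allowed, so tokens such as
# [:new:] or [:name:] cannot reach inherited class methods.
def expand(source)
  source.gsub(/\[:(\w+):\]/) do
    name = Regexp.last_match(1).to_sym
    unless ToyGen.singleton_methods(false).include?(name)
      raise ArgumentError, "unknown generator: #{name}"
    end
    ToyGen.public_send(name)
  end
end

expand("[:greeting:], your serial is [:serial_number:]")
```

Adding a new token is then just a matter of defining another class-level method, which is the same experience the [:xxx:] syntax gives you.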
Under the Hood
There are two major steps involved in generating the random string. First, the Parser class converts the regular expression into a nifty little s-expression, which is stored in the Randexp instance.
Randexp.new("(a|b)\\w*").sexp
# => [:union,
#      [:intersection,
#        [:literal, "a"],
#        [:literal, "b"]],
#      [:quantify,
#        [:random, :w],
#        :*]]
This sexp is then ‘reduced’ to the random output by the Reducer class, which walks the sexp constructing the string.
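A walk like that can be sketched in a few lines. This is a toy, not the real Reducer: it assumes, from the shape of the sexp shown above, that :union joins its children in order, :intersection picks one of them, and :quantify repeats a child. ToyReducer and its method names are invented.

```ruby
# Toy reducer for the sexp shape shown above. Assumptions: :union joins its
# children in order, :intersection picks one child, and :quantify repeats a
# child an Integer or Range number of times.
module ToyReducer
  module_function

  def reduce(sexp)
    type, *args = sexp
    case type
    when :literal      then args.first
    when :random       then random_char(args.first)
    when :union        then args.map { |child| reduce(child) }.join
    when :intersection then reduce(args.sample)
    when :quantify     then quantify(*args)
    else raise ArgumentError, "unknown node: #{type.inspect}"
    end
  end

  # An unbounded count such as :* has no sensible random length, so reject it.
  def quantify(child, count)
    case count
    when Integer then Array.new(count) { reduce(child) }.join
    when Range   then Array.new(rand(count)) { reduce(child) }.join
    else raise ArgumentError, "#{count.inspect} is unbounded, give a count or range"
    end
  end

  def random_char(kind)
    case kind
    when :d then rand(10).to_s
    when :s then " "
    when :w then ("a".."z").to_a.sample
    else raise ArgumentError, "unknown matcher: #{kind.inspect}"
    end
  end
end

sexp = [:union,
        [:intersection, [:literal, "a"], [:literal, "b"]],
        [:quantify, [:random, :d], 2..4]]
ToyReducer.reduce(sexp) # a string such as "b407"
```

Rejecting :* in quantify mirrors the "too vague" errors earlier in the post: without an upper bound there is nothing to pick a repeat count from.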
Dictionary
Randomly generated words are not actually generated. Instead they are picked from a dictionary of words loaded from your local words file, which typically holds thousands of words. So it’s got plenty to choose from. The words are also mapped by size, allowing you to generate words of a specific length, or within a range.
/\w{2,6} \w{10,20}, inc/.gen
#=> "mold forethoughtfulness, inc"
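The size mapping might look something like the sketch below. It is not randexp's own code: ToyDictionary is an invented name, the file name words under each directory is an assumption, and FALLBACK is a made-up list so the example still runs on machines without a words file.

```ruby
# Toy dictionary, not randexp's own code. The "words" file name under each
# directory is an assumption; FALLBACK is used when no words file exists.
module ToyDictionary
  PATHS    = %w[/usr/share/dict/words /usr/dict/words].freeze
  FALLBACK = %w[ox mold cabal ramekin Chaucerism forethoughtfulness].freeze

  def self.words
    @words ||= begin
      path = PATHS.find { |candidate| File.readable?(candidate) }
      path ? File.readlines(path, chomp: true).reject(&:empty?) : FALLBACK
    end
  end

  # Words grouped by length, e.g. { 4 => ["mold", ...], ... }.
  def self.by_length
    @by_length ||= words.group_by(&:length)
  end

  # Pick a word whose length equals the Integer or falls within the Range.
  def self.pick(length)
    lengths = Array(length).select { |n| by_length.key?(n) }
    raise ArgumentError, "no words of length #{length}" if lengths.empty?
    by_length[lengths.sample].sample
  end
end

"#{ToyDictionary.pick(2..6)} #{ToyDictionary.pick(10..20)}, inc"
```

Note that this sketch picks a length first and then a word of that length, so every length in the range is equally likely regardless of how many words it has.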
Note: Right now randexp looks for your words file in /usr/share/dict/ or /usr/dict/. This works on OSX and most *nix distros, although I had to create a symlink on gentoo. Windows users are S.O.L., unless there’s a way to get a words file with cygwin.
Installation & Use
It’s published on github’s gemserver for the time being, and it will be on rubyforge soon as well. Here’s the command to install it from github’s gemserver. Note: You must be running the latest version of rubygems for this to work.
gem sources -a http://gems.github.com/
gem install benburkert-randexp
Load randexp from irb with the following.
gem "benburkert-randexp"
require "randexp"
Raison d’être
I started writing randexp because of another gem I was working on, can_has_fixtures, yet another alternative to fixtures. CHF replaces fixtures by generating pseudo-random data for model instances. By itself, randexp probably isn’t very useful, but combined with a model generator it can be quite helpful.
User.fixture(:employee) {{
  :first_name => (first_name = Randgen.word).capitalize,
  :last_name  => (last_name = Randgen.word).capitalize,
  :username   => username = "#{last_name}#{first_name[0, 1]}",
  :email      => /#{username}@(corp|subsidiary|partner)\.com/.gen,
  :ssn        => /\d{3}-\d{2}-\d{4}/.gen,
  :addr1      => /\d{2,4} (\w+ ){1,3}(street|lane|way), \w+, \d{5}/.gen,
  :records    => (1..5).of { Record.generate(:employee) }
}}
User.generate(:employee).ssn
# => "735-50-9234"
User.generate(:employee).addr1
# => "8829 yearbook way, unconvenable, 29290"
Pretty cool, especially when you start extending the Randgen class.
I’m about to start rewriting CHF due to a few nasty bugs caused by single table inheritance models on DataMapper edge. It will probably be DataMapper specific because DataMapper is now our ORM of choice. (You can follow along here, but right now it’s just vaporware.)