A client of mine has over a million scientific article summaries that need importing into a database. For some reason, the only format they can get them in is the text of emails, with each email containing over 8,000 articles. It seems to me the best way to extract the data is with regular expressions.

I've split the articles into an array and I'm trying to get the data out of them. Here's a sample of one:

I'm having trouble getting even something as simple as the paper title out of it. Here's what I've tried:

PHP:

//Paper:(?P<paper>[-a-zA-Z0-9/\w]*)[\r\n\t\w]*From:
preg_match_all('Paper:([-/a-zA-Z0-9]*)', $art, $data);
// Paper:(?P<paper>[\w\s\d/-]?)From:(?P<from>\w\s\d/-@<>?)Date:
print_r($data);

The comments are variations I've tried, but even using '[a-z]*' as the expression returns an empty array. I'm sure I'm missing something very simple here, but I'm stuck.
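A likely culprit is the missing pattern delimiters. PHP's preg_* functions expect the pattern to be wrapped in delimiters (for example `/.../` or `~...~`) and treat the first character as the opening delimiter. In 'Paper:...' that character is 'P', which is not allowed as a delimiter, so PHP raises a warning and preg_match_all() returns false. '[a-z]*' fails too: '[' is read as a bracket-style delimiter, so the pattern becomes `a-z` and the trailing `*` is rejected as an unknown modifier. That would explain why even the trivial pattern gives nothing back. Turning on error reporting should make those warnings visible.

The sketch below is only an illustration under assumptions, because the real sample article is not shown. The article text is hypothetical, each label ("Paper:", "From:", "Date:") is assumed to start its own line, and `parseArticle()` is a hypothetical helper, not part of any library.

```php
<?php
/**
 * Hypothetical helper: pull the Paper/From/Date fields out of one article.
 * Assumes each label starts its own line; returns null if any is missing.
 */
function parseArticle(string $art): ?array
{
    // ~ is used as the delimiter. The m modifier lets ^ match at the start
    // of every line, and [\s\S]*? lazily skips any lines between fields.
    $pattern = '~^Paper:[ \t]*(?P<paper>[^\r\n]+)'
        . '[\s\S]*?^From:[ \t]*(?P<from>[^\r\n]+)'
        . '[\s\S]*?^Date:[ \t]*(?P<date>[^\r\n]+)~m';

    if (preg_match($pattern, $art, $m) !== 1) {
        return null; // no match (or a pattern error, which returns false)
    }

    return [
        'paper' => trim($m['paper']),
        'from'  => trim($m['from']),
        'date'  => trim($m['date']),
    ];
}

// Hypothetical article text; the real format was not included in the question.
$art = "Paper: hep-th/9901001\r\n"
     . "From: A. Author <author@example.org>\r\n"
     . "Date: Fri, 1 Jan 1999 12:00:00 GMT\r\n";

// The original single-field pattern with delimiters added. \s* allows a space
// after the colon, and + (instead of *) stops an empty string from matching.
preg_match_all('~Paper:\s*([-/a-zA-Z0-9]+)~', $art, $data);
print_r($data[1]); // the captured paper IDs

print_r(parseArticle($art));
```

Matching one article at a time with preg_match() keeps each pattern small, which fits the approach of splitting the email into an array first. If the real articles put extra text on the label lines, the `[^\r\n]+` captures would need tightening.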