If you have ever handled data entry or extraction tasks, whether as a freelance virtual assistant or for a personal or work project, you already understand the importance of data organization and management. Working in Linux, with its text-processing tools, tends to turn users into capable data handlers.
An important piece of data that is irreplaceable in both personal and work projects is the email address. It uniquely identifies and links specific user information within a data management system.
Using an email address as a primary data reference point meets the following objectives.
- Users engaged in business projects like e-commerce can easily communicate and process information related to their clients and staff without breaching any security protocol.
- Emails are fast to transmit, and a copy of each message is typically kept in the recipient’s mailbox for future reference.
- Email addresses have a global reach, meaning you can grow your projects on a worldwide scale.
- Emails are paperless, so there is no physical paperwork to track or store.
- Email messages can be automated, whether you are scheduling a message for a particular date or replying to a received one.
This guide covers how to extract email addresses from a file using the grep command in Linux.
Grepping Emails from a File in Linux
All email addresses adhere to a similar syntax/format as highlighted below:
<username>@<domain_name>.<domain_name_extension>
This email address format can be represented as the following regular expression:
[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+
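The <username> portion of the regular expression accommodates lowercase letters, uppercase letters, digits, periods, and underscores. The <domain_name> and <domain_name_extension> portions accommodate lowercase and uppercase letters only, so domains containing digits or hyphens are not matched in full by this simplified pattern. The \+ after each bracket expression means one or more repetitions of the preceding character set. Note that the period between the domain name and its extension should be written as \. to match a literal period, since an unescaped . matches any single character.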
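To see how each part of the expression behaves, the pattern can be tried on single strings piped from echo. This is a minimal sketch, assuming GNU grep (where \+ is supported in basic regular expressions). The address jane.doe_99@example.com and the variable names are made up for illustration. The pattern here uses \. so that only a literal period is accepted before the extension.

```shell
# Hypothetical test address; not taken from any real file.
pattern='[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+'

# -o prints only the part of the line that matches the pattern.
match=$(echo "Contact: jane.doe_99@example.com (office)" | grep -o "$pattern")
echo "$match"

# A string with no period after the domain name should not match.
# grep exits with status 1 when nothing matches, so "|| true" keeps
# the script going if it runs under "set -e".
no_match=$(echo "not an address: jane@example" | grep -o "$pattern" || true)
echo "no_match is empty: [${no_match}]"
```

With an unescaped period in the pattern, the second string would be reported as a match, because the period would stand in for any one character of the domain name.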
Let us create a sample text file that mixes ordinary text with some email addresses, then preview it using the cat command.
$ nano i_have_emails.txt
$ cat i_have_emails.txt
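For readers who prefer not to type the file by hand, a here-document can create it in one step. The contents below are purely illustrative assumptions, not the original sample text. They mix ordinary sentences with five invented addresses on reserved example domains. Note that this overwrites any existing file named i_have_emails.txt in the current directory.

```shell
# Create a hypothetical sample file; every name and address is invented.
cat > i_have_emails.txt <<'EOF'
Meeting notes, shared with alice@example.com on Monday.
Invoices go to billing.team@example.org, never to the old inbox.
Bob (bob_smith42@example.net) will review the draft.
Support: help.desk@example.com or Carol at carol99@example.org.
This line has no address at all.
EOF

# Preview the file.
cat i_have_emails.txt
```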
Now that our sample text file is populated with some email addresses, here is the grep command syntax for extracting only the email addresses from this file.
$ grep -o -e "<pattern>" <filename>
The <pattern> portion of this syntax is the regular expression [a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+. The <filename> portion is the sample text file we just created and populated.
The grep manual page ($ man grep) describes it as a command that searches files for a pattern and prints each line that matches.
The command option -e specifies the pattern (regular expression) to search for, so it must be immediately followed by the pattern. The command option -o tells grep to print only the matched parts of each matching line, each on its own line, instead of the whole line.
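The effect of the two options can be checked on a throwaway file. The sketch below assumes GNU grep and uses a temporary file with invented addresses; the variable names are hypothetical. It also illustrates why -o is placed before -e: whatever argument follows -e is taken as the pattern.

```shell
# Throwaway file with made-up content (two addresses on one line).
tmpfile=$(mktemp)
printf '%s\n' \
  'Write to alice@example.com or bob_smith42@example.net today.' \
  'No address on this line.' > "$tmpfile"

pattern='[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+'

# -o: each match is printed on its own line, without surrounding text.
# -e: the next argument is the pattern, so -o has to come before it.
matches=$(grep -o -e "$pattern" "$tmpfile")
echo "$matches"

# Count the extracted addresses.
count=$(echo "$matches" | grep -c '@')
echo "found $count addresses"

rm -f "$tmpfile"
```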
We can implement the above grep syntax to suit our scenario in the following manner.
$ grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" i_have_emails.txt
All five email addresses present in the text file are printed on the Linux terminal, one per line. To save this output to a new file, we will slightly modify our grep command in the following manner.
$ grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" i_have_emails.txt > received_emails.txt
$ cat received_emails.txt
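When the same address appears more than once in the input, the redirect saves every occurrence. One possible variation, which is not part of the original command, pipes the matches through sort -u before writing them out. The file names and addresses in this sketch are hypothetical, and GNU grep is assumed.

```shell
# Hypothetical input in a temporary directory; the addresses are invented.
workdir=$(mktemp -d)
printf '%s\n' \
  'First mail from bob@example.org.' \
  'Reply sent to alice@example.com.' \
  'Second mail from bob@example.org.' > "$workdir/sample.txt"

pattern='[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+'

# Extract the addresses, drop duplicates, and save the result.
# LC_ALL=C keeps the sort order independent of the user's locale.
grep -o -e "$pattern" "$workdir/sample.txt" | LC_ALL=C sort -u > "$workdir/unique_emails.txt"

saved=$(cat "$workdir/unique_emails.txt")
echo "$saved"

rm -rf "$workdir"
```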
Executing the grep command without the command option -o prints every line that contains an email address in full. On terminals with color output enabled, the matched portion of each line is highlighted.
$ grep -e "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" i_have_emails.txt
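The difference between the two forms is easiest to see side by side. The following sketch, assuming GNU grep and a temporary file with an invented address, captures the output of both commands into hypothetical variables for comparison.

```shell
# Throwaway file with made-up content.
tmpfile=$(mktemp)
printf '%s\n' \
  'Reach alice@example.com for details.' \
  'Nothing to see here.' > "$tmpfile"

pattern='[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+'

# Without -o, grep prints the whole matching line.
whole_lines=$(grep -e "$pattern" "$tmpfile")

# With -o, grep prints only the matched address.
only_matches=$(grep -o -e "$pattern" "$tmpfile")

echo "without -o: $whole_lines"
echo "with -o:    $only_matches"

rm -f "$tmpfile"
```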
We can now comfortably extract email addresses from a file using the grep command.