
Parsing and reporting on hyperlinks in email using EWS and REST (e.g. looking for baseStriker) in Exchange and Office365

It's been quite a busy week in email security, with two new vulnerabilities released in the past seven days: first baseStriker (https://www.avanan.com/resources/basestriker-vulnerability-office-365) and now EFail (https://searchsecurity.techtarget.com/news/252441096/Efail-flaws-highlight-risky-implementations-of-PGP-and-S-MIME). While it's still too early to gauge the implications of either flaw, what they have in common is that both rely on the HTML body of a message and its underlying HTML markup tags to work. With baseStriker it's the use of the base href tag in an HTML document, and with EFail it's the use of an img src tag to send decrypted email contents to an external server (this is an oversimplification).

In this post I'm going to look at how you can parse the HTML links and image src tags from messages that are sitting in a mailbox (so after any transport pipeline filtering has happened) and provide some level of reporting on them. Because we are using the mailbox APIs to do this, we are looking directly at the links and images that are available to any email client.

The Challenge

The challenge with this type of problem is that, by its very nature, the payload you're looking for will vary, so any formal search for a static URL will fail: phishers and spammers have developed ways of getting around scanning methods that just look for static values (which basically rules out Search-Mailbox). One way to attack this is to retrieve the body content of each message one by one (which is expensive in terms of time and resources) and do the scanning at the client end.

Different Types of Message Bodies

The format of message bodies can vary depending on the mail agent (e.g. the email client) that sends the message. For example, in Exchange a message could have a native body type of RTF, HTML or text (or it could be multipart). If you are using Outlook and have chosen RTF as the body format when sending a message to another user on the same Exchange server, then only the native RTF body will be stored for the message, and the Exchange store will convert the RTF body to HTML on the fly when a client first requests the HTML body. The Best Body algorithm, described at https://msdn.microsoft.com/en-us/library/cc463905(v=exchg.80).aspx, explains this problem in more detail. In my scripts I've chosen to use the PidTagBodyHtml extended property (0x1013) for the HTML body because I found it gave me the rawest version of the HTML body, which was important for getting the most accurate link report.
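When the HTML body is read as a raw extended property, the bytes still have to be decoded into a string before any link parsing can happen. The following is a minimal sketch, assuming the value was retrieved as a binary property and that you either know the message's code page (for example from PidTagInternetCodepage, 0x3FDE) or are happy to default to UTF-8. ConvertFrom-HtmlBodyBytes is a hypothetical helper and is not part of either of my scripts.

```powershell
# Hypothetical helper (not from ExchangeBodyLinks.ps1 or Exch-Rest): decode the raw
# bytes of the HTML body property into a string for parsing.
# Assumption: the bytes were read as a binary extended property (tag 0x1013) and the
# caller supplies the message code page if known; UTF-8 (65001) is the default.
function ConvertFrom-HtmlBodyBytes {
    param(
        [Parameter(Mandatory)][byte[]]$Bytes,
        [int]$CodePage = 65001
    )
    try {
        $encoding = [System.Text.Encoding]::GetEncoding($CodePage)
    }
    catch {
        # Unknown or unsupported code page on this platform: fall back to UTF-8
        $encoding = [System.Text.Encoding]::UTF8
    }
    $encoding.GetString($Bytes)
}
```

Falling back to UTF-8 rather than failing keeps a bulk scan running; at worst a few non-ASCII characters decode incorrectly, which rarely affects the URLs themselves.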

Parsing

You would think that parsing HTML would be a pretty basic and easy thing to do in any API, and it is, up to a point. For example, a lot of people point towards using this method in PowerShell to parse HTML:

# Windows only: loads the HTML into the MSHTML ("HTMLFILE") COM object, which renders it
$HTMDoc = New-Object -ComObject "HTMLFILE"
$HTMDoc.IHTMLDocument2_write($html)

While this works okay and produces a nice result with all the links and images in a collection, it is also essentially rendering the HTML, so it will execute any JavaScript in the HTML (which shouldn't be there in email) and it also downloads all the images in the src links. On suspect content this isn't what you really want to be doing, and it's a problem even on marketing-type emails, because images in emails are often used for beaconing. So if you're looking to do something similar yourself, be very careful about using any object that parses the content into a DOM (especially those that reuse browser objects, like the example above), as there may be unintended consequences if you don't fully understand how that object parses the content.

In my script I'm relying firstly on a very simple regex to get all the HTML tags, then some other filtering code to pull out the attributes for href links, base tags and src links, and then some further code to expand any relative URLs against the base URL. While this isn't perfect and does fail in some instances, it's at least safe, as it won't activate any content, and generally you can just tweak the code to work around any failures.
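The regex-only approach can be sketched roughly as follows. This is a minimal illustration, not the code in ExchangeBodyLinks.ps1 or Exch-Rest: the function name Get-HtmlLinkSketch and the output shape are hypothetical (HasBaseURL mirrors the property name used later in this post), and the tag pattern is deliberately naive, so it will miss things like a '>' inside a quoted attribute value.

```powershell
# Hypothetical sketch (not the author's code): extract href/src/base values with
# regular expressions only, so nothing is rendered, executed or downloaded.
function Get-HtmlLinkSketch {
    param([Parameter(Mandatory)][string]$Html)

    # Naive matcher for opening tags such as <a ...>, <img ...>, <base ...>
    $tagPattern  = '<\s*(?<name>[a-zA-Z]+)\b(?<attrs>[^>]*)>'
    # href/src attribute with a double-quoted, single-quoted or unquoted value
    $attrPattern = '\b(?<attr>href|src)\s*=\s*(?:"(?<val>[^"]*)"|''(?<val>[^'']*)''|(?<val>[^\s>]+))'
    $tags = [regex]::Matches($Html, $tagPattern)

    # First pass: find a <base href="..."> so relative links can be expanded
    $baseUrl = $null
    foreach ($tag in $tags) {
        if ($tag.Groups['name'].Value -ieq 'base') {
            $m = [regex]::Match($tag.Groups['attrs'].Value, $attrPattern, 'IgnoreCase')
            if ($m.Success) { $baseUrl = $m.Groups['val'].Value }
        }
    }
    $base = $null
    if ($baseUrl) { [void][uri]::TryCreate($baseUrl, [UriKind]::Absolute, [ref]$base) }

    $links  = New-Object 'System.Collections.Generic.List[uri]'
    $images = New-Object 'System.Collections.Generic.List[uri]'

    # Second pass: collect <a>/<area> href values and <img> src values
    foreach ($tag in $tags) {
        $name = $tag.Groups['name'].Value.ToLowerInvariant()
        if ($name -notin 'a', 'area', 'img') { continue }
        foreach ($m in [regex]::Matches($tag.Groups['attrs'].Value, $attrPattern, 'IgnoreCase')) {
            $value = $m.Groups['val'].Value.Trim()
            $uri = $null
            # Accept absolute URLs as written; on Unix .NET treats "/path" as an
            # absolute file: URI, so file: results are treated as relative instead
            $isAbsolute = [uri]::TryCreate($value, [UriKind]::Absolute, [ref]$uri) -and $uri.Scheme -ne 'file'
            if (-not $isAbsolute) {
                $uri = $null
                # Expand relative values against the base URL, if there is one
                if ($base) { [void][uri]::TryCreate($base, $value, [ref]$uri) }
            }
            if (-not $uri) { continue }
            $attr = $m.Groups['attr'].Value.ToLowerInvariant()
            if ($name -eq 'img' -and $attr -eq 'src') { $images.Add($uri) }
            elseif ($name -ne 'img' -and $attr -eq 'href') { $links.Add($uri) }
        }
    }

    [pscustomobject]@{
        HasBaseURL = [bool]$baseUrl
        BaseUrl    = $base
        Links      = $links
        Images     = $images
    }
}
```

Doing the base tag in a separate first pass means relative links are expanded correctly even if the base tag appears late in the document. Relative links with no usable base URL are simply skipped in this sketch.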

Scripts

I've created an EWS version and a Graph/REST version of this code, which should be usable both on-premises and in Office365. The EWS version can be found on GitHub here https://github.com/gscales/Powershell-Scripts/blob/master/ExchangeBodyLinks.ps1 and the Graph version is in my Exch-REST module (version 3.8), which is available from the PowerShell Gallery https://www.powershellgallery.com/packages/Exch-Rest and GitHub https://github.com/gscales/Exch-Rest

The Code

The code I've written is separated into two functions. The first function retrieves the messages and parses their bodies:

EWS
$Messages = Get-EWSBodyLinks -MailboxName jcool@datarumble.com -FolderPath \Inbox -MessageCount 500
REST
$Messages = Get-EXREmailBodyLinks -MailboxName jcool@datarumble.com -FolderPath \Inbox -MessageCount 500
The inputs are relatively simple: the function takes the FolderPath and a MessageCount for the number of messages you want scanned. It then parses each message body and builds three dictionary objects with the links, images and base href details of the underlying HTML body. These are added back to the EWS Managed API item (or the custom REST object) as a ParsedLinks property, so they are available for further pipeline or script processing in PowerShell.


These properties are collections of URI objects, so you can do further things like

$Messages[0].ParsedLinks.Links | Select-Object AbsoluteUri

to show just the absolute URIs for a message, or if you were only interested in links from a particular URL shortener you could use

$Messages[0].ParsedLinks.Links | Where-Object DnsSafeHost -eq "aka.ms"

You can do a whole range of other things with these URI objects as well.
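To apply the same kind of host filter across every scanned message, rather than only $Messages[0], something like the sketch below could work. The messages here are mock objects; the only assumption is that ParsedLinks.Links enumerates System.Uri objects, as in the examples above, and the shortener list is purely illustrative.

```powershell
# Mock stand-ins for the objects returned by Get-EWSBodyLinks / Get-EXREmailBodyLinks.
# Assumption: only the shape ParsedLinks.Links = collection of [uri] matters here.
$Messages = @(
    [pscustomobject]@{ Subject = 'Newsletter'; ParsedLinks = [pscustomobject]@{ Links = [uri[]]@('https://aka.ms/abc', 'https://contoso.com/') } }
    [pscustomobject]@{ Subject = 'Invoice'; ParsedLinks = [pscustomobject]@{ Links = [uri[]]@('https://fabrikam.com/pay') } }
)

# Hosts of URL shorteners you want to flag (illustrative list only)
$shorteners = @('aka.ms', 'bit.ly')

# Keep messages where at least one link points at a listed host.
# The outer $_ is the message; inside the inner Where-Object, $_ is each link.
$flagged = $Messages | Where-Object {
    @($_.ParsedLinks.Links | Where-Object { $_.DnsSafeHost -in $shorteners }).Count -gt 0
}
$flagged | Select-Object Subject
```

With real data you would drop the mock $Messages assignment and run the filter against the output of either function.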

BaseStriker Reporting

If you want to see which emails are using base href tags (which may or may not be related to baseStriker), you can use the following:

EWS
$BaseHrefMessages = Get-EWSBodyLinks -MailboxName gscales@domain.com -FolderPath \Inbox -MessageCount 10000 | Where-Object { $_.ParsedLinks.HasBaseURL -eq $true }
REST
$BaseHrefMessages = Get-EXREmailBodyLinks -MailboxName jcool@domain.com -FolderPath \Inbox -MessageCount 500 | Where-Object { $_.ParsedLinks.HasBaseURL -eq $true }
These examples return a collection of messages that use a base URL, which you can then look at further, for example a message matching Avanan's sample for baseStriker.


In the parsing code I expand the relative URLs that are used when there is a base URL in the document.
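The expansion itself follows standard relative-reference resolution, which System.Uri already implements. This short illustration uses made-up hosts; the last line shows roughly why baseStriker-style markup is hard to scan, since the host only appears in the base tag and the href on its own is just a path.

```powershell
# Illustrative only: how relative links resolve against a <base href> value.
$base = [uri]'https://www.example.com/app/'

# A plain relative link is appended to the base path
$link1 = [uri]::new($base, 'login.html')

# Dot segments are removed as part of the resolution
$link2 = [uri]::new($base, '../images/logo.png')

# baseStriker-style split (hypothetical host): neither half alone shows the full URL
$split = [uri]::new([uri]'http://phish.example.net/', 'account/verify.html')

$link1.AbsoluteUri   # https://www.example.com/app/login.html
$link2.AbsoluteUri   # https://www.example.com/images/logo.png
$split.AbsoluteUri   # http://phish.example.net/account/verify.html
```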

In the scanning I did on my own email, there were a few companies that used the base URL legitimately; for instance, it seems to be used by OneDrive in the invitation message that gets sent out when you share an item.

Reporting

The second cmdlet I've written takes the data from the above functions and produces a consolidation report on the domains in the href links, the domains in the img src links, and the href and img src URLs themselves. For each of these reporting areas it counts the number of times a link appears and the number of messages the link or domain appears in. To run the reports:

EWS
$Report = Get-LinkReport -MailboxName mailbox@domain.com -FolderPath \Inbox -MessageCount 100
REST
$Report = Get-EXREmailLinkReport -MailboxName mailbox@domain.com -FolderPath \Inbox -MessageCount 100
In these examples you will end up with a $Report variable containing collections that you can export to CSV or manipulate further, e.g.

$Report = Get-EXREmailLinkReport -MailboxName mailbox@domain.com -FolderPath \Inbox -MessageCount 100
$Report.Domains | Sort-Object MessageCount -Descending
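The domain consolidation described in this section (how often a domain appears, and in how many distinct messages) can be sketched with Group-Object. This is a hypothetical illustration and not the code inside Get-LinkReport or Get-EXREmailLinkReport; it assumes each message exposes ParsedLinks.Links as System.Uri objects, and Get-DomainSummarySketch is an illustrative name.

```powershell
# Hypothetical sketch of the domain consolidation (not the author's report code).
# Assumption: each message exposes ParsedLinks.Links as a collection of [uri].
function Get-DomainSummarySketch {
    param([Parameter(Mandatory)][object[]]$Messages)

    # One row per link occurrence, tagged with the index of the message it came from
    $rows = for ($i = 0; $i -lt $Messages.Count; $i++) {
        foreach ($link in $Messages[$i].ParsedLinks.Links) {
            [pscustomobject]@{ Domain = $link.DnsSafeHost.ToLowerInvariant(); MessageIndex = $i }
        }
    }
    if (-not $rows) { return }

    # Group by domain: group size = occurrences, distinct MessageIndex values = messages
    $rows | Group-Object Domain | ForEach-Object {
        [pscustomobject]@{
            Domain       = $_.Name
            LinkCount    = $_.Count
            MessageCount = @($_.Group.MessageIndex | Sort-Object -Unique).Count
        }
    } | Sort-Object MessageCount -Descending
}
```

The result is a flat collection, so it can go straight to CSV, for example Get-DomainSummarySketch -Messages $Messages | Export-Csv -Path .\LinkDomains.csv -NoTypeInformation.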

Results

There are a lot of links and images used within email, so this type of parsing will produce a lot of data that you need to filter or process further. For example, if you start to find links that you think might be suspect, you may want to look at using a service like VirusTotal (https://www.virustotal.com), which can scan suspect links and return the results via an API. They also provide a paid private API if you're going to do this at high volume. The other thing to keep in mind is that downloading the body of each email is a fairly costly process, so watch out for throttling if you're doing this on a large scale.
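One common way to cope with throttling on large scans is to wrap each request in a retry with exponential backoff. The sketch below is generic and hypothetical (Invoke-WithBackoff is not part of either script): it retries on any exception, whereas a real script should only retry throttling errors, such as an EWS ServerBusy response or a Graph HTTP 429, and should honour any delay the server asks for.

```powershell
# Hypothetical helper: run a scriptblock, retrying with exponential backoff on failure.
# Assumption: every exception is treated as retryable; narrow this in real use.
function Invoke-WithBackoff {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$MaxAttempts = 5,
        [int]$BaseDelayMs = 1000
    )
    for ($attempt = 1; ; $attempt++) {
        try {
            return & $Action
        }
        catch {
            # Give up and rethrow the original error after the last attempt
            if ($attempt -ge $MaxAttempts) { throw }
            # Wait 1x, 2x, 4x, 8x ... the base delay between attempts
            $delay = [int]($BaseDelayMs * [math]::Pow(2, $attempt - 1))
            Write-Verbose "Attempt $attempt failed: $($_.Exception.Message); waiting $delay ms"
            Start-Sleep -Milliseconds $delay
        }
    }
}
```

Usage would look like Invoke-WithBackoff -Action { <the call that fetches one message body> }, so that a throttled request slows the scan down instead of aborting it.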


All sample scripts and source code are provided for illustrative purposes only. All examples are untested in different environments and therefore, I cannot guarantee or imply reliability, serviceability, or function of these programs.

All code contained herein is provided to you "AS IS" without any warranties of any kind. The implied warranties of non-infringement, merchantability and fitness for a particular purpose are expressly disclaimed.