top of page
Writer's pictureThe Tech Platform

What is YARA? YARA Rules for Malware Detection.

YARA - Yet Another Recursive/Ridiculous Acronym




YARA is a tool which is used for malware detection and research. YARA by default comes with module to process PE, ELF analysis and also support for open-source cuckoo sandbox

  • PE- The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems.

  • ELF- In computing, the Executable and Linkable Format, is a common standard file format for executable files, object code, shared libraries, and core dumps.

  • Sandbox- In computer security, a sandbox is a security mechanism for separating running programs, usually in an effort to mitigate system failures or software vulnerabilities from spreading


YARA Rules for Malware Detection

We all know it's way more fun to hack shit than to patch shit. That said, not all employers will be satisfied with a hacker who can only compromise systems. Some companies want security researchers that are able to apply patches based on malware samples/breach data they have collected or have found in the wild.

Author Assigned Level: Newbie or Wannabe


Required Skills

There really aren't too many skills required for this. The deeper you understand malware analysis and reverse engineering, the more capable you'll be at finding unique ways to catch malware. But this won't hinder you from writing amazing yara rules. Most of the rules I've seen are pretty basic. Most look like a python script that took 5 minutes to write. The skill and detail comes in the analysis. Not in the actual yara rule itself.

  • GNU Linux

  • Familiar with C syntax (not required, but useful)

  • Regex (not required, but useful)


Writing Yara Rules

Yara mostly resembles the syntax of the C language. Here is a simple rule that does nothing.

rule HelloRule
{
condition:
false
}

1. Rule Identifier

The word the follows rule, in this case "dummy", is known as the rule identifier. They can be:

  • alphanumeric characters

  • underscore character

  • first char can't be a digit

  • case-sensitive

  • cannot exceed 128 characters


2. Yara Keywords

The following can't be used as a rule identifier because they're special to the yara language.

all, and, any, ascii, at, condition, contains entrypoint, false, filesize, fullword, for, global, in import, include, int8, int16, int32, int8be, int16be int32be, matches, meta, nocase, not, or, of private, rule, strings, them, true, uint8, uint16 uint32, uint8be, uint16be, uint32be, wide

Generally, yara has two sections: strings definition **and **condition.


rule HelloRule2    // This is an example
{
    strings:
        $my_text_string = "text here"
        $my_hex_string = { E2 34 A1 C8 23 FB }

    condition:
        $my_text_string or $my_hex_string
}

This rule will be active when either string is found. As you can see, you can also add comments.


a. Hexadecimal Strings


Wildcards Acceptable uses for hex-strings are wildcards, which are represented with a "?" mark.

rule GambitWildcard
{
    strings:
       $hex_string = { EF 44 ?? D8 A? FB }

    condition:
       $hex_string
}

This will catch any of the following:

EF 44 01 D8 AA FB
EF 44 AA D8 AB FB

**Unknown Length of Wildcard **

Strings with an unknown length can be represented as the following:

rule MarioJump
{
        strings:
           $hex_string = { F4 23 [4-6] 62 B4 }

        condition:
           $hex_string
}

This will catch any of the following:

F4 23 01 02 03 04 62 B4
F4 23 AA BB CC DD EE FF 62 B4

Infinite is also possible

rule BuzzLightyear
{
        strings:
           $hex_string = { F4 23 [-] 62 B4 }

        condition:
           $hex_string
}

This will catch any of the following:

F4 23 AA FF 62 B4
F4 23 AA AA AA AA AA...FF FF 62 B4

Conditional Strings

You can create 1 to as many statements as you like

rule WorriedRabbit
{
    strings:
       $hex_string = { BA 21 ( DA BC | C6 ) A5 }

    condition:
       $hex_string
}

This will catch any of the following:

BA 21 DA BC A5
BA 21 C6 A5

Mixing it all up

You can also combine them all, of course

rule WorriedGabmitLightyearJump
{
    strings:
       $hex_string = { BA ?? ( DA [2-4] | C6 ) A5 }

    condition:
       $hex_string
}

This will catch any of the following:

BA 01 DA 01 02 03 04 A5
BA AA C6 A5
BA FF DA 01 02 A5


b. Text Strings

An alternative to hex-strings, one can also use text strings.

rule KimPossible
{
    strings:
        $alert_string = "Whats the Sitch"

    condition:
       $alert_string
}

One can also use the following escape sequences, just like in C:

**\* **Double Quotes ** Backslash \t Horizontal Tab \n New line \xdd Any byte in hexadecimal notation


c. Modifiers

Case-insensitive strings

By default, Yara is case-sensitive, but you can turn that off.

rule ThickSkin
{
    strings:
        $strong_string = "Iron" nocase

    condition:
        $strong_string
}

d. Wide-character strings

The wide modifer can be used to search for strings encoded with two bytes per character, something typically in many executable binaries. If the string "FatTony" appears encoded as two bytes per character, it will be caught if we use the modifer wide. Let's also add the nocase modifier as "FatTony" might be "fattony" and we wouldn't want to miss that.

rule FatTony
{
    strings:
        $fat_villain = "FatTony" wide nocase

    condition:
        $fat_villain
}

[ ! ] Important: Keep in mind that this modifier interleaves the ASCII codes of the characters in the string with zeroes, it does not support truly UTF-16 strings containing non-English characters. To add a search for strings in both ASCII and wide, use the following:

rule ASCIIFatTony
{
    strings:
        $fat_villain = "FatTony" wide ascii nocase

    condition:
        $fat_villain
}

ASCII is assumed by default so you don't have to add ascii if you want to search for FatTony by ascii alone.

rule ASCIIFatTony
{
    strings:
        $fat_villain = "FatTony"

    condition:
        $fat_villain
}

This works if you want to search without the wide **and nocase** modifiers.


e. Fullwords Modifier

This modifier will catch on words that DO NOT have prepend and append the word with a character.

rule ShadyDomain
{
    strings:
        $shady_domain = "faceebook" fullword

    condition:
       $shady_domain
}

This will catch any of the following:

www.faceebook.com
www.myportal.faceebook.com
https://secure.faceebook.com

This will** NOT catch** any of the following:

www.myfaceebook.com
thefaceebook.com

The difference is that that the fullword is prepended or appended by a special character, not a regular character.


f. Regular Expression

Enclosed in forward slashes instead of double quotes, (like Perl Programming), yara allows for RegEx.

rule RegularShow
{
    strings:
        $re1 = /md5: [0-9a-fA-F]{32}/
        $re2 = /state: (on|off)/

    condition:
        $re1 and $re2
}

This will catch any md5 string it finds, in either state.

One can also apply text modifiers such as nocase,** ascii**,** wide**,** **and **fullword **to RegEx as well.


g. Metacharacters:


A metacharacter is a character that has a special meaning (instead of a literal meaning) to a computer program. For RegEx, these are the following meanings

** Quote the next metacharacter ^ Match the beginning of the file $ Match the end of the file **\|** Alternation () Grouping [] Bracketed character class

The following quantifiers are also recognized:

* Match 0 or more times + Match 1 or more times ? Match 0 or 1 times {n} Match exactly n-times {n, } Match at least n-times { ,m} Match at most m-times {n, m} Match n to m-times.

The following escape sequences are recognized:

\t Tab (HT, TAB) \n New Line (LF, NL) **\r **Return (CR) \f Form feed (FF) \a Alarm bell \xNN Character whose ordinal number is the given hexadecimal number

These are the recognized character classes:

\w Match a _word _character (alphanumeric plus "_") \W Match a non-word character \s Match a whitespace character \S Match a non-whitespace character **\d **Match a decimal digit character \D Match a non-digit character \b Match a word boundary \B Match except at a word boundary


h. Sets of strings


If the event where you want a certain number of strings from a list to be hit, you can implement the following:

rule MigosPresent
{
    strings:
        $m1 = "Quavo"
        $m2 = "Offset"
        $m3 = "Takeoff"

    condition:
        2 of ($m1,$m2,$m3)
}

If any of the two Migos members are present, then the Migos are present.

You can also use wildcards to represent a set. Used this way, you would use the * wildcard.

rule MigosPresent
{
    strings:
        $m1 = "Quavo"
        $m2 = "Offset"
        $m3 = "Takeoff"

    condition:
        2 of ($m*)
}

To represent all variables in strings, you can use the them keyword.

rule ThreeRappersPresent
{
    strings:
        $m1 = "Quavo"
        $m2 = "Offset"
        $m3 = "Takeoff"
        $q1 = "Cardi B"

    condition:
        3 of them // equivalent to 3 of ($*)
}

Any expression that returns a numeric value can be used. Here is an example of the keywords any and **all **being used.

rule Squad
{
    strings:
        $m1 = "Quavo"
        $m2 = "Offset"
        $m3 = "Takeoff"
        $q1 = "Cardi B"

    condition:
        3 of them // equivalent to 3 of ($*)
        all of them
        any of ($*) and 2 of ($*)    // Fancy way of using any in a rule that requires 3.
}

Anonymous strings with of and for...of

If the event where you are not specifically referencing strings, you can just use $ to reference them all.

rule AnonymousStrings
{
    strings:
        $ = "dummy1"
        $ = "dummy2"

    condition:
        1 of them
}


4. Conditions

Yara allows for boolean expressions via the operators, andor, and not **and relational. Arithmetic operators (+,-,*,\,%) and bitwise operators (&, \|, <<, >>, ~, ^**) can also be used on numerical expressions.


a. Boolean


String identifiers can also be used within a condition, acting as a Boolean variables whose value depends on the presence or not of the associated string in a file.

rule Example
{
    strings:
        $hero1a = "Batman"
        $hero1b = "Robin"
        $hero2a = "Edward"
        $hero2b = "Alphonse"

    condition:
        ($hero1a or $hero1b) and ($hero2a or $hero2b)
}

b. Counting string instances


Sometimes we need to know not only if a certain string is present or not, but how many times the string appears in the file or process memory. The number of occurrences of each string is represented by a variable whose name is the string identifier but with a # character in place of the $ character. For example:

rule Ransomware
{
    strings:
        $a = "encrypted"
        $b = "btc"

    condition:
        #a == 2 and #b > 2
}

This rule matches any file or process containing the string $a exactly two times, and more than two occurrences of string $b.


c. String offsets or virtual addresses


In the majority of cases, when a string identifier is used in a condition, we are willing to know if the associated string is anywhere within the file or process memory, but sometimes we need to know if the string is at some specific offset of the file or at some virtual address within the process address space. In such situations the operator at is what we need.

rule Offset
{
    strings:
        $a = "encrypted"
        $b = "btc"

    condition:
        $a at 100 and $b at 200
}

If string $a is found at offset 100 within the file (or at virtual address 100 if applied to a running process), it will catch. The string $b should also be at offset 200. You can also use hexadecimal instead of decimal notation.

rule Offset
{
    strings:
        $a = "encrypted"
        $b = "btc"

    condition:
        $a at 0x64 and $b at 0xC8
}

While the at operator is very specific, you can use the **in **operator to specify a range the string can be located at.

rule InExample
{
    strings:
        $a = "encrypted"
        $b = "btc"

    condition:
        $a in (0..100) and $b in (100..filesize)
}

String $a must be found at offset between 0-100, while string $b must be at an offset between 100 and the end of the file EOF.


You can also get the offset or virtual address of the i-th occurrence of string $a by using @a[ i ]. The indexes are one-based, so the first occurrence would be @a[1], the second being @a[2], and so on. It doesn't start at @a[0]. If you provide an index greater than the number of occurrences of the string, the result will be a NaN (Not a Number) value.


d. Match Length


For many regular expressions and hex strings containing jumps, the length of the match is variable. If you have the regular expression /fo*/ the strings “fo”, “foo” and “fooo” can be matches, all of them with a different length.


You can use the length of the matches as part of your condition by using the character ! in front of the string identifier, in a similar way you use the @ character for the offset. !a[1] is the length for the first match of $a!a[2] is the length for the second match, and so on. !a is a abbreviated form of !a[1].

rule Hak5
{
    strings:
        $re1 = /hack*/    // Will catch on hacker, hacked, hack, hack*

    condition:
        !re1[1] == 4 and !re1[2] > 6
}

This will catch the following:

We hack things. We are hackers.

The first instance of 'hack' is re1 and it's equal to length 4. the second instance of 'hack' has at least length 6.


e. File size


String identifiers are not the only variables that can appear in the condition (in fact, rules can be defined without any string definition), there are other special variables that can be used as well. filesize holds the size of the file being scanned. The size is expressed in bytes.

rule FileSizeExample
{
    condition:
       filesize > 200KB
}

We use the KB postfix to set the size in which the file will be caught on to 200KB. It automatically multiples the value of the constant by 1024. The MB postfix can be used to multiply the value by 2^20. Both prefixes can be used only with decimal constants.


[ ! ] Important:** filesize **only works when the rule is applied to a file. If applied to a running process, it won't ever match.


f. Executable entry_point


If the file is a Portable Executable **(PE**) or Executable and Linkable Format (ELF), this variable holds the raw offset of the executable's entry point in case we are scanning a file. If we're scanning a running process, the entry_point will hold the virtual address of the main executable's entry point. _A typical use of this variable is to look for some pattern at the entry point to detect packers or simple file infectors. _The current way to use entry_point is by importing the lib for PE and/or ELF and use their respective functions. Yara's entrypoint function is depreciated starting at version 3. This is how it looks pre-version 3.

rule EntryPointExample1
{
    strings:
        $a = { E8 00 00 00 00 }

    condition:
       $a at entrypoint
}

rule EntryPointExample2
{
    strings:
        $a = { 9C 50 66 A1 ?? ?? ?? 00 66 A9 ?? ?? 58 0F 85 }

    condition:
       $a in (entrypoint..entrypoint + 10)
}

[ ! ] Important - **Again, **don't use yara's entrypoint. Import PE and/or ELF and use pe.entry_point and/or elf.entry_point.


g. Accessing data at a given position


If you want to read data from a specific offset and save it as a variable you can use one of the following:

int8(<offset or virtual address>)
int16(<offset or virtual address>)
int32(<offset or virtual address>)

uint8(<offset or virtual address>)
uint16(<offset or virtual address>)
uint32(<offset or virtual address>)

int8be(<offset or virtual address>)
int16be(<offset or virtual address>)
int32be(<offset or virtual address>)

uint8be(<offset or virtual address>)
uint16be(<offset or virtual address>)
uint32be(<offset or virtual address>)

Default is little-endian. If you want to read a big-endian integer use the corresponding function ending in be.


The <offset or virtual address> parameter can be any expression returning an unsigned integer, including the return value of one the uintXX functions itself.

rule IsPE
{
  condition:
     // MZ signature at offset 0 and ...
     uint16(0) == 0x5A4D and
     // ... PE signature at offset stored in MZ header at 0x3C
     uint32(uint32(0x3C)) == 0x00004550
}

for...of:


h. Applying one condition across many string

To apply a for loop that will check if a set of strings are true to a specific condition, use the following syntax:

for num of string_set : ( boolean_expression )

The boolean_expression is evaluated for every string in string_set and there must be at least num of them true.

One can also exchange num with other keywords such as all **or any**.

for any of ($a,$b,$c) : ( $ at elf.entry_point  )

The "$" represents all of the strings in the set. In this example, it's strings $a, $b, and $c.

You can also employ the symbols # and @ to make reference to the number of occurrences and the first offset of each string.

for all of them : ( # > 3 )
for all of ($a*) : ( @ > @b )


i. Iterating over string occurrences

If you want to iterate over offsets and test a condition, one can do the following:

rule Three_Peat
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        for all i in (1,2,3) : ( @a[i] + 10 == @b[i] )
}

This rule says that the first three occurrences of $b should be 10 bytes away from the first three occurrences of $a. Another way to write this is the following:

for all i in (1..3) : ( @a[i] + 10 == @b[i] )

We can also use expression as well. In this example, we are iterating over every occurrence of $a (remember that #a represents the number of occurrences of $a). This rule is specifying that every occurrence of $a should be within the first 100 bytes of the file.

for all i in (1..#a) : ( @a[i] < 100 )

You can also set it so it's a set amount of occurrence for the first 100 bytes.

for any i in (1..#a) : ( @a[i] < 100 )
for 2 i in (1..#a) : ( @a[i] < 100 )

5. Referencing other rules


Just like in C when referencing functions, the function, or in this case the rule, must be defined prior to being used.

rule Rule1
{
    strings:
        $a = "dummy1"

    condition:
        $a
}

rule Rule2
{
    strings:
        $a = "dummy2"

    condition:
        $a and Rule1
}


6. Yara Essentials


a. Global Rules

Allows users to impose restrictions in all the rules. If you want all your rules to ignore the files that exceed a certain size limit, you could go rule by rule making the required modifications to their conditions, or just write a global rule like this one:

global rule SizeLimit
{
    condition:
        filesize < 2MB
}

You can define as many global rules as you want. They'll run before the other rules.


b. Private Rules


Private rules don't have an output when they match. When paired with referencing other rules, this can allow for a cleaner output. Such that, to get to superMalicious, maybe one private rule is that file must be ELF. Once that is confirmed, then the next rule will execute. But we don't want to see ELF, in output. We just want to know if it's superMalicious or not. To create a private rule, just add private in front of rule.

private rule PrivateRule
{
    ...
}

c. Rule tags


You can tag your rules in case you only want to see the output of type ruleName.

rule TagsExample1 : Foo Bar Baz
{
    ...
}

rule TagsExample2 : Bar
{
    ...
}

d. Metadata

This allows for additional data to be stored in a rule.

rule MetadataExample
{
    meta:
        my_identifier_1 = "Some string data"
        my_identifier_2 = 24
        my_identifier_3 = true

    strings:
        $my_text_string = "text here"
        $my_hex_string = { E2 34 A1 C8 23 FB }

    condition:
        $my_text_string or $my_hex_string
}

e. Using Modules


Some modules are officially distributed with YARA like PE and Cuckoo. They can be imported just like python, but add double quotes.

import "pe"
import "cuckoo"

Once imported, you can use the feature by using its name prior to the function.

pe.entry_point == 0x1000
cuckoo.http_request(/someregexp/)

f. Undefined Values


Some values are left as undefined when they are ran. If the following rule executes on a file that's of type ELF but it finds the string, it will result in something like TRUE & Undefined.

import "pe"

rule Test
{
  strings:
      $a = "some string"

  condition:
      $a and pe.entry_point == 0x1000
}

Be careful.


g. External Variables


External variables allow you to define rules which depend on values provided from 'the other side'.

rule ExternalVariable1
{
    condition:
       ext_var == 10
}

ext_var is an external variable whose value is assigned at runtime, (use -d on the command line and parameter of compile and match methods in yara-python). External variables could be of types: int, str, or boolean.


External variables can be used with the operators: contains and matchesContains returns true if the string contains the specified substring. Matches returns true if the string matches the given regular expression.

rule ExternalVariable2
{
    condition:
        string_ext_var contains "text"
}

rule ExternalVariable3
{
    condition:
        string_ext_var matches /[a-z]+/
}

Contains **is True for ExternalVariable2 and **matches is True for ExternalVariable3

You can also use regex modifiers along with the matches operator.

rule ExternalVariableExample5
{
    condition:
        /* case insensitive single-line mode */
        string_ext_var matches /[a-z]+/is
}

This will match for case-insensitive due to the i.

Remember, you must define all external variables at run-time. This can be done with the **-d **argument.


h. Including files

Of course, you can include other files in yara, using the C-type import, #include...but without the # and with double quotes. You can use relative paths, absolute paths, and if windows, paths with drives.

include "Migos.yar"
include "../CardiB.yar"
include "/home/user/yara/IsRapper.yar"
include "c:\\yara\\includes\\oldRappers.yar"
include "c://yara/includes/oldRappers.yar"




The Tech Platform

0 comments

Comentários


bottom of page