YARA

Syntax

Rules

rule dummy		// prefix `rule` with global or private to declare a global or private rule
{
    condition:
        false
}

rule TagsExample1 : Foo Bar Baz // add tags after the colon
{
    ...
}
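As a sketch of how the `global` and `private` modifiers and tags fit together, the rules below are illustrative only; the rule names and the marker string are hypothetical and not taken from any real rule set.

```yara
// Private rule: never reported on its own, but usable as a building block.
private rule IsPE
{
    condition:
        uint16(0) == 0x5A4D      // "MZ" at offset 0
}

// Global rule: must be satisfied before any other rule in this file can match.
global rule SmallFile
{
    condition:
        filesize < 2MB
}

// Tagged rule that references the private rule by name.
rule TaggedSample : Foo Bar
{
    strings:
        $s = "example marker"    // hypothetical string

    condition:
        IsPE and $s
}
```

Only `TaggedSample` would be reported for a matching file; `IsPE` stays hidden because it is private, and tags such as `Foo` can later be used to filter the output.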

Metadata

meta:
    my_identifier_1 = "Some string data"
    my_identifier_2 = 24
    my_identifier_3 = true
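For context, a minimal sketch of where the `meta` section sits inside a complete rule. The identifiers and values are made up for illustration. Metadata only annotates the rule and cannot be referenced from the condition.

```yara
rule MetaExample
{
    meta:
        description = "Illustrative rule"   // string value
        severity    = 3                     // integer value
        in_the_wild = false                 // boolean value

    strings:
        $s = "foobar"

    condition:
        $s
}
```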

Comments

/*
This is a multi-line comment ...
*/

// ... and this is single-line comment

Strings

strings:
    // hexadecimal strings
    $hex_string_01 = { E2 34 ?? C8 A? FB }
    $hex_string_02 = { F4 23 [4-6] 62 B4 } // a jump of 4 to 6 arbitrary bytes in the middle
    $hex_string_03 = { F4 23 ( 62 B4 | 56 ) 45 } // matches F4 23 62 B4 45 or F4 23 56 45

    // text strings
    $text_string_01 = "foobar"
    $text_string_02 = "foobar" nocase // case-insensitive
    $text_string_03 = "foobar" fullword // matches foobar only when it is not adjacent to a letter or digit

    // wide-character strings
    $wide_string = "Borland" wide
    $wide_and_ascii_string = "Borland" wide ascii

    // XOR strings
    $xor_string_01 = "This program cannot" xor
    $xor_string_02 = "This program cannot" xor(0x01-0xff)
    $xor_string_03 = "This program cannot" xor wide ascii

    // regular expressions
    $re1 = /md5: [0-9a-fA-F]{32}/
    $re2 = /state: (on|off)/

    // private strings
    $private_string = "foobar" private

    // anonymous string, for when the name does not matter
    $ = "lazycatz"
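The string types above can be combined in a single rule. The following is a minimal sketch with made-up patterns; `any of them` is used so that every string, including the anonymous one, is referenced by the condition.

```yara
rule StringsExample
{
    strings:
        $hex  = { 4D 5A [2-4] ?? 00 }          // hex string with a jump and a wildcard byte
        $txt  = "foobar" nocase fullword        // modifiers can be combined
        $wide = "Borland" wide ascii            // matches both UTF-16LE and ASCII forms
        $re   = /md5: [0-9a-fA-F]{32}/          // regular expression
        $     = "lazycatz"                      // anonymous string

    condition:
        any of them
}
```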

Conditions

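condition:
    ($a or $b) and ($c or $d)
    #a == 6 and #b > 10 // #a is the number of occurrences of $a

    $a at 100 and $b at 200 // $a is found at offset 100 and $b at offset 200 (decimal; file offset, or virtual address when scanning a process)
    $a in (0..100) and $b in (100..filesize) // $a is found at an offset between 0 and 100, $b between offset 100 and the end of the file
    // @a[i] gives the offset of the i-th occurrence of $a
    // !a[i] gives the length of the i-th match of $a

    filesize > 200KB // file is larger than 200 KB

    // entrypoint is deprecated; use pe.entry_point from the pe module instead

    // read data at a given offset
    // little-endian by default; the functions follow the pattern u?int(8|16|32)(be)?
    // MZ signature at offset 0 and ...
    uint16(0) == 0x5A4D and
    // ... PE signature at offset stored in MZ header at 0x3C
    uint32(uint32(0x3C)) == 0x00004550

    // sets of strings
    2 of ($a,$b,$c) // at least two of the listed strings
    all of them // every string in the rule
    any of them // at least one string in the rule
    all of ($a*) // every string whose name starts with $a
    any of ($a,$b,$c) // at least one of $a, $b, $c
    1 of ($*) // $* stands for all strings; 1 of ($*) is equivalent to any of them

    // for expression of string_set : ( boolean_expression )
    for all of them : ( # > 3 ) // every string occurs more than 3 times
    for all of ($a*) : ( @ > @b ) // the first offset of every string starting with $a is greater than the first offset of $b
    // for expression identifier in indexes : ( boolean_expression )
    for all i in (1..3) : ( @a[i] + 10 == @b[i] ) // i takes the values 1 to 3
    for all i in (1..#a) : ( @a[i] < 100 )

    $a and Rule1 // reference another rule, Rule1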
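The listing above shows independent condition fragments. As a rough sketch, several of them can be joined with `and` into one complete rule; the strings and thresholds here are hypothetical.

```yara
rule ConditionsExample
{
    strings:
        $a1 = "alpha one"
        $a2 = "alpha two"
        $b  = "beta"

    condition:
        uint16(0) == 0x5A4D and                       // MZ header
        uint32(uint32(0x3C)) == 0x00004550 and        // PE signature
        filesize < 5MB and
        all of ($a*) and                              // both $a1 and $a2 are present
        #b >= 2 and                                   // $b occurs at least twice
        for all of ($a*) : ( @ > @b ) and             // each $a* first appears after the first $b
        for any i in (1..#b) : ( @b[i] < 1024 )       // some occurrence of $b lies in the first 1 KB
}
```

Putting the cheap integer checks (`uint16`, `filesize`) first keeps the intent readable, although YARA does not require any particular ordering.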



Using modules

import "pe"

rule Test
{
    strings:
        $a = "some string"

    condition:
        $a and pe.entry_point == 0x1000
}
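Beyond `pe.entry_point`, the `pe` module exposes other header fields and helper functions. The sketch below assumes a PE input; the section count threshold and the imported function are arbitrary examples.

```yara
import "pe"

rule PeModuleExample
{
    condition:
        pe.number_of_sections > 3 and                  // field parsed from the PE header
        pe.imports("kernel32.dll", "VirtualAlloc")     // true if the import table lists this function
}
```

For files that are not PE images the `pe` fields are undefined, so the rule does not match.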

External variables

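condition:
    bool_ext_var or filesize < int_ext_var
    string_ext_var_01 contains "text"
    string_ext_var_02 matches /[a-z]+/ // append i or s after the closing slash for case-insensitive matching or to let . match newlines
    // external variables must be defined on the command line (yara's -d option)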
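A minimal sketch of a rule that relies on external variables. The variable names `max_size` and `campaign`, the file names, and the values in the commented invocation are hypothetical; the variables have to be defined when the rules are compiled, otherwise YARA reports an undefined identifier.

```yara
// Assumed invocation (names and values are examples only):
//   yara -d max_size=1048576 -d campaign=acme-2024 rules.yar sample.bin

rule ExternalVarExample
{
    condition:
        filesize < max_size and         // integer external variable
        campaign contains "acme"        // string external variable
}
```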

Including files

include "./includes/other.yar"

YARA signature extraction

yarGen

Requirements: 4 GB RAM, or 8 GB RAM when analysing opcodes with --opcodes

Installing dependencies

sudo pip install pefile scandir lxml naiveBayesClassifier

Downloading and updating the databases

python yarGen.py --update

Database download URLs

https://www.bsk-consulting.de/yargen/good-exports-part1.db
https://www.bsk-consulting.de/yargen/good-exports-part2.db
https://www.bsk-consulting.de/yargen/good-exports-part3.db
https://www.bsk-consulting.de/yargen/good-exports-part4.db
https://www.bsk-consulting.de/yargen/good-exports-part5.db
https://www.bsk-consulting.de/yargen/good-exports-part6.db
https://www.bsk-consulting.de/yargen/good-exports-part7.db
https://www.bsk-consulting.de/yargen/good-exports-part8.db
https://www.bsk-consulting.de/yargen/good-exports-part9.db
https://www.bsk-consulting.de/yargen/good-imphashes-part1.db
https://www.bsk-consulting.de/yargen/good-imphashes-part2.db
https://www.bsk-consulting.de/yargen/good-imphashes-part3.db
https://www.bsk-consulting.de/yargen/good-imphashes-part4.db
https://www.bsk-consulting.de/yargen/good-imphashes-part5.db
https://www.bsk-consulting.de/yargen/good-imphashes-part6.db
https://www.bsk-consulting.de/yargen/good-imphashes-part7.db
https://www.bsk-consulting.de/yargen/good-imphashes-part8.db
https://www.bsk-consulting.de/yargen/good-imphashes-part9.db
https://www.bsk-consulting.de/yargen/good-opcodes-part1.db
https://www.bsk-consulting.de/yargen/good-opcodes-part2.db
https://www.bsk-consulting.de/yargen/good-opcodes-part3.db
https://www.bsk-consulting.de/yargen/good-opcodes-part4.db
https://www.bsk-consulting.de/yargen/good-opcodes-part5.db
https://www.bsk-consulting.de/yargen/good-opcodes-part6.db
https://www.bsk-consulting.de/yargen/good-opcodes-part7.db
https://www.bsk-consulting.de/yargen/good-opcodes-part8.db
https://www.bsk-consulting.de/yargen/good-opcodes-part9.db
https://www.bsk-consulting.de/yargen/good-strings-part1.db
https://www.bsk-consulting.de/yargen/good-strings-part2.db
https://www.bsk-consulting.de/yargen/good-strings-part3.db
https://www.bsk-consulting.de/yargen/good-strings-part4.db
https://www.bsk-consulting.de/yargen/good-strings-part5.db
https://www.bsk-consulting.de/yargen/good-strings-part6.db
https://www.bsk-consulting.de/yargen/good-strings-part7.db
https://www.bsk-consulting.de/yargen/good-strings-part8.db
https://www.bsk-consulting.de/yargen/good-strings-part9.db

Extracting signatures

python yarGen.py -m <dir> -o <output file>

references