mars1986/tabby

forked from tbb/tabby 
Apache-2.0

tabby

TABBY is a static code analysis tool for Java, built on Soot.

It uses the Soot static analysis framework to extract semantics, converts JAR/WAR/CLASS files into a code property graph (CPG), and stores the generated CPG in a Neo4j graph database.

Note: If you run into any problems while using TABBY, feel free to open a new thread in discussions!

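#1 Usage

Tabby requires the following environment:

  • JDK 8
  • A working Neo4j graph database (see the Neo4j environment setup guide)
  • Neo4j Browser, or another tool that can visualize Neo4j data

For detailed usage, see: Tabby食用指北 (the Tabby usage guide)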

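#2 Who Tabby is for

Tabby was developed to make code auditing more efficient and to cut down the amount of manual searching as much as possible.

The code property graph generated by tabby (current version 1.1.0) supports the following scenarios:

  • Discovering Java deserialization gadget chains hidden in the target code
  • Searching for methods and classes that match specific conditions, for example static methods that call dangerous methods
  • Finding call paths from an endpoint to a sink, to identify possible vulnerability trigger paths (for example, for the WebLogic XMLDecoder issue, the path from processRequest to XMLDecoder.readObject)

Previously, analyzing a Jar/War/Class file usually meant decompiling it to Java source and then manually searching for specific methods.

With Tabby, we can first generate the corresponding code property graph and then use Neo4j's query language to search for specific methods, find exploit paths that satisfy given conditions, and so on.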

#3 Results

#4 Questions

1. What is the design rationale behind the code property graph?

[1] Martin M, Livshits B, Lam M S. Finding application errors and security flaws using PQL: a program query language[J]. ACM SIGPLAN Notices, 2005, 40(10): 365-383.

[2] Yamaguchi F, Golde N, Arp D, et al. Modeling and discovering vulnerabilities with code property graphs[C]//2014 IEEE Symposium on Security and Privacy. IEEE, 2014: 590-604.

[3] Backes M, Rieck K, Skoruppa M, et al. Efficient and flexible discovery of PHP application vulnerabilities[C]//2017 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2017: 334-349.

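The three papers above each explored ways to construct a code property graph, but none of these schemes suits an object-oriented language like Java. Why?

First, what do we want the code property graph to achieve? For me, the goal is to find complete paths using the graph alone, without having to write code to search for reachable paths.

Following this idea, the first thing to solve is Java's polymorphism. In deserialization gadget chains, many chains are "stitched" together from a varying number of gadgets, and this "stitching" amounts to enumerating all concrete implementations of a polymorphic method.

On the graph, however, the different gadgets are actually disconnected from one another.

To solve this, I proposed a code property graph construction scheme for Java that consists of a class relationship graph, a method alias graph, and a precise method call graph.

The method alias graph aggregates all method implementation relationships. At the graph level, ALIAS edges connect the different gadgets, which resolves the Java polymorphism problem.

For details, see my thesis, or read the code directly.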

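2. What problems does the code property graph design have?

Tabby's implementation certainly misses or misjudges some cases, but the code property graph generated by the current version covers most known gadget chains; see the Results section.

From a program analysis perspective, tabby's controllability analysis inevitably misses some cases, and these omissions can sometimes make the precise method call graph imprecise. This part will continue to be improved.

From a usability perspective, the method alias graph leads to false positives in cases like the following: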

class B {
    public void func(){

    }
}
class A extends B{
    public void func(){}

    public void func1(){
        this.func();
    }
}
class C extends B{
    public void func(){}
}

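Suppose class A extends class B and overrides the method func. What goes wrong?

First, func1 contains the call func1-[:CALL]>A.func, and func has the ALIAS edge A.func-[:ALIAS]-B.func.

From a graph query perspective, this yields the path func1-[:CALL]>A.func-[:ALIAS]-B.func-[:ALIAS]-C.func.

Looking at the code, however, this path is impossible: A.func1 actually calls A.func, and the receiver object can never be replaced by a C object.

This is a false positive. How can it be resolved? See question 4.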
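The false positive can be reproduced outside Neo4j with a tiny in-memory graph. The sketch below is only an illustration, not tabby's implementation: the class `AliasFalsePositiveDemo` and its methods are hypothetical, CALL edges are treated as directed, and ALIAS edges are assumed to be traversable in both directions, as in the path shown above.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class AliasFalsePositiveDemo {
    // Adjacency list: method name -> methods reachable over one edge.
    private final Map<String, Set<String>> edges = new HashMap<>();

    /** Adds a directed CALL edge. */
    void call(String from, String to) {
        edges.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    /** Adds an ALIAS edge; assumed here to be traversable in both directions. */
    void alias(String a, String b) {
        call(a, b);
        call(b, a);
    }

    /** Breadth-first search that ignores edge types, like a naive path query. */
    boolean reachable(String start, String target) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(target)) {
                return true;
            }
            for (String next : edges.getOrDefault(node, Collections.emptySet())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return false;
    }

    /** Builds the graph for classes A, B and C from the example. */
    static AliasFalsePositiveDemo build() {
        AliasFalsePositiveDemo g = new AliasFalsePositiveDemo();
        g.call("A.func1", "A.func");
        g.alias("A.func", "B.func");
        g.alias("B.func", "C.func");
        return g;
    }

    public static void main(String[] args) {
        // The graph reports a path even though this.func() in A can never dispatch to C.func.
        System.out.println(build().reachable("A.func1", "C.func"));
    }
}
```

The search finds A.func1 -> A.func -> B.func -> C.func because nothing in the graph records that the receiver of `this.func()` is fixed to A, which is exactly the information a later filtering step would need.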

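3. How do I use the code property graph generated by Tabby?

The code property graph generated by Tabby consists of the class relationship graph, the method alias graph, and the precise method call graph. It does not directly output connected paths such as gadget chains; you obtain them by running graph queries.

The graph supports two modes of use: manual review, and automated exploitation scripts based on taint analysis.

For manual review, you run graph queries and check the results against the actual code as you go. This is fairly labor-intensive, which is why an automated mechanism is also provided.

Then there is the automated script approach. For every method call edge (CALL), Tabby computes the controllability of that call and stores it in the POLLUTED_POSITION property of the CALL edge.

For example, when POLLUTED_POSITION is [0,-1,-2,1], each array index refers to the receiver of the call or to one of the call's arguments, and each value indicates where the tainted variable at that position comes from.

In this example, the first position describes the receiver of the call (the object the method is invoked on). Its value is 0, meaning the receiver comes from the first parameter of the calling method.

The second position describes the first argument of the call. Its value is -1, meaning it comes from a class field.

The third position describes the second argument of the call. Its value is -2, meaning the variable at this position is not controllable.

The fourth position describes the third argument of the call. Its value is 1, meaning the variable at this position comes from the second parameter of the calling method.

In short, the array values mean:

  • -2 => not controllable
  • -1 => class field
  • 0-n => position of a parameter of the calling method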
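The encoding above can be made concrete with a small decoder. This is a minimal sketch based only on the legend in this README; the class `PollutedPositionDecoder` and its method names are hypothetical and are not part of tabby.

```java
import java.util.ArrayList;
import java.util.List;

public class PollutedPositionDecoder {
    /** Names the slot an array index refers to: the receiver or the k-th argument. */
    static String slotName(int index) {
        return index == 0 ? "receiver" : "argument " + index;
    }

    /** Explains a single array value using the legend from the text. */
    static String sourceName(int value) {
        if (value == -2) {
            return "not controllable";
        }
        if (value == -1) {
            return "class field";
        }
        if (value >= 0) {
            // Values are 0-based, so 0 is the first parameter of the calling method.
            return "caller parameter " + (value + 1);
        }
        return "unknown value " + value; // outside the documented range
    }

    /** Produces one human-readable line per slot of POLLUTED_POSITION. */
    static List<String> decode(int[] pollutedPosition) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < pollutedPosition.length; i++) {
            lines.add(slotName(i) + " <- " + sourceName(pollutedPosition[i]));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : decode(new int[]{0, -1, -2, 1})) {
            System.out.println(line);
        }
    }
}
```

For the example array [0,-1,-2,1], the decoder reports the receiver as coming from caller parameter 1, argument 1 from a class field, argument 2 as not controllable, and argument 3 from caller parameter 2.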

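With this information you can run a bottom-up taint analysis. The sink method supplies the prior knowledge, and comparing it with the POLLUTED_POSITION of each call edge tells you whether the current call is controllable.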
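One step of such a bottom-up check might look like the sketch below. It is an assumption-laden illustration rather than tabby's algorithm: the class `CallControllability`, the method `sourcesFor`, and the representation of the sink's prior knowledge as a set of required slots (0 = receiver, k = k-th argument) are all hypothetical, and class fields (-1) are treated as controllable, which may not hold in every setting.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

public class CallControllability {
    static final int NOT_CONTROLLABLE = -2;

    /**
     * Checks one CALL edge against the slots the callee needs to be tainted.
     *
     * @param pollutedPosition the POLLUTED_POSITION array of the CALL edge
     * @param requiredSlots    slots that must be tainted (hypothetical form of the sink's prior knowledge)
     * @return the caller-side sources feeding those slots (-1 = class field,
     *         n = caller parameter at position n), or empty if any required
     *         slot is missing or not controllable
     */
    static Optional<Set<Integer>> sourcesFor(int[] pollutedPosition, Set<Integer> requiredSlots) {
        Set<Integer> sources = new TreeSet<>();
        for (int slot : requiredSlots) {
            if (slot < 0 || slot >= pollutedPosition.length) {
                return Optional.empty();
            }
            int source = pollutedPosition[slot];
            if (source == NOT_CONTROLLABLE) {
                return Optional.empty();
            }
            // Assumption: both class fields (-1) and caller parameters (>= 0) count as controllable.
            sources.add(source);
        }
        return Optional.of(sources);
    }

    public static void main(String[] args) {
        int[] edge = {0, -1, -2, 1};
        // The sink needs its first argument tainted: fed by a class field on this edge.
        System.out.println(sourcesFor(edge, new TreeSet<>(Arrays.asList(1))).isPresent());
        // The sink needs its second argument tainted: not controllable on this edge.
        System.out.println(sourcesFor(edge, new TreeSet<>(Arrays.asList(2))).isPresent());
    }
}
```

Returning the sources rather than a plain boolean is a design choice: the caller parameters in the result are what would have to be controllable one level further up the call path.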

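4. Automated exploitation looks complicated. Will there be worked examples?

The connected paths returned by a query still need further validation. You can trace the code manually, or use the automated analysis described above to examine each path. (The latter also directly addresses the false positives of the method alias graph, by checking in advance whether the next node is one where enumerating concrete implementations is allowed.)

I have not yet decided how to implement this. It may be a Neo4j UDF that automates the analysis, or it may be built directly into tabby. This is a TODO for now.

5. Performance

Tabby was originally meant to work in stages: analyze the base libraries once, then let later analyses rely on the prior knowledge stored in an H2 database. In practice, though, H2's performance is not good enough. I will rewrite this part of the architecture when I have time.

The recommended approach is therefore a one-shot, in-memory analysis: before each run, delete the results of the previous run (cache/graphdb.mv.db and rules/ignore.json), then generate the code property graph.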
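The cleanup step can be scripted. The following is a minimal sketch, assuming the two files live under tabby's working directory at the relative paths given above; the class `CleanPreviousRun` is hypothetical and not shipped with tabby.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CleanPreviousRun {
    // Relative paths of the previous run's results, as named in the text.
    private static final String[] STALE_FILES = {"cache/graphdb.mv.db", "rules/ignore.json"};

    /** Deletes the stale files under workDir and returns how many were actually removed. */
    static int clean(Path workDir) {
        int removed = 0;
        for (String relative : STALE_FILES) {
            try {
                // deleteIfExists returns false, without failing, when the file is absent.
                if (Files.deleteIfExists(workDir.resolve(relative))) {
                    removed++;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // Assumption: the working directory is passed as the first argument, else the current directory.
        Path workDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("removed " + clean(workDir) + " stale file(s)");
    }
}
```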

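In my experiments, about 6 GB of memory was enough to process more than 40,000 classes.

As for speed, tabby still analyzes sequentially in a single process, and the analysis is both compute- and I/O-heavy, so it is unclear whether multithreading would improve efficiency.

Currently, analyzing the 19 JAR files of the JDK (more than 30,000 classes) takes a little over 7 minutes (about 450 s), so you can take a cup of coffee XD

My test machine is a fairly old Mac Pro, so treat these numbers as a rough reference.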

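6. Common errors

Most common errors come from Soot:

  • Missing basic classes. Soot's error message shows how to fix this; tabby provides basicClasses.json to address it.
  • soot getBody convert error. There is currently no fix for this error, since it is a parsing problem in Soot. The only option is to remove the jar file that triggers it. For example, wlthint3client.jar from weblogic 12g has this problem, and we can only wait for a Soot update.
  • Other bugs caused by tabby itself, such as null pointer exceptions. Please open an issue and attach the jar file that triggers the error.

In addition, tabby has mainly been tested on macOS and has not yet been tested on other platforms. I am not sure whether it works on Windows (mainly, the way JDK dependencies are located would need adapting).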

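7. Usage tricks

During graph generation, much of the code analysis is actually unnecessary, but the program cannot tell which parts, so it has to analyze everything.

However, if you hit a case that consumes an extreme amount of memory or CPU (that is, the run stalls at the method-processing progress stage), you can optimize the analysis as follows:

  1. Run the jar with debug logging, -Dlogging.level.tabby=DEBUG, and see which method is especially expensive or where the run gets stuck
  2. Open IDEA, add the lib, and locate the implementation. If manual analysis shows that the method can be ignored, add it to the knowledge base
  3. In knowledges.json, add an ignore rule. For example, for the Seeyon method <com.seeyon.ctp.common.parser.BytesEncodingDetect: void initialize_frequencies()>, the ignore rule is as follows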
{"name": "com.seeyon.ctp.common.parser.BytesEncodingDetect", "rules":[
    {"function": "initialize_frequencies", "type": "ignore", "vul": "","actions":{}, "polluted":[], "signatures":[]}
  ]}
  4. After adding the ignore rule, run tabby again and it will skip the analysis of that method

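#5 Motivation & Acknowledgements

While analyzing gadget chains, I came to realize that this process can be automated (for Java and PHP alike). However, few tools of this kind have been open-sourced in China, and the actual detection results of the GI tool are not very good. So I developed tabby based on my own understanding of program analysis. My hope is that tabby will be used not only for gadget chain discovery, but also for vulnerability analysis built on its code property graph. I hope tabby can bring a new way of working to Java security researchers in China.

Of course, the current version of tabby still has many problems that could be improved. I hope people with program analysis experience will join in building tabby. If you have any questions, feel free to contact me directly!

If tabby has made your work easier, please don't be stingy with your 🌟!

If you have found vulnerabilities with tabby, you are very welcome to share your success stories XD

If you are able to help build tabby, let's talk, or just open a PR or an issue.

2021.12.02 Updated:

So far, tabby has indeed been able to find some security issues in real-world environments.

However, the algorithm implementation has false negatives (which I personally consider a fairly serious problem), and looking back, the code is also too ugly.

For now I have decided not to rework the current architecture. The new version's approach is about 1/4 implemented, but I cannot predict when it will be open-sourced XD

As a stopgap, I have added a full method call graph. It increases false positives but avoids false negatives, and can be generated by adding the --isFullCG parameter.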

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. 
For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
