Metadata-Version: 1.1
Name: pyrefo
Version: 0.1
Summary: a fast regex for objects
Home-page: http://github.com/yimian/pyrefo
Author: zhangjinjie
Author-email: zhangjinjie@yimian.com.cn
License: GPLv3+
Description: ### pyrefo: a fast regex for objects
        
        This project is based on [refo](https://github.com/machinalis/refo) and the article [Regular Expression Matching: the Virtual Machine Approach](https://swtch.com/~rsc/regexp/regexp2.html). It uses cffi to extend Python with C and speed up matching.
        
        This project does the following:
        
        1. is fully compatible with the refo API, supporting all patterns and the `match`, `search`, and `finditer` methods;
        2. fixes a bug in the C source code included in the article;
        3. uses cffi to extend Python with C;
        4. adds support for partial matches;
        5. adds a new `Phrase` pattern, which lets `'ab'` match the list `['a', 'b', 'c']`.
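        
        The virtual machine approach referenced above can be illustrated with a minimal pure-Python sketch. This is not pyrefo's C implementation: the instruction names (`pred`, `split`, `jmp`, `match`) and the helpers `add_thread` and `vm_match` are assumptions made for illustration. The program encodes "a literal, then any number of arbitrary items, then another literal" over a list of tokens.
        
        ```python
        def add_thread(prog, threads, pc, seen):
            """Follow jmp/split instructions and collect the threads that wait on an item."""
            if pc in seen:
                return
            seen.add(pc)
            op = prog[pc]
            if op[0] == "jmp":
                add_thread(prog, threads, op[1], seen)
            elif op[0] == "split":
                add_thread(prog, threads, op[1], seen)
                add_thread(prog, threads, op[2], seen)
            else:  # "pred" or "match"
                threads.append(pc)
        
        
        def vm_match(prog, items):
            """Return the end index of the longest match anchored at items[0], or None."""
            current = []
            add_thread(prog, current, 0, set())
            matched_end = None
            for i in range(len(items) + 1):
                nxt, seen = [], set()
                for pc in current:
                    op = prog[pc]
                    if op[0] == "match":
                        matched_end = i  # i items have been consumed so far
                    elif i < len(items) and op[1](items[i]):
                        add_thread(prog, nxt, pc + 1, seen)
                current = nxt
                if not current:
                    break
            return matched_end
        
        
        # Hypothetical program: literal '物流', any items (zero or more), literal '迅速'.
        prog = [
            ("pred", lambda item: item == "物流"),  # 0
            ("split", 2, 4),                        # 1: loop body or exit
            ("pred", lambda item: True),            # 2: any item
            ("jmp", 1),                             # 3
            ("pred", lambda item: item == "迅速"),  # 4
            ("match",),                             # 5
        ]
        
        tokens = ["物流", "很", "迅速", "！"]
        end = vm_match(prog, tokens)
        print(tokens[:end] if end is not None else None)
        ```
        
        All threads advance in lockstep over the input, so each item is inspected at most once per instruction and the running time stays linear in the number of items for a fixed pattern.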
        
        
        
        ### performance test
        
        #### prerequisites
        
        ```python
        import jieba
        text = '为什么在本店买东西？因为物流迅速＋品质保证。为什么我购买的每件商品评价都一样呢？因为我买的东西太多了，积累了很多未评价的订单，所以我统一用这段话作为评价内容。如果我用了这段话作为评价，那就说明这款产品非常赞，非常好！'
        tokens = list(jieba.cut(text))
        ```
        
        #### CPython
        
        - pyrefo
        
        ```python
        from pyrefo import search, Group, Star, Any, Literal
        %timeit search(Group(Literal('物流') + Star(Any()) + Literal('迅速'), 'a'), tokens)
        ```
        
        ```shell
        95.9 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
        ```
        
        - refo
        
        ```python
        import refo
        %timeit refo.search(refo.Group(refo.Literal('物流') + refo.Star(refo.Any()) + refo.Literal('迅速'), 'a'), tokens)
        ```
        
        ```shell
        1.03 ms ± 7.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
        ```
        
        - re
        
        ```python
        import re
        %timeit re.search('(物流.*速度)', text)
        ```
        
        ```shell
        989 ns ± 4.69 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
        ```
        
        #### PyPy
        
        - pyrefo
        
        ```python
        from pyrefo import search, Group, Star, Any, Literal
        %timeit search(Group(Literal('物流') + Star(Any()) + Literal('迅速'), 'a'), tokens)
        ```
        
        ```shell
        53.4 µs ± 28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
        ```
        
        - refo
        
        ```python
        import refo
        %timeit refo.search(refo.Group(refo.Literal('物流') + refo.Star(refo.Any()) + refo.Literal('迅速'), 'a'), tokens)
        ```
        
        ```shell
        78 µs ± 35.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
        ```
        
        - re
        
        ```python
        import re
        %timeit re.search('(物流.*速度)', text)
        ```
        
        ```shell
        347 ns ± 3.26 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
        ```
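        
        The `%timeit` lines above only run inside IPython. As a rough sketch, the `re` case could be timed in a plain Python script with the standard `timeit` module. The shortened `text`, the `run` helper, and the loop count are assumptions for illustration, and the pattern uses the `迅速` literal from the pyrefo and refo runs, so its timing is not comparable to the figures reported above.
        
        ```python
        import re
        import timeit
        
        # Shortened sample text (assumption); the full benchmark text works the same way.
        text = '为什么在本店买东西？因为物流迅速＋品质保证。'
        pattern = re.compile('(物流.*迅速)')
        
        
        def run():
            """Search the text once; this is the statement being timed."""
            return pattern.search(text)
        
        
        loops = 100000
        seconds = timeit.timeit(run, number=loops)
        print('%.1f ns per loop' % (seconds / loops * 1e9))
        ```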
Keywords: regex
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Text Processing :: Linguistic
