Skip to content

Files

Latest commit

Dec 22, 2016
8b5c5f9 · Dec 22, 2016

History

History
This branch is 6 commits behind lining0806/PythonSpiderNotes:master.

Captcha1

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
Dec 22, 2016
Dec 22, 2016
Dec 22, 2016
Dec 22, 2016
Dec 22, 2016
Dec 22, 2016
Dec 22, 2016

验证码识别项目第一版:Captcha1

本项目采用Tesseract V3.01版本(V3.02版本在训练时有改动,多shapeclustering过程)

Tesseract用法:

  • 配置环境变量TESSDATA_PREFIX =“D:\Tesseract-ocr\”,即tessdata的目录,在源码中会到这个路径下查找相应的字库文件用来识别。

  • 命令格式:
    tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

  • 只识别成数字
    tesseract imagename outputbase -l eng digits

  • 解决empty page!!
    -psm N

    7 = Treat the image as a single text line
    tesseract imagename outputbase -l eng -psm 7

  • configfile 参数值为tessdata\configs 和 tessdata\tessconfigs 目录下的文件名:
    tesseract imagename outputbase -l eng nobatch

验证码识别项目使用方法1:

  • 将下载的图片放到./pic目录下,

    验证码图片名称:get_random.jpg
    价格图片名称:get_price_img.png

  • 命令格式:

    验证码图片识别:python tess_test.py ./pic/get_random.jpg
    价格图片识别:python tess_test.py ./pic/get_price_img.png

打印出识别的结果

若要将结果存在临时文本文件temp.txt中,则修改pytessr_pro.py中代码"cleanup_scratch_flag = True"改为"cleanup_scratch_flag = False"