[PaddleOCR Mod] Replacing Text Detection with YOLO Object Detection in the Model Pipeline

Using YOLO as PaddleOCR's text detection module.


陶情适性 · published 2023-10-12 18:32:29

Contents

1. Overview
2. Modification Steps
3. Final Output

1. Overview

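I recently got a project at work that needs OCR. I had used PaddleOCR for OCR work before, so I stuck with it.
The project is what PaddleOCR's example projects call "card/ID OCR": only a few key fields in each image need to be read. Plain PaddleOCR can recognize the text I need, but the output also has to be structured.
My plan was to annotate the data for text detection with PPOCRLabel, fine-tune the ch_ppocrv4_server pretrained model, and use the text recognition inference model as-is.
But!!! The detection results were poor. I need to detect 5 key fields, and the fine-tuned model missed many of them. I opened an issue on PaddleOCR's official GitHub and also read other people's issues about text detection training. Someone suggested using YOLO for object detection, and I decided to try it: YOLO's class labels can serve directly as keys, and its candidate boxes can be passed to text recognition.
Below is what I did. Oh, and the PaddleOCR version should be the latest one, PP-OCRv4.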

2. Modification Steps

2.1 Modifying predict_system.py

predict_system.py is under tools/infer in the PaddleOCR repository. So what needs to change?

2.1.1 Commented-out parts

Comment-out 1

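import tools.infer.utility as utility
# predict_rec_v2 / predict_cls_v2: presumably the modified copies of predict_rec.py / predict_cls.py (see 2.2)
import tools.infer.predict_rec_v2 as predict_rec
# import tools.infer.predict_det as predict_det  # comment out the det import
import tools.infer.predict_cls_v2 as predict_cls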

Comment-out 2

# self.text_detector = predict_det.TextDetector(args)  # comment out the det instantiation
self.text_recognizer = predict_rec.TextRecognizer(args)

2.1.2 Modified parts

Original code

def __call__(self, img, cls=True):
        time_dict = {'det': 0, 'rec': 0, 'csl': 0, 'all': 0}
        start = time.time()
        ori_im = img.copy()
        dt_boxes, elapse = self.text_detector(img)
        time_dict['det'] = elapse
        logger.debug("dt_boxes num : {}, elapse : {}".format(
            len(dt_boxes), elapse))
        if dt_boxes is None:
            return None, None
        img_crop_list = []

        dt_boxes = sorted_boxes(dt_boxes)

        for bno in range(len(dt_boxes)):
            tmp_box = copy.deepcopy(dt_boxes[bno])
            if self.args.det_box_type == "quad":
                img_crop = get_rotate_crop_image(ori_im, tmp_box)
            else:
                img_crop = get_minarea_rect_crop(ori_im, tmp_box)
            img_crop_list.append(img_crop)
        if self.use_angle_cls and cls:
            img_crop_list, angle_list, elapse = self.text_classifier(
                img_crop_list)
            time_dict['cls'] = elapse
            logger.debug("cls num  : {}, elapse : {}".format(
                len(img_crop_list), elapse))

        rec_res, elapse = self.text_recognizer(img_crop_list)
        time_dict['rec'] = elapse
        logger.debug("rec_res num  : {}, elapse : {}".format(
            len(rec_res), elapse))
        if self.args.save_crop_res:
            self.draw_crop_rec_res(self.args.crop_res_save_dir, img_crop_list,
                                   rec_res)
        filter_boxes, filter_rec_res = [], []
        for box, rec_result in zip(dt_boxes, rec_res):
            text, score = rec_result
            if score >= self.drop_score:
                filter_boxes.append(box)
                filter_rec_res.append(rec_result)
        end = time.time()
        time_dict['all'] = end - start
        return filter_boxes, filter_rec_res, time_dict

Modified

def __call__(self, img, cls=True):
        # 'cls' (not 'csl') so the key matches time_dict['cls'] below
        time_dict = {'det': 0, 'rec': 0, 'cls': 0, 'all': 0}
        start = time.time()
        ori_im = img.copy()

        # YOLO detection replaces self.text_detector.
        # detonnx is the author's own YOLO ONNX wrapper and args_text holds its settings;
        # neither is part of PaddleOCR, so both must be imported/defined in this file.
        dt_boxes, detbox, img_crop = detonnx.DETONNX(args_text)(ori_im)
        elapse = time.time() - start
        time_dict['det'] = elapse

        # Keep only the classes YOLO actually detected (crop is not None)
        img_crop_list = []
        re_detbox = {}
        for i, (label, crop) in enumerate(detbox.items()):
            if crop is not None and img_crop[i] is not None:
                img_crop_list.append(img_crop[i])
                re_detbox[label] = crop

        if self.use_angle_cls and cls:
            # modified classifier: takes and returns a {label: crop} dict
            re_detbox, angle_list, elapse = self.text_classifier(re_detbox)
            time_dict['cls'] = elapse
            logger.debug("cls num  : {}, elapse : {}".format(
                len(img_crop_list), elapse))

        # modified recognizer: returns {label: [text, score]}
        rec_res, elapse = self.text_recognizer(re_detbox)
        time_dict['rec'] = elapse
        logger.debug("rec_res num  : {}, elapse : {}".format(
            len(rec_res), elapse))
        if self.args.save_crop_res:
            # note: the stock draw_crop_rec_res indexes rec_res by position,
            # so it may need adapting for a dict keyed by label
            self.draw_crop_rec_res(self.args.crop_res_save_dir, img_crop_list, rec_res)
        end = time.time()
        time_dict['all'] = end - start
        return dt_boxes, img_crop_list, rec_res, time_dict

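The main differences from the original are:
- self.text_detector is replaced by a YOLO ONNX detector.
- The sorted_boxes() step and the per-box rotated cropping are gone.
- The angle classifier and the recognizer now receive a {class: crop} dict instead of a list of crops.
- The drop_score filter is removed.
- The method returns four values (dt_boxes, img_crop_list, rec_res, time_dict) instead of three.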
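The sketch below shows the data flow of the modified __call__ in isolation, using stand-ins for the real components. fake_yolo_detect and fake_recognize are hypothetical placeholders for detonnx.DETONNX and the dict-based TextRecognizer; they are not real PaddleOCR or YOLO APIs. Images are nested lists, so the sketch runs without numpy or OpenCV.

```python
import time
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # toy "image": rows of pixel values


def fake_yolo_detect(img: Image) -> Tuple[list, Dict[str, Optional[Image]], List[Optional[Image]]]:
    """Hypothetical stand-in for detonnx.DETONNX(args_text)(img).

    Returns (dt_boxes, detbox, img_crop) in the shapes described in the article:
    dt_boxes = [[coords, label, score], ...], detbox = {label: crop}, img_crop = [crop, ...].
    """
    name_crop = [row[0:2] for row in img[0:1]]
    dt_boxes = [[[0, 0, 2, 1], "name", 0.93]]
    detbox = {"name": name_crop, "id_number": None}  # "id_number" was not detected
    img_crop = [name_crop, None]
    return dt_boxes, detbox, img_crop


def fake_recognize(crops: Dict[str, Image]) -> Tuple[Dict[str, Tuple[str, float]], float]:
    """Hypothetical stand-in for the dict-based recognizer: {label: (text, score)}."""
    start = time.time()
    res = {label: ("text-for-" + label, 0.99) for label in crops}
    return res, time.time() - start


def run_pipeline(img: Image):
    time_dict = {"det": 0.0, "rec": 0.0, "cls": 0.0, "all": 0.0}
    start = time.time()
    dt_boxes, detbox, img_crop = fake_yolo_detect(img)
    time_dict["det"] = time.time() - start

    # Same filtering as the modified __call__: drop classes with no crop
    img_crop_list, re_detbox = [], {}
    for i, (label, crop) in enumerate(detbox.items()):
        if crop is not None and img_crop[i] is not None:
            img_crop_list.append(img_crop[i])
            re_detbox[label] = crop

    rec_res, time_dict["rec"] = fake_recognize(re_detbox)
    time_dict["all"] = time.time() - start
    return dt_boxes, img_crop_list, rec_res, time_dict


if __name__ == "__main__":
    _, crops, rec_res, _ = run_pipeline([[1, 2, 3], [4, 5, 6]])
    print(rec_res)  # {'name': ('text-for-name', 0.99)}
```

Undetected classes never reach the recognizer, so rec_res only contains keys for the fields YOLO actually found.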

Some notes on the changes
I don't use the pipeline's main(), so I left main() unchanged and only call the TextSystem class.

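args_text: YOLO's parameters, such as the threshold, image width and height, and model path.
dt_boxes: YOLO's results, i.e. [coordinates, class, score]. These are mainly used for drawing boxes in visualizations.
detbox: a dict of {class: image array}, which is what gets passed to text recognition. Each image array is the crop in img_crop that corresponds to that class.
img_crop: a list of image arrays, one crop per key field. In effect, the candidate box of each key field is cut out of the original image as its own region image.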
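The article doesn't show what happens inside detonnx.DETONNX. Here is one possible way raw YOLO output could be turned into the three values above. It is a hedged sketch that makes these assumptions:
- Detections are already decoded and NMS-filtered into (x1, y1, x2, y2, class_id, score) tuples in pixel coordinates.
- Only the highest-scoring box per class is kept, since detbox holds one crop per key.
- Classes are emitted in a fixed order, so that img_crop[i] lines up with the i-th detbox key, as the filtering loop in __call__ expects.

YoloArgs, CLASS_NAMES and build_outputs are hypothetical names. Images are nested lists; with numpy the crop would simply be img[y1:y2, x1:x2].

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical class list for a card with five key fields
CLASS_NAMES = ["name", "sex", "birth", "address", "id_number"]


@dataclass
class YoloArgs:
    """Hypothetical stand-in for args_text (threshold, input size, model path)."""
    conf_thresh: float = 0.5
    input_w: int = 640
    input_h: int = 640
    model_path: str = "yolo.onnx"


def crop(img: List[list], x1: int, y1: int, x2: int, y2: int) -> List[list]:
    """Crop a nested-list image; same as img[y1:y2, x1:x2] on a numpy array."""
    return [row[x1:x2] for row in img[y1:y2]]


def build_outputs(img: List[list], detections: List[Tuple[float, float, float, float, int, float]], args: YoloArgs):
    h, w = len(img), len(img[0])
    # Keep the highest-scoring box per class above the confidence threshold
    best: Dict[str, Tuple] = {}
    for x1, y1, x2, y2, cls_id, score in detections:
        if score < args.conf_thresh:
            continue
        label = CLASS_NAMES[cls_id]
        if label not in best or score > best[label][5]:
            best[label] = (x1, y1, x2, y2, cls_id, score)

    dt_boxes, detbox, img_crop = [], {}, []
    for label in CLASS_NAMES:  # fixed order keeps img_crop[i] aligned with detbox keys
        if label not in best:
            detbox[label] = None
            img_crop.append(None)
            continue
        x1, y1, x2, y2, _, score = best[label]
        # Clamp to the image bounds and convert to integer pixel indices
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        region = crop(img, x1, y1, x2, y2)
        dt_boxes.append([[x1, y1, x2, y2], label, score])
        detbox[label] = region
        img_crop.append(region)
    return dt_boxes, detbox, img_crop
```

Keeping only the best box per class is a design choice that fits card OCR, where each field appears once. If a field can legitimately appear several times, detbox would need list values instead.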

The rest of the code is straightforward.

2.2 Modifying predict_rec.py

The main change is in the __call__() function, where I renamed img_list to img_dict:

2.2.1 Change 1

for img in img_list:
    width_list.append(img.shape[1] / float(img.shape[0]))
# Sorting can speed up the recognition process
indices = np.argsort(np.array(width_list))
rec_res = [['', 0.0]] * img_num
batch_num = self.rec_batch_num
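The comment "Sorting can speed up the recognition process" is easier to see with numbers, especially since the modified version below drops the sort. Within a batch, every crop is padded to the widest aspect ratio in that batch, so grouping similar ratios wastes less computation. This sketch uses a simplified cost model (total padded width, in units of crop height). That model is an assumption for illustration, not PaddleOCR's measured cost, and the ratios are made-up numbers.

```python
from typing import List


def padded_cost(ratios: List[float], order: List[int], batch_num: int) -> float:
    """Sum over batches of (batch size * widest w/h ratio in that batch).

    Simplified model: every crop in a batch is padded to the batch's widest
    aspect ratio, so this approximates the total width the model processes.
    """
    total = 0.0
    for beg in range(0, len(order), batch_num):
        batch = order[beg:beg + batch_num]
        total += len(batch) * max(ratios[i] for i in batch)
    return total


ratios = [8.0, 1.5, 7.5, 2.0]  # w / h of four crops (hypothetical)
key_order = list(range(len(ratios)))                        # arrival order
sorted_order = sorted(key_order, key=lambda i: ratios[i])  # what argsort on width_list yields here

print(padded_cost(ratios, key_order, batch_num=2))     # 31.0
print(padded_cost(ratios, sorted_order, batch_num=2))  # 20.0
```

With only five key fields, sorting barely matters in this project if rec_batch_num is at least the number of crops, because everything then fits in one batch. The default in tools/infer/utility.py is 6, but check it for your version.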
    
Changed to

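width_dict = []  # still a plain list of w/h ratios, despite the name
for img in img_dict.values():
    width_dict.append(img.shape[1] / float(img.shape[0]))
# No sorting here: the class labels themselves serve as indices
indices = list(img_dict.keys())
# One default [text, score] per label, overwritten as each batch is recognized
rec_res = {i: ['', 0.0] for i in indices}
batch_num = self.rec_batch_num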
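Here is the dict-based idea as one small, runnable function: batch over the class-label keys and write each result back under its key. recognize_batch is a hypothetical stand-in for the preprocessing, predictor and postprocessing steps inside TextRecognizer.

Sorting the keys by aspect ratio is an optional tweak, not something the author did. It keeps the speed-up that the original np.argsort provided. Because results are stored by key, the output (initialized in the caller's key order) does not depend on the processing order.

```python
from typing import Callable, Dict, List, Tuple

Crop = List[List[int]]  # toy image: rows of pixels


def recognize_dict(
    img_dict: Dict[str, Crop],
    recognize_batch: Callable[[List[Crop]], List[Tuple[str, float]]],
    batch_num: int = 6,
) -> Dict[str, Tuple[str, float]]:
    # Aspect ratio w / h per key; processing keys in this order groups similar widths
    ratio = {k: len(v[0]) / float(len(v)) for k, v in img_dict.items()}
    indices = sorted(img_dict, key=lambda k: ratio[k])
    rec_res = {k: ("", 0.0) for k in img_dict}  # one default (text, score) per key
    for beg in range(0, len(indices), batch_num):
        keys = indices[beg:beg + batch_num]
        results = recognize_batch([img_dict[k] for k in keys])
        for k, res in zip(keys, results):
            rec_res[k] = res  # stored under the class label, not a position
    return rec_res


def fake_batch(crops: List[Crop]) -> List[Tuple[str, float]]:
    """Hypothetical recognizer that just 'reads' each crop's width."""
    return [(f"w={len(c[0])}", 0.9) for c in crops]


crops = {"name": [[0] * 4] * 2, "id_number": [[0] * 18] * 2, "sex": [[0] * 2] * 2}
print(recognize_dict(crops, fake_batch, batch_num=2))
```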

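2.2.2 Change 2

For rec_algorithm I kept only the model I use; if you use another model, add its branch back in the same place. Concretely, of the if-elif-else branches under for ino in range(beg_img_no, end_img_no), I kept only the else branch.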

for ino in range(beg_img_no, end_img_no):
    norm_img = self.resize_norm_img(img_dict[indices[ino]], max_wh_ratio)
    norm_img = norm_img[np.newaxis, :]
    norm_img_batch.append(norm_img)
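In that loop, max_wh_ratio decides how wide every crop in the batch is resized and padded. The sketch below mirrors my reading of the default (else-branch) path in PaddleOCR's resize_norm_img:
- The target width is imgH * max_wh_ratio.
- Each crop keeps its own aspect ratio, up to that width.
- The remainder is zero padding.

A rec_image_shape of (3, 48, 320) is the PP-OCRv4 default as far as I know. Treat that shape, and the hypothetical function name batch_widths, as assumptions to check against your version.

```python
import math
from typing import Dict, Tuple


def batch_widths(
    sizes: Dict[str, Tuple[int, int]],  # {label: (h, w)} of the crops in one batch
    rec_image_shape: Tuple[int, int, int] = (3, 48, 320),
) -> Tuple[int, Dict[str, int]]:
    """Return (padded batch width, {label: resized width before padding})."""
    _, img_h, img_w = rec_image_shape
    # Start from the model's default ratio and widen it to the widest crop in the batch
    max_wh_ratio = img_w / img_h
    for h, w in sizes.values():
        max_wh_ratio = max(max_wh_ratio, w * 1.0 / h)
    target_w = int(img_h * max_wh_ratio)
    resized = {}
    for label, (h, w) in sizes.items():
        ratio = w / float(h)
        # Keep the aspect ratio, clipped to the batch width; the rest is zero padding
        resized[label] = min(target_w, int(math.ceil(img_h * ratio)))
    return target_w, resized


print(batch_widths({"name": (32, 96), "id_number": (30, 540)}))
# (864, {'name': 144, 'id_number': 864})
```

A long field such as an ID number widens the whole batch, which is one more reason the original code sorted crops by aspect ratio.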

Everything after that is unchanged. I also modified the cls module, in the same way as rec.
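Here is what a dict-based angle classifier could look like, given that the modified __call__ does re_detbox, angle_list, elapse = self.text_classifier(re_detbox). It is a minimal sketch of my understanding of the original rule: a crop is rotated back only when the predicted label contains "180" and its score exceeds cls_thresh (default 0.9). predict_angle is a hypothetical per-crop classifier, the elapse return value is omitted, and rotate_180 reproduces cv2.rotate(img, cv2.ROTATE_180) on nested lists.

```python
from typing import Callable, Dict, List, Tuple

Crop = List[List[int]]  # toy image: rows of pixels


def rotate_180(img: Crop) -> Crop:
    """Same effect as cv2.rotate(img, cv2.ROTATE_180), on a nested-list image."""
    return [row[::-1] for row in img[::-1]]


def classify_dict(
    img_dict: Dict[str, Crop],
    predict_angle: Callable[[Crop], Tuple[str, float]],
    cls_thresh: float = 0.9,
) -> Tuple[Dict[str, Crop], Dict[str, Tuple[str, float]]]:
    """Return ({label: possibly rotated crop}, {label: (angle, score)})."""
    out, angle_dict = {}, {}
    for label, img in img_dict.items():
        angle, score = predict_angle(img)
        angle_dict[label] = (angle, score)
        # Rotate only confident "180" predictions; keep the dict keyed by label
        out[label] = rotate_180(img) if "180" in angle and score > cls_thresh else img
    return out, angle_dict
```

Because the function returns a dict with the same keys, its output can go straight into the dict-based recognizer.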

3. Final Output

The output has a structure like this:
{"class1": "result1", "class2": "result2", ...}
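Turning the dict-based rec_res into that structure is a short step. Here is a sketch, assuming rec_res maps each class to a (text, score) pair, as the recognizer changes in 2.2 produce.

The drop_score filter brings back the filtering that the modified __call__ removed. 0.5 is PaddleOCR's default --drop_score, as far as I know. Missing or low-confidence classes become empty strings, so every expected key is always present. to_structured and the sample values are hypothetical.

```python
import json
from typing import Dict, Iterable, Tuple


def to_structured(
    rec_res: Dict[str, Tuple[str, float]],
    class_names: Iterable[str],
    drop_score: float = 0.5,
) -> Dict[str, str]:
    """Map every expected class to its text ('' if missing or below drop_score)."""
    out = {}
    for name in class_names:
        text, score = rec_res.get(name, ("", 0.0))
        out[name] = text if score >= drop_score else ""
    return out


rec_res = {"name": ("张三", 0.98), "id_number": ("1101", 0.31)}  # hypothetical values
result = to_structured(rec_res, ["name", "sex", "id_number"])
print(json.dumps(result, ensure_ascii=False))  # {"name": "张三", "sex": "", "id_number": ""}
```

ensure_ascii=False keeps Chinese text readable in the JSON instead of escaping it.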

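This meets the structured-output requirement. Combining the stock PaddleOCR with UIE-X can achieve the same result, but using YOLO as the detection model is another option.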

Done!
