
Detect image orientation angle based on text direction




I am working on an OCR task to extract information from multiple ID proof documents. One challenge is the orientation of the scanned image. I need to fix the orientation of scanned images of PAN, Aadhaar, driving license or any other ID proof.

I have already tried all the approaches suggested on Stack Overflow and other forums, such as OpenCV minAreaRect, Hough line transforms, FFT, homography, and Tesseract OSD with psm 0. None of them work.

The logic should return the angle of the text direction: 0, 90 or 270 degrees. Attached are images at 0, 90 and 270 degrees. This is not about determining skew.


Here's an approach based on the assumption that the majority of the text is skewed onto one side. The idea is that we can determine the angle based on where the major text region is located.

  • Convert image to grayscale and Gaussian blur
  • Adaptive threshold to get a binary image
  • Find contours and filter using contour area
  • Draw filtered contours onto mask
  • Split image horizontally or vertically based on orientation
  • Count number of pixels in each half

After converting to grayscale and applying a Gaussian blur, we apply an adaptive threshold to obtain a binary image.

From here we find contours and filter them by contour area to remove the small noise particles and the large border. We draw any contours that pass this filter onto a mask.

To determine the angle, we split the image in half based on the image's dimensions. If width > height, it must be a horizontal image, so we split it vertically into left and right halves. If height > width, it must be a vertical image, so we split it horizontally into top and bottom halves.
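A minimal sketch of the split, using plain NumPy slicing. The `split_halves` name and its return format are assumptions for illustration. Note that a square image (width equal to height) falls into the vertical branch here, which matches how the full code below treats it.

```python
import numpy as np

def split_halves(mask):
    """Return (kind, first_half, second_half) for a 2-D mask.

    "horizontal" -> (left, right); "vertical" -> (top, bottom).
    """
    h, w = mask.shape[:2]
    if w > h:
        return "horizontal", mask[:, :w // 2], mask[:, w // 2:]
    return "vertical", mask[:h // 2, :], mask[h // 2:, :]

landscape = np.zeros((100, 300), dtype=np.uint8)
portrait = np.zeros((300, 100), dtype=np.uint8)

kind_l, left, right = split_halves(landscape)
kind_p, top, bottom = split_halves(portrait)
print(kind_l, left.shape, right.shape)
print(kind_p, top.shape, bottom.shape)
```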

Now that we have two halves, we can use cv2.countNonZero() to determine the number of white pixels in each half. Here's the logic to determine the angle:

if horizontal
    if left >= right 
        degree -> 0
    else
        degree -> 180
if vertical
    if top >= bottom
        degree -> 270
    else
        degree -> 90
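The decision table above translates directly into a small function. `angle_from_counts` is a hypothetical name used only for this sketch; the pixel counts passed in are the ones reported in this answer.

```python
def angle_from_counts(is_horizontal, first, second):
    """Map white-pixel counts of the two halves to an angle.

    For a horizontal image, first/second are the left/right counts;
    for a vertical image they are the top/bottom counts.
    """
    if is_horizontal:
        return 0 if first >= second else 180
    return 270 if first >= second else 90

print(angle_from_counts(True, 9703, 3975))    # left-heavy horizontal image
print(angle_from_counts(True, 3975, 9703))    # right-heavy horizontal image
print(angle_from_counts(False, 3947, 9550))   # bottom-heavy vertical image
```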

left 9703

right 3975

Therefore the image is at 0 degrees. Here are the results for the other orientations.

left 3975

right 9703

We can conclude that the image is flipped 180 degrees.

Here are the results for a vertical image. Note that since it's a vertical image, we split horizontally.

top 3947

bottom 9550

Therefore the result is 90 degrees

import cv2
import numpy as np

def detect_angle(image):
    mask = np.zeros(image.shape, dtype=np.uint8)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3,3), 0)
    adaptive = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,15,4)

    cnts = cv2.findContours(adaptive, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]

    for c in cnts:
        area = cv2.contourArea(c)
        if area < 45000 and area > 20:
            cv2.drawContours(mask, [c], -1, (255,255,255), -1)

    mask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
    h, w = mask.shape
    # Horizontal
    if w > h:
        left = mask[0:h, 0:0+w//2]
        right = mask[0:h, w//2:]
        left_pixels = cv2.countNonZero(left)
        right_pixels = cv2.countNonZero(right)
        return 0 if left_pixels >= right_pixels else 180
    # Vertical
    else:
        top = mask[0:h//2, 0:w]
        bottom = mask[h//2:, 0:w]
        top_pixels = cv2.countNonZero(top)
        bottom_pixels = cv2.countNonZero(bottom)
        return 90 if bottom_pixels >= top_pixels else 270

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle = detect_angle(image)
    print(angle)
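Once an angle is known, the scan still has to be rotated back before OCR. The sketch below is one possible way to do that with `np.rot90`. The `undo_rotation` helper is hypothetical, and it assumes the reported angle is how far the image was rotated counterclockwise from upright. The answer does not state the rotation direction, so check it against a known sample and flip the sign of `k` if your convention is the opposite.

```python
import numpy as np

def undo_rotation(image, angle):
    """Rotate `image` back to upright.

    ASSUMPTION: `angle` is the counterclockwise rotation (in degrees)
    that was applied to the upright document.
    """
    if angle not in (0, 90, 180, 270):
        raise ValueError("angle must be 0, 90, 180 or 270")
    # np.rot90 rotates counterclockwise for positive k, clockwise for negative k
    return np.rot90(image, k=-(angle // 90))

upright = np.arange(12, dtype=np.uint8).reshape(3, 4)
rotated = np.rot90(upright, k=1)          # simulate a scan turned 90 degrees
restored = undo_rotation(rotated, 90)
print(np.array_equal(restored, upright))
```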


