Quick line segmentation

From the Equation OCR front line…

Problem:

Splitting up handwritten calculations into single lines.

Algorithm description:

First map all the strokes on the y-axis. Then travel along the axis and record the the ink density (number of strokes underneath). Remember local maximums. If the density falls below a certain percentage (tolerance) of a last local maximum – insert a break line.

C# code:

[csharp]
// Positions of line dividers
List<double> LineDivisions = new List<double>();

// Top & Bottom edges of each Stroke
SortedDictionary edgePoints = new SortedDictionary();

// How much of ink density do we tolerate between the lines?
// This is usefull when the lines are overlapping
// 0 means there must be a clear interval between the lines
// 12% means that if the density goes below 12% of local peak then we’re dealing with a line break
// Too high value might result in false line breaks.
double tolerance = 0.12;

double lastDownhillPosition = 0;
double densityLocalPeak = 0;
double density = 0;

foreach (Stroke stroke in InkCanvas.Strokes)
{
Rect rect = stroke.GetBounds();

if (edgePoints.ContainsKey(rect.Top))
edgePoints[rect.Top]++;
else
edgePoints.Add(rect.Top, 1);

if (edgePoints.ContainsKey(rect.Bottom))
edgePoints[rect.Bottom]–;
else
edgePoints.Add(rect.Bottom, -1);
}

// Travel the density map & mark the all the line breaks
foreach (KeyValuePair edgePoint in edgePoints)
{
// This part is in front because we want to peek whether the next point will bring us high or low
// If it goes lower, then we don’t want to insert line break yet – let’s wait…
// If higher (edgePoint.Value > 0), that means we just passed a local minimum and that’s a potential line break.
if ((density < densityLocalPeak * tolerance)
&& (edgePoint.Value > 0) // we just passed local minimum
&& (lastDownhillPosition != 0)) // don’t put the line break above the top line
{
LineDivisions.Add((edgePoint.Key + lastDownhillPosition) / 2);
densityLocalPeak = 0;
}

density += edgePoint.Value;

if (edgePoint.Value < 0)
lastDownhillPosition = edgePoint.Key;

if (density > densityLocalPeak)
densityLocalPeak = density;
}

// mark the end of the document (usefull for further processing)
if (InkCanvas.Strokes.Count > 0)
LineDivisions.Add(InkCanvas.Strokes.GetBounds().Bottom + 4);
[/csharp]

Sample results

My friend’s calculations (posted without asking for any permission :)
Used Tolerance level: 12%.
Note the ink density visualized on the left edge. For better overview a different color was applied for each line.

Sample A - quick segmentation

Sample A - quick line segmentation

Sample B - quick line segmentation

Sample B - quick line segmentation

Comparison with Microsoft.Ink.Divider results

Since the API for equation recognition engine is not available in Win7 I used the regular handwriting processing library.
Results for the same samples (note the differences between various runs!):

Sample A - Microsoft.Ink.Divider run 1

Sample A - Microsoft.Ink.Divider run 1

Sample B - Microsoft.Ink.Divider run 1

Sample B - Microsoft.Ink.Divider run 1

Sample A - Microsoft.Ink.Divider run 2

Sample A - Microsoft.Ink.Divider run 2

Sample B - Microsoft.Ink.Divider run 2

Sample B - Microsoft.Ink.Divider run 2

Sample A - Microsoft.Ink.Divider run 3

Sample A - Microsoft.Ink.Divider run 3

Sample B - Microsoft.Ink.Divider run 3

Sample B - Microsoft.Ink.Divider run 3

The variations in outputs are probably result of indeterministic Neural Network algorithm. The method also does character analysis and most likely uses this context information too. All that funky magic costs quite some execution time – from 100 to 800 ms.

Algorithm described at start takes 1 – 2 ms for the same samples and frankly speaking I like it’s results more. The fact that it has adjustable tolerance level and provides consistent deterministic results makes me feel better too. However it still got some limitations:

  • produces false break lines for small samples (i.e. for 1/2 fraction alone) – this can be mitigated by requiring a certain ink density level inside each line or taking height/width ratio into consideration
  • won’t work for non-horizontal formatted writing – the document need to be normalized (rotated) correctly
  • can’t handle scribble-styled calculations (i.e. many columns with each having different line structures) – would need to introduce vertical segmentation too

All of the above limitations can be easily fixed. The goal of this post was to introduce the idea in a form as simple as possible.

Tags: ,