The Transcriber pattern
Introduction
I would like to share a small and very easy to write class that has helped me in the past a lot. It can help greatly when transcribing one piece of text to another, applying some transformation along the way. Without further ado:
class Transcriber:
def __init__(self, source):
self.source = source
self.destination = []
self.ptr = 0
# Copy
def copy_until(self, until):
self.destination.append(self.source[self.ptr:until])
self.ptr = until
def copy(self, length):
self.copy_until(self.ptr + length)
def copy_until_end(self):
self.copy_until(-1)
# Skip
def skip_until(self, until):
self.ptr = until
def skip(self, length):
self.skip_until(self.ptr + length)
# Add
def add(self, text):
self.destination.append(text)
def get_destination(self):
return "".join(self.destination)
First example
Lets see it in action. Here we replace all lines that contain the word ‘error’ to uppercase:
def capitalize_error(source):
transcriber = Transcriber(source)
for line in source.splitlines():
if 'error' in line:
transcriber.add(line.upper())
transcriber.skip(len(line))
else:
transcriber.copy(len(line))
transcriber.copy(1) # Copy the '\n'
return transcriber.get_destination()
print(capitalize_error("""
line one
line two
line three with error
line four
""".strip()))
# <<< line one
# <<< line two
# <<< LINE THREE WITH ERROR
# <<< line four
Explanation
To describe what this does with text (should probably be a docstring), we could say:
Monitors the gradual transcription of
source
todestination
. During the process, you can copy or skip chunks ofsource
and interject your own chunks.
As is frequent with pieces of code I am fond of, the nice thing about the
Transcriber
is not that it does something very complicated or clever. Rather,
it exposes an interface that abstracts something very tedious, allowing you to
write good-looking and unobscure code.
So, the idea is that you initialize the Transcriber
object with a source
argument. Then you inspect the contents of source
in parallel, figuring out
interesting positions. Finally you use the copy
, skip
and add
methods on
the Transcriber
to either copy or replace parts of the source to your
destination.
In some use cases, the content we are trying to transcribe follows a specific
format (eg XML). It is usually preferable to use an appropriate library to
deserialize the content into a data structure, modify the data structure to
what we want and then use the library to serialize it back the desired format.
I agree, in most cases this is preferable. The Transcriber
can help a lot
when you want to preserve as much of the source content as possible.
More examples
Here we capitalize the contents of a <b>
tag in XML:
def capitalize_b(source):
transcriber = Transcriber(source)
ptr = 0
while True:
try:
# 'hello <b>john</b>, how was your day?'
# ^
ptr = source.index('<b', ptr)
except ValueError:
break
try:
# 'hello <b>john</b>, how was your day?'
# ^
ptr = source.index('>', ptr)
except ValueError:
break
# 'hello <b>john</b>, how was your day?'
# ^
start = ptr + 1
try:
# 'hello <b>john</b>, how was your day?'
# ^
end = source.index('</b>', start)
except ValueError:
break
transcriber.copy_until(start)
transcriber.add(source[start:end].upper())
transcriber.skip_until(end)
# 'hello <b>john</b>, how was your day?'
# ^
ptr = end + 4
transcriber.copy_until_end()
return transcriber.get_destination()
capitalize_b('hello <b class="red">john</b>, how was your <b>day</b>?')
# <<< 'hello <b class="red">JOHN</b>, how was your <b>DAY</b>?'
Lets try another one:
import ast
def get_pos(source, lineno, col_offset):
""" Resolve the actual position based on line and column offset. """
return (sum((len(line) + 1 for line in source.splitlines()[:lineno - 1])) +
col_offset)
class MyVisitor(ast.NodeVisitor):
def __init__(self, source):
super().__init__()
self.source = source
self.transcriber = Transcriber(source)
def visit_Call(self, node):
if isinstance(node.func, ast.Name) and node.func.id == "foo":
start = get_pos(self.source,
node.func.lineno,
node.func.col_offset)
end = get_pos(self.source,
node.func.end_lineno,
node.func.end_col_offset)
self.transcriber.copy_until(start)
self.transcriber.add("BAR")
self.transcriber.skip_until(end)
self.generic_visit(node)
def replace_foo_with_BAR(source):
v = MyVisitor(source)
v.visit(ast.parse(source))
v.transcriber.copy_until_end()
return v.transcriber.get_destination()
source = """
foo("Simple use-case")
foo = "When not used as a function call it should not be transformed"
func("neither when used as an argument", foo)
if some_condition:
foo("Works in any place of the code")
foo("Also when", kwarg=foo("nested"))
"""
print(replace_foo_with_BAR(source))
# <<< BAR("Simple use-case")
# <<< foo = "When not used as a function call it should not be transformed"
# <<< func("neither when used as an argument", foo)
# <<< if some_condition:
# <<< BAR("Works in any place of the code")
# <<< BAR("Also when", kwarg=BAR("nested"))