Review of TensorFlow Sugar Coatings
NB: This is an opinionated review, which I put together to help me make a decision.
As someone who has worked most extensively with
Lasagne (with a bit of Blocks thrown in), looking at raw
TensorFlow makes me want to reach for some additional helper layers. There are lots of contenders, several of which
are backed by Googlers, each with a different emphasis and style.
To help make the choice, I assembled a [‘Rosetta Stone’ of TensorFlow implementations of CNN MNIST](ai/2016/11/26/layers-on-top-of-tensorflow) using each of the libraries.
If there’s anything wrong / inaccurate / highly-slanted in the following, please let me know in the comments -
I’m just interested in making a decent choice to speed up my own TensorFlow development.
TensorFlow itself is too raw : Something that would help smooth out the defaults, and
figure out the sizing, etc. would be useful. Even better if there were several of the more
sophisticated modules prebuilt (Bi-LSTM, attention, dropout, sane initialisation, batch normalisation, etc).
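For concreteness, here is a minimal sketch of what a single fully-connected layer looks like in raw TensorFlow (1.x-era API) - the shapes, the initialisation and the bias wiring all have to be spelled out by hand :

```python
import tensorflow as tf

# Raw TensorFlow (1.x style): even one fully-connected layer means choosing
# the initialisation, tracking the layer sizes, and wiring up the bias yourself
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')

W = tf.Variable(tf.truncated_normal([784, 256], stddev=0.1), name='W1')
b = tf.Variable(tf.zeros([256]), name='b1')
hidden = tf.nn.relu(tf.matmul(x, W) + b, name='hidden1')
```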
Keras, with ~10,000 stars and ~3,000 forks, is an obvious contender, however :
- Because it has switchable backends, Keras must effectively ‘take over’ the computational graph
- While it is possible to add custom layers, that requires getting familiar with Keras’ API, rather than
using TensorFlow directly - particularly since the backend interfaces to the layers are implemented behind Keras’ own backend abstraction
- However, I feel that getting access to the ‘raw layers’ may be helpful in some circumstances,
which means that Keras is basically ruled out as a contender.
TF-Slim is part of the main
TensorFlow repo, and initially looked promising. However,
it appears that the main focus is on CNN applications -
and the breadth of other modules isn’t really there. For instance,
there is a GitHub Issue
requesting an RNN implementation, copied over from an identical request opened over 6 months earlier, without
any positive response other than ‘we (i.e. Google) will get to it soon’ - which is pretty unhelpful for a nominally Open Source package.
Also, it looks like many function arguments are positional, which doesn’t help readability.
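For flavour, here’s a minimal sketch of the slim style (conv2d / max_pool2d / fully_connected are real tf.contrib.slim helpers, though the exact defaults are from memory) :

```python
import tensorflow as tf

slim = tf.contrib.slim

# TF-Slim style: compact CNN building blocks with sensible defaults
images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
net = slim.conv2d(images, 32, [5, 5], scope='conv1')
net = slim.max_pool2d(net, [2, 2], scope='pool1')
net = slim.flatten(net)
logits = slim.fully_connected(net, 10, activation_fn=None, scope='logits')
```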
tf.contrib.learn goes to great lengths to make itself accessible to people familiar with scikit-learn,
and presents a much higher-level interface for training. But perhaps that makes it less interesting to me,
since I’m more interested in a ‘hands-on’ experience, rather
than using the trained model as part of a bigger data pipeline (however attractive that proposition sounds).
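For a sense of that scikit-learn-ish feel, a training run looks roughly like this (sketched from memory of the tf.contrib.learn estimator API of the time - argument details may differ, and the toy data is just a stand-in) :

```python
import numpy as np
from tensorflow.contrib import learn

# Toy data standing in for MNIST-style features/labels
x_train = np.random.rand(1000, 784).astype(np.float32)
y_train = np.random.randint(0, 10, size=1000)

# Estimator-style API: define, then fit - the graph and training loop stay hidden
feature_columns = learn.infer_real_valued_columns_from_input(x_train)
classifier = learn.DNNClassifier(feature_columns=feature_columns,
                                 hidden_units=[256, 64],
                                 n_classes=10)
classifier.fit(x_train, y_train, steps=200, batch_size=128)
```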
PrettyTensor is apparently produced by Googlers, but not in the official
TensorFlow mainline. One
very attractive feature is its chainable syntax, which makes it feel very functional in design. On the other hand,
its documentation is very disheartening (which is such a pity, since if I went down this road,
I would certainly help in improving the documentation). So, it’s kind of a tipping point question,
and (at the moment) I’m just not close enough to the tipping point to make me think that this is
going to be mainstream enough to want to be an early adopter. Such a nice syntax,
though… Also noteworthy : the last commit was 6 months ago (Update: lots more commits in December 2016,
and PrettyTensor was used as the sugaring for the InfoGAN paper from OpenAI).
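For reference, the chainable style looks something like this (sketched from memory of the PrettyTensor README - treat the exact method names and arguments as approximate) :

```python
import tensorflow as tf
import prettytensor as pt

images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
labels = tf.placeholder(tf.float32, shape=[None, 10])

# Each call returns a new wrapped tensor, so layers chain functionally
result = (pt.wrap(images)
            .flatten()
            .fully_connected(100, activation_fn=tf.nn.relu)
            .softmax_classifier(10, labels=labels))
```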
It seems that these candidates are external to Google, and may be more open to actual outside contributions:
- TensorLayer : ~800 stars, 200 forks
- TFLearn : ~4,300 stars, 600 forks
- sugartensor : ~120 stars, 12 forks
TensorLayer’s named arguments are good. And there are recent commits (and it has documentation in both English and Chinese,
which is interesting). Their own blurb :
A frequent question regarding TensorLayer is that why do we develop this library instead of leveraging existing ones like Keras and Tflearn. TensorLayer differentiates with those with its pursuits for flexibility and performance. A DL user may find it comfortable to bootstrap with Keras and Tflearn. These libraries provide high-level abstractions that completely mask the underlying engine from users. Though good for using, it becomes hard to tune and modify from the bottom, which is necessary when addressing domain-specific problems (i.e., one model does not fit all).
In contrast, TensorLayer advocates a flexible programming paradigm where third-party neural network libraries can be used interchangeably with the native TensorFlow functions. This allows users to enjoy the ease of using powerful pre-built modules without sacrificing the control of the underlying training process. TensorLayer’s noninvasive nature also makes it easy to consolidate with other wrapper libraries such as TF-Slim. However, flexibility does not come with the loss of performance and heterogeneity support. TensorLayer allows seamless distributed and heterogeneous deployment with its first-class support for the TensorFlow roadmap.
TensorLayer is a major ongoing research project in Data Science Institute, Imperial College London. TensorLayer contributors are from Imperial College, Tsinghua University, Carnegie Mellon University, Google, Microsoft, Bloomberg and etc.
Looking at the code :
TensorLayer consists of a number of rather long class-specific files. Similarly to
Lasagne, it operates on its own classes (such as
Layer) that can carry around sets of information
in addition to what
TensorFlow itself uses.
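In practice that looks roughly like this (from memory of the TensorLayer 1.x API - InputLayer / DenseLayer are real classes, but treat the details as a sketch) :

```python
import tensorflow as tf
import tensorlayer as tl

x = tf.placeholder(tf.float32, shape=[None, 784], name='x')

# Layers wrap each other in TensorLayer's own Layer objects (Lasagne-style)...
network = tl.layers.InputLayer(x, name='input')
network = tl.layers.DenseLayer(network, n_units=256, act=tf.nn.relu, name='relu1')
network = tl.layers.DenseLayer(network, n_units=10, name='output')

# ...but the underlying tf.Tensor stays reachable for mixing with raw TensorFlow
logits = network.outputs
```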
TFLearn has a very similar style to
Lasagne, which (to me) rates as a good feature. And the
documentation site is a positive too. Another positive claim :
TFLearn is also intended to stay up-to-date
with the latest deep learning techniques. And the GitHub repo has more traction than TensorLayer’s.
However, it looks like many function arguments are positional, which doesn’t help readability - and that may
be a big deal.
Looking at the code :
TFLearn seems to be split into a decent hierarchy of modules. More interestingly, many functions
are structured as functions on
tf.Tensor objects directly - not interposing its own object representations. As such,
it’s purely a (large) set of helper functions for regular
TensorFlow. This is captured in their statement :
Full transparency over Tensorflow. All functions are built over tensors and can be used independently of TFLearn.
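Concretely, that ‘functions on tensors’ style looks something like the following (a sketch using the tflearn.input_data / tflearn.fully_connected helpers, with defaults from memory) :

```python
import tensorflow as tf
import tflearn

# Each helper takes a tf.Tensor and returns a tf.Tensor - no wrapper objects
net = tflearn.input_data(shape=[None, 784])
net = tflearn.fully_connected(net, 256, activation='relu')

# Because `net` is an ordinary Tensor, raw TensorFlow ops can be mixed in freely
net = tf.nn.dropout(net, keep_prob=0.8)
net = tflearn.fully_connected(net, 10, activation='softmax')
```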
SugarTensor looks like a one-person project. And it makes the ‘interesting’ choice of creating the
sugaring functions directly on the
TensorFlow Tensor objects. The benefit of this is that regular
TensorFlow code can be mixed-and-matched easily. But the syntax also allows for chaining of
methods (which is nice). The examples directory consists of MNIST experiments done in lots of different architectures.
Documentation is basically ‘read the code’ (which is itself pretty terse because the sugaring magic is available
on every tensor object…)
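From its MNIST examples, the flavour is roughly the following (sketched from memory - the sg_-prefixed method names and their arguments should be double-checked against the repo) :

```python
# Importing sugartensor patches sg_* sugaring methods onto every Tensor;
# the project's convention is to import it in place of tensorflow
import sugartensor as tf

x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])

# Chained sugaring methods, called directly on the Tensor object
logits = (x.sg_flatten()
           .sg_dense(dim=400, act='relu', bn=True)
           .sg_dense(dim=10))
```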
Fairly quickly (for my use-case) I can discard
tf.contrib.learn as being too high-level.
Keras is also out : it differs from Lasagne on Theano in the same ways as it does from my ‘ideal’ framework on TensorFlow
(though I have to admit that a lot of good people are involved, so it certainly has momentum).
PrettyTensor seems to have very few commits recently.
Looking at ‘raw’
TensorFlow, a lot of people are using
TF-slim as just an extra toolkit on top,
plus there are other contributions in
tf.contrib.XXX which can sugar raw TensorFlow - so
probably the holes will be filled in soon enough. But apparently that’s not so soon that all the recent papers
are able to be implemented in it (at least in the public tree).
Interestingly, according to a recent speaker at a Google TensorFlow event, the TensorFlow GitHub mainline lags 1.5 days behind Google’s internal repo. Which is both (a) good, that it is so short, and (b) worrying, given it indicates that Google is maintaining a separate code-base (rather than having their modifications in a sibling repo, overlaid on top).
TFLearn and TensorLayer occupy the same kind of level in a hierarchy of helper libraries as
Lasagne does for Theano. However, since they seem to be basically neck-and-neck feature-wise, it’s difficult to
make a decision between them.
Perhaps, to simplify the decision, I’ll pick the wild-card and code a few things using
SugarTensor, since the
chainability is attractive, and the person running development seems to be implementing the same selection of
papers that I’d like to explore (and is ready to accept contributions). Keeping close to base TensorFlow,
without committing to one of the other more imposing layer mechanisms, may be a decent interim step while
getting familiar with TensorFlow itself.
Also found along the way…