Using Textract with node.js on Heroku

One of the projects we’re working on at the moment relies heavily on the use of file attachments. Not a problem, LDC Via does that! But we also needed to be able to render some preview text from the contents of the files, so if a PDF, Word document, Excel sheet, or even an image has some text in it, we wanted to be able to show a brief summary of the contents to the user.

Enter the Textract plugin. This is a Python utility which can dig into the contents of files and extract them. It’s also really simple to use in your local environment: you just need to make sure Python’s installed and npm install does the rest for you.
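To make the preview idea concrete, here is a minimal sketch of how extracted text might be turned into a brief summary. It assumes the npm textract package and its fromFileWithPath(path, callback) call. The helper names makePreview and previewFile and the 200-character default are hypothetical, chosen purely for illustration; this is not the code from our app.

```javascript
// Sketch only: assumes the npm "textract" package is installed and that its
// fromFileWithPath(path, callback) call is used for extraction.

// Collapse runs of whitespace and cut the text at a word boundary.
// makePreview and maxLength are hypothetical names used for illustration.
function makePreview(text, maxLength = 200) {
  const clean = String(text || '').replace(/\s+/g, ' ').trim();
  if (clean.length <= maxLength) {
    return clean;
  }
  const cut = clean.slice(0, maxLength);
  const lastSpace = cut.lastIndexOf(' ');
  // Fall back to a hard cut when there is no space to break on.
  const head = lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
  return head + '...';
}

// Extract the text of a file and resolve with a short preview of it.
function previewFile(filePath, maxLength) {
  return new Promise((resolve, reject) => {
    // Required lazily so makePreview can be used without the package.
    const textract = require('textract');
    textract.fromFileWithPath(filePath, (error, text) => {
      if (error) {
        reject(error);
      } else {
        resolve(makePreview(text, maxLength));
      }
    });
  });
}
```

Keeping the trimming step separate from the extraction step means the summary logic can be tried out on plain strings, without any attachments or extraction tooling to hand.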

But then we get to the real world. A dev machine is one thing, but when your app needs to be deployed to Heroku things get a little more complex. Typically, your app will be of one type: node.js, Python, Ruby or whatever. But we now need to have an app that contains two different code-bases. Luckily it’s a relatively simple procedure to set it up.

First, you’ll want to configure your project. In our case today, we’re dealing with a node.js app, so we already have our package.json file which defines all our dependencies. However, because we also want to add some Python dependencies, we need to add a new file to the project, which is called requirements.txt:

# This file contains all Python dependencies that are required
# by the Textract package in order for it to properly work.

argcomplete==1.8.2
chardet==2.3.0
python-pptx==0.6.5
#pdfminer.six <-- go back to this after the shebang fix is released (see https://github.com/goulu/pdfminer/issues/27)
https://github.com/goulu/pdfminer/zipball/e6ad15af79a26c31f4e384d8427b375c93b03533#egg=pdfminer.six
docx2txt==0.6
beautifulsoup4==4.5.3
xlrd==1.0.0
EbookLib==0.15
SpeechRecognition==3.6.3
https://github.com/mattgwwalker/msg-extractor/zipball/master
six==1.10.0

Next, we need to add another file called Procfile. This is basically an instruction file for Heroku that tells it what to do when it’s starting your application. Normally, when you only have a single code-base you don’t need the file as Heroku can make some assumptions, but with two different strands of code we need to give some guidance. So, in our case the file will simply look like this:

web: npm start

Once you’ve committed these files to your code repository… nothing will happen. We’ve made all the code changes we need to, but now we need to tell Heroku about them. You can do all of this with the Heroku CLI, but for the sake of a pretty picture, go to your application settings page and scroll down to the Buildpacks section:

Buildpacks

Click the Add Buildpack button and, in this case, choose “Python” from the list of options.

Once that’s added, we can simply re-deploy the application in the normal way and Textract will magically work!

My editor is WebStorm

Header image: WebStorm, logo

Continuing our series where we talk about our preferred code editor

A lot has been said, written, and ranted, about programming editors and IDEs over the years. Allow us to add to the noise.

This week Mark has his rant about WebStorm.


Traditionally I have used the big clunky IDEs based on Eclipse, such as IBM RAD and MyEclipse. However, on a recommendation from Ben, I switched to IntelliJ IDEA for my Java and Scala work. After that it was but a short step to IntelliJ’s companion IDE, WebStorm.

My main criterion for an IDE is that it should not cost me time or make me scream in anger: WebStorm manages beautifully here. It is light and easy-going, opening and restoring with no fuss, and whilst I’m aware that I don’t use it to its full capacity, a few of the things I love about it are:

  • It does not fight with your source control. Given the long-term nature of a lot of my clients, their source control systems vary hugely over time. In addition, with so many cooks in the mix, you often have to clean up big mistakes. I find a file explorer the easiest way to do this, so I use TortoiseSVN and TortoiseGit to deal with my source control. Unlike many other IDEs, WebStorm has no issue with this and doesn’t try to take control; it just keeps me informed of the changes without any set-up.
  • Search results update as you change items. This sounds like a simple thing, but when you are searching for an item across multiple files and are changing each one after a brief investigation, having live search results which keep track of those changes is great.
  • “Context joining” is very, very clever: jumping around files following object and function links works very smoothly, particularly for a non-static language like JavaScript. I find that this speeds up my work a great deal, although it makes you a little lazy on your function size as large functions don’t get confusing anything like as quickly.

Screenshot: WebStorm

As my work tends to involve a wide range of issues, not just code, I use a stack of “secondary” programs too:

  • SOAP UI - Still the best program for messing around with XML-based web services.
  • EditPad Pro - even with all the new contenders out there, on the Windows platform this is still the best text editor.
  • SQuirreL SQL Client - Not the best SQL client per se, but far and away the best one for debugging Java data source issues.
  • Keystore Explorer - This has saved my sanity more times than I can remember when it comes to dealing with complex SSL key issues.

Chrome extensions

  • Restlet - How anyone does any REST services work without this is beyond me.
  • Salesforce Advanced Code - picking up code from existing organisations is a right pain; this code searcher helps hugely.
  • Grammarly - Because both my spelling and grammar suck.

So there you have it: more an overview of my entire toolbox than a simple IDE post!

My editor is VS Code

Header image: VS Code, logo

Continuing our series where we talk about our preferred code editor

A lot has been said, written, and ranted, about programming editors and IDEs over the years. Allow us to add to the noise.

It’s Matt’s turn this week to talk about his choice, VS Code.


Built on the same Electron framework as Mr. Poole’s choice, VS Code is the first IDE I’ve actively chosen that is developed by Microsoft. Although it shares the Visual Studio moniker, it’s a totally different beast to the full-blown C# IDE that Windows developers use. This is a lightweight text editor with added bells and whistles.

Screenshot: VS Code running on macOS

I only switched over from Atom about six months ago, but I’ve found that it’s made my dev experience far more flexible. Between the tighter Git integration, an integrated terminal window and many, many plugins, there is everything I need for my daily node and JavaScript development. As with Atom, the choice of plugins is huge. Some of my favourites are…

  • Beautify, which tidies up code for you.
  • Docker, for containerising your application.
  • ESLint, the almost-required JavaScript linting utility.
  • JSON Tools, useful utilities for working with JSON.

I’ve got others, but I feel like I’m going on a little too much about VS Code 😃

But I also thought it would be good to mention the other place I spend a lot of my time, and that’s Chrome. Just a plain install is great for the web developer, but once you add some extensions, it becomes truly useful.

Chrome Extensions

The ones I can’t live without are…

  • 1Password - my password manager of choice
  • Full Page Screen Capture - which takes a screenshot of a full web page even when it’s too tall for the browser
  • Browserstack - for testing different browsers
  • React Developer Tools - React is my chosen front end framework these days and these tools are really useful
  • EditThisCookie - cookies are a necessary evil of web development, this tool lets you manage them
  • Show me the styles! - a relatively new addition to my suite, it lets you select an element on a page and shows you useful information about it

So these are some of my everyday development tools. What are some of yours?

My editor is Atom

Header image: Atom, code editor

The first in an occasional series looking at the tools we use in our day-to-day work

A lot has been said, written, and ranted, about programming editors and IDEs over the years. Allow us to add to the noise.

Mr. Poole is kicking off with a brief post about his code editor of choice, Atom.


If you haven’t yet ventured into the world of this “hackable text editor for the 21st century” (their words), then it might be worth a go. Atom is an extremely usable, responsive, and distraction-free editor for code, and I’ve used it to the exclusion of all others for my text editing, JavaScript / web coding, bash scripting, and even the odd bit of Java (IntelliJ IDEA is still my IDE of choice for more heavyweight Java programming and testing).

Why Atom? Well yes it’s open-source, free of cost, and has a massive developer-friendly organisation behind it. That’s a cracking start, but the fact that Atom is built on top of tried-n-tested web technology (Electron) is the real clincher. If you don’t like something in the editor, all it takes is Ctrl-Shift-i (Windows) or Cmd-Opt-i (Mac) and you will have a familiar inspection pane at your disposal, a means for tweaking whatever it is that gets on your pip.

Screenshot: Atom running on macOS

Atom ships with a number of key packages, not least the very wonderful Autocomplete-plus, and there are hundreds more to choose from. Minimap is a must for navigating larger files, and the various Linter packages are indispensable too. Here are a few others I recommend:

  • Semantic Colo(ur) for sensible syntax colouring
  • atom-ternjs: “JavaScript code intelligence” which sits on top of the core Autocomplete-plus package
  • linter-eslint fronts up ESLint for Atom and Linter.
  • Todo-show shows all TODO, FIXME and related tasks in a project or whole workspace.

We don’t all agree about programming editors at LDC Via Towers: what are your favourites?

Getting started

A reflective post on a warm Friday afternoon

As new projects get underway and existing code is refreshed, refined and refuted at LDC Towers, the idea of “getting started” — together with the associated procrastination such as Googling, reading the news, making coffee and so on — is often uppermost in one’s mind.

Last month Ben wrote a post entitled Time well spent, which covered the movement of the same name (now the Center for Humane Technology). We’d like to continue examining this theme (there’s stuff to do, you see): how do you get going? What tricks do you have for embarking upon new work? The task can often seem insurmountable, especially in this online age.

Fundamentally of course the solution is simple: the best thing to do is just start. Take steps, and before you know it, you’ve got somewhere. This is covered in the blog post Starting with failure is good for creativity, as long as you get started, which is well worth a read, and examines Kevin Ashton’s essay, How to Fly a Horse. We especially like this bit from the essay:

Good writing is bad writing well edited; a good hypothesis is whatever is left after many experiments fail; good cooking is the result of choosing, chopping, skinning, shelling, and reducing; a great movie has as much to do with what ends up on the cutting room floor as what does not.

… it echoes that maxim from the late Steve Jobs:

Real artists ship

Don’t try to make things perfect (that’s our excuse and we’re sticking to it): just make things.