Friday, November 28, 2014

How does a covering index on a database work?

Today I was having a conversation with a co-worker who lives in Italy. He said that Christmas in Italy starts on December 8, which is a public holiday there, but he didn't know the English name for the holiday.

I went to Google and typed in "december 8 holiday italy" and got back this result:


I didn't need to click on the first (or any) item in the search results because all the information that I needed was right there: The Feast of the Immaculate Conception.

This is a covering index. When the index alone holds all the information that you seek, the query is said to be "covered" by the index and no further lookup is required.

In my search analogy I only made one request to the Google index and never opened the page that it pointed to.

In a database I would pull back all the information that I needed from the table's index and not have to retrieve the record from the table to complete the information search.

Leaving the Google analogy behind, you might have a table with an index on userId. Finding that ID in the index allows the database to quickly locate the record in the table, from which it can then pull the name, for example.

If you had an index on both userId and userName then only the index would need to be searched if all you are after is userName and the second lookup in the table would not be needed.
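Here is a minimal sketch of that difference in JavaScript. It is only a toy model: the in-memory array and Maps stand in for a table and its indexes, and all of the names (usersTable, nameViaPlainIndex and so on) are made up for illustration. A real database engine stores its indexes very differently, but the number of lookups is the point.

```javascript
// Hypothetical in-memory "table": array position -> full record.
const usersTable = [
  { userId: 7, userName: 'alice', email: 'alice@example.com' },
  { userId: 3, userName: 'bob', email: 'bob@example.com' },
];

// Plain index on userId: maps the key to the row's position in the table.
const indexOnUserId = new Map(usersTable.map((row, pos) => [row.userId, pos]));

// Covering index on (userId, userName): keeps userName next to the key.
const indexOnUserIdAndName = new Map(
  usersTable.map((row) => [row.userId, row.userName])
);

// Two steps: look in the index, then go to the table for the name.
function nameViaPlainIndex(userId) {
  const pos = indexOnUserId.get(userId);
  return pos === undefined ? undefined : usersTable[pos].userName;
}

// One step: the index alone answers the question.
function nameViaCoveringIndex(userId) {
  return indexOnUserIdAndName.get(userId);
}

console.log(nameViaPlainIndex(3));    // bob
console.log(nameViaCoveringIndex(3)); // bob
```

Both functions return the same answer. The second one never touches usersTable, which is the saving that a covering index gives you.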

Thursday, October 23, 2014

Retro Forking with Git and GitHub

Problem: You cloned a repository that you don't have write access to and you've done some work on it. Now you want to commit that work and push it to GitHub but you never forked the repository.

1. Fork the repository on GitHub. For my example I'm using https://github.com/rjrodger/seneca-mvp

2. In the console in the local repository folder do "git remote -v" to find out what the remote address is:

$ git remote -v
origin    git@github.com:rjrodger/seneca-mvp.git (fetch)
origin    git@github.com:rjrodger/seneca-mvp.git (push)

3. Add the original repository as the upstream remote with "git remote add":

git remote add upstream https://github.com/rjrodger/seneca-mvp

4. "git remote -v" should now show:

$ git remote -v
origin    git@github.com:rjrodger/seneca-mvp.git (fetch)
origin    git@github.com:rjrodger/seneca-mvp.git (push)
upstream    https://github.com/rjrodger/seneca-mvp (fetch)
upstream    https://github.com/rjrodger/seneca-mvp (push)


5. Change the origin to your fork:

git remote set-url origin git@github.com:guyellis/seneca-mvp.git

6. Confirm the change:

$ git remote -v
origin    git@github.com:guyellis/seneca-mvp.git (fetch)
origin    git@github.com:guyellis/seneca-mvp.git (push)
upstream    https://github.com/rjrodger/seneca-mvp (fetch)
upstream    https://github.com/rjrodger/seneca-mvp (push)

7. Commit your work as usual and push it to your fork with "git push origin" followed by the name of the branch that you worked on.

Saturday, September 20, 2014

Node.js Workshop with HTTP Status Check

I've been using HTTP Status Check for Node.js Workshops to help coach developers wanting to learn or get better at Node.js and JavaScript development.

HTTP Status Check is a Node.js utility that takes a list of URLs and checks that their statuses and other properties are what you would expect. As web developers we usually have a number of domains, URLs and sites that we need to keep an eye on, and this utility will quickly check the health of all our sites and report it back to us.

The benefits of using this project for a workshop are:
  • Most Node.js development is around web development. As a web developer you need to know and understand the HTTP protocol. This utility is all about that.
  • It's small and easy to understand.
  • It's something that you can (and should) use each day to keep an eye on your web properties. This means that you'll directly benefit from any changes you make to it.
  • Which brings us to changes. It's easy and quick to make changes in your fork (your own copy) of the project if you need it to do more than what it currently does.
  • In addition to learning Node.js/JavaScript you get to learn how to use Git, GitHub, Npm, how to contribute to Open Source and how to build your reputation and resume on GitHub.
To get the most out of a workshop you need your laptop to be in a state that allows you to dive in and start learning Node.js and JavaScript. This means that you need some basic applications installed and accounts set up and configured. Before attending, do the following. If any of the steps are unclear then post a comment below and I'll clarify.

("repo" is short for repository or code-repository. The code that you run is in a repo and you will pull it down to your machine. More on that to come...)
  1. Set up a GitHub account and log into it: https://github.com/join
  2. Fork the HTTP Status Check repo. (Forking is the process of making a copy of the code-base in your GitHub account that you have complete control over to do with as you please.)
    1. You must be signed into GitHub. Go to the repo: https://github.com/guyellis/http-status-check
    2. If you're signed in then you will see some buttons towards the top right of the page: Watch, Star, Fork. Click on the Fork Button.
    3. The repo will now be forked (copied) to your account and you will be redirected to it, something like: https://github.com/<your-account-name>/http-status-check
    4. Congratulations, you've successfully forked your first GitHub project and taken your first step to contributing to open source.
  3. Install Git on your computer. You now need to get that repo (the code-base) from your fork on GitHub to a directory on your computer. This is done through a process called cloning which as the name suggests creates a clone of your fork on your local machine. Before we can do this though we need to install Git:
    1. Use the official Git download page to find and install the right Git Client for your OS: http://git-scm.com/downloads
    2. There is also a link on that page that will take you to popular Git GUI clients.
    3. Once that's done come back here.
    4. Open a command window and type git and hit enter. You should be presented with a list of git commands. This confirms that git has been successfully installed.
  4. Clone the repo.
    1. Create a directory where you want to keep your source code. You don't need to create a directory for the HTTP Status Check project, just one that will hold your projects. For example /source/ or /code/ or /myrepos/ or something like that.
    2. Now open a command window in that directory or open a command window and change to that directory.
    3. Clone the repo by executing this command after you have replaced your-github-account-name with your GitHub account name. (This is where you forked the repo to in the Fork step above.):
      git clone https://github.com/<your-github-account-name>/http-status-check.git
      • Some notes about this:
      • You can find the correct link to use on your GitHub page by looking at the forked repo and on the right hand side you'll see something that says HTTPS clone URL. Right below that is a text box that you can copy the link from.
      • If you want to use SSH then you can click the SSH link below that box to switch the link to the correct SSH link.
    4. If this is successful then you'll see something that looks like this. The numbers that you see will be different:

      Cloning into 'http-status-check'...
      remote: Counting objects: 395, done.
      remote: Total 395 (delta 0), reused 0 (delta 0)
      Receiving objects: 100% (395/395), 56.90 KiB | 0 bytes/s, done.
      Resolving deltas: 100% (224/224), done.
      Checking connectivity... done.
    5. A directory called http-status-check will also have been created for you.
  5. Code Editor. You need a code editor to edit the code that you've just cloned. If you don't already have one installed then try one of these which are available on almost all Operating Systems:
    1. WebStorm: 30 day trial and then $49 to buy and $29/year renewal after that (for non-commercial use).
    2. Sublime Text: Free evaluation and then a $70 single payment for continued use.
  6. Install Node.js and Npm. I deliberately put this step at the end. I want you to be able to immediately run something after you've installed Node.js and see it in action, which is why you got the code set up first.
    1. Go to the Node.js site and install it for your Operating System. This will also install Npm which you'll need later.
    2. Once installed open a command window and change directory to the HTTP Status Check directory and run node.
    3. You should see a > command prompt. This confirms it was installed. Hit Ctrl+C twice to exit.
  7. Run Npm
    1. In the same command window in the http-status-check directory run npm install. (You can also run npm i which is a shorter version of this.)
    2. You should now see all the dependencies being downloaded and added to the project. Here is an example of some of what you'll see in the console window. Some of the numbers might be different:

      debug@2.0.0 node_modules/debug
      └── ms@0.6.2

      chai@1.9.1 node_modules/chai
      ├── assertion-error@1.0.0
      └── deep-eql@0.1.3 (type-detect@0.1.1)

      lodash@2.4.1 node_modules/lodash
  8. Run HTTP Status Check.
    1. Now type node index.js and hit enter and the HTTP Status Check utility will run. You should see something like this:

      _ Google (http://google.com) testing disabled.
      _ HTTP Status Check on Guy's Blog (http://www.guyellisrocks.com/2014/06/http-status-check.html) working as expected.
      _ Missing URL example (http://www.guyellisrocks.com/2014/06/will-this-get-written.html) working as expected.
      _ LinkSilk WWW (http://www.linksilk.com) working as expected.
      _ LinkSilk (http://linksilk.com) working as expected.
      A total of 5 URIs were tested.
      Failure count:  0
      Success count:  4
      Disable count:  1
  9. Done! You've successfully forked, cloned, installed and run your first Node.js application. Now you're ready to start making changes to it to customize how it works. We'll cover code changes in the workshop. In the meantime you can change the list of web sites that it checks so that it checks your own sites and immediately provides you with some value:
    1. Copy the samplesites.js file (in the root of the project) to a file called checksites.js. (checksites.js will take precedence over samplesites.js if it is present.)
    2. Edit the checksites.js file and replace the sample web sites with your own websites and rules. The file is heavily commented (lines starting with // are comments) to guide you to what changes you can and should make.
    3. Now run node index.js again and you'll see your sites being checked.
  10. Workshop and Questions. If you weren't able to get to this point then post comments to this blog post. You should also bring your questions to the workshop. If you're not able to edit your checksites.js file and run it against your sites before the workshop then you'll miss out on making changes to the code and learning about Node.js and JavaScript.

Bonus Tasks

  1. Run the tests.
    1. In a console window in the http-status-check directory run npm test.
    2. All the tests should pass.
  2. Run code coverage.
    1. In a console window in the http-status-check directory run npm run istanbul.
    2. The tests will run (as above) and code coverage will be calculated. The output should end with something that looks a little bit like this:

      Statements   : 100% ( 170/170 )
      Branches     : 88.73% ( 63/71 )
      Functions    : 100% ( 23/23 )
      Lines        : 100% ( 169/169 )
    3. Now open the coverage/lcov-report/index.html file that was created as a result of this and look at how the tests cover the solution.
  3. Learn more about how Git and GitHub work. Install Ungit and use its graphical interface in the browser to visually understand the repo's structure and work with the repo.
  4. New features in HTTP Status Check.
    1. Is there a bug that needs to be fixed or feature that you think should be added to HTTP Status Check? Then add it as an issue: HTTP Status Check Issues. (Even if you intend to work on this you should add it to the issues first and then assign it to yourself.)
    2. Want to work on existing issues in HTTP Status Check? Then find them in the same place: HTTP Status Check Issues

Wednesday, June 18, 2014

Pull Requests instead of Emailing Code



If you modify code from an open source repository hosted on a site such as GitHub or BitBucket, here are the reasons why you should submit a pull request to get your code back into the main repo.



  • To avoid paying the stupid tax.
    • In short, your feature or fix will be available in future versions that you might want to use. If your code doesn't get into the master branch then it makes it difficult or impossible to keep up with future versions.
  • Make open source better.
    • If you've fixed a bug or added a feature then it's highly likely that others will need that.
  • Improve your resume.
    • More and more recruiting managers are looking at what you've done when they're hiring you. Having contributed to an open source project is a great way to show that you've done real work that people are using.
  • Get Credit
    • You've done the work. Now get your name on that repo and take credit for some of the work that you've done.
  • Use the Tools
    • Do you use Git? Have you ever submitted a pull request? These are questions you might be asked at an interview. Try it and practice it so you can demonstrate this skill. Even if I was hiring a junior right out of school I'd expect them to have at least done this.
  • Understand someone else's code-base
    • Making a contribution to someone else's code-base forces you to read their code and understand their style and way of architecting a project. This is an invaluable skill as 80% of the work you do is reading over code. Even if you wrote it you'll have forgotten it to the extent that it becomes foreign in a few months' time.

This list came from a conversation that I was having with the owner of the ABot project on GitHub. ABot is a web crawler (spider) built for speed and flexibility.


He was telling me how other developers will add features to ABot and then email the code to him. If he decides to accept the code and integrate it into the project then it all gets committed by him under his name. He told me that he doesn't want the credit for this work. Emailing your code instead of submitting it through a pull request on GitHub makes it very difficult to give the credit to the person who did the work.

Tuesday, June 17, 2014

HTTP Status Check



I recently created a Node.js utility that will cycle through a list of URLs and check their HTTP statuses:

Npm: http-status-check

GitHub: http-status-check

This utility came out of a need to keep daily tabs on a number of URLs and the statuses that they were returning.

I plan on expanding it with other input and output adapters. Obvious input adapters would be databases. What else can you think of? Obvious output adapters would be email and any other type of messaging system.

Pull requests are welcome.

Update 4/July/2014

Added an excludedHeaders option. This is a list of headers that you want the check to fail on if they are present in the response from the server.

At first blush this may seem like a strange check to make. The common use case is the X-Powered-By header. This header allows the server to advertise the technology that is powering it. As a security concern, when possible, I'll remove this from sites that I publish. I feel that telling malicious attack bots what you're running on will help them exploit any vulnerabilities your stack might have.
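To make the idea concrete, here is a minimal sketch of how a check like this could work. This is not the utility's actual code: findExcludedHeaders is a hypothetical helper and the sample headers are made up for illustration.

```javascript
// Hypothetical helper, not taken from http-status-check itself.
// Returns the excluded header names that are present in the response.
function findExcludedHeaders(responseHeaders, excludedHeaders) {
  // HTTP header names are case-insensitive, so compare in lower case.
  const present = new Set(
    Object.keys(responseHeaders).map((name) => name.toLowerCase())
  );
  return excludedHeaders.filter((name) => present.has(name.toLowerCase()));
}

// Made-up response headers for illustration.
const headers = { 'Content-Type': 'text/html', 'X-Powered-By': 'Express' };

console.log(findExcludedHeaders(headers, ['x-powered-by'])); // [ 'x-powered-by' ]
```

An empty array would mean that none of the unwanted headers were found and the check passes.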

Update 5/July/2014

Added an expectedText option. This is text that we expect to be present in the body of the response from the server. By default it is case insensitive but you can change that by supplying an object instead of a string.
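A body check along these lines might look something like the sketch below. Again this is an illustration and not the utility's real code: bodyHasExpectedText is a hypothetical name, and the boolean caseSensitive parameter is my stand-in for the object form of the option, whose exact shape isn't described here.

```javascript
// Hypothetical helper; the caseSensitive parameter is an assumption
// used for illustration, not the utility's real option format.
function bodyHasExpectedText(body, expectedText, caseSensitive = false) {
  if (caseSensitive) {
    return body.includes(expectedText);
  }
  // Default: ignore case by lower-casing both sides before comparing.
  return body.toLowerCase().includes(expectedText.toLowerCase());
}

const body = '<html><title>HTTP Status Check</title></html>';

console.log(bodyHasExpectedText(body, 'http status check'));       // true
console.log(bodyHasExpectedText(body, 'http status check', true)); // false
```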







Saturday, June 7, 2014

Using WebSockets when your Reverse Proxy doesn't allow it

As developers we don't always get to choose where our software runs. We often face economic or other restrictions based on infrastructure that already exists.

I recently moved a Node.js application from a Linux server to Windows Server 2008 R2. Crazy, right? It's working surprisingly well in the Windows environment. IIS 7.5 already owned port 80 so I had to set up a site on IIS, bind the domain to that site and use it as a reverse proxy to the Node app, which was running on an arbitrary port.

In this case it happened to be IIS; in any other case it might be Nginx, Apache or any other server or reverse proxy that sits between your Node.js application and the web. The problem that I faced is that this version of IIS does not support WebSockets, so it looked like I couldn't use them and would have to let socket.io fall back to long polling for this application.

There is, however, a solution, and it's rather simple.

Your site's facade, let's call it mdomain.com, is running on port 80 on IIS which is configured to run as a reverse proxy passing all traffic through to port (say) 4444 where your Node application is running.

When a client (a browser) connects to your site you provide it with the usual payload of HTML, CSS and JavaScript. In that payload you also provide the port number, or sufficient information, for the WebSocket part of the client to make a direct connection to your Node.js server and bypass the reverse proxy completely.

Using this little trick, our site can remain on the default port going through the reverse proxy while all our WebSocket traffic runs over the application-specific port.
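Here is a minimal sketch of the client-side part, assuming that the server embeds a small settings object in the page. The names socketSettings and buildSocketUrl are hypothetical, and the sketch assumes that the application port (4444 in this example) is reachable from the browser and not blocked by a firewall.

```javascript
// Hypothetical settings object that the server includes in the page payload.
const socketSettings = { port: 4444 };

// Build a WebSocket URL that points straight at the Node.js port.
// pageLocation has the same shape as the browser's window.location.
function buildSocketUrl(pageLocation, settings) {
  // Use the secure scheme when the page itself was served over HTTPS.
  const scheme = pageLocation.protocol === 'https:' ? 'wss:' : 'ws:';
  return `${scheme}//${pageLocation.hostname}:${settings.port}`;
}

const url = buildSocketUrl(
  { protocol: 'http:', hostname: 'mdomain.com' },
  socketSettings
);

console.log(url); // ws://mdomain.com:4444

// In the browser you would pass window.location and then connect:
// const socket = new WebSocket(url);
```

If you're using socket.io, as I was, you would hand the equivalent http or https URL to its client instead of creating the WebSocket yourself.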