FOURPROC

Compiling older Proteomics tools with newer GCC versions

At LabKey, I run into this problem over and over again: I help a customer migrate their Proteomics pipeline to new hardware (and a newer Linux distribution), and none of the older Proteomics tools compile properly. Yesterday I received an email from a customer asking for assistance in compiling PepMatch. The customer was trying to compile the software on RHEL 6.2 and was getting the following error:

 

[labkey@server pepmatch]$ make
...
g++ -O2 -DGCC   -c -o InspectResultsParser.o InspectResultsParser.cpp
In file included from PepResultsParser.h:4,
                 from InspectResultsParser.cpp:7:
saxhandler.h: In member function ‘bool SAXHandler::isElement(const char*, const XML_Char*)’:
saxhandler.h:61: error: ‘strcmp’ was not declared in this scope
saxhandler.h: In member function ‘bool SAXHandler::isAttr(const char*, const XML_Char*)’:
saxhandler.h:64: error: ‘strcmp’ was not declared in this scope
InspectResultsParser.cpp: In function ‘char* prepareAppend(char*)’:
InspectResultsParser.cpp:12: error: ‘strlen’ was not declared in this scope
InspectResultsParser.cpp: In member function ‘bool InspectResultsParser::parse()’:
InspectResultsParser.cpp:66: error: ‘strstr’ was not declared in this scope
InspectResultsParser.cpp:118: error: ‘strcasecmp’ was not declared in this scope
make: *** [InspectResultsParser.o] Error 1

 
I vaguely remembered this occurring before, but could not find anything in my notes or email about it. After a bit of googling, I found GCC 4.3 Release Series Porting to the New Tools, which says,

As detailed here (Header dependency streamlining), many of the standard C++ library include files have been edited to only include the smallest possible number of additional files. As such, many C++ programs that used std::memcpy without including <cstring>, or used std::auto_ptr without including <memory> will no longer compile.

 

Usually, this error is of the form:

 

error: 'strcmp' was not declared in this scope

 
Sure enough, I added #include <cstring> to the saxhandler.h file and the compile completed successfully.
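
For reference, the change amounts to one extra include near the top of the header. A minimal sketch, assuming the header already pulls in the expat header for XML_Char (the surrounding lines are illustrative, not the actual saxhandler.h source):

// saxhandler.h
#include <expat.h>
#include <cstring>  // added: declares strcmp/strlen under GCC 4.3 and later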

I am posting this here so I can find it the next time I or a customer runs into a problem like this.

.....

How to install Grive on Ubuntu 10.04 LTS

LabKey uses Google Apps for email, calendar, and document sharing. With the release of Google Drive, I got a request to sync some files from an internal file share to Google Docs/Drive.

The file server is a Linux server (Ubuntu 10.04 LTS) running SAMBA. There is currently no Linux client for Google Drive, but there is an open source client, Grive, which I used for testing.

Given that Ubuntu 10.04 is getting a little long in the tooth, it took a bit of effort to get Grive compiled on the server.

Install the required software

There is not a whole lot of documentation, but the Grive webpage gives an overview of the required libraries. I started from there and installed:

apt-get install build-essential cmake make openssl
apt-get install libstdc++6 libstdc++6-4.4-dev
apt-get install libjson0 libjson0-dev expat libexpat1-dev
apt-get install libssl-dev libssl0.9.8
apt-get install libcurl4-openssl-dev
apt-get install binutils-dev

Install BOOST

The BOOST libraries are required; however, the compile failed when using v1.40.0, which is the version APT installs on Ubuntu 10.04. After some digging, I found that the compilation problems were fixed in later versions, so I downloaded and installed BOOST 1.49.0.

Install the prerequisite software

apt-get install bzip2 libbz2-dev zip unzip 
apt-get install python-dev

Download the latest version of BOOST

cd /usr/local/src 
wget http://softlayer.dl.sourceforge.net/project/boost/boost/1.49.0/boost_1_49_0.tar.gz
tar xzf boost_1_49_0.tar.gz

Build and install the BOOST libraries

cd boost_1_49_0
./bootstrap.sh
./b2
./b2 install

This installs the libraries into /usr/local/lib and the header files into /usr/local/include

Build the grive software

Download and expand the software distribution

cd /usr/local/src
wget https://github.com/match065/grive/tarball/master
mv master match065-grive-b6fb4a6.tar.gz
tar xzf match065-grive-b6fb4a6.tar.gz

Build and install the software

cd match065-grive-b6fb4a6
cmake CMakeLists.txt
make 
make install 

This installs the binary at /usr/local/bin/grive and the libraries into /usr/local/lib

Using the grive software

In order to use the grive software, you will need to set LD_LIBRARY_PATH to include /usr/local/lib, i.e. something like

export LD_LIBRARY_PATH=/usr/local/lib
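
To avoid having to remember the export, I find it handy to drop a small wrapper script into the PATH. A minimal sketch, where the grive-sync name and the /data/googledrive sync directory are just placeholders (the directory itself gets created in the next section):

#!/bin/sh
# grive-sync: run grive from the sync directory with the library path it needs
export LD_LIBRARY_PATH=/usr/local/lib
cd /data/googledrive || exit 1
exec /usr/local/bin/grive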

Initialize

  1. Create a directory to hold the files from your Google Drive
  2. Follow the instructions in the Usage section at the Grive webpage to grant Grive access to your Google Drive.

NOTE: The authorization credentials for accessing your Google Drive account are stored in a file named .grive. I strongly recommend

  • securing this file so it can only be read by the owner of the directory
  • blocking access to this file via SAMBA, if this is going to be a shared drive (see the SAMBA hide files option and the example below)
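
For example, assuming the hypothetical /data/googledrive sync directory from above, the first point is a one-liner:

chmod 600 /data/googledrive/.grive

and for the second, something like this on the relevant share in smb.conf (hide files only marks the file hidden; veto files actually blocks access, so it is the safer choice):

    hide files = /.grive/
    veto files = /.grive/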

Syncing your files and directories

To sync your files from Google Drive down to the local directory, all you need to do is run

/usr/local/bin/grive 

Grive will sync in both directions and seems to work great. It is not a daemon, though; you will need to run the sync manually after files have changed, or you can use a CRON job to perform the sync periodically.
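
For example, a crontab entry along these lines would sync every 15 minutes (using the hypothetical grive-sync wrapper from above):

*/15 * * * * /usr/local/bin/grive-sync >> /var/log/grive-sync.log 2>&1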

.....

Force Tomcat to use IPv4 on server with both IPV6 and IPV4 configured

When using Tomcat 6 on a server or desktop running both IPv4 and IPv6, there are times when the Tomcat connector will bind to either the IPv4 or the IPv6 interface, but not both. This only occurs on kernels where the IPv4 and IPv6 stacks do not share a common listener. Personally, I have had this occur on Windows 7, Windows Server 2008, and RHEL6 servers.

There are some tricks to get around this.

Trick #1:

Add the following attribute to each connector in your server.xml file.

address="0.0.0.0"

This configuration works reliably on Windows servers, but on RHEL 6.x it does not.

Trick #2:

This one I found buried in a blog entry. Add the following to your CATALINA_OPTS variable in your startup script.

-Djava.net.preferIPv4Stack=true
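
One common place to set this for Tomcat 6 is a CATALINA_HOME/bin/setenv.sh file, which catalina.sh sources on startup if it exists (create the file if it is not already there):

# $CATALINA_HOME/bin/setenv.sh
CATALINA_OPTS="$CATALINA_OPTS -Djava.net.preferIPv4Stack=true"
export CATALINA_OPTS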

This worked on RHEL6.

.....

Create a new Issue on your LabKey Server using python

LabKey as a company was started by a bunch of developers and is still run by those same developers. If you ask them how they want to manage tasks, their first response will be "in the bug list." For some of our business-related tasks, instead of fighting this, we decided to embrace it. To embrace the bug list, we needed a way to programmatically create new bugs (Issues) from an existing business system.

LabKey Server currently does not have a client API for creating or managing Issues, so I had to do it the old-fashioned way.

Creating new LabKey Server Issues with Python

The first thing we have to do is authenticate to the server (See this blog post for more information on authenticating to LabKey Server).

import base64
import cookielib
import sys
import urllib
import urllib2

labkey_servername = 'test.labkey.com'
labkey_project = 'sampleproject'
labkey_username = 'emailAddress'
labkey_password = 'password'

# Create the cookie jar 
cj = cookielib.LWPCookieJar()
cookie_handler = urllib2.HTTPCookieProcessor(cj)

# Create the opener 
opener = urllib2.build_opener(cookie_handler)

# Create the Basic authentication header 
authHeader = base64.encodestring("%s:%s" % (labkey_username, labkey_password))[:-1]
authHeader = "Basic %s" % authHeader

# Create the URL for submitting the new Issue
labkey_url = 'https://' + labkey_servername + '/issues/' + labkey_project.strip('/') + "/insert.view?"

The second step is to create the post data. To create a new Issue we need to post the following information

  • Title
    • field name = title
    • What is it: Title of the issue
  • Assigned To:
    • field name = assignedTo
    • What is it: The UID of the user to which the new Issue will be assigned. You can find the UID for a user by looking at the Site Users page.
  • Area:
    • field name = area
    • What is it: The issue area. See the issues list for list of available areas.
  • Type:
    • field name = type
    • What is it: The issue type. See the issues list for list of available types.
  • Priority
    • field name = priority
    • What is it: The issue priority. See the issues list for list of available priorities.
  • Comment
    • field name = comment
    • What is it: Issue description, repro steps, etc
  • Action
    • field name = action
    • value = org.labkey.issue.IssuesController$InsertAction
    • The value is always the same.
  • Issue ID
    • field name = issueId
    • value = 0
    • For new issues, the value is always = 0

An example of creating the post data is below

issue_title = 'Task: Task #1'
issue_description = 'A description of the task goes here.'
post_comment = "A new task has been created. Below you can find the task information:  \n\n" + \
                "  - Task Name = Task #1 \n" + \
                "  - Submitter = Brian \n" + \
                "  - Email = my@email.email\n" + \
                "  - Company/Account = My Fictitious Company \n" + \
                "  - Description: " + issue_description + "\n\n\n" + \
                "Created by get_task_push_to_issues.py \n"
post_data = [
    ('title', issue_title),
    ('assignedTo', 3397356),
    ('type', 'Task'),
    ('area', 'Business'),
    ('priority', 3),
    ('comment', post_comment),
    ('action', 'org.labkey.issue.IssuesController$InsertAction'),
    ('issueId', 0)
]

All that is left to do is submit it to the server.

sys.stdout.write('Create New Issue for Task #1..... ')
try: 
    resp = opener.open(urllib2.Request(labkey_url, None, {"Authorization": authHeader }),urllib.urlencode(post_data))
    html = resp.read()
    #print html
    #print resp.info()
    #print resp.getcode()
    #
    # Check the response to see if there was an error during the submission.
    # An HTTP 200 response code does not always indicate the issue was created
    # successfully, so we need to review the response body and verify.
    success_string = issue_title
    if html.find(success_string) != -1:
        sys.stdout.write("SUCCESS\n")
    else: 
        sys.stdout.write("FAILED \n")
        print "\t Issue creation for " + issue_title + " failed"
        print "\t Submit URL = "+ resp.geturl() + "\n"
        #print html 
        sys.exit(1)
except urllib2.HTTPError as e:
    sys.stdout.write("FAILED \n")  
    print "\tThere was problem while attempting to create the new task to " + e.geturl()
    print "\t - The HTTP response code was " + str(e.getcode())
    print "\t - The full error message and HTTP response is below: \n\n"
    print e.read()
    print e.info()
    print "\n\n"

That is all you need to do to programmatically create new Issues on a LabKey Server.

.....

Adding virtual hosting for new domains to my mail server

A number of years ago, when GMAIL was just starting to get popular, there was a privacy controversy over the fact that Google was scanning the contents of your emails in order to show contextual ads. Just as the tech press was starting to go ballistic over this story, I was deciding where to host email for my personal domains.

There were two options:

  1. Build my own email server
  2. Use a hosted service.
    • At this time, given my list of requirements, GMAIL was the only option.

Using GMAIL was and still is the obvious answer. Google runs a mail service so much better than I can. And all the time I would spend installing patches and figuring out why SPAM is getting through, for example, could be spent doing better things like skiing or spending quality time in a hammock in my backyard.

In my head I understood that GMAIL was the way to go, but I could not shake the feeling that running my own server was the best and only option. Looking back I realize the reasons for this were:

  1. I wanted to own my email messages. I wanted to make sure I could access messages from yesterday, from 1 year ago and from 15 years ago. They are mine and I do not want a company or government take-down order to block my access to them.
  2. (This one is a bit irrational, as email is like a postcard: anyone can read it at any point in the journey.) By hosting my own email server, I could make sure there was one less large corporation reading my email.

As you could guess, I ended up building and running my own email server. It started out at Slicehost and it now resides over at Rackspace Cloud.

With the decision to run a blog at http://www.fourproc.com, I needed to add a few more domains to this email server. In order to do this I had to enable Virtual Hosting support in sendmail. This is surprisingly easy to do.

Add Virtual Hosting support in sendmail

Enable virtusertable feature

Virtusertable is essentially an alias file. It allows you to route incoming email addresses from multiple domains to the appropriate mailbox. (See the virtusertable section at http://www.sendmail.org/m4/features.html for more information)

Edit the sendmail configuration file

vi /etc/mail/sendmail.mc
[added]
    FEATURE(`virtusertable')dnl

Create and configure the virtusertable file. (Below is a sample file)

vi /etc/mail/virtusertable
[added] 
    # Email addresses for fourproc, 4proc
    user1@fourproc.com      brian
    user2@fourproc.com      user2-fourproc
    user3@fourproc.com      user2-fourproc
    user4@fourproc.com      user4-fourproc
    webmaster@fourproc.com  brian
    
    @4proc.com              %1@fourproc.com
    
    # Catch-all addresses for each domain 
    
    @fourproc.com           catchme

Create the virtusertable db file

makemap -r hash virtusertable.db < virtusertable

This table will route incoming messages in the following way

  • Incoming messages to user1@fourproc.com will be delivered to brian’s mailbox
  • Incoming messages to user2@fourproc.com will be delivered to user2-fourproc mailbox
  • Incoming messages to user3@fourproc.com will be delivered to user2-fourproc mailbox
  • Incoming messages to user4@fourproc.com will be delivered to user4-fourproc mailbox
  • Incoming messages to webmaster@fourproc.com will be delivered to brian’s mailbox
  • Incoming messages to any other email address in fourproc.com domain will be delivered to the catchme mailbox
  • Incoming messages to any email address in the 4proc.com domain will be delivered to the corresponding user in the fourproc.com domain.
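
Once the new configuration is live (see the restart steps below), you can sanity-check the routing with sendmail's verify mode, which prints the mailbox an address resolves to without sending any mail:

sendmail -bv user1@fourproc.com
sendmail -bv randomuser@4proc.com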

Enable the genericstable feature

The genericstable and generics-domains features control the rewriting of outgoing email addresses. If the genericstable feature and a GENERICS_DOMAIN_FILE are enabled, sendmail will masquerade both the header and envelope FROM addresses whenever the FROM address's domain is listed in the GENERICS_DOMAIN_FILE file. (See the genericstable section at http://www.sendmail.org/m4/features.html for more information.)

Edit the sendmail configuration file

vi /etc/mail/sendmail.mc
    [added]
    FEATURE(`genericstable')dnl
    GENERICS_DOMAIN_FILE(`/etc/mail/generics-domains')dnl

Create the genericstable file

touch /etc/mail/genericstable 

In my case, no entries in the table were needed. Create the genericstable db file:

makemap -r hash genericstable.db < genericstable

Create and configure the generics-domains file. (Below is a sample file)

vi /etc/mail/generics-domains
[added]
    domain1.com
    fourproc.com
    4proc.com
    domain2.com
    domain3.com

Start sendmail using the new configuration

The first thing to do is generate the new sendmail.cf configuration file:

cd /etc/mail
make -C /etc/mail

Now I can restart the sendmail server

/etc/init.d/sendmail restart 

That was it. It was pretty simple. It took me a few hours total to research how to do it, test the configurations on a spare AWS instance, and roll everything out to the production server. This was probably 8X longer than it would have taken me to roll out these domains using Google Apps. Maybe sometime in the future my reasons for running my own email server will no longer matter to me and I will move over to GMAIL or another service. But for now, I am sticking with running my own.

.....