As we all know, the world is inundated with data about practically everything we do and it’s an extremely exciting time to be working in a field trying to make sense of all of it. However, as I and others have pointed out, there’s a lot of effort in our discipline put toward what I feel are sort of “bourgeois” applications of data science. On the other hand there are lots of NGOs and non-profits out there doing wonderful things for the world, but who don’t have experts on staff to deal with their data. At the same time, the data / dev communities love hacking together weekend projects, but they usually just culminate in a blog post or some Twitter buzz. Wouldn’t it be rad if we could get these two sides together?
Continue reading Doing Good With Data – Data Without Borders
I had the esteemed privilege of speaking at mongoNYC this week where I gave a talk about how we’re using mongoDB in our workflow at the NYT R&D lab to do wonderful and interesting things. The conference was a great chance to let people know what we’ve been up to at the lab as well as to beam about how great mongoDB’s been to us. Moreover, I learned a ton about mongo and learned that the people at 10gen are absolutely the most helpful, open, and generous group of devs I’ve met in a long time. Did you know they have office hours where you can go and ask questions about your schema / queries / what have you? Who does that? Everyone was ridiculously awesome and I walked away incredibly inspired to become a mongo ninja. You can snag slides of my talk here and there should be video up soon, but in the meantime I tried to emphasize the following 5 reasons mongoDB is great for a hybrid research / dev environment:
Continue reading NYT R&D + mongoDB: My Talk at mongoNYC
First off, an apology to my readers (both of you) for the infrequent updates to this blog. I know you’ve come to rely on my brief, inarticulate articles about boring tech stuff and for depriving you of that, I’m sorry. That said, here’s an update of some exciting goings-on as of the last couple of months that anyone interested in data, the New York Times, or some combination thereof, should get a kick out of…
Continue reading Project Cascade and openpaths.cc
I ran across this great Times article that focuses on a topic I was just discussing with some friends the other day – the death of the phone call. Because I’m constantly connected to e-mail, texting, Facebook, and Twitter through my phone, all media that involve communicating via “turn-based” messages that I can receive instantly but return at my leisure, making a phone call has insidiously and surprisingly entered my consciousness as a “rude” way to communicate. Yet I can’t help but feel like we’re losing something in offloading our conversations to the purely digital.
Continue reading Don’t Call Me, I Won’t Call You
Why am I just learning that this exists as an empty box? I feel like I just found a dead unicorn. I also like that “The First Candy for Beer Lovers” implies that beer lovers refers not to people who merely enjoy beer, but specifically to people who cannot stand the taste of any other food (candy included) unless it also tastes like beer. Because who can choke down the disgusting taste of chocolate, sugar, and butter so prevalent in today’s “regular” candy? Finally, our days of dunking Snickers in Sam Adams are over. Thanks, Beercandy!
Continue reading But Can I Kill My Liver AND My Pancreas?
I’ve just barely started playing with RStudio, but I’ve already decided to make it my main R IDE. It’s taken so many of the tasks that used to frustrate me in the standard distribution and made them super simple. It’s nearly identical to using R but with a lot of new bells and whistles so, if you’re going to use R, you may as well use RStudio. That said, here are some of my pros and cons so far.
Continue reading Why I <3 RStudio
In the final part of my Strata recap I wanted to talk about the vast array of scraping, cleansing, graphing, plotting, visualizing, sharing, selling, searching, filtering, and otherwise gerunded tools that people showcased or used in their talks. As I mentioned in my first post, we are living in an intensely exciting time in which we have unprecedented access, borne by the power of the Internet and a thriving open source community, to datasets and tools for working with data. I was overwhelmed with the number of languages and software libraries people were using to chop up, remix, and process their data and, I have to admit, I felt a bit out of touch sitting there with C++ dangling on my finger like a stuck yo-yo while people did aerial acrobatics with Ruby and put on pyrotechnics shows with Tableau around me. This post is both an attempt to outline the tools I should at least be familiar with, if not using on a daily basis, and a way to save anyone who’s reading the time of rounding up the latest blades in the data scientist’s Swiss Army knife themselves. Full disclosure: I haven’t used many of these, so I can’t attest to their quality any more than to say that they intrigue me. That said, here are the data tools (and some services) from Strata that I think you (read: I) should know.
Continue reading Strata Round Up Part 3: Tools (and some services) You Should Know
My friends over at (media company who wants to remain anonymous) are looking for someone with good C and R knowledge to help them add some features into their software. Stats background is pretty much a must-have as the project will entail navigating and building on code that deals with statistical modeling and parameter estimation. The new features you’d be adding would also involve some stats know-how as well as the coding chops to implement them in C for use in R.
Continue reading Short-term Opening for C/R + Stats Person
There were a load of really great talks at the Strata Big Data conference (and certainly a fair share of pitches), so I wanted to distill a shortlist of keynotes that I found particularly inspiring. You can find the entire body of Strata videos and slides at Strata’s main site and I’d encourage anyone who’s interested to peruse the wonderful collection O’Reilly has put together there. If you just want a quick snapshot though, I’d say look to these.
Continue reading Strata Round Up Part 2: 5 Keynotes You Should Watch
I spent a couple of days at the Strata Big Data conference in lovely Santa Clara, California the other week talking shop about massive datasets and what to do with them. I had a wonderful time rubbing elbows with all the smart and interesting people out there with “data scientist” on their cards, but I was struck by the common theme that none of us really knew exactly what that, or “big data”, for that matter, really meant. With interest in “data science” buzzing, I thought I’d give a quick review of some high-level ideas that I took away from the conference and what they say about the future of data and data science.
Continue reading Strata Round Up Part 1: Overview and Takeaways