Adopting Test-Driven Development (TDD) is critical for a tech company. TDD serves as a strong safety net against bugs caused by code changes, and it becomes especially important for large code bases that change frequently. At the time of this writing, the Trustious code base is around 20,000 commits large, with over 80,000 lines of code that change rapidly. The frequent changes are a natural consequence of the Lean methodology. As a company we firmly believe in Lean and are continuously running Build-Measure-Learn cycles. This requires us to make frequent experimental changes to the user experience, which map to frequent changes to the code base (mostly to the front end). With such a rapidly evolving product, writing and maintaining tests that truly keep us safe could easily get out of hand.
Last week at Trustious, Islam gave a talk about games! But wait, it’s NOT your typical card games, board games or even video games. Islam’s talk was about impartial games with perfect information and a win-lose outcome. Self-explanatory, right?
Let’s get to know these kinds of games a little bit more. “Perfect information” means all you need to know about how to win is in front of you: there is no chance, no hidden cards or anything of the sort. In “impartial games” the allowed moves for each player are the same and depend only on the current position. Basically, what differentiates the two players is who goes first. This is usually what determines the winner and the loser. There are no draws, hence the “win-lose outcome”.
This article is a cautionary tale for people using Amazon Web Services (AWS) and a testament to the awesome Amazon customer support.
Thu, Jul 3, 2014 at 8:58 AM: The Shock
The day started bright with a light cloud cover and a cool refreshing breeze. A rare and lovely day in the blistering summers of Cairo. I was hacking away in the quiet morning hours when I noticed that the month’s AWS bill had arrived. I took a look and was rattled to see that it was for over 1800 USD!
This week’s Trustious talk was given by Saher El-Neklawy (engineering lead at Trustious) and was about node.js. Here is a summary of the material covered. Don’t hesitate to ask questions or share your own insights in the comments section. Enjoy!
We at Trustious have an unspoken motto: users are the heart and soul of what we do. The very purpose of the app is to make people’s lives easier and their shopping decisions faster and more successful. But more importantly, and in order to fulfill this purpose, we need them to feel at ease with the app itself; we don’t want the Trustious experience to be a burden, or a “necessary evil” :) Which is why we are keen on meeting as many people as we can – whether current users of the app or people who have never laid eyes on it – to get their feedback on their experience with Trustious. This is where our brilliant User Experience Engineer, Eman El Koshairy, comes in.
Among a bunch of other things she does, Eman meets with current or potential users of Trustious in what we call ‘User Meetings’, in order to get their feedback on the app. How does this meeting go? We’ll let Eman tell you herself.
“In a one-hour session I get a chance to take our users on a tour around the app, ask them to try out different scenarios and observe how they find the experience. Sometimes I ask them questions to get more insights on how they feel dealing with different pages on Trustious”, says Eman.
“Sometimes I also show them some of the in-progress features we’re working on before they are live on the app, and get their feedback on them as well”, Eman adds. “You can’t imagine how much this helps us fine tune our features and designs to be able to give people the most enjoyable experience we can”.
To everyone who has visited Trustious since the day we started, and given us feedback in user meetings or even by email or on Facebook: we want to THANK YOU… Thank you for making Trustious Awesome!
As for the colorful notes on the board, those are little encouraging messages our users write for us before they leave. They mean the world to us :) (by the way, those are just a fraction of the number of users we’ve met since we started, we just started the note collection a little late).
Our work is a never ending process, and we want to meet a thousand and one people before we are able to say “This is Perfect”. So are you willing to come on board and help us make a better Trustious? We would love to meet with you :)
If so, please fill in this form and we’ll contact you for a meeting.
The primary objective of the Trustious Restaurants platform is to help people discover where to eat. In more concrete terms, Trustious needs to answer questions like “What are good sushi places nearby?”, “What are good breakfast spots in Heliopolis?”, “Where can I order delivery late at night?”, “Where can I go to have good healthy food?”. An obvious prerequisite is to build a comprehensive model of a food spot that covers aspects like: What cuisine does it offer? When is it open? Is it good for a family outing, breakfast, large groups? Does it have outdoor seating, a good view? You get the idea. The question now is: how do we obtain this information? This is where it gets interesting.
We basically had three approaches:
- License an existing data set
- Rely on “experts”
- Develop a data-driven solution
Our first impulse was to see if some third party has clean, fresh data for food spots in Egypt and whether we can license it. After an extended period of research we arrived at the blunt conclusion that, beyond basic location information, none of the available data sets completely fits our purpose for two primary reasons: (1) Most of that data is aggregated manually by people calling the food place to ask a set of questions. As you may have guessed, this approach could work for “Where exactly are you located?” kinds of questions but not for “Are you good for breakfast?” kinds of questions. Data acquired in this way does not differentiate between a place that allows people to eat before noon and a choice breakfast spot. (2) The data sets rarely covered the level of detail we need. For instance, none had information about what the place is really “good for” or when patrons typically visit.
The approach we ultimately developed is a hybrid data-driven pipeline with expert supervision. The idea is to analyze user data to find out what a food spot is really good for as seen by people, not as advertised by the food spot. The expert’s role is to guide the data analysis and help spot flaws that we as engineers may not perceive.
In summary, the data-driven pipeline works as follows:
- Develop a model for each “theme”. For instance: People typically go early to a restaurant that is good for breakfast, reviews about the place should comprise a certain set of keywords (e.g. breakfast, omelette).
- For each restaurant, create a document that includes all available data about the place. This includes Trustious user reviews (currently over 10,000), the menu, Foursquare tips and Foursquare check-in times, among others.
- For each theme and restaurant pair, measure the quality of fit between the theme’s model and the restaurant’s document.
In the rest of this post, we will cover how that works in some detail.
We came up with a list of interesting themes that we want to model. The first type of features associated with a given theme is its Relevant keywords.
We applied Keyword Frequency Analysis on all documents for all the restaurants. This resulted in a list of thousands of keywords sorted by frequency. Here’s a sample preview of the keywords list:
chicken - 1971
grill - 1204
sandwich - 966
salad - 906
sushi - 865
cafe - 750
burger - 741
cheese - 729
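As a rough illustration of how such a frequency list can be produced, here is a minimal sketch. The tokenizer, the tiny stop-word list and the sample documents are assumptions made for the example, not the pipeline we actually ran.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a much larger one.
STOP_WORDS = {"the", "a", "and", "is", "was", "with", "of", "for", "in", "to"}


def keyword_frequencies(documents):
    """Count how often each keyword appears across all documents."""
    counts = Counter()
    for text in documents:
        # \w+ is Unicode-aware in Python 3, so Arabic keywords are kept too.
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


# Made-up sample documents, one per restaurant.
docs = [
    "Great grilled chicken and a fresh salad",
    "The chicken sandwich was good",
    "Sushi and salad for lunch",
]

# Print the most frequent keywords in the same "keyword - count" layout.
for word, count in keyword_frequencies(docs).most_common(3):
    print(f"{word} - {count}")
```

Sorting by frequency is what makes the list useful for manual inspection: the keywords worth grouping into themes tend to surface near the top.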
We then clustered the keywords using techniques such as:
- Phonetic Fingerprint: Useful for ignoring common phonetic spelling mistakes that may occur in user reviews or posts.
- Nearest Neighbor Methods: Assigning a class to an item based on its k nearest neighbors (kNN).
- Levenshtein Distance: The number of edit operations needed to change one string into the other.
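To make the edit-distance idea concrete, here is a small sketch that computes the Levenshtein distance and uses it to group near-identical spellings. The greedy one-pass grouping and the `max_distance` value are simplifications chosen for illustration; they stand in for, and are not a reproduction of, the clustering we used.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def group_similar(keywords, max_distance=2):
    """Greedy grouping: a keyword joins the first group whose first member
    is within max_distance edits, otherwise it starts a new group."""
    groups = []
    for word in keywords:
        for group in groups:
            if levenshtein(word, group[0]) <= max_distance:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


print(group_similar(["breakfast", "break-fast", "omlette", "omlete", "sushi"]))
```

A small threshold keeps unrelated short words apart while still catching typical misspellings; the resulting groups still need a human look before they become theme keywords.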
We inspected the clustering results, ending up with a group of keywords for every theme. A sample of the themes extracted is shown here:
romantic = ['romantic', 'nile view', 'late night', 'midnight']
breakfast = ['break-fast', 'breakfast', 'break fast', 'omlette', 'omlete', 'morning', 'فطار', 'افطار']
healthy = ['weight watchers', 'healthy food spot', 'diet']
While exploring the Foursquare venue API, we noticed another valuable theme feature: the opening hours and popular check-in times of a restaurant. This makes the second type of theme features used in our data-driven approach.
Popular check-in times indicate at which times of the day (or on which days of the week) a venue gets frequent check-ins. This is offered through the Foursquare API as an array of time windows for each day of the week (as seen here).
Let us consider the case where a certain venue consistently gets many check-ins early in the day (where early means until noon). This significantly increases the probability that this place is visited often for breakfast. This signal is much stronger than simply checking the menu of restaurants for breakfast items. It allows us to confidently conclude that this place not only offers breakfast, but also has good breakfast. The same reasoning can be applied to many themes, such as late romantic dinners, breakfast spots, outings after work, or weekend hangouts.
For each theme T, we draw a timing curve C covering the time of day or week related to T. C is then added as a theme feature for T. For example, assume T is the late-night romantic dinner theme; then C is the Normal Probability Density Function over the range of time [9:00PM – 1:00AM].
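A timing curve like this can be sketched in a few lines. The choices below are assumptions made for the example: the mean sits at the middle of the window, the window spans two standard deviations on each side of it, and hours are compared on a 24-hour circle so that a window crossing midnight behaves sensibly.

```python
import math


def normal_pdf(x, mean, std):
    """Probability density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def timing_curve(start_hour, end_hour):
    """Build C(hour) for a time window given in hours.

    end_hour may exceed 24 for windows that cross midnight
    (for example 25 stands for 1:00 AM on the next day).
    """
    mean = (start_hour + end_hour) / 2
    std = (end_hour - start_hour) / 4  # assumption: window = mean +/- 2 std

    def curve(hour):
        # Signed distance to the mean on a 24-hour circle, in [-12, 12).
        delta = (hour - mean + 12) % 24 - 12
        return normal_pdf(mean + delta, mean, std)

    return curve


late_dinner = timing_curve(21, 25)  # 9:00 PM to 1:00 AM
for hour in (12, 21, 23, 0.5):
    print(hour, late_dinner(hour))
```

The curve peaks at 11:00 PM and is practically zero at noon, so check-ins outside the window contribute almost nothing to the theme.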
Our Trustious dataset is rich with valuable restaurant-specific information. This includes user reviews and ratings, restaurant menus, and other associated metadata. Besides our own, several apps on the internet provide an API for the results of their data crunching, whether that is the most checked-in restaurants on Foursquare or popular hashtagged photos on Instagram. We aggregated data from several such open API sources.
Challenge: Item Matching
A natural obstacle when merging similar data from multiple independent sources is how to match items. To associate any external information with a Trustious restaurant or item, we try to match the given titles with the names of the items on Trustious. For example, these titles come from the name of the venue on Foursquare or the title of a blog post. In most cases the names are not an exact match, so checking both names for string equality is not enough.
To solve this problem, we used our existing Elasticsearch cluster to find the best matching item on Trustious. We treated the name of a Foursquare venue as a search query given to our search engine. The result is the best matching document in our search index. Elasticsearch (more specifically Lucene) allows this easily through its fuzzy matching and quality scoring for a query. We chose a quality threshold that favors precision over recall, in order to keep the matching pipeline as automated as possible. At the same time, a moderator verifies any remaining candidate matches with a low confidence.
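The matching step could look roughly like the sketch below. It only builds the request body and interprets a response, so it runs without a cluster. The field name `name`, the threshold value and the sample response are hypothetical; the body uses the standard Elasticsearch `match` query with its `fuzziness` option, and the response follows the usual `hits.hits[]._score` layout.

```python
def build_match_query(venue_name, field="name"):
    """Request body for a fuzzy full-text match on the item name.

    The field name is an assumption; use whatever field holds item names.
    """
    return {
        "size": 1,  # we only care about the best match
        "query": {
            "match": {
                field: {"query": venue_name, "fuzziness": "AUTO"},
            }
        },
    }


def pick_match(response, min_score):
    """Return (best_hit, auto_accept).

    auto_accept is True when the score clears the threshold; otherwise the
    candidate should go to a moderator for verification.
    """
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None, False
    best = hits[0]
    return best, best["_score"] >= min_score


# Hand-written stand-in for a search response, for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "42", "_score": 7.3, "_source": {"name": "Example Grill House"}},
        ]
    }
}

query = build_match_query("Exmple Grill")
best, auto_accept = pick_match(sample_response, min_score=5.0)
print(query)
print(best["_source"]["name"], auto_accept)
```

Raising `min_score` trades recall for precision: fewer matches are accepted automatically, but those that are accepted are less likely to be wrong.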
We represent each restaurant as a model composed of the following:
- Trustious reviews
- Trustious restaurant menus
- Foursquare tips
- Popular check-in times of day/week
This information was gathered from the relevant API endpoints.
We aggregate data from these endpoints and for each restaurant generate a “Document”, which is the appropriate scientific term used in Information Retrieval. The next step is to find the best matching theme for each document.
The final step is to match all pairs of documents and themes. For the textual features, the matching was performed via several methods, such as exact, n-gram, and fuzzy matching. As for comparing the themes’ timing signals with the restaurants’ check-in data, we used a modified p-value normality test. This is to judge how busy a restaurant gets during a given time period. If a document D matches a given theme T, then the restaurant R associated with D is classified with T.
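For the textual side of the matching, a simplified sketch follows. It combines an exact substring check with a character trigram similarity (Jaccard overlap). The `threshold` and `min_hits` values, the sample review and the reduced keyword lists are assumptions for the example, and the timing test is not reproduced here.

```python
import re


def char_ngrams(text, n=3):
    """Set of character n-grams of text, padded with one space on each side."""
    text = f" {text} "
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of a and b."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


def keyword_hits(document, keywords, threshold=0.4):
    """Theme keywords found in the document, exactly or approximately."""
    text = document.lower()
    tokens = re.findall(r"\w+", text)
    hits = set()
    for kw in keywords:
        if kw in text:
            hits.add(kw)  # exact match, also covers multi-word keywords
        elif any(ngram_similarity(kw, t) >= threshold for t in tokens):
            hits.add(kw)  # approximate match on a single token
    return hits


def classify(document, themes, min_hits=2):
    """Names of all themes with at least min_hits matching keywords."""
    return [name for name, kws in themes.items()
            if len(keyword_hits(document, kws)) >= min_hits]


# Subset of the theme keyword lists, shortened for the example.
themes = {
    "romantic": ["romantic", "nile view", "late night", "midnight"],
    "breakfast": ["break-fast", "breakfast", "break fast", "omlette", "omlete", "morning"],
    "healthy": ["weight watchers", "healthy food spot", "diet"],
}

review = "Best omelette in town, perfect for a lazy morning"
print(classify(review, themes))
```

Requiring more than one keyword hit per theme is one simple way to avoid classifying a restaurant on the strength of a single stray word.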
Our data-driven approach works well for associating a theme for restaurants. We currently use it for generating lists of exciting top places sharing the same theme. It can be used with different datasets to classify different entities. The definitions above can be modified to build a suitable model for a different domain. If you have any questions about any of the models or techniques mentioned above, please don’t hesitate to let us know in the comments.
When you log into Trustious, you will be greeted by a set of, what we call, Highlights. These are products and places that match your interests and preferences selected just for you! We all want to find the good stuff out there, right? Well, now the good stuff finds you!