Entries tagged with “bing” from O'Reilly Radar
Why Google and Bing's Twitter Announcement is Big News
Tweets will finally become first class web citizens
by James Turner | comments: 11
Lurking innocently on Google's blog this afternoon, like many of their big announcements, was the bombshell that they have reached an agreement with Twitter to make all tweets searchable. This followed an earlier announcement at the Web 2.0 conference by Microsoft that Bing has also arranged to make tweets searchable.
This is not only a huge thing for Twitter, it is also well past due. Until now, Twitter really hasn't been a first class web citizen, because you're not really part of Web 2.0 until you're searchable by Google (and, I suppose, Bing). Sure, you can read someone's tweets from Twitter, or get a thread via a #tag, but the full text searching capabilities that make things really usable on the web, largely powered by Google, have been missing.
Making tweets searchable is a major usability improvement as well. Twitter handles are cute, but sometimes obscure. Perhaps people will start using more full names in their tweets in addition to @ references, which would let you find tweets about people without having to know their handles.
It appears that Twitter is going out of their way not to play favorites in the search space, by cutting deals with both Microsoft and Google. Microsoft seems to be ahead of the game right now, since they have a live site up, whereas the announcement from Marissa Mayer of Google only hints at things to come over the next few months.
The Bing interface is interesting: it seems to be a hybrid of a web search engine and a Twitter search. Typing in a term gets you back both the latest tweets that match the keywords and web pages that more than one tweet links to and that also match the keywords. This is a tacit acknowledgement that a lot of the useful content of Twitter is found in the web pages that are linked from the tweets.
If I had to guess, I'd say that tweets will show up more traditionally on Google, as just another kind of search result that can be narrowed in the same way that you can narrow results to just images or movies. I guess we'll have to wait and see on that.
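To make the hybrid idea concrete, here is a minimal Python sketch of how such a result page could be assembled. It is only an illustration, not Bing's implementation: the tweet dictionaries, the field names `text`, `time` and `urls`, and the function `hybrid_search` are all assumptions, and it simplifies by treating a page as relevant whenever it is linked from more than one matching tweet.

```python
from collections import defaultdict

def hybrid_search(tweets, keywords):
    """Return (matching_tweets, shared_links) for a keyword query.

    `tweets` is a list of dicts with hypothetical keys:
    "text" (str), "time" (sortable timestamp), "urls" (list of str).
    """
    terms = [k.lower() for k in keywords]
    # A tweet matches when every keyword appears in its text.
    matches = [t for t in tweets if all(k in t["text"].lower() for k in terms)]
    # Newest first, mirroring a "latest tweets" list.
    matches.sort(key=lambda t: t["time"], reverse=True)
    # Count how many matching tweets link to each URL.
    counts = defaultdict(int)
    for t in matches:
        for url in set(t["urls"]):
            counts[url] += 1
    # Keep only pages that more than one matching tweet has in common.
    shared = sorted((u for u, n in counts.items() if n > 1),
                    key=lambda u: (-counts[u], u))
    return matches, shared

tweets = [
    {"text": "Bing adds Twitter search", "time": 1,
     "urls": ["http://example.com/a"]},
    {"text": "Twitter search on Bing is live", "time": 2,
     "urls": ["http://example.com/a", "http://example.com/b"]},
    {"text": "Lunch time", "time": 3, "urls": ["http://example.com/b"]},
]
latest, pages = hybrid_search(tweets, ["bing", "twitter"])
```

Here `latest` holds the two matching tweets, newest first, and `pages` holds the one link they share.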
tags: bing, google, microsoft, twitter
Four short links: 31 July 2009
NoSQL, Goldman Sachs, Yahoo! Developer Products and Bing, and Alternate Reality
by Nat Torkington | @gnat | comments: 3
On this day in history, Mt Fuji exploded (781), Daniel Defoe was put in the stocks for seditious libel but was pelted with flowers (1703), the first U.S. patent was issued (1790), and the radio show The Shadow aired for the first time (1930).
- Tokyo Cabinet: Beyond Key-Value Store -- description of Tokyo Cabinet and code examples in Ruby. More on the NoSQL move to leave relational databases behind for certain modern problems (such as scaling).
- The Great American Bubble Machine (Rolling Stone) -- I know it's old hat, but read it for the poetry if for nothing else. The first thing you need to know about Goldman Sachs is that it's everywhere. The world's most powerful investment bank is a great vampire squid wrapped around the face of humanity, relentlessly jamming its blood funnel into anything that smells like money.
- Yahoo!'s Developer Program and Bing -- note from Yahoo! to developers, saying that YQL, YUI, and Pipes are safe. For SearchMonkey and BOSS they currently do not have anything concrete to tell you. I assume (and hope) that Delicious is a top-level product, not something under "search". (via Simon Willison)
- Preparing Us for AR -- (Schulze & Webb) round up of some apps and toys that show what AR might be, unfettered by current day technological constraints.
tags: alternate reality, big data, bing, finance, financial crisis, nosql, yahoo, yahoo pipes
Bing's Sanaz Ahari on System Feedback (2 of 2)
by Brady Forrest | @brady | comments: 2
A couple of weeks ago Bing had a small search summit for analysts, bloggers, SEO experts, entrepreneurs and advertisers. It was held in Bellevue; they put us up in the hotel and fed us. While there we received demos from Bing project teams. I was able to snag an interview with Sanaz Ahari, Lead PM on Bing. She led the team that developed the categories you see on a Bing web search. The interview was based on the slides from her presentation at the event. I have posted the significant images from her slides. The first portion of the interview focuses on how the Bing team handles Query level categorization and some of the problems they face. The second portion focuses on the systems used to generate the categorization.
Disclosure: I was on the MSN Search team (now the Bing team) from 2004 to March 2006. I knew Sanaz at that time.
Brady Forrest: Now on this image, it shows the ranking model and then it shows engagement and measurement.
Sanaz Ahari: Yes.
Brady Forrest: How do engagement and analytics factor into tweaking the ranking, measurement and engagement algorithms?
Sanaz Ahari: So the key thing about engagement is really there's two things: A, how often do people click on the different categories and then B, once they click on it, what do they do after that? So we basically feed that back into figuring out, "Okay. Did we actually put up the right thing? If something lower down is getting clicked on more, does it deserve to be higher? If something is not getting enough engagement, does it need to be bumped down?" And as we really expand the system, I'd have to say for us as a team, this is really the first step towards what we want to do. And, ideally, we want to get to the point where we have enough understanding about every single query that we can really help you refine your tasks and your categories. So the engagement model can also help us in the future as we go in deeper into queries for helping people. We shouldn't just say, "Seattle, I'm going to Seattle restaurants." You should be able to go to Seattle restaurants and go in really deep and say, "I want restaurants in this neighborhood. I want this price range, et cetera." So all of the engagement metrics can actually help us figure out what are the follow-on tasks that users engage in the most as well.
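The feedback loop described in this answer can be sketched in a few lines of Python. This is a hedged illustration rather than Bing's actual system: the function `rerank_categories`, the `(impressions, clicks)` statistics and both thresholds are assumptions, and it covers only the click-through half of the answer, not the "what do they do after that" half.

```python
def rerank_categories(categories, stats, min_impressions=100, min_ctr=0.01):
    """Reorder categories by click-through rate and drop weak ones.

    `categories` is the current display order; `stats` maps a category
    name to a hypothetical (impressions, clicks) pair from the logs.
    """
    scored = []
    for position, name in enumerate(categories):
        impressions, clicks = stats.get(name, (0, 0))
        ctr = clicks / impressions if impressions else 0.0
        # Bump a category off the list only once we have enough data
        # to be confident that users are ignoring it.
        if impressions >= min_impressions and ctr < min_ctr:
            continue
        # Sort by CTR (descending); ties keep their current position.
        scored.append((-ctr, position, name))
    return [name for _, _, name in sorted(scored)]

order = rerank_categories(
    ["Lyrics", "Tickets", "Ringtones", "Albums"],
    {"Lyrics": (1000, 50), "Tickets": (1000, 120), "Ringtones": (1000, 2)},
)
```

In this example "Tickets" moves above "Lyrics" because it is clicked more often, "Ringtones" is dropped for lack of engagement, and "Albums", which has no data yet, stays at the bottom.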
Brady Forrest: And so what is the second flow chart?
Sanaz Ahari: So the second area, so once we felt that we could deliver intent understanding at a level of quality that we felt comfortable with, then we tackled the second area of problems which is equally difficult, which is really around, okay, how do we know that J Lo is a musician in the first place. And this is really around the query understanding aspect of things. And this is an area where we, again, explored multiple different approaches. We could've done a very kind of clustering on the entire corpus of our queries. Or we could've said, "We're going to start a little bit more targeted and only go after the domains that we really want to go after." Like we said, "Let's just go after health and see if we can solve a small problem before trying to take on the entire corpus of the web."
For the Bing release, we focused -- and this was just like a principle that we had of the team was we really wanted to start small and see if we can get the level of quality that we wanted before trying to take on a lot more different challenges. And so, in this case, we definitely -- we went after the types of domains that we knew were strategic for us. So, all of a sudden, our corpus of queries that we were interested in was a lot smaller. And we already have abilities to classify queries into domains and understand, okay, this query is a music query or this query is a health query, et cetera, et cetera. And so the other problems that fall out of that is, okay, when people do do health queries, what are the categories that fall out of that? Like how do we know that people are going to care about diseases and symptoms, et cetera, et cetera. And then the next problem after that is how do we know that we have a comprehensive understanding of all diseases? So we may be able to understand that there are N different diseases, but how do we know that that’s actually a comprehensive list?
And then lastly, there's a problem about -- and this is one of the fascinating search problems -- is users query for the same thing in many, many different ways. So an example that I had was, for example, health is actually a very complicated one where the ALS disease is also known as Lou Gehrig's disease. And it's also known as one other thing which sounds kind of complicated. I don't even know how to say it. But there's lots of different ways that people basically query for the same thing. And so those were the three different problems that we really had to tackle in the query understanding space. So the two areas that we basically looked at was A, if we are able to identify a seed set of queries in a category, how can we actually really expand that out and be able to understand that we have a comprehensive list expander? Like if we start with N items, are we able to expand it out and get a more comprehensive list of items that are very similar to an existing seed set that we started out with. And that's really the query expansion problem.
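One common way to attack the expansion problem described here is to grow a small starting set by looking for other queries that lead users to the same documents. The interview does not say which technique the Bing team used, so the Python sketch below is purely an assumption: `expand_seed_set`, the `(query, clicked_url)` log format and the `min_shared` threshold are all hypothetical.

```python
from collections import defaultdict

def expand_seed_set(seed, click_log, min_shared=2):
    """Grow a starting set of queries using co-clicked URLs.

    `click_log` is a hypothetical list of (query, clicked_url) pairs.
    A query joins the set when it shares at least `min_shared` clicked
    URLs with the queries we started from.
    """
    urls_by_query = defaultdict(set)
    for query, url in click_log:
        urls_by_query[query.lower()].add(url)

    seed = {q.lower() for q in seed}
    # Every URL that users clicked after issuing one of the seed queries.
    seed_urls = set()
    for q in seed:
        seed_urls |= urls_by_query.get(q, set())

    expanded = set(seed)
    for query, urls in urls_by_query.items():
        if len(urls & seed_urls) >= min_shared:
            expanded.add(query)
    return expanded

log = [
    ("ALS", "u1"), ("ALS", "u2"),
    ("Lou Gehrig's disease", "u1"), ("Lou Gehrig's disease", "u2"),
    ("diabetes", "u3"),
    ("jaguar", "u1"),
]
diseases = expand_seed_set({"ALS"}, log)
```

Because "Lou Gehrig's disease" leads to the same two pages as "ALS", it is pulled into the set, which also shows how different phrasings of the same thing can be tied together. A log-based approach like this only works where there is enough click data, which is why the tail needs other signals.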
Brady Forrest: And what type of numbers are you talking about? Is it 100 or 1,000 or 100,000?
Sanaz Ahari: Oh, for the seed sets? It completely varies. It completely varies. There are some categories that are small. There are some that are large. Like if you try to tackle musicians as a whole, that's huge. Whereas if you try to tackle like sports teams or something, that's pretty small. So it varies.
Brady Forrest: And are you pulling category names? Like are you pulling Wikipedia? Like proper nouns in the case of musicians or are you also pulling raw queries from the logs?
Sanaz Ahari: There's definitely both. We use a whole bunch of different features. We do a lot of work from logs. We do a lot of work on document extraction as well. What's very interesting is logs can give you a lot of great information where we have enough information. So it doesn't necessarily help you address the tail with precision. And document extraction can potentially help you with more comprehensiveness. And one of the things I would say is we also realize the good thing about the approach on the whole actually, both on the intent extraction side and on the query understanding side, has been that it was an amazing learning experience for the team to tackle the problems one at a time because we realized there were so many intricacies that there are some things where we can build a generic system and it can help every category. But there were also cases where we would find a lot of intricacies in some categories where we had to do --
Brady Forrest: So what's a query that you're proud of that was like really hard and you feel like -- like an example of a query that really came a long way?
Sanaz Ahari: I actually don't have one at the tip of my -- I do like the experience for Jennifer Lopez because she has a lot of different attributes.
Brady Forrest: What's one that you really want to improve but you didn't want to tweak by hand?
Sanaz Ahari: Actually, the Jaguar one was one (Bing search), the one this morning that we talked about. That was a great query. And in some ways, I actually think we do a lot of positive things with that query. Like in one sense, I would say that we definitely deliver a diversified experience. And we at least capture the different intents. Whereas without the left rail altogether, you get the -- most users don't really go past the third algorithmic result. And that in and of itself doesn't really give users enough diversification to say, "Okay. This is really my intent. And this is what I really want to dig down to." So on one hand, I like what we have done. But in the ideal scenario, I envision us being able to enumerate all of the different intents and all of the different tasks that actually fall under every single intent. So ideally, we should be able to call out animal, team, car, et cetera and then call out the individual tasks that the users want to do beneath every single one of them. There is -- the two areas that I really, really want us to improve is one, around that. I think that disambiguation is a pretty hard problem where we've barely scratched the surface. And then the second area is the depths of our coverage. You know, I really want us to have a much deeper experience where if I type in Indian restaurants in Fremont (Bing search), I should be able to still get a categorized experience where I can still dig in deeper.
Brady Forrest: What percentage of queries get the categorized experience?
Sanaz Ahari: So today, 20 percent of our queries have a categorized experience. And the team is actively working on our next release where we are working on increasing both the quality and the coverage and specifically going more into longer queries.
Brady Forrest: Okay. Well, thank you very much, Sanaz.
Sanaz Ahari: Thank you.
tags: bing, sanaz ahari, web 2.0
Bing's Sanaz Ahari on Query Level Categorization (1 of 2)
by Brady Forrest | @brady | comments: 0
A couple of weeks ago Bing had a small search summit for analysts, bloggers, SEO experts, entrepreneurs and advertisers. It was held in Bellevue; they put us up in the hotel and fed us. While there we received demos from Bing project teams. I was able to snag an interview with Sanaz Ahari, Lead PM on Bing. She led the team that developed the categories you see on a Bing web search. The interview was based on the slides from her presentation at the event. I have posted the significant images from her slides. The first portion of the interview focuses on how the Bing team handles Query level categorization and some of the problems they face. The second portion (up shortly) focuses on the systems used to generate the categorization.
Disclosure: I was on the MSN Search team (now the Bing team) from 2004 to March 2006. I knew Sanaz at that time.
Brady Forrest: Hi, this is Brady Forrest with O'Reilly Radar, and I'm here with Sanaz Ahari, Lead PM on Bing Search. And she's going to lead us through the categorization process that you see on every page. Hey, Sanaz.
Sanaz Ahari: Hey, Brady. So I'm going to walk you through basically kind of just the journey that we went through for coming up with our categorized experience. And so the categorized experience is basically the left rail experience that you see on Bing today. It doesn't show up for every single query today, but when it does show up, it's really about helping the users complete their task essentially. So just to take a step back, when we started on the project, we had done a lot of analysis on queries just in a vacuum. And queries are always a part of users completing a task. And in a lot of the analysis we did, we noticed that a lot of the tasks are common. And it's really just common sense. When you're looking for a car, you're either researching it; you already own it; you want to buy one. When you're looking for a musician, you want to see if they're on tour; you want lyrics, songs, albums, et cetera.
And so our challenge was can we apply some of that essentially structured aspect to queries. And this is really similar to what you see on sites like Amazon, IMDB, et cetera. They do just a really kick ass job of categorizing their content. The challenge is that A, those sites are really about one domain. And then B, those sites are really operating on top of already structured data. And so the challenge that we have with search is that A, we are a general purpose search engine, and then B, the data that we have is not structured. So the goal that we started out with was we wanted to start very simple. And categorization and clustering, et cetera are nothing really new in the search space. There are a lot of people for years that have been working around the space in the research and computer science space.
So what we started out with was one of the key things that we wanted were two principles. One of them was A, can we achieve aspects and categories that were really, really user intuitive. And B, can we achieve this across a query class. One of the things that we really wanted was in order for us to build a habit for our users, we needed to deliver a predictable consistent experience across a query class. So if I went and told my dad, "Hey, Dad, try any car," I really want him to get a categorized experience for any car. So those are the two kind of constraints that we really set for ourselves. We said, "Unless we meet these two criteria, it's not really successful." And so we started out with a lot of prototyping around, "Hey, can we actually extract intent from queries?" So we started from the intent aspect. And I'll walk you through an example just to show you a simplistic view and how it gets very easily complicated.
So in the example that you see here, we started out with musicians. So with musicians as a whole, the categories and the tasks essentially that the users do generically are fairly straightforward, you know, people want lyrics, songs, tabs, tour dates, ring tones, et cetera, and the list goes on.
Brady Forrest: And are musicians judged as a category?
Sanaz Ahari: Yes, so musicians here is, for example, a category. Yes. Now this is fairly -- what I would say, it's a fairly meaty high-level category though, because as you dig in deep, there are a lot of different attributes about musicians. So the three different examples I have here are -- well, two of them are my favorite bands, but not J Lo exactly. And they kind of cover a wide range. So you've got Jennifer Lopez (Bing search) and she's a pop musician, but she's one of those people that does a whole bunch of other things as well. You've got Gotan Project (Bing search), a little bit more tail. And they're a trip hop band. And then third, you've got Rodrigo y Gabriela (Bing search) who are more of rhythmic guitarists. And you can think about all different sorts of attributes. You've got musicians that may not be alive anymore, et cetera. So there's all sorts of different attributes that fall out of even just a single musician's class. And so in this example, ideally, you should nail the right categories that apply to these three different examples.
So in one case, you've got the guitarists. Ideally for this case, you know, tabs are pretty relevant. Lyrics definitely don't make a whole lot of sense. And then you've got J Lo and she is multifaceted, and we should really try and capture most of her facets. She's a fashion designer. She's an actress, and she's a musician, et cetera. So this shows you kind of the types of problems that we have to solve. A is a query might fall under different classes. B is that even if you're under a single class, the intent from that class, it may not be the same. And then there's the problem of head queries and tail queries, ones where we have a lot of data for and ones where we don't have a lot of data for. So from here on, we go on to basically our approach for solving this problem. I should say that this is an area where we had a brilliant set of folks working on it. We collaborated pretty closely with research. We had a brilliant set of engineers working on it. And the model that we converged on is one where we basically do category level inference as well as query level. So in this case, in the category level, we want to figure out -- given a class of queries that are all similar, what are the top things that users are interested in?
In this case, our algorithms basically we used a whole bunch of different features, everything from query clustering, query clicks, session analysis, document extraction, contextual analysis, et cetera. And all of these things were things that we -- the features that we added were based on -- we did a lot of quick iteration to figure out what is good; what is bad and then where do we fall short to figure out what are the extra things that we really need to add in to our algorithm. So measurements was a very key process into our system because we really, really wanted to achieve categories that the users could make a lot of sense out of.
So algorithms don't often give you things that users really understand. So we really, really wanted to deliver things that made sense to the users. And then on the second level, we really wanted to understand everything about just a query standalone as much as possible. And this is to balance the whole, "Okay. What are the top things people care about in a whole category?" If I've got this bag of categories that users care about, now how do I pick the right ones that only apply to this query? And that is why we had an approach at a category level and also at a query level. Lastly, we did a lot of work around determining if we know that a query is in a category, is that actually the primary intent for that query. So, I don't know, like traffic may be a movie, but a lot of users when they type in traffic, they actually are just looking for how bad is the traffic right now. And that's an example of a query, even though belonging to a category, it may be an obscure intent.
Lastly, we have our ranking model. And our ranking model basically takes all of the different inputs at the category level and at the query level in order to do some modeling around what are the top intents that apply to a query. And, of course, we have a very tight feedback loop system from what are the things that users engage with to feed back into the ranking of the categories as well as discovering new ones.
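As a rough picture of how category-level and query-level signals might be combined, here is a small Python sketch. The interview gives no details of the real ranking model, so everything here is an assumption: the function `rank_intents`, the 0-to-1 scores, the linear blend and the `weight` parameter are all invented for illustration.

```python
def rank_intents(category_prior, query_evidence, top_n=3, weight=0.5):
    """Pick the top intents for one query.

    `category_prior` says how popular each intent is across the whole
    category (e.g. all musicians); `query_evidence` says how strongly
    this particular query shows that intent. Both are hypothetical
    scores between 0 and 1.
    """
    scores = {}
    for intent, prior in category_prior.items():
        evidence = query_evidence.get(intent, 0.0)
        if evidence == 0.0:
            # Popular in the category, but it does not apply to this query.
            continue
        scores[intent] = weight * prior + (1 - weight) * evidence
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

musician_prior = {"lyrics": 0.9, "tour dates": 0.7, "tabs": 0.4}
guitar_duo_evidence = {"lyrics": 0.0, "tour dates": 0.5, "tabs": 0.9}
intents = rank_intents(musician_prior, guitar_duo_evidence)
```

For an instrumental guitar duo, "lyrics" is filtered out even though it is the most popular intent for musicians in general, and "tabs" rises to the top.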
Brady Forrest: And how fast do you have to make this calculation for each query?
Sanaz Ahari: I mean it's all pretty fast because we are scaling through millions of queries. So there's a combination of things for performance optimizations, we do some things offline and we do some things online. For things that don't change a lot and it makes sense for us to do it offline, we try to optimize it. But it's definitely a combination of the two. And our goal is with users, performance is just an expectation. So that's something that we can't compromise on. So everything happens in a matter of milliseconds basically for all of our computations.
Brady Forrest: And how much are you able to cache in case suddenly a query starts to trend up?
Sanaz Ahari: Right. For a lot of our head queries, we definitely do a lot of caching, et cetera. And for real time spiky things, we have invested in an entire different system where we're constantly monitoring for spiky trends. So it's basically the two systems are basically kind of optimized both individually so that we always are aware of what are the things that are all of a sudden spiking a lot. And then being smart about the things that have already been -- you know, that are head queries, that people are re-querying for.
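The split between steady head queries and sudden spikes can be illustrated with a simple check against a query's recent volume. This is a minimal Python sketch under stated assumptions, not a description of Bing's monitoring system: the function `spiking_queries`, the per-window counts and both thresholds are hypothetical.

```python
def spiking_queries(history, current, factor=5.0, min_count=50):
    """Return queries whose volume jumped well above their recent average.

    `history` maps a query to its counts in earlier time windows and
    `current` maps a query to its count in the latest window. Both are
    hypothetical log summaries.
    """
    spikes = []
    for query, count in current.items():
        past = history.get(query, [])
        baseline = sum(past) / len(past) if past else 0.0
        # Require a minimum volume so that rare queries do not trigger
        # alerts, and treat very small baselines as 1 to avoid noise.
        if count >= min_count and count > factor * max(baseline, 1.0):
            spikes.append(query)
    return sorted(spikes)

history = {"weather": [1000, 1100, 900], "solar eclipse": [20, 30, 25]}
current = {"weather": 1050, "solar eclipse": 5000, "new thing": 10}
trending = spiking_queries(history, current)
```

A steady head query like "weather" can be served from a cache, while "solar eclipse" would be flagged for the real-time path.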
The second portion of this interview will be posted shortly.
tags: bing, internet, sanaz ahari, search








