First look at Microsoft 365 Copilot
Microsoft has made tons of noise about their Microsoft 365 Copilot toolset. They’ve made a lot of promises about its capabilities to enhance individual and team productivity, but access to it has been very limited so far. This week, the product team gave Microsoft 365 MVPs access to a test environment with Copilot in it. This sandbox doesn’t have any data, so we had to bring our own. The product team said we could test the tools however we liked; this is the experience I’ve had so far.
To set up my sandbox, I uploaded some “work”-related documents. One large folder is manuals and documents from my future second career as a commercial pilot; the second is an archive of Practical 365 articles I wrote. I wanted to see how well summarization and Q&A worked. The Copilot family includes different assistant modalities for different applications; for example, Copilot for Outlook can do some things unique to Outlook, as can Copilot for PowerPoint, and so on. I haven’t tested each of the individual modalities yet.
In my early tests so far, the results are… mixed. Some of these issues are probably due to user error and/or misunderstanding of what results I can reasonably expect. Some of them, I fear, are not.
Test 1: Find and summarize something in a OneDrive document
One of the documents I uploaded is the operating handbook for the Century III autopilot in my plane. I asked “Can you summarize how the glideslope coupler on a Century III autopilot operates?” The answer I got was “I’m sorry, but I don’t have any information about the glideslope coupler on a Century III autopilot in my internal knowledge or in your OneDrive.” There are 4 pages in one of the documents in my OneDrive that detail the operation, but it’s possible that Copilot hasn’t indexed things properly yet. I’ll try some other tests after a decent interval has passed.
Test 2: Answer some fact-based questions from a document
Here’s what should be a simple fact-based question: “what’s the single-engine service ceiling of a Baron 55?”
As you can see, the first answer is precise… but it is completely incorrect. I supplied a URL that has a better answer, which Copilot read and summarized. Then I asked the question again, with different and more precise wording. Different models, and model years, of Baron have different engines, which will affect their single-engine service ceilings.
This answer is also completely incorrect, but at least it’s different from the previous incorrect answer. Anyone who plans a flight with a single-engine service ceiling of 15,000′ or more in this aircraft will be in for a very rude surprise if they fly into mountainous terrain. Copilot did correctly explain what the single-engine service ceiling actually is, which is nice.
Here’s another example:
The original answer of “154 knots” is wrong, so I tried being helpful again by pointing Copilot directly to a better source. The specific page I referenced contains a table that lists the correct Vle speed. I realized after taking the screenshot that the table is a little confusing, because it lists two speeds for two different serial number ranges. But neither of those speeds is “154”.
Then I tried a different question, still related to the same document: I asked what the minimum controllable airspeed was for a Baron 55. Copilot’s answer was again not awesome:
According to the Pilot’s Operating Handbook for the Baron 95-B55 and 95-B55A, the minimum controllable airspeed for a Baron 55 is not explicitly stated. [PR: false; it’s shown in the table on 2-3] However, the handbook does provide information on the stall speed, [PR: also false; these speeds are not shown in the POH] which is the minimum speed at which the aircraft can maintain controlled flight. The stall speed for a Baron 55 with flaps up is 73 knots and with flaps down is 64 knots. [PR: this is correct but there is no citation shown]
As I said earlier, I thought it was possible that Copilot needed more time to read documents. I repeated some of these tests a day later and, surprisingly, the quality of the answers got worse. I thought maybe giving the exact serial number of the aircraft might help produce a more precise answer… but instead, Copilot forgot the answer it was able to find the previous day. Repeating the previous query repeated the same results.
In this case, Copilot subtracted value compared to a simple Internet search. Google Search produced the correct answer as its second result (after an ad). Bing Search didn’t show the correct answer on the first page of results and then I got tired of looking… I suspect this performance is a clue to Copilot’s inability to discover the correct answer.
Test 3: Document summarization
I alluded to this in test #1, but I’m prepared to say that the autopilot handbook hadn’t been indexed yet. I gave Copilot an easy one this time, and it did quite well:
Accurate and complete summary, plus a reference… very nice!
I tried again with a different question, to test whether Copilot would prefer my own work documents over general Internet searches. The first attempt produced a correct and useful summary, but my “company” data wasn’t included until I explicitly told it to:
I plan to run some more summarization tests after uploading some more documents (including the book chapters from Office 365 for IT Pros). So far, though, I am not seeing the improvement in results quality I had hoped for given that I am providing “work” data for the model to use. This may be 100% due to delays in ingesting the source material, though.
Test 4: Drafting email with Copilot for Outlook
Copilot for Outlook can operate in two modes: you can ask Copilot to draft emails for you, and you can get coaching as you write. Drafting mode works reasonably well.
I first tried a very simple prompt: “Draft an email to Adam Henry. Thank him for joining us at our customer dinner. Ask him if it would be possible to have a technical meeting with him during the week of 20 December to discuss our turbo encabulator.”
The result was decent, if very bland. It’s essentially the same thing as the prompt I gave it, so I saved zero time by asking Copilot to do the work for me.
Dear Adam, I wanted to thank you for joining us at our customer dinner. We appreciate you taking the time to be with us. I hope you found the event informative and enjoyable. We would like to schedule a technical meeting with you during the week of December 20th to discuss our turbo encabulator. Would it be possible to arrange this at your convenience? Thank you and best regards, Paul.
I wanted to test whether Copilot for Outlook could create an email that summarized a document. Sadly, it can’t. My first test was asking it to summarize one of the Practical 365 columns from my OneDrive, but Copilot responded that it didn’t see the file. I then tried attaching the file to the draft email and asking for a summary.
This is really unfortunate, and I hope it’s something that Microsoft is working on.
In fairness, there are several things that Copilot for Outlook supports that I didn’t test. Because this sandbox environment doesn’t have any real data in it, I didn’t try to test summarizing of email threads. Copilot for Outlook is also supposed to be able to read a meeting summary generated by Intelligent meeting recap and then attach it to an email.
Test 5: Coaching with Copilot for Outlook
I selected “Coaching by Copilot” and started typing. After a few sentences, nothing happened. This is probably user error on my part. I then created a new draft, typed some text, and chose “Coaching by Copilot” again. This is the result I got.
Fair enough; I guess Copilot coaching isn’t meant to talk people out of sending emotionally-laden rants. I then fed it an edited version of an email I sent to a customer the other day, asking for a technical meeting. My source text of “Is there any way you might be able to squeeze me in to your calendar? If not, no worries—we will catch you in early January—but if possible I’d really like to get your technical input sooner rather than later.” generated this result:
My own personal opinion, supported by nothing, is that the original text wasn’t demanding or impatient, but then again this is why I don’t usually use writing assistants; I don’t usually agree with their suggestions on tone and voice. It’s interesting that Copilot didn’t attempt to help with the rudeness of the first email but was willing to suggest improvements in the second one.
A good first step…
LLM-based generative AIs have a lot of potential, as Gartner, AI vendors, and the press never tire of telling us. The Copilot tools in Microsoft 365 and Outlook are quite well integrated; they’re easy to find and easy to interact with. Thus far, though, I have to say that I am not super impressed with the output or utility of these tools. I can see the potential for productivity improvements, and I’ll keep testing as the product evolves and improves.
Update 1…
After I posted this on Twitter, Microsoft’s Abram Jackson pointed out that I needed to enable per-user web search. There are two settings that control this: the first is at the tenant level, and it was on. However, individual users also have to opt in to including web results. I did that, with some excitement, and repeated the tests. Sadly, this change made some of my results worse. Microsoft’s Scott Schnoll also pointed out that you can provide result feedback with the thumbs-up/thumbs-down icons (see above screenshots), so I’ll be smashing those with a vengeance.
Source: Paulrobichaux.com