Monday, December 14, 2015

Enable Data Deduplication on Windows 8.1

Recently I encountered the situation where I was needing to recover some data from an old test I had been doing with Storage Spaces on Windows Server 2012. However, my machine was running Windows 8.1. I really didn’t want to go through the process of setting up a new machine somewhere, so I figured I’d see if I could somehow enable Data Deduplication on 8.1.

I had been seeing a lot of files that had a size of, say, 1.82MB but a “size on disk” of 0KB. That can be a giveaway that you’re looking at a disk that’s had data deduplication turned on.

Well, it works, and here’s where to go to find out how to do it:

http://ift.tt/1I3oYYE

Basically, you download the missing cabinet files (provided on that site) and then run two commands (note that the first one is a one-liner, not a multi-liner):

dism /online /add-package /packagepath:Microsoft-Windows-VdsInterop-Package~31bf3856ad364e35~amd64~~6.3.9600.16384.cab /packagepath:Microsoft-Windows-VdsInterop-Package~31bf3856ad364e35~amd64~en-US~6.3.9600.16384.cab /packagepath:Microsoft-Windows-FileServer-Package~31bf3856ad364e35~amd64~~6.3.9600.16384.cab /packagepath:Microsoft-Windows-FileServer-Package~31bf3856ad364e35~amd64~en-US~6.3.9600.16384.cab /packagepath:Microsoft-Windows-Dedup-Package~31bf3856ad364e35~amd64~~6.3.9600.16384.cab /packagepath:Microsoft-Windows-Dedup-Package~31bf3856ad364e35~amd64~en-US~6.3.9600.16384.cab

dism /online /enable-feature /featurename:Dedup-Core /all
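
If you want to sanity-check that the feature actually got enabled afterward, dism can report on it (same feature name as the enable command above):

dism /online /get-featureinfo /featurename:Dedup-Core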



Wednesday, October 28, 2015

Perforce Notes

Right after you install perforce server, it’s in a sort of initialization mode where there is no password for the default “root” user. Anybody can connect and act as that root user by simply connecting as the user “root”. So, you want to lock that down a bit. “p4 protect” is how you do this.

p4 protect may seem like where you would do user permissions. I guess you *could*, but it’s intended more as a sort of global permissions layer, where you can set some hard limits. By default all “users” will have “write” permission on everything in the instance. You can then later specify different permissions in the Perforce administration app via a client.

If you comment out the line that specifies “user” permissions in p4 protect, it will actually remove the line entirely, and only the “root” user will be able to connect at that point. So I’m thinking of p4 protect as a way to lock everyone but “root” out of the instance without messing with a full p4 client, e.g. you’re at the host console and you just need to quickly do something drastic without hosing lots of stuff.

Note, to connect with p4 to a specific port with a specific password, you need to set a couple of environment variables before running “p4 protect”:
export P4PORT=1667
export P4PASSWD=Changeme41
– these assume the user is the OS user you’re running the p4 protect command under.
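
For reference, here’s a minimal sketch of what the protections table might look like once p4 protect drops you into the editor, assuming you want a super user plus plain write access for everyone else (the user name and depot paths here are just placeholders):

Protections:
        super user root * //...
        write user * * //...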



Tuesday, October 27, 2015

Installing Perforce on CentOS 6.7

Install CentOS minimal

Start NIC
ifup eth0

yum update
yum install nano

Stop and disable the firewall (I assume you have some other firewall; on CentOS 6 the service is iptables):
service iptables stop
chkconfig iptables off

disable selinux:
nano /etc/selinux/config
-change to ‘disabled’
-reboot

Set up the Perforce repo as described here: http://ift.tt/1PPCkd6

Install perforce server:
yum install perforce-server.x86_64

Set up a config file for an instance of perforce server that we’ll call “capnjosh1”:
cp /etc/perforce/p4dctl.conf.d/p4d.template /etc/perforce/p4dctl.conf.d/capnjosh1.conf

edit that new conf file:
replace %NAME% with “capnjosh1”
replace %ROOT% with where you want the files to be (/capnjosh1-p4root)
replace %PORT% with 1666

create the directory and make owned by the user perforce (automatically set up by the yum installation):
mkdir /capnjosh1-p4root
chown perforce:perforce /capnjosh1-p4root/

Start perforce server:
service perforce-p4dctl start

Run p4 protect to set up core permissions:
p4 protect
-by default it’ll give the OS account ‘root’ full perforce access
(if you set to a different port besides 1666, you will have to set this environment variable “export P4PORT=1667”, or whatever port number you put)

If it doesn’t start, it’ll say as much… the most likely culprit is your p4root folder, so make sure it’s owned by the ‘perforce’ user.

Here’s what’s cool, if you want to add more perforce instances, just copy-paste that .conf file and change the name, port, and p4root path.
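
Sketching that out for a hypothetical second instance (names and port here are just examples):

cp /etc/perforce/p4dctl.conf.d/capnjosh1.conf /etc/perforce/p4dctl.conf.d/capnjosh2.conf
-edit capnjosh2.conf: change the name to capnjosh2, the port to 1667, and the root to /capnjosh2-p4root
mkdir /capnjosh2-p4root
chown perforce:perforce /capnjosh2-p4root/
service perforce-p4dctl restart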

When you uninstall perforce, it’ll auto-rename your .conf files so they won’t get auto-loaded if you reinstall perforce again.



Thursday, October 1, 2015

Configuring Confluence to Use Jira for Authentication when machines are behind a proxy

Here’s the key:

add the proxy configs to catalina.properties, just at the end, one per line. This is only mentioned in the Confluence-proxy article I found, but it works with Jira as well. Much cleaner.

/opt/atlassian/confluence/conf/catalina.properties. Here’s what I added to the end of that file:

# Proxy Settings
http.proxyHost=10.22.1.2
http.proxyPort=8080
https.proxyHost=10.22.1.2
https.proxyPort=8080
http.nonProxyHosts=localhost\|10.22.18.45

NOTE: nonProxyHosts doesn’t *seem* to work with wildcards, e.g. 10.*.*.*
When I did it this way I could never get Confluence to actually connect to Jira and Jira to connect to Confluence. You have to specify the entire IP address. Otherwise, you’ll get messages about “connection refused” or “the application doesn’t appear to be online”. Not very helpful at all, no sir.



Wednesday, September 30, 2015

Confluence setup error: Spring Application context has not been set

Here’s the error:
HTTP Status 500 – java.lang.IllegalStateException: Spring Application context has not been set

You’re trying to set up Confluence, but after “trying stuff”, you eventually get this error.

Here’s the fix:
Restart the Confluence setup wizard. How? Go to the following directory and delete the file confluence.cfg.xml:
/var/atlassian/application-data/confluence

That makes Confluence run the setup wizard the next time you get there.

Here’s more of what you’ve likely been staring at on the error page. Hopefully this helped:

HTTP Status 500 – java.lang.IllegalStateException: Spring Application context has not been set

type Exception report

message java.lang.IllegalStateException: Spring Application context has not been set

description The server encountered an internal error that prevented it from fulfilling this request.

exception

com.atlassian.util.concurrent.LazyReference$InitializationException: java.lang.IllegalStateException: Spring Application context has not been set
com.atlassian.util.concurrent.LazyReference.getInterruptibly(LazyReference.java:149)
com.atlassian.util.concurrent.LazyReference.get(LazyReference.java:112)
com.atlassian.confluence.setup.webwork.ConfluenceXWorkTransactionInterceptor.getTransactionManager(ConfluenceXWorkTransactionInterceptor.java:34)
com.atlassian.xwork.interceptors.XWorkTransactionInterceptor.intercept(XWorkTransactionInterceptor.java:56)
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:165)
com.atlassian.confluence.xwork.SetupIncompleteInterceptor.intercept(SetupIncompleteInterceptor.java:40)
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:165)
com.atlassian.confluence.security.interceptors.NosniffSecurityHeaderInterceptor.intercept(NosniffSecurityHeaderInterceptor.java:21)
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:165)
com.atlassian.confluence.security.interceptors.XXSSSecurityHeaderInterceptor.intercept(XXSSSecurityHeaderInterceptor.java:21)
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:165)
com.opensymphony.xwork.interceptor.AroundInterceptor.intercept(AroundInterceptor.java:35)
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:165)
com.atlassian.confluence.setup.actions.SetupCheckInterceptor.intercept(SetupCheckInterceptor.java:32)
com.opensymphony.xwork.DefaultActionInvocation.invoke(DefaultActionInvocation.java:165)
com.opensymphony.xwork.DefaultActionProxy.execute(DefaultActionProxy.java:115)
com.atlassian.confluence.servlet.ConfluenceServletDispatcher.serviceAction(ConfluenceServletDispatcher.java:58)
com.opensymphony.webwork.dispatcher.ServletDispatcher.service(ServletDispatcher.java:199)
javax.servlet.http.HttpServlet.service(HttpServlet.java:727)



Thursday, September 24, 2015

Splunk – forwarding to a receiver that forwards to an indexer

Setup:
– Splunk Universal Forwarder on a server
– Pointed at a Splunk Enterprise instance that’s configured for receiving and forwarding (yeah, very easy)
– Receiver/Forwarder is pointed at another Splunk Enterprise instance that does the actual indexing

Note, if you have anything in props.conf on your indexer, you will have to put the same settings on the receiver/forwarder. Otherwise it won’t work, and you’ll get unclean, badly-broken events. As soon as you put the same props.conf file on the receiver/forwarder instance, all is well again.
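
For example, if the indexer’s props.conf has a line-breaking stanza like the one below, copy that exact stanza to the receiver/forwarder’s props.conf too (the sourcetype name is just a placeholder; this mirrors the stanza I use elsewhere in these notes):

[name_of_my_sourcetype]
LINE_BREAKER = (\n)
SHOULD_LINEMERGE = false
TRUNCATE = 0
KV_MODE = JSON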

Figured I’d share. It could be a bit of a gotcha.



Wednesday, September 2, 2015

Merging of Real Life Economies and In-Game Economies

I think I need to clarify (or maybe it’s “revise”) my stance on how in-game economies interface with real life economies. The issue seems to kind of boil down to



I’m linking to a Raph Koster post, as I’m sure all self-respecting gaming bloggers should.  It’s entitled “A brief SF tale”.  It’s a short historical fiction on the nature of human ingenuity, a recurring theme for me over the past months.

Once Upon A Time, there were many sites dedicated to sharing photos, and videos, and for listening to music. But there was a war on, so the military blocked access to those sites because the traffic was huge, and soldiers kept leaking info they weren’t supposed to, and so on.
But soldiers, being trained to be smart and clever about working around limitations, found that for every Photobucket, there was a Flickr, and for every Pandora there was a private podcast, and so on.

I’d recommend reading the post in its entirety (it’s a quick read).  To me, the interestingly-worded theme is that when humans want something, like, actively want it, not just a passive “that would be nice”, generally, they figure out how to get it.  And when that something they want seems to them to be perfectly fine, there is very little that can get in their way.  So, those people in charge who make decisions on what is to be restricted would do well to keep that in mind.

This is not meant to be any sort of critique on the war in Iraq.  It’s not even meant to be a critique on war in general, though it’s included.  Incidentally, I bet most conflicts could be boiled down to a handful of people on one side of the conflict being pissed at the handful of people on the other side who are pissed at them.  When at least one of those two handfuls of people are top political “leaders” of their respective society, that’s when you get a war.  But that’s another topic entirely.

When a person wants something that makes perfect sense and is not blatantly, intrinsically wrong (like stealing a car or mugging someone), then not only can I guarantee you there will be many others wanting that thing, but I can promise you one of those people will figure out a way to get it.  The more obvious “things” today are music, movies, games, information-sharing, etc.  If the roadblock to getting said desired things is from some dictate, then those handing down that dictate need to revisit the entire situation.

In the case of online media distribution I think it’s the case of traditional distribution models being rendered obsolete, and the assumption such an event would be more bad than good, that prevents the wholesale embrace of online distribution.  Personal relationships figure in there to an unknown degree of course.  But the fact remains that people want music, tv shows, and movies online and what they want is not being made available in some way.  I think it’s likely a price issue, but that’s beyond the scope of this post.

In the case of soldiers inadvertently sharing sensitive intelligence, I’d argue the issue is a whole level of abstraction deeper.  Here’s what I mean.  If a person understands the situation they are in is of critical, immediate importance to their own well-being, then they will not share information that could compromise it. If there is a consistent problem of soldiers sharing sensitive information, then I think it’s obvious there is a general lack of conviction of the importance of the immediate situation in which those soldiers find themselves.  I think it’s quite possible this is due to the motivations, goals, and leadership of the war, and I am talking about the top political “leaders” that triggered and encourage the war, not so much the military members.  And I think this points to what I think is a fact: most conflicts, whether wars or arguments between coworkers, are unnecessary.  They usually boil down to one or both sides refusing to swallow their pride, apologize for their part, and make it easy for the other side to do the same.  If caught early on, it’s relatively easy.  If left to fester for a while, it gets pretty difficult.  Didn’t we learn that in Kindergarten along with looking both ways before crossing the street?

This is not quite working… I’m getting distracted by all the aspects that seem directly relevant.



In my earlier post on items having static attributes or arbitrary attributes, the issue of how to organize items came up as something that must concurrently be considered when deciding on static, dev-created items or player-generated items.

In Eve Online, they have opted for static, dev-created items.  Crafters of a common item differentiate themselves only on the back end: profit margins.  All player-crafted 125mm Ionic Rail Guns, for example, are identical.  Now, the blueprints used to craft those rail guns, however, can be different.  But they are only different in how much time and materials are required to create the item.  A “better” crafter simply means he has a cheaper, more efficient way to produce the exact same item than a “lesser” crafter does.  A “better” crafter does not and cannot make an item with better attributes than a “lesser” crafter can.

Contrast Eve Online with Star Wars Galaxies: the “better” crafter is the one who, in addition to sourcing materials cheaper, sources materials with better attributes and produces an item with better attributes than a “lesser” crafter can.



Dynamic Creature Population Update

I’m still working on the dynamic creature population system.  Granted, it’s on the back burner until the next version of Re



A democracy is a place where the majority sets the rules, an important characteristic of a democracy no doubt. But equally important, if those rules infringe on the human rights of the minority, it is no longer a democracy.

It strikes me the above line of reasoning may be a subset of a larger, more general rule.  Right now, the group whose human rights are important is the minority group.  What about cases where the human rights of the majority group are compromised?

Interestingly, the initial reaction focuses on the notion that the majority needs to be protected from themselves by some kind of legislation; and that protection typically takes the form of affirmative government action, the state acting as a kind of babysitter.

The point of the quote is that in order to protect the rights of the minority population there must be limits on what the majority can legislate.  However, take a step back and note that in order to protect the rights of the population in general there must be limits on what can be legislated.  Put another way, there must be things government cannot mess with.  Or,



Reliability and How it Influences Economic Potential

Reliability is the single most important aspect of a successful economy.  This firstly originates from within the worldview of the population and secondly from the actions of the government.

In order for advanced financial products such as derivatives to exist, there must be a reliable trading market whose operational mechanisms are fully predictable.  In order for a trading market to exist there must be reliable banking and transaction systems in operation.  In order for banking and transaction systems to exist there must be a reliable rule of law that provides severe consequences for transaction and banking fraud or theft.  In order for a reliable rule of law to exist there must be a cultural and social desire to maintain a reliable rule of law, and there must be reliable sets of rules by which the enforcers abide.  It’s this last one that largely determines the strength of an economy.

The cultural and social desire to maintain a reliable rule of law.  I think you could look at this in terms of Game Theory, with the culture determining how a member of its population interprets the goal of “maximizing his returns”.  This is a touchy subject, since its logical extent lays the blame for economic failings at the feet of the cultural beliefs of the population(s) affected; and since any critique of one’s culture tends to be a very personal criticism, tempers typically begin to flare.



Climate Change

I’ve heard it said quite a few times that climate change is at this point inevitable.  I think there is some debate still over whether it’s us humans who are doing it or if it’s just the increased solar activity since the 60s, which, incidentally, seems set to peak in 2010 or so. But that’s not what I’m writing about.  Pragmatically, the concerns over climate change center on increased weather fluctuations, rising sea levels, and “irradiation” of land.  Coming in at the same moment is the question of what we should do about it.

As far as governments go, nothing.  Governments should do nothing.  Let everyone figure it out.  Concentrate on maintaining law and order.

The thing is, climate change is going to happen slowly enough that the natural economy will easily be able to adapt to whatever environmental changes do occur.  Predictably, if hurricane activity increases, resulting in coastal areas getting destroyed, the [potential] cost of living or doing business in coastal areas will rise, which will in turn result in people moving inland.  Just give it 3 weather-related episodes.  Unfortunately, the brilliance of the collective intelligence that government is will result in exactly the kind of legislation being proposed in Mississippi and Louisiana that forces down the cost of living or doing business in coastal areas.  Humorously, however, the same people involved in that legislation are probably staying up nights worrying about what should be done about climate change.

Check out the picture on this page. I guess my point is the kind of thing pictured won’t really happen without the express interference of government regulation.  Just let each person figure out what they need and let them do it.

All that laissez faire, libertarianism being said, I have no problems with certain efficiency targets.  However, I bet many inefficiencies are [still] present merely because of various other government-instituted regulation of dubious use.



Means of Expression and Transportation

A response to the following quote by Leon Trotsky here.

The real tasks of the workers’ state do not consist in policing public opinion, but in freeing it from the yoke of capital. This can only be done by placing the means of production – which includes the production of information – in the hands of society in its entirety. Once this essential step towards socialism has been taken, all currents of opinion which have not taken arms against the dictatorship of the proletariat must be able to express themselves freely. It is the duty of the workers’ state to put in their hands, to all according to their numeric importance, the technical means necessary for this, printing presses, paper, means of transportation.

What about when everyone has the means to express themselves freely as well as the means of transportation?  What if, presented with these means, a person still does not take up the cause against the bourgeois?  What if they simply write about celebrities’ misadventures,



Strict MMO Rules

Joe Ludwig asked a few questions I thought to be quite relevant in his Breaking rules post.  The first time I read it I kind of passed right through it, vaguely aware there was something good in that topic, but since at the time I was in the midst of some other train of thought I didn’t ponder it much.  Then, just now I went back and read it again.  Here’s the statement that got me thinking:

Of those, gray shards seem to be the most promising. If you could come up with a way to allow anybody to host a world instance and set the rules in that world, but still keep collecting revenue from those players, you could probably draw a lot of attention.

It struck me that allowing the playerbase to host their own shards is an excellent method of offloading game content and maintenance requirements.  World instances are one thing, but let’s take it a step further.  Why not let player groups create an entire map or set of maps, complete with quests, land forms, and a back story, all of which connect right up to the “real” world?  Well, then they will just make it really easy to obtain major loot in their world that you can sell back in the real one, and the entire game will turn into “who can make the map that’s easiest to make the most money”.  I guess the development company could just cut off rogue maps or something.  You would also have a possible discontinuity in world aesthetics.  Also, you would have to figure out how they would make money: number of players who visit that map, player-hours spent on the map, players that signed up for the game from that map’s web site?

Well, that kind of fizzled.  The “offload content development” ideal, taken even to the point of world design but stopping a step back from “There” and “Second Life”, was exciting for a moment there.  Certain propensities for imbalance, shall we say, would be a bit too much.

The content development capacity of the player base needs to be tapped.  In my experience “Second Life” and “There” are a frustration fest of waiting for all that unique content to download every time you travel further than 50 feet.  I don’t think an MMO needs to go as far as they do in order to have a sufficient capacity for players to feel they can create their own content.



If not “virtual world” then what’s the right term?

I’ve been an extra on the upcoming Bourne Ultimatum movie, so the long hours and early mornings discourage certain blogging activities.  But that’s beside the point.

This issue has been simmering for a few days now, and I feel I’ve made only some progress on wrapping my mind around how to communicate what I’m thinking when I say “virtual world” and how that relates to the “what should we be making” discussion.  Here’s where I’m at:

I think MMORPGs need to be designed such that they encourage significant amounts of emergent gameplay.  And, when presented with the question, “How will players compete for resources?”, ideally the designers’ answer should be, “I’m not sure.  But I am sure that whatever does happen, there will naturally emerge a reasonable equilibrium, and at that point everyone will be happy.  Everyone will have their sense of justice satisfied.”  If system designs incorporate interrelated feedback loop balancing mechanisms, then the developers don’t have to spend as much time balancing and tuning.  The players and environment will take care of that on their own.

As designers we don’t explicitly design the way player cities are used.  We just design how this mechanism called a “city” works, and we design it so that it can be used in all kinds of ways.



Marketing in MMOGs

This is a really interesting topic to me.  It seems like a very tough design challenge.  How do you get advertising dollars in an MMOG, and more specifically into a fantasy universe, a game universe that does not naturally lend itself to traditional advertising.  You can’t just throw in a billboard advertising Skechers shoes on the side of the road just outside Coronet.  Most of the direct marketing tactics would be immersion-breakers.  So how can it be done?

I think the only way to bring advertising dollars into MMOGs is to incorporate it into system designs.  And it’s going to have to be clever. In many cases I think advertising can only allude to certain products or services.  Does that break the entire possibility then?  Who would pay to have you allude to their product or brand?

The more obvious possibilities are ones that make certain objects in the game have similar shapes or color patterns as some real world product.  If all the best drink buffs are in containers with similar coloring as Coke cans, then it’s quite possible Coke will come to mind first when a player gets a hankerin’ for a drink.

An interesting approach would be if certain master game design features were allowed to be funded by some sponsor.  Take all the “yeah, right.. like we will actually have the funding to make this” aspects of each system design and allow them to be funded by someone in exchange for advertising on the login screen that such-and-such a feature is being  developed due to funding from Actimel Yogurt Drinks, or Hyundai.  Boy, wouldn’t that be a lot of pressure to make whatever new feature is being added something good and not bad?  And you’d have to figure out how to quantify exactly how much the sponsor gets “pushed” to players.  And then that sponsor is associated with the game forever. Maybe

The goal is to figure out how the developer can allow the hordes of interested investors to help.  How can the developer leverage what a potential sponsor does really well to further the interests of the game?  In furthering the interests of the game, the sponsor gets more exposure, and since exposure is what the sponsor is wanting in advertising anyways…  Obviously, it’s exposure that’s converted into sales and profits eventually, above and beyond the initial expenditure.



Virtual Ecology Implementation

I want to take another stab at how to do virtual ecologies, specifically how to implement them.

Add them to an existing world. I know I work best with an existing system, and I think this may be the best route to take in figuring out how virtual ecologies can be created (because from the literature I’ve read, nobody quite knows how).

I’ll take Eve Online as an example with which to work, because that’s what I’m playing at the moment. The economy is player-generated for the most part and it’s wonderfully dynamic. The PvP is pretty darned good, though there are some changes that could make it great- though they might be too broad-sweeping to be practical. The missions, from what I’ve experienced, strike me as very very static and scripted, with no *real* ramifications to success or failure. By “real” I mean you get the exact same mission multiple times from the same mission-giving agent in the course of a 4 hour session. There is an element in there that is ripe for introducing a virtual ecology: the rogue drones.

“Rogue drones” are the spawn of former human-made machines that spontaneously became conscious. They have run out of control and on occasion drive off human mining outposts and infest the left-behind structures, salvaging parts to make more rogue drones. Currently, you are just given the occasional mission to “go destroy the rogue drone threat”. Each time it’s the same. Same script, same asteroids, same complexes, same graphics, same rogue drone placement (different solar system sometimes). Once you finish it you get the same commendation and reward. Blah.

Design a “Rogue Drone Level Progression Ladder”. Each level up defines stronger combat potential, better mining capabilities, and the qualifications to reach the next level. The drone level is determined by the amount of ores they mine from asteroid belts and the number of rogue drones killed by players compared to a level-defined spawn rate. Since various systems already have spawn spots set up for “pirate” NPCs in asteroid belts, just make it so those spawn spots can be “taken over” by rogue drones as determined by surrounding populations of rogue drones. Then, keep track of how many drones are killed in that spot per day, and then keep track of how much ore they mine as well. The object/entity whose performance we track here is the “Rogue Drone Spawn Spot”; not the drones themselves. These spawn spots look to reproduce into a neighboring asteroid belt or system if they have mined a sufficient amount of ore over the last few days and if they have “reproduced” more drones than they have lost by some multiple.

Then, you would have to create a script that reviews the populations, levels, and performance of rogue drone activity surrounding various space stations scattered throughout the high security solar systems. Based on that analysis (that happens at server reset each day) these scripts feed inputs into “Rogue Drone Management” agents whose missions are then modified according to how “dangerous” the nearby rogue drone activity has been. If the drones have achieved pretty high levels then the missions are populated with high-level drone equipment. So, in addition to what the players actually experience in asteroid belts as far as rogue drone infestation, interference, and competition for ore, there will be a dynamic, responsive mission-customizing NPC element that gives out missions with rewards and difficulty adjusted to the actual threat level of the drones it is “monitoring”.

The beauty of it is that non-combat players and player corporations will have an incentive to pay fighters to keep real rogue drone populations at bay. Rogue drones will thrive in solar systems with little to no player activity and tend to spread out from there.



What does fluoride do for you?

Fluoride is said to do this:

1. Changing the structure of developing enamel, making it more resistant to acid attack – these structural changes occur if a child consumes fluoride during the period when enamel develops (mainly up to seven years of age)

2. Encouraging better quality enamel to form that’s more resistant to acid attack

3. Reducing plaque bacteria’s ability to produce acid, which is the cause of tooth decay

From here:
http://ift.tt/1NZxjwQ



Fluoride – Is it toxic?

This study looked at general organ and reproductive toxicity. Generally speaking, it didn’t find any issues with fluoridated water, even up to fairly high concentrations. It seems

http://ift.tt/1NZxjwO



GzipDecompressor: block size is too big when querying Hive table backed by gzip files

GzipDecompressor: block size is too big: 420818391

It seems the block is around 400MB. I guess that’s somehow too big.



Tuesday, September 1, 2015

Installing Atlassian Jira on an AWS EC2 instance

instance size: t2.micro
set storage to 30GB (that’s the max for the free tier)

update yum:
yum update

install wget:
yum install wget

install nano (because life is better with nano):
yum install nano

Change directories to tmp:
cd /tmp

Download the 64-bit Jira Linux installer from here: http://ift.tt/1ami0rs

Make that thing executable:
chmod u+x atlassian-jira-*.bin

Run the installer:
./atlassian-jira-*.bin

Hit ‘enter’ to select all the default options. This installs it to run on an HSQL database, which is perfectly fine for 10 users (or even more really). This installs Jira as a service too, so you can easily set it to start on boot.

Set it to start on boot:
chkconfig jira on

Turn off iptables (assumes you are using some other firewall, like AWS security groups):
service iptables stop
chkconfig iptables off

Browse to the instance on port 8080 to run the initialization stuff

wow… talk about a slick and utterly painless setup. Holy…. cow…. Just keep your team under 11 people 😉

notes:
you’ll get this error if you’re trying to install the 32-bit version of Jira on a 64-bit OS: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory
(from http://ift.tt/1raWaiF)

Not sure right now if you’ll need to disable selinux or set it to permissive, but if you do:
nano /etc/selinux/config (set SELINUX to “disabled” or “permissive”, then reboot)



Installing Atlassian Confluence on an AWS EC2 Instance

Turn off iptables (assumes you are using security groups as a firewall):
service iptables stop
chkconfig iptables off

You will need to add swap space (http://ift.tt/1KFFpK0):
dd if=/dev/zero of=/swapfile bs=1024 count=3096000
mkswap /swapfile
swapon /swapfile
add this in a new line to /etc/fstab:
/swapfile swap swap defaults 0 0
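
To confirm the swap is actually active (quick sanity check):
swapon -s
free -m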

Download the Confluence installer from here:
wget http://ift.tt/1iUQ6Ip

Make that thing executable:
chmod u+x atlassian-confluence-*.bin

Run it:
./atlassian-confluence-*.bin

Press ‘enter’ to do all the default install options
Notes:
The swap space is necessary in order for Confluence to even start
You may need to turn off selinux (set /etc/selinux/config to have “disabled” instead of “enforcing”)
The step where it auto-pulls a trial key takes crazy-long
I’m not sure how truly performant this will be on a t2.micro, so you may need to create an image and launch it on a t2.small or t2.medium for more CPU horsepower



Monday, August 31, 2015

Installing xWiki on Ubuntu – the easy way

Just use their apt-get stuff found here: http://ift.tt/1A7ugZJ

Soooo much easier. Except for the part where you have to increase the amount of memory Tomcat7 uses (the default is too little and you get this sort of error: “Error number 4001 in 4: Error while evaluating velocity template colorThemeInit.vm”)

Here’s the steps I used:

wget -q "http://ift.tt/1JsPWnr" -O- | sudo apt-key add -

sudo wget "http://ift.tt/1JHPw04" -P /etc/apt/sources.list.d/

apt-get update

sudo apt-get install xwiki-enterprise-tomcat7-mysql

This then kicks off a bunch of downloads and installations. You will be asked for credentials to use for MySQL and XWiki and stuff.

Then, you have to do one more thing – increase the amount of memory Tomcat7 allocates:
in /etc/default/tomcat7 replace the JAVA_OPTS line with this (from: http://ift.tt/1A7ugZJ):
JAVA_OPTS="-Djava.awt.headless=true -Xmx512m -XX:MaxPermSize=192m"
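
For the JAVA_OPTS change to take effect, restart Tomcat (assuming the stock tomcat7 init script the package installs):

sudo service tomcat7 restart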



Sunday, March 8, 2015

Easiest Way to Upgrade Splunk to a New Version

In short, you just install it over the top of the existing install. As a precaution, you could copy everything in /opt/splunk/ before you do the upgrade.
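
Roughly, the whole upgrade looks like this (backup path and version are just examples; you’ll probably want to stop Splunk before overwriting its files, and the extract/start commands are the same ones from the install steps below):

/opt/splunk/bin/splunk stop
cp -rp /opt/splunk /opt/splunk.bak
tar xvzf splunk-6.2.2-255606-Linux-x86_64.tgz -C /opt
/opt/splunk/bin/splunk start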


This is a verbatim copy-paste of the easiest way I’ve found to install Splunk. Yeah, you’re literally just overwriting the existing files with the new install. All your configs, index references, searches, apps, etc. will not be touched (provided you don’t have anything particularly custom of course).


I’m using CentOS 6.6, but any Linux variant should work fine.


You want the .tgz file from here (use wget to download the file to the /tmp directory on your server):

http://ift.tt/18rHcT7. Mine looked like this:




wget -O splunk-6.2.2-255606-Linux-x86_64.tgz 'http://ift.tt/1FAMGVO'

--2015-02-26 01:30:13-- http://ift.tt/1FAMGVO


Extract splunk and put it in the /opt directory:



tar xvzf splunk-*.tgz -C /opt


Run Splunk for the first time:



/opt/splunk/bin/splunk start --accept-license


Set Splunk to start at boot (just to make sure):



/opt/splunk/bin/splunk enable boot-start





Saturday, March 7, 2015

What about changes to props.conf in Splunk Cloud?

I can’t really find anything definitive in the docs (I’m sure there’s something, but my searches were coming up dry… maybe that says something about me though ;)


Anyways, I think you have to submit a support ticket that includes the props.conf additions you want.





Friday, March 6, 2015

Thursday, March 5, 2015

ejabberd hard crash and caps_features.DAT showing as not a valid file

We had a hard, nasty crash of the VM running ejabberd.


So, we ended up copying all the files in /var/lib/ejabberd to an earlier cloned image of this same machine.


But ejabberd wouldn’t start with ejabberdctl start. It threw this initially:



{error_logger,{{2015,3,5},{21,25,9}},"Error when reading /var/lib/ejabberd/.erlang.cookie: eacces",[]}


Ultimately, the solution to that problem was found in restoring ownership of /var/lib/ejabberd/.erlang.cookie to ejabber_user:ejabber_user.
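
In other words (path and user exactly as above):

chown ejabber_user:ejabber_user /var/lib/ejabberd/.erlang.cookie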


Then, another problem came up, and the only way we found what it was was by running ejabberdctl live and watching the output. Here was the output:



** FATAL ** {error,{"Cannot open dets table",caps_features,

[{file,"/var/lib/ejabberd/caps_features.DAT"},


Well, after opening up this file and comparing with another file from an instance that was running, we tried simply renaming it to caps_features.DAT.bak and starting up ejabberd. And it worked.


caps_features.DAT seems to not be critical data – it appears to just be info on clients; and it can be rebuilt over time.





Tuesday, March 3, 2015

Thursday, February 26, 2015

Quickest Steps to Install any Splunk version really really easily

I’m using CentOS 6.6, but any Linux variant should work fine.


You want the .tgz file from here (use wget to download the file to the /tmp directory on your server):

http://ift.tt/18rHcT7. Mine looked like this:




wget -O splunk-6.2.2-255606-Linux-x86_64.tgz 'http://ift.tt/1FAMGVO'

--2015-02-26 01:30:13-- http://ift.tt/1FAMGVO


Extract splunk and put it in the /opt directory:



tar xvzf splunk-*.tgz -C /opt


Run Splunk for the first time:



/opt/splunk/bin/splunk start --accept-license


Set Splunk to start at boot:



/opt/splunk/bin/splunk enable boot-start





Wednesday, February 25, 2015

Deploying VM Clones of Centos 6.6 in VirtualBox

Make a nice, clean CentOS VM, complete with all the utilities you like.


Make a script and put it in whatever default folder you go to when logging in:




nano reset-networking.sh


Put the following in:




# this script resets the networking, as in the case of a fresh cloned VM

# after the reboot, DHCP should pick up a new IP and you're off to the races


# assumes you have nano installed (yum install nano)

# assumes you have removed the "MACADDR=" and "HWADDR=" lines from

# /etc/sysconfig/network-scripts/ifcfg-eth0


# suggested: in /etc/sysconfig/network file, add DHCP_HOSTNAME=

# .... the hostname will then show up in your dhcp table


# remove the existing network interfaces

rm /etc/udev/rules.d/70-persistent-net.rules


# bring up an editor to change the hostname

nano /etc/sysconfig/network


# reboot to complete things

reboot


now make sure the script is executable:



chmod u+x reset-networking.sh


Open /etc/sysconfig/network-scripts/ifcfg-eth0:


Remove the MACADDR= line and the HWADDR= line


Shut down the machine

shutdown -P now


Now you can clone the VM.


When a new clone comes up, log in and execute the script – it’ll reboot the machine and then it should be all good:

./reset-networking.sh


Helpful links:


http://ift.tt/1A8hoRv


http://ift.tt/1EteBc3


http://ift.tt/1A8hoRw





Monday, February 23, 2015

nlog config changes are reflected in real time – and they won’t take effect if the change invalidates the config file

Here’s kind of a nice way to test…. on PROD!


If you need to change your logging configs, you can just do it on your production server. If your changes break anything, they won’t take effect; the already-running configs will stay in place. It won’t crash your application either.


Note, if you stop your application, make a change to the config that breaks the config, the application will not start up. So, hack way… but only while the application process is running!


Fun dangerous things are a blast.





Thursday, February 19, 2015

Redshift sql alchemy dialect installation

It was not obvious to me how to install this thing – I wanted to use csvkit to auto-generate table schema and execute it on a DB server.


Install csvkit:

pip install --upgrade csvkit


install client-side postgresql packages (needed for sqlalchemy dialect install):

apt-get install libpq-dev


install python dev (thanks to this: http://ift.tt/1Eci5Q2):

apt-get install python-dev


install the redshift sqlalchemy dialect (found you could do this from here: http://ift.tt/1GaLSYx):

pip install http://ift.tt/1GaLSYz
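
With all that in place, here’s roughly how I’d expect the csvkit side to look (the file name and connection string are made up; adjust for your cluster):

print the generated CREATE TABLE statement only:
csvsql --dialect postgresql mydata.csv

or create the table and load the rows in one shot:
csvsql --db "redshift+psycopg2://myuser:mypass@mycluster.abc123.us-east-1.redshift.amazonaws.com:5439/mydb" --insert mydata.csv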





Wednesday, February 18, 2015

Tuesday, February 17, 2015

Add a new disk to a Linux machine (EC2 and Ubuntu in this case)

You need more space. Create a new volume in EC2 and then attach it to the running machine. Then, log in to the machine via Putty or whatever you prefer.




fdisk -l should show the new volume, but of course it's not partitioned. This gives you the /dev/xvdf path (or whatever)


fdisk /dev/xvdf

n

1 (likely will be default)

enter (default)

enter (default)

w (writes the partition to the disk)

mkfs -t ext3 /dev/xvdf1 (that "1" at the end refers to the partition number you typed in (or selected because it was default) just above)

tune2fs -m 1 /dev/xvdf1 (reserve only 1% of space for root user - it's 5% by default, so it prevents non-root users from using this reserved space)


Now you need to mount this new disk:



mkdir /backup (I'm mounting this disk to use as a backup volume)

nano /etc/fstab (edit the fstab file, which defines which disks are mounted where on bootup)

create a new line that looks like this (then save and exit):

/dev/xvdf1 /backup ext3 defaults 0 0

mount -a (mounts all the stuff in /etc/fstab)


Now you should see your new volume showing up in /backup





Monday, February 16, 2015

Run a linux process in the background so you can log out of the console or not have to keep it up

I had a loooong mysql dump to run, and I didn’t want to risk it getting cut off part way through just because my local machine’s wireless network connection timed out or something.


This did it:


1. Run the process (mysqldump cmd with options)

2. Ctrl+z

3. bg

4. disown -h

5. Exit the terminal
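
Putting those steps together, the session looks something like this (the mysqldump command itself is just an example):

mysqldump -u root -p mydatabase > /tmp/mydatabase.sql
(press Ctrl+z to suspend the job)
bg
disown -h
exit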





Saturday, February 14, 2015

Amazon RDS MySQL instance – when it’s really really big

750GB big. If you want to restore a mysql dump to an RDS instance, expect to encounter some issues on the first go-round. You’ll likely have to create a custom parameter group. You’ll likely have to increase your max_allowed_packet parameter for the instance.


So, do yourself a favor and create a parameter group up front and search for my.cnf tweaks that optimize the sort of workload your database will see.
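
If you’d rather script it than click through the console, something along these lines should do it with the AWS CLI (group name, parameter group family, and packet size are just examples):

aws rds create-db-parameter-group --db-parameter-group-name my-restore-params --db-parameter-group-family mysql5.6 --description "params for big restores"
aws rds modify-db-parameter-group --db-parameter-group-name my-restore-params --parameters "ParameterName=max_allowed_packet,ParameterValue=1073741824,ApplyMethod=immediate"
aws rds modify-db-instance --db-instance-identifier my-rds-instance --db-parameter-group-name my-restore-params --apply-immediately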


By the way, the following error may mean that your max_allowed_packet size is too small for your restore:

ERROR 2006 (HY000) at line 1466: MySQL server has gone away mysqldump: Got errno 32 on write





Wednesday, February 11, 2015

Recovering a Windows Administrator Password in Amazon EC2

Say you’ve lost the admin creds to your Windows EC2 instance. And then you lose the private key. Bummer. You can still recover the password.


http://ift.tt/1ApWQcd


Just do it step-by-step.


Step 8 says: “(Optional) If your temporary instance is based on the same AMI that the original instance is based on, and the operating system is later than Windows Server 2003, you must complete the following steps or you won’t be able to boot the original instance after you restore its root volume because of a disk signature collision.”


What does that look like? Well, when you reattach the drive and start up the instance, it will continually stay in the “initializing” state. You will see it sort of flash to something else then return to “Initializing”. It’ll do this forever, so if it’s been 5 minutes and it’s still in the “initializing” state, then you have to do step 8.


When, in step 8, it says to “In the Registry Editor, load the following registry hive into a folder named BCD: d:\boot\bcd”, just open regedit.exe, select the root node, then search for “Windows Boot Manager”. It’ll find something, but look at the registry tree for the one named 11000001.


If you mount the disk to the original instance, by default it puts it at xvdf… you have to mount it at /dev/sda1, otherwise you will get an error message about there not being any root volume mounted.





Tuesday, February 10, 2015

Regular Expressions, REX, Eval and Splunk – some tips to make it easier on yourself

Splunk uses “PCRE” Regular expressions, so when you use this tool (and you really should) select that from the dropdown: http://ift.tt/1fMXgx6


Just paste in a sample event and it’ll match it in real time, right before your eyes! Granted, Splunk does have this sort of feature, but honestly, I find it helpful to sort of “back up and out” of what I’m doing in Splunk to solve a problem that’s not strictly a Splunk thing.





Sunday, February 8, 2015

Saturday, February 7, 2015

Replacing backslashes in Splunk

Say you are extracting data that has nested JSON. Splunk may auto-escape double quotes. You can’t then directly run spath on that field and get anything out of it. You have to remove the backslashes. You need to use the “eval” function and for some reason stuff in 4 backslashes. Like this:


| eval MyDataField=replace(MyDataField,"\\\\","")


Splunk answer about this:


http://ift.tt/1DOsu1z





Friday, February 6, 2015

Splunk props.conf – yeah, you have to restart splunk entirely when you make changes

On this page, it seems to indicate you could search with "| extract reload=T" and it would reload your props.conf file. I didn’t see it work, at least not with extract-time transformations (it may work with search-time stuff).


http://ift.tt/1zpDC5H
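
For the record, the restart itself is just:

/opt/splunk/bin/splunk restart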





Thursday, February 5, 2015

Multiple events logged to a single log line – how to work with it in Splunk

I have log lines that are really multiple lines of events (JSON in this case, and many events are batch-logged at once). I need Splunk to split them into individual events.


props.conf is where you have to muck around. And, yes, you have to restart splunk any time you make changes there.


Ultimately, this worked for me:


[name_of_my_sourcetype]

LINE_BREAKER = (\n)

SHOULD_LINEMERGE = false

TRUNCATE = 0

#TIME_PREFIX = "Timestamp":"

#TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N

#TZ = UTC

KV_MODE = JSON


I’ve commented out the TIME… settings, but they do set the timestamp of each event to be whatever immediately follows that TIME_PREFIX regex.





Friday, January 30, 2015

Wednesday, January 28, 2015

Installing Graphite for Dashboards of System Metrics

I’m doing this on a CentOS minimal install, and all the Graphite “stuff” is going in the default locations. I hear you “don’t need as much space as you think”, but that you do want to use fast disks (read “SSDs”).


Graphite install steps:


http://ift.tt/1Ermfok


2 Keys:

1. Install EPEL

2. install pip via yum (not get-pip.py)


1. yum update -y

– yeah, this can take a while, but it’s worth it


2. yum install wget


3. Install EPEL (this is key; otherwise all the following steps will mysteriously fail):

wget http://ift.tt/1htTSIk

sudo rpm -Uvh epel-release-6*.rpm


4. yum install python-devel


5. yum install python-pip


6. pip install http://ift.tt/15ToC42


7. pip install whisper


8. pip install carbon


9. pip install graphite-web


From here follow this step-by-step:


http://ift.tt/1Ermfom


EPEL install:


http://ift.tt/1sfqNHu


I had some problems with installing carbon. Some “gcc” command failed in the very last step, throwing angry red text (like this: http://ift.tt/1ErmeRg). Anything that takes that much work is worth abandoning. Then I saw yum has it (in EPEL). Well, I couldn’t figure out how to uninstall pip. heh heh:


pip uninstall pip





Friday, January 23, 2015

Amazon AWS EC2 instance was configured with a larger drive but I don’t see that

Well, if you make the single drive larger than default, the extra space won’t automatically show up – well, it hasn’t in my experience.


You have to run resize2fs to “see” it.


Below is what it looked like for me. Notice there was no indication of free space on the drive.


[root@……..]# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/xvde 7.9G 1.1G 6.5G 15% /

tmpfs 1.9G 0 1.9G 0% /dev/shm

[root@………..]# resize2fs /dev/xvde

resize2fs 1.41.12 (17-May-2010)

Filesystem at /dev/xvde is mounted on /; on-line resizing required

old desc_blocks = 1, new_desc_blocks = 32

Performing an on-line resize of /dev/xvde to 131072000 (4k) blocks.

The filesystem on /dev/xvde is now 131072000 blocks long.


[root@…….]# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/xvde 493G 1.1G 467G 1% /

tmpfs 1.9G 0 1.9G 0% /dev/shm


After a while, it came back and I had the 500GB I had configured the instance to have.


While resize2fs is running, if you type “df -h” repeatedly you can see the volume growing. Good fun when you’re staring at an SSH console ;)





Thursday, January 22, 2015

Sharing OSX files over the network with Windows 7, 8, and 8.1

In OS X go to System Preferences > Sharing > File Sharing, then look for the option for sharing files with Windows. Heh heh.. That’s great and all. But in my case it didn’t work. “Incorrect password” or whatnot. And I kept hitting that wall.


Well, the solution was to “uncheck all the things”, save, then “check all the things”. By that I mean to, say, uncheck/deselect/turn off the OSX accounts that are given permission to be used in SMB file shares from Windows machines. Then save. Then re-check/select/turn them back on. Yeah, it’s really that boneheaded ;)


But the reason is that somewhere under the hood, the configs got clobbered, probably by some system update. So, even though the OS is displaying that the user should work for file sharing, it’s not actually set up that way.





Wednesday, January 21, 2015

Tuesday, January 20, 2015

Saturday, January 17, 2015

Splunk: How to effectively remove a field from results if there are no non-null values in it

In my case, I needed to use rex to extract a “message” field that may or may not be present in an event, but if it was it could be really dirty (since it’s user-generated text). However, rex has the side effect of *always* creating the field you specify, even if there is no actual match in the event. As a result, every search had that rex-extracted field name, which was not desired (and confusing to see a blank “message” field for, say telemetry events).


Create a new field set to the value of the field you may want to get rid of, then get rid of the original field, e.g.


say the field in question is named “possiblynull”




| eval maybenull=possiblynull

| fields - possiblynull

| rename maybenull as possiblynull


This way, if the original field is actually empty, your search results will not end up with it present.


A few Splunk Answers related to this scenario:


http://ift.tt/1xgQkiE


http://ift.tt/15bH4p7





Friday, January 16, 2015

Splunk subsearch where you want it to only return a single value

By default a Splunk subsearch returns something of the form “fieldname=24”. If you only want it to return the “24” part, just name the field in the subsearch “query”. Yeah, it’s a magic term for just such a scenario.
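
For example, to feed just the raw value from a subsearch into an outer search (index, sourcetype, and field names here are made up):

index=myindex [ search index=myindex sourcetype=mysourcetype | head 1 | rename session_id as query | fields query ]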





Splunk DB Connect limits

I wanted to use Splunk DB Connect to automatically incrementally query data from a database server based on data found in another database server.


So the idea was this:


Use DB Query to get the most recent date found in the destination database table, pipe that to a “map” command that allows me to specify the second dbquery on the source database table (specifying the results of the first dbquery as the starting date), and then doing some filtering and transforms, and then using dboutput to send the results to the destination table.


Well, that doesn’t work. It errors out with some null pointer exception. It all works fine as soon as I remove the “map” command segment, which means I have no way to perfectly get only those records that are new since the last extraction.


My guess is that since the “map” command fires off one query per result row from the previous search, it means the dboutput can’t reference a single search to use as the source of data it will insert. I suspect it may be possible they could fix this so it looks at whatever the results were that are being piped into it. Currently, I am guessing it is looking for the base search.


I expect the current behavior is actually expected, so I bet the behavior I’m hoping for would be a feature request.


Here’s the Splunk Answers question I submitted for this:


http://ift.tt/1udaB7N





Thursday, January 15, 2015

Splunk and XML and JSON

Even with Splunk 4.3.3, it can automatically pull out XML and JSON fields at search time.


This means you can query a database table in real time, generate a table of data where each column is an XML and/or JSON element, then push it all to another DB table.


“spath” is the search command you want to use, and usage is something like this:






| spath input=MyFieldThatHasXMLInIt


Yeah, that’s all; it’s seriously great. Note by default spath only looks at the first 5000 characters in that field, so if you have larger fields you will need to override the system defaults by adding the below ini section to /opt/splunk/etc/system/local/limits.conf (adjust the value to whatever is appropriate for your situation). I got this section from /opt/splunk/etc/system/default/limits.conf and tried just pasting it in and restarting Splunk:




[spath]

# number of characters to read from an XML or JSON event when auto extracting

extraction_cutoff = 8000





Wednesday, January 14, 2015

Using Splunk to extract XML and JSON fields using spath, but the 5000 character limit prevents it from getting everything

Some events had xml that was longer than 5000 characters, and spath wasn’t extracting all the fields I knew were in there.


Here’s how to fix it:

Override the spath character limit in $SPLUNK_HOME/etc/system/local/limits.conf.


My exact edit was to add the below config section to /opt/splunk/etc/system/local/limits.conf (since it wasn’t there by default in 4.3.3). I pulled this from /opt/splunk/etc/system/default/limits.conf:


[spath]

# number of characters to read from an XML or JSON event when auto extracting

extraction_cutoff = 10000


There are a number of unanswered Splunkbase questions:


http://ift.tt/1AgfqPN




spath docs don’t mention an override available at search time: http://ift.tt/1AgfrTy


spath is fantastic, btw. It auto-extracts XML and JSON. It can even do JSON embedded within XML fields. Just do “spath input=fieldthatcontainsxmlorjson”. However, if you have potentially large XML fields you will need to increase the limit on the number of characters spath looks at.





Tuesday, January 13, 2015

Using Splunk as an ETL Tool for Data Residing in a Relational Database

Use Splunk to Extract, Transform, and Load data from an existing OLTP database into another OLTP database. It’s especially great if your source data has XML or JSON (imagine JSON stored in an XML field – Splunk can handle that no problem).


In my case, there was data in a table that stored web service requests and responses.


I used DB Connect (the Splunk app) to query the last 2 days’ worth of data, extracting the XML key/value pairs and cleaning up the data so the Splunk search output a clean table of data. I then filtered that data with a Splunk subsearch that returned the most recent request time already in my destination table (on a different SQL Server instance). Then I piped it all to dboutput pointed at the table. The table was created to ignore any duplicate keys, just in case. Then, I saved the search and scheduled it to run every hour.


Oh yeah, since none of this data is indexed in Splunk, it doesn’t count against your daily licensing limits :)


Here are what some of my Splunk search looked like (I’ll try to build it up in stages). I’m running Splunk Enterprise 4.3.3, btw:


First use dbquery to get the data from your source table.


1. Install DB Connect app (I got the latest version)


2. Set up the database connection to your source DB


3. Set up the database connection to your destination DB (make sure to uncheck the “Read Only” box, otherwise you’ll get messages about the database being read-only).


4. My dbquery ended up looking something like this (queries from the Splunk external database connection named “DWH”):


| dbquery DWH "select * from LogDB.dbo.SourceServiceLog l (nolock) where l.RequestTime > dateadd(dd,-2,convert(date,getdate()))"


5. Use Splunk Search commands to clean up the data (I’ll leave this to you and the docs… but I’ll have more posts on clever uses)


6. Pipe your search results to dboutput. Mine looked something like this (it naively inserts the fields specified at the end of the command into the table splunktest in the database “MyDatabaseName”):


| dboutput type=insert database=MyDatabaseName table=splunktest key=Character RequestTime UserId RequestType


Note, when specifying the table to insert into with dboutput you cannot fully qualify a table name, e.g. MyDatabaseName.dbo.splunktest.


Note, the select query with dbquery can be any complex query you want. Definitely avoid any double quotes though.


This technique does not seem to count against your daily indexing license limits.





Monday, January 12, 2015

Puppet Forge Module Installation – make sure they’re in the right directory

I had been working on setting up the seteam-splunk module on Puppet 3.7 (for managing Splunk installations). Besides the fact that the module doesn’t purge changes to the inputs.conf files, I hit the following error when I went to install it in our production systems:

Could not autoload puppet/provider/splunkforwarder_input/ini_setting: undefined method `provider’ for nil:NilClass


In my case, this was due to the fact that when I installed the module, by default it was installed into different folders. I had to specify the target directory for each of the 3 modules involved:


puppet module install seteam-splunk --target-dir /opt/puppet/modules/


You may even have to uninstall the 2 dependency modules and reinstall them, explicitly setting the target directory (the --force flag is needed because each is a dependency of another module):


puppet module uninstall puppetlabs-inifile --force

puppet module install puppetlabs-inifile --target-dir /opt/puppet/modules/


puppet module uninstall nanliu-staging --force

puppet module install nanliu-staging --target-dir /opt/puppet/modules/
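
To double-check everything landed where the master expects it, list the modules against that path:

puppet module list --modulepath /opt/puppet/modules/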





Friday, January 9, 2015

What does this mean: Error: No such file or directory – getcwd

When working in Puppet, I’d occasionally get this at the top of a slug of scary red text errors:


Error: No such file or directory – getcwd


Well, it really just means your current working directory doesn’t exist. Usually, I had just deleted some directory and I was in that path at the time.


“getcwd” means “Get Current Working Directory”.


Just like “pwd” means “Print Working Directory”.
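
You can reproduce the error on purpose if you’re curious (throwaway directory; the puppet command is just an example of something that reads the working directory):

mkdir /tmp/gonedir
cd /tmp/gonedir
rm -rf /tmp/gonedir
puppet module list (this, or most anything else, may now complain about getcwd)
cd ~ (the fix: move to a directory that still exists)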


Figured the internet could always use another bit of information on something a new Linux user may not have encountered [yet].





puppetlabs-splunk Puppet Module does not purge non-specified ini file elements

Here is a post that outlines the problem I hit:


http://ift.tt/1wDxtxX


I have not been able to identify where the bug is.


In short, a Puppet module should allow you to specify the Splunk inputs.conf contents. However, this module currently seems to have a bug where it won’t remove something you haven’t specified.


I’d really rather just use the module and not have to modify it to make it work.





Wednesday, January 7, 2015

Impala 2.0 dies if you query on gzip files that are too big to fit in memory

Say you run Sqoop2 to import a very large table – say 6 billion rows – and you have it output to gzip.


Well, it’ll happily do all that, but you will end up with files of 3.5 GB or something. Yay! How space-efficient!


Then, you make an external table that points at those files and query with Hive. An hour or so later you’ll get results. Yeesh. Well, maybe Impala is faster….. hmm… you get an error message like this:


“Bad status for request 708: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=5, errorMessage=None, sqlState=None, errorCode=None)”


What’s happened is you’ve killed Impala because you don’t have enough memory on the machines (that’s what Impala is configured to do when it’s the process taking up all the memory on a node). You’ll see Impala is dead in the Cloudera Manager main dashboard, so restart it. Since the data is all in gzip files, it has to decompress each file in order to do stuff on it, and each of those 3.5GB files will blow up to, well, a whole lot bigger than 3.5GB.


You have to use something like Filecrush to break up those files into smaller pieces, preferably something no larger than your HDFS block size (which Filecrush will happily do by default).


I’ve got some posts about setting up and using Filecrush:


http://ift.tt/14pOyE3


Filecrush project:


http://ift.tt/1gzGOis





Monday, January 5, 2015

Random numbers in TSQL queries

Say you want to generate test and control groups directly in the results of a query, with each row being in either a control group or the test group.


Here’s one way to roll a random number between 0 and 1:

select ABS(CONVERT(BIGINT,CONVERT(BINARY(8), NEWID()))) % 2


Here’s how to roll a random number between 0 and 2:

select ABS(CONVERT(BIGINT,CONVERT(BINARY(8), NEWID()))) % 3


0 and 5:

select ABS(CONVERT(BIGINT,CONVERT(BINARY(8), NEWID()))) % 6


0 and 18:

select ABS(CONVERT(BIGINT,CONVERT(BINARY(8), NEWID()))) % 19





Friday, January 2, 2015

YARN not starting

Here’s the error I saw (from “Recent Log Entries” in Cloudera Manager after clicking on the Details for the failed YARN startup step when restarting the Cluster):


Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 13 missing files; e.g.: /tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state/000005.sst


There is a bug already submitted in Jira for YARN that seems to encompass this error I saw. It also seems to include a workaround:


http://ift.tt/141Wo6x


Fix:


In short, remove or rename the CURRENT file in these 2 paths and then restart YARN (or delete the files, or I think you could even just reboot each affected node since the /tmp folder may be cleared out on reboot):


/tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state


/tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state